AI Agents Judge Stock Picks Better Than Reddit

What It Is

A developer built a multi-agent system using Claude Code to test whether AI could evaluate stock analysis quality better than Reddit’s voting mechanism. The experiment scraped posts from r/ValueInvesting during February 2025, removed all upvote counts, and deployed Claude agents to score each stock pitch purely on analytical merit: thesis clarity, risk assessment, and valuation logic.

The system created three portfolios: stocks from the most upvoted posts, stocks Claude rated highest for reasoning quality, and the S&P 500 as a benchmark. After tracking performance for a year, the results challenged conventional assumptions about crowd wisdom. Claude’s selections returned +5.2%, while Reddit’s most popular picks returned -10.8%. The S&P 500 gained +2% over the same period.

What makes this particularly interesting is that the AI’s reasoning scores remained predictive even for market data after September 2025, well beyond Claude’s training cutoff. The model couldn’t have “memorized” those market conditions, suggesting it genuinely identified analytical quality rather than pattern-matching historical outcomes.

Why It Matters

This experiment reveals a fundamental disconnect between popularity and analytical rigor in online investment communities. Upvotes measure social validation, not argument quality. Posts that confirm existing biases or tell compelling stories tend to accumulate votes regardless of their logical foundation.

Quantitative researchers and portfolio managers could benefit from this approach. Rather than manually sifting through hundreds of posts to find contrarian insights, AI agents can systematically evaluate reasoning quality at scale. The methodology filters for analytical discipline instead of narrative appeal.

The implications extend beyond stock picking. Any domain where crowds vote on technical content (code reviews, research papers, technical proposals) faces similar dynamics. Popular doesn’t equal correct. The experiment demonstrates that language models can assess argument structure, identify logical gaps, and spot confirmation bias in ways that complement human judgment.

For retail investors, the findings suggest caution around consensus plays. When everyone on a forum agrees something is brilliant, that unanimity might reflect groupthink rather than sound analysis. The most valuable insights often come from posts that challenge prevailing sentiment, which typically get buried under more emotionally satisfying content.

Getting Started

The full methodology is walked through in a video at https://www.youtube.com/watch?v=tr-k9jMS_Vc. Developers interested in replicating the approach would need to:

Set up a Reddit scraping pipeline using PRAW (Python Reddit API Wrapper):


import praw  # pip install praw

# Authenticate against the Reddit API (register an app at reddit.com/prefs/apps)
reddit = praw.Reddit(
    client_id="your_client_id",
    client_secret="your_secret",
    user_agent="stock_analyzer",
)

subreddit = reddit.subreddit("ValueInvesting")
posts = subreddit.new(limit=100)  # lazily fetches the 100 newest submissions
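Since `subreddit.new()` only walks backward from the newest submissions, a replication targeting a fixed month (February 2025 in the experiment) would also need to filter on each post’s `created_utc` UNIX timestamp. A minimal sketch — the `in_window` helper below is illustrative, not part of PRAW:

```python
from datetime import datetime, timezone

def in_window(created_utc: float, start: datetime, end: datetime) -> bool:
    """True if a UNIX timestamp (like PRAW's created_utc) falls in [start, end)."""
    return start.timestamp() <= created_utc < end.timestamp()

# The experiment's window: February 2025 (UTC)
feb = datetime(2025, 2, 1, tzinfo=timezone.utc)
mar = datetime(2025, 3, 1, tzinfo=timezone.utc)

# Applied to the listing above:
# february_posts = [p for p in posts if in_window(p.created_utc, feb, mar)]
```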

Build a Claude agent system that evaluates each post against specific criteria. The scoring rubric should focus on falsifiable claims, risk acknowledgment, and logical consistency rather than persuasive language or confidence level.
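The developer’s exact rubric isn’t published, but the scoring step could be sketched as a prompt builder plus a parser for the model’s structured reply. The criteria names below are illustrative assumptions drawn from the description above, and the reply itself would come from a Claude API call:

```python
import json
import re

# Illustrative rubric: criteria mirror the ones described in the experiment
RUBRIC = {
    "thesis_clarity": "Is the investment thesis stated as a falsifiable claim?",
    "risk_acknowledgment": "Does the post identify concrete downside scenarios?",
    "valuation_logic": "Are the valuation inputs and reasoning internally consistent?",
}

def build_scoring_prompt(post_text: str) -> str:
    """Assemble a prompt asking the model to score a post 1-10 per criterion."""
    criteria = "\n".join(f"- {name}: {q}" for name, q in RUBRIC.items())
    return (
        "Score the following stock pitch from 1 to 10 on each criterion. "
        "Ignore tone and confidence; judge only the reasoning.\n"
        + criteria
        + '\nReply with a JSON object like {"thesis_clarity": 7, ...}.\n\n'
        + "POST:\n"
        + post_text
    )

def parse_scores(reply: str) -> dict:
    """Pull the first-to-last JSON object out of the model's reply text."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    return json.loads(match.group(0)) if match else {}
```

In practice the prompt would be sent through the Anthropic Python SDK (e.g. `client.messages.create(...)`) and the reply text fed to `parse_scores`; averaging the per-criterion scores then gives a single rank for each post.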

Track the recommended stocks over time using a portfolio tracking API or manual logging. Compare performance against both the crowd favorites and a broad market index.
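For the comparison itself, an equal-weight buy-and-hold calculation is enough to produce headline figures like the +5.2% vs. -10.8% above. A minimal sketch; the tickers and prices are placeholders, not data from the experiment:

```python
def portfolio_return(start_prices: dict, end_prices: dict) -> float:
    """Equal-weight buy-and-hold return across a basket of tickers."""
    returns = [
        (end_prices[t] - start_prices[t]) / start_prices[t] for t in start_prices
    ]
    return sum(returns) / len(returns)

# Placeholder prices for illustration only
start = {"AAA": 100.0, "BBB": 50.0}
end = {"AAA": 110.0, "BBB": 45.0}  # a +10% leg and a -10% leg
print(f"{portfolio_return(start, end):+.1%}")  # prints +0.0%
```

Running the same function over the upvote-ranked basket, the Claude-ranked basket, and an S&P 500 proxy gives the three numbers to compare.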

Context

This approach has clear limitations. A year of data represents one market cycle, and the February 2025 starting point may have introduced timing bias. The experiment also assumes Reddit posts contain enough information to make investment decisions, which professional analysts would dispute.

Alternative approaches exist for filtering investment ideas. Quantitative screens based on financial ratios, momentum indicators, or value factors have decades of academic backing. Hedge funds employ teams of analysts to evaluate opportunities through primary research rather than social media posts.

The real value here isn’t replacing traditional analysis but augmenting human filtering. AI agents can pre-screen large volumes of user-generated content to surface posts worth deeper investigation. They won’t replace due diligence, but they can make the initial triage more efficient than relying on upvote counts alone.