Agent-Driven Discovery
Multi-agent system that explores datasets and finds interesting insights.
$ cat story.md
I love data. I used to pull random datasets from Kaggle and visualize them just for fun. What if AI agents could do the exploration part?
The idea: point a team of agents at a dataset and let them decide what's interesting. An Explorer proposes insights, a Validator pushes back. The back-and-forth is where the magic happens.
This is a showcase project: I run it locally and publish the interesting findings as blog posts, with visualizations of how the agents got there. It's practice with multi-agent orchestration, monitoring, and the pandas-to-LLM pattern.
$ cat pipeline.txt
Pandas tool design, agent-callable functions
Conversation history, JSON parsing, error recovery
Persona design, structured output instructions
Multi-agent coordination, rejection loops
Structured logging, run capture for visualization
Mandatory feedback loops, structured questioning
Column validation, grounding agents in reality
Measuring agent behavior at scale, approval rates
Visualizing agent decision paths, responsive design patterns
Different Explorer personalities (Statistician, Storyteller, Detective, Contrarian)
Turn-based multi-agent discussions, consensus mechanisms, adversarial voting
SVG visualization, playback UX, animating multi-agent conversations
Narrator Agent: cut - project goals achieved without it
$ cat current-status.md
Project Complete
Project complete. Built a multi-agent data exploration system with Explorer/Skeptic pattern, multiple personas, collaborative mode, and visualization components for the portfolio.
Key learnings: mandatory feedback loops beat prompt engineering, specialized personas outperform generic ones, adversarial voting requires explicit thresholds, and SVG visualizations need explicit hex colors (Tailwind opacity classes fail).
The Narrator Agent was cut. The core learning goals were achieved, and adding commentary would have been polish without new insights.
Milestones
23 / 23
$ cat agent-tools.txt
The Explorer agent has access to these data analysis tools (a code sketch follows the list):
Dataset structure: columns, types, row count
Summary statistics: mean, median, std, min, max
Value distribution: histogram or top categories
Pearson correlation between two numeric columns
Filter data with pandas expression, return sample
Statistical outliers using IQR method
Random sample of n rows from the dataset
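A minimal sketch of what these might look like as agent-callable functions over a pandas DataFrame. The function names and signatures here are illustrative, not the project's actual tools.py API:

```python
import pandas as pd

def describe_structure(df: pd.DataFrame) -> dict:
    """Dataset structure: columns, dtypes, row count."""
    return {
        "rows": len(df),
        "columns": {col: str(dtype) for col, dtype in df.dtypes.items()},
    }

def summary_stats(df: pd.DataFrame, column: str) -> dict:
    """Summary statistics for a numeric column."""
    s = df[column]
    return {
        "mean": float(s.mean()), "median": float(s.median()),
        "std": float(s.std()), "min": float(s.min()), "max": float(s.max()),
    }

def value_distribution(df: pd.DataFrame, column: str, bins: int = 10) -> dict:
    """Histogram for numeric columns, top categories otherwise."""
    s = df[column]
    if pd.api.types.is_numeric_dtype(s):
        counts = pd.cut(s, bins=bins).value_counts().sort_index()
        return {str(interval): int(n) for interval, n in counts.items()}
    return s.value_counts().head(bins).to_dict()

def correlation(df: pd.DataFrame, col_a: str, col_b: str) -> float:
    """Pearson correlation between two numeric columns."""
    return float(df[col_a].corr(df[col_b], method="pearson"))

def filter_sample(df: pd.DataFrame, filter_expr: str, n: int = 5) -> str:
    """Filter with a pandas query expression, return a small sample as JSON."""
    return df.query(filter_expr).head(n).to_json(orient="records")

def find_outliers(df: pd.DataFrame, column: str) -> dict:
    """Statistical outliers using the IQR method."""
    s = df[column]
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    mask = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
    return {"outlier_count": int(mask.sum()), "sample": s[mask].head(5).tolist()}

def random_sample(df: pd.DataFrame, n: int = 5) -> str:
    """Random sample of n rows from the dataset."""
    return df.sample(n).to_json(orient="records")
```

Returning plain dicts and JSON strings keeps tool output easy to splice into the conversation history the LLM sees.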
$ cat exploration-path.txt
Sample exploration run showing the Detective persona analyzing earthquake data. Tool errors (red) are normal - the agent recovers and continues exploring. The Skeptic catches a hallucination mid-run when the Explorer starts discussing "budget and revenue" on earthquake data, forcing a correction.
$ diff personas.txt
Same earthquake dataset, three different personas. The Statistician focuses on distributions and correlations, the Detective hunts for anomalies, and the Storyteller looks for narratives in the data.
$ ./roundtable --replay
Watch a collaborative exploration unfold. Six personas analyze earthquake data, building on each other's insights through 8 rounds of discussion until reaching consensus. Press play to see their thought process.
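As a rough sketch of the turn-based structure described above (the agent call, vote format, and threshold are assumptions, not the repo's actual orchestration code):

```python
def roundtable(personas, dataset_summary, ask_agent, max_rounds=8, threshold=4):
    """Turn-based discussion: shared transcript, per-round consensus vote."""
    transcript, votes = [], []
    for round_no in range(1, max_rounds + 1):
        for persona in personas:          # each persona takes a turn, seeing all prior turns
            turn = ask_agent(persona, dataset_summary, transcript)
            transcript.append({"round": round_no, "persona": persona, "text": turn})

        # After a full round, every persona votes on the leading proposal.
        votes = [ask_agent(p, dataset_summary, transcript, mode="vote") for p in personas]
        agree = sum(1 for v in votes if v == "AGREE")
        if agree >= threshold:            # explicit numeric threshold, e.g. 4 of 6
            return {"consensus": True, "round": round_no, "votes": votes,
                    "transcript": transcript}
    return {"consensus": False, "votes": votes, "transcript": transcript}
```

The explicit numeric threshold matters: as the learnings above note, leaving "consensus" to the models produces 6-0 agreement by default.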
- Multi-agent orchestration patterns
- Agent monitoring and observability
- Pandas-to-LLM data summarization
- Critic/validator agent patterns
- Persona-driven prompt engineering
- Collaborative consensus mechanisms
$ ls ./blog/ --project="Agent-Driven Discovery"
Building a Self-Correcting Multi-Agent System
Jan 9, 2026 - My AI agents wouldn't stop agreeing with each other. Here's what I tried, what failed, and what finally worked.
Five Personas, One Dataset: How Different Agents Find Different Insights
Jan 9, 2026 - Same data, different perspectives. We built 5 Explorer personas and watched them find completely different insights from identical datasets.
Building Feedback Loops into Multi-Agent Systems
Jan 8, 2026 - How changing from a Validator to a Skeptic (and making it ask questions) dramatically improved insight quality in our data exploration pipeline.
Built animated visualization for collaborative mode. Shows 6 personas seated around a circular table, with discussion feed playing back events step-by-step. Includes manual navigation and auto-play modes.
Discovered that Tailwind opacity classes (bg-foreground/20) render as pure black in SVG elements. Had to switch to explicit hex colors for all fills and strokes.
Documented the collaborative mode development arc: repetition loops, folding skeptics, quality validation. Real transcript examples show the iteration process.
Fixed 4 major issues: repetition detection (80% word overlap), adversarial voting prompts, filter_expr support in tools, quality validation with correlation thresholds.
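The repetition check could be as simple as a word-overlap ratio; this version is illustrative, with only the 80% threshold taken from the entry above:

```python
def is_repetition(new_text: str, previous_texts: list[str], threshold: float = 0.8) -> bool:
    """Flag a turn as repetitive if its word set overlaps heavily with any earlier turn."""
    new_words = set(new_text.lower().split())
    if not new_words:
        return False
    for prev in previous_texts:
        prev_words = set(prev.lower().split())
        overlap = len(new_words & prev_words) / len(new_words)
        if overlap >= threshold:   # e.g. 80% of the new turn's words already appeared
            return True
    return False
```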
After adding explicit correlation thresholds and separate adversarial vote prompts, Skeptic and Contrarian now vote DISAGREE when proposals cite weak evidence. 4-2 votes instead of 6-0.
Built roundtable discussion system: 6 personas share context, take turns, vote on consensus. Includes repetition detection, adversarial voting, and quality validation.
Added mobile-responsive design to exploration paths. Mobile uses flattened sequential view with colored depth indicators (border thickness). Comparison UI unified to tabbed interface with clear active state highlighting. Tested on actual mobile device.
Added tabbed comparison view showing Statistician, Detective, and Storyteller exploring the same earthquake dataset. Users can switch between personas to see different exploration strategies.
Added tools sections to both Agent-Driven Discovery and AI Data Analyst project pages. Shows what capabilities each agent has: 7 pandas tools for Explorer, 4 investigation tools for AI Data Analyst.
Built CLI and React visualization showing Explorer → Skeptic decision trees. ASCII flowcharts for terminal, expandable React component for site. Includes --compare mode for side-by-side persona comparison.
Documented the personas feature with comparison results, efficiency metrics, and the hallucination catch during Detective exploration.
Detective persona started talking about "budget and revenue" on earthquake data. The mandatory questioning architecture saved us: Skeptic rejected, Explorer recovered with proper analysis.
Focused prompts lead to faster convergence. Specialized personas use 3-6 tool calls vs 15 for the default persona. Clear "what makes a good insight" criteria help the model focus.
Same movies dataset, different findings. Contrarian found low-budget success stories (Blumhouse, documentaries) while Statistician confirmed overall correlation. Different questions lead to different answers.
Built 5 personas: Default, Statistician, Storyteller, Detective, Contrarian. Each has unique personality traits, good/bad insight criteria, and exploration tips. CLI supports --persona all for comparison runs.
Wrote "Building Feedback Loops into Multi-Agent Systems" documenting the Validator→Skeptic journey, with before/after examples, hallucination warts, and lessons learned.
Model was talking about "budget and revenue" when analyzing network logs. Added actual column names to Skeptic prompts with instruction to reject insights referencing non-existent data. Fixed.
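The fix described here is prompt-side; a sketch of how the column grounding might be wired up, with helper names and wording hypothetical, plus an optional code-side check:

```python
import re
import pandas as pd

def build_skeptic_prompt(base_prompt: str, df: pd.DataFrame) -> str:
    """Append the actual column names so the Skeptic can reject ungrounded insights."""
    columns = ", ".join(df.columns)
    return (
        f"{base_prompt}\n\n"
        f"The dataset has ONLY these columns: {columns}.\n"
        "Reject any insight that references columns or concepts not in this list."
    )

def references_unknown_columns(insight: str, df: pd.DataFrame) -> bool:
    """Cheap extra check: does the insight quote a column name that doesn't exist?"""
    quoted = re.findall(r"'([^']+)'|\"([^\"]+)\"", insight)
    mentioned = {name for pair in quoted for name in pair if name}
    return bool(mentioned - set(df.columns))
```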
Movies, earthquakes, network logs, Netflix, time-series all produce legitimate, dataset-specific insights. No more hallucinations after column validation.
Renamed Validator to Skeptic, made questioning mandatory. Skeptic ALWAYS asks a question before approving. Removed rubber-stamp path. Simple change, dramatic improvement in insight quality.
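Enforcing "always ask first" in the orchestrator rather than in the prompt could look roughly like this; the method names and verdict format are assumptions, not the actual orchestrator.py:

```python
def review_insight(skeptic, explorer, insight, max_followups=3):
    """Skeptic must ask at least one question before it is allowed to approve."""
    questions_asked = 0
    for _ in range(max_followups):
        verdict = skeptic.review(insight)        # assumed: {"action": ..., "question": ...}
        if verdict["action"] == "question":
            questions_asked += 1
            insight = explorer.answer(verdict["question"], insight)
        elif verdict["action"] == "approve":
            if questions_asked == 0:
                # No rubber-stamping: force a question even if the model tried to approve.
                question = skeptic.force_question(insight)
                questions_asked += 1
                insight = explorer.answer(question, insight)
                continue
            return {"approved": True, "insight": insight}
        else:                                    # reject
            return {"approved": False, "insight": insight}
    return {"approved": False, "insight": insight, "reason": "follow-up limit reached"}
```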
When Validator challenges, insights go from shallow ("budget correlates with revenue") to nuanced ("Drama/Comedy have 0.82 correlation, Documentaries have none"). Follow-up questions work.
Despite "ALWAYS CHALLENGE" in prompts, Validator only challenged ~50% of obvious insights. Need to enforce rules in code, not prompts. Planning Skeptic refactor.
Built analyze_runs.py to measure Validator behavior at scale. Reduced first-try approval from 93% to 47% through iterative prompt tuning.
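A sketch of the kind of measurement analyze_runs.py does, assuming runs are stored as JSON logs with per-insight review events (the schema here is a guess):

```python
import json
from pathlib import Path

def first_try_approval_rate(log_dir: str) -> float:
    """Fraction of insights approved without a single challenge or rejection."""
    first_try, total = 0, 0
    for path in Path(log_dir).glob("*.json"):
        run = json.loads(path.read_text())
        for insight in run.get("insights", []):
            total += 1
            # Assumed schema: each insight records the validator events it went through.
            events = [e["action"] for e in insight.get("review_events", [])]
            if events and events[0] == "approve":
                first_try += 1
    return first_try / total if total else 0.0
```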
Steam dataset had broken Metacritic column (boolean, not score). Explains repetitive Price correlations. Switched to TMDB movies dataset with real numeric relationships.
Two explorations completed on Steam Games dataset. Explorer proposed insights about price/player correlation and popularity/review scores. Both approved by Validator in 4-8 seconds.
Built all 6 core modules: tools.py, llm.py, prompts.py, orchestrator.py, monitor.py, run.py. End-to-end pipeline working with structured JSON logging.
When Explorer failed to produce valid JSON, the loop continued without incrementing the attempt counter. Added explorer_failures limit to prevent runaway loops.
LLM sometimes outputs natural language instead of JSON despite instructions. Added retry logic with stronger prompts, max 3 invalid responses before giving up.
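The retry logic can be a small wrapper around the LLM call; this sketch mirrors the 3-attempt cap above, with call_llm standing in for the real client:

```python
import json

def get_json_response(call_llm, prompt: str, max_invalid: int = 3) -> dict | None:
    """Ask the model for JSON, re-prompt on parse failures, give up after max_invalid tries."""
    for _ in range(max_invalid):
        raw = call_llm(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Strengthen the instruction on each retry instead of looping forever.
            prompt += "\n\nRespond with VALID JSON ONLY. No prose, no code fences."
    return None  # caller counts this as an explorer failure and stops the run if it repeats
```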
Downloaded Steam Games dataset from Kaggle: 122,611 games, 39 columns. Fixed CSV parsing issue where pandas used first column as index.
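The index fix is likely a one-line change to the read call (file name illustrative):

```python
import pandas as pd

# Without index_col=False, pandas can silently treat the first column as the index
# when a malformed header row has one fewer name than there are data columns.
df = pd.read_csv("steam_games.csv", index_col=False)
```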
Decided: tool-based data access (agent calls pandas), local llama.cpp (Hermes 2 Pro 7B), custom Python orchestration, structured JSON logging, 3-5 rejection limit.
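Those decisions fit in a small config; this is an illustrative snapshot, not the repo's actual settings (key names and the endpoint are assumptions):

```python
# Illustrative config capturing the decisions above; key names are not from the repo.
CONFIG = {
    "data_access": "tools",                    # agent calls pandas tools, never raw dataframes
    "llm": {
        "backend": "llama.cpp",
        "model": "Hermes 2 Pro 7B",
        "endpoint": "http://localhost:8080",   # assumed local server address
    },
    "orchestration": "custom-python",
    "logging": {"format": "json", "structured": True},
    "max_rejections": 5,                       # within the 3-5 range decided above
}
```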
Started planning a portfolio chatbot but pivoted during conversational discovery. Landed on Agent-Driven Discovery: multi-agent system for autonomous data exploration.