January 9, 2026 · Agent-Driven Discovery

Building a Self-Correcting Multi-Agent System

My AI agents wouldn't stop agreeing with each other. Here's what I tried, what failed, and what finally worked.

multi-agent · llm · local-llm · prompt-engineering · iteration

My "skeptic" agent spent eight rounds questioning the data analysis, then folded completely when it came time to vote. The prompt said "be critical" but that wasn't enough. I ended up building three different feedback mechanisms before one actually worked. This post covers what I tried and why most of it failed.

The Setup: 6 Agents, 1 Roundtable

I'm building a multi-agent system where different AI personas explore datasets collaboratively. The architecture:

ROUNDTABLE PARTICIPANTS
DEFAULT: General exploration
STATISTICIAN: Rigorous, demands p-values
STORYTELLER: Finds the "so what?"
DETECTIVE: Hunts anomalies
CONTRARIAN: Challenges consensus (adversarial)
SKEPTIC: Questions everything (adversarial)
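
Under the hood, a persona doesn't need to be much more than a name, a system-prompt fragment, and a flag marking it as adversarial. A minimal sketch of how such a roster could be declared (field names and prompt text are illustrative, not the exact code):

from dataclasses import dataclass

@dataclass
class Persona:
    name: str            # e.g. "skeptic"
    system_prompt: str   # persona flavor injected into every turn
    adversarial: bool    # gets the stricter voting prompt later on

ROUNDTABLE = [
    Persona("default", "Explore the dataset broadly.", adversarial=False),
    Persona("statistician", "Demand rigor: effect sizes and p-values.", adversarial=False),
    Persona("storyteller", "Find the 'so what?' behind each result.", adversarial=False),
    Persona("detective", "Hunt for anomalies and outliers.", adversarial=False),
    Persona("contrarian", "Challenge whatever the group is converging on.", adversarial=True),
    Persona("skeptic", "Question every claim and its evidence.", adversarial=True),
]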

The promise: with a contrarian and skeptic at the table, weak ideas would get challenged. Bad correlations would get rejected. The discussion would converge on genuinely interesting insights.

The reality: everybody agreed with each other. The adversarial personas were adversarial in their discussion contributions, then voted AGREE when consensus was proposed.

The Iteration Loop

What follows is the story of a two-session debugging arc. The loop:

THE ITERATION CYCLE
1. RUN: Execute pipeline
2. READ: Analyze transcripts
3. IDENTIFY: Find failure mode
4. FIX: Code-level change

9 runs over 2 sessions to get it right.

Running locally on an RTX 5080 with llama.cpp meant I could iterate fast. No API costs, no rate limits. Just run, read, fix, repeat.
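
llama.cpp's llama-server exposes an OpenAI-compatible endpoint, so from the pipeline's point of view each agent turn is just an HTTP call to localhost. A minimal sketch of that call (default port, no auth, assuming the server is already running with a model loaded):

import requests

def complete(system_prompt: str, user_prompt: str) -> str:
    """One model call against a local llama.cpp server (llama-server)."""
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # OpenAI-compatible endpoint
        json={
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            "temperature": 0.7,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]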


Failure #1: The Repetition Loop

The first problem was obvious from the transcripts. The contrarian kept making the same point:

Round 2: CONTRARIAN
"The group is converging on a strong positive correlation
between estimated owners and peak CCU, but..."

Round 3: CONTRARIAN
"The group is converging on a strong positive correlation
between estimated owners and peak CCU, but..."

Round 4: CONTRARIAN
"The group is converging on a strong positive correlation
between estimated owners and peak CCU, but..."

Same opener. Same claim. Slightly different ending each time. The model found a pattern that technically counted as "contributing" and stuck with it.

The Prompt Fix (That Didn't Work)

First attempt was adding anti-repetition rules to the prompt:

DO NOT repeat what you or others have already said.
If you've made this point before, PASS instead.

The model ignored this about 50% of the time. I was learning that LLMs don't follow rules the way I expected.

What I Tried Next

I wondered: what if I stopped asking the model to police itself and just checked contributions before accepting them?

def _is_repetitive(self, persona: str, new_contribution: str) -> bool:
    """Check if contribution repeats recent ones from same persona."""
    recent = self._get_recent_contributions(persona, n=3)

    for old in recent:
        # Exact match
        if new_contribution.strip() == old.strip():
            return True

        # 80% word overlap
        new_words = set(new_contribution.lower().split())
        old_words = set(old.lower().split())
        overlap = len(new_words & old_words) / max(len(new_words), 1)

        if overlap > 0.8:
            return True

    return False

I started simple: word-set overlap. No embeddings, no fancy NLP. Just "are 80% of the words the same?" I figured I could add complexity later if needed.
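
Wiring the check in is just a gate in front of the transcript: if a contribution trips the detector, it never gets recorded and the persona is asked again. A rough sketch of that accept/reject loop (_generate_contribution, _record, and _log are placeholders for the surrounding machinery):

def collect_contribution(self, persona: str, max_attempts: int = 2) -> str | None:
    """Ask a persona to contribute, rejecting repetitive attempts."""
    for _ in range(max_attempts):
        contribution = self._generate_contribution(persona)  # LLM call
        if not self._is_repetitive(persona, contribution):
            self._record(persona, contribution)
            return contribution
        self._log(f"{persona.upper()} → repetition_rejected")
    return None  # treated as a PASS after repeated rejections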

The Result

$ grep "repetition_rejected" run.log
Round 3: CONTRARIAN → repetition_rejected
Round 4: CONTRARIAN → repetition_rejected
Round 5: CONTRARIAN → Tool: correlate(filter_expr="Price < 60")
# Forced to try something new

After being rejected twice, the contrarian tried a different approach entirely. I hadn't expected this, but blocking the easy path seemed to push the model toward actual creativity.


Failure #2: The Folding Skeptic

The skeptic was supposed to be the hardest to please. During discussion, it acted the part:

SKEPTIC: "The claim that higher priced games have lower peak
CCU counts may be misleading without considering the volume
of sales."

SKEPTIC: "The correlation could be influenced by a confounding
variable. Higher-priced games could be from more established
developers..."

Good concerns. Exactly what a skeptic should raise. Then came the vote:

{
  "agent": "skeptic",
  "vote": "agree",
  "reason": "The insight is supported by evidence presented."
}

I dug into the prompts and realized my voting prompt was too generic. It asked "do you agree with this consensus?" without reminding the skeptic of their adversarial role. Maybe that was the problem?

My Second Attempt: Prompts With Numbers

This time, prompt changes helped, but only when paired with explicit thresholds:

def get_adversarial_vote_prompt(persona: str, proposal: dict) -> str:
    """Special voting prompt for skeptic/contrarian."""
    return f"""
    As {persona.upper()}, you are ADVERSARIAL by nature.

    Vote DISAGREE unless ALL conditions are met:
    - Evidence is specific and quantified
    - Correlation >= 0.5 for "high" confidence claims
    - Correlation >= 0.3 for "medium" confidence claims
    - The insight is genuinely non-obvious

    The proposal claims {proposal['confidence']} confidence.
    Check: does the evidence support that confidence level?

    If you raised concerns during discussion, were they addressed?
    """

I think what made the difference was giving concrete numbers. "Be skeptical" is vague. "Reject if r < 0.5 for high-confidence claims" gives the model something specific to check.
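
Only the skeptic and contrarian get this prompt; the other four keep the generic voting prompt. A sketch of that routing (get_vote_prompt stands in for the generic version):

ADVERSARIAL_PERSONAS = {"skeptic", "contrarian"}

def build_vote_prompt(persona: str, proposal: dict) -> str:
    """Route adversarial personas to the stricter voting prompt."""
    if persona.lower() in ADVERSARIAL_PERSONAS:
        return get_adversarial_vote_prompt(persona, proposal)
    return get_vote_prompt(persona, proposal)  # generic "do you agree?" prompt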

The Result

From the actual transcript, Session 4:

{
  "agent": "skeptic",
  "vote": "disagree",
  "reason": "The correlation provided (r=0.2) is weak for a
            'high' confidence level, and the pattern seems
            obvious: higher prices lead to lower peak CCU counts."
}

The skeptic actually cited the threshold I gave it. I was surprised it internalized the numbers that cleanly.


Failure #3: The Quality Gap

The final issue was subtler. Proposals would claim "high confidence" with weak evidence:

{
  "insight": "Higher prices are a contributing factor...",
  "evidence": ["Negative correlation between price and CCU"],
  "confidence": "high"
}

What correlation? The transcript showed r = 0.2. That's weak. Not "high confidence" material.

What I Tried

I added regex-based validation to parse the proposal for correlation values:

import re

def validate_proposal_quality(proposal: dict) -> tuple[bool, str]:
    """Check if stated confidence matches actual evidence strength."""
    confidence = proposal.get('confidence', 'low')
    evidence = ' '.join(proposal.get('evidence', []))

    # Extract correlation values like "r=0.2" or "r = -0.45" from the evidence text
    r_values = re.findall(r'r\s*=?\s*(-?0\.\d+)', evidence, re.I)

    if confidence == 'high':
        for r in r_values:
            if abs(float(r)) < 0.5:  # weak correlation can't back a 'high' claim
                return False, f"r={r} too weak for 'high' confidence"

    return True, ""

The idea: if a proposal claims high confidence, the numbers should back it up. Whether this is the right approach, I'm still figuring out.
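
As a hypothetical check, a proposal like the one above, with the r-value actually spelled out in the evidence string, gets rejected:

ok, reason = validate_proposal_quality({
    "insight": "Higher prices are a contributing factor...",
    "evidence": ["Negative correlation between price and CCU (r = -0.2)"],
    "confidence": "high",
})
# ok == False, reason == "r=-0.2 too weak for 'high' confidence"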


Before and After

BEFORE FIXES
Final vote: 6-0 unanimous
Repetition rejections: 0
Skeptic: "Evidence is sufficient"
Filter tool calls: all failed

AFTER FIXES
Final vote: 4-2 (healthy dissent)
Repetition rejections: 6-8 per run
Skeptic: "r=0.2 is weak for 'high'"
Filter tool calls: 30+ working

I'm taking the 4-2 split as a good sign. It suggests the adversarial personas might actually be doing their job now instead of rubber-stamping consensus. But I'll need more runs to be sure.


This is a Checkpoint, Not a Finish Line

Reading those transcripts, I see progress. The repetition detection works. The adversarial voting works. The quality validation catches overconfident proposals.

But the system still rejects a lot of contributions. The validation checks fire frequently, which means personas are still producing low-quality output that needs to be caught. That's the safety net working, but it's also a sign the underlying behavior needs more tuning.

This is iteration in progress, not a solved problem.


What I'm Taking Away (So Far)

1. When I needed guaranteed behavior, prompts weren't enough. I kept trying to fix things with better instructions. What actually worked was checking in code. Maybe prompts are suggestions and code is enforcement? That's my working theory.

2. Defining an adversarial persona wasn't enough. I thought giving a persona a skeptical personality would make it skeptical. Turns out I needed to remind it at every decision point, with specific numbers to check against. I'd seen hints of this earlier when building feedback loops, but it was more pronounced here.

3. Explicit thresholds seemed to help. "Be skeptical" didn't work. "Reject if r < 0.5" did. I don't know if this generalizes, but it worked for my use case.

4. Local inference let me iterate fast. Nine runs over two sessions, reading transcripts, spotting failures, testing fixes. If I'd been paying per API call, I probably would have stopped at run three.

5. Multi-agent systems have weird failure modes. A single agent producing weak insights is one thing. Six agents collaborating on weak insights, then unanimously approving them? I didn't see that coming.


The iteration continues. Next up: either a web visualization of these roundtable debates, or pointing the system at a new dataset to see what breaks.