In Progress · Python · YAML · llama.cpp · JSONL · SSH

Autoresearch

Autonomous experiment harness that iterates on prompts, configs, and agent workflows overnight on the Dell Pro Max GB10.

$ inspired by karpathy/autoresearch

$ cat story.md

Autoresearch is the harness I use to figure out what is actually worth changing in a prompt, config, or agent workflow. I run 15 iterations overnight, see which dimensions moved, and decide what to try next. Without it I was guessing at what to tune.

First full experiment cycle completed: ran an autopsy on the ai-data-analyst scoring pipeline, found 6 harness bugs (including one that silently capped a scoring dimension via a string-matching vocabulary gap), fixed them, and ran a 15-iteration experiment. The vocabulary fix alone moved the baseline from 9.30/10 to 9.78/10. The harness keeps a changelog now so future experiments can tell whether a metric moved because of a prompt change or because the harness changed underneath.
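To make the vocabulary-gap failure mode concrete, here is a minimal sketch of how a string-matching scorer can silently cap a dimension. It is an illustration only, not the actual ai-data-analyst scoring code: `keyword_score`, both vocabularies, and the sample report are hypothetical.

```python
def keyword_score(text: str, vocabulary: set[str],
                  max_points: float = 10.0, needed: int = 3) -> float:
    """Hypothetical scorer: full marks once `needed` vocabulary terms appear."""
    lowered = text.lower()
    # Plain substring matching: "correlation" does NOT match "correlated".
    hits = sum(1 for term in vocabulary if term in lowered)
    return max_points * min(hits, needed) / needed


# Assumed vocabularies, chosen only to show the gap.
narrow_vocab = {"correlation", "outlier", "trend"}
wider_vocab = narrow_vocab | {"correlated", "anomalies"}

report = ("Sales are strongly correlated with ad spend; "
          "two anomalies appear in March, and the trend is upward.")

narrow = keyword_score(report, narrow_vocab)  # only "trend" matches
wider = keyword_score(report, wider_vocab)    # "trend", "correlated", "anomalies"
print(f"narrow={narrow:.2f} wider={wider:.2f}")
```

The report is the same in both calls; only the scorer's vocabulary changes. That is why a harness fix of this kind can move a baseline without any change to the prompt.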

$ ls ./components

-YAML Experiment Config
-Diff-Based Edit Engine
-LLM-as-Judge Evaluator
-Script-Based Evaluator
-Structural Validator
-Scratchpad Context Manager
-JSONL Event Logger
-Status Monitor
-Markdown Digest Generator
-SSH Deploy Scripts
-Dual-Model Experiment Runner
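As a rough sketch of what a JSONL event logger could look like, the snippet below writes one JSON object per line and stamps each record with a harness version, so a later reader can tell which harness produced a score. The field names, `HARNESS_VERSION`, `log_event`, and `read_events` are assumptions for illustration, not the harness's real schema or API.

```python
import io
import json
import time

# Hypothetical version tag; a real harness would bump it with each changelog entry.
HARNESS_VERSION = "0.0.0-example"


def log_event(stream, event_type: str, **fields) -> dict:
    """Append one event to a JSONL stream (one JSON object per line)."""
    record = {
        "ts": time.time(),
        "harness_version": HARNESS_VERSION,
        "event": event_type,
        **fields,
    }
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_events(stream) -> list[dict]:
    """Parse a JSONL stream back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# In-memory buffer stands in for a log file opened in append mode.
buf = io.StringIO()
log_event(buf, "iteration_start", iteration=1)
log_event(buf, "score", iteration=1, value=9.30)
buf.seek(0)
events = read_events(buf)
```

Because every line is a complete JSON object, the log can be tailed while an experiment is still running and filtered with ordinary line-oriented tools.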

$ cat learning-goals.txt

-Autonomous iteration loops that run unsupervised
-Diff-based edits for large targets and reasoning-heavy models
-Script-based evaluation beyond LLM-as-judge
-Scratchpad-style context management over long experiments
-Reusable harness patterns across different target types
-Dual-model orchestration (editor and evaluator on different ports)
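The loop behind these goals can be sketched as a simple keep-if-better iteration with separate editor and evaluator roles. This is a minimal sketch under stated assumptions: `run_experiment`, `propose_edit`, and `evaluate` are hypothetical names, and the two callables stand in for whatever clients talk to the editor and evaluator models on their respective ports.

```python
from typing import Callable


def run_experiment(
    target: str,
    propose_edit: Callable[[str], str],   # editor role (hypothetical interface)
    evaluate: Callable[[str], float],     # evaluator role (hypothetical interface)
    iterations: int = 15,
) -> tuple[str, float, list[float]]:
    """Keep-if-better loop: adopt a candidate only when it scores strictly higher."""
    best, best_score = target, evaluate(target)
    history = [best_score]  # history[0] is the baseline
    for _ in range(iterations):
        candidate = propose_edit(best)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history


# Stub models so the sketch runs without any server:
# the "editor" appends a character, the "evaluator" rewards length up to 5.
def stub_edit(text: str) -> str:
    return text + "!"


def stub_eval(text: str) -> float:
    return float(min(len(text), 5))


best, best_score, history = run_experiment("abc", stub_edit, stub_eval)
```

Passing the editor and evaluator in as callables keeps the loop independent of the target type, and makes it easy to swap in stubs when debugging the harness itself.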

$ ls ./findings/ --project="Autoresearch"

Autoresearch Harness Log

evolving

updated Apr 12, 2026

Working notes on how I use the autoresearch harness to probe agent workflows, find design holes, and decide what to experiment with next.

Debugging Experiment Loops

evolving

updated Apr 11, 2026

Running observations from debugging autonomous experiment loops. What I find when I stop guessing from aggregates and trace through scoring code and spans.

