Autoresearch
Autonomous experiment harness that iterates on prompts, configs, and agent workflows overnight on the Dell Pro Max GB10.
$ inspired by karpathy/autoresearch
$ cat story.md
Autoresearch is the harness I use to figure out what is actually worth changing in a prompt, config, or agent workflow. I run 15 iterations overnight, see which dimensions moved, and decide what to try next. Without it I was guessing at what to tune.
Still early. Most of what I am doing right now is probing my own agent workflows to find where the design assumptions break down. The trace explorer on the ai-data-analyst project feeds this loop: look at the telemetry, pick a dimension that is lagging, run an autoresearch experiment, see if the scores move. The harness itself evolves as I use it, so I keep the Autoresearch Harness Log for notes on what is working and what is not.
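The loop above can be sketched in a few lines. This is a minimal, hypothetical illustration of the shape of the harness, not its actual API: `propose_edit` and `score_dimensions` are stand-in stubs for the editor and evaluator calls, and the keep-if-scores-moved rule is one plausible acceptance policy.

```python
import random

def propose_edit(prompt: str, history: list) -> str:
    """Stub editor: in the real harness this would be a model call."""
    return prompt + f" [edit {len(history)}]"

def score_dimensions(prompt: str, dims: list) -> dict:
    """Stub evaluator: in the real harness this scores telemetry dimensions."""
    return {d: random.random() for d in dims}

def run_overnight(prompt: str, dims: list, iterations: int = 15):
    best, best_total, history = prompt, float("-inf"), []
    for i in range(iterations):
        candidate = propose_edit(best, history)
        scores = score_dimensions(candidate, dims)
        total = sum(scores.values())
        history.append({"iteration": i, "scores": scores})
        if total > best_total:  # keep the edit only if the scores moved
            best, best_total = candidate, total
    return best, history

best, history = run_overnight("Summarize the trace.", ["accuracy", "latency"])
```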
$ ls ./components
- Autonomous iteration loops that run unsupervised
- Diff-based edits for large targets and reasoning-heavy models
- Script-based evaluation beyond LLM-as-judge
- Scratchpad-style context management over long experiments
- Reusable harness patterns across different target types
- Dual-model orchestration (editor and evaluator on different ports)
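The dual-model piece can be sketched as two OpenAI-compatible endpoints on different local ports. The ports, model names, and prompts below are placeholders, not the real configuration; the sketch only builds the request bodies rather than sending them.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    base_url: str
    model: str

# Placeholder ports and model names -- not the actual setup.
EDITOR = Endpoint("http://localhost:8000/v1", "editor-model")
EVALUATOR = Endpoint("http://localhost:8001/v1", "evaluator-model")

def chat_payload(endpoint: Endpoint, system: str, user: str) -> dict:
    """Build the request body; a real run would POST this to endpoint.base_url."""
    return {
        "model": endpoint.model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

edit_req = chat_payload(EDITOR, "You rewrite prompts.", "Tighten this prompt.")
eval_req = chat_payload(EVALUATOR, "You score outputs 0-10.", "Score this output.")
```

Keeping the editor and evaluator on separate ports means each can be a different model (or the same model with different sampling settings) without the harness caring which is which.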