Ralph Loops
Learning autonomous AI development through test-driven loops.
$ cat story.md
I wanted to understand what makes AI succeed or fail at completing tasks on its own. The hypothesis: if your tests fully specify what you want, the AI can build it.
The pattern is simple. Write tests that capture exactly what you'd accept AND reject, give the AI a prompt, let it run until tests pass. The magic is in learning what makes good tests.
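In code, the loop is only a few lines. This is a sketch in TypeScript rather than the bash script I actually use, and the ai-agent command, prompt.md, and the iteration cap are placeholders:

// ralph-loop.ts: run the agent until the test suite passes (sketch, not the real script)
import { spawnSync } from "node:child_process";

const MAX_ITERATIONS = 10;

for (let i = 1; i <= MAX_ITERATIONS; i++) {
  // Same prompt and specs every time; the repo is the agent's only memory.
  spawnSync("ai-agent", ["--prompt-file", "prompt.md"], { stdio: "inherit" });

  // The tests are the acceptance criteria: stop only when they all pass.
  const tests = spawnSync("npm", ["test"], { stdio: "inherit" });
  if (tests.status === 0) {
    console.log(`Tests green after iteration ${i}`);
    break;
  }
}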
This isn't about building production software. It's about learning a methodology that makes AI more useful, and discovering that good PRDs and tests are valuable skills even without AI.
$ ls ./experiments/
TryParse
A parser combinator library built in 2 iterations.
CLI Components
A React component library built in 2 iterations.
Photon Forge
A light-beam puzzle game built in 9 iterations.
llama-mcp-server
An MCP server bridging Claude Code to local llama.cpp, built with the "True Ralph" pattern.
$ cat pipeline.txt
AI passes whatever tests you write. Shallow tests = shallow code.
E2E tests verify "it runs" not "it is correct."
Tests for what should NOT happen catch bugs positive tests miss.
Lock critical behavior with known-good states.
Build a validator, test all generated content against it.
If you would reject output that technically passes, your tests are missing a criterion.
Generators need both acceptance AND rejection tests (see the sketch after this list).
You cannot test "looks good" but CAN test "follows rules."
Unit tests pass but build fails. Always test tsc/build.
Browser devtools device emulation misses real device issues. Test on phones.
Debugging non-essential features can be worse than removing them.
For bounded problems with clear tests, the loop is a safety net.
If E2E tests matter, they must be in the explicit checklist.
Run all test suites once before Ralph to catch config bugs.
Code that passes jsdom tests may crash in real Chrome.
Demos must match their deployment context (dark theme, etc.).
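Here is what the generator lessons above look like as actual tests. A vitest-style sketch; generateLevel, the seed option, and the field names and thresholds are illustrative stand-ins for Photon Forge's real generator:

import { describe, it, expect } from "vitest";
// Illustrative import: a puzzle level generator under test.
import { generateLevel } from "./generator";

describe("level generator", () => {
  const level = generateLevel({ seed: 42 });

  it("meets the acceptance criteria", () => {
    expect(level.mirrors.length).toBeGreaterThanOrEqual(1); // non-trivial: at least one mirror to place
    expect(level.targets.length).toBeGreaterThan(0);        // something to aim the beam at
  });

  it("avoids output I would reject by hand", () => {
    // Rejection criteria: technically valid output a human would still throw away.
    expect(level.mirrors.length).toBeLessThanOrEqual(8);                      // no cluttered, unreadable boards
    expect(level.grid.width * level.grid.height).toBeLessThanOrEqual(100);    // keep boards playable on a phone
  });
});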
$ cat current-status.md
llama-mcp-server: Complete · Publishing
llama-mcp-server validated the "True Ralph" pattern: 44 tasks executed in separate context windows, zero failures, 398 tests. Each Ralph read specs from files, completed one task, and exited. Knowledge persisted in code, not memory.
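The outer driver for that pattern is small. A sketch, with ai-agent standing in for the real agent CLI and tasks.md / SPEC.md as illustrative file names:

// true-ralph.ts: one atomic task per fresh context window (sketch)
import { readFileSync } from "node:fs";
import { spawnSync } from "node:child_process";

// tasks.md: one atomic task per line, e.g. "Create a formatError() helper"
const tasks = readFileSync("tasks.md", "utf8")
  .split("\n")
  .filter((line) => line.trim().length > 0);

for (const task of tasks) {
  // Fresh invocation = fresh context window. Specs and conventions live in the repo, not in memory.
  const run = spawnSync(
    "ai-agent",
    ["--prompt", `Read SPEC.md, complete exactly this task, run the tests, then exit: ${task}`],
    { stdio: "inherit" }
  );
  if (run.status !== 0) {
    console.error(`Task failed, stopping: ${task}`);
    break;
  }
}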
Key discovery: convention inheritance without shared memory. Ralph #7 created a formatError() helper. Every subsequent Ralph copied the pattern. Institutional knowledge lives in files.
Four subprojects now complete. Prepping llama-mcp-server for npm publish as first open source contribution. Package name confirmed: llama-mcp-server.
Milestones
5 / 7
- What makes AI succeed/fail at autonomous tasks
- Writing tests that fully specify requirements
- Patterns for different task types (games, parsers, UI, MCP servers)
- Transferable skills: PRD writing, test design
- Multi-context execution (True Ralph pattern)
$ ls ./blog/ --project="Ralph Loops"
Setting Up Your First Ralph Loop: A Practical Guide
Jan 15, 2026 · How I set up autonomous AI development with specs, task lists, and a bash loop
Building an MCP Server in 2 Hours with 44 Autonomous AI Tasks
Jan 15, 2026 · How fresh context windows per task changed my AI-assisted development workflow
Ralph Loops Work Too Well (Now What?)
Jan 12, 2026 · I tried two different approaches to test-driven AI development. Both worked. Here's what I learned about writing tests as requirements.
$ cat log.txt
Package name confirmed (llama-mcp-server). First open source contribution. 19 tools bridging Claude Code to local llama.cpp.
Multi-context execution works. Each task in fresh context window, knowledge persists in files. Convention inheritance observed: later Ralphs copy patterns from earlier ones.
Set up 44 atomic tasks to build MCP server for llama.cpp. Testing "True Ralph" pattern: each task gets own context window, specs live in files.
React component library completed in 2 iterations. Key lesson: E2E tests weren't in the explicit checklist, so Ralph never ran them. Success criteria must be exhaustive.
Unit tests passed in jsdom but E2E failed in Chrome with "Illegal invocation." Timer functions need their original context in real browsers.
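The minimal reproduction of that class of bug: Chrome's setTimeout is a method that must be called with window as its receiver, so a detached reference throws, while the same code had passed under jsdom:

// Passes under jsdom, throws "TypeError: Illegal invocation" in real Chrome:
const schedule = window.setTimeout; // detached from its window receiver
schedule(() => console.log("tick"), 100);

// Fix: keep or restore the original context.
const scheduleBound = window.setTimeout.bind(window);
scheduleBound(() => console.log("tick"), 100);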
Set up 97 unit tests and 22 E2E tests for 4 React components: TypingEffect, ProgressBar, Collapsible, CopyButton. Testing methodology on UI component library.
Parser library completed in 2 iterations. Discovered that unit tests passing does not mean the build succeeds. Added build test (tsc --noEmit) as Lesson 13.
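That lesson turns into a one-test fix: make the compile step part of the suite so the loop cannot go green without it. A sketch assuming vitest; execSync throws on a non-zero exit, which fails the test:

import { it } from "vitest";
import { execSync } from "node:child_process";

// Unit tests passing does not mean the project compiles.
it("type-checks cleanly", () => {
  execSync("npx tsc --noEmit", { stdio: "inherit" }); // throws if tsc reports errors
});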
TryParse passed all 180 tests in a single Claude invocation. The loop is a safety net for larger projects, not a requirement for bounded problems.
Set up 180 tests for a parser combinator library. Testing autonomous approach vs phased approach from Photon Forge.
Fixed mirror removal for touch devices (tap-to-cycle). Cut endless mode rather than debug complex mobile timing issues. Lesson 13: know when to cut.
Discovered you cannot test "looks good" but CAN test design rules (same button widths, min touch targets, no horizontal scroll). Ralph fixed 12 UI issues in 1 iteration.
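Those rules translate directly into E2E assertions. A Playwright-style sketch; the selectors and thresholds are illustrative, and 44px is the usual minimum touch-target guideline:

import { test, expect } from "@playwright/test";

test("design rules hold on the game screen", async ({ page }) => {
  await page.goto("/");

  // Rule: no horizontal scroll.
  const overflows = await page.evaluate(
    () => document.documentElement.scrollWidth > document.documentElement.clientWidth
  );
  expect(overflows).toBe(false);

  // Rule: every button meets the minimum touch target.
  for (const button of await page.locator("button").all()) {
    const box = await button.boundingBox();
    expect(box).not.toBeNull();
    expect(box!.height).toBeGreaterThanOrEqual(44);
  }

  // Rule: buttons in the same toolbar share a width (selector illustrative).
  const widths = await Promise.all(
    (await page.locator(".toolbar button").all()).map(async (b) => (await b.boundingBox())!.width)
  );
  expect(new Set(widths.map(Math.round)).size).toBeLessThanOrEqual(1);
});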
Added test for non-trivial levels (mirrors >= 1). Fixed generator in 1 iteration. Lesson 11 proven: generators need rejection criteria, not just acceptance.
Three iterations to fix endless mode. Lessons: flaky tests let bugs slip, React hooks need direct tests, generators can produce technically-valid-but-unacceptable output.
Built a solver to validate all 20 levels are solvable. Lesson: for content that needs to be "valid," build a validator and test against it.
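In test form that is one loop over the content. A vitest-style sketch; levels and solve stand in for the project's real modules:

import { describe, it, expect } from "vitest";
import { levels } from "./levels"; // the 20 shipped levels
import { solve } from "./solver";  // returns a solution, or null if the level is unsolvable

describe("every shipped level is solvable", () => {
  for (const [index, level] of levels.entries()) {
    it(`level ${index + 1}`, () => {
      expect(solve(level)).not.toBeNull();
    });
  }
});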
First Ralph Loop experiment. E2E tests passed but game logic was buggy. Core lesson: E2E tests verify "it runs" not "it is correct."
Created project to systematically test autonomous AI development. Goal: document what works through experiments, not theory.