CompleteMCP Server

llama-mcp-server

An MCP server bridging Claude Code to local llama.cpp, built with the "True Ralph" pattern.

44
Iterations
398
Tests

Test Growth

Phase 1██░░░░░░░░░░░░░░░░░░45Infrastructure (6 tasks)
Phase 2██████░░░░░░░░░░░░░░120Server Tools (10 tasks)
Phase 3█████████░░░░░░░░░░░180Token Tools (6 tasks)
Phase 4██████████████░░░░░░280Inference Tools (10 tasks)
Phase 5-7██████████████████░░360Model/LoRA/Process (12 tasks)
Phase 8████████████████████398Integration (4 tasks)

The Prompt

Prompt
You are Ralph. Read these files:
- specs/tools.md (tool specifications)
- specs/conventions.md (code patterns)
- specs/task-list.md (your task list)

Pick the FIRST task marked [ ] (not started). Complete ONLY that task.

After completing the task:
1. Run: npm run typecheck
2. Run: npm test
3. If both pass, mark the task [x] in specs/task-list.md
4. Output: RALPH_WIGGUM_COMPLETE

If tests fail, fix the issue and try again. Do not move to other tasks.

The Journey

This was the first test of the 'True Ralph' pattern: each task gets its own context window. No shared memory between tasks. All knowledge persists in files.

I wrote detailed specs upfront: 19 tools mapped 1:1 to llama.cpp API endpoints, conventions for code patterns, and a task list with 44 atomic tasks. The first task was just 'create types.ts with Tool and ToolResult interfaces'. Ralph created 20+ types. This over-delivery helped every subsequent task.

The loop script was simple: check for uncompleted tasks, spawn Claude with the specs, repeat. Each Ralph read the specs, picked the first incomplete task, implemented it, ran tests, marked it done, and exited.

Something interesting happened around task 7. Ralph created a formatError() helper function to give user-friendly error messages. Every subsequent Ralph copied this pattern. Convention inheritance without shared memory.

44 tasks completed with zero failures. The only hiccup was hitting the API rate limit partway through. Added error handling to wait 5 minutes on failures instead of spinning.

At the end, I discovered the loop wasn't exiting. The task-list.md had a 'Status Legend' section showing `- [ ] Not started` as an example. Grep matched that and thought there were still tasks. Lesson: don't put checkbox syntax in legend sections.

The result: a production-quality MCP server with 19 tools, comprehensive error handling, and 398 tests. All built autonomously while I watched progress bars move.

Lessons Learned

  • True Ralph pattern works: each task in a fresh context window, knowledge persists in files
  • Front-loading types pays off: Task 1 over-delivered 20+ types, helped all later tasks
  • Convention inheritance: later Ralphs copy patterns from earlier Ralphs (formatError helper)
  • Task granularity matters: small atomic tasks beat large multi-step ones
  • Rate limit handling needed: loop script should wait on API errors, not spam retries
  • Checkbox legend gotcha: don't use `- [ ]` in legend sections (grep matches it)
  • Integration tasks work: wiring everything together succeeded first try