Project-Specific Ralph Loops: A Skill-Based Approach for Monorepos

Mario Giancini
@MarioGiancini
Read Time: 16 min

The official Ralph loop plugin is broken.

I discovered this after spending an evening trying to get it working. CVE-2025-54795—a security patch for command substitution—apparently broke the loop mechanism entirely. It runs once and exits. No iteration. No verification feedback.

Most tutorials I found assume you're setting up Ralph at the user level. Install a plugin, add some global config, and you're off. That works great for a single project. But I work in a monorepo with multiple projects, each with different verification commands, different context files, different completion criteria.

I needed something project-specific. Something I could version control. Something that would travel with the codebase, not live in my home directory.

So I built a skill.

What is a Ralph Loop?

If you haven't encountered Geoffrey Huntley's Ralph Wiggum technique, here's the short version: it's an autonomous TDD workflow where Claude iteratively works on tasks until verification passes and a completion signal is detected.

The original vision is elegant:

  1. External bash loop spawns Claude with a task
  2. Claude works on it, then exits
  3. Verification runs (tests, types, lint)
  4. If verification fails or task incomplete, loop spawns a fresh Claude session
  5. Each iteration starts clean—no accumulated context, no polluted transcript
  6. When Claude outputs a completion promise, the loop exits

It's beautifully simple. The agent gets constant feedback. Failures become input for the next iteration. You can walk away and come back to working code.

Two Architectural Approaches

Not all Ralph implementations are equal. There's a fundamental architectural split in how the loop continues.

Fresh Context (Original Vision)

Geoffrey Huntley's original pattern and Gordon Mickel's flow-next use an external bash loop:

while true; do
  # Spawn NEW Claude session each iteration
  # --dangerously-skip-permissions required: --print mode can't handle interactive prompts
  claude --print --output-format text --dangerously-skip-permissions \
    "Read state and continue task"

  # Run verification
  pnpm verify

  # Check completion
  if grep -q "COMPLETE" output.txt; then break; fi
done

Pros:

  • Each iteration starts with clean context
  • Failed attempts don't pollute future iterations
  • True "re-anchoring" from source files every time
  • No transcript bloat

Cons:

  • More complex setup (external script)
  • Loses conversation context that might be useful
  • Higher overhead (spawning new sessions)
  • Requires --dangerously-skip-permissions - The --print mode can't handle interactive tool permission prompts, so it will hang indefinitely without this flag

Same-Session (Anthropic's Pattern)

Anthropic's official plugin uses a stop hook that blocks exit and re-prompts within the same session:

# In stop hook
jq -n --arg prompt "$PROMPT" '{"decision": "block", "reason": $prompt}'

Pros:

  • Simpler implementation (just hooks)
  • Works within Claude Code's existing system
  • Can leverage conversation context

Cons:

  • Transcript accumulates (mitigated by compaction)
  • Failed attempts stay in context
  • Not true re-anchoring

The Honest Assessment

When I built my implementation, I followed Anthropic's pattern. Stop hooks, same session, accumulated context. It works—especially for bounded tasks like "fix all TypeScript errors."

But Gordon Mickel's critique is valid: Anthropic's own long-context guidance says "agents must re-anchor from sources of truth to prevent drift." Their plugin doesn't re-anchor. Neither did my original implementation.

The question is whether this matters for your use case.

When Each Approach Fits

Same-session works well for:

  • Bounded tasks (fix errors, add tests)
  • Short runs (under 20 iterations)
  • Tasks where conversation context helps

Fresh-context is better for:

  • Long-running loops (50+ iterations)
  • Multi-task backlogs (clearing prd.json)
  • Tasks where failed attempts might confuse future iterations

My implementation now supports both. Fresh-context is the default for multi-task mode (--next), with same-session available via the --same-session flag when you need it.

The Problem with User-Level Config

The standard approach puts Ralph configuration in your Claude Code user settings. This works, but it has limitations:

Different projects need different verification. My Next.js project runs pnpm verify (test + tsc + lint). A Python project might run pytest && mypy && ruff. A Go project wants go test ./... && go vet.
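In the Next.js case, that composite command can live in package.json so the loop only ever has to call pnpm verify. A minimal sketch; the specific tools here (vitest, eslint) are assumptions, so substitute whatever your project actually runs:

{
  "scripts": {
    "test": "vitest run",
    "typecheck": "tsc --noEmit",
    "lint": "eslint .",
    "verify": "pnpm test && pnpm typecheck && pnpm lint"
  }
}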

Context files vary. One project tracks tasks in plans/prd.json. Another uses GitHub Issues. Another has a TODO.md at the root.

The loop logic is hidden. It lives in my home directory, invisible to anyone else working on the project. It can't be version controlled with the code it's meant to help write.

I wanted Ralph to be a property of the project, not the developer.

The Skill-Based Pattern

Claude Code skills are packages of instructions that live in .claude/skills/. They can be shared across projects, imported into other agents, and—crucially—they can help you set up project-specific configurations.

Here's the structure I ended up with:

digital-self/                    # Monorepo root
├── .claude/
│   └── skills/
│       └── ralph-loop-setup/    # Reusable skill
│           ├── SKILL.md         # Discovery + overview
│           ├── setup.md         # Installation guide
│           ├── templates/       # Command, hook, and script templates
│           └── hooks/           # Stop hook template

└── 1-projects/
    └── mission-control/         # Individual project
        ├── .claude/
        │   ├── commands/
        │   │   ├── ralph-loop.md    # Project-specific start command
        │   │   └── cancel-ralph.md  # Cancel command
        │   ├── hooks/
        │   │   └── stop-hook.sh     # Same-session verification
        │   └── settings.json        # Hook registration
        ├── scripts/
        │   └── ralph/
        │       └── ralph.sh         # Fresh-context external loop
        └── plans/
            ├── progress.md          # Cross-session context (learnings)
            ├── guardrails.md        # Learned constraints (signs)
            └── prd.json             # Task tracking

The skill at the monorepo level knows how to install Ralph into any project. The project-specific files contain the actual configuration—customized verification, customized context paths, and a choice between same-session and fresh-context modes.

The Core Components

Same-Session Mode: The Stop Hook

For bounded tasks where you want simplicity, the stop hook approach works within Claude Code's native system:

#!/bin/bash
set -euo pipefail

RALPH_STATE_FILE=".claude/ralph-loop.local.md"

# If no state file, allow normal exit
if [ ! -f "$RALPH_STATE_FILE" ]; then
  exit 0
fi

# Parse state, run verification, build continuation prompt
# ...

# Block exit and re-prompt IN THE SAME SESSION
jq -n --arg prompt "$PROMPT" '{"decision": "block", "reason": $prompt}'

The key insight: customize the verification command for your project. That pnpm verify line is the only thing that changes between projects.

Trade-off: Context accumulates across iterations, but Claude Code's automatic compaction helps. Good enough for most bounded tasks.
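For reference, here's a minimal sketch of what that elided middle section might look like, assuming pnpm verify as the project check and the frontmatter fields from the state file shown later in this post. A complete hook would also bump the iteration count in the state file before re-prompting:

# Sketch only: read iteration counts from the state file's frontmatter
ITERATION=$(grep '^iteration:' "$RALPH_STATE_FILE" | awk '{print $2}')
MAX_ITERATIONS=$(grep '^max_iterations:' "$RALPH_STATE_FILE" | awk '{print $2}')

if [ "$ITERATION" -ge "$MAX_ITERATIONS" ]; then
  rm "$RALPH_STATE_FILE"   # give up: allow a normal exit
  exit 0
fi

# Run the project-specific verification and summarize the result
if pnpm verify > /tmp/ralph-verify.log 2>&1; then
  VERIFY_SUMMARY="Verification PASSED."
else
  VERIFY_SUMMARY="Verification FAILED:
$(tail -n 40 /tmp/ralph-verify.log)"
fi

# Build the continuation prompt fed back through the jq block above
PROMPT="Iteration $((ITERATION + 1)) of $MAX_ITERATIONS. $VERIFY_SUMMARY
Re-read plans/progress.md and plans/guardrails.md before continuing.
Output <promise>COMPLETE</promise> only after verification passes."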

Fresh-Context Mode: The External Loop

For long-running or multi-task backlogs, the external loop spawns fresh Claude sessions:

#!/bin/bash
# scripts/ralph/ralph.sh

MAX_ITERATIONS=${1:-50}
ITERATION=0

while [ $ITERATION -lt $MAX_ITERATIONS ]; do
  ITERATION=$((ITERATION + 1))
  echo "=== Ralph Loop: Iteration $ITERATION of $MAX_ITERATIONS ==="

  # Spawn FRESH Claude session each iteration
  # --dangerously-skip-permissions is REQUIRED for non-interactive mode
  OUTPUT=$(claude --print --output-format text --dangerously-skip-permissions \
    "Read .claude/ralph-state.md and plans/prd.json. \
     Pick next failing task and work on it. \
     Output <promise>COMPLETE</promise> when ALL tasks pass.")

  # Run verification
  pnpm verify
  VERIFY_EXIT=$?

  # Check for completion promise (trust it only if verification also passed)
  if [ $VERIFY_EXIT -eq 0 ] && echo "$OUTPUT" | grep -q "<promise>COMPLETE</promise>"; then
    echo "All tasks complete!"
    exit 0
  fi

  # Update state file with iteration count and verification result
  # ...
done

echo "Max iterations reached"

Trade-off: Each iteration starts clean, but you lose any useful conversation context. Better for clearing a full prd.json backlog where drift would be a problem.
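The elided state update at the bottom of the loop can be as simple as rewriting a small frontmatter file that the next fresh session reads first. A hypothetical version (the field names here are illustrative, not the plugin's format):

# Rewrite the state file that the next fresh session reads first
cat > .claude/ralph-state.md <<EOF
---
iteration: $ITERATION
max_iterations: $MAX_ITERATIONS
last_verify_exit: $VERIFY_EXIT
updated_at: "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
---
EOF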

Choosing Your Mode

My implementation supports both modes. Fresh-context is the default for multi-task (--next) because it follows the true Ralph pattern—progress persists, failures evaporate.

# Single task, same-session - simpler, good for bounded tasks
/ralph-loop "Fix all TypeScript errors"

# Multi-task, fresh-context (default) - clean slate each iteration
/ralph-loop --next

# Multi-task, same-session (opt-in) - keeps conversation context
/ralph-loop --next --same-session

Or run the external script directly:

./scripts/ralph/ralph.sh 100  # Max 100 iterations

The State File

The loop needs to track its state across iterations. I use a simple markdown file with YAML frontmatter:

---
active: true
iteration: 3
max_iterations: 50
completion_promise: "COMPLETE"
started_at: "2026-01-10T12:00:00Z"
---

## Task

Fix all TypeScript errors and ensure tests pass.

This file is gitignored (.claude/ralph-loop.local.md). It's ephemeral—created when you start a loop, deleted when you finish.

The Context Files

Every Ralph loop needs two things: a task list and a progress log.

prd.json tracks what needs to be done:

{
  "branchName": "feature/auth-flow",
  "tasks": [
    {
      "id": "T-001",
      "title": "Add login form validation",
      "acceptanceCriteria": ["Email format validated", "Password min 8 chars"],
      "passes": false
    }
  ]
}

progress.md is Claude's memory across iterations:

# Ralph Progress Log

## Codebase Patterns
- Uses Zod for validation schemas
- Form components in app/_components/forms/

## 2026-01-10 - T-001
- Added validation schema
- Discovered existing useForm hook to leverage
- **Learning:** Input component has built-in error display

This is where the real power emerges. Claude learns the codebase patterns as it works. Those learnings persist across context window resets.

Guardrails (Signs)

This is where it gets interesting. Agrim Singh's "ralph for idiots" post introduced me to a pattern I hadn't considered: guardrails (also called "signs").

Guardrails are learned constraints that prevent repeated failures. Every iteration reads them and follows them.

guardrails.md captures these constraints:

# Ralph Guardrails (Signs)

Progress should persist. Failures should evaporate.

---

### SIGN-001: Verify Before Complete
**Trigger:** About to output completion promise
**Instruction:** ALWAYS run verification and confirm it passes first

### SIGN-002: Check All Tasks Before Complete
**Trigger:** Completing a task in multi-task mode
**Instruction:** Re-read prd.json and count `passes: false` tasks.
Only output completion when ALL pass.

### SIGN-003: Document Learnings
**Trigger:** Completing any task
**Instruction:** Update progress.md with patterns discovered

I learned this the hard way. My first multi-task run output <promise>COMPLETE</promise> after finishing one task instead of waiting for all tasks. Context had accumulated, and Claude lost track of the instruction to check ALL tasks. After adding SIGN-002 to guardrails.md, the problem disappeared.

The beauty of guardrails: they're project-specific. My Mission Control guardrails include things like "run pnpm prisma generate after schema changes." A Python project would have different constraints.
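Expressed in the same sign format, that Prisma constraint looks something like this (the sign number is just illustrative):

### SIGN-004: Regenerate Prisma Client
**Trigger:** Changing prisma/schema.prisma
**Instruction:** Run `pnpm prisma generate` before running verification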

Add signs as you encounter failure patterns. The file grows over time, encoding lessons from every iteration failure. Eventually, your Ralph loop becomes resistant to the mistakes it used to make.

Catching What Tests Don't: Visual Snapshots

Tests verify behavior. They don't verify appearance.

After running a 30-iteration loop that touched multiple components, I realized I had no way to know if the UI still looked right. The tests passed. TypeScript was happy. But had I broken a layout somewhere? Misaligned a header? Removed a loading state?

So I added visual snapshots—a simple Playwright-based approach that captures key pages before and after a Ralph loop runs.

// scripts/ralph/snapshot.ts
import { chromium } from 'playwright'
import { readFileSync } from 'node:fs'

// Page list, viewport, and base URL come from the JSON config shown below
// (the config filename is assumed here; point it at wherever yours lives)
const config = JSON.parse(readFileSync('scripts/ralph/snapshot.config.json', 'utf-8'))

async function captureSnapshots(phase: 'before' | 'after', runId: string) {
  const outputDir = `scripts/ralph/snapshots/${runId}/${phase}`
  const browser = await chromium.launch({ headless: true })
  const page = await browser.newPage({ viewport: config.viewport })

  for (const pageConfig of config.pages) {
    await page.goto(`${config.baseUrl}${pageConfig.path}`, { waitUntil: 'networkidle' })
    if (pageConfig.waitFor) {
      await page.waitForSelector(pageConfig.waitFor, { timeout: 10000 })
    }
    await page.screenshot({ path: `${outputDir}/${pageConfig.name}.png` })
  }

  await browser.close()
}

Configure which pages to capture:

{
  "baseUrl": "http://localhost:3000",
  "viewport": { "width": 1280, "height": 800 },
  "pages": [
    { "name": "dashboard", "path": "/dashboard", "waitFor": "nav", "delay": 1000 },
    { "name": "settings", "path": "/settings", "waitFor": "nav" },
    { "name": "profile", "path": "/profile", "waitFor": "nav" }
  ]
}

The key insight: focus on layout, not data. I capture component structure and spacing, not the actual content. A new lead showing up in a table isn't a regression—a missing table header is.

Use with --snapshots:

/ralph-loop --next --snapshots

Output is advisory, not blocking:

📸 Visual Snapshots for Manual Review:
   Before: scripts/ralph/snapshots/20260111-120000/before/
   After:  scripts/ralph/snapshots/20260111-120000/after/

   Compare key pages for UI/UX regressions (layout, spacing, components).
   This is advisory - tests passed, but visual review recommended.

Why manual comparison? Automated pixel diffs would flag every data change as a "regression." What I care about is structural changes—did the nav disappear? Did the grid layout break? Did a component shift positions? That's a quick visual scan, not an automated diff.

The Start Command

The command supports both modes and multi-task workflows:

---
description: Start an autonomous Ralph loop for iterative development
allowed-tools: Bash, Read, Write, Edit, Glob, Grep, TodoWrite
argument-hint: "<task>" | --next [--same-session] [--snapshots] [--branch NAME] [--max-iterations N]
---

# Ralph Loop

<instruction>
1. Parse arguments (task, --next, --same-session, --snapshots, --branch, --max-iterations)
2. If --next without --same-session, launch external loop script and exit
3. Otherwise, create state file and begin same-session loop
4. Read context files (progress.md, guardrails.md, prd.json)
5. When ALL tasks complete, output: <promise>COMPLETE</promise>
</instruction>

Usage examples:

# Single task, same-session (default for single tasks)
/ralph-loop "Fix all TypeScript errors" --max-iterations 20

# Multi-task, fresh-context (default for --next)
/ralph-loop --next

# Multi-task, same-session (opt-in when context helps)
/ralph-loop --next --same-session

# With visual snapshots for UI regression review
/ralph-loop --next --snapshots

# Work on a separate branch
/ralph-loop --next --branch ralph/backlog

# Preview what would run
/ralph-loop --next --dry-run

And cancel with:

/cancel-ralph
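The cancel command barely needs anything: because the stop hook allows a normal exit whenever the state file is missing, cancelling is essentially deleting that file. A minimal sketch of what cancel-ralph.md might contain:

---
description: Cancel an active Ralph loop
allowed-tools: Bash
---

# Cancel Ralph Loop

<instruction>
1. If .claude/ralph-loop.local.md exists, delete it so the stop hook allows a normal exit
2. Confirm to the user that the loop was cancelled
</instruction>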

Why This Pattern Works

Version controlled. The entire Ralph configuration lives in the repo. Anyone cloning the project gets the same autonomous workflow capability.

Project-appropriate. Each project defines its own verification command. No awkward switching between global configs.

Discoverable. The skill documents itself. New team members can run /ralph-loop --help and understand what's available.

Composable. The monorepo skill can install Ralph into any project. Update the skill, and you can propagate improvements to all projects.

Debuggable. When something goes wrong, you can read the state file, check the hook logic, trace the transcript. Everything is visible.

When to Use Ralph Loops

They're excellent for:

  • Bounded refactoring - "Convert all class components to hooks"
  • Error cleanup - "Fix all TypeScript errors"
  • Test coverage - "Add tests for all API routes"
  • Feature implementation - With clear acceptance criteria in prd.json

They're less suitable for:

  • Open-ended exploration - "Make the app faster" (too vague)
  • Design decisions - Claude can suggest, but you should approve
  • Content creation - Unless you have very specific templates

The key is verification. If you can write a command that checks success, Ralph can iterate toward it.

Setting It Up

Quick Start (Recommended)

Install the plugin once, use it in any project:

# Add the marketplace
/plugin marketplace add MarioGiancini/ralph-loop-setup

# Install the plugin
/plugin install ralph-loop-setup

Then in any project, just ask Claude:

Install Ralph loops in this project. Verification command is `pnpm test`.

The skill handles all the setup automatically.

Alternatively, clone directly:

git clone https://github.com/MarioGiancini/ralph-loop-setup.git ~/.claude/skills/ralph-loop-setup

Manual Setup

If you prefer to set up manually:

  1. Create the directories:

mkdir -p .claude/commands .claude/hooks plans

  2. Add the stop hook - Customize the verification command for your stack

  3. Register the hook in .claude/settings.json:

{
  "hooks": {
    "Stop": [{"hooks": [{"type": "command", "command": ".claude/hooks/stop-hook.sh"}]}]
  }
}

  4. Add the commands - ralph-loop.md and cancel-ralph.md

  5. Create context files - progress.md, guardrails.md, and prd.json

  6. Gitignore the state file:

.claude/ralph-loop.local.md

The skill pattern means I can do this for any new project with a single command. The templates are ready; the patterns are proven.

What I Learned

Building this taught me several things about autonomous agent patterns:

Project-aware beats user-aware. Skills, commands, and hooks that live with the code are more valuable than global plugins. They're shareable. They're discoverable. They evolve with the project.

Understand the architectural trade-offs. I initially followed Anthropic's stop-hook pattern without realizing I was making an architectural choice. Same-session vs fresh-context isn't a detail—it fundamentally affects how your agent handles long-running work. Know what you're choosing.

Re-anchoring matters for long runs. For short, bounded tasks, accumulated context is fine. For clearing a backlog of 10+ tasks, fresh context prevents subtle drift where failed approaches from task 3 influence how the agent thinks about task 8.

Offer the choice. My implementation now supports both modes. Fresh-context is the default for multi-task (--next) because it follows the true Ralph philosophy—progress persists, failures evaporate. Use same-session (--same-session) for bounded single tasks where conversation context helps.

Guardrails compound. When you encounter a failure pattern, add it to guardrails.md. Over time, your Ralph loop becomes resistant to mistakes it used to make. The SIGN-002 guardrail ("Check All Tasks Before Complete") saved me from the premature completion bug I described earlier.

Tests don't catch everything. After a long loop, your tests might pass but your UI might be broken. Visual snapshots fill the gap that automated testing can't. It's not about pixel-perfect comparison—it's about catching structural regressions that would slip through CI.

Non-interactive mode needs explicit permissions. The claude --print mode can't prompt for tool permissions interactively. Without --dangerously-skip-permissions, your fresh-context loop will hang indefinitely, appearing to work but producing no output. I discovered this the hard way—14 minutes of watching a process that was actually stuck waiting for approval that would never come.

The official Ralph plugin will probably get fixed eventually. But I'm not sure I'll switch back. The project-specific approach gives me something the plugin can't: configuration that makes sense for each codebase I work in, and the choice of which architectural pattern to use.

If you're working in a monorepo with multiple projects, or you want your Ralph configuration to be part of your codebase, the skill-based pattern is worth the setup investment.

The loop will run. The tests will fail. Claude will learn. And eventually, you'll see that satisfying message:

<promise>COMPLETE</promise>

If you're experimenting with Ralph loops or other autonomous agent patterns, I'd love to hear what's working for you. Are you using fresh-context or same-session? What verification commands are you running? Hit me up on Twitter/X or LinkedIn.


The ralph-loop-setup plugin is available for any Claude Code environment. Add the marketplace with /plugin marketplace add MarioGiancini/ralph-loop-setup, then install with /plugin install ralph-loop-setup.

Thanks to Geoffrey Huntley for the original Ralph pattern, Ryan Carson for snarktank/ralph, Gordon Mickel for the flow-next critique that pushed me to understand the architectural trade-offs, and Agrim Singh for the "ralph for idiots" thread that introduced me to guardrails and the philosophy that "progress should persist, failures should evaporate."

