Ralph k8s
2026-01-21 • Alexander Baxter
Deploying a Modular “Ralph” AI Agent to Kubernetes (Helm + Docker + ConfigMaps)
The idea is to create a Helm chart (`.helm/`) for running an autonomous coding agent (“Ralph”) as a Kubernetes Job. The core concepts are:
- Prompts are code (versioned, modular, templated).
- The agent runtime is a container image (CLIs + polyglot toolchain).
- Scripts + prompts ship as ConfigMaps (mounted into the Job).
- Workspace + credentials persist in a PVC (so runs can resume and auth survives across pods).
Below is the setup process, from prompt design through operational tooling.
Designing the prompt (and making it modular)
1) Split “base” rules from provider-specific rules
The easiest way to keep prompts maintainable is to have:
- a base prompt that defines how the agent works (iteration loop, task selection, boundaries, etc.)
- a provider prompt that defines model-specific guidance (Claude vs Gemini vs others)
This repo follows that structure under:
- `.helm/prompts/base-prompt.md`
- `.helm/prompts/claude-prompt.md`
- `.helm/prompts/gemini-prompt.md`
The chart compiles the final prompt using a ConfigMap template that concatenates the base + provider prompt and runs Helm templating (`tpl`) so you can inject values like `job.expertise`:
```yaml
# .helm/templates/configmap-prompts.yaml
data:
  system-prompt.md: |
    {{- $base := .Files.Get "prompts/base-prompt.md" -}}
    {{- $providerSpec := .Files.Get (printf "prompts/%s-prompt.md" .Values.job.aiProvider) -}}
    {{- $combined := printf "%s\n\n%s" $base $providerSpec -}}
    {{- tpl $combined . | nindent 4 -}}
```

2) Use a completion sentinel (but keep it safe)
Autonomous agents typically need a single unambiguous “done” signal so the outer loop can stop cleanly.
The base prompt and scripts use a strict completion sentinel tag. In this blog post, we’ll refer to it abstractly as:
<<COMPLETE_SENTINEL>>
…to avoid accidentally copy/pasting the real tag into places it doesn’t belong.
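In the scripts, detection can be a literal grep for the tag. A minimal sketch (the `SENTINEL` value here is the abstract placeholder from above, not the real tag, and the function name is illustrative):

```shell
# Abstract placeholder for the real completion tag.
SENTINEL="<<COMPLETE_SENTINEL>>"

# Returns success if a captured iteration output contains the sentinel.
run_is_complete() {
  local output_file="$1"
  # -F matches the tag literally, so <, >, etc. are not treated as a pattern.
  grep -qF "$SENTINEL" "$output_file"
}
```

Keeping the tag in a single variable makes it harder for the prompt and the scripts to drift apart.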
Setting up the script loop
The chart ships provider-specific loop scripts and selects one at runtime based on `AI_PROVIDER`.
Concretely:
- Scripts live in the chart under `.helm/scripts/` (e.g. `ralph-claude.sh`, `ralph-gemini.sh`)
- Helm packages them into a scripts ConfigMap (`.helm/templates/configmap-scripts.yaml`)
- The Job mounts that ConfigMap at `/app/scripts`
- The Job selects the right script at runtime with:

```bash
SCRIPT_PATH="/app/scripts/ralph-${AI_PROVIDER}.sh"
bash "$SCRIPT_PATH" "$REPO_NAME" <maxIterations>
```
Provider scripts live at:
- `.helm/scripts/ralph-claude.sh`
- `.helm/scripts/ralph-gemini.sh`
They both implement:
- prompt resolution (env `RALPH_PROMPT_FILE`, fallback to `/etc/ralph/system-prompt.md`)
- a loop capped at `maxIterations`
- completion sentinel detection
- rate limit detection + sleep-and-retry logic
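Put together, the shape of a provider loop script looks roughly like this (a simplified sketch: the actual provider CLI call is stubbed out, and the rate-limit handling is only marked by a comment):

```shell
# Simplified shape of ralph-<provider>.sh (illustrative).
PROMPT_FILE="${RALPH_PROMPT_FILE:-/etc/ralph/system-prompt.md}"
REPO_NAME="${1:-demo-repo}"    # repo to work on (placeholder default)
MAX_ITERATIONS="${2:-10}"
SENTINEL="<<COMPLETE_SENTINEL>>"

run_agent_once() {
  # Stand-in for the real provider CLI call that reads $PROMPT_FILE.
  echo "iteration output for $REPO_NAME"
}

i=0
while [ "$i" -lt "$MAX_ITERATIONS" ]; do
  i=$((i + 1))
  output="$(run_agent_once)"
  if printf '%s' "$output" | grep -qF "$SENTINEL"; then
    echo "complete after $i iteration(s)"
    break
  fi
  # Real scripts also detect rate-limit errors here and sleep before retrying.
done
```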
Building the Docker image (agents + runtimes installed)
The container image is the “agent workstation.” The Dockerfile builds a polyglot base:
- Ubuntu 24.04
- git, ssh, jq, coreutils, build tools
- Python 3, Node 20, Java 17
- Claude CLI + Gemini CLI
Excerpt:
```dockerfile
FROM ubuntu:24.04
RUN apt-get update && apt-get install -y --no-install-recommends \
    git openssh-client curl ca-certificates jq \
    python3 python3-pip python3-venv openjdk-17-jdk
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs
RUN npm install -g @google/gemini-cli pnpm
RUN curl -fsSL https://claude.ai/install.sh | bash
```

This image is referenced from Helm values:
```yaml
# .helm/values.yaml
image:
  repository: registry/ralph-polyglot-base
  tag: latest
```

Wiring prompts + scripts via ConfigMaps
Prompts ConfigMap
`configmap-prompts.yaml` builds a single file, `system-prompt.md`, at deploy time (base + provider-specific prompt).
The Job then mounts it into `/etc/ralph` and points `RALPH_PROMPT_FILE` at it.
Scripts ConfigMap
`configmap-scripts.yaml` packages every `scripts/*.sh` into a ConfigMap:
```yaml
# .helm/templates/configmap-scripts.yaml
data:
  # Range through the 'scripts/' folder and include every .sh file
  {{- range $path, $_ := .Files.Glob "scripts/*.sh" }}
  {{ base $path }}: |-
{{ $.Files.Get $path | indent 4 }}
  {{- end }}
```

The Job mounts this ConfigMap at `/app/scripts` with `defaultMode: 0755` so scripts are executable.
The main workload: .helm/templates/job.yaml
At a high level, the Job does three things:
- An initContainer clones/refreshes the repo into a persistent workspace PVC (`/work`).
- The main container restores auth, runs the loop script, then syncs auth back to the PVC on exit.
- Volumes mount the PVC, SSH key Secret, scripts ConfigMap, and prompts ConfigMap.
1) InitContainer: git clone with guardrails
The init container:
- configures SSH (`/ssh-secret/id_ed25519`)
- clones the repo (or reuses an existing clone)
- refuses to run if there are uncommitted/untracked changes (excluding `ralph-log.txt`)
- checks out the target branch and hard-resets to origin
This prevents “silent corruption” of a long-lived PVC workspace.
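The guardrail can be expressed as a small check on `git status --porcelain`, filtering out the agent's log file before deciding. A sketch (function name and flag choices are illustrative):

```shell
# True (exit 0) if the workspace has uncommitted or untracked changes,
# ignoring the agent's own ralph-log.txt.
workspace_is_dirty() {
  local repo_dir="$1"
  git -C "$repo_dir" status --porcelain \
    | grep -v 'ralph-log\.txt$' \
    | grep -q .
}

# Init-container usage: bail out instead of clobbering unexpected work.
# if workspace_is_dirty /work/my-repo; then
#   echo "refusing to run: workspace has unexpected changes" >&2
#   exit 1
# fi
```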
2) Main container: restore creds, run provider script, sync creds back
Key behaviors:
- Restores credentials from the PVC into `$HOME` if present
- Picks a script by provider: `ralph-${AI_PROVIDER}.sh`
- On exit, copies `$HOME` credential directories back into `/work` so future Jobs can restore them
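A hedged sketch of the restore/sync pair (the `.claude-auth` / `.gemini` locations follow the auth-pod convention covered later; `WORK_DIR` is parameterized here purely for illustration):

```shell
WORK_DIR="${WORK_DIR:-/work}"

# Restore persisted credentials from the PVC into $HOME, if present.
restore_auth() {
  if [ -d "$WORK_DIR/.claude-auth" ]; then
    rm -rf "$HOME/.claude"
    cp -r "$WORK_DIR/.claude-auth" "$HOME/.claude"
  fi
  [ -f "$WORK_DIR/.claude.json" ] && cp "$WORK_DIR/.claude.json" "$HOME/"
  if [ -d "$WORK_DIR/.gemini" ]; then
    rm -rf "$HOME/.gemini"
    cp -r "$WORK_DIR/.gemini" "$HOME/.gemini"
  fi
  return 0
}

# Mirror image: copy credentials back so the next Job can reuse them.
sync_auth_exit() {
  if [ -d "$HOME/.claude" ]; then
    rm -rf "$WORK_DIR/.claude-auth"
    cp -r "$HOME/.claude" "$WORK_DIR/.claude-auth"
  fi
  [ -f "$HOME/.claude.json" ] && cp "$HOME/.claude.json" "$WORK_DIR/"
  if [ -d "$HOME/.gemini" ]; then
    rm -rf "$WORK_DIR/.gemini"
    cp -r "$HOME/.gemini" "$WORK_DIR/.gemini"
  fi
  return 0
}
```

The `rm -rf` before each `cp -r` avoids the classic pitfall of `cp -r` nesting the source directory inside an existing destination on repeat runs.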
Conceptually:
```bash
SCRIPT_PATH="/app/scripts/ralph-${AI_PROVIDER}.sh"
trap sync_auth_exit EXIT
bash "$SCRIPT_PATH" "$REPO_NAME" <maxIterations>
```

3) Volumes: PVC + Secret + ConfigMaps
The Job mounts:
- PVC at `/work` (workspace + persisted auth)
- SSH Secret at `/ssh-secret` (git clone/push)
- Scripts ConfigMap at `/app/scripts`
- Prompts ConfigMap at `/etc/ralph`
pod-auth.yaml: logging agents in (and persisting credentials)
Because provider logins can be browser/interactive, the chart includes an Auth Pod you can exec into.
pod-auth.yaml runs a simple long-lived pod that mounts the same workspace PVC:
```yaml
# .helm/templates/pod-auth.yaml
kind: Pod
spec:
  containers:
    - name: auth
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: workspace
          mountPath: /work
```

Typical flow
- Deploy the auth pod.
- `kubectl exec` into it.
- Run `claude login` / `gemini login`.
- Copy auth state into `/work` so Jobs can restore it later.
```bash
# Claude
cp -r ~/.claude /work/.claude-auth
cp ~/.claude.json /work/
# Gemini
cp -r ~/.gemini /work/.gemini
```

job-cleanup.yaml: when things go wrong (rescue + reset)
Long-lived workspaces are great… until they aren’t. If a run leaves the repo in a messy state, the cleanup Job does two important things:
- Rescues changes to a timestamped branch (so you don’t lose work)
- Resets the workspace back to `origin/<integration branch>`
Key behaviors in `.helm/templates/job-cleanup.yaml`:
- Detects dirty working tree or untracked files
- Creates `ralph/rescue-<timestamp>`
- Commits everything (`git add -A`) and pushes the rescue branch
- Hard resets and cleans back to the integration branch
This is the operational “break glass” tool for PVC-backed sessions.
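The rescue-and-reset sequence can be sketched as one function (branch naming follows the bullets above; the function name is illustrative, and the integration branch is passed in rather than hard-coded):

```shell
# Rescue any uncommitted work to a timestamped branch, push it,
# then hard-reset the workspace to the integration branch.
rescue_and_reset() {
  local repo_dir="$1" integration_branch="$2"
  local rescue_branch="ralph/rescue-$(date +%Y%m%d-%H%M%S)"

  if [ -n "$(git -C "$repo_dir" status --porcelain)" ]; then
    git -C "$repo_dir" checkout -q -b "$rescue_branch"
    git -C "$repo_dir" add -A
    git -C "$repo_dir" commit -q -m "rescue: uncommitted work from Ralph run"
    git -C "$repo_dir" push -q origin "$rescue_branch"
  fi

  git -C "$repo_dir" checkout -q "$integration_branch"
  git -C "$repo_dir" reset -q --hard "origin/$integration_branch"
  git -C "$repo_dir" clean -qfd
}
```

Pushing the rescue branch before the hard reset means nothing is ever destroyed that wasn't first saved remotely.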
Justfile: ergonomic commands for humans
The Justfile is the operator UX: it wraps Docker + Helm + kubectl into repeatable commands.
Highlights:
- `just build`: build + push the image
- `just auth`: deploy auth pod, exec in, then uninstall
- `just start <name>`: deploy a job with a project-specific values file
- `just logs <name>`: tail logs
- `just cleanup <name>`: run the cleanup Job and stream logs
Example excerpt:
```just
auth:
    helm upgrade --install {{RELEASE_NAME}}-auth {{CHART_DIR}} -n {{NAMESPACE}} --set enableAuth=true --set enableJob=false -f {{VALUES_FILE}} --wait
    kubectl exec -it {{RELEASE_NAME}}-auth -n {{NAMESPACE}} -- bash
    helm uninstall {{RELEASE_NAME}}-auth -n {{NAMESPACE}}

start name:
    helm upgrade --install {{RELEASE_NAME}}-{{name}} {{CHART_DIR}} -n {{NAMESPACE}} --set enableAuth=false -f {{VALUES_FILE}} -f {{name}}-values.yaml
```

Putting it all together
- Design prompt: split base + provider prompts; keep it templated and versioned.
- Script loop: dispatch by provider/agent, detect completion sentinel, handle rate limits.
- Docker image: install CLIs + runtimes; run as non-root; include git/ssh/jq.
- ConfigMaps: mount scripts + compiled prompt into the Job.
- Job: init clone → run agent loop → sync auth/state → exit.
- Auth pod: interactive login; copy credentials into the PVC.
- Cleanup job: rescue branch + reset when the workspace gets stuck.
- Justfile: make all of the above easy to run consistently.