chopratejas/headroom is a local context compression layer for AI agents: it sits between your agent and the LLM, compresses everything the agent reads — tool outputs, logs, RAG chunks, files, conversation history — and forwards a much smaller prompt to the provider. The project's published benchmarks report 47–92% token savings on real agent traces, with reversible compression so the LLM can retrieve originals on demand. This guide walks you through installation and the four supported integration modes — library, proxy, agent wrap, and MCP — all based on Headroom's official README and docs.
Contents
Step 1Prerequisites
- Python 3.10+ with
pip— if you install via pip; or - Node.js 18+ with npm — if you want the TypeScript/Node package; or
- Docker — if you prefer a container.
- An AI coding agent (optional but recommended for the wrap mode): Claude Code, Codex, Cursor, Aider, or GitHub Copilot CLI.
Step 2Install Headroom
Option A — pip (Python, recommended)
pip install "headroom-ai[all]"
The [all] extra pulls in every component. If you'd rather opt in à la carte, granular extras are available: [proxy], [mcp], [ml] (the Kompress-base model), [code], [memory], [relevance], [image], [agno], [langchain], [evals].
If you use pipx, pin an interpreter explicitly:
pipx install --python python3.13 "headroom-ai[all]"
Option B — npm (TypeScript / Node)
npm install headroom-ai
Option C — Docker
Pull the image and start the proxy container (port 8787 mapped to host):
docker pull ghcr.io/chopratejas/headroom:latest docker run -p 8787:8787 ghcr.io/chopratejas/headroom:latest
pip install fails with a version error, check python --version first.
Step 3Pick a mode: wrap / proxy / library / MCP
Headroom supports four integration shapes. Pick the one that matches how you already use your tools — you don't have to commit to just one.
Mode A — Agent wrap (one command, easiest)
Wrap a supported coding agent in a single command; Headroom handles the proxy lifecycle and the agent's launch arguments for you:
headroom wrap claude # Claude Code (supports --memory, --code-graph) headroom wrap codex # Codex (shares memory with Claude) headroom wrap cursor # Cursor (prints config — paste it once) headroom wrap aider # Aider (starts proxy + launches) headroom wrap copilot # GitHub Copilot CLI (starts proxy + launches)
The compatibility matrix in Headroom's README also lists OpenClaw (installed as a ContextEngine plugin). Any OpenAI-compatible client can be used through the proxy mode below.
Mode B — Proxy (zero code changes, any language)
If your tool isn't on the wrap list, run Headroom as a local proxy and point your tool at it:
headroom proxy --port 8787
Anything that speaks the OpenAI-compatible API can be redirected to http://localhost:8787. The proxy applies the same compression pipeline used by headroom wrap.
Mode C — Library (inline in your code)
For programmatic use inside your own app, call compress(messages) directly. Python:
from headroom import compress compressed = compress(messages, model="gpt-4o")
TypeScript / Node — the JS SDK delegates compression to a local Headroom proxy, so start the proxy first, then point the client at it:
headroom proxy --port 8787
import { compress } from "headroom-ai";
const compressed = await compress(messages, {
model: "gpt-4o",
baseUrl: "http://localhost:8787",
});
The README's integration table also lists drop-in shims for the Anthropic / OpenAI SDKs, the Vercel AI SDK, LiteLLM callbacks, LangChain, Agno, Strands and ASGI apps.
Mode D — MCP server (for Claude Desktop and other MCP clients)
headroom mcp install
This exposes Headroom's tools — headroom_compress, headroom_retrieve, headroom_stats — to any MCP-native client.
Step 4GitHub Copilot CLI subscription mode
Headroom can route GitHub Copilot CLI subscription traffic (not just bring-your-own-key) through the local proxy:
headroom wrap copilot --subscription -- --model gpt-4o
The wrapper resolves the account-specific Copilot API endpoint and prints it as COPILOT_PROVIDER_API_URL=... during launch, then routes the OpenAI-compatible Copilot CLI requests through Headroom before forwarding to GitHub Copilot's hosted API.
secret-tool, and Docker/CI token-injection paths are implemented or planned as auth-discovery paths, but still need real OS validation before they should be considered fully vetted. On Docker or CI, prefer passing GITHUB_COPILOT_TOKEN or GITHUB_COPILOT_GITHUB_TOKEN explicitly rather than relying on host keychain access.
Step 5Verify token savings
Check Headroom is actually doing something useful for your workload:
headroom perf
This prints before/after token counts on representative agent traces. Headroom's published benchmarks (run with python -m headroom.evals suite --tier 1) show:
- 92% savings on code search and SRE incident debugging
- 73% savings on GitHub issue triage
- 47% savings on codebase exploration
- Accuracy preserved on GSM8K (0.870 vs 0.870), TruthfulQA (+0.030), SQuAD v2 and BFCL
Your numbers will vary with content type and the upstream LLM — these are the project's published averages, not a guarantee for every workload.
Handy CLI commands & extras
headroom wrap <agent>— one-shot wrap for Claude Code / Codex / Cursor / Aider / Copilot / OpenClawheadroom proxy --port 8787— start the local OpenAI-compatible proxyheadroom perf— measure compression ratio on a representative workloadheadroom mcp install— register the MCP server with MCP-native clientsheadroom learn— dry-run: mine failed sessions and print proposed corrections. Add--applyto actually write them toCLAUDE.md/AGENTS.md/GEMINI.mdpython -m headroom.evals suite --tier 1— reproduce the published benchmarks locally
claude and codex) lets them share a deduplicated memory store, so context one agent learned is available to the others. Originals are kept locally via Headroom's CCR (reversible compression) — the LLM calls headroom_retrieve when it needs the full value.
FAQ
How do I install Headroom?
Pick one: pip install "headroom-ai[all]" for Python (3.10+ required), npm install headroom-ai for TypeScript / Node (Node 18+), or pull and run the container with docker pull ghcr.io/chopratejas/headroom:latest && docker run -p 8787:8787 ghcr.io/chopratejas/headroom:latest.
Which agents does headroom wrap support?
Claude Code, Codex, Cursor, Aider, GitHub Copilot CLI and OpenClaw, per the README's compatibility matrix. Anything OpenAI-compatible also works via headroom proxy.
How does it work with GitHub Copilot?
headroom wrap copilot --subscription -- --model gpt-4o intercepts Copilot CLI's OpenAI-compatible requests and routes them through the local proxy before forwarding to GitHub's hosted Copilot API.
How much will I save?
Headroom's published benchmarks show 47–92% token reduction depending on workload, with accuracy preserved on GSM8K, TruthfulQA, SQuAD v2 and BFCL. Run headroom perf to measure on your own traces.
Is it local-only?
The compressor, proxy, MCP server and the CCR original-store all run on your machine. The compressed prompt still goes to whichever upstream LLM provider you configure, so the provider sees the compressed request just as it would any other.
Where are the full docs?
This guide covers install and the four integration modes. For architecture, CCR internals, the Kompress-base model card and provider-specific notes, see the official sources: github.com/chopratejas/headroom and headroom-docs.vercel.app/docs.
This guide is based on Headroom's public materials (GitHub: chopratejas/headroom and its README, plus headroom-docs.vercel.app). Commands, package names, ports, savings figures and the agent compatibility matrix follow the project's official documentation; Headroom is an active project, so defer to the latest official docs if anything differs. Apache-2.0 licensed. Written by NGJOO AI Lab, updated 2026-06-08.