What crosses the wire
The megathread
In late March 2026 a Reddit thread started collecting reports of severe degradation in Claude usage limits. Not a one-off. The thread grew. Users on every tier, including the $200/month Max plan at 20x usage, documented the same pattern: limits tightening during peak hours, requests throttled in ways the official documentation didn't describe, behavior that changed depending on when you sent the request and how much you'd already sent.
The frustrating part wasn't the limits themselves. It was the opacity. You hit something, but you couldn't see what. Claude Code kept working until it didn't. No visible meter. No warning. Just a hard stop, or a cryptic degradation that left you wondering whether your prompt was the problem or your quota was.
That's the problem textproxy was built to address. Not the limits, which Anthropic sets and we can't change. But the missing visibility layer between your work and the wire.
Why a proxy
The proxy is the right seat for everything that wants to observe or shape the API conversation. Claude Code sends HTTPS requests to api.anthropic.com. Every token count, every model selection, every response header crosses the wire. If you sit in that position, you see all of it.
textproxy is a lightweight local MITM between Claude Code and the API. It listens on localhost:7474, terminates TLS, inspects each SSE response for token usage data, and re-encrypts before forwarding. Claude Code doesn't notice. You get a live counter.
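# Install via Homebrew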
brew install paperworlds/tap/textproxy
# Start the daemon
textproxy start
# Use the proxied alias
claude-ctx # fish alias: HTTPS_PROXY=http://localhost:7474 + CA cert
# Check what happened
textproxy stats
The CA cert is installed once during setup (textproxy ca install). Everything after that is transparent.
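The only thing the proxy needs from each response is the usage block inside the SSE stream. A minimal sketch of that extraction in Go, assuming Anthropic-style message_start and message_delta events; the package, type, and field names here are illustrative, not textproxy's actual code:

// A sketch of usage extraction from a streamed response body.
package sniff

import (
	"bufio"
	"encoding/json"
	"io"
	"strings"
)

type usage struct {
	InputTokens          int `json:"input_tokens"`
	OutputTokens         int `json:"output_tokens"`
	CacheReadInputTokens int `json:"cache_read_input_tokens"`
}

// tallyUsage scans the "data:" lines of an SSE response and pulls the
// token counts out of message_start and message_delta events.
func tallyUsage(body io.Reader) (usage, error) {
	var total usage
	sc := bufio.NewScanner(body)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // event names, comments, blank keep-alives
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		var ev struct {
			Type    string `json:"type"`
			Message struct {
				Usage usage `json:"usage"`
			} `json:"message"`
			Usage usage `json:"usage"`
		}
		if err := json.Unmarshal([]byte(payload), &ev); err != nil {
			continue
		}
		switch ev.Type {
		case "message_start":
			total.InputTokens = ev.Message.Usage.InputTokens
			total.CacheReadInputTokens = ev.Message.Usage.CacheReadInputTokens
		case "message_delta":
			// output_tokens on deltas is cumulative; keep the latest value.
			total.OutputTokens = ev.Usage.OutputTokens
		}
	}
	return total, sc.Err()
}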
Observe
The first thing you get is the session view: how many tokens this session has consumed, broken down by input, output, and cache. textproxy stats prints the current snapshot. The statusline integration writes a compact version to a state file, so your shell prompt or tmux status bar can show it live.
$ textproxy stats
session       2h 14m · 23 requests
input        184,320 tok
output        41,880 tok
cache read    92,100 tok
──────────────────────────────────
total        318,300 tok   31.8% of 1M cap
The history is a ~/.cache/textproxy/history.jsonl file, one JSON object per request. Each entry records the model, token counts, profile, timestamp, and a short request fingerprint. That file is the raw material for everything downstream.
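One entry, modeled roughly in Go and continuing the earlier sketch (the field names are assumed from the description above, not taken from textproxy's actual schema):

// Roughly what one line of history.jsonl carries (add "time" to the imports).
type historyEntry struct {
	Timestamp       time.Time `json:"timestamp"`
	Model           string    `json:"model"`
	Profile         string    `json:"profile"`
	InputTokens     int       `json:"input_tokens"`
	OutputTokens    int       `json:"output_tokens"`
	CacheReadTokens int       `json:"cache_read_tokens"`
	Fingerprint     string    `json:"fingerprint"` // short request fingerprint
}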
Analyze
textproxy analyze reads the history and buckets it by hour, model, day, and profile. The main use is seeing your own peak-hour pattern. Do you send more requests in the afternoon? Does the model distribution shift when you're under deadline? After a week of real data, the picture gets specific.
$ textproxy analyze --by hour
hour  requests    input   output    cache
00          12      84k      18k      42k
09          38     271k      58k     134k
10          47     334k      71k     167k
11          61     435k      93k     217k
...
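The grouping itself is ordinary; a sketch of the hour bucketing over the historyEntry shape above, illustrative rather than textproxy's actual code:

// bucketByHour tallies requests and input tokens per hour of day, the
// same grouping `textproxy analyze --by hour` prints above.
func bucketByHour(entries []historyEntry) (requests, inputTok [24]int) {
	for _, e := range entries {
		h := e.Timestamp.Hour()
		requests[h]++
		inputTok[h] += e.InputTokens
	}
	return requests, inputTok
}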
textproxy bench does the other half: controlled time-of-day request firing on a fixed prompt, so you can measure actual latency and token delivery by hour rather than inferring from your real workload. If you suspected peak-hour degradation before, bench gives you numbers to compare against.
Survive
Observing the problem doesn't solve it. The third mode is the one that's harder to talk about, because it's designed and implemented but not yet on by default.
The limp-home routing layer sits inside the proxy and rewrites outbound requests based on how close you are to your rolling cap. The idea is a degradation curve: rather than hitting a hard limit and stopping, you get a softer landing. The model steps down. Requests slow down. At the extreme edge, the proxy refuses rather than letting Anthropic's 429 catch you mid-task.
The default curve looks like this:
| Consumption (% of 5h cap) | Action |
|---|---|
| 80% | Rewrite model to claude-haiku-4 |
| 90% | Rewrite model to claude-haiku-4 + 10s sleep |
| 95% | Refuse with 429 |
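Expressed as code, the curve is a handful of thresholds evaluated per request. A minimal sketch, continuing the Go fragments above; the thresholds and model name come from the table, while the type and function names are illustrative rather than textproxy's own:

// routeDecision is what the proxy would do to one outbound request.
type routeDecision struct {
	Model  string        // rewrite target; empty means leave the model alone
	Delay  time.Duration // sleep before forwarding
	Refuse bool          // answer with a local 429 instead of forwarding
}

// evaluateCurve applies the default degradation curve from the table.
// percent is consumption as a share of the five-hour cap.
func evaluateCurve(percent float64) routeDecision {
	switch {
	case percent >= 95:
		return routeDecision{Refuse: true}
	case percent >= 90:
		return routeDecision{Model: "claude-haiku-4", Delay: 10 * time.Second}
	case percent >= 80:
		return routeDecision{Model: "claude-haiku-4"}
	default:
		return routeDecision{}
	}
}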
The signal is a percent-of-cap reading. By default, textproxy derives it from its own rolling window: it tracks how many tokens you've sent over the last five hours and divides by your configured cap. You tell it your cap once; it handles the math from there.
# ~/.config/textproxy/config.json
{
  "routing": {
    "enabled": true,
    "dry_run": false,
    "signal_source": "internal",
    "token_cap_5h": 1000000
  }
}
A Pro subscription works out to roughly 1M tokens per five-hour window; Max plans are higher, around 5M. The right number depends on your tier and isn't exposed via the API, so you set it once based on what you know about your plan.
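The internal reading itself is just a windowed sum over the same history, divided by that cap. A sketch of the math, reusing the historyEntry fragment from earlier (illustrative, not textproxy's code):

// percentOfCap: tokens sent in the trailing five hours as a share of the
// configured cap. Input, output, and cache reads all count toward the
// total, matching the stats output earlier.
func percentOfCap(entries []historyEntry, capTokens int, now time.Time) float64 {
	cutoff := now.Add(-5 * time.Hour)
	total := 0
	for _, e := range entries {
		if e.Timestamp.After(cutoff) {
			total += e.InputTokens + e.OutputTokens + e.CacheReadTokens
		}
	}
	return 100 * float64(total) / float64(capTokens)
}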
The dry_run: true default is deliberate. When dry_run is on, the routing decisions are logged but the request body is sent unchanged. You can watch what the system would have done without committing to it. To activate for real, set both enabled: true and dry_run: false.
You can also test the curve manually before running it live:
# Simulate being at 87.5% of cap
textproxy consumption set 87.5
# Watch what would happen
textproxy stats # shows routing decision alongside consumption
The routing is pure: no goroutines, no file watchers. It reads the signal on each request, evaluates the curve, and applies the decision. No background state to reason about.
What's not done yet
The internal signal source works for the common case, but it has a limitation. textproxy only sees requests that pass through it. If you also use Claude.ai in a browser, or have other tools hitting the API, their consumption is invisible to the rolling window, and the internal reading will undercount.
The fix is the file signal source: an external process reads your actual account consumption (via ccusage or similar) and writes {"percent": 87.5} to ~/.cache/textproxy/consumption.json. The routing layer will use that instead of its own estimate. The wiring is there; the external poller isn't written yet.
# Future: a cron or launchd job will write this automatically.
# For now you can write it manually to test.
textproxy consumption set 87.5   # set
textproxy consumption            # read current value
textproxy consumption clear      # reset
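On the proxy side, consuming that file is a few lines. A sketch of the read path, assuming the {"percent": ...} shape above; the function name is illustrative:

// readFileSignal reads the externally written consumption percentage
// (needs "os" and "encoding/json" imported). On any error the caller
// falls back to the internal rolling-window estimate.
func readFileSignal(path string) (float64, bool) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, false
	}
	var sig struct {
		Percent float64 `json:"percent"`
	}
	if err := json.Unmarshal(data, &sig); err != nil {
		return 0, false
	}
	return sig.Percent, true
}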
The other open item is response caching: identical prompts returning cached answers, gated on request flags (no caching when temperature is non-zero or tool_choice is non-deterministic). Agentic loops that re-ask the same question stop paying twice. That's the most user-value-aligned thing on the roadmap and the most complex to get right with streaming responses.
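One possible reading of that gating, sketched in the same Go style as above; this is a guess at a design, not the planned implementation:

// cacheKey derives a lookup key for an outbound request body, returning
// ok=false when the request shouldn't be cached. Conservative gating:
// any non-zero temperature or explicit tool_choice disables caching.
// (Needs "crypto/sha256" and "encoding/hex" imported.)
func cacheKey(body []byte) (key string, ok bool) {
	var req struct {
		Model       string          `json:"model"`
		Temperature float64         `json:"temperature"`
		ToolChoice  json.RawMessage `json:"tool_choice"`
		Messages    json.RawMessage `json:"messages"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", false
	}
	if req.Temperature != 0 || len(req.ToolChoice) != 0 {
		return "", false
	}
	sum := sha256.Sum256(append([]byte(req.Model+"\n"), req.Messages...))
	return hex.EncodeToString(sum[:]), true
}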
The bigger picture
The proxy is the natural seat for everything that wants to observe or shape the API conversation. textproxy starts with token observability because that's the most immediate pain. The roadmap is broader: caching, per-session budgets, request rewriting, eventually cross-vendor adapter translation once an agent protocol stabilises. If it crosses the wire, it can live here. The MITM cert is the leverage.
Pairs with textsessions: the TUI reads ~/.cache/textproxy/session.json directly and refreshes every five seconds. While you're triaging sessions, the detail panel shows the current session's token consumption without leaving the interface. No separate terminal, no textproxy stats call.
Pairs with textaccounts: each profile in textaccounts produces a separate token consumption track in textproxy's history. You can see exactly how many tokens your work profile versus your personal profile have consumed in the same five-hour window, and configure different caps per profile if the plans differ.