What crosses the wire
The megathread
In late March 2026 a Reddit thread started collecting reports of severe degradation in Claude usage limits. Not a one-off. The thread grew. Users on every tier, including the $200/month Max plan at 20x usage, documented the same pattern: limits tightening during peak hours, requests throttled in ways the official documentation didn't describe, behavior that changed depending on when you sent the request and how much you'd already sent.
The frustrating part wasn't the limits themselves. It was the opacity. You hit something, but you couldn't see what. Claude Code kept working until it didn't. No visible meter. No warning. Just a hard stop, or a cryptic degradation that left you wondering whether your prompt was the problem or your quota was.
That's the problem textproxy was built to address. Not the limits, which Anthropic sets and we can't change. But the missing visibility layer between your work and the wire.
Why a proxy
The proxy is the right seat for everything that wants to observe or shape the API conversation. Claude Code sends HTTPS requests to api.anthropic.com. Every token count, every model selection, every response header crosses the wire. If you sit in that position, you see all of it.
textproxy is a lightweight local MITM between Claude Code and the API. It listens on localhost:7474, terminates TLS, inspects each SSE response for token usage data, and re-encrypts before forwarding. Claude Code doesn't notice. You get a live counter.
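# Install via Homebrew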
brew install paperworlds/tap/textproxy
# Start the daemon
textproxy start
# Use the proxied alias
claude-ctx # fish alias: HTTPS_PROXY=http://localhost:7474 + CA cert
# Check what happened
textproxy stats
The CA cert is installed once during setup (textproxy ca install). Everything after that is transparent.
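The only thing the proxy needs from each response is the usage block inside the SSE stream. A minimal sketch of that extraction in Go, assuming Anthropic-style message_start and message_delta events; the package, type, and field names here are illustrative, not textproxy's actual code:

// A sketch of usage extraction from a streamed response body.
package sniff

import (
	"bufio"
	"encoding/json"
	"io"
	"strings"
)

type usage struct {
	InputTokens          int `json:"input_tokens"`
	OutputTokens         int `json:"output_tokens"`
	CacheReadInputTokens int `json:"cache_read_input_tokens"`
}

// tallyUsage scans the "data:" lines of an SSE response and pulls the
// token counts out of message_start and message_delta events.
func tallyUsage(body io.Reader) (usage, error) {
	var total usage
	sc := bufio.NewScanner(body)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // event names, comments, blank keep-alives
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		var ev struct {
			Type    string `json:"type"`
			Message struct {
				Usage usage `json:"usage"`
			} `json:"message"`
			Usage usage `json:"usage"`
		}
		if err := json.Unmarshal([]byte(payload), &ev); err != nil {
			continue
		}
		switch ev.Type {
		case "message_start":
			total.InputTokens = ev.Message.Usage.InputTokens
			total.CacheReadInputTokens = ev.Message.Usage.CacheReadInputTokens
		case "message_delta":
			// output_tokens on deltas is cumulative; keep the latest value.
			total.OutputTokens = ev.Usage.OutputTokens
		}
	}
	return total, sc.Err()
}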
Observe
The first thing you get is the session view: how many tokens this session has consumed, broken down by input, output, and cache. textproxy stats prints the current snapshot. The statusline integration writes a compact version to a state file, so your shell prompt or tmux status bar can show it live.
$ textproxy stats
session       2h 14m · 23 requests
input        184,320 tok
output        41,880 tok
cache read    92,100 tok
──────────────────────────────────
total        318,300 tok   31.8% of 1M cap
The history is a ~/.cache/textproxy/history.jsonl file, one JSON object per request. Each entry records the model, token counts, profile, timestamp, and a short request fingerprint. That file is the raw material for everything downstream.
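One entry, modeled roughly in Go and continuing the earlier sketch (the field names are assumed from the description above, not taken from textproxy's actual schema):

// Roughly what one line of history.jsonl carries (add "time" to the imports).
type historyEntry struct {
	Timestamp       time.Time `json:"timestamp"`
	Model           string    `json:"model"`
	Profile         string    `json:"profile"`
	InputTokens     int       `json:"input_tokens"`
	OutputTokens    int       `json:"output_tokens"`
	CacheReadTokens int       `json:"cache_read_tokens"`
	Fingerprint     string    `json:"fingerprint"` // short request fingerprint
}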
Analyze
textproxy analyze reads the history and buckets it by hour, model, day, and profile. The main use is seeing your own peak-hour pattern. Do you send more requests in the afternoon? Does the model distribution shift when you're under deadline? After a week of real data, the picture gets specific.
$ textproxy analyze --by hour
hour  requests    input   output    cache
00          12      84k      18k      42k
09          38     271k      58k     134k
10          47     334k      71k     167k
11          61     435k      93k     217k
...
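The grouping itself is ordinary; a sketch of the hour bucketing over the historyEntry shape above, illustrative rather than textproxy's actual code:

// bucketByHour tallies requests and input tokens per hour of day, the
// same grouping `textproxy analyze --by hour` prints above.
func bucketByHour(entries []historyEntry) (requests, inputTok [24]int) {
	for _, e := range entries {
		h := e.Timestamp.Hour()
		requests[h]++
		inputTok[h] += e.InputTokens
	}
	return requests, inputTok
}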
textproxy bench does the other half: controlled time-of-day request firing on a fixed prompt, so you can measure actual latency and token delivery by hour rather than inferring from your real workload. If you suspected peak-hour degradation before, bench gives you numbers to compare against.
Survive
Observing the problem doesn't solve it. The third mode is the one that's harder to talk about, because it's designed and implemented but not yet on by default.
The limp-home routing layer sits inside the proxy and rewrites outbound requests based on how close you are to your rolling cap. The idea is a degradation curve: rather than hitting a hard limit and stopping, you get a softer landing. The model steps down. Requests slow down. At the extreme edge, the proxy refuses rather than letting Anthropic's 429 catch you mid-task.
The default curve looks like this:
| Consumption (% of 5h cap) | Action |
|---|---|
| 80% | Rewrite model to claude-haiku-4 |
| 90% | Rewrite model to claude-haiku-4 + 10s sleep |
| 95% | Refuse with 429 |
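Expressed as code, the curve is a handful of thresholds evaluated per request. A minimal sketch, continuing the Go fragments above; the thresholds and model name come from the table, while the type and function names are illustrative rather than textproxy's own:

// routeDecision is what the proxy would do to one outbound request.
type routeDecision struct {
	Model  string        // rewrite target; empty means leave the model alone
	Delay  time.Duration // sleep before forwarding
	Refuse bool          // answer with a local 429 instead of forwarding
}

// evaluateCurve applies the default degradation curve from the table.
// percent is consumption as a share of the five-hour cap.
func evaluateCurve(percent float64) routeDecision {
	switch {
	case percent >= 95:
		return routeDecision{Refuse: true}
	case percent >= 90:
		return routeDecision{Model: "claude-haiku-4", Delay: 10 * time.Second}
	case percent >= 80:
		return routeDecision{Model: "claude-haiku-4"}
	default:
		return routeDecision{}
	}
}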
The signal is a percent-of-cap reading. By default, textproxy derives it from its own rolling window: it tracks how many tokens you've sent over the last five hours and divides by your configured cap. You tell it your cap once; it handles the math from there.
# ~/.config/textproxy/config.json
{
  "routing": {
    "enabled": true,
    "dry_run": false,
    "signal_source": "internal",
    "token_cap_5h": 1000000
  }
}
A Pro subscription works out to roughly 1M tokens per five-hour window; Max plans are higher, around 5M. The right number depends on your tier and isn't exposed via the API, so you set it once based on what you know about your plan.
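The internal reading itself is just a windowed sum over the same history, divided by that cap. A sketch of the math, reusing the historyEntry fragment from earlier (illustrative, not textproxy's code):

// percentOfCap: tokens sent in the trailing five hours as a share of the
// configured cap. Input, output, and cache reads all count toward the
// total, matching the stats output earlier.
func percentOfCap(entries []historyEntry, capTokens int, now time.Time) float64 {
	cutoff := now.Add(-5 * time.Hour)
	total := 0
	for _, e := range entries {
		if e.Timestamp.After(cutoff) {
			total += e.InputTokens + e.OutputTokens + e.CacheReadTokens
		}
	}
	return 100 * float64(total) / float64(capTokens)
}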
The dry_run: true default is deliberate. When dry_run is on, the routing decisions are logged but the request body is sent unchanged. You can watch what the system would have done without committing to it. To activate for real, set both enabled: true and dry_run: false.
You can also test the curve manually before running it live:
# Simulate being at 87.5% of cap
textproxy consumption set 87.5
# Watch what would happen
textproxy stats # shows routing decision alongside consumption
The routing is pure: no goroutines, no file watchers. It reads the signal on each request, evaluates the curve, and applies the decision. No background state to reason about.
What's not done yet
The internal signal source works for the common case, but it has a limitation. textproxy only sees requests that pass through it. If you also use Claude.ai in a browser, or have other tools hitting the API, their consumption is invisible to the rolling window, and the internal reading will undercount.
The fix is the file signal source: an external process reads your actual account consumption (via ccusage or similar) and writes {"percent": 87.5} to ~/.cache/textproxy/consumption.json. The routing layer will use that instead of its own estimate. The wiring is there; the external poller isn't written yet.
# Future: a cron or launchd job will write this automatically.
# For now you can write it manually to test.
textproxy consumption set 87.5   # set
textproxy consumption            # read current value
textproxy consumption clear      # reset
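On the proxy side, consuming that file is a few lines. A sketch of the read path, assuming the {"percent": ...} shape above; the function name is illustrative:

// readFileSignal reads the externally written consumption percentage
// (needs "os" and "encoding/json" imported). On any error the caller
// falls back to the internal rolling-window estimate.
func readFileSignal(path string) (float64, bool) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, false
	}
	var sig struct {
		Percent float64 `json:"percent"`
	}
	if err := json.Unmarshal(data, &sig); err != nil {
		return 0, false
	}
	return sig.Percent, true
}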
The other open item is response caching: identical prompts returning cached answers, gated on request flags (no caching when temperature is non-zero or tool_choice is non-deterministic). Agentic loops that re-ask the same question stop paying twice. That's the most user-value-aligned thing on the roadmap and the most complex to get right with streaming responses.
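One possible reading of that gating, sketched in the same Go style as above; this is a guess at a design, not the planned implementation:

// cacheKey derives a lookup key for an outbound request body, returning
// ok=false when the request shouldn't be cached. Conservative gating:
// any non-zero temperature or explicit tool_choice disables caching.
// (Needs "crypto/sha256" and "encoding/hex" imported.)
func cacheKey(body []byte) (key string, ok bool) {
	var req struct {
		Model       string          `json:"model"`
		Temperature float64         `json:"temperature"`
		ToolChoice  json.RawMessage `json:"tool_choice"`
		Messages    json.RawMessage `json:"messages"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", false
	}
	if req.Temperature != 0 || len(req.ToolChoice) != 0 {
		return "", false
	}
	sum := sha256.Sum256(append([]byte(req.Model+"\n"), req.Messages...))
	return hex.EncodeToString(sum[:]), true
}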
The bigger picture
The proxy is the natural seat for everything that wants to observe or shape the API conversation. textproxy starts with token observability because that's the most immediate pain. The roadmap is broader: caching, per-session budgets, request rewriting, eventually cross-vendor adapter translation once an agent protocol stabilises. If it crosses the wire, it can live here. The MITM cert is the leverage.
Pairs with textsessions: the TUI reads ~/.cache/textproxy/session.json directly and refreshes every five seconds. While you're triaging sessions, the detail panel shows the current session's token consumption without leaving the interface. No separate terminal, no textproxy stats call.
Pairs with textaccounts: each profile in textaccounts produces a separate token consumption track in textproxy's history. You can see exactly how many tokens your work profile versus your personal profile have consumed in the same five-hour window, and configure different caps per profile if the plans differ.