Dark skies
Two essays came out recently that I keep returning to: Lars Faye's "Agentic Coding is a Trap" and Lorin Hochstein's "Life Comes at You Fast." Both are careful and honest, and both point at something real. Both stop just short of naming what that thing actually is.
They diagnose darkness from opposite directions. Faye goes inward: delegation erodes the mental model you need to supervise what you delegated. Anthropic's own research found a 47% drop in debugging skills among developers who used AI assistance heavily. His prescription is aggressive: keep 20 to 100 percent of your coding manual. His argument is that you can't understand what you didn't write, and that understanding is what you need when the model fails you.
Hochstein goes outward: AI generates demand faster than the infrastructure can adapt. GitHub and Anthropic have both been saturated by their own growth. The engineers who need to rearchitect for scale are the same engineers fighting the fires caused by the scale they haven't rearchitected for yet. The loop closes on itself. His argument isn't about skill erosion. It's about structural saturation that steals its own remedy.
Both are right. The darkness is real in both places. But they're arguing about how much to delegate, and that's not where the problem lives.
The wrong question
The question both essays are circling, without quite asking it, is: after the session ends, what do you actually have?
Faye's answer is: less skill than you started with, if you delegated the hard parts. Hochstein's answer is: a system under more pressure than before. These are accurate. But neither is asking about the session itself. What trace does the work leave? Not in the git history, not in the compiled output. In the process that produced them. What happened? Why did it take that path? Where did it go wrong before it corrected?
When a human engineer works through a problem, there's a residue. A scratchpad, a Slack thread, a comment someone will hate in six months that starts with // do not remove this. The session has a shape. When the same work happens inside a long-running Claude session, the shape is locked in a JSONL file, compressed after a few hours, unreadable without tooling. You can see what was committed. You can't see how it got there.
That's the darkness. Not skill erosion. Not saturation. Illegibility.
The closest prior art
Cursor published something relevant recently on customizing agents: a clean model distinguishing Rules (always-on markdown context, loaded into every session) from Skills (contextually loaded workflows, invoked by name). The model is genuinely useful. If you want the agent to always follow a naming convention, that's a Rule. If you want it to run a specific testing workflow when asked, that's a Skill. Both live in version-controlled files. Both are explicit and reviewable.
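A rough sketch of the shape, not Cursor's documented format (directory names, filenames, and contents here are illustrative only):

```
rules/naming.md           # Rule: always-on, loaded into every session
skills/run-perf-suite.md  # Skill: loaded only when invoked by name

# naming.md -- a Rule: short, unconditional, always in context
Use snake_case for modules. Prefix feature flags with ff_.

# run-perf-suite.md -- a Skill: a named workflow the agent pulls in on request
1. Build with the release profile.
2. Run the benchmark harness and compare against the last tagged baseline.
3. Flag any regression over 5% in the summary.
```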
This is structure-in. It's the right direction. But it's still configuration declared before the session starts, not observation captured during it. You define what the agent knows. You get no receipt.
After the session ends, you have whatever was committed. The decision trail (why this approach and not that one, what the agent tried first, where it stalled, what it reconsidered) is in the JSONL, which nobody reads, and which gets compressed before most people think to look.
Dark skies
Astronomers use the term "dark sky" to mean a place with low enough light pollution that you can see what's actually up there. It's not the absence of light. It's the absence of the wrong light: the ambient noise that drowns out the signal you're trying to read.
Agentic work generates its own light pollution. Every session is a source of signal (what happened, what worked, what the model decided) surrounded by noise (full JSONL transcripts, tool call payloads, intermediate states). The signal is in there. But you have to already know what you're looking for to extract it, and by the time you know what you're looking for, you've usually already moved on.
What you want is a dark sky: low-noise, human-readable signal from the session, written as the session runs, not reconstructed afterward. Not a summary generated at the end (the model is compressing at the point of maximum uncertainty). Not the raw JSONL (unreadable at volume). Something written incrementally, in human time, answering the specific question a future reader needs to answer: what happened here, and why.
The receipt
The thing that's been missing from the conversation (missing from Faye, from Hochstein, from the Cursor model) is a per-session structured log written during the session.
Not a plan written before (that's a wishlist). Not a summary written after (that's reconstruction). A log written as the work unfolds, at the granularity of "here's the step, here's the outcome, here's what changed direction." A YAML block per phase, delimited by ----. Timestamp, phase label, what was attempted, what was found, what's next. A hundred words per block, written once, never re-read within the session itself.
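Concretely, one block might look like this. The field names are my guess at the shape described above, not a fixed schema:

```
----
time: 2025-11-14T16:42:07Z
phase: narrow-the-failing-test
tried: reproduced the flake by pinning the seed, then bisected to the cache layer
found: stale entry survives invalidation when the key is re-derived mid-request
next: write a failing unit test against the cache before touching the handler
----
```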
That's the checkpoint log format that lives in the textforums thread pinned since last week. The session writes it. The session never reads it back. Token cost is negligible because write-only streams are cheap. The file lands next to the work, named for the session, human-shaped, and it answers the what-happened question without a grep.
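The thread describes a format, not an implementation. As a minimal sketch of how cheap the write side can be (the function name, field names, and file naming scheme are mine):

```python
# checkpoint.py -- append-only session receipt, written during the session.
# Sketch only: names and layout are illustrative, not a published tool.
from datetime import datetime, timezone
from pathlib import Path

def checkpoint(session_id: str, phase: str, tried: str, found: str,
               next_step: str, root: Path = Path(".")) -> None:
    """Append one ----delimited YAML block; the session never reads it back."""
    log = root / f"session-{session_id}.checkpoints.yaml"
    block = "\n".join([
        "----",
        f"time: {datetime.now(timezone.utc).isoformat(timespec='seconds')}",
        f"phase: {phase}",
        f"tried: {tried}",
        f"found: {found}",
        f"next: {next_step}",
        "----",
        "",
    ])
    # Append-only: each call is one cheap write, nothing is ever read back.
    with log.open("a", encoding="utf-8") as f:
        f.write(block)
```

One call per phase boundary; the file accumulates next to the work and is only ever opened by the human who comes back later.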
This is what neither Faye's "code more manually" nor Hochstein's "rearchitect faster" addresses. The skill erosion Faye describes is partly an illegibility problem: if you could read back what the agent actually tried, you'd learn from it the way you learn from watching someone else debug. The saturation Hochstein describes generates incidents that have no readable history beyond what the model reported at the time, which is a description of the surface, not a map of the subsurface.
Not fighting agents
Paperworlds is a pet project where I'm trying to work through some of these questions in practice. Not a framework, not a prescription. A handful of small tools, each working on a specific problem. The checkpoint log is one of them, and honestly the one I'm least certain about. The idea is simple: write the receipt during, not after. Whether that's actually the right answer I don't know yet.
Faye and Hochstein are right that something is wrong. The frame I'd offer is narrower: not more delegation or less, not faster rearchitecture, but when the session ends, can you read what happened? Right now, mostly you can't. The trace is in a JSONL nobody opens, expiring on a timer, written for the model that compressed it, not the engineer who comes back a week later.
Make the session auditable: write the receipt during, not after. Right now that feels like a missing primitive I can test.