The 'developer' skill, for Claude
2026-01-13, Auckland, New Zealand.
The claude-developer-skill is my attempt at pushing progressive disclosure as far as I possibly can.
TL;DR: I present a Claude Code skill that enables Claude to record and index my tastes, and recall these in relevant conversations automatically.
LLMs demonstrate an ability to regurgitate content from their training data, and sometimes this content is actually accurate. But, like a human half-remembering something they read a long time ago, there are sometimes errors. Actually, there are often errors, but giving them a chance to refresh their recollection with a quick read of some reference material means they can self-correct, and output improves. For an LLM, this means the reference material needs to be contained within the context window.
Background: WTF is an LLM?
An LLM is just a stateless function that takes the entire context window as input and produces, based on its model of human language, what it estimates the next token to be.
When you submit a message through chatgpt.com or claude.ai, the browser is sending the entire conversation to the backend servers that operate the model.
The inference program does an incomprehensible amount of mathematics and produces a single token, the first fragment of a word that the model predicts that the assistant would respond with.
The backend takes that single output token, concatenates it to the end of the conversation, and invokes the inference program again.
The next word fragment is obtained.
The backend continues in this loop until the model eventually emits a sentinel "stop" token; the model is predicting that the context window ends with the end of a message from the assistant.
The backend sends all of the generated tokens back to your browser and the new message is displayed in the UI for you to read.
Waiting for all of this to happen would be a rather poor user experience. You'd see nothing for several seconds (several minutes for longer responses), then suddenly a massive wall of text. Instead, most of the providers use a fancy thing called Server-Sent Events wherein the backend sends back each token as soon as it's created, and the browser updates the assistant message with the new tokens in real-time. Let's not get into SSE right now; we're already nearly wandering off-topic.
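The generate-one-token-and-loop dance above can be sketched in a few lines of heavily simplified Python. `next_token` here is a hypothetical stand-in for the real inference program (which I've mocked with a canned lookup table; the real thing is, of course, a gigantic neural network, not a dictionary):

```python
# A toy stand-in for the inference program: a stateless function from the
# whole context window to a single predicted next token. (Mocked here with
# a canned table; real inference is a neural network forward pass.)
def next_token(context: list[str]) -> str:
    canned = {0: "Hello", 1: ",", 2: " world", 3: "<stop>"}
    generated_so_far = len(context) - context.index("<user-message>") - 1
    return canned[generated_so_far]

def generate(conversation: list[str]) -> list[str]:
    context = list(conversation)
    while True:
        token = next_token(context)   # one full inference pass per token
        if token == "<stop>":         # sentinel: the message is complete
            break
        context.append(token)         # concatenate, then go around again
    return context[len(conversation):]  # just the newly generated tokens

print(generate(["<user-message>"]))   # ['Hello', ',', ' world']
```

The important property to notice is that the model itself keeps no state between calls: every new token costs another pass over the entire (growing) context.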
The key limiting factor with growing context size is VRAM. This is inherent to the design of the transformer architecture. In fact, for naive transformer implementations the memory requirement is proportional to the square of the number of tokens in the context! As I understand it the frontier labs are doing some crazy stuff to make it less than that in practice, though the relationship remains superlinear.
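A back-of-the-envelope illustration of that quadratic relationship (this counts only the naive attention-score matrix; it is not a model of what any real lab actually ships):

```python
# Naive self-attention scores every token against every other token, so
# the score matrix alone has n * n entries: memory grows with the square
# of the context length.
def attention_matrix_entries(n_tokens: int) -> int:
    return n_tokens * n_tokens

# Doubling the context quadruples the naive attention-score storage:
print(attention_matrix_entries(200_000) // attention_matrix_entries(100_000))  # 4
```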
And even if you could grow the context window without limit, one of my hypotheses is that the current ~200k-token limits are... good, actually.
Think about if you had to sit two tests. Both the same length of time, both the same level of difficulty, and in topics you know equally well. For the first test, you're sat next to all 32 volumes (32,640 pages) of the last printed version of Encyclopaedia Britannica and are free to refer to it throughout the test. For the second test, you're given a single sheet of paper with the answers to all of the questions written on it.
Which test do you think you're going to perform better on?
For AI agents tasked with writing programs, it presents a conundrum. You need it to produce correct output in as many situations as possible; this incentivises you to fill the context window with as much potentially-useful stuff as possible. But you're fundamentally limited by the size of the context window, the maximum number of tokens that the LLM providers will infer from in a single pass. Anyone who's used Claude in anger will also tell you that Claude sometimes forgets instructions from earlier in the context window. There's therefore an opposing incentive for the context window to be as empty as possible. Oh, whatever is a token-slinger to do?
The content of the context window needs to be relevant to the task at hand. You want to put these microintelligences in the "test where you can see the answers" situation more than the "test where you can read an encyclopaedia" situation. This is the core thesis of Skills: content that is dynamically and selectively included into the context "just-in-time".
WTF are Skills?
Skills are a disclosure in three parts:
- The name of the skill; used to identify it,
- The content of the skill; "the answers to the test", and
- The description of the skill; the instructions that tell the LLM when it should read the content.
Here's an example of what a Skill looks like:
```markdown
---
name: go-programming
description: Rules for writing programs in Go. Claude MUST read this document before writing any Go code or planning such work.
---

- Prefer mixed-case names like `XmlHttpRequest` instead of `XMLHTTPRequest`.
- Make sure to run `go fmt && golangci-lint && go test && go build ./... && make integration-test` after every code change and fix any issues.
- Always wrap errors with `fmt.Errorf("unable to $verb for $noun: %w", err)` at every error site. `return err` without wrapping is forbidden in all code.
- Some additional example rule to pad out the content to prove the point.
- Yet more example rules to add more content.
```
Skills live on disk, usually at ~/.claude/skills/$skillName/SKILL.md.
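On disk, a couple of installed skills might be laid out like this (the skill names here are just examples):

```
~/.claude/
└── skills/
    ├── go-programming/
    │   └── SKILL.md
    └── some-other-skill/
        └── SKILL.md
```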
When Claude Code starts up, one of the many things that it does is read all of the available skills off disk and put a special "system" message into the beginning of the conversation. This is the "system prompt" you hear people talk about. Among much other context, the system prompt tells Claude about the available skills and when it should use them by extracting the name and description from the YAML front-matter in each of the skill documents:
```
The assistant is Claude. You are running in Claude Code, an agent application...

...a *lot* of other stuff omitted for clarity...

Claude should use the `Skill` tool to invoke skills according to the
instructions associated with each skill. Here are the available skills:

1. go-programming: Rules for writing programs in Go. Claude MUST read this
   document before writing any Go code or planning such work.
2. some-other-skill: The associated instructions that tell Claude when to
   use the skill.
3. more-skills-to-pad-content: And the associated trigger-phrases that
   signal to Claude to invoke the skill.
```
Claude then only reads the documents that are relevant to the conversation.
Skill theory
Great! Let's go create as many Skills as possible.
You don't want to have to remember how all this skill stuff works. You want to be working with Claude, say "update your instructions", and have things... 'just work': have Claude update the instructions _for you_. This requires _maintenance instructions_.
So you have a skill doc: a name, plus instructions on when to use the skill, plus the skill's instructions on how to achieve some outcome. You include in the skill a little note like: "when making updates to these instructions you MUST read ./MAINTENANCE.md". MAINTENANCE.md contains additional instructions about how to update the instructions, i.e. how the information is structured.
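Concretely, the tail of a SKILL.md might carry a note along these lines (the wording is illustrative, not a quote from any real skill):

```markdown
## Maintaining this skill

When making updates to these instructions you MUST first read
./MAINTENANCE.md, which describes how this document is structured
and how new rules should be recorded.
```

The point is that the skill carries a pointer to its own editing manual, so Claude can be asked to update it without you re-explaining the structure every time.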
You're early! This post is still under construction. Check back again later; things should hopefully be more coherent, then.