Prompt caching is everything
Thariq Shihpar, an engineer at Anthropic, breaks down why prompt caching is everything when building LLM apps such as Claude Code.
Prompt caching works by prefix matching — the API caches everything from the start of the request up to each cache_control breakpoint. This means the order you put things in matters enormously: you want as many of your requests as possible to share a prefix. The best way to achieve this is to put static content first and dynamic content last.
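As a rough sketch of that layout, here is what a Messages API request body might look like with the stable content (a long system prompt) placed first and marked with a cache_control breakpoint, and the changing conversation turns placed after it. The model ID and placeholder prompt text are illustrative, not from the post.

```python
def build_request(system_prompt: str, conversation: list[dict]) -> dict:
    """Assemble a request laid out for prompt caching: static prefix
    first (cacheable), dynamic conversation last."""
    return {
        "model": "claude-sonnet-4-20250514",  # example model ID
        "max_tokens": 1024,
        # Static prefix: identical across requests, so the API can cache
        # everything up to and including this breakpoint.
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Dynamic suffix: changes every turn, so it goes after the breakpoint.
        "messages": conversation,
    }

request = build_request(
    "You are a coding assistant...",  # long, stable instructions
    [{"role": "user", "content": "Refactor this function."}],
)
```

Every request built this way shares the same cached prefix; only the messages list varies, so subsequent calls pay full price only for the new turns.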
This is another piece of mandatory reading for engineers working with tools like Claude Code or Codex, but it is also relevant if you are building conversational LLM systems.
One interesting tip here: switching models mid-conversation is really inefficient (and can cost you more money) because caching is done per model!
— via Simon Willison