Prompt caching with Claude
Abstract
The article announces prompt caching on the Anthropic API, which lets developers cache frequently used context between API calls. The feature can cut costs by up to 90% and latency by up to 85% for long prompts, and is available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus coming soon.
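At the API level, caching is opt-in: you mark the end of the reusable prefix with a `cache_control` breakpoint and send the beta header. Below is a minimal sketch using the Anthropic Python SDK; the file name and system instructions are illustrative, while the `cache_control` field and `anthropic-beta` header follow the prompt-caching beta documentation.

```python
# Minimal prompt-caching sketch with the Anthropic Python SDK (beta).
# Everything up to the cache_control breakpoint is cached and reused
# by later calls that share the same prefix.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_DOCUMENT = open("policy_manual.txt").read()  # hypothetical long reference text

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You answer questions about the attached manual.\n\n" + LONG_DOCUMENT,
            "cache_control": {"type": "ephemeral"},  # cache breakpoint: prefix ends here
        }
    ],
    messages=[{"role": "user", "content": "Summarize the vacation policy."}],
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
print(response.content[0].text)
```

The first call pays the cache-write premium; subsequent calls with an identical prefix read it back at a fraction of the base input price.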
Q&A
[01] When to use prompt caching
1. What are some situations where prompt caching can be effective?
- Conversational agents: Reduce cost and latency for extended conversations, especially those with long instructions or uploaded documents (see the sketch after this list).
- Coding assistants: Improve autocomplete and codebase Q&A by keeping a summarized version of the codebase in the prompt.
- Large document processing: Incorporate complete long-form material including images in your prompt without increasing response latency.
- Detailed instruction sets: Share extensive lists of instructions, procedures, and examples to fine-tune Claude's responses.
- Agentic search and tool use: Enhance performance for scenarios involving multiple rounds of tool calls and iterative changes.
- Talk to books, papers, documentation, podcast transcripts, and other long-form content: Bring any knowledge base to life by embedding the entire document(s) into the prompt.
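For the conversational-agent case above, one workable pattern (a sketch under the beta's rules, not a reference implementation) is to advance a single cache breakpoint to the newest user turn on each request, so the entire earlier history is served from cache:

```python
# Sketch: incremental caching of a growing conversation.
# The breakpoint moves forward each turn; the beta allows at most
# four cache_control breakpoints per request, so stale ones are removed.
import anthropic

client = anthropic.Anthropic()
history = []  # accumulated turns, oldest first

def ask(user_text: str) -> str:
    # Drop the breakpoint from earlier turns before setting a new one.
    for turn in history:
        for block in turn["content"]:
            block.pop("cache_control", None)
    history.append({
        "role": "user",
        "content": [{
            "type": "text",
            "text": user_text,
            "cache_control": {"type": "ephemeral"},  # cached prefix ends at this turn
        }],
    })
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=history,
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": [{"type": "text", "text": reply}]})
    return reply
```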
2. What kind of improvements have early customers seen with prompt caching?
- Substantial speed and cost improvements for a variety of use cases, such as including a full knowledge base, 100-shot examples, or each turn of a conversation in their prompt.
[02] How we price cached prompts
1. How are cached prompts priced?
- Writing to the cache costs 25% more than the base input token price for the given model.
- Using cached content costs only 10% of the base input token price.
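A quick worked example makes the break-even point concrete. The multipliers below come from the pricing rule above; the 100K-token prefix and Claude 3.5 Sonnet's $3 per-million-token base input price are used for illustration.

```python
# Illustrative arithmetic: caching a 100K-token prefix on Claude 3.5 Sonnet,
# assuming the $3/MTok base input price. Cache writes cost 1.25x base;
# cache reads cost 0.1x base.
PREFIX_TOKENS = 100_000
BASE = 3.00 / 1_000_000          # $ per input token
WRITE = BASE * 1.25              # cache write: 25% premium
READ = BASE * 0.10               # cache read: 10% of base

uncached_per_call = PREFIX_TOKENS * BASE   # $0.300 on every call
first_cached_call = PREFIX_TOKENS * WRITE  # $0.375 once, to populate the cache
later_cached_call = PREFIX_TOKENS * READ   # $0.030 per subsequent call

print(f"uncached: ${uncached_per_call:.3f} per call")
print(f"cached:   ${first_cached_call:.3f} once, then ${later_cached_call:.3f} per call")
# Caching already wins on the second call: $0.375 + $0.030 < 2 x $0.300.
```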
2. What are the pricing details for the different Claude models?
- Claude 3.5 Sonnet:
  - Input: $3 / MTok
  - Prompt caching: $3.75 / MTok to write, $0.30 / MTok to read
  - Output: $15 / MTok
- Claude 3 Opus:
  - Input: $15 / MTok
  - Prompt caching (coming soon): $18.75 / MTok to write, $1.50 / MTok to read
  - Output: $75 / MTok
- Claude 3 Haiku:
  - Input: $0.25 / MTok
  - Prompt caching: $0.30 / MTok to write, $0.03 / MTok to read
  - Output: $1.25 / MTok
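Continuing the first sketch above, you can confirm whether a given call wrote to or read from the cache by inspecting the `usage` block on the response; these field names come from the prompt-caching beta documentation.

```python
# Cache accounting on a Messages API response (prompt-caching beta fields).
usage = response.usage
print(usage.input_tokens)                 # regular, uncached input tokens
print(usage.cache_creation_input_tokens)  # tokens written to the cache (1.25x base)
print(usage.cache_read_input_tokens)      # tokens read from the cache (0.1x base)
```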
[03] Customer spotlight: Notion
1. How is Notion using prompt caching? Notion is adding prompt caching to the Claude-powered features of its AI assistant, Notion AI. This allows them to optimize internal operations and deliver a more elevated, responsive experience for their users.
2. What are the benefits Notion is seeing from using prompt caching? Reduced costs and increased speed, while maintaining state-of-the-art quality.