Prompt Caching in OpenClaw: Why Your Long Sessions Should Use This
Prompt caching is one of those features that sounds boring but actually saves you real money. We’re talking up to a 90% reduction in input token costs for long sessions. Here’s what it is, how to use it, and why you should care.
What Even Is Prompt Caching?
Your model has a context window—say, 128K tokens. In a long session, most of that context stays the same. Your instructions, system prompt, conversation history. Only the latest message changes.
Prompt caching lets the model cache that static prefix and reuse it. Instead of charging you full price for the entire prefix on every request, it charges ~10% of the normal input rate for the cached tokens. Multiply that across dozens of requests and you see real savings.
How OpenClaw Handles It (You Don’t Have To Do Much)
OpenClaw manages caching automatically. Just configure it once:
```json
{
  "providers": {
    "anthropic": {
      "cacheRetention": {
        "type": "ephemeral",
        "ttlMinutes": 5
      }
    }
  }
}
```
That’s it. The gateway handles the rest. Cache gets created, reused, tracked—you don’t think about it.
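If you want to sanity-check the shape of that config outside OpenClaw, here’s a minimal sketch using Python’s standard `json` module (the validation logic is illustrative, not something OpenClaw runs):

```python
import json

# The snippet from above, as it would appear in your OpenClaw config.
raw = """
{
  "providers": {
    "anthropic": {
      "cacheRetention": {"type": "ephemeral", "ttlMinutes": 5}
    }
  }
}
"""

cfg = json.loads(raw)
retention = cfg["providers"]["anthropic"]["cacheRetention"]

# Catch typos before the gateway does: only two retention types exist.
assert retention["type"] in ("ephemeral", "persistent")
print(retention)  # {'type': 'ephemeral', 'ttlMinutes': 5}
```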
Two Cache Strategies (Pick One)
Ephemeral (Default – Good for Most People)
Cache lives for 5 minutes. Perfect for conversations where people are actively talking to your bot.
```json
"cacheRetention": {
  "type": "ephemeral",
  "ttlMinutes": 5
}
```
Persistent (Good for Batch Work)
Cache stays around until you explicitly clear it. Use this when you’re processing lots of similar requests and want maximum reuse.
```json
"cacheRetention": {
  "type": "persistent"
}
```
If you’re running a batch job processing 1000 documents with the same instructions, persistent caching is your friend.
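The reason batch jobs benefit so much: a cache can only match an unchanged prefix, so every request needs to start with byte-identical instructions. A minimal sketch of that prompt construction (the instruction text and function names are illustrative, not OpenClaw APIs):

```python
# Persistent caching pays off when every request shares the same
# byte-identical prefix: the instructions are cached once, then
# reused across all 1000 documents.
INSTRUCTIONS = (
    "You are a contract analyst. Extract parties, dates, and obligations.\n"
)

def build_prompt(document: str) -> str:
    # Static prefix first, variable content last -- interleaving
    # per-document data early would break prefix matching.
    return INSTRUCTIONS + "Document:\n" + document

docs = ["doc one text", "doc two text", "doc three text"]
prompts = [build_prompt(d) for d in docs]

# Every prompt starts with the cacheable prefix.
assert all(p.startswith(INSTRUCTIONS) for p in prompts)
```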
Different Agents, Different Cache Strategies
New in 2026.2.23: you can set cache behavior per agent. Your chatbot might want ephemeral (fast, interactive). Your background research agent might want persistent (deep dives, repeated context).
```json
{
  "agents": {
    "chatbot": {
      "model": "anthropic/claude-opus-4.6",
      "params": {
        "cacheRetention": {
          "type": "ephemeral",
          "ttlMinutes": 10
        }
      }
    },
    "researcher": {
      "model": "anthropic/claude-opus-4.6",
      "params": {
        "cacheRetention": {
          "type": "persistent"
        }
      }
    }
  }
}
```
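One way to picture the precedence: an agent’s params override the provider-level default. A hedged sketch of that lookup, assuming (not verified against OpenClaw internals) that agent-level settings win when present:

```python
# Hypothetical resolution order: agent-level cacheRetention wins,
# otherwise fall back to the provider-level default.
PROVIDER_DEFAULT = {"type": "ephemeral", "ttlMinutes": 5}

AGENTS = {
    "chatbot": {"cacheRetention": {"type": "ephemeral", "ttlMinutes": 10}},
    "researcher": {"cacheRetention": {"type": "persistent"}},
    "helper": {},  # no override -- uses the provider default
}

def resolve_retention(agent: str) -> dict:
    return AGENTS[agent].get("cacheRetention", PROVIDER_DEFAULT)

print(resolve_retention("researcher"))  # {'type': 'persistent'}
print(resolve_retention("helper"))      # {'type': 'ephemeral', 'ttlMinutes': 5}
```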
Provider Support (Not Everywhere Yet)
Anthropic: Full support. Works great.
AWS Bedrock: Works for Claude models only. Nova and Mistral don’t support it.
Moonshot/Kimi: Supported. Especially good for long video analysis sessions.
OpenRouter: Depends on the upstream provider. If the model is proxied through a caching provider, it works.
Groq: Doesn’t support caching. No problem—it’s already fast.
Measuring If It’s Actually Working
Run this to see cache stats:
```
openclaw status --model anthropic/claude-opus-4.6
```
You’ll see:
- Total tokens used
- How many tokens were cached vs fresh
- Cache hit ratio (aim for >50% on long sessions)
- Cost breakdown
If your cache hit ratio is near 0%, either your context is changing too fast or your sessions are too short. That’s fine—caching helps more with longer interactions.
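The hit ratio itself is just cached input tokens over total input tokens. A quick sketch of the arithmetic (variable names are illustrative, not the exact fields from openclaw status):

```python
def cache_hit_ratio(cached_tokens: int, fresh_tokens: int) -> float:
    """Fraction of input tokens served from cache."""
    total = cached_tokens + fresh_tokens
    return cached_tokens / total if total else 0.0

# A healthy long session: most of the context came from cache.
ratio = cache_hit_ratio(cached_tokens=450_000, fresh_tokens=60_000)
print(f"{ratio:.0%}")  # 88%
```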
Real-World Numbers
Say you have 50K tokens of context and you run 10 turns with the bot. At an illustrative $3 per million input tokens, without caching:
- 50K input tokens × 10 turns = 500K tokens
- Cost: ~$1.50

With caching:
- Turn 1: 50K tokens written to the cache at full price = ~$0.15
- Turns 2-10: 50K cached reads × 9 turns at ~10% of the rate = ~$0.14
- Total: ~$0.29 (roughly an 80% reduction; cache writes usually carry a small premium over the base rate, ignored here for simplicity)

That’s not life-changing in absolute terms for 10 turns. But scale it to 100 turns with a 100K context window and now you’re looking at real savings.
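To run this kind of turn-by-turn math yourself, here’s a minimal sketch (the $3 per million price and the flat 10% cached-read rate are illustrative assumptions, not current provider pricing):

```python
# Worked example: 50K-token context, 10 turns, illustrative pricing.
PRICE = 3 / 1_000_000  # hypothetical $3 per million input tokens
CACHED_RATE = 0.10     # cached reads at ~10% of the full price
ctx, turns = 50_000, 10

# Without caching: every turn resends the full context.
without = ctx * turns * PRICE                                # ~$1.50

# With caching: turn 1 pays full price, turns 2..N pay the cached rate.
with_cache = ctx * PRICE + ctx * (turns - 1) * PRICE * CACHED_RATE

savings = 1 - with_cache / without
print(f"savings: {savings:.0%}")  # savings: 81%
```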
One Thing To Watch (Bootstrap Files)
OpenClaw caches your SOUL.md, AGENTS.md, and MEMORY.md files. If you update them mid-session, existing sessions keep using the old cached version. New sessions get the fresh version.
If you want to force a refresh:
```
openclaw sessions reset --session my-session
```
This clears the bootstrap cache and picks up your updated files on the next message.
Best Practices (Actually Useful Stuff)
- Enable it by default. There’s no real downside. Minimal overhead, real savings.
- Use ephemeral for interactive bots. 5-30 minute TTL is solid for user-facing stuff.
- Use persistent for batch work. Processing 1000 docs? Cache everything and watch the costs drop.
- Monitor your hit ratio. If it’s below 10%, your context is changing too frequently. Rethink your strategy.
- Test with caching off occasionally. Make sure caching isn’t hiding bugs or weird behavior.
Troubleshooting (Common Issues)
My cache hit ratio is 0%. What’s wrong?
Either the provider doesn’t support caching, cache is disabled in config, or your sessions are too short. Check `openclaw config get | grep cache`.
I changed my SOUL.md but the bot still uses the old version.
That’s the bootstrap file cache. Run `openclaw sessions reset` to clear it for that session.
Bedrock isn’t caching (but Anthropic is).
Make sure you’re using Claude models on Bedrock. Other models don’t support caching.
What’s Next?
- Read about session pruning for extremely long contexts
- Check model failover to combine caching with fallback chains
- Join the Discord to swap caching war stories with other folks running long-context agents