Prompt Caching in OpenClaw: Why Your Long Sessions Should Use This
Prompt caching is one of those features that sounds boring but actually saves you real money. We’re talking up to a 90% reduction in input token costs for long sessions. Here’s what it is, how to use it, and why you should care.
What Even Is Prompt Caching?
Your model has a context window—say, 128K tokens. In a long session, most of that context stays the same. Your instructions, system prompt, conversation history. Only the latest message changes.
Prompt caching lets the model cache that static prefix and reuse it. Instead of charging you full price for the entire prefix on every request, it charges ~10% of the normal input rate for the cached tokens. Multiply that across dozens of requests and you see real savings.
How OpenClaw Handles It (You Don’t Have To Do Much)
OpenClaw manages caching automatically. Just configure it once:
```json
{
  "providers": {
    "anthropic": {
      "cacheRetention": {
        "type": "ephemeral",
        "ttlMinutes": 5
      }
    }
  }
}
```
That’s it. The gateway handles the rest. Cache gets created, reused, tracked—you don’t think about it.
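If you want to sanity-check the shape of that config outside OpenClaw, here’s a minimal sketch using Python’s standard `json` module (the validation logic is illustrative, not something OpenClaw runs):

```python
import json

# The snippet from above, as it would appear in your OpenClaw config.
raw = """
{
  "providers": {
    "anthropic": {
      "cacheRetention": {"type": "ephemeral", "ttlMinutes": 5}
    }
  }
}
"""

cfg = json.loads(raw)
retention = cfg["providers"]["anthropic"]["cacheRetention"]

# Catch typos before the gateway does: only two retention types exist.
assert retention["type"] in ("ephemeral", "persistent")
print(retention)  # {'type': 'ephemeral', 'ttlMinutes': 5}
```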
Two Cache Strategies (Pick One)
Ephemeral (Default – Good for Most People)
Cache lives for 5 minutes. Perfect for conversations where people are actively talking to your bot.
```json
"cacheRetention": {
  "type": "ephemeral",
  "ttlMinutes": 5
}
```
Persistent (Good for Batch Work)
Cache stays around until you explicitly clear it. Use this when you’re processing lots of similar requests and want maximum reuse.
```json
"cacheRetention": {
  "type": "persistent"
}
```
If you’re running a batch job processing 1000 documents with the same instructions, persistent caching is your friend.
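The reason batch jobs benefit so much: a cache can only match an unchanged prefix, so every request needs to start with byte-identical instructions. A minimal sketch of that prompt construction (the instruction text and function names are illustrative, not OpenClaw APIs):

```python
# Persistent caching pays off when every request shares the same
# byte-identical prefix: the instructions are cached once, then
# reused across all 1000 documents.
INSTRUCTIONS = (
    "You are a contract analyst. Extract parties, dates, and obligations.\n"
)

def build_prompt(document: str) -> str:
    # Static prefix first, variable content last -- interleaving
    # per-document data early would break prefix matching.
    return INSTRUCTIONS + "Document:\n" + document

docs = ["doc one text", "doc two text", "doc three text"]
prompts = [build_prompt(d) for d in docs]

# Every prompt starts with the cacheable prefix.
assert all(p.startswith(INSTRUCTIONS) for p in prompts)
```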
Different Agents, Different Cache Strategies
New in 2026.2.23: you can set cache behavior per agent. Your chatbot might want ephemeral (fast, interactive). Your background research agent might want persistent (deep dives, repeated context).
```json
{
  "agents": {
    "chatbot": {
      "model": "anthropic/claude-opus-4.6",
      "params": {
        "cacheRetention": {
          "type": "ephemeral",
          "ttlMinutes": 10
        }
      }
    },
    "researcher": {
      "model": "anthropic/claude-opus-4.6",
      "params": {
        "cacheRetention": {
          "type": "persistent"
        }
      }
    }
  }
}
```
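One way to picture the precedence: an agent’s params override the provider-level default. A hedged sketch of that lookup, assuming (not verified against OpenClaw internals) that agent-level settings win when present:

```python
# Hypothetical resolution order: agent-level cacheRetention wins,
# otherwise fall back to the provider-level default.
PROVIDER_DEFAULT = {"type": "ephemeral", "ttlMinutes": 5}

AGENTS = {
    "chatbot": {"cacheRetention": {"type": "ephemeral", "ttlMinutes": 10}},
    "researcher": {"cacheRetention": {"type": "persistent"}},
    "helper": {},  # no override -- uses the provider default
}

def resolve_retention(agent: str) -> dict:
    return AGENTS[agent].get("cacheRetention", PROVIDER_DEFAULT)

print(resolve_retention("researcher"))  # {'type': 'persistent'}
print(resolve_retention("helper"))      # {'type': 'ephemeral', 'ttlMinutes': 5}
```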
Provider Support (Not Everywhere Yet)
Anthropic: Full support. Works great.
AWS Bedrock: Works for Claude models only. Nova and Mistral don’t support it.
Moonshot/Kimi: Supported. Especially good for long video analysis sessions.
OpenRouter: Depends on the upstream provider. If the model is proxied through a caching provider, it works.
Groq: Doesn’t support caching. No problem—it’s already fast.
Measuring If It’s Actually Working
Run this to see cache stats:
```
openclaw status --model anthropic/claude-opus-4.6
```
You’ll see:
- Total tokens used
- How many tokens were cached vs fresh
- Cache hit ratio (aim for >50% on long sessions)
- Cost breakdown
If your cache hit ratio is near 0%, either your context is changing too fast or your sessions are too short. That’s fine—caching helps more with longer interactions.
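The hit ratio itself is just cached input tokens over total input tokens. A quick sketch of the arithmetic (variable names are illustrative, not the exact fields from openclaw status):

```python
def cache_hit_ratio(cached_tokens: int, fresh_tokens: int) -> float:
    """Fraction of input tokens served from cache."""
    total = cached_tokens + fresh_tokens
    return cached_tokens / total if total else 0.0

# A healthy long session: most of the context came from cache.
ratio = cache_hit_ratio(cached_tokens=450_000, fresh_tokens=60_000)
print(f"{ratio:.0%}")  # 88%
```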
Real-World Numbers
Say you have 50K tokens of context and you run 10 turns with the bot. At an illustrative $3 per million input tokens, without caching:
- 50K input tokens × 10 turns = 500K tokens
- Cost: ~$1.50

With caching:
- Turn 1: 50K tokens written to the cache at full price = ~$0.15
- Turns 2-10: 50K cached reads × 9 turns at ~10% of the rate = ~$0.14
- Total: ~$0.29 (roughly an 80% reduction; cache writes usually carry a small premium over the base rate, ignored here for simplicity)

That’s not life-changing in absolute terms for 10 turns. But scale it to 100 turns with a 100K context window and now you’re looking at real savings.
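To run this kind of turn-by-turn math yourself, here’s a minimal sketch (the $3 per million price and the flat 10% cached-read rate are illustrative assumptions, not current provider pricing):

```python
# Worked example: 50K-token context, 10 turns, illustrative pricing.
PRICE = 3 / 1_000_000  # hypothetical $3 per million input tokens
CACHED_RATE = 0.10     # cached reads at ~10% of the full price
ctx, turns = 50_000, 10

# Without caching: every turn resends the full context.
without = ctx * turns * PRICE                                # ~$1.50

# With caching: turn 1 pays full price, turns 2..N pay the cached rate.
with_cache = ctx * PRICE + ctx * (turns - 1) * PRICE * CACHED_RATE

savings = 1 - with_cache / without
print(f"savings: {savings:.0%}")  # savings: 81%
```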
One Thing To Watch (Bootstrap Files)
OpenClaw caches your SOUL.md, AGENTS.md, and MEMORY.md files. If you update them mid-session, existing sessions keep using the old cached version. New sessions get the fresh version.
If you want to force a refresh:
```
openclaw sessions reset --session my-session
```
This clears the bootstrap cache and picks up your updated files on the next message.
Best Practices (Actually Useful Stuff)
- Enable it by default. There’s no real downside. Minimal overhead, real savings.
- Use ephemeral for interactive bots. 5-30 minute TTL is solid for user-facing stuff.
- Use persistent for batch work. Processing 1000 docs? Cache everything and watch the costs drop.
- Monitor your hit ratio. If it’s below 10%, your context is changing too frequently. Rethink your strategy.
- Test with caching off occasionally. Make sure caching isn’t hiding bugs or weird behavior.
Troubleshooting (Common Issues)
My cache hit ratio is 0%. What’s wrong?
Either the provider doesn’t support caching, cache is disabled in config, or your sessions are too short. Check `openclaw config get | grep cache`.
I changed my SOUL.md but the bot still uses the old version.
That’s the bootstrap file cache. Run `openclaw sessions reset` to clear it for that session.
Bedrock isn’t caching (but Anthropic is).
Make sure you’re using Claude models on Bedrock. Other models don’t support caching.
What’s Next?
- Read about session pruning for extremely long contexts
- Check model failover to combine caching with fallback chains
- Join the Discord to swap caching war stories with other folks running long-context agents