loadPrompt caching behavior

Summary

Goal: Understand how loadPrompt caching works and when the SDK falls back to the local cache.
Features: Local prompt caching, network-first loading, environment/version-aware fallback behavior, configurable cache location and size.

How loadPrompt caching works

On every call, the TypeScript SDK:
  1. Calls the API over the network first (/v1/prompt for slug-based loads, /v1/prompt/{id} for id-based loads).
  2. On success, parses the response and writes it to a local cache.
  3. On failure, reads from the local cache as a fallback — but only when neither version nor environment was passed.
The cache is purely a resilience mechanism, not a request-deduplication mechanism. Every call goes to the network first, even when a cached entry exists; the local cache is read only when that API call throws.
The cache is local to the machine running the SDK. It is not a server-side cache, and it is not shared between processes via Braintrust.
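The three steps above can be sketched as a small, self-contained model. This is not the SDK's source: `loadWithFallback`, `PromptData`, `PromptCache`, and the `cacheKey` format are hypothetical names, and the fetcher is injected so the example runs without a network. The sketch shows that the network is tried on every call, that a successful response refreshes the cache, and that a pinned read rethrows instead of consulting the cache.

```typescript
// Hypothetical shapes; the real SDK types are not described on this page.
interface PromptData {
  slug: string;
  content: string;
}

interface LoadArgs {
  slug: string;
  version?: string;
  environment?: string;
}

// Minimal stand-in for the local cache (memory and/or disk).
interface PromptCache {
  get(key: string): PromptData | undefined;
  set(key: string, value: PromptData): void;
}

// Hypothetical cache key; the SDK's real key format is not documented here.
function cacheKey(args: LoadArgs): string {
  return JSON.stringify([args.slug, args.version ?? null, args.environment ?? null]);
}

async function loadWithFallback(
  args: LoadArgs,
  fetchPrompt: (args: LoadArgs) => Promise<PromptData>,
  cache: PromptCache,
): Promise<PromptData> {
  const key = cacheKey(args);
  try {
    // 1. Always try the network first, even if a cached entry exists.
    const fresh = await fetchPrompt(args);
    // 2. On success, refresh the local cache.
    cache.set(key, fresh);
    return fresh;
  } catch (err) {
    // 3. Pinned reads (version or environment passed) never fall back.
    const pinned = args.version !== undefined || args.environment !== undefined;
    if (pinned) throw err;
    const cached = cache.get(key);
    if (cached === undefined) throw err; // nothing cached: surface the original error
    return cached;
  }
}
```

Because the fetcher is a parameter, the same function can be exercised with a healthy API and a failing one to observe both paths.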

Cache layout

The cache has two layers:
Layer | Lifetime | Notes
--- | --- | ---
Memory (LRU) | Lives inside the SDK’s BraintrustState object | Wiped when the process exits. Not shared between processes.
Disk | Survives process restarts as long as the filesystem persists | One gzip-compressed JSON file per entry, named by a hash of the cache key.
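The two layers can be modeled with a short sketch using Node's built-in `node:fs`, `node:zlib`, and `node:crypto` modules. The class names, the memory-then-disk lookup order, promoting disk hits back into memory, and SHA-256 as the filename hash are illustrative assumptions, not the SDK's documented internals. The memory layer relies on `Map` iterating keys in insertion order, so the first key is always the least recently used.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { gunzipSync, gzipSync } from "node:zlib";

// Memory layer: an LRU built on Map (keys iterate in insertion order).
class MemoryLRU<V> {
  private entries = new Map<string, V>();
  constructor(private readonly maxEntries: number) {}

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    // Evict least recently used entries (the oldest insertions) past the cap.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Disk layer: one gzip-compressed JSON file per entry, named by a hash of the key.
class DiskCache<V> {
  constructor(private readonly dir: string) {
    mkdirSync(dir, { recursive: true });
  }

  private pathFor(key: string): string {
    // Assumption: SHA-256 hex digest as the filename.
    return join(this.dir, createHash("sha256").update(key).digest("hex"));
  }

  get(key: string): V | undefined {
    const file = this.pathFor(key);
    if (!existsSync(file)) return undefined;
    return JSON.parse(gunzipSync(readFileSync(file)).toString("utf8")) as V;
  }

  set(key: string, value: V): void {
    writeFileSync(this.pathFor(key), gzipSync(JSON.stringify(value)));
  }
}

// Two-layer lookup: memory first, then disk, promoting disk hits into memory.
class TwoLayerCache<V> {
  constructor(private readonly memory: MemoryLRU<V>, private readonly disk?: DiskCache<V>) {}

  get(key: string): V | undefined {
    const hit = this.memory.get(key);
    if (hit !== undefined) return hit;
    const fromDisk = this.disk?.get(key);
    if (fromDisk !== undefined) this.memory.set(key, fromDisk);
    return fromDisk;
  }

  set(key: string, value: V): void {
    this.memory.set(key, value);
    this.disk?.set(key, value);
  }
}

// Usage: a two-entry memory layer over a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), "prompt-cache-"));
const memory = new MemoryLRU<{ content: string }>(2);
const cache = new TwoLayerCache(memory, new DiskCache<{ content: string }>(dir));
cache.set("a", { content: "A" });
cache.set("b", { content: "B" });
cache.set("c", { content: "C" }); // memory now holds b and c; "a" remains on disk
```

Constructing a new `TwoLayerCache` over the same directory mimics a process restart: the memory layer starts empty, but entries are still recoverable from disk.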
Default disk location: ~/.braintrust/prompt_cache/

Configuration (environment variables):
  • BRAINTRUST_PROMPT_CACHE_DIR — directory for the disk cache. Default ~/.braintrust/prompt_cache.
  • BRAINTRUST_PROMPT_CACHE_MEMORY_MAX — maximum entries in the in-memory LRU. Default 1024.
  • BRAINTRUST_PROMPT_CACHE_DISK_MAX — maximum entries on disk before LRU eviction by mtime. Default 1048576.
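The three variables and their defaults can be resolved, and the disk cap enforced, roughly as follows. The function names are hypothetical, and two behaviors are assumptions rather than documented SDK behavior: unparsable or non-positive numbers fall back to the default, and eviction runs as a separate pass that deletes the least recently modified files until at most `diskMax` remain.

```typescript
import { homedir, tmpdir } from "node:os";
import { join } from "node:path";
import { mkdtempSync, readdirSync, statSync, unlinkSync, utimesSync, writeFileSync } from "node:fs";

interface PromptCacheConfig {
  dir: string;
  memoryMax: number;
  diskMax: number;
}

// Reads the documented environment variables, falling back to the documented defaults.
// Assumption: unparsable or non-positive values also fall back to the default.
function resolvePromptCacheConfig(
  env: Record<string, string | undefined> = process.env,
): PromptCacheConfig {
  const positiveInt = (raw: string | undefined, fallback: number): number => {
    const n = raw === undefined ? Number.NaN : Number.parseInt(raw, 10);
    return Number.isInteger(n) && n > 0 ? n : fallback;
  };
  return {
    dir: env["BRAINTRUST_PROMPT_CACHE_DIR"] ?? join(homedir(), ".braintrust", "prompt_cache"),
    memoryMax: positiveInt(env["BRAINTRUST_PROMPT_CACHE_MEMORY_MAX"], 1024),
    diskMax: positiveInt(env["BRAINTRUST_PROMPT_CACHE_DISK_MAX"], 1048576),
  };
}

// LRU-by-mtime eviction: delete the least recently modified files until at most diskMax remain.
function evictByMtime(dir: string, diskMax: number): number {
  const files = readdirSync(dir).map((name) => {
    const path = join(dir, name);
    return { path, mtimeMs: statSync(path).mtimeMs };
  });
  if (files.length <= diskMax) return 0;
  files.sort((a, b) => a.mtimeMs - b.mtimeMs); // oldest first
  const excess = files.slice(0, files.length - diskMax);
  for (const file of excess) unlinkSync(file.path);
  return excess.length;
}

// Usage: three entries with increasing mtimes and a cap of two.
const demoDir = mkdtempSync(join(tmpdir(), "prompt-cache-evict-"));
["old", "mid", "new"].forEach((name, i) => {
  const file = join(demoDir, name);
  writeFileSync(file, "");
  utimesSync(file, 1000 + i, 1000 + i); // atime and mtime, in seconds since the epoch
});
const removed = evictByMtime(demoDir, 2); // deletes "old"
```

Using file mtime as the recency signal keeps eviction stateless: no separate index has to be persisted alongside the cache files.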
The disk layer is automatically disabled in runtimes without a usable filesystem API (browsers, Cloudflare Workers, Vercel Edge, and similar sandboxes). In those environments, caching is memory-only, so a cached entry lives at most as long as a single request handler or cold-start instance.
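The SDK's actual runtime detection is not described on this page. The sketch below only illustrates the resulting decision, using a crude, hypothetical heuristic (checking for `process.versions.node`) that can misclassify some runtimes, such as edge platforms that ship Node compatibility shims without a real filesystem. `hasUsableFilesystem` and `chooseCacheLayers` are illustrative names.

```typescript
// Hypothetical heuristic: treat runtimes exposing process.versions.node as having a filesystem.
// Browsers and most edge sandboxes do not expose it; some compatibility layers might.
function hasUsableFilesystem(): boolean {
  const proc = (globalThis as unknown as { process?: { versions?: { node?: string } } }).process;
  return typeof proc?.versions?.node === "string";
}

type CacheLayers = "memory+disk" | "memory-only";

// Chooses which layers to enable; the probe result can be injected for testing.
function chooseCacheLayers(fsAvailable: boolean = hasUsableFilesystem()): CacheLayers {
  return fsAvailable ? "memory+disk" : "memory-only";
}
```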

Why pinned reads bypass the cache

When you pass version or environment, you are asserting “give me exactly this revision.” If the API is unreachable, the SDK has no way to verify that the cached entry still corresponds to that pin (for example, the server-side mapping for an environment may have moved to a different version since the entry was cached), so it fails loudly rather than silently substituting a possibly different value.
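A small simulation makes the hazard concrete. Everything in it is hypothetical: `FakeApi`, `loadByEnvironment`, and the `allowPinnedFallback` switch are illustrative, and the switch exists only to contrast a naive fallback with the documented behavior (no cache fallback for pinned reads). It is not an SDK option.

```typescript
// Hypothetical model: the server maps an environment name to a prompt version.
interface FakeApi {
  up: boolean;
  mapping: Map<string, string>;
}

// allowPinnedFallback exists only to contrast the two policies; it is not an SDK option.
function loadByEnvironment(
  env: string,
  api: FakeApi,
  cache: Map<string, string>,
  allowPinnedFallback: boolean,
): string {
  if (api.up) {
    const version = api.mapping.get(env);
    if (version === undefined) throw new Error(`unknown environment: ${env}`);
    cache.set(env, version); // remember what the environment pointed to last time
    return version;
  }
  const cached = cache.get(env);
  if (allowPinnedFallback && cached !== undefined) return cached;
  throw new Error(`API unreachable; cannot confirm which version "${env}" points to`);
}

// Scenario: load once, then the environment moves and the API goes down.
const api: FakeApi = { up: true, mapping: new Map([["production", "v3"]]) };
const cache = new Map<string, string>();
loadByEnvironment("production", api, cache, false); // "v3", now cached locally
api.mapping.set("production", "v4"); // v4 is promoted to production
api.up = false; // then the API becomes unreachable
const naive = loadByEnvironment("production", api, cache, true); // "v3": stale, with no error
```

With the fallback allowed, the caller receives `v3` even though `production` now means `v4`, and nothing signals the mismatch. Throwing instead forces the caller to notice the outage.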