Concepts
Core ideas behind CLAIV Memory: event types, the enrichment pipeline, token budgets, and multi-tenant isolation.
How CLAIV Memory works
CLAIV Memory is a context engine, not a chatbot. You ingest events, and when your LLM needs context about a user, you call recall. CLAIV returns structured context (system_context + memory_blocks) that you inject into your LLM's prompt. Your LLM generates the final reply.
Data flow
Ingest
Your app sends events (messages, tool calls, app events) to /v1/ingest. The API returns an event_id and a deduped flag.
Enrich (async)
A background worker extracts facts, episodes, and embeddings. This takes 1–5 seconds. Recall may return empty results until enrichment completes.
Recall
Your app asks for context. CLAIV returns system_context, memory_blocks, citations, and token_estimate, all within your token budget.
Generate
You inject the returned context into your LLM's system prompt. Your LLM generates the final response to the user.
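The last two steps above (recall, then generate) come down to assembling the recall response into your LLM's system prompt. A minimal app-side sketch, assuming the response carries the documented system_context and memory_blocks fields; the build_system_prompt helper is illustrative, not part of any CLAIV SDK:

```python
def build_system_prompt(base_instructions, recall_response):
    """Assemble the LLM system prompt from a /v1/recall response.

    `recall_response` is assumed to carry the documented fields:
    system_context (str) and memory_blocks (list of str).
    """
    parts = [base_instructions, recall_response["system_context"]]
    parts.extend(recall_response["memory_blocks"])
    return "\n\n".join(parts)

# Example with a hypothetical recall payload:
recall = {
    "system_context": "The user prefers dark mode and short responses.",
    "memory_blocks": ["Recent episode: asked about pricing last week."],
    "token_estimate": 48,
}
prompt = build_system_prompt("You are a helpful assistant.", recall)
```

Your LLM call then uses `prompt` as the system message and generates the final reply.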
Event types
The type field in an ingest request must be one of three values:
message
A user or assistant message in a conversation. The most common event type.
Examples: chat messages, Q&A exchanges, instructions from the user.
{
"user_id": "user-123",
"type": "message",
"content": "I prefer dark mode and short responses"
}
tool_call
A tool or function call made by the AI assistant. Captures what tools were invoked and their results.
Examples: API calls, database queries, function invocations.
{
"user_id": "user-123",
"type": "tool_call",
"content": "Called weather_api for San Francisco, returned 72F sunny"
}
app_event
An application-level event: navigation, purchases, feature usage, or any custom event your app emits.
Examples: page views, button clicks, purchases, settings changes.
{
"user_id": "user-123",
"type": "app_event",
"content": "User upgraded to Pro plan"
}
Async enrichment
After you call /v1/ingest, the API responds immediately with event_id. A background worker then enriches the event:
- Extracts facts and episodes from the content
- Generates embeddings for semantic search
- Updates the user's memory graph
- Deduplicates against existing data
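Because enrichment is asynchronous, a test harness or eager first-recall flow may want to wait for it. A minimal retry loop, assuming recall_fn is any callable that returns the recall response as a dict; the function name and the empty-check are illustrative, not part of the API:

```python
import time

def recall_with_retry(recall_fn, attempts=5, delay=1.0):
    """Poll until recall returns non-empty memory_blocks or attempts run out."""
    response = {}
    for i in range(attempts):
        response = recall_fn()
        if response.get("memory_blocks"):
            return response
        if i < attempts - 1:
            time.sleep(delay)  # enrichment typically takes 1-5 seconds
    return response
```

In most production flows this polling is unnecessary, since the next user interaction happens well after enrichment completes.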
If you call /v1/recall immediately after ingest, the newly ingested event may not appear yet. In most production flows this is not a problem, since recall happens on the next user interaction.
Token budgets
Every recall request includes a token_budget (range: 200–8000). CLAIV selects and ranks the most relevant memories, then trims and packs them to fit within your budget.
The response includes a token_estimate field, which tells you approximately how many tokens the returned context will consume. This helps you manage your LLM's context window.
How budgets affect recall
Lower budgets (200–500): Only the most critical facts. Best for constrained models.
Medium budgets (500–2000): Good balance of facts and recent episodes.
Higher budgets (2000–8000): Rich context with detailed memory blocks. Best for complex tasks.
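The trim-and-pack behaviour described above can be pictured as a greedy pass over ranked memories. This is an illustrative sketch of the idea, not CLAIV's actual selection algorithm, and the rough 4-characters-per-token estimate is an assumption:

```python
def pack_memories(ranked_memories, token_budget):
    """Greedily keep the highest-ranked memories that fit the budget.

    `ranked_memories` is assumed to be ordered most-relevant first.
    Token counts are approximated as len(text) // 4 (an assumption,
    not CLAIV's tokenizer).
    """
    packed, used = [], 0
    for text in ranked_memories:
        cost = max(1, len(text) // 4)
        if used + cost > token_budget:
            continue  # skip memories that would overflow the budget
        packed.append(text)
        used += cost
    return packed, used  # `used` plays the role of token_estimate
```

With a low budget only the top-ranked items survive, which is why lower budgets return only the most critical facts.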
Tenant isolation
Each project in your console maps to a unique tenant_id. All data is strictly isolated per tenant:
- API keys are scoped to a single tenant
- Events, memories, and embeddings are tenant-isolated
- Recall only returns data for the tenant associated with the API key
- Forget only deletes data within the tenant scope
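The isolation rules above can be modelled in a few lines. This is a conceptual in-memory sketch of tenant scoping, not CLAIV's storage layer:

```python
class TenantStore:
    """Toy model: every read and write is keyed by tenant_id."""

    def __init__(self):
        self._events = {}  # tenant_id -> list of events

    def ingest(self, tenant_id, event):
        self._events.setdefault(tenant_id, []).append(event)

    def recall(self, tenant_id, user_id):
        # Only events from the caller's tenant are ever visible;
        # the same user_id in another tenant is a different namespace.
        return [e for e in self._events.get(tenant_id, [])
                if e["user_id"] == user_id]
```

Because the API key resolves to exactly one tenant_id, there is no request shape that can read across the tenant boundary.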
Forget and compliance
The /v1/forget endpoint deletes derived data (extracted facts, episodes, embeddings, summaries) for a given user_id. The original ingested events are preserved for audit purposes.
Every forget call returns a receipt_id and deleted_counts, providing cryptographic proof of deletion for GDPR/CCPA compliance.
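Conceptually, forget removes derived artifacts while leaving raw events intact. A toy sketch of that contract; the uuid-based receipt here is illustrative, since this section does not specify the real receipt format:

```python
import uuid

def forget(derived_store, raw_events, user_id):
    """Delete a user's derived data, keep raw events, return a receipt.

    `derived_store` maps each kind of derived data to a per-user dict,
    e.g. {"facts": {"user-123": [...]}} (an illustrative layout).
    """
    deleted_counts = {}
    for kind in ("facts", "episodes", "embeddings", "summaries"):
        items = derived_store.setdefault(kind, {})
        deleted_counts[kind] = len(items.pop(user_id, []))
    # Raw ingested events are preserved for audit purposes.
    return {"receipt_id": str(uuid.uuid4()), "deleted_counts": deleted_counts}
```

The per-kind counts let a compliance process verify exactly what was removed for a given user.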