Concepts
Core ideas behind CLAIV Memory: event types, the enrichment pipeline, token budgets, and multi-tenant isolation.
How CLAIV Memory works
CLAIV Memory is a context engine, not a chatbot. You ingest events, and when your LLM needs context about a user, you call recall. CLAIV returns structured context (system_context + memory_blocks) that you inject into your LLM's prompt. Your LLM generates the final reply.
Data flow
Ingest
Your app sends events (messages, tool calls, app events) to /v1/ingest. The API returns an event_id and a deduped flag.
Enrich (async)
A background worker extracts facts, episodes, and embeddings. This takes 1–5 seconds. Recall may return empty results until enrichment completes.
Recall
Your app asks for context. CLAIV returns system_context, memory_blocks, citations, and token_estimate, all within your token budget.
Generate
You inject the returned context into your LLM's system prompt. Your LLM generates the final response to the user.
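The last two steps above (recall, then generate) come down to assembling the recall response into your LLM's system prompt. A minimal app-side sketch, assuming the response carries the documented system_context and memory_blocks fields; the build_system_prompt helper is illustrative, not part of any CLAIV SDK:

```python
def build_system_prompt(base_instructions, recall_response):
    """Assemble the LLM system prompt from a /v1/recall response.

    `recall_response` is assumed to carry the documented fields:
    system_context (str) and memory_blocks (list of str).
    """
    parts = [base_instructions, recall_response["system_context"]]
    parts.extend(recall_response["memory_blocks"])
    return "\n\n".join(parts)

# Example with a hypothetical recall payload:
recall = {
    "system_context": "The user prefers dark mode and short responses.",
    "memory_blocks": ["Recent episode: asked about pricing last week."],
    "token_estimate": 48,
}
prompt = build_system_prompt("You are a helpful assistant.", recall)
```

Your LLM call then uses `prompt` as the system message and generates the final reply.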
Event types
The type field in an ingest request must be one of three values:
message
A user or assistant message in a conversation. The most common event type.
Examples: chat messages, Q&A exchanges, instructions from the user.
{
"user_id": "user-123",
"type": "message",
"content": "I prefer dark mode and short responses"
}
tool_call
A tool or function call made by the AI assistant. Captures what tools were invoked and their results.
Examples: API calls, database queries, function invocations.
{
"user_id": "user-123",
"type": "tool_call",
"content": "Called weather_api for San Francisco, returned 72F sunny"
}
app_event
An application-level event: navigation, purchases, feature usage, or any custom event your app emits.
Examples: page views, button clicks, purchases, settings changes.
{
"user_id": "user-123",
"type": "app_event",
"content": "User upgraded to Pro plan"
}
Async enrichment
After you call /v1/ingest, the API responds immediately with event_id. A background worker then enriches the event:
- Extracts facts and episodes from the content
- Generates embeddings for semantic search
- Updates the user's memory graph
- Deduplicates against existing data
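Because enrichment is asynchronous, a test harness or eager first-recall flow may want to wait for it. A minimal retry loop, assuming recall_fn is any callable that returns the recall response as a dict; the function name and the empty-check are illustrative, not part of the API:

```python
import time

def recall_with_retry(recall_fn, attempts=5, delay=1.0):
    """Poll until recall returns non-empty memory_blocks or attempts run out."""
    response = {}
    for i in range(attempts):
        response = recall_fn()
        if response.get("memory_blocks"):
            return response
        if i < attempts - 1:
            time.sleep(delay)  # enrichment typically takes 1-5 seconds
    return response
```

In most production flows this polling is unnecessary, since the next user interaction happens well after enrichment completes.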
If you call /v1/recall immediately after ingest, the newly ingested event may not appear yet. In most production flows this is not a problem, since recall happens on the next user interaction.
Token budgets
Every recall request includes a token_budget (range: 200–8000). CLAIV selects and ranks the most relevant memories, then trims and packs them to fit within your budget.
The response includes a token_estimate field, which tells you approximately how many tokens the returned context will consume. This helps you manage your LLM's context window.
How budgets affect recall
Lower budgets (200–500): Only the most critical facts. Best for constrained models.
Medium budgets (500–2000): Good balance of facts and recent episodes.
Higher budgets (2000–8000): Rich context with detailed memory blocks. Best for complex tasks.
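The trim-and-pack behaviour described above can be pictured as a greedy pass over ranked memories. This is an illustrative sketch of the idea, not CLAIV's actual selection algorithm, and the rough 4-characters-per-token estimate is an assumption:

```python
def pack_memories(ranked_memories, token_budget):
    """Greedily keep the highest-ranked memories that fit the budget.

    `ranked_memories` is assumed to be ordered most-relevant first.
    Token counts are approximated as len(text) // 4 (an assumption,
    not CLAIV's tokenizer).
    """
    packed, used = [], 0
    for text in ranked_memories:
        cost = max(1, len(text) // 4)
        if used + cost > token_budget:
            continue  # skip memories that would overflow the budget
        packed.append(text)
        used += cost
    return packed, used  # `used` plays the role of token_estimate
```

With a low budget only the top-ranked items survive, which is why lower budgets return only the most critical facts.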
Tenant isolation
Each project in your console maps to a unique tenant_id. All data is strictly isolated per tenant:
- API keys are scoped to a single tenant
- Events, memories, and embeddings are tenant-isolated
- Recall only returns data for the tenant associated with the API key
- Forget only deletes data within the tenant scope
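The isolation rules above can be modelled in a few lines. This is a conceptual in-memory sketch of tenant scoping, not CLAIV's storage layer:

```python
class TenantStore:
    """Toy model: every read and write is keyed by tenant_id."""

    def __init__(self):
        self._events = {}  # tenant_id -> list of events

    def ingest(self, tenant_id, event):
        self._events.setdefault(tenant_id, []).append(event)

    def recall(self, tenant_id, user_id):
        # Only events from the caller's tenant are ever visible;
        # the same user_id in another tenant is a different namespace.
        return [e for e in self._events.get(tenant_id, [])
                if e["user_id"] == user_id]
```

Because the API key resolves to exactly one tenant_id, there is no request shape that can read across the tenant boundary.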
Forget and compliance
The /v1/forget endpoint deletes derived data (extracted facts, episodes, embeddings, summaries) for a given user_id. The original ingested events are preserved for audit purposes.
Every forget call returns a receipt_id and deleted_counts, providing cryptographic proof of deletion for GDPR/CCPA compliance.
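Conceptually, forget removes derived artifacts while leaving raw events intact. A toy sketch of that contract; the uuid-based receipt here is illustrative, since this section does not specify the real receipt format:

```python
import uuid

def forget(derived_store, raw_events, user_id):
    """Delete a user's derived data, keep raw events, return a receipt.

    `derived_store` maps each kind of derived data to a per-user dict,
    e.g. {"facts": {"user-123": [...]}} (an illustrative layout).
    """
    deleted_counts = {}
    for kind in ("facts", "episodes", "embeddings", "summaries"):
        items = derived_store.setdefault(kind, {})
        deleted_counts[kind] = len(items.pop(user_id, []))
    # Raw ingested events are preserved for audit purposes.
    return {"receipt_id": str(uuid.uuid4()), "deleted_counts": deleted_counts}
```

The per-kind counts let a compliance process verify exactly what was removed for a given user.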