Introducing CLAIV Memory V6
A 5-step enrichment pipeline, predicate routing, synthesized recall with llm_context.text, and GDPR-compliant deletion with structured receipts.
Today we are shipping CLAIV Memory V6. This release hardens every layer of the architecture: extraction is now a deterministic 5-step pipeline, recall returns a richer structured response with pre-synthesized context ready to inject into your system prompt, and the Forget endpoint returns a structured receipt documenting exactly what was removed. V6 is the version we have been building towards since V1 — reliable enough to run in production without surprises, auditable enough to satisfy compliance, and simple enough to integrate in an afternoon.
The 5-Step Enrichment Pipeline
Every call to POST /v6/ingest triggers an asynchronous enrichment pipeline that runs in the background. The five steps are:
- Extract — The LLM reads the raw conversation turn and produces candidate facts as subject-relation-object triples, each with a verbatim source_text quote and character-level offsets pointing back to the original message.
- Map — Candidate facts are matched against the existing memory graph. The system identifies which facts are new, which update existing facts, and which contradict facts already on record. Semantic deduplication happens here — “loves Python” and “prefers Python over everything” resolve to the same predicate.
- Gate — A relevance and quality gate filters out low-signal candidates: passing mentions, obvious filler, and statements that do not carry persistent meaning. Only facts that are likely to remain relevant across future sessions pass through. This keeps the memory graph clean rather than accumulating noise.
- Embed — Surviving facts are embedded into the vector space used for semantic recall. The embedding is computed from the structured triple, not the raw chunk, which ensures recall precision is anchored to the extracted meaning rather than surrounding conversational context.
- Tier — Each fact is assigned a memory tier — hot, warm, or cold — based on recency, access frequency, and a computed importance score. Hot facts are always included in recall. Warm facts surface via vector and predicate search. Cold facts are archived superseded values, retained for temporal queries and compliance.
The pipeline runs asynchronously. Ingest returns immediately with an acknowledgement; enrichment completes in the background. This means ingest latency is bounded by the HTTP round-trip, not by LLM inference time.
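To make the Tier step concrete, here is an illustrative sketch of how a recency/frequency/importance heuristic might assign hot, warm, and cold tiers. CLAIV computes tiers server-side; the signal names, weights, and thresholds below are invented for illustration and are not the production scoring function.

```typescript
type Tier = "hot" | "warm" | "cold";

// Hypothetical per-fact signals; the real service tracks its own internals.
interface FactSignals {
  daysSinceLastAccess: number; // recency
  accessCount: number;         // access frequency
  importance: number;          // computed importance score in [0, 1]
  superseded: boolean;         // has a newer value replaced this fact?
}

function assignTier(s: FactSignals): Tier {
  // Superseded values are archived cold, retained for temporal queries.
  if (s.superseded) return "cold";
  // Blend importance, capped access frequency, and a recency penalty.
  const score =
    s.importance + Math.min(s.accessCount, 10) / 10 - s.daysSinceLastAccess / 30;
  if (score >= 1.0) return "hot";
  if (score >= 0.2) return "warm";
  return "cold";
}
```

A fact accessed yesterday twenty times with high importance lands hot; a months-old, rarely touched fact decays to cold.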
Predicate Routing in Recall
V6 recall runs four search channels in parallel and merges the results before synthesis:
- Predicate match — Direct lookup of facts whose subject-relation pair matches the parsed query intent. If you ask “What programming languages does this user prefer?”, the predicate router resolves user → prefers → * and fetches all matching facts without touching vector search.
- Vector search — Embedding-based search over the fact store for queries where the predicate cannot be resolved directly. Surfaces semantically related facts ranked by similarity.
- Temporal search — Traverses supersession chains to resolve questions about past state. “Where did they live before?” follows temporal edges rather than searching by embedding.
- Keyword search — Lexical fallback for proper nouns, product names, and other terms that embedding models may not represent well.
The routing object in every recall response tells you which channels fired and how many results each contributed. This makes recall fully debuggable: you can see exactly how the system found what it returned.
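As a sketch of how you might consume that routing object in your own debug logging, the shape below mirrors the routing block in the example recall response; the describeRouting helper is our own illustration, not part of a CLAIV SDK.

```typescript
// Mirrors the "routing" block of a V6 recall response.
interface RecallRouting {
  channels_used: string[];
  predicate_hits?: number;
  vector_hits?: number;
  temporal_hits?: number;
  keyword_hits?: number;
}

// Build a one-line summary of which channels fired and their hit counts.
function describeRouting(r: RecallRouting): string {
  const hits: Record<string, number | undefined> = {
    predicate: r.predicate_hits,
    vector: r.vector_hits,
    temporal: r.temporal_hits,
    keyword: r.keyword_hits,
  };
  const parts = r.channels_used.map((c) => `${c}=${hits[c] ?? 0}`);
  return `channels: ${parts.join(", ")}`;
}
```

Logging this line per recall makes it easy to spot queries that fall through to the keyword fallback or never hit the predicate router.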
The V6 Recall Response
The recall response in V6 is structured into distinct layers, each serving a different purpose in your application:
// POST /v6/recall
{
"answer_facts": [ // Direct answers to the query, highest relevance
{
"subject": "user",
"relation": "prefers",
"object": "TypeScript",
"source_text": "I use TypeScript for everything",
"tier": "hot",
"confidence": 0.97
}
],
"supporting_facts": [], // Contextually related facts that didn't directly answer
"background_context": [], // Hot-tier facts always included regardless of query
"routing": { // How facts were found
"channels_used": ["predicate", "vector"],
"predicate_hits": 3,
"vector_hits": 2
},
"llm_context": { // Inject this directly into your system prompt
"text": "The user prefers TypeScript and works primarily in Next.js. They are building a SaaS product targeting enterprise customers."
}
}

The key field is llm_context.text. This is a pre-synthesized narrative generated from the recalled facts, ready to inject verbatim into your system prompt. You do not need to format the facts yourself. Most integrations look like this:
// Recall user memory before each conversation turn
const memory = await fetch('/v6/recall', {
method: 'POST',
headers: { Authorization: 'Bearer YOUR_KEY' },
body: JSON.stringify({
user_id: 'user-123',
conversation_id: 'conv-456',
query: 'What is this user working on?'
})
}).then(r => r.json());
// Inject into system prompt — that's it
const systemPrompt = `You are a helpful assistant.
User context:
${memory.llm_context.text}`;

GDPR Forget with Structured Receipts
The V6 Forget endpoint returns a structured receipt that documents exactly what was removed. No vague confirmation message — a precise breakdown of every deleted record type:
// POST /v6/forget
{
"receipt_id": "rcpt_abc123",
"deleted_at": "2026-03-04T10:22:14Z",
"deleted_counts": {
"events": 47,
"facts": 31,
"episodes": 8,
"chunks": 94,
"claims": 12,
"open_loops": 2
}
}

The receipt_id is stored immutably and can be retrieved later to prove compliance. Each count maps to a specific record type in the CLAIV data model: events are the raw conversation turns, facts are the extracted triples, episodes are conversation groupings, chunks are the embedding units, claims are extracted propositions, and open loops are unresolved inference chains. When a user exercises their right to be forgotten, you can hand a regulator a receipt that shows precisely what was deleted.
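For audit trails, it is handy to reduce a receipt to a single log line. The ForgetReceipt shape below follows the example response; the two helpers are illustrative, not part of a CLAIV SDK.

```typescript
// Mirrors the POST /v6/forget response shape shown above.
interface ForgetReceipt {
  receipt_id: string;
  deleted_at: string;
  deleted_counts: Record<string, number>;
}

// Sum every per-type deletion count in the receipt.
function totalDeleted(r: ForgetReceipt): number {
  return Object.values(r.deleted_counts).reduce((a, b) => a + b, 0);
}

// One-line summary suitable for an append-only audit log.
function auditLine(r: ForgetReceipt): string {
  return `${r.receipt_id}: ${totalDeleted(r)} records deleted at ${r.deleted_at}`;
}
```

Applied to the example receipt above, the 47 events, 31 facts, 8 episodes, 94 chunks, 12 claims, and 2 open loops sum to 194 deleted records.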
What Changed from V5
V5 introduced the core concepts: two-phase extraction, tiered memory, and evidence-backed facts. V6 builds on that foundation with several concrete changes:
- Extraction is now 5 steps, not 2. The Map and Gate steps are new. Map eliminates duplicate resolution bugs. Gate keeps the memory graph clean without developer intervention.
- conversation_id is optional on recall. Omit it for cross-chat memory (all known facts about the user). Include it for conversation-scoped memory, where recent turns in that conversation are weighted higher in results. Both modes return llm_context.text ready to inject.
- No token_budget parameter. V5 exposed a token_budget field on recall. In V6, context fitting is handled internally based on your plan tier. The response is always sized for efficient injection.
- No feedback endpoint. The V5 feedback loop endpoint has been removed. The 5-step pipeline eliminates the need for explicit feedback to drive extraction quality.
- Recall response is richer. answer_facts, supporting_facts, background_context, routing, and llm_context replace the flat V5 response.
- Forget returns structured counts. V5 returned a generic confirmation. V6 returns a receipt with per-type deletion counts.
Getting Started with V6
V6 is available now on all plans. If you are migrating from V5, the main change is updating your response handling to read from llm_context.text instead of synthesizing prompt context from raw facts yourself. You can optionally pass conversation_id on recall calls to enable conversation-scoped weighting — or omit it entirely for cross-chat memory across all of a user's conversations. Everything else is additive.
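If your V5 integration formatted facts by hand, the migration can be as small as swapping that formatting for a read of the pre-synthesized field. A minimal sketch, assuming only the V6 response shape documented above (the helper names are our own):

```typescript
// Partial view of a V6 recall response; only the field we need here.
interface V6Recall {
  llm_context?: { text?: string };
}

// V6: read the pre-synthesized narrative instead of formatting facts.
// The empty-string fallback is purely defensive.
function contextForPrompt(recall: V6Recall): string {
  return recall.llm_context?.text ?? "";
}

function buildSystemPrompt(recall: V6Recall): string {
  return `You are a helpful assistant.\nUser context:\n${contextForPrompt(recall)}`;
}
```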
New integrations should start from the V6 quickstart guide. The interactive playground is fully updated to V6 and shows live ingest and recall responses you can explore before writing a single line of code.