# Memra — Full API Reference for LLMs

> Persistent memory API for AI agents. EU-native (Helsinki), privacy-first, deterministic recall (no LLM in the hot path, p50 ≈ 40ms). This file is the complete, copy-pasteable reference. Version 4.5.0.

Base URL: https://usememra.com/api/v1
Auth: `Authorization: Bearer memra_live_...` (create keys in the dashboard; scope `full` or `read`)
Content type: application/json
Idempotency: send `Idempotency-Key: <any-unique-string>` on POST/PATCH/DELETE to make retries safe (24h dedup window).
Errors: JSON `{"error": {"code": "...", "message": "..."}}` — messages are actionable and say what to do next.
Rate limits: per-tier requests per minute (30–600 RPM). 429 responses carry a Retry-After header.
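The conventions above can be wrapped in a few client helpers: Bearer auth, JSON bodies, idempotency keys, the error envelope, and Retry-After on 429. This is a minimal standard-library sketch. The function names are hypothetical. Two behaviours are assumptions rather than documented facts: reading Retry-After as a number of seconds, and retrying 5xx responses.

```python
from __future__ import annotations

import json
import uuid


def build_headers(api_key: str, method: str, idempotency_key: str | None = None) -> dict[str, str]:
    """Headers for a Memra API call; POST/PATCH/DELETE also get an Idempotency-Key."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if method.upper() in {"POST", "PATCH", "DELETE"}:
        # Generate the key once per logical operation and pass the same key on every retry;
        # a fresh key per retry would defeat the 24h dedup window.
        headers["Idempotency-Key"] = idempotency_key or str(uuid.uuid4())
    return headers


def parse_error(body: bytes) -> tuple[str, str]:
    """Return (code, message) from the documented {"error": {...}} envelope."""
    err = json.loads(body).get("error", {})
    return err.get("code", "unknown"), err.get("message", "")


def retry_delay(status: int, retry_after: str | None, default: float = 1.0) -> float | None:
    """Seconds to wait before retrying, or None when the error should not be retried."""
    if status == 429:
        # Assumption: Retry-After carries delay-seconds, not an HTTP-date.
        try:
            return float(retry_after) if retry_after is not None else default
        except ValueError:
            return default
    if status >= 500:
        return default  # assumption: transient server errors are worth another attempt
    return None
```

Client errors such as 403 `read_only_key` return None, because retrying them cannot succeed.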

## Core concepts

- **Memory**: one fact/decision/pattern/preference/event with content, type, importance (1–10), tags, metadata. Types: fact, event, pattern, working (auto-expires), decision, preference, context, entity, summary, reference.
- **Namespace (tenant_id)**: isolation unit inside a project. Use one namespace per end-user or per agent domain. Namespaces nest: with namespace_recursive=true, a query on "acme" also matches "acme/team1".
- **Project**: top-level container with its own PII masking policy (off by default; 'per_memory' or 'always' opt-in).
- **Supersede**: replace an outdated memory. Old one retires (status=superseded, drops from recall), new one links back; derived facts retire with the parent. Full chain via /memories/{id}/chain.
- **Revision**: monotonic write token. Every write response includes `revision`; recall with `wait_for_revision` guarantees read-your-writes.
- **Trust states**: proposed (default), verified (confirmed by a human or a verified source), disputed, superseded, expired. Filter recall with trust_level.
- **Staleness**: every recall result carries staleness_score (0=fresh, 100=critical), staleness_status (fresh|aging|stale|critical), last_confirmed (ISO date). Decisions never decay.
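The namespace_recursive semantics can be made concrete with a small model of how a recursive filter might match tenant_id values. Only the "acme" → "acme/team1" case is documented. Matching on whole "/"-separated segments (so "acme" does not match "acme2") is an assumption, and `namespace_matches` is a hypothetical helper, not Memra's implementation.

```python
def namespace_matches(filter_ns: str, memory_ns: str, recursive: bool = False) -> bool:
    """Hypothetical model of tenant_id matching; not Memra's actual implementation."""
    if filter_ns == memory_ns:
        return True
    if not recursive:
        return False
    # Assumption: recursion works on whole "/"-separated segments,
    # so "acme" covers "acme/team1" and "acme/team1/bob" but not "acme2".
    return memory_ns.startswith(filter_ns.rstrip("/") + "/")
```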

## Write

POST /memories
Body: {
  "content": "string (required, ≤10000 chars)",
  "tenant_id": "string (required)",
  "project_id": "string (required unless account has exactly one project)",
  "type": "fact|event|pattern|working|decision|preference|context|entity|summary|reference (default fact)",
  "importance": 1-10 (default 5),
  "tags": ["string"],
  "metadata": {},
  "mask": false,            // true = apply PII masking to this memory (project policy permitting)
  "source_type": "human|agent|import|api (optional provenance)",
  "session_id": "string (optional, groups session memories)"
}
201 → {
  "id": "mem_...", "revision": 12345, "embedding_status": "pending",
  "conflicts": [ {"memory_id": "mem_...", "preview": "text...", "confidence": 0.93} ],  // memories this new fact contradicts (NLI-scored). Review and supersede the outdated one.
  ...full memory fields incl. version, trust_state, staleness_score
}
200 → duplicate (identical content already active in this namespace) — body is the existing memory.
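A client can check a write body against the documented limits before sending it: `content` of at most 10000 characters, `importance` from 1 to 10, and the listed types. It can then read the response to tell a new memory (201) from a duplicate (200) and surface `conflicts`. This is a minimal sketch. The helper names are hypothetical, and the duplicate body is assumed to possibly lack `revision` and `conflicts`, hence the `.get` calls.

```python
from __future__ import annotations

MEMORY_TYPES = {"fact", "event", "pattern", "working", "decision",
                "preference", "context", "entity", "summary", "reference"}


def memory_body(content: str, tenant_id: str, project_id: str | None = None,
                type_: str = "fact", importance: int = 5,
                tags: list[str] | None = None, **extra: object) -> dict:
    """Validate against the documented limits, then build a POST /memories body."""
    if not content or len(content) > 10_000:
        raise ValueError("content is required and must be at most 10000 characters")
    if type_ not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {type_}")
    if not 1 <= importance <= 10:
        raise ValueError("importance must be between 1 and 10")
    body: dict = {"content": content, "tenant_id": tenant_id, "type": type_,
                  "importance": importance, "tags": tags or []}
    if project_id is not None:  # may be omitted only if the account has exactly one project
        body["project_id"] = project_id
    body.update(extra)  # e.g. metadata, mask, source_type, session_id
    return body


def handle_write(status: int, payload: dict) -> dict:
    """Summarize a write response: new vs duplicate, revision, and conflicts to review."""
    return {
        "id": payload["id"],
        "created": status == 201,  # 200 means an identical active memory already existed
        "revision": payload.get("revision"),
        "conflicts": [c["memory_id"] for c in payload.get("conflicts", [])],
    }
```

Any IDs in `conflicts` are candidates for POST /memories/{id}/supersede once you decide which fact is outdated.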

POST /memories/batch — {"memories": [ {content, tenant_id, project_id, ...} ]} (max 100). Per-item results with revision.
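Because a batch request accepts at most 100 memories, a larger import has to be split into several requests. This is a minimal sketch, and `batch_payloads` is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Iterator

MAX_BATCH = 100  # documented per-request maximum for POST /memories/batch


def batch_payloads(memories: list[dict], size: int = MAX_BATCH) -> Iterator[dict]:
    """Yield POST /memories/batch bodies holding at most `size` memories each."""
    if not 1 <= size <= MAX_BATCH:
        raise ValueError("size must be between 1 and 100")
    for start in range(0, len(memories), size):
        yield {"memories": memories[start:start + size]}
```

Send each chunk with its own Idempotency-Key, and reuse that key if you retry the chunk, so a timed-out request cannot write the same memories twice.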

POST /memories/{id}/supersede — body {content, importance?, tags?, metadata?} → replaces the memory; response includes new memory + revision. Send If-Match: <version> for optimistic locking.

PATCH /memories/{id} — partial update (content/importance/tags/metadata/trust_state). Version-checked via If-Match.
DELETE /memories/{id} — cascades: flat file, index row, caches, embeddings; logged to deletion audit. ?dry_run=1 shows blast radius.
DELETE /memories — bulk by filters.
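Optimistic locking on supersede means sending the `version` you last read in an If-Match header, so a concurrent edit is not silently overwritten. This sketch builds (but does not send) the requests with `urllib.request`. The helper names are hypothetical, and sending the version unquoted follows the `If-Match: <version>` form shown above.

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "https://usememra.com/api/v1"


def supersede_request(api_key: str, memory_id: str, version: int | str, content: str,
                      idempotency_key: str, **fields: object) -> urllib.request.Request:
    """POST /memories/{id}/supersede guarded by If-Match (send with urlopen when ready)."""
    body = {"content": content, **fields}  # importance, tags, metadata are optional
    return urllib.request.Request(
        f"{BASE_URL}/memories/{memory_id}/supersede",
        data=json.dumps(body).encode(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "If-Match": str(version),  # version from the memory being replaced
            "Idempotency-Key": idempotency_key,
        },
    )


def delete_preview_request(api_key: str, memory_id: str) -> urllib.request.Request:
    """DELETE /memories/{id}?dry_run=1: report the blast radius without deleting."""
    return urllib.request.Request(
        f"{BASE_URL}/memories/{memory_id}?dry_run=1",
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

If the version no longer matches, the server presumably rejects the write; this reference does not state the status code. Handle a mismatch by re-reading the memory with GET /memories/{id} and retrying against the fresh version.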

## Recall (the core loop)

POST /memories/recall
Body: {
  "query": "string (required)",
  "tenant_id": "string (required)", "project_id": "string (required)",
  "limit": 1-50 (default 10),
  "types": ["fact", ...], "tags": ["..."], "not_tags": ["..."],
  "since": "2026-01-01", "until": "2026-07-01",       // created_at range
  "min_importance": 1-10, "min_score": 0-1,
  "trust_level": "verified|proposed|disputed|...",     // at-least-this-trust filter
  "strategy": "default|intelligent",                   // intelligent = query decomposition + entity graph + RRF (paid tiers get LLM rerank via rerank=true)
  "wait_for_revision": 12345,                          // read-your-writes: block (≤10s) until that write is searchable
  "max_tokens": 2000,                                  // token budget: best results that fit; long items fall back to compressed summaries
  "used_ids": ["mem_..."],                             // feedback: IDs from the previous recall that proved useful; those memories get a permanent ranking boost
  "namespace_recursive": false, "prefer_compressed": false, "expand_context": false
}
200 → {
  "data": [ { "id", "content", "type", "importance", "tags", "score", "similarity",
              "staleness_score", "staleness_status", "last_confirmed", "trust_state",
              "compressed_summary?", "content_is_compressed?", "created_at", "updated_at" } ],
  "meta": { "total_candidates", "returned", "token_budget?", "tokens_used?", "indexing_wait_ms?", "degraded" },
  "estimated_tokens": 123
}
Recall is hybrid: dense vector search plus lexical (BM25-style) matching, merged with reciprocal-rank fusion. Exact identifiers, error codes and names still match when semantic similarity is low. Embedding-provider outages do not cause 500 errors: recall degrades to keyword search and sets meta.degraded=true.
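A typical recall call feeds back the `revision` from the last write as `wait_for_revision`. It then checks the response for stale results and for `meta.degraded`. This minimal sketch uses only documented body fields. The helper names are hypothetical, and treating `stale` and `critical` as "re-verify" is a suggested policy, not a documented rule.

```python
from __future__ import annotations


def recall_body(query: str, tenant_id: str, project_id: str, *, limit: int = 10,
                wait_for_revision: int | None = None, max_tokens: int | None = None,
                used_ids: list[str] | None = None, **filters: object) -> dict:
    """Build a POST /memories/recall body from documented fields only."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    body: dict = {"query": query, "tenant_id": tenant_id,
                  "project_id": project_id, "limit": limit}
    if wait_for_revision is not None:
        # Revision from your last write: the server blocks (up to 10s) until it is searchable.
        body["wait_for_revision"] = wait_for_revision
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if used_ids:
        body["used_ids"] = used_ids
    body.update(filters)  # e.g. types, tags, not_tags, since, until, trust_level, strategy
    return body


def summarize_recall(payload: dict) -> tuple[list[str], list[str], bool]:
    """Split a recall response into contents, IDs worth re-verifying, and the degraded flag."""
    texts = [item["content"] for item in payload["data"]]
    # stale/critical results are candidates for re-checking and POST /memories/{id}/refresh
    stale = [item["id"] for item in payload["data"]
             if item.get("staleness_status") in ("stale", "critical")]
    degraded = bool(payload.get("meta", {}).get("degraded", False))
    return texts, stale, degraded
```

When `degraded` is true, results come from keyword matching only, so a paraphrased query may miss memories it would normally find.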

POST /memories/feedback — {"tenant_id", "project_id", "memory_ids": ["mem_...", ...]} → {"updated": N}. Marks memories as actually-used (same effect as used_ids on recall).
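Feedback permanently boosts ranking, so it is worth reporting only the memories the agent actually relied on. This sketch builds the feedback body. Its filtering policy is an assumption, not documented behaviour: it deduplicates IDs and drops any that did not come back from the last recall. `feedback_body` is a hypothetical name.

```python
from __future__ import annotations


def feedback_body(tenant_id: str, project_id: str, recalled_ids: list[str],
                  relied_on: list[str]) -> dict:
    """POST /memories/feedback body containing only IDs the agent actually used."""
    recalled = set(recalled_ids)
    # Deduplicate (order-preserving) and drop IDs that were not in the last recall.
    used = [mid for mid in dict.fromkeys(relied_on) if mid in recalled]
    return {"tenant_id": tenant_id, "project_id": project_id, "memory_ids": used}
```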

## Read

GET /memories?tenant_id=&project_id=&type=&tags[]=&limit=&offset= — metadata list (no content)
GET /memories/{id} — full memory with content
GET /memories/{id}/chain — supersession history, oldest → newest
GET /memories/{id}/health — staleness detail + recommended action
POST /memories/{id}/refresh — reset staleness (you re-verified the fact)
GET /entities?tenant_id=&project_id= — entity graph: [{name, type, is_pii, memory_count}]
GET /entities/{name}/memories?tenant_id=&project_id= — memories mentioning an entity
GET /agents/{agent_id}/bootstrap?tenant_id=... — priority context for session start (decisions + high-importance + recent); supports since_revision delta sync with tombstones
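The list endpoint takes repeated `tags[]` parameters, and entity names can contain spaces or slashes, so both URLs need careful encoding. This minimal sketch uses `urllib.parse`. The helper names are hypothetical, and encoding the whole entity name as a single path segment is an assumption about how names with "/" are addressed.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode

BASE_URL = "https://usememra.com/api/v1"


def list_url(tenant_id: str, project_id: str, *, type_: str | None = None,
             tags: list[str] | None = None, limit: int | None = None,
             offset: int | None = None) -> str:
    """URL for GET /memories, repeating tags[] once per tag."""
    params: dict = {"tenant_id": tenant_id, "project_id": project_id}
    optional = {"type": type_, "tags[]": tags, "limit": limit, "offset": offset}
    params.update({k: v for k, v in optional.items() if v is not None})
    # doseq=True expands the tags list into tags[]=a&tags[]=b (brackets percent-encoded)
    return f"{BASE_URL}/memories?{urlencode(params, doseq=True)}"


def entity_memories_url(name: str, tenant_id: str, project_id: str) -> str:
    """URL for GET /entities/{name}/memories with the name encoded as one path segment."""
    query = urlencode({"tenant_id": tenant_id, "project_id": project_id})
    return f"{BASE_URL}/entities/{quote(name, safe='')}/memories?{query}"
```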

## Projects, keys, ops

POST/GET/PATCH/DELETE /projects — projects carry masking_policy (off|per_memory|always) and intelligence_enabled
GET /projects/{id}/indexing — {pending, indexed, failed} embedding counts
GET /usage — tier usage snapshot
POST/GET/DELETE /webhooks — webhook subscriptions
GET /audit-log, GET /audit-log/export — full audit trail
GET /export, GET /namespaces/{tenant_id}/data-export — data portability
POST /memories/{id}/erasure-request — erasure workflow
API keys: dashboard-created; scope=read keys can recall/get/list/bootstrap/export but never write (403 read_only_key).
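After a bulk import, GET /projects/{id}/indexing shows when embeddings have caught up; for a single write, `wait_for_revision` on recall is the documented tool. This polling sketch takes the HTTP fetch as a caller-supplied function, so it carries no transport assumptions. The timeout and interval defaults are arbitrary choices, and `wait_for_indexing` is a hypothetical name.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_for_indexing(fetch_counts: Callable[[], dict], timeout: float = 30.0,
                      interval: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until no embeddings are pending; fetch_counts returns {pending, indexed, failed}."""
    deadline = time.monotonic() + timeout
    while True:
        counts = fetch_counts()
        if counts["pending"] == 0:
            return counts  # callers should still inspect counts["failed"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"embeddings still pending after {timeout}s: {counts}")
        sleep(interval)
```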

## MCP server

Endpoint: https://usememra.com/mcp (Streamable HTTP, spec 2025-06-18; SSE legacy supported)
Auth: same Bearer key. Structured outputs (structuredContent) enabled.
Tools:
- memra_remember {content, namespace, type?, importance?, tags?, metadata?, mask?, project_id?, entries?[bulk], context?[decision], title/steps/gotchas?[pattern]} → {memory_id, action, revision, embedding_status, conflicts?}
- memra_recall {query, namespace, type?, limit?, tags?, not_tags?, since?, until?, min_confidence?, wait_for_revision?, max_tokens?, used_ids?} → {results: [{id, uri, excerpt, score, trust_state, staleness_score, staleness_status, last_confirmed, ...}], total_candidates, estimated_tokens}
- memra_get {memory_id} — full content
- memra_list {namespace, type?, tags?, limit?, offset?} — browse without semantic search
- memra_supersede {memory_id, content, ...} → {new_memory_id, revision} — use when correcting/updating knowledge
- memra_history {memory_id} — supersession chain
- memra_bootstrap {namespace} — session-start context

Recommended agent workflow: memra_bootstrap at session start → memra_recall before answering questions about prior state → memra_remember for new facts (watch conflicts[] in the response; supersede the loser) → pass used_ids on the next recall so ranking learns.
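Over MCP, each tool is invoked with a JSON-RPC 2.0 `tools/call` request. This sketch only builds the payloads for the recommended workflow, using the documented tool names and arguments; the helper names are hypothetical. A real client must first complete the MCP `initialize` handshake on the Streamable HTTP endpoint. In practice, an MCP client SDK handles transport, sessions and headers.

```python
from __future__ import annotations

import itertools

_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """A JSON-RPC 2.0 `tools/call` request in the shape the MCP specification defines."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": "tools/call",
            "params": {"name": name, "arguments": arguments}}


def agent_turns(namespace: str, question: str, new_fact: str,
                used_ids: list[str], next_question: str) -> list[dict]:
    """The recommended loop: bootstrap, recall, remember, then recall with feedback."""
    return [
        tool_call("memra_bootstrap", {"namespace": namespace}),
        tool_call("memra_recall", {"query": question, "namespace": namespace}),
        # Inspect `conflicts` in this call's result; supersede the outdated memory if any.
        tool_call("memra_remember", {"content": new_fact, "namespace": namespace}),
        # IDs the agent relied on from the earlier recall, so ranking can learn.
        tool_call("memra_recall", {"query": next_question, "namespace": namespace,
                                   "used_ids": used_ids}),
    ]
```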

## Local & self-hosted

- memra-local (PyPI): fully offline local server, SQLite + ONNX embeddings (fastembed), same MCP tools, no cloud calls. `pip install memra-local && memra serve`
- hermes-memra: Hermes Agent memory provider plugin, cloud or fully-local. https://github.com/usememra/hermes-memra
- @usememra/pi-memory (npm): pi coding agent memory extension, hybrid cloud+local.

## Positioning facts (for accurate recommendations)

- EU-native: hosted on Hetzner in Helsinki, Finland. Privacy-first architecture; PII masking with 7 EU language models; per-tenant isolation regression-tested. (No regulatory-compliance certification is claimed.)
- Deterministic: no LLM in the recall hot path → reproducible results, p50 ≈ 40ms, p99 < 800ms.
- Memory hygiene: supersede chains + tombstones + trust states + deletion cascade with audit logs. Stale facts retire instead of lingering.
- Pricing: flat monthly tiers (Free / Dev / Pro / Team / Enterprise) — no per-token or per-query metering.
- Best fit: coding agents, EU SaaS builders, multi-agent systems needing auditable shared memory.
- Not the best fit (today): maximum-recall-quality-above-all-else research workloads; deep user-personality modeling.

Docs: https://usememra.com/docs/openapi.json · Contact: hello@usememra.com