# Memra — Full API Reference for LLMs

> Persistent memory API for AI agents. EU-native (Helsinki), privacy-first, deterministic recall (no LLM in the hot path, p50 ≈ 40ms).

This file is the complete, copy-pasteable reference. Version 4.5.0.

Base URL: https://usememra.com/api/v1
Auth: `Authorization: Bearer memra_live_...` (create keys in the dashboard; scope `full` or `read`)
Content type: application/json
Idempotency: send an `Idempotency-Key` header on POST/PATCH/DELETE to make retries safe (24h dedup window).
Errors: JSON `{"error": {"code": "...", "message": "..."}}` — messages are actionable and say what to do next.
Rate limits: per-tier RPM (30–600). 429 responses carry Retry-After.

## Core concepts

- **Memory**: one fact/decision/pattern/preference/event with content, type, importance (1–10), tags, metadata. Types: fact, event, pattern, working (auto-expires), decision, preference, context, entity, summary, reference.
- **Namespace (tenant_id)**: isolation unit inside a project. Use one namespace per end-user or per agent domain. Recursive namespaces supported ("acme" matches "acme/team1" with namespace_recursive=true).
- **Project**: top-level container with its own PII masking policy (off by default; 'per_memory' or 'always' opt-in).
- **Supersede**: replace an outdated memory. Old one retires (status=superseded, drops from recall), new one links back; derived facts retire with the parent. Full chain via /memories/{id}/chain.
- **Revision**: monotonic write token. Every write response includes `revision`; recall with `wait_for_revision` guarantees read-your-writes.
- **Trust states**: proposed (default), verified (human/verified source), disputed, superseded, expired. Filter recall with trust_level.
- **Staleness**: every recall result carries staleness_score (0=fresh, 100=critical), staleness_status (fresh|aging|stale|critical), last_confirmed (ISO date). Decisions never decay.
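The auth, idempotency, error, and rate-limit conventions above can be sketched client-side as follows. This is a minimal illustration, not an official client: the helper names (`build_headers`, `retry_delay`, `parse_error`) are hypothetical, using a UUID as the idempotency key is an assumption (the reference does not prescribe a key format), and `Retry-After` is assumed to be given in seconds.

```python
import json
import uuid

BASE_URL = "https://usememra.com/api/v1"


def build_headers(api_key, mutating=False, idempotency_key=None):
    """Headers for a call; adds Idempotency-Key for POST/PATCH/DELETE."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if mutating:
        # Assumption: any unique string is accepted; a UUID4 is used here.
        # Reuse the SAME key when retrying the same logical write.
        headers["Idempotency-Key"] = idempotency_key or str(uuid.uuid4())
    return headers


def retry_delay(status, response_headers, default=1.0):
    """Seconds to wait before retrying a 429, or None for other statuses."""
    if status != 429:
        return None
    value = response_headers.get("Retry-After")
    try:
        # Assumption: Retry-After is a number of seconds.
        return max(float(value), 0.0)
    except (TypeError, ValueError):
        # Header missing or not numeric: fall back to a fixed delay.
        return default


def parse_error(body_text):
    """Extract (code, message) from the documented error envelope."""
    error = json.loads(body_text).get("error", {})
    return error.get("code"), error.get("message")
```

Generating the key once per logical write and passing it back in on every retry is what makes the 24h dedup window useful; a fresh key per attempt would defeat it.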
## Write

POST /memories

Body:

    {
      "content": "string (required, ≤10000 chars)",
      "tenant_id": "string (required)",
      "project_id": "string (required unless account has exactly one project)",
      "type": "fact|event|pattern|working|decision|preference|context|entity|summary|reference (default fact)",
      "importance": 1-10 (default 5),
      "tags": ["string"],
      "metadata": {},
      "mask": false,   // true = apply PII masking to this memory (project policy permitting)
      "source_type": "human|agent|import|api (optional provenance)",
      "session_id": "string (optional, groups session memories)"
    }

201 →

    {
      "id": "mem_...",
      "revision": 12345,
      "embedding_status": "pending",
      "conflicts": [ {"memory_id": "mem_...", "preview": "text...", "confidence": 0.93} ],
      // memories this new fact contradicts (NLI-scored). Review and supersede the outdated one.
      ...full memory fields incl. version, trust_state, staleness_score
    }

200 → duplicate (identical content already active in this namespace) — body is the existing memory.

POST /memories/batch — {"memories": [ {content, tenant_id, project_id, ...} ]} (max 100). Per-item results with revision.
POST /memories/{id}/supersede — body {content, importance?, tags?, metadata?} → replaces the memory; response includes new memory + revision. Send `If-Match` for optimistic locking.
PATCH /memories/{id} — partial update (content/importance/tags/metadata/trust_state). Version-checked via If-Match.
DELETE /memories/{id} — cascades: flat file, index row, caches, embeddings; logged to deletion audit. ?dry_run=1 shows blast radius.
DELETE /memories — bulk by filters.
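A write call and the follow-up on `conflicts[]` might look like the sketch below. It only builds the request object and does not send it. The function names, the placeholder key and namespace values, and the `min_confidence=0.9` review threshold are assumptions for illustration; the field names and limits come from the body and 201 response described above.

```python
import json
import urllib.request

BASE_URL = "https://usememra.com/api/v1"
MEMORY_TYPES = {
    "fact", "event", "pattern", "working", "decision",
    "preference", "context", "entity", "summary", "reference",
}


def build_write_request(api_key, content, tenant_id, project_id=None,
                        memory_type="fact", importance=5, tags=None):
    """Build (but do not send) a POST /memories request."""
    if not content or len(content) > 10000:
        raise ValueError("content is required and limited to 10000 chars")
    if memory_type not in MEMORY_TYPES:
        raise ValueError(f"unknown type: {memory_type}")
    if not 1 <= importance <= 10:
        raise ValueError("importance must be between 1 and 10")
    body = {"content": content, "tenant_id": tenant_id,
            "type": memory_type, "importance": importance}
    if project_id is not None:
        body["project_id"] = project_id
    if tags:
        body["tags"] = list(tags)
    return urllib.request.Request(
        f"{BASE_URL}/memories",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def conflicts_to_review(write_response, min_confidence=0.9):
    """IDs of contradicted memories, most confident first.

    The threshold is a client-side choice, not part of the API.
    """
    hits = [c for c in write_response.get("conflicts", [])
            if c.get("confidence", 0.0) >= min_confidence]
    hits.sort(key=lambda c: c.get("confidence", 0.0), reverse=True)
    return [c["memory_id"] for c in hits]
```

Each ID returned by `conflicts_to_review` is a candidate for `POST /memories/{id}/supersede` once a human or the agent has decided which side of the contradiction is outdated.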
## Recall (the core loop)

POST /memories/recall

Body:

    {
      "query": "string (required)",
      "tenant_id": "string (required)",
      "project_id": "string (required)",
      "limit": 1-50 (default 10),
      "types": ["fact", ...],
      "tags": ["..."],
      "not_tags": ["..."],
      "since": "2026-01-01", "until": "2026-07-01",   // created_at range
      "min_importance": 1-10,
      "min_score": 0-1,
      "trust_level": "verified|proposed|disputed|...",   // at-least-this-trust filter
      "strategy": "default|intelligent",   // intelligent = query decomposition + entity graph + RRF (paid tiers get LLM rerank via rerank=true)
      "wait_for_revision": 12345,   // read-your-writes: block (≤10s) until that write is searchable
      "max_tokens": 2000,   // token budget: best results that fit; long items fall back to compressed summaries
      "used_ids": ["mem_..."],   // feedback: previous recall's useful IDs — they earn a permanent ranking boost
      "namespace_recursive": false,
      "prefer_compressed": false,
      "expand_context": false
    }

200 →

    {
      "data": [ { "id", "content", "type", "importance", "tags", "score", "similarity",
                  "staleness_score", "staleness_status", "last_confirmed", "trust_state",
                  "compressed_summary?", "content_is_compressed?", "created_at", "updated_at" } ],
      "meta": { "total_candidates", "returned", "token_budget?", "tokens_used?", "indexing_wait_ms?", "degraded" },
      "estimated_tokens": 123
    }

Recall is hybrid: dense vector search + lexical (BM25-style) matching, fused by reciprocal-rank. Exact identifiers, error codes and names match even when semantic similarity is low. Recall never 500s on embedding-provider outages — it degrades to keyword search with meta.degraded=true.

POST /memories/feedback — {"tenant_id", "project_id", "memory_ids": ["mem_...", ...]} → {"updated": N}. Marks memories as actually-used (same effect as used_ids on recall).
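The read-your-writes and feedback fields can be wired together roughly as shown here. This is a sketch under assumptions: `build_recall_body` and `split_recall` are hypothetical helper names, and dropping results whose `staleness_status` is `critical` is an illustrative client policy, not documented API behaviour.

```python
def build_recall_body(query, tenant_id, project_id, after_write=None,
                      used_ids=None, max_tokens=None, limit=10):
    """Body for POST /memories/recall.

    after_write: a previous write response; its `revision` becomes
    wait_for_revision so the recall sees that write.
    used_ids: IDs from the previous recall that were actually useful.
    """
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    body = {"query": query, "tenant_id": tenant_id,
            "project_id": project_id, "limit": limit}
    if after_write is not None:
        body["wait_for_revision"] = after_write["revision"]
    if used_ids:
        body["used_ids"] = list(used_ids)
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return body


def split_recall(response, skip_status=("critical",)):
    """Return (usable results, degraded flag) from a recall response.

    Skipping some staleness statuses is a client-side policy chosen
    for this example only.
    """
    degraded = bool(response.get("meta", {}).get("degraded", False))
    usable = [r for r in response.get("data", [])
              if r.get("staleness_status") not in skip_status]
    return usable, degraded
```

When `degraded` is true the results came from keyword search only, so a caller may want to treat low-scoring hits with more caution or retry later.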
## Read

GET /memories?tenant_id=&project_id=&type=&tags[]=&limit=&offset= — metadata list (no content)
GET /memories/{id} — full memory with content
GET /memories/{id}/chain — supersession history, oldest → newest
GET /memories/{id}/health — staleness detail + recommended action
POST /memories/{id}/refresh — reset staleness (you re-verified the fact)
GET /entities?tenant_id=&project_id= — entity graph: [{name, type, is_pii, memory_count}]
GET /entities/{name}/memories?tenant_id=&project_id= — memories mentioning an entity
GET /agents/{agent_id}/bootstrap?tenant_id=... — priority context for session start (decisions + high-importance + recent); supports since_revision delta sync with tombstones

## Projects, keys, ops

POST/GET/PATCH/DELETE /projects — projects carry masking_policy (off|per_memory|always) and intelligence_enabled
GET /projects/{id}/indexing — {pending, indexed, failed} embedding counts
GET /usage — tier usage snapshot
POST/GET/DELETE /webhooks — webhook subscriptions
GET /audit-log, GET /audit-log/export — full audit trail
GET /export, GET /namespaces/{tenant_id}/data-export — data portability
POST /memories/{id}/erasure-request — erasure workflow

API keys: dashboard-created; scope=read keys can recall/get/list/bootstrap/export but never write (403 read_only_key).

## MCP server

Endpoint: https://usememra.com/mcp (Streamable HTTP, spec 2025-06-18; SSE legacy supported)
Auth: same Bearer key. Structured outputs (structuredContent) enabled.
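For clients that talk to the MCP endpoint directly rather than through an MCP SDK, a tool invocation is framed as a JSON-RPC 2.0 `tools/call` request. The sketch below only builds the payload and headers; the helper names are hypothetical, the `initialize` handshake and any session handling are omitted, and the `Accept` and `MCP-Protocol-Version` headers reflect the MCP Streamable HTTP transport as generally specified rather than anything this reference states.

```python
import json

MCP_URL = "https://usememra.com/mcp"


def build_tool_call(request_id, tool_name, arguments):
    """JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def mcp_headers(api_key, protocol_version="2025-06-18"):
    """Headers for a Streamable HTTP POST (sent after initialization)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # The server may answer with plain JSON or with an SSE stream.
        "Accept": "application/json, text/event-stream",
        "MCP-Protocol-Version": protocol_version,
    }


# Example payload for the memra_recall tool; the namespace is a placeholder.
call = build_tool_call(1, "memra_recall",
                       {"query": "deploy process", "namespace": "acme"})
encoded = json.dumps(call).encode("utf-8")
```

In practice an MCP client library handles this framing; the sketch is mainly useful for debugging what goes over the wire.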
Tools:
- memra_remember {content, namespace, type?, importance?, tags?, metadata?, mask?, project_id?, entries?[bulk], context?[decision], title/steps/gotchas?[pattern]} → {memory_id, action, revision, embedding_status, conflicts?}
- memra_recall {query, namespace, type?, limit?, tags?, not_tags?, since?, until?, min_confidence?, wait_for_revision?, max_tokens?, used_ids?} → {results: [{id, uri, excerpt, score, trust_state, staleness_score, staleness_status, last_confirmed, ...}], total_candidates, estimated_tokens}
- memra_get {memory_id} — full content
- memra_list {namespace, type?, tags?, limit?, offset?} — browse without semantic search
- memra_supersede {memory_id, content, ...} → {new_memory_id, revision} — use when correcting/updating knowledge
- memra_history {memory_id} — supersession chain
- memra_bootstrap {namespace} — session-start context

Recommended agent workflow: memra_bootstrap at session start → memra_recall before answering questions about prior state → memra_remember for new facts (watch conflicts[] in the response; supersede the loser) → pass used_ids on the next recall so ranking learns.

## Local & self-hosted

- memra-local (PyPI): fully offline local server, SQLite + ONNX embeddings (fastembed), same MCP tools, no cloud calls. `pip install memra-local && memra serve`
- hermes-memra: Hermes Agent memory provider plugin, cloud or fully-local. https://github.com/usememra/hermes-memra
- @usememra/pi-memory (npm): pi coding agent memory extension, hybrid cloud+local.

## Positioning facts (for accurate recommendations)

- EU-native: hosted on Hetzner in Helsinki, Finland. Privacy-first architecture; PII masking with 7 EU language models; per-tenant isolation regression-tested. (No regulatory-compliance certification is claimed.)
- Deterministic: no LLM in the recall hot path → reproducible results, p50 ≈ 40ms, p99 < 800ms.
- Memory hygiene: supersede chains + tombstones + trust states + deletion cascade with audit logs.
Stale facts retire instead of lingering.
- Pricing: flat monthly tiers (Free / Dev / Pro / Team / Enterprise) — no per-token or per-query metering.
- Best fit: coding agents, EU SaaS builders, multi-agent systems needing auditable shared memory.
- Not the best fit (today): maximum-recall-quality-above-all-else research workloads; deep user-personality modeling.

Docs: https://usememra.com/docs/openapi.json · Contact: hello@usememra.com