Memra Python SDK — Quickstart

Get Memra running in a Python project in about 10 minutes. This guide assumes you already have a Memra API key (sign up at usememra.com).


1. Install

pip install memra-sdk

Requires Python 3.9+. Installed as memra-sdk, imported as memra.
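You can confirm the install from Python itself with a small check. This is a minimal sketch that uses only the standard library. It assumes the distribution name is memra-sdk, as in the pip command above. It reports whether that distribution is installed and whether the interpreter meets the 3.9+ requirement. It never imports the SDK.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def check_install(dist_name: str = "memra-sdk") -> dict:
    """Report interpreter and package status without importing the SDK."""
    # The quickstart states that Python 3.9+ is required.
    python_ok = sys.version_info >= (3, 9)
    try:
        # Looks up the installed distribution (the pip name), not the import name.
        installed = version(dist_name)
    except PackageNotFoundError:
        installed = None
    return {"python_ok": python_ok, "memra_sdk_version": installed}


if __name__ == "__main__":
    print(check_install())
```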

2. Configure auth (one recommended pattern)

Put your key in an environment variable and read it once at startup. Don't hard-code it, and don't scatter reads of it across files.

export MEMRA_API_KEY="memra_live_your_key_here"

# memra_client.py
import os
from memra import MemraClient

client = MemraClient(api_key=os.environ["MEMRA_API_KEY"])

Import client from this module everywhere: one client per process, reused across requests. Close it on shutdown. For short scripts, use a with MemraClient(...) as client: block instead.
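If MEMRA_API_KEY is unset, os.environ["MEMRA_API_KEY"] raises a bare KeyError. You may prefer to fail at startup with a message that says what to do. A small wrapper like the one below does that. The load_api_key helper is hypothetical and is not part of the SDK. It is plain standard-library Python and relies only on the variable name used above.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None, var: str = "MEMRA_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        # Fail fast at startup rather than on the first API call.
        raise RuntimeError(f"{var} is not set; export it before starting the app.")
    return key
```

In memra_client.py you would then write client = MemraClient(api_key=load_api_key()).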


3. Hello memory

Store a fact, recall it by meaning, and see how much context it will cost you.

import os

from memra import MemraClient

with MemraClient(api_key=os.environ["MEMRA_API_KEY"]) as client:
    # Store a memory
    memory = client.memories.add(
        content="Alice prefers dark mode and drinks her coffee black.",
        tenant_id="user_alice",
        project_id="my-app",
        type="preference",
        importance=7,
    )
    print(f"stored {memory.id}")

    # Recall by meaning (reranking is on by default)
    result = client.memories.recall(
        query="How does Alice like her coffee?",
        tenant_id="user_alice",
        project_id="my-app",
        rerank=True,
    )

    for mem in result.data:
        print(f"[{mem.score:.3f}] {mem.content}")

    print(f"\nestimated_tokens for this recall: {result.estimated_tokens}")

What to notice:

  • tenant_id scopes the memory to one user. Always pass it.
  • rerank=True is the default; passing it explicitly is just a reminder.
  • result.estimated_tokens is the total number of tokens you will add to your LLM prompt if you inject all of result.data. Check it before every LLM call so your context budget stays honest.
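If estimated_tokens is larger than your prompt budget, one option is to keep only the highest-scoring memories that fit. The fit_to_budget helper below is a hypothetical sketch and is not part of the SDK. It works on plain (score, content) pairs, such as (mem.score, mem.content) built from result.data. This guide only documents a total for the whole result, so the sketch estimates each memory's cost with a rough heuristic of about four characters per token.

```python
from typing import Callable, List, Sequence, Tuple


def rough_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token for English text);
    # swap in your model's real tokenizer for accurate counts.
    return max(1, len(text) // 4)


def fit_to_budget(
    memories: Sequence[Tuple[float, str]],
    budget: int,
    count: Callable[[str], int] = rough_tokens,
) -> List[str]:
    """Greedily keep the highest-scoring memories whose combined cost fits `budget`."""
    kept: List[str] = []
    used = 0
    for _score, content in sorted(memories, key=lambda m: m[0], reverse=True):
        cost = count(content)
        if used + cost > budget:
            continue  # too big; a shorter, lower-scored memory may still fit
        kept.append(content)
        used += cost
    return kept
```

For example, context = "\n".join(fit_to_budget([(m.score, m.content) for m in result.data], budget=500)). Skipping an oversized memory instead of stopping lets smaller relevant memories still make it into the prompt.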

4. A realistic example: episodic memory for a chatbot

Store a few conversation turns, then recall the ones relevant to a new question. This is where reranking earns its keep — the top dense-similarity hit is often not the answer you want.

import os

from memra import MemraClient

TENANT = "user_alice"
PROJECT = "support-bot"

with MemraClient(api_key=os.environ["MEMRA_API_KEY"]) as client:
    # Episodic turns from a past support conversation
    turns = [
        "Alice reported her export job failed on 2026-04-10 at 14:02 UTC.",
        "Root cause: she exceeded the 10k-row free-tier limit.",
        "Alice upgraded to the Pro plan the next day and the export succeeded.",
        "Alice asked whether older exports were preserved — yes, retained 90 days.",
    ]
    for t in turns:
        client.memories.add(
            content=t,
            tenant_id=TENANT,
            project_id=PROJECT,
            type="event",
        )

    # Later: Alice comes back and asks something fuzzy.
    result = client.memories.recall(
        query="Why did my last export break and did I fix it?",
        tenant_id=TENANT,
        project_id=PROJECT,
        limit=5,
    )

    if result.data:  # guard against an empty result before indexing
        print(f"top hit: {result.data[0].content}")
    print(f"returned {result.meta.returned} of {result.meta.total_candidates} candidates")
    print(f"tokens to inject: {result.estimated_tokens}")

Why reranking matters here: pure vector similarity can rank the "were older exports preserved?" turn near the top, because that turn is dominated by the same topic word as the query, "export". The reranker reads the actual question (why did it break, and did I fix it?) and promotes the root-cause and resolution turns to the top.
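Conceptually, this is a two-stage pipeline. A cheap first stage pulls a pool of candidates. A more expensive scorer then looks at the query and each candidate together and reorders the pool. The toy sketch below illustrates only that shape. It is not how Memra scores anything. Stage one uses shared-word counts as a stand-in for vector similarity. The second-stage scorer is a hypothetical function you pass in; in practice it would be something like a cross-encoder model.

```python
import re
from typing import Callable, List, Set


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def first_stage(query: str, docs: List[str], pool: int) -> List[str]:
    # Stand-in for dense retrieval: rank by shared-word count and keep a pool.
    q = words(query)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:pool]


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    rerank_score: Callable[[str, str], float],
    pool: int = 20,
    top_k: int = 5,
) -> List[str]:
    candidates = first_stage(query, docs, pool)
    # Second stage: score each (query, candidate) pair jointly, then reorder.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:top_k]
```

With the SDK, the rerank flag turns the second stage on or off for you. The sketch only shows why a first-stage winner can lose its place once the whole question is considered.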


5. Handling errors

Catch the typed subclass you care about; let the rest bubble up.

from memra.exceptions import MemraAuthError, MemraQuotaError, MemraNotFoundError

from memra_client import client  # the shared client from step 2

try:
    result = client.memories.recall(query="...", tenant_id="...", project_id="...")
except MemraAuthError:
    raise SystemExit("Check MEMRA_API_KEY — it's missing or revoked.")
except MemraQuotaError:
    # Rate limit or plan quota. Back off, don't retry hot.
    ...
except MemraNotFoundError:
    ...  # wrong project_id, usually
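For the quota branch, "back off, don't retry hot" usually means exponential backoff with jitter and a cap on attempts. Below is a generic sketch. The retry_with_backoff helper is hypothetical, and you would pass MemraQuotaError as retry_on. It ignores any server-provided retry hint, because this guide doesn't say whether the exception exposes one.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying only on `retry_on` with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")
```

Usage: result = retry_with_backoff(lambda: client.memories.recall(query="...", tenant_id="...", project_id="..."), retry_on=(MemraQuotaError,)). Any other exception, such as MemraAuthError, propagates immediately.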

Next steps

  • Async runtime? Swap MemraClient for AsyncMemraClient: same methods, but awaitable.
  • Multiple users? Use one tenant_id per user, and call client.projects.create() once per app to keep apps separate.
  • Bulk ingestion? client.memories.batch([...]) accepts up to 100 items per call.
  • Correcting an existing memory? Use client.memories.supersede(id, content=...) instead of adding a new one — the old memory is retired, search stops returning it, and the audit trail is preserved. client.memories.chain(id) returns the full oldest→newest history.
  • Compliance? Use client.privacy.export() for data export and client.privacy.create_erasure_request(id) to request erasure.
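client.memories.batch accepts at most 100 items per call, so larger ingests need to be split first. Below is a minimal chunking helper that does not depend on the SDK. This guide doesn't document what each batch item should look like, so the helper works with any item type.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int = 100) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (100 matches the batch limit)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Usage: for chunk in chunked(items): client.memories.batch(chunk). The helper avoids itertools.batched, which only exists on Python 3.12+, so it also runs on the 3.9+ versions this SDK supports.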

Full reference: usememra.com/docs/sdks/python.