Memra Python SDK — Quickstart
Get Memra running in a Python project in about 10 minutes. This guide assumes you already have a Memra API key (sign up at usememra.com).
1. Install
```shell
pip install memra-sdk
```

Requires Python 3.9+. Installed as `memra-sdk`, imported as `memra`.
2. Configure auth (one recommended pattern)
Put your key in an environment variable and read it once at startup. Don't hard-code it, don't scatter it across files.
```shell
export MEMRA_API_KEY="memra_live_your_key_here"
```

```python
# memra_client.py
import os

from memra import MemraClient

client = MemraClient(api_key=os.environ["MEMRA_API_KEY"])
```
Import `client` from this module everywhere: one client per process, reused across requests. Close it on shutdown, or use `with MemraClient(...) as client:` for short scripts.
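The one-client-per-process pattern can be sketched as lazy initialization plus cleanup at interpreter exit. This is a sketch, not SDK code: `_StubClient` below is a stand-in for `MemraClient` so it runs without an API key or the SDK installed; the lazy init, the fail-fast env check, and the `atexit` hook are the parts that carry over.

```python
import atexit
import os


class _StubClient:
    """Stand-in for memra.MemraClient so this sketch runs without the SDK."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.closed = False

    def close(self) -> None:
        self.closed = True


_client = None


def get_client() -> _StubClient:
    """Lazily build one shared client; close it when the process exits."""
    global _client
    if _client is None:
        key = os.environ.get("MEMRA_API_KEY")
        if not key:
            raise RuntimeError("MEMRA_API_KEY is not set; export it first.")
        _client = _StubClient(api_key=key)
        atexit.register(_client.close)
    return _client
```

Every caller gets the same instance, so connection pools and caches are shared rather than rebuilt per request.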
3. Hello memory
Store a fact, recall it by meaning, and see how much context it will cost you.
```python
import os

from memra import MemraClient

with MemraClient(api_key=os.environ["MEMRA_API_KEY"]) as client:
    # Store a memory
    memory = client.memories.add(
        content="Alice prefers dark mode and drinks her coffee black.",
        tenant_id="user_alice",
        project_id="my-app",
        type="preference",
        importance=7,
    )
    print(f"stored {memory.id}")

    # Recall by meaning (reranking is on by default)
    result = client.memories.recall(
        query="How does Alice like her coffee?",
        tenant_id="user_alice",
        project_id="my-app",
        rerank=True,
    )
    for mem in result.data:
        print(f"[{mem.score:.3f}] {mem.content}")
    print(f"\nestimated_tokens for this recall: {result.estimated_tokens}")
```
What to notice:
- `tenant_id` scopes the memory to one user. Always pass it.
- `rerank=True` is the default; passing it explicitly is just a reminder.
- `result.estimated_tokens` is the total tokens you're about to inject into your LLM prompt if you feed all of `result.data` into it. Read it before every call so your context budget stays honest.
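Keeping the budget honest usually means trimming recalled memories before they reach the prompt. A minimal sketch of that trimming step, using plain strings and a naive one-token-per-word estimate as a stand-in for the SDK's `estimated_tokens` count:

```python
def estimate_tokens(text: str) -> int:
    """Very rough stand-in for the SDK's token estimate: ~1 token per word."""
    return len(text.split())


def trim_to_budget(memories: list[str], budget: int) -> list[str]:
    """Keep memories in ranked order until the token budget is spent."""
    kept, used = [], 0
    for content in memories:
        cost = estimate_tokens(content)
        if used + cost > budget:
            break
        kept.append(content)
        used += cost
    return kept
```

Because recall results arrive best-first, cutting from the tail drops the least relevant memories first.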
4. A realistic example: episodic memory for a chatbot
Store a few conversation turns, then recall the ones relevant to a new question. This is where reranking earns its keep — the top dense-similarity hit is often not the answer you want.
```python
import os

from memra import MemraClient

TENANT = "user_alice"
PROJECT = "support-bot"

with MemraClient(api_key=os.environ["MEMRA_API_KEY"]) as client:
    # Episodic turns from a past support conversation
    turns = [
        "Alice reported her export job failed on 2026-04-10 at 14:02 UTC.",
        "Root cause: she exceeded the 10k-row free-tier limit.",
        "Alice upgraded to the Pro plan the next day and the export succeeded.",
        "Alice asked whether older exports were preserved — yes, retained 90 days.",
    ]
    for t in turns:
        client.memories.add(
            content=t,
            tenant_id=TENANT,
            project_id=PROJECT,
            type="event",
        )

    # Later: Alice comes back and asks something fuzzy.
    result = client.memories.recall(
        query="Why did my last export break and did I fix it?",
        tenant_id=TENANT,
        project_id=PROJECT,
        limit=5,
    )
    print(f"top hit: {result.data[0].content}")
    print(f"returned {result.meta.returned} of {result.meta.total_candidates} candidates")
    print(f"tokens to inject: {result.estimated_tokens}")
```
Why reranking matters here: pure vector similarity often surfaces the "older exports preserved?" turn because it has the richest lexical overlap with "export". The reranker reads the actual question — why did it break and did I fix it — and promotes the root-cause + resolution turns to the top.
5. Handling errors
Catch the typed subclass you care about; let the rest bubble up.
```python
from memra.exceptions import MemraAuthError, MemraNotFoundError, MemraQuotaError

try:
    result = client.memories.recall(query="...", tenant_id="...", project_id="...")
except MemraAuthError:
    raise SystemExit("Check MEMRA_API_KEY — it's missing or revoked.")
except MemraQuotaError:
    # Rate limit or plan quota. Back off, don't retry hot.
    ...
except MemraNotFoundError:
    ...  # wrong project_id, usually
```
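"Back off, don't retry hot" can be sketched as a generic exponential-backoff wrapper. `QuotaError` below is a placeholder for `MemraQuotaError` so the sketch runs standalone, and the delay values are illustrative, not SDK-recommended:

```python
import random
import time


class QuotaError(Exception):
    """Placeholder for memra.exceptions.MemraQuotaError in this sketch."""


def with_backoff(call, *, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry `call` on QuotaError, roughly doubling the delay each time.

    Jitter (the random factor) spreads retries out so many workers that
    hit the quota together don't all retry at the same instant.
    """
    for attempt in range(attempts):
        try:
            return call()
        except QuotaError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(delay)
```

The injectable `sleep` parameter also makes the wrapper easy to test without real waiting.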
Next steps
- Async runtime? Swap `MemraClient` for `AsyncMemraClient`: same methods, all `await`-able.
- Multiple users? Use one `tenant_id` per user. Use `projects.create()` once to carve apps apart.
- Bulk ingestion? `client.memories.batch([...])` accepts up to 100 items per call.
- Correcting an existing memory? Use `client.memories.supersede(id, content=...)` instead of adding a new one: the old memory is retired, search stops returning it, and the audit trail is preserved. `client.memories.chain(id)` returns the full oldest→newest history.
- Compliance? `client.privacy.export()` and `client.privacy.create_erasure_request(id)`.
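If you have more than 100 items to ingest, split them before calling `batch`. A small chunking helper (the helper name and shape are ours; only the 100-item cap comes from the note above):

```python
from typing import Iterator


def chunked(items: list, size: int = 100) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    for start in range(0, len(items), size):
        yield items[start : start + size]
```

Usage would look like `for chunk in chunked(all_memories): client.memories.batch(chunk)`.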
Full reference: usememra.com/docs/sdks/python.