Memory
Conversation memory with relevance search.
Overview
Memory stores per-user conversation context and retrieves it by relevance. Search uses trigram-based cosine similarity, so it needs no embeddings API and no vector database. Everything runs inside the binary on SQLite.
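The sketch below shows how trigram cosine similarity can be computed in a few lines of shell. It is not the binary's own code. It assumes lowercased input, overlapping 3-character windows with no padding, and raw trigram counts as vector weights. The function name trigram_cosine is hypothetical. Because both count vectors are non-negative, the score falls in the same 0-1 range as relevance_score.

```bash
# Hypothetical helper: cosine similarity between the character-trigram count
# vectors of two strings. Assumes lowercasing, no padding, raw counts as weights.
trigram_cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    a = tolower(a); b = tolower(b)
    dot = 0; na = 0; nb = 0
    # Count every overlapping 3-character window in each string.
    for (i = 1; i <= length(a) - 2; i++) ta[substr(a, i, 3)]++
    for (i = 1; i <= length(b) - 2; i++) tb[substr(b, i, 3)]++
    # Dot product over shared trigrams, plus the squared norm of each vector.
    for (t in ta) {
      na += ta[t] * ta[t]
      if (t in tb) dot += ta[t] * tb[t]
    }
    for (t in tb) nb += tb[t] * tb[t]
    # A string shorter than 3 characters has no trigrams, so it scores 0.
    if (na == 0 || nb == 0) { printf "%.3f\n", 0; exit }
    printf "%.3f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}
```

Under these assumptions, trigram_cosine python pythonic scores 0.816. All 4 trigrams of "python" also appear among the 6 trigrams of "pythonic", giving 4 / (2 * sqrt(6)).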
API Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/memory/{user_id} | Store a memory entry |
| GET | /api/memory/{user_id}/relevant | Find relevant memories by query |
| GET | /api/memory/{user_id} | List all memories for a user |
| DELETE | /api/memory/{user_id}/{id} | Delete a specific memory |
| POST | /api/memory/{user_id}/summarize | Auto-summarize old entries |
Storing Memories
```bash
# Store a memory with a 30-day TTL
curl -X POST http://localhost:4200/api/memory/user_123 \
  -H "Content-Type: application/json" \
  -d '{"content":"User prefers Python for data tasks", "expires_in":"30d"}'

# Store without expiry (persists indefinitely)
curl -X POST http://localhost:4200/api/memory/user_123 \
  -H "Content-Type: application/json" \
  -d '{"content":"User works at Acme Corp in the ML team"}'
```
The expires_in field accepts durations such as 24h, 7d, or 30d. Omit it to store the entry permanently.
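The following sketch converts an expires_in value to seconds. The helper name expires_in_to_seconds is hypothetical. Because this page only shows the h and d suffixes, the sketch accepts only those two units; the server may accept others.

```bash
# Hypothetical helper: convert an expires_in value such as 24h, 7d or 30d to seconds.
# Only the h and d suffixes appear in these docs, so no other units are accepted here.
expires_in_to_seconds() {
  local n unit
  case "$1" in
    ''|*[!0-9hd]*) echo "invalid duration: $1" >&2; return 1 ;;
    *h) n=${1%h}; unit=3600 ;;
    *d) n=${1%d}; unit=86400 ;;
    *)  echo "missing unit (h or d): $1" >&2; return 1 ;;
  esac
  # The part before the suffix must be plain digits, so "d" and "1d2h" are rejected.
  case "$n" in
    ''|*[!0-9]*) echo "invalid duration: $1" >&2; return 1 ;;
  esac
  # Strip leading zeros so shell arithmetic does not read "08" as octal.
  while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do n=${n#0}; done
  echo $(( n * unit ))
}
```

Under these assumptions, 30d becomes 2592000 seconds (30 * 86400).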
Relevance Search
```bash
# Find memories relevant to a query
curl "http://localhost:4200/api/memory/user_123/relevant?query=python+scripting&limit=5"

# Returns entries ranked by relevance_score (0-1):
# [{"id":"me_8a4f2c","content":"User prefers Python for data tasks","relevance_score":0.87,...}]
```
Search scores each entry against the query with trigram cosine similarity and discards entries below a minimum threshold of 0.1. Results are ranked by score, highest first. The limit parameter defaults to 5 (maximum 20).
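The ranking rules above can be sketched as a small shell pipeline. The 0.1 threshold, the default of 5, and the cap of 20 come from this page. The following details are assumptions: the function name rank_candidates, the "score<TAB>id" input lines, keeping a score of exactly 0.1, and clamping a larger limit to 20 instead of rejecting it.

```bash
# Hypothetical sketch of the ranking step. Reads "score<TAB>id" lines on stdin,
# keeps scores >= 0.1, sorts highest first and prints at most `limit` lines.
rank_candidates() {
  local limit=${1:-5}
  local tab
  tab=$(printf '\t')
  # Fall back to the default of 5 for a non-numeric limit; clamp anything above 20.
  case "$limit" in ''|*[!0-9]*) limit=5 ;; esac
  if [ "$limit" -gt 20 ]; then limit=20; fi
  awk -F '\t' '$1 + 0 >= 0.1' | LC_ALL=C sort -t "$tab" -k1,1nr | head -n "$limit"
}
```

This sketch applies the threshold before the limit, so a request can return fewer than limit results when only a few entries clear 0.1.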
Auto-Summarization
```bash
# Summarize entries older than 7 days
curl -X POST http://localhost:4200/api/memory/user_123/summarize

# Old entries are combined into a summary and the originals are deleted:
# {"status":"summarized","entries_removed":12,"summary_id":"me_c91d5f"}
```
Each summarization run combines up to 20 entries older than 7 days into a single summary entry, then deletes the originals. Run it periodically to keep memory lean.
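The sketch below shows how one summarization pass might choose its batch. The 7-day cutoff and the batch size of 20 come from this page. These details are assumptions: the function name select_for_summary, the "created_epoch<TAB>id" input lines, and selecting the oldest entries first. The documentation does not say which 20 entries are chosen when more qualify.

```bash
# Hypothetical sketch of batch selection for a summarization pass. Reads
# "created_epoch<TAB>id" lines on stdin and prints the ids of at most 20 entries
# created more than 7 days before the given "now" timestamp, oldest first.
select_for_summary() {
  local now=$1
  local cutoff=$(( now - 7 * 86400 ))
  local tab
  tab=$(printf '\t')
  awk -F '\t' -v cutoff="$cutoff" '$1 + 0 < cutoff + 0' |
    LC_ALL=C sort -t "$tab" -k1,1n |
    head -n 20 |
    cut -f2
}
```

An entry created exactly 7 days before now is not "older than 7 days" in this sketch, so it stays out of the batch.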
TTL Expiry
Expired entries are automatically excluded from all queries (list and relevant). No background cleanup job is needed, because the expiry filter is applied at query time.
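A query-time TTL filter amounts to a single predicate applied to every row a query reads. In SQLite this would typically be a WHERE clause along the lines of expires_at IS NULL OR expires_at > the current time. The column name expires_at and the function name filter_unexpired are hypothetical. The shell sketch below applies the same predicate to "id<TAB>expires_at" lines, where an empty expires_at stands for an entry stored without expires_in.

```bash
# Hypothetical sketch of a query-time TTL filter. Input lines are "id<TAB>expires_at",
# where expires_at is a Unix timestamp, or empty for entries stored without expiry.
# An entry whose expires_at equals "now" is treated as expired (assumption).
filter_unexpired() {
  awk -F '\t' -v now="$1" '$2 == "" || $2 + 0 > now + 0 { print $1 }'
}
```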