Memory

Conversation memory with relevance search.

Overview

Memory stores per-user conversation context and retrieves it by relevance. It uses trigram-based cosine similarity for search — no embeddings API, no vector database. Everything runs inside the binary on SQLite.

API Endpoints

Method   Path                              Description
POST     /api/memory/{user_id}             Store a memory entry
GET      /api/memory/{user_id}/relevant    Find relevant memories by query
GET      /api/memory/{user_id}             List all memories for a user
DELETE   /api/memory/{user_id}/{id}        Delete a specific memory
POST     /api/memory/{user_id}/summarize   Auto-summarize old entries

Storing Memories

# Store a memory with 30-day TTL
curl -X POST http://localhost:4200/api/memory/user_123 \
  -H "Content-Type: application/json" \
  -d '{"content":"User prefers Python for data tasks", "expires_in":"30d"}'

# Store without expiry (persists indefinitely)
curl -X POST http://localhost:4200/api/memory/user_123 \
  -H "Content-Type: application/json" \
  -d '{"content":"User works at Acme Corp in the ML team"}'

The expires_in field accepts durations like 24h, 7d, 30d. Omit it for permanent storage.
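To make the duration format concrete, here is a minimal parsing sketch in Python. It is an illustration only, not the service's actual parser; the docs show only hour (h) and day (d) suffixes, so only those are handled.

```python
import re

# Seconds per unit; the docs only show "h" (hours) and "d" (days) suffixes.
UNITS = {"h": 3600, "d": 86400}

def parse_duration(s: str) -> int:
    """Parse a duration like "24h" or "30d" into seconds."""
    m = re.fullmatch(r"(\d+)([hd])", s)
    if not m:
        raise ValueError(f"unsupported duration: {s!r}")
    return int(m.group(1)) * UNITS[m.group(2)]

print(parse_duration("24h"))  # 86400
print(parse_duration("30d"))  # 2592000
```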

Relevance Search

# Find memories relevant to a query
curl "http://localhost:4200/api/memory/user_123/relevant?query=python+scripting&limit=5"

# Returns entries ranked by relevance_score (0-1)
[{"id":"me_8a4f2c","content":"User prefers Python for data tasks","relevance_score":0.87,...}]

Search uses trigram cosine similarity with a 0.1 minimum threshold. Results are ranked by score descending. The limit parameter defaults to 5 (max 20).
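For intuition, trigram cosine similarity can be sketched as follows. This is an illustration of the technique with the documented defaults (0.1 threshold, limit of 5), not the engine's actual code.

```python
from collections import Counter
from math import sqrt

def trigrams(text: str) -> Counter:
    """Count character trigrams of a lowercased string."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: str, b: str) -> float:
    """Cosine similarity between two trigram count vectors."""
    va, vb = trigrams(a), trigrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    na = sqrt(sum(c * c for c in va.values()))
    nb = sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, memories, limit=5, threshold=0.1):
    """Score memories against the query, drop sub-threshold hits, best first."""
    scored = [(cosine(query, m), m) for m in memories]
    hits = [sm for sm in scored if sm[0] >= threshold]
    return sorted(hits, key=lambda sm: sm[0], reverse=True)[:limit]

print(rank("python scripting", [
    "User prefers Python for data tasks",
    "User works at Acme Corp in the ML team",
]))
```

Because the query and the first memory share trigrams like "pyt" and "hon", that entry scores above the threshold; the second shares none and is filtered out.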

Auto-Summarization

# Summarize entries older than 7 days
curl -X POST http://localhost:4200/api/memory/user_123/summarize

# Old entries are combined into a summary and originals deleted
{"status":"summarized","entries_removed":12,"summary_id":"me_c91d5f"}

Summarization combines up to 20 entries older than 7 days into a single summary entry, then deletes the originals. Run periodically to keep memory lean.
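The selection rule (entries older than 7 days, at most 20 per call) can be sketched like this. The function name, tuple shape, and in-memory list are assumptions for illustration; the real service reads entries from SQLite and also generates the summary text.

```python
from datetime import datetime, timedelta, timezone

MAX_BATCH = 20               # the docs cap each summarize call at 20 entries
MIN_AGE = timedelta(days=7)  # only entries older than 7 days qualify

def select_for_summary(entries, now=None):
    """Pick the oldest qualifying entries, capped at MAX_BATCH.

    `entries` is a list of (created_at, content) tuples; a real store
    would query these from SQLite instead.
    """
    now = now or datetime.now(timezone.utc)
    old = [e for e in entries if now - e[0] > MIN_AGE]
    return sorted(old)[:MAX_BATCH]
```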

TTL Expiry

Expired entries are automatically excluded from all queries (list, relevant). No background cleanup job is needed — the expiry filter is applied at query time.
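The query-time filter amounts to a predicate in the SELECT itself. The schema below is illustrative, not the service's actual table layout:

```python
import sqlite3
import time

# In-memory stand-in for the real store (schema is a guess for illustration).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id TEXT, content TEXT, expires_at REAL)")
db.executemany("INSERT INTO memories VALUES (?, ?, ?)", [
    ("me_1", "still valid", time.time() + 3600),    # expires in an hour
    ("me_2", "already expired", time.time() - 60),  # expired a minute ago
    ("me_3", "no expiry", None),                    # permanent entry
])

# The expiry check lives in the query -- no background cleanup job.
rows = db.execute(
    "SELECT content FROM memories WHERE expires_at IS NULL OR expires_at > ?",
    (time.time(),),
).fetchall()
print(sorted(r[0] for r in rows))  # ['no expiry', 'still valid']
```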

Note: Memory uses trigram similarity, not semantic embeddings. It works best for keyword and phrase matching. For embedding-based search, pipe memories through a dedicated vector store.