# Memory strategy
**Two distinct memory systems; don't conflate them.** SOMA has two memory layers. They solve different problems and must not be collapsed into one.
| Layer | Store | Purpose | Retention |
|---|---|---|---|
| Conversational (short-term) | Mastra Memory + PostgresStore | Last N turns per thread; keeps multi-turn context coherent | `lastMessages: 12` |
| Knowledge (long-term) | `entities` / `edges` / `events` / `facts` | The graph, queried by tools | Forever (or until forget / supersede) |
## Conversational memory
Mastra's built-in Memory tracks the message history of a thread, identified by `threadId`. On the web, the thread id is `web:<userId>:<epochMs>`; on the bot, it's `tg:<chatId>:<epochMs>`. The agent uses the last 12 turns as immediate context.
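The thread-id scheme above can be sketched as two small helpers. The function names are hypothetical; the real construction lives wherever SOMA creates threads:

```typescript
// Hypothetical helpers mirroring the documented thread-id scheme.
// Neither function exists in SOMA under these names; this is a sketch.

/** Thread id for a web session: web:<userId>:<epochMs> */
function webThreadId(userId: string, epochMs: number = Date.now()): string {
  return `web:${userId}:${epochMs}`;
}

/** Thread id for a Telegram chat: tg:<chatId>:<epochMs> */
function tgThreadId(chatId: number, epochMs: number = Date.now()): string {
  return `tg:${chatId}:${epochMs}`;
}
```

Embedding the epoch timestamp means every new session gets a fresh thread, so the `lastMessages: 12` window never bleeds across sessions.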
Gated on opt-in. Mastra's PgVector worker has a history of failing inside Next.js serverless environments (thread-stream worker resolution), so conversational memory is disabled by default and enabled only when `SOMA_MASTRA_MEMORY=1` is set. SOMA's knowledge graph covers long-term recall through tool calls anyway.
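The opt-in gate amounts to a single strict check. A minimal sketch (function name hypothetical; the real wiring only attaches the Memory instance when the flag is set):

```typescript
// Hypothetical gate for the SOMA_MASTRA_MEMORY flag described above.
// True only when the env var is exactly "1" — "true", "yes", etc. do not count.
function conversationalMemoryEnabled(
  env: Record<string, string | undefined>,
): boolean {
  return env.SOMA_MASTRA_MEMORY === "1";
}
```

An exact-match check keeps the default safely off: any typo or alternative truthy spelling leaves conversational memory disabled rather than accidentally enabled.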
## Knowledge memory
The graph is the canonical memory. Tools hit it directly:
- `memory_recall` — semantic search over `entities` via pgvector HNSW, optionally reranked.
- `search_entities` — lexical full-text search over `entities.name` and searchable properties.
- `graph_neighbors` — BFS walk from a root entity, up to 3 hops.
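The `graph_neighbors` walk can be illustrated with a minimal hop-bounded BFS over an in-memory adjacency map. This is a sketch only; the real tool reads the `edges` table in Postgres:

```typescript
// Minimal BFS sketch of a graph_neighbors-style walk, capped at maxHops.
// Assumes edges are preloaded into an adjacency map (entity id -> neighbor ids).
function neighbors(
  adjacency: Map<string, string[]>,
  root: string,
  maxHops = 3,
): Set<string> {
  const seen = new Set<string>([root]);
  let frontier = [root];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const nb of adjacency.get(node) ?? []) {
        if (!seen.has(nb)) {
          seen.add(nb);
          next.push(nb);
        }
      }
    }
    frontier = next;
  }
  seen.delete(root); // return only the neighborhood, not the root itself
  return seen;
}
```

The `seen` set doubles as cycle protection, so the walk terminates even on graphs with back-edges.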
Facts are pulled in separately when the agent needs durable preferences ("user prefers morning workouts"). The fact-extract workflow runs asynchronously on each conversation turn.
:::note
Mastra's built-in `semanticRecall` is disabled. It would duplicate work (the knowledge graph already handles semantic recall) and require a second embedder config. The `semanticRecall: { ... }` block was removed from the `Memory` constructor.
:::
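For reference, a hedged sketch of what the trimmed constructor might look like, assuming Mastra's `Memory` options shape (`lastMessages`, `semanticRecall`) and a `PostgresStore` taking a connection string; the field names are best-effort, not copied from SOMA's source:

```typescript
import { Memory } from "@mastra/memory";
import { PostgresStore } from "@mastra/pg";

// Sketch: conversational memory with semanticRecall left off, since the
// knowledge graph already handles semantic recall through tool calls.
const memory = new Memory({
  storage: new PostgresStore({ connectionString: process.env.DATABASE_URL! }),
  options: {
    lastMessages: 12,      // matches the retention in the table above
    semanticRecall: false, // no second embedder; graph tools cover recall
  },
});
```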
## Why two layers
Conversational memory answers "what did I just say two messages ago?". Knowledge memory answers "what do I know about Atomic Habits?". They have different:
- Lifetimes — conversational is bounded (12 messages); knowledge is unbounded.
- Granularity — conversational stores raw messages; knowledge stores extracted entities + facts.
- Access patterns — conversational is read back in full on every turn; knowledge is queried by semantic similarity.
Collapsing them would require either caching raw messages indefinitely (storage bloat) or discarding them immediately (losing multi-turn context). The split keeps both paths optimal.