Overview
System architecture
Topology, request flow, and where each surface lives.
SOMA runs entirely on Vercel. Every compute surface — RSC pages, server actions, streaming chat, Inngest workflow handler, Telegram webhook, Gmail Pub/Sub webhook — is a Next.js route in apps/web. No Docker, no separate worker processes, no Fly.
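Each of these surfaces is an ordinary App Router route handler. As an illustration only (the agent call is stubbed and every name here is hypothetical, not SOMA's actual code), the Telegram webhook surface could look roughly like:

```typescript
// Sketch of a Telegram webhook as a plain Next.js route handler,
// e.g. apps/web/app/api/telegram/webhook/route.ts.
// `runAgent` is a stand-in; the real handler wires in @soma/agent.
async function runAgent(text: string): Promise<string> {
  return `echo: ${text}`; // illustrative stub
}

export async function POST(req: Request): Promise<Response> {
  const update = await req.json();
  const text: string | undefined = update?.message?.text;
  const chatId: number | undefined = update?.message?.chat?.id;
  if (!text || !chatId) {
    // Telegram expects a 2xx even for updates we ignore,
    // otherwise it retries the webhook.
    return new Response("ok", { status: 200 });
  }
  const reply = await runAgent(text);
  // Telegram allows answering the webhook request with a method call
  // directly, avoiding a second round trip to api.telegram.org.
  return Response.json({ method: "sendMessage", chat_id: chatId, text: reply });
}
```

Because the handler is just request/response over Web-standard `Request`/`Response`, it deploys as a normal Vercel function alongside the pages.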
flowchart LR
subgraph External
TG["Telegram<br/>@happ_soma_bot"]
GPS["Google Pub/Sub<br/>(gmail)"]
INN["Inngest Cloud<br/>(cron + events)"]
end
subgraph Vercel["Vercel · soma-ai.cc"]
direction TB
Pages["RSC pages /app/*"]
Chat["/api/chat"]
InngestR["/api/inngest"]
Bot["/api/telegram/webhook"]
Gmail["/api/webhooks/gmail"]
Mastra["Mastra agent<br/>@soma/agent"]
Tools["@soma/tools"]
end
subgraph Services
SB[("Supabase<br/>Postgres + pgvector<br/>Auth")]
LLM["Anthropic<br/>Claude 4.5"]
Voy["Voyage AI<br/>embed + rerank"]
GG["Google APIs<br/>Gmail + Calendar"]
end
subgraph Observability
Lf["Langfuse"]
Sn["Sentry"]
Ph["PostHog"]
Ax["Axiom"]
end
TG --> Bot
GPS --> Gmail
INN --> InngestR
Bot --> Mastra
Chat --> Mastra
InngestR --> Tools
Gmail -.->|fire event| INN
Mastra --> Tools
Mastra --> LLM
Tools --> Voy
Tools --> SB
Tools --> GG
Pages --> SB
Mastra -.-> Lf
Vercel -.-> Sn
Pages -.-> Ph
Vercel -.-> Ax

Request flow — user chat
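The sequence diagram below traces one chat turn. Its memory_recall leg (embed the query, over-fetch ~3× candidates by cosine distance from pgvector, rerank with Voyage, keep top-K) can be sketched in isolation; everything here is a stand-in — the real tool calls Voyage embeddings, `rerank-2`, and Supabase Postgres:

```typescript
type Candidate = { id: string; text: string; score?: number };

// In-memory stand-in corpus; the real store is Supabase pgvector.
const CORPUS: Candidate[] = [
  { id: "m1", text: "user prefers morning meetings" },
  { id: "m2", text: "user is allergic to peanuts" },
  { id: "m3", text: "project deadline is friday" },
];

// Stand-in for Voyage embeddings: bag-of-words, just for the sketch.
function embed(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

function overlap(a: Set<string>, b: Set<string>): number {
  let n = 0;
  for (const w of a) if (b.has(w)) n++;
  return n;
}

// Stand-in for the pgvector query
// (`SELECT ... ORDER BY embedding <=> $1 LIMIT $2`).
function cosineSearch(queryVec: Set<string>, limit: number): Candidate[] {
  return CORPUS.map((c) => ({ ...c, score: overlap(queryVec, embed(c.text)) }))
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0))
    .slice(0, limit);
}

// Stand-in for Voyage rerank-2: re-score candidates against the raw query.
function rerank(query: string, cands: Candidate[]): Candidate[] {
  const q = embed(query);
  return [...cands].sort(
    (a, b) => overlap(q, embed(b.text)) - overlap(q, embed(a.text)),
  );
}

// The tool's shape: over-fetch 3x by vector distance, rerank, keep top-K.
export function memoryRecall(query: string, k = 1): Candidate[] {
  const candidates = cosineSearch(embed(query), k * 3);
  return rerank(query, candidates).slice(0, k);
}
```

The 3× over-fetch is what the diagram's "top-N candidates (3x)" step refers to: the cheap vector search casts a wide net, and the reranker pays the quality back on a small set.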
sequenceDiagram
participant U as User
participant Web as apps/web /api/chat
participant Agent as somaAgent (Mastra)
participant Claude
participant Tools as @soma/tools
participant PG as Postgres + pgvector
participant Lf as Langfuse
U->>Web: POST message
Web->>Lf: startTrace(web.chat)
Web->>Agent: stream(messages)
Agent->>Claude: completion with tool schemas
Claude-->>Agent: tool_call memory_recall(query)
Agent->>Tools: memory_recall.execute(query)
Tools->>Tools: embed(query) via Voyage
Tools->>PG: SELECT ... ORDER BY embedding cosine
PG-->>Tools: top-N candidates (3x)
Tools->>Tools: rerank via Voyage rerank-2
Tools-->>Agent: top-K reranked
Agent->>Claude: continuation with context
Claude-->>Agent: final text
Agent-->>Web: streaming chunks
Web-->>U: SSE tokens
Web->>Lf: trace.end and flush

Why Vercel-only
SOMA has zero long-running or stateful workloads. Every compute surface is request/response:
- Chat streams are ≤ 10s (Claude latency dominates)
- Inngest workflow steps are ≤ 5min (well within Vercel Pro's per-invocation limit)
- Telegram webhook is a single POST roundtrip
- No WebSockets, no persistent connections, no GPU inference
For this profile, Vercel wins on every axis that matters: deploy time (30s vs Docker 5-8min), dev/prod parity (one runtime), maintenance (one dashboard), cost ($45/mo all-in).
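When a route does need a longer ceiling than the default, Vercel reads per-function limits from `vercel.json`; a sketch (the route path is illustrative):

```json
{
  "functions": {
    "app/api/inngest/route.ts": {
      "maxDuration": 300
    }
  }
}
```

That 300-second cap is what keeps the ≤ 5-minute Inngest steps inside a single invocation.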
The reasoning behind this decision is captured in Migration history.