Overview
System architecture
Topology, request flow, and where each surface lives.
SOMA runs entirely on Vercel. Every compute surface — RSC pages, server actions, streaming chat, Inngest workflow handler, Telegram webhook, Gmail Pub/Sub webhook — is a Next.js route in apps/web. No Docker, no separate worker processes, no Fly.
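Each of these surfaces is an ordinary App Router route handler. As an illustration only (the agent call is stubbed and every name here is hypothetical, not SOMA's actual code), the Telegram webhook surface could look roughly like:

```typescript
// Sketch of a Telegram webhook as a plain Next.js route handler,
// e.g. apps/web/app/api/telegram/webhook/route.ts.
// `runAgent` is a stand-in; the real handler wires in @soma/agent.
async function runAgent(text: string): Promise<string> {
  return `echo: ${text}`; // illustrative stub
}

export async function POST(req: Request): Promise<Response> {
  const update = await req.json();
  const text: string | undefined = update?.message?.text;
  const chatId: number | undefined = update?.message?.chat?.id;
  if (!text || !chatId) {
    // Telegram expects a 2xx even for updates we ignore,
    // otherwise it retries the webhook.
    return new Response("ok", { status: 200 });
  }
  const reply = await runAgent(text);
  // Telegram allows answering the webhook request with a method call
  // directly, avoiding a second round trip to api.telegram.org.
  return Response.json({ method: "sendMessage", chat_id: chatId, text: reply });
}
```

Because the handler is just request/response over Web-standard `Request`/`Response`, it deploys as a normal Vercel function alongside the pages.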
flowchart LR
subgraph External
TG["Telegram<br/>@happ_soma_bot"]
GPS["Google Pub/Sub<br/>(gmail)"]
INN["Inngest Cloud<br/>(cron + events)"]
end
subgraph Vercel["Vercel · soma-ai.cc"]
direction TB
Pages["RSC pages /app/*"]
Chat["/api/chat"]
InngestR["/api/inngest"]
Bot["/api/telegram/webhook"]
Gmail["/api/webhooks/gmail"]
Mastra["Mastra agent<br/>@soma/agent"]
Tools["@soma/tools"]
end
subgraph Services
SB[("Supabase<br/>Postgres + pgvector<br/>Auth")]
LLM["Anthropic<br/>Claude 4.5"]
Voy["Voyage AI<br/>embed + rerank"]
GG["Google APIs<br/>Gmail + Calendar"]
end
subgraph Observability
Lf["Langfuse"]
Sn["Sentry"]
Ph["PostHog"]
Ax["Axiom"]
end
TG --> Bot
GPS --> Gmail
INN --> InngestR
Bot --> Mastra
Chat --> Mastra
InngestR --> Tools
Gmail -.->|fire event| INN
Mastra --> Tools
Mastra --> LLM
Tools --> Voy
Tools --> SB
Tools --> GG
Pages --> SB
Mastra -.-> Lf
Vercel -.-> Sn
Pages -.-> Ph
Vercel -.-> Ax

Request flow — user chat
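The sequence diagram below traces one chat turn. Its memory_recall leg (embed the query, over-fetch ~3× candidates by cosine distance from pgvector, rerank with Voyage, keep top-K) can be sketched in isolation; everything here is a stand-in — the real tool calls Voyage embeddings, `rerank-2`, and Supabase Postgres:

```typescript
type Candidate = { id: string; text: string; score?: number };

// In-memory stand-in corpus; the real store is Supabase pgvector.
const CORPUS: Candidate[] = [
  { id: "m1", text: "user prefers morning meetings" },
  { id: "m2", text: "user is allergic to peanuts" },
  { id: "m3", text: "project deadline is friday" },
];

// Stand-in for Voyage embeddings: bag-of-words, just for the sketch.
function embed(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

function overlap(a: Set<string>, b: Set<string>): number {
  let n = 0;
  for (const w of a) if (b.has(w)) n++;
  return n;
}

// Stand-in for the pgvector query
// (`SELECT ... ORDER BY embedding <=> $1 LIMIT $2`).
function cosineSearch(queryVec: Set<string>, limit: number): Candidate[] {
  return CORPUS.map((c) => ({ ...c, score: overlap(queryVec, embed(c.text)) }))
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0))
    .slice(0, limit);
}

// Stand-in for Voyage rerank-2: re-score candidates against the raw query.
function rerank(query: string, cands: Candidate[]): Candidate[] {
  const q = embed(query);
  return [...cands].sort(
    (a, b) => overlap(q, embed(b.text)) - overlap(q, embed(a.text)),
  );
}

// The tool's shape: over-fetch 3x by vector distance, rerank, keep top-K.
export function memoryRecall(query: string, k = 1): Candidate[] {
  const candidates = cosineSearch(embed(query), k * 3);
  return rerank(query, candidates).slice(0, k);
}
```

The 3× over-fetch is what the diagram's "top-N candidates (3x)" step refers to: the cheap vector search casts a wide net, and the reranker pays the quality back on a small set.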
sequenceDiagram
participant U as User
participant Web as apps/web /api/chat
participant Agent as somaAgent (Mastra)
participant Claude
participant Tools as @soma/tools
participant PG as Postgres + pgvector
participant Lf as Langfuse
U->>Web: POST message
Web->>Lf: startTrace(web.chat)
Web->>Agent: stream(messages)
Agent->>Claude: completion with tool schemas
Claude-->>Agent: tool_call memory_recall(query)
Agent->>Tools: memory_recall.execute(query)
Tools->>Tools: embed(query) via Voyage
Tools->>PG: SELECT ... ORDER BY embedding cosine
PG-->>Tools: top-N candidates (3x)
Tools->>Tools: rerank via Voyage rerank-2
Tools-->>Agent: top-K reranked
Agent->>Claude: continuation with context
Claude-->>Agent: final text
Agent-->>Web: streaming chunks
Web-->>U: SSE tokens
Web->>Lf: trace.end and flush

Why Vercel-only
SOMA has zero long-running or stateful workloads. Every compute surface is request/response:
- Chat streams are ≤ 10s (Claude latency dominates)
- Inngest workflow steps are ≤ 5min (well within Vercel Pro's per-invocation limit)
- Telegram webhook is a single POST roundtrip
- No WebSockets, no persistent connections, no GPU inference
For this profile, Vercel wins on every axis that matters: deploy time (30s vs Docker 5-8min), dev/prod parity (one runtime), maintenance (one dashboard), cost ($45/mo all-in).
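When a route does need a longer ceiling than the default, Vercel reads per-function limits from `vercel.json`; a sketch (the route path is illustrative):

```json
{
  "functions": {
    "app/api/inngest/route.ts": {
      "maxDuration": 300
    }
  }
}
```

That 300-second cap is what keeps the ≤ 5-minute Inngest steps inside a single invocation.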
The reasoning behind this decision is captured in Migration history.