
When we designed DAISI’s host-to-consumer pipeline, we knew one thing for certain: REST alone wasn’t going to cut it for streaming millions of tokens per second with sub-100 ms latency across a globally distributed network.
That’s why the core of every live inference session in DAISI runs on gRPC — and the performance difference is night and day.
| Requirement | REST Limitation | gRPC Advantage |
|---|---|---|
| Bidirectional streaming | One request → one response. Streaming requires hacks (SSE, long-poll, WebSockets) | Native bidirectional streams. Tokens flow back instantly as they’re generated. |
| Ultra-low latency | HTTP/1.1 or even HTTP/2 still adds header bloat and connection overhead | Protocol Buffers + HTTP/2 multiplexing → ~60–80 % less overhead than JSON/REST |
| Strong typing & code-gen | Open-ended JSON schemas → runtime errors, manual docs | .proto contracts → auto-generated, type-safe clients in C#, Go, Python, JS, etc. (sketched below) |
| Header compression | Repeated JSON keys and headers on every message | HPACK compression shrinks repeated metadata (session IDs, auth, etc.) |
| Flow control & backpressure | No built-in mechanism — clients can be overwhelmed | Built-in flow control prevents hosts from being flooded |
| Binary efficiency | Text-based JSON → 2–5× larger payloads | Protobuf binary → tiny messages, critical when streaming thousands of tokens/s |
| Keep-alive & multiplexing | One in-flight request per connection, or expensive connection pooling | Many logical streams multiplexed over a single long-lived TCP connection |
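To make the streaming and code-gen rows above concrete, here’s a sketch of what a contract like this could look like. The service and message names are illustrative only — they are not DAISI’s actual API:

```protobuf
// Illustrative sketch only; not DAISI's real service or message names.
syntax = "proto3";

package inference;

service InferenceSession {
  // One RPC, one long-lived bidirectional stream: prompt chunks go up,
  // generated tokens come back the moment they exist.
  rpc StreamTokens (stream PromptChunk) returns (stream TokenChunk);
}

message PromptChunk {
  string session_id = 1;
  string text = 2;
}

message TokenChunk {
  string token = 1;   // binary protobuf keeps these messages tiny
  uint32 index = 2;
  bool done = 3;      // set on the final chunk of a generation
}
```

Running `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. inference.proto` (from the `grpcio-tools` package) turns this file into the type-safe stubs the table’s third row promises.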
In a typical DAISI inference session, the prompt goes up and tokens stream back over a single long-lived HTTP/2 connection the instant they’re generated, with built-in flow control keeping slow consumers from being flooded.
That simply isn’t achievable at scale with a pure REST design without layering on WebSockets or Server-Sent Events, which add complexity, latency, and failure points we refused to accept.
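For a feel of what that looks like on the consumer side, here’s a minimal client sketch using grpcio’s asyncio API against the hypothetical contract above. The endpoint address and prompt are placeholders, not DAISI’s real values:

```python
# Minimal bidirectional-streaming client sketch (pip install grpcio),
# assuming stubs generated from the illustrative inference.proto above.
import asyncio

import grpc

import inference_pb2
import inference_pb2_grpc


async def stream_inference(prompt: str) -> None:
    # One long-lived HTTP/2 connection; many logical streams can share it.
    async with grpc.aio.insecure_channel("localhost:50051") as channel:
        stub = inference_pb2_grpc.InferenceSessionStub(channel)

        async def prompts():
            # Request side of the bidirectional stream.
            yield inference_pb2.PromptChunk(session_id="demo", text=prompt)

        # Tokens print the moment the host emits them: no polling, no SSE shims.
        async for chunk in stub.StreamTokens(prompts()):
            if chunk.done:
                break
            print(chunk.token, end="", flush=True)


if __name__ == "__main__":
    asyncio.run(stream_inference("Hello, DAISI"))
```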
We didn’t throw REST out completely: it still earns its keep for the simple request-response work around the edges of the platform.
But the moment real-time inference begins? gRPC takes over.
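The host side of that handoff is just as compact. A sketch under the same hypothetical contract, with a toy echo loop standing in for real model inference:

```python
# Server-side sketch of the illustrative contract. gRPC's HTTP/2 flow
# control applies backpressure automatically: if a consumer reads slowly,
# writes wait instead of flooding the connection.
import asyncio

import grpc

import inference_pb2
import inference_pb2_grpc


class InferenceSession(inference_pb2_grpc.InferenceSessionServicer):
    async def StreamTokens(self, request_iterator, context):
        # Bidirectional stream: read prompt chunks, yield token chunks.
        async for prompt in request_iterator:
            # Toy stand-in for a real model: echo the prompt word by word.
            for i, word in enumerate(prompt.text.split()):
                yield inference_pb2.TokenChunk(token=word + " ", index=i)
            yield inference_pb2.TokenChunk(done=True)


async def serve() -> None:
    server = grpc.aio.server()
    inference_pb2_grpc.add_InferenceSessionServicer_to_server(
        InferenceSession(), server
    )
    server.add_insecure_port("[::]:50051")
    await server.start()
    await server.wait_for_termination()


if __name__ == "__main__":
    asyncio.run(serve())
```

Note what isn’t in that code: no manual chunking, no reconnect logic, no backpressure bookkeeping. The framework handles all of it over one multiplexed connection.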
gRPC isn’t just a nice-to-have for DAISI — it’s the reason we can deliver centralized-grade latency on a fully decentralized, peer-to-peer network while keeping payload sizes tiny and CPU usage on host devices minimal.
When you’re streaming intelligence to thousands of concurrent users from everyday laptops and phones, every millisecond and every byte counts.
That’s why we bet on gRPC — and why the numbers speak for themselves.
Curious what sub-100 ms global inference feels like?
Alpha is still free and unlimited — come build with us at daisi.ai.
— David
(Still measuring everything in milliseconds and proud of it)
Get early access when it becomes available in January 2026.