Why DAISI Chose gRPC Over REST for Real-Time AI Inference

11/7/2025 David Graham

When we designed DAISI’s host-to-consumer pipeline, we knew one thing for certain: REST alone wasn’t going to cut it for streaming millions of tokens per second with sub-100 ms latency across a globally distributed network.

That’s why the core of every live inference session in DAISI runs on gRPC — and the performance difference is night-and-day.

The Key Advantages We Needed — and Why gRPC Delivers

| Requirement | REST Limitation | gRPC Advantage |
|---|---|---|
| Bidirectional streaming | One request → one response; streaming requires hacks (SSE, long-polling, WebSockets) | Native bidirectional streams; tokens flow back instantly as they're generated |
| Ultra-low latency | HTTP/1.1 (and even HTTP/2) still adds header bloat and connection overhead | Protocol Buffers + HTTP/2 multiplexing → roughly 60–80% less overhead than JSON/REST |
| Strong typing & code-gen | Open-ended JSON schemas → runtime errors, manual docs | .proto contracts → auto-generated, type-safe clients in C#, Go, Python, JS, etc. |
| Header compression | Repeated JSON keys and headers on every message | HPACK compression shrinks repeated metadata (session IDs, auth, etc.) |
| Flow control & backpressure | No built-in mechanism; clients can be overwhelmed | Built-in flow control prevents hosts from being flooded |
| Binary efficiency | Text-based JSON → 2–5× larger payloads | Protobuf binary → tiny messages, critical when streaming thousands of tokens/s |
| Keep-alive & multiplexing | One TCP connection per request (or expensive connection pooling) | Multiple logical streams over a single long-lived TCP connection |
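
To make the keep-alive and multiplexing row concrete, here is a minimal sketch in Python with grpcio of the kind of single long-lived channel a consumer holds open. The endpoint address and the specific tuning values are illustrative assumptions, not DAISI's actual configuration; the option names themselves are standard gRPC channel arguments.

```python
import grpc

# Minimal sketch of one long-lived, multiplexed channel. The host address and
# the tuning values below are hypothetical; the channel-argument names are
# standard grpc options.
channel = grpc.secure_channel(
    "host.example.daisi.ai:443",                    # hypothetical host endpoint
    grpc.ssl_channel_credentials(),
    options=[
        ("grpc.keepalive_time_ms", 30_000),         # ping every 30 s while idle
        ("grpc.keepalive_timeout_ms", 10_000),      # drop the link if no ack in 10 s
        ("grpc.keepalive_permit_without_calls", 1), # keep pinging between requests
        ("grpc.max_receive_message_length", 4 * 1024 * 1024),
    ],
)
# Every inference stream in the session is multiplexed over this one TCP
# connection instead of opening a new connection per request.
```

Holding one channel open per session is what lets HTTP/2 multiplex many logical streams without paying a fresh handshake each time, and the keep-alive pings stop residential NATs from silently dropping an idle connection.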

Real-World Impact on DAISI Sessions

In a typical DAISI inference session:

  • A consumer opens a single gRPC channel
  • The orchestrator hands off a direct encrypted gRPC stream to the chosen host
  • Tokens begin flowing back as they’re generated — no polling, no chunked encoding tricks
  • Median consumer-to-host round-trip stays under 100 ms even on residential connections

That simply isn’t achievable at scale with a pure REST design without layering on WebSockets or Server-Sent Events — which add complexity, latency, and failure points we refused to accept.

Where We Still Use REST

We didn’t throw REST out completely:

  • Public website, account management, payouts dashboard → classic REST + OpenAPI
  • Light-weight health checks and discovery → simple HTTPS endpoints

But the moment real-time inference begins? gRPC takes over.

Bottom Line

gRPC isn’t just a nice-to-have for DAISI — it’s the reason we can deliver centralized-grade latency on a fully decentralized, peer-to-peer network while keeping payload sizes tiny and CPU usage on host devices minimal.

When you’re streaming intelligence to thousands of concurrent users from everyday laptops and phones, every millisecond and every byte counts.

That’s why we bet on gRPC — and why the numbers speak for themselves.

Curious what sub-100 ms global inference feels like?
Alpha is still free and unlimited — come build with us at daisi.ai.

— David
(Still measuring everything in milliseconds and proud of it)

