
When we designed DAISI’s host-to-consumer pipeline, we knew one thing for certain: REST alone wasn’t going to cut it for streaming millions of tokens per second with sub-100 ms latency across a globally distributed network.
That’s why the core of every live inference session in DAISI runs on gRPC — and the performance difference is night and day.
| Requirement | REST Limitation | gRPC Advantage |
|---|---|---|
| Bidirectional streaming | One request → one response. Streaming requires hacks (SSE, long-poll, WebSockets) | Native bidirectional streams. Tokens flow back instantly as they’re generated. |
| Ultra-low latency | HTTP/1.1 or even HTTP/2 still adds header bloat and connection overhead | Protocol Buffers + HTTP/2 multiplexing → ~60–80 % less overhead than JSON/REST |
| Strong typing & code-gen | Open-ended JSON schemas → runtime errors, manual docs | .proto contracts → auto-generated, type-safe clients in C#, Go, Python, JS, etc. (sketched below) |
| Header compression | Repeated JSON keys and headers on every message | HPACK compression shrinks repeated metadata (session IDs, auth, etc.) |
| Flow control & backpressure | No built-in mechanism — clients can be overwhelmed | Built-in flow control prevents hosts from being flooded |
| Binary efficiency | Text-based JSON → 2–5× larger payloads | Protobuf binary → tiny messages, critical when streaming thousands of tokens/s |
| Keep-alive & multiplexing | One in-flight request per connection, or expensive connection pooling | Many logical streams multiplexed over a single long-lived TCP connection |
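To make the streaming and code-gen rows above concrete, here’s a sketch of what a contract like this could look like. The service and message names are illustrative only — they are not DAISI’s actual API:

```protobuf
// Illustrative sketch only; not DAISI's real service or message names.
syntax = "proto3";

package inference;

service InferenceSession {
  // One RPC, one long-lived bidirectional stream: prompt chunks go up,
  // generated tokens come back the moment they exist.
  rpc StreamTokens (stream PromptChunk) returns (stream TokenChunk);
}

message PromptChunk {
  string session_id = 1;
  string text = 2;
}

message TokenChunk {
  string token = 1;   // binary protobuf keeps these messages tiny
  uint32 index = 2;
  bool done = 3;      // set on the final chunk of a generation
}
```

Running `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. inference.proto` (from the `grpcio-tools` package) turns this file into the type-safe stubs the table’s third row promises.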
In a typical DAISI inference session, the prompt goes up and tokens stream back over a single long-lived HTTP/2 connection the instant they’re generated, with built-in flow control keeping slow consumers from being flooded.
That simply isn’t achievable at scale with a pure REST design without layering on WebSockets or Server-Sent Events, which add complexity, latency, and failure points we refused to accept.
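For a feel of what that looks like on the consumer side, here’s a minimal client sketch using grpcio’s asyncio API against the hypothetical contract above. The endpoint address and prompt are placeholders, not DAISI’s real values:

```python
# Minimal bidirectional-streaming client sketch (pip install grpcio),
# assuming stubs generated from the illustrative inference.proto above.
import asyncio

import grpc

import inference_pb2
import inference_pb2_grpc


async def stream_inference(prompt: str) -> None:
    # One long-lived HTTP/2 connection; many logical streams can share it.
    async with grpc.aio.insecure_channel("localhost:50051") as channel:
        stub = inference_pb2_grpc.InferenceSessionStub(channel)

        async def prompts():
            # Request side of the bidirectional stream.
            yield inference_pb2.PromptChunk(session_id="demo", text=prompt)

        # Tokens print the moment the host emits them: no polling, no SSE shims.
        async for chunk in stub.StreamTokens(prompts()):
            if chunk.done:
                break
            print(chunk.token, end="", flush=True)


if __name__ == "__main__":
    asyncio.run(stream_inference("Hello, DAISI"))
```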
We didn’t throw REST out completely: it still earns its keep for the simple request-response work around the edges of the platform.
But the moment real-time inference begins? gRPC takes over.
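The host side of that handoff is just as compact. A sketch under the same hypothetical contract, with a toy echo loop standing in for real model inference:

```python
# Server-side sketch of the illustrative contract. gRPC's HTTP/2 flow
# control applies backpressure automatically: if a consumer reads slowly,
# writes wait instead of flooding the connection.
import asyncio

import grpc

import inference_pb2
import inference_pb2_grpc


class InferenceSession(inference_pb2_grpc.InferenceSessionServicer):
    async def StreamTokens(self, request_iterator, context):
        # Bidirectional stream: read prompt chunks, yield token chunks.
        async for prompt in request_iterator:
            # Toy stand-in for a real model: echo the prompt word by word.
            for i, word in enumerate(prompt.text.split()):
                yield inference_pb2.TokenChunk(token=word + " ", index=i)
            yield inference_pb2.TokenChunk(done=True)


async def serve() -> None:
    server = grpc.aio.server()
    inference_pb2_grpc.add_InferenceSessionServicer_to_server(
        InferenceSession(), server
    )
    server.add_insecure_port("[::]:50051")
    await server.start()
    await server.wait_for_termination()


if __name__ == "__main__":
    asyncio.run(serve())
```

Note what isn’t in that code: no manual chunking, no reconnect logic, no backpressure bookkeeping. The framework handles all of it over one multiplexed connection.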
gRPC isn’t just a nice-to-have for DAISI — it’s the reason we can deliver centralized-grade latency on a fully decentralized, peer-to-peer network while keeping payload sizes tiny and CPU usage on host devices minimal.
When you’re streaming intelligence to thousands of concurrent users from everyday laptops and phones, every millisecond and every byte counts.
That’s why we bet on gRPC — and why the numbers speak for themselves.
Curious what sub-100 ms global inference feels like?
Alpha is still free and unlimited — come build with us at daisi.ai.
— David
(Still measuring everything in milliseconds and proud of it)
Get early access when it becomes available in January 2026.