Browser Host
Run a DAISI host directly in your browser using WebGPU. No downloads, no install — just open a tab and start hosting.
What is Browser Host?
Browser Host turns any browser tab into a fully functional DAISI host. It loads GGUF models, runs inference entirely on your GPU via WebGPU compute shaders, and connects directly to the ORC (Orchestrator) via gRPC-web, with no intermediary server in between.
Your data stays on your device. The model runs locally in your browser. Only inference commands and responses travel over the network to the ORC.
Requirements
| Requirement | Details |
|---|---|
| Browser | Chrome 113+ or Edge 113+ (WebGPU required) |
| GPU | Any GPU with WebGPU support (NVIDIA, AMD, Intel, Apple Silicon) |
| VRAM | Minimum 1 GB free. Larger models need more — see VRAM table below. |
| Account | A DAISI account at manager.daisinet.com |
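The browser requirement can be checked programmatically before anything is downloaded. A minimal sketch (TypeScript, using only the standard WebGPU API; the wording of the status strings is illustrative):

```typescript
// Returns a human-readable WebGPU availability status for the
// current environment. In a non-WebGPU environment (e.g. Node,
// or a browser older than Chrome/Edge 113) navigator.gpu is
// undefined and the first branch is taken.
async function checkWebGpu(): Promise<string> {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) {
    return "WebGPU not available: use Chrome 113+ or Edge 113+";
  }
  const adapter = await gpu.requestAdapter();
  if (!adapter) {
    return "WebGPU present, but no suitable GPU adapter was found";
  }
  return "WebGPU ready";
}

checkWebGpu().then((status) => console.log(status));
```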
Getting Started
- Navigate to Browser Host in the Manager sidebar.
- The page will detect your GPU and display adapter information (device, architecture, features).
- Select a model from the dropdown. The first download may take a few minutes depending on your connection — once cached, future loads are instant.
- Once loaded, the host automatically creates a host identity and connects to the ORC.
- Your browser tab is now a live DAISI host, processing inference requests from the network.
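The adapter readout in step 2 can be sketched as follows. `GPUAdapterInfo` fields (`vendor`, `architecture`, `device`) and the adapter's `features` set are standard WebGPU, though the values reported are browser- and driver-dependent; `adapter.info` is the current spec attribute (older Chrome versions exposed `requestAdapterInfo()` instead):

```typescript
// Queries the WebGPU adapter and summarizes the fields the
// Browser Host page displays (device, architecture, features).
// Returns null when WebGPU or an adapter is unavailable.
async function adapterSummary(): Promise<string | null> {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return null;
  const adapter = await gpu.requestAdapter();
  if (!adapter) return null;
  const info = adapter.info; // GPUAdapterInfo
  const features = [...adapter.features].join(", ");
  return `${info.vendor} / ${info.architecture} / ${info.device} [${features}]`;
}
```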
VRAM Requirements
VRAM usage depends on the model size and quantization. The browser host estimates VRAM before downloading and warns if a model may not fit.
| Model | Quant | Approx. VRAM |
|---|---|---|
| Qwen 0.6B | Q4_0 | ~500 MB |
| Qwen 0.6B | Q8_0 | ~800 MB |
| TinyLlama 1.1B | Q8_0 | ~1.5 GB |
| Llama 3.2 1B | Q4_0 | ~1 GB |
| Llama 3.2 1B | Q8_0 | ~1.8 GB |
WebGPU has a per-buffer limit of 2 GB, but total VRAM usage can exceed this since the engine uses multiple buffers.
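As a rough heuristic, weight memory is parameter count times bytes per weight for the quantization (a GGUF Q4_0 block packs 32 weights into 18 bytes; Q8_0 packs 32 weights into 34 bytes), plus working overhead for activations and the KV cache. The sketch below illustrates that arithmetic; it is not the engine's actual estimator, and the overhead constant is an assumption:

```typescript
// Approximate bytes per weight for GGUF block quantizations:
// Q4_0: 18 bytes / 32 weights, Q8_0: 34 bytes / 32 weights.
const BYTES_PER_WEIGHT: Record<string, number> = {
  Q4_0: 18 / 32, // 0.5625
  Q8_0: 34 / 32, // 1.0625
};

// Hypothetical fixed overhead for activations, KV cache, and
// shader workspace; the real engine measures this per model.
const OVERHEAD_BYTES = 150 * 1024 * 1024;

function estimateVramBytes(params: number, quant: string): number {
  const bpw = BYTES_PER_WEIGHT[quant];
  if (bpw === undefined) throw new Error(`unknown quant: ${quant}`);
  return params * bpw + OVERHEAD_BYTES;
}

// Example: a 0.6B-parameter model at Q4_0 lands near the
// ~500 MB figure in the table above.
console.log((estimateVramBytes(0.6e9, "Q4_0") / 1e6).toFixed(0), "MB");
```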
Going Online
When the host connects to the ORC, it becomes available for inference requests from the network. Here's what happens:
- **Connection:** The browser connects directly to the ORC via gRPC-web. There is no server proxy; your browser talks to the ORC the same way native hosts do.
- **Heartbeats:** The host sends a heartbeat every 60 seconds with model information. The ORC uses these to know that your host is alive and which models you have loaded.
- **Auto-Reconnect:** If the connection drops, the host reconnects automatically with exponential backoff (up to 10 retries). No manual intervention is needed.
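The reconnect behavior amounts to a capped exponential backoff. A sketch of the delay schedule, assuming a 1-second base and a 60-second cap (the document specifies only the retry count, so those two constants are illustrative):

```typescript
// Delay (ms) before reconnect attempt n (0-based): doubles each
// attempt from a base delay and is clamped to a maximum.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// The host gives up after 10 retries. With the constants above:
// [1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000, 60000, 60000]
const schedule = Array.from({ length: 10 }, (_, i) => backoffDelayMs(i));
```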
Private Chat and ORC Chat
Use the Private Chat button to test the model locally. Your conversation runs entirely on your device — nothing is sent to a server. This is always free and always private.
Use the ORC Chat button to test the full pipeline: your prompt goes through the ORC, gets routed to your browser host, inference runs on your GPU, and tokens stream back through the ORC.
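In JavaScript, a token stream like the ORC Chat round trip typically surfaces as an async iterable of messages. The sketch below consumes such a stream against a mock; the `TokenChunk` shape is hypothetical, standing in for whatever message type a real client generated from the DAISI protos would define:

```typescript
// Hypothetical shape of one streamed inference chunk.
interface TokenChunk {
  token: string;
  done: boolean;
}

// Concatenates streamed tokens into the final response text.
async function collectTokens(stream: AsyncIterable<TokenChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.token;
    if (chunk.done) break;
  }
  return text;
}

// Mock stream standing in for tokens relayed back through the ORC.
async function* mockStream(): AsyncGenerator<TokenChunk> {
  for (const t of ["Hel", "lo", "!"]) yield { token: t, done: false };
  yield { token: "", done: true };
}
```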
Supported Models
Browser Host supports GGUF models with the following configurations:
| Feature | Support |
|---|---|
| Architecture | Llama, Qwen, Mistral (and compatible) |
| GPU Quantization | Q4_0, Q8_0 (native GPU shaders) |
| CPU Dequant | F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q4_K, Q6_K |
| Chat Templates | Automatic — reads Jinja2 template from GGUF metadata |
| Attention | Multi-head + Grouped Query Attention (GQA) |
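Grouped Query Attention shares each key/value head across a group of query heads, which is what lets GQA models keep a much smaller KV cache than standard multi-head attention. The mapping is simple integer arithmetic, sketched here (the head counts in the example are illustrative; real values come from the GGUF metadata):

```typescript
// In GQA, query head q uses key/value head
// floor(q / (nHeads / nKvHeads)). When nKvHeads === nHeads this
// reduces to standard multi-head attention (one KV head per query head).
function kvHeadFor(queryHead: number, nHeads: number, nKvHeads: number): number {
  if (nHeads % nKvHeads !== 0) {
    throw new Error("nHeads must be divisible by nKvHeads");
  }
  return Math.floor(queryHead / (nHeads / nKvHeads));
}

// Example: with 32 query heads and 8 KV heads, query heads 0-3
// share KV head 0, heads 4-7 share KV head 1, and so on.
```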
Troubleshooting
- **"WebGPU not available":** Make sure you're using Chrome 113+ or Edge 113+. On some systems you may need to enable WebGPU in chrome://flags.
- **"Model too large":** Try a smaller model or a more compressed quantization (Q4_0 uses roughly half the VRAM of Q8_0), and close other GPU-intensive tabs.
- **"Disconnected from ORC":** The host will auto-reconnect. If it doesn't, reload the page and check that the ORC address in the Manager settings is correct.
- **Slow inference:** Use Q8_0 models for the best quality/speed tradeoff; larger models are slower. Close other tabs that may be using the GPU.