selfHostedLLM Archives - Modular Technology Group

AI Workspaces, Tips & Tricks

The Model That Barely Slows Down: Gemma 4 26B vs Qwen 3.6 35B at Long Context
We ran Gemma 4 26B and Qwen 3.6 35B-A3B head-to-head on the same server, same quantization, same protocol. Gemma 4 is 3.7× faster at 32k context — and 7.2× faster at 128k. The gap widens with context, and the reason reveals something important about model selection for long-context workloads.

AI Workspaces, Privacy, Security

The Market Is Moving to Local AI. Here’s Why Modular Bet on It Early.