>>> GOOGLE TURBOQUANT — INFERENCE EFFICIENCY BREAKTHROUGH

6x Less KV Cache Memory | Up to 8x Attention Speedup | No Accuracy Loss on Reported Benchmarks

■ WHAT IT IS

Training-free compression algo targeting the KV cache (the per-token key/value state an LLM keeps in memory during inference).

Compresses each cache value from 16 bits to 3 bits. No retraining required.

Drop-in on production models. PyTorch/MLX ports live within 24h of release.
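To put the 16-bit vs 3-bit figure in memory terms, here is a rough sizing sketch in Python. The model shape (32 layers, 8 KV heads, head dim 128, 128K-token context) is a hypothetical chosen for round numbers, not any specific production model, and `kv_cache_bytes` is an illustrative helper that counts payload bits only.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits_per_value):
    """Payload size of a KV cache for one sequence, in bytes."""
    values = 2 * layers * kv_heads * head_dim * seq_len  # 2 = keys + values
    return values * bits_per_value / 8


GIB = 1024 ** 3

# Hypothetical model shape, picked only to make the arithmetic easy to follow.
cfg = dict(layers=32, kv_heads=8, head_dim=128, seq_len=128 * 1024)

baseline = kv_cache_bytes(**cfg, bits_per_value=16)
for bits in (16, 4, 3, 2.5):
    size = kv_cache_bytes(**cfg, bits_per_value=bits)
    print(f"{bits:>4} bits: {size / GIB:6.2f} GiB  ({baseline / size:.2f}x smaller)")
```

Under these assumptions the cache for one 128K-token sequence drops from 16 GiB at 16 bits to 3 GiB at 3 bits. The raw bit-width ratio is 16/3 ≈ 5.3x. The sketch models nothing beyond payload bits.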

■ HOW IT WORKS

Stage 1 – PolarQuant: rotates KV vectors into polar coords, eliminating per-block normalization overhead (the 1-2 extra bits per value that block-wise quantizers spend on stored scales and zero-points, which erodes most compression gains).
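A minimal one-level sketch of the polar idea in Python/NumPy, under stated assumptions: it applies a random rotation, pairs up coordinates, and snaps each pair's angle to a fixed uniform grid. The function names are illustrative, and the radii are left in floating point for brevity, so this is not the published PolarQuant algorithm. It only shows why a fixed angle grid needs no per-block scale or zero-point.

```python
import numpy as np


def random_rotation(d, rng):
    """Random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # flip column signs; q stays orthogonal


def polar_quantize(x, bits):
    """Pair up coordinates, keep radii, snap angles to a data-independent grid."""
    pairs = x.reshape(-1, 2)
    radius = np.hypot(pairs[:, 0], pairs[:, 1])
    angle = np.arctan2(pairs[:, 1], pairs[:, 0])  # in [-pi, pi]
    levels = 2 ** bits
    step = 2 * np.pi / levels
    code = np.floor((angle + np.pi) / step).astype(np.int64) % levels
    return radius, code.astype(np.uint8)


def polar_dequantize(radius, code, bits):
    """Rebuild each pair from its radius and the centre of its angle bin."""
    step = 2 * np.pi / 2 ** bits
    angle = (code + 0.5) * step - np.pi
    pairs = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)
    return pairs.reshape(-1)


rng = np.random.default_rng(0)
d, bits = 128, 3  # assumed head dimension and angle bit width
R = random_rotation(d, rng)
k = rng.standard_normal(d)

radius, code = polar_quantize(R @ k, bits)
k_hat = R.T @ polar_dequantize(radius, code, bits)  # undo the rotation

rel_err = np.linalg.norm(k - k_hat) / np.linalg.norm(k)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The same angle grid serves every vector, so nothing is stored per block. In this sketch the relative error cannot exceed π / 2^bits, because each angle is off by at most half a grid step.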

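Stage 2 – QJL: reduces vectors to sign bits (+1/-1) via a Johnson-Lindenstrauss transform. No quantization constants are stored, so zero memory overhead. Queries stay in high precision, and the estimator that pairs them with the sign-bit keys preserves attention accuracy.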
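The sign-bit trick can be sketched in Python/NumPy as follows. This is an illustration under assumptions, not the released implementation: `qjl_encode` and `qjl_inner_product` are hypothetical names, the projection is a dense Gaussian matrix, and the number of projections `m` is deliberately much larger than a real bit budget would allow so that a single estimate lands visibly close to the exact value.

```python
import numpy as np


def qjl_encode(k, S):
    """Keep only the sign bits of the projected key, plus the key's norm."""
    signs = np.sign(S @ k).astype(np.int8)  # entries are +1 or -1
    return signs, float(np.linalg.norm(k))


def qjl_inner_product(q, signs, k_norm, S):
    """Unbiased estimate of <q, k> from a full-precision query and sign-bit key.

    For Gaussian rows s of S: E[(s.q) * sign(s.k)] = sqrt(2/pi) * <q, k> / ||k||,
    so rescaling the average by sqrt(pi/2) * ||k|| recovers <q, k>.
    """
    m = S.shape[0]
    return float(np.sqrt(np.pi / 2) * k_norm / m * ((S @ q) @ signs))


rng = np.random.default_rng(0)
d, m = 128, 4096  # assumed head dimension; m is oversized for illustration
S = rng.standard_normal((m, d))  # projection matrix shared by keys and queries

k = rng.standard_normal(d)
q = k + 0.5 * rng.standard_normal(d)  # a query correlated with the key

signs, k_norm = qjl_encode(k, S)
est = qjl_inner_product(q, signs, k_norm, S)
exact = float(q @ k)
print(f"exact <q,k> = {exact:.2f}, sign-bit estimate = {est:.2f}")
```

Only the key side is reduced to one bit per projection. The query is projected at full precision, which is what keeps the estimator unbiased. Its standard deviation shrinks as 1/sqrt(m).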

Result: approaches information-theoretic optimum. Online, data-oblivious, accelerator-friendly.

■ BENCHMARKS (H100)

- 4-bit impl: up to 8x speedup on attention logit computation vs unquantized 32-bit keys

- 3-bit: 6x KV cache reduction, zero degradation on LongBench / NIAH / RULER / L-Eval

Community test (MLX / Qwen3.5-35B, 8.5K–64K ctx): 100% output match at 2.5-bit

■ MARKET REACTION

SK Hynix -6.0% KRX

Kioxia -5.9% TSE

SanDisk -5.7% NASDAQ

Samsung -4.9% KRX

WDC -4.7% NASDAQ

Micron -3.0% NASDAQ

■ ANALYST SPLIT

- Wells Fargo (Rocha): "Directly attacking the cost curve. Calls into question how much memory capacity is needed." — bearish on near-term demand; adoption TBD.

- Morgan Stanley: TurboQuant does not touch model weights or training HBM. Bullish.

- SemiAnalysis (Wang): bottleneck relief → more capable models → more hardware.

- Quilter Cheviot: "Evolutionary, not revolutionary." Cyclical sell-off, not structural.

- Cloudflare CEO Prince: "Google's DeepSeek moment."

■ MY VIEW

Overreaction. TurboQuant is inference-only, so training HBM demand (the supercycle driver) is unaffected. Jevons dynamics apply: a 6x smaller KV cache → longer contexts, more agents, more RAG pipelines deployed.

MU/SK Hynix weakness is technical/positional, not a demand inflection.

Watch Q2 datacenter capex guidance from Amazon (AWS), Microsoft (Azure), and Alphabet (GOOGL) as the real signal.
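The Jevons argument above reduces to a break-even calculation, sketched here in Python. The usage-growth multipliers are hypothetical inputs, not forecasts, and the sketch covers only the KV cache share of inference memory.

```python
COMPRESSION = 6.0  # KV cache reduction quoted in this note


def net_kv_memory_demand(usage_growth, compression=COMPRESSION):
    """Relative KV cache memory demand after compression (1.0 = unchanged)."""
    return usage_growth / compression


# Hypothetical growth in tokens held in cache (longer contexts, more agents).
for growth in (2, 6, 10):
    demand = net_kv_memory_demand(growth)
    print(f"usage x{growth}: KV memory demand x{demand:.2f}")
```

Cached-token volume has to grow more than 6x before KV memory demand rises in absolute terms. Below that, the compression is a net reduction for this slice of memory.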