Bookmark

Quantization from the ground up | ngrok blog

A clear explainer on why quantization works and what it trades off. The core idea is not “make the model smaller,” but “store the same ideas with fewer bits” (for example, 8‑bit integers in place of 16‑ or 32‑bit floats), which explains why quality often drops only slightly. This reframing makes the infra trade‑offs concrete: precision vs. speed vs. cost. It’s a good mental model if you ship models rather than just use them, and if you care about cost‑per‑inference, you eventually need this literacy.
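To make the “fewer bits” idea tangible, here is a minimal sketch of symmetric int8 quantization in plain Python. It is my own illustration, not code from the ngrok post: the function names, the single per-tensor scale, and the randomly generated stand-in weights are all assumptions chosen for brevity.

```python
import random


def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole list; real systems often use per-channel or per-block scales.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical stand-in for model weights: small values centered on zero.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1000)]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding moves each value by at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.6f} max_error={max_error:.6f}")
```

Each weight now fits in one byte instead of the four a 32‑bit float needs, and the worst-case rounding error is bounded by half the scale. That bound is the precision side of the trade‑off; the speed and cost sides depend on the hardware and kernels you run on.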

#bookmark