What I'm Reading

Annotated notes on articles, papers, and essays I find worth thinking about.

ai, infrastructure, systems

LMCache: A Journey

Junchen Jiang

A reflective piece on the evolution of LMCache from 2023 to 2026. Chronicles the project’s journey from early research into KV cache systems to an influential open-source project that gained significant industry recognition at NVIDIA’s GTC conference. Thoughtful reflection on timing, community building, and learning through execution.

ai, optimization, inference

Making Deep Learning Go Brrrr From First Principles

Horace He

The best intro I’ve found before diving deep into the LLM inference optimization world. Explains compute, memory bandwidth, and overhead from first principles. Understanding which bottleneck you’re in matters more than blindly tweaking parameters.
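The compute-vs-bandwidth framing can be made concrete with a back-of-envelope check: an operation is compute-bound only when its arithmetic intensity (FLOPs per byte moved) exceeds the hardware's FLOP/s-to-bandwidth ratio. A minimal sketch, with illustrative hardware numbers loosely modeled on an fp16 datacenter GPU (not taken from the article):

```python
# Back-of-envelope bottleneck check for a square matmul.
# Hardware numbers below are illustrative assumptions, not measured values.

PEAK_FLOPS = 312e12           # ~fp16 tensor-core throughput, FLOP/s (assumed)
PEAK_BW = 2.0e12              # ~HBM bandwidth, bytes/s (assumed)
RIDGE = PEAK_FLOPS / PEAK_BW  # ridge point: ~156 FLOPs per byte

def matmul_intensity(n: int, bytes_per_el: int = 2) -> float:
    """FLOPs per byte moved for an n x n x n matmul in fp16 (ideal reuse)."""
    flops = 2 * n ** 3                       # one multiply + one add per MAC
    bytes_moved = 3 * n ** 2 * bytes_per_el  # read A and B, write C
    return flops / bytes_moved

for n in (64, 512, 4096):
    ai = matmul_intensity(n)
    verdict = "compute-bound" if ai > RIDGE else "memory-bound"
    print(f"n={n}: {ai:.0f} FLOPs/byte -> {verdict}")
```

Small matmuls land below the ridge point and are limited by memory traffic, which is why diagnosing the bottleneck has to come before tuning.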

writing, internet

The Case for Blogging in the Ruins

JA Westenberg

Westenberg makes the argument I keep wanting to make but don’t: social platforms optimized for engagement have hollowed out the kind of slow, considered thinking that blogs made possible. The case for owning your words and building something that lasts.

ai, engineering

Prompt caching: 10x cheaper LLM tokens, but how?

Sam Rose

The best kind of explainer. Starts from how transformers actually work, then reveals that prompt caching is just reusing the KV matrices from the attention mechanism. Makes you appreciate why the savings are so dramatic and why prefix structure matters.
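The KV-reuse idea can be sketched in a few lines: the key and value projections of a fixed prefix depend only on the prefix and the weights, so they can be computed once and reused across requests, with only new tokens needing projection. A toy single-head attention in NumPy (all shapes, weights, and helper names here are my own illustration, not from the article):

```python
import numpy as np

# Toy single-head attention showing why prompt caching works: K and V for a
# shared prefix never change, so compute them once and reuse them per request.
rng = np.random.default_rng(0)
d = 8                                          # head dimension (toy value)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def kv(tokens):
    """Project token embeddings to keys and values (the cacheable part)."""
    return tokens @ Wk, tokens @ Wv

def attend(query_tok, K, V):
    """Attention of one new token over all keys/values seen so far."""
    q = query_tok @ Wq
    scores = q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

prefix = rng.standard_normal((100, d))    # shared system prompt, embedded
K_cache, V_cache = kv(prefix)             # computed once, stored across requests

new_tok = rng.standard_normal(d)          # first token of a new request
Kn, Vn = kv(new_tok[None, :])             # only the new token is projected
K = np.vstack([K_cache, Kn])
V = np.vstack([V_cache, Vn])
out = attend(new_tok, K, V)               # matches recomputing from scratch
```

The cached path produces exactly the same output as reprojecting the whole sequence, which is why the savings scale with prefix length and why identical prefix structure is what makes a cache hit possible.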