What I'm Reading
Annotated notes on articles, papers, and essays I find worth thinking about.
LMCache: A Journey
Junchen Jiang
A reflective piece on the evolution of LMCache from 2023 to 2026. Chronicles the project’s journey from early research into KV cache systems to an influential open-source project that gained significant industry recognition at NVIDIA’s GTC conference. Thoughtful reflection on timing, community building, and learning through execution.
Making Deep Learning Go Brrrr From First Principles
Horace He
The best intro I’ve found to read before jumping deep into LLM inference optimization. Explains compute, memory bandwidth, and overhead from first principles. Understanding which bottleneck you’re in matters more than blindly tweaking parameters.
Prompt caching: 10x cheaper LLM tokens, but how?
Sam Rose
The best kind of explainer. Starts from how transformers actually work, then reveals that prompt caching is just reusing the KV matrices from the attention mechanism. Makes you appreciate why the savings are so dramatic and why prefix structure matters.
The Case for Blogging in the Ruins
JA Westenberg
Westenberg makes the argument I keep wanting to make but don’t: social platforms optimized for engagement have hollowed out the kind of slow, considered thinking that blogs made possible. The case for owning your words and building something that lasts.