<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>PagedAttention on 安橙的博客</title><link>https://blog.ans20xx.com/tags/pagedattention/</link><description>Recent content in PagedAttention on 安橙的博客</description><generator>Hugo -- 0.163.3</generator><language>zh</language><lastBuildDate>Sat, 20 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.ans20xx.com/tags/pagedattention/index.xml" rel="self" type="application/rss+xml"/><item><title>Day 31 · PagedAttention &amp; vLLM</title><link>https://blog.ans20xx.com/posts/ai/day31/</link><pubDate>Sat, 20 Jun 2026 00:00:00 +0000</pubDate><guid>https://blog.ans20xx.com/posts/ai/day31/</guid><description>The core mechanisms of PagedAttention and vLLM: why the KV cache wastes GPU memory, how a block table maps logical blocks to physical blocks, how copy-on-write supports parallel sampling and beam search, and how these mechanisms enable high-throughput LLM serving.</description></item><item><title>Day 32 · vLLM in Practice</title><link>https://blog.ans20xx.com/posts/ai/day32/</link><pubDate>Sat, 20 Jun 2026 00:00:00 +0800</pubDate><guid>https://blog.ans20xx.com/posts/ai/day32/</guid><description>Deploy a 7B model on vLLM hands-on, enable the OpenAI-compatible API, learn how to tune --max-num-seqs and --gpu-memory-utilization, and set up a load-testing and troubleshooting workflow for the inference service.</description></item></channel></rss>