<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Megatron-LM on 安橙的博客</title><link>https://blog.ans20xx.com/tags/megatron-lm/</link><description>Recent content in Megatron-LM on 安橙的博客</description><generator>Hugo -- 0.163.3</generator><language>zh</language><lastBuildDate>Sat, 20 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.ans20xx.com/tags/megatron-lm/index.xml" rel="self" type="application/rss+xml"/><item><title>Day 19 · Tensor Parallel</title><link>https://blog.ans20xx.com/posts/ai/day19/</link><pubDate>Sat, 20 Jun 2026 00:00:00 +0000</pubDate><guid>https://blog.ans20xx.com/posts/ai/day19/</guid><description>A deep dive into Megatron-LM Tensor Parallel: understand the matrix partitioning and communication boundaries of column-parallel and row-parallel Linear layers, and the TP layout of MLP and Attention; hand-draw the tensor partitioning of a Transformer block to see how tensor_model_parallel_size affects GPU memory, compute, and communication.</description></item><item><title>Day 22 · 3D / 4D Parallelism in Practice</title><link>https://blog.ans20xx.com/posts/ai/day22/</link><pubDate>Sat, 20 Jun 2026 00:00:00 +0000</pubDate><guid>https://blog.ans20xx.com/posts/ai/day22/</guid><description>Combine the TP, PP, DP, and SP/CP from Days 19-21 to train a small GPT with Megatron-LM on a single multi-GPU machine, and learn the trade-offs between parallel dimensions by adjusting tensor-model-parallel-size and pipeline-model-parallel-size.</description></item><item><title>Day 28 · Weekly Review + Mini Project</title><link>https://blog.ans20xx.com/posts/ai/day28/</link><pubDate>Sat, 20 Jun 2026 00:00:00 +0000</pubDate><guid>https://blog.ans20xx.com/posts/ai/day28/</guid><description>Wrapping up Stage 2: review the distributed-training infra covered so far (NCCL, DDP, ZeRO, TP, PP, SP/CP, DeepSpeed, checkpointing, the data pipeline, kernel acceleration, and profiling); train a GPT of roughly 125M parameters on 2 GPUs or on 8 cloud GPUs, record its MFU, and write up notes on the hardware trade-offs of ZeRO-3 vs TP+PP.</description></item></channel></rss>