<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>DDP on 安橙的博客</title><link>https://blog.ans20xx.com/tags/ddp/</link><description>Recent content in DDP on 安橙的博客</description><generator>Hugo -- 0.163.3</generator><language>zh</language><lastBuildDate>Sat, 20 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.ans20xx.com/tags/ddp/index.xml" rel="self" type="application/rss+xml"/><item><title>Day 15 · Distributed Basics</title><link>https://blog.ans20xx.com/posts/ai/day15/</link><pubDate>Sat, 20 Jun 2026 00:00:00 +0000</pubDate><guid>https://blog.ans20xx.com/posts/ai/day15/</guid><description>Entering the distributed-training stage of AI Infra: understand process groups, rank/world_size, and the torchrun launch model; learn the four collective communication primitives AllReduce, AllGather, ReduceScatter, and Broadcast; and get a DDP MNIST example running end to end.</description></item><item><title>Day 17 · Data Parallelism: DP/DDP</title><link>https://blog.ans20xx.com/posts/ai/day17/</link><pubDate>Sat, 20 Jun 2026 00:00:00 +0000</pubDate><guid>https://blog.ans20xx.com/posts/ai/day17/</guid><description>The first main thread of distributed training: moving from DataParallel to DistributedDataParallel, breaking down gradient synchronization timing, the Reducer, buckets, overlap, and no_sync; reading the key code paths in torch/nn/parallel/distributed.py; and running an observable DDP experiment with torchrun.</description></item></channel></rss>