| method | SMs | threads |
|---|---|---|
| notify dispatch | 1 + 8 (1 send, 8 receive) | 128 |
| dispatch | 20 (even-numbered SMs send, odd-numbered SMs receive) | 128 |
| combine | 20 (even-numbered SMs send, odd-numbered SMs receive) | 768 |
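
The even/odd split in the table can be sketched on the host side as a plain index mapping. This is a minimal illustration, not the kernel code: the names `Role`, `BlockAssignment`, `assign_block`, and `num_channels` are hypothetical, and pairing each sender SM with the next receiver SM into one channel is an assumption.

```cpp
// Hypothetical names; only the parity rule comes from the table above.
enum class Role { Send, Receive };

struct BlockAssignment {
    Role role;    // what this block (one per SM) does
    int channel;  // assumption: SMs 2k and 2k+1 share channel k
};

// Map a block index to its role: even indices send, odd indices receive.
constexpr BlockAssignment assign_block(int sm_id) {
    return BlockAssignment{sm_id % 2 == 0 ? Role::Send : Role::Receive,
                           sm_id / 2};
}

// With an even SM count, half the SMs send and half receive.
constexpr int num_channels(int num_sms) { return num_sms / 2; }
```

Under these assumptions, the 20 SMs used by dispatch and combine give 10 sender/receiver pairs, with `assign_block(0)` sending and `assign_block(1)` receiving on channel 0.
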

| idea | description | location |
|---|---|---|
| dual-stream communication | switches between the communication stream and the compute stream | DeepEP/csrc/deep_ep.cpp |
| undocumented PTX load/store | loads/stores that bypass the L1 cache | DeepEP/csrc/kernels/utils.cuh |
| warp specialization | different warps of one kernel take different roles (branching per warp) | DeepEP/csrc/kernels/intranode.cu |
| topology-aware routing | forwards traffic over either IB or NVLink | DeepEP/deep_ep/buffer.py |
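
The topology-aware routing row can be illustrated with a small link-selection sketch. It is a simplification under stated assumptions, not the logic in `buffer.py`: the function `pick_link`, the `Link` enum, the default of 8 GPUs per node, and the contiguous per-node rank numbering are all hypothetical.

```cpp
// Hypothetical helper: choose a transport from the rank layout alone.
enum class Link { Local, NVLink, IB };

// Assumption: ranks are numbered contiguously within a node,
// so rank / gpus_per_node identifies the node.
inline Link pick_link(int src_rank, int dst_rank, int gpus_per_node = 8) {
    if (src_rank == dst_rank) {
        return Link::Local;  // same GPU, no transfer needed
    }
    const bool same_node =
        (src_rank / gpus_per_node) == (dst_rank / gpus_per_node);
    // Peers on the same node use NVLink; peers on other nodes go over IB.
    return same_node ? Link::NVLink : Link::IB;
}
```

With this layout, ranks 0 and 7 sit on the same node and would use NVLink, while ranks 0 and 8 sit on different nodes and would use IB.
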