Software engineer. I have fun learning how things work, from written code to compilation to machine instructions. C, C++, and Python.

I try to understand each part of the stack, from top-level software like the PyTorch framework and Transformer layers down to hardware-level details like efficient CUDA kernels, memory bandwidth, pipelining, and SASS/PTX instructions. This blog documents what I learn from those investigations.

I can be reached via my LinkedIn page.