Transformer
1. Architecture
The Annotated Transformer (PyTorch implementation) http://nlp.seas.harvard.edu/annotated-transformer/
Illustrated GPT-2 https://jalammar.github.io/illustrated-gpt2/
2. Scaling
The Ultra-Scale Playbook: Training LLMs on GPU Clusters https://huggingface.co/spaces/nanotron/ultrascale-playbook