Attention is NOT all you need: Qwerky-72B trained using only 8 AMD MI300X GPUs
jtatarchuk Tuesday, April 01, 2025
Summary
The article covers the training of Qwerky-72B and Qwerky-32B, large language models trained without standard attention on just eight AMD MI300X GPUs. It examines the computational and engineering challenges involved in training these models, as well as their potential capabilities and applications.
Source: substack.recursal.ai