Attention is NOT all you need: Qwerky-72B trained using only 8 AMD MI300X GPUs

jtatarchuk Tuesday, April 01, 2025
Summary
The article discusses the training of the attention-free Qwerky-72B and Qwerky-32B large language models, reportedly trained using only 8 AMD MI300X GPUs. It explores the computational and engineering challenges involved in training models at this scale, as well as their potential capabilities and applications.
substack.recursal.ai