Attention is NOT all you need: Qwerky-72B trained using only 8 AMD MI300X GPUs
jtatarchuk Tuesday, April 01, 2025
Summary
The article covers the training of Qwerky-72B and Qwerky-32B, large language models trained without standard attention on just eight AMD MI300X GPUs. It examines the computational and engineering challenges involved in training these models, as well as their potential capabilities and applications.
Source: substack.recursal.ai