@bluedevil "Some coverage of this project has overstated its implications. To be clear:
- Training works, but utilization is low (~2-3% of peak) with significant engineering challenges remaining
- Many element-wise operations still fall back to CPU
- This does not replace GPU training for anything beyond small research models today"