Attention Gym

class 101

Implement the attention operator using basic PyTorch functions, matching the behavior of PyTorch's nn.MultiheadAttention.
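As a rough starting point for this class, the sketch below computes scaled dot-product attention with plain tensor operations and then reuses the weights of a `torch.nn.MultiheadAttention` module to reproduce its self-attention output. The function names `naive_attention` and `naive_multihead_attention` are hypothetical, and the sketch assumes self-attention with `batch_first=True`, bias enabled, no masks, and no dropout. It is not the repository's reference solution.

```python
import math
import torch


def naive_attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v


def naive_multihead_attention(x, mha):
    """Self-attention forward pass using the weights stored in `mha`.

    Assumes x has shape (batch, seq, embed) and that `mha` was built with
    batch_first=True, bias=True, and the default kdim/vdim.
    """
    batch, seq, embed = x.shape
    heads = mha.num_heads
    head_dim = embed // heads

    # Packed input projection: one matmul yields Q, K, and V side by side.
    qkv = x @ mha.in_proj_weight.T + mha.in_proj_bias
    q, k, v = qkv.chunk(3, dim=-1)

    def split_heads(t):
        # (batch, seq, embed) -> (batch, heads, seq, head_dim)
        return t.reshape(batch, seq, heads, head_dim).transpose(1, 2)

    out = naive_attention(split_heads(q), split_heads(k), split_heads(v))

    # (batch, heads, seq, head_dim) -> (batch, seq, embed)
    out = out.transpose(1, 2).reshape(batch, seq, embed)
    return out @ mha.out_proj.weight.T + mha.out_proj.bias


if __name__ == "__main__":
    torch.manual_seed(0)
    mha = torch.nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
    x = torch.randn(2, 5, 16)
    with torch.no_grad():
        expected, _ = mha(x, x, x, need_weights=False)
        actual = naive_multihead_attention(x, mha)
    print("max abs difference:", (actual - expected).abs().max().item())
```

Comparing against the built-in module on random inputs is a quick way to catch mistakes in head splitting or scaling.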

class 102

Implement the attention operator in CUDA.

class 201

Implement the flash attention operator using basic PyTorch functions (emulation for understanding).
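One possible shape for such an emulation is sketched below: it walks over query and key/value blocks and keeps a running maximum and running sum per query row (the online-softmax trick), so the full score matrix is never materialized. The name `flash_attention_emulated` and the `block_size` parameter are assumptions made for illustration. The sketch ignores masking and dropout and says nothing about how a real CUDA kernel manages memory.

```python
import math
import torch


def flash_attention_emulated(q, k, v, block_size=4):
    """Tiled attention with online softmax.

    q, k, v have shape (..., seq, head_dim). The result should match
    softmax(Q K^T / sqrt(d)) V up to floating-point error.
    """
    scale = 1.0 / math.sqrt(q.shape[-1])
    seq_q, seq_k = q.shape[-2], k.shape[-2]
    out = q.new_zeros((*q.shape[:-1], v.shape[-1]))

    for qs in range(0, seq_q, block_size):
        q_blk = q[..., qs:qs + block_size, :]
        rows = q_blk.shape[:-1]
        # Per-row running statistics for this query block.
        m = q_blk.new_full((*rows, 1), float("-inf"))  # running max of scores
        l = q_blk.new_zeros((*rows, 1))                # running softmax denominator
        acc = q_blk.new_zeros((*rows, v.shape[-1]))    # unnormalized output

        for ks in range(0, seq_k, block_size):
            k_blk = k[..., ks:ks + block_size, :]
            v_blk = v[..., ks:ks + block_size, :]
            s = (q_blk @ k_blk.transpose(-2, -1)) * scale

            m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
            p = torch.exp(s - m_new)
            # Rescale earlier partial results to the new running max.
            correction = torch.exp(m - m_new)
            l = l * correction + p.sum(dim=-1, keepdim=True)
            acc = acc * correction + p @ v_blk
            m = m_new

        out[..., qs:qs + block_size, :] = acc / l
    return out


if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = (torch.randn(2, 3, 10, 8) for _ in range(3))
    reference = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(8), dim=-1) @ v
    result = flash_attention_emulated(q, k, v, block_size=4)
    print("max abs difference:", (result - reference).abs().max().item())
```

Choosing a sequence length that is not a multiple of `block_size` also exercises the ragged final block.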

class 202

Implement the flash attention operator in CUDA.