Attention Gym

class 101

Implement the attention operator using basic PyTorch functions, matching the behavior of PyTorch's nn.MultiheadAttention.
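
As a rough illustration of what this exercise could look like, the sketch below rebuilds self-attention from matrix multiplies and a softmax, reusing the weights of an existing nn.MultiheadAttention module so the two outputs can be compared. The function name `naive_multihead_attention` and the tensor sizes are assumptions made for this example, not code from the repository, and the sketch ignores masks and dropout.

```python
import math

import torch
import torch.nn as nn


def naive_multihead_attention(x, mha):
    """Self-attention from basic ops, reusing the weights of `mha`.

    x:   (batch, seq_len, embed_dim)
    mha: an nn.MultiheadAttention with the default packed in-projection.
    Hypothetical helper for illustration; no masking, no dropout.
    """
    B, T, E = x.shape
    H = mha.num_heads
    D = E // H  # per-head dimension

    # Packed input projection: rows of in_proj_weight are ordered q, k, v.
    qkv = x @ mha.in_proj_weight.T + mha.in_proj_bias  # (B, T, 3E)
    q, k, v = qkv.chunk(3, dim=-1)

    def split_heads(t):
        # (B, T, E) -> (B, H, T, D)
        return t.reshape(B, T, H, D).transpose(1, 2)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)

    # Scaled dot-product attention per head.
    scores = q @ k.transpose(-2, -1) / math.sqrt(D)  # (B, H, T, T)
    weights = torch.softmax(scores, dim=-1)
    out = weights @ v  # (B, H, T, D)

    # Merge heads and apply the output projection.
    out = out.transpose(1, 2).reshape(B, T, E)
    return out @ mha.out_proj.weight.T + mha.out_proj.bias


if __name__ == "__main__":
    torch.manual_seed(0)
    mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True).eval()
    x = torch.randn(2, 5, 16)
    with torch.no_grad():
        reference, _ = mha(x, x, x)
        ours = naive_multihead_attention(x, mha)
    print(torch.allclose(ours, reference, atol=1e-5))
```

Comparing against the reference module with `torch.allclose` is one simple way to check that the hand-written version matches.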

class 102

Implement the attention operator in CUDA.

class 201

Implement the flash attention operator using basic PyTorch functions (emulation for understanding).
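
One possible shape for such an emulation is sketched below: queries and keys are processed in blocks, and a running row maximum and running softmax denominator are rescaled as each key block arrives, so the full score matrix is never materialized. The function name `flash_attention_emulated`, the single-head (seq_len, dim) layout, and the block size are assumptions for this example; the sketch covers the forward pass only, without masking, and does not attempt to model GPU memory behavior.

```python
import math

import torch


def flash_attention_emulated(q, k, v, block_size=4):
    """Tiled attention with an online softmax (forward pass only).

    q, k, v: (seq_len, dim) tensors for a single head.
    Hypothetical helper for illustration.
    """
    T, D = q.shape
    scale = 1.0 / math.sqrt(D)
    out = torch.zeros_like(q)

    for qs in range(0, T, block_size):
        qb = q[qs:qs + block_size]  # (Bq, D)
        rows = qb.shape[0]
        m = q.new_full((rows, 1), float("-inf"))  # running row max
        l = q.new_zeros((rows, 1))                # running softmax denominator
        acc = q.new_zeros((rows, D))              # running unnormalized output

        for ks in range(0, k.shape[0], block_size):
            kb = k[ks:ks + block_size]
            vb = v[ks:ks + block_size]

            s = (qb @ kb.T) * scale  # scores for this tile only
            m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
            p = torch.exp(s - m_new)          # tile probabilities, unnormalized
            correction = torch.exp(m - m_new)  # rescale earlier partial sums

            l = l * correction + p.sum(dim=-1, keepdim=True)
            acc = acc * correction + p @ vb
            m = m_new

        out[qs:qs + block_size] = acc / l

    return out


if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = (torch.randn(10, 8) for _ in range(3))
    reference = torch.softmax(q @ k.T / math.sqrt(8), dim=-1) @ v
    print(torch.allclose(flash_attention_emulated(q, k, v), reference, atol=1e-5))
```

The sequence length in the demo is deliberately not a multiple of the block size, so the last, shorter tile is exercised as well.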

class 202

Implement the flash attention operator in CUDA.