The evaluation metrics are top-1 and top-5 video-level accuracy on the Kinetics-400 validation set, reported in the
Video@1 and Video@5 columns. The Finetuned model column provides models that were either trained from scratch or
fine-tuned on Kinetics. The Pre-trained model column provides models pre-trained on Sports1M or IG-65M without further fine-tuning on Kinetics.
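
For readers unfamiliar with video-level metrics, the following is a minimal sketch of how a top-k video-level accuracy could be computed. It assumes that each video is split into several clips and that the video-level prediction is the average of the clip-level scores; that aggregation rule is an assumption for illustration and is not confirmed by this page. The function name `video_level_topk_accuracy` and the toy data are hypothetical.

```python
from collections import defaultdict


def video_level_topk_accuracy(clip_scores, clip_video_ids, video_labels, k=1):
    """Return top-k accuracy (in percent) over videos.

    clip_scores:    list of per-clip score lists, one score per class
    clip_video_ids: video id for each clip, same order as clip_scores
    video_labels:   dict mapping video id -> ground-truth class index

    Assumption: a video's score is the mean of its clip scores.
    """
    sums = {}
    counts = defaultdict(int)
    # Accumulate clip scores per video.
    for scores, vid in zip(clip_scores, clip_video_ids):
        if vid not in sums:
            sums[vid] = [0.0] * len(scores)
        for c, s in enumerate(scores):
            sums[vid][c] += s
        counts[vid] += 1

    hits = 0
    for vid, total in sums.items():
        mean = [s / counts[vid] for s in total]
        # Indices of the k highest-scoring classes for this video.
        topk = sorted(range(len(mean)), key=lambda c: mean[c], reverse=True)[:k]
        if video_labels[vid] in topk:
            hits += 1
    return 100.0 * hits / len(sums)


# Toy data: two videos ("a" and "b"), two clips each, three classes.
clip_scores = [
    [0.6, 0.3, 0.1],  # video "a", clip 1
    [0.1, 0.8, 0.1],  # video "a", clip 2
    [0.5, 0.1, 0.4],  # video "b", clip 1
    [0.4, 0.2, 0.4],  # video "b", clip 2
]
clip_video_ids = ["a", "a", "b", "b"]
video_labels = {"a": 1, "b": 2}

video_at_1 = video_level_topk_accuracy(clip_scores, clip_video_ids, video_labels, k=1)
video_at_2 = video_level_topk_accuracy(clip_scores, clip_video_ids, video_labels, k=2)
```

In this toy example video "a" is classified correctly at top-1, while video "b" only contains its label among the two highest-scoring classes, so the top-1 value is lower than the top-2 value. The tables use k=1 and k=5 over the 400 Kinetics classes.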

| Input size | Pre-trained dataset | Pre-trained model | Video@1 | Video@5 | Finetuned model | GFLOPs | params(M) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 16x112x112 | None | None | 66.6 | 86.7 | link | 38.5 | 64.9 |
| 16x112x112 | Sports1M | link | 67.4 | 87.2 | link | 38.5 | 64.9 |

| Input size | Pre-trained dataset | Pre-trained model | Video@1 | Video@5 | Finetuned model | GFLOPs | params(M) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 8x112x112 | IG-65M | link | 74.9 | 91.8 | link | 49.8 | 63.6 |
| 32x112x112 | IG-65M | link | 80.0 | 94.2 | link | 199.0 | 63.6 |

| Input size | Pre-trained dataset | Pre-trained model | Video@1 | Video@5 | Finetuned model | GFLOPs | params(M) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 32x112x112 | None | None | 77.3 | 92.5 | link | 329.1 | 118.0 |
| 32x112x112 | Sports1M | link | 79.5 | 94.0 | link | 329.1 | 118.0 |
| 32x112x112 | IG-65M | link | 81.6 | 95.3 | link | 329.1 | 118.0 |

| Input size | Pre-trained dataset | Pre-trained model | Video@1 | Video@5 | Finetuned model | GFLOPs | params(M) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 32x224x224 | None | None | 76.5 | 92.1 | link | 96.7 | 29.6 |
| 32x224x224 | Sports1M | link | 78.2 | 93.0 | link | 96.7 | 29.6 |
| 32x224x224 | IG-65M | link | 82.6 | 95.3 | link | 96.7 | 29.6 |

| Input size | Pre-trained dataset | Pre-trained model | Video@1 | Video@5 | Finetuned model | GFLOPs | params(M) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 32x224x224 | None | None | 77.8 | 92.8 | link | 108.8 | 32.8 |
| 32x224x224 | Sports1M | link | 78.8 | 93.5 | link | 108.8 | 32.8 |
| 32x224x224 | IG-65M | link | 82.5 | 95.3 | link | 108.8 | 32.8 |