How to time 'model.to(device)' correctly? #8
Comments
You need to insert a torch.cuda.synchronize() before and after the operation you want to measure.
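For reference, a minimal sketch of that pattern (assuming a CUDA device is available; the tensor here is arbitrary and only stands in for the operation being timed):

import time
import torch

x = torch.randn(1024, 1024)

torch.cuda.synchronize()    # wait for any GPU work already in flight
t0 = time.perf_counter()
y = x.to('cuda')            # the operation being measured
torch.cuda.synchronize()    # block until the copy has actually finished
t1 = time.perf_counter()
print("copy took %.3f ms" % ((t1 - t0) * 1000))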
…
On 2 Jun 2022, at 19:49, husterdjx ***@***.***> wrote:
I am using PyTorch's API in my Python code to measure how long it takes to move different layers of resnet152 to the device (GPU, V100). However, I cannot get a stable result.
Here is my code:
import time
import torch
import torch.nn as nn
import torchvision

device = torch.device('cuda:3' if torch.cuda.is_available() else 'cpu')
model = torchvision.models.resnet152(pretrained=True)

def todevice(_model_, _device_=device):
    T0 = time.perf_counter()
    _model_.to(_device_)
    torch.cuda.synchronize()
    T1 = time.perf_counter()
    print("model to device %s cost:%s ms" % (_device_, ((T1 - T0) * 1000)))

model1 = nn.Sequential(*list(model.children())[:6])
todevice(model1)
When I run this code at different times, I always get different answers, and some of them are ridiculous, up to 200 ms.
Also, there are 4 GPUs in my lab; I don't know whether the other GPUs affect my result.
Could you tell me how to time model.to(device) correctly?
Like this?

def todevice(_model_, _device_=device):
    torch.cuda.synchronize()
    T0 = time.perf_counter()
    _model_.to(_device_)
    torch.cuda.synchronize()
    T1 = time.perf_counter()
    print("model to device %s cost:%s ms" % (_device_, ((T1 - T0) * 1000)))

Unfortunately, I still get unstable results from the same program...

xxx:/workspace/pytorch# python layer.py --cuda_device=3
model to device cuda:3 cost:3083.652761997655 ms
model to device cuda:3 cost:2.308813011040911 ms
model to device cuda:3 cost:11.649759981082752 ms
model to device cuda:3 cost:143.4171750152018 ms
model to device cuda:3 cost:42.07298799883574 ms
model to device cuda:3 cost:0.03912401734851301 ms
model to device cuda:3 cost:5.487112997798249 ms
xxx:/workspace/pytorch# python layer.py --cuda_device=3
model to device cuda:3 cost:2506.3964820001274 ms
model to device cuda:3 cost:2.7847559831570834 ms
model to device cuda:3 cost:12.948957009939477 ms
model to device cuda:3 cost:244.6330439997837 ms
model to device cuda:3 cost:26.824778993614018 ms
model to device cuda:3 cost:0.03645301330834627 ms
model to device cuda:3 cost:3.0167640070430934 ms
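A likely explanation for these numbers: the first measurement in each run also pays the one-time cost of initializing the CUDA context, which is why it lands in the seconds range, and the remaining jitter usually comes from host-side memory staging and allocator behavior. Note also that torch.cuda.synchronize() with no argument waits on the current device (cuda:0 by default), not necessarily cuda:3, so it is safer to pass the device explicitly. A rough sketch of a more stable measurement, warming up first and reporting a median over several repetitions, reusing the names from the snippets above (not this repository's own benchmarking code):

import time
import torch
import torch.nn as nn
import torchvision

device = torch.device('cuda:3' if torch.cuda.is_available() else 'cpu')
model = torchvision.models.resnet152(pretrained=True)
part = nn.Sequential(*list(model.children())[:6])

# Warm-up: the first CUDA call in a process triggers context initialization.
torch.randn(1).to(device)
torch.cuda.synchronize(device)

times_ms = []
for _ in range(10):
    part.cpu()                      # move back so every iteration measures the same copy
    torch.cuda.synchronize(device)  # synchronize the target device, not the default one
    t0 = time.perf_counter()
    part.to(device)
    torch.cuda.synchronize(device)
    times_ms.append((time.perf_counter() - t0) * 1000)

print("model to device %s: median %.3f ms over %d runs"
      % (device, sorted(times_ms)[len(times_ms) // 2], len(times_ms)))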