
How to time 'model.to(device)' correctly? #8

Open
juinshell opened this issue Jun 2, 2022 · 2 comments

Comments

@juinshell

I am using PyTorch's API in my Python code to measure how long it takes to move different layers of resnet152 to the device (a V100 GPU). However, I cannot get a stable result.
Here is my code:

import time

import torch
import torch.nn as nn
import torchvision

device = torch.device('cuda:3' if torch.cuda.is_available() else 'cpu')
model = torchvision.models.resnet152(pretrained=True)

def todevice(model, device=device):
    T0 = time.perf_counter()
    model.to(device)
    torch.cuda.synchronize()
    T1 = time.perf_counter()
    print("model to device %s cost: %s ms" % (device, (T1 - T0) * 1000))

model1 = nn.Sequential(*list(model.children())[:6])
todevice(model1)

When I run this code at different times, I always get different answers, and some of them are ridiculous, up to 200 ms.
Also, there are 4 GPUs in my lab, and I don't know whether the extra GPUs will affect my result.
Could you tell me how to time model.to(device) correctly?

@SimonZsx

SimonZsx commented Jun 2, 2022 via email

You need to insert a torch.cuda.synchronize() before and after the operation you want to measure.

@juinshell
Author


Like this?

def todevice(model, device=device):
    torch.cuda.synchronize()
    T0 = time.perf_counter()
    model.to(device)
    torch.cuda.synchronize()
    T1 = time.perf_counter()
    print("model to device %s cost: %s ms" % (device, (T1 - T0) * 1000))

Unfortunately, I still get unstable results from the same program...

xxx:/workspace/pytorch# python layer.py --cuda_device=3
model to device cuda:3 cost:3083.652761997655 ms
model to device cuda:3 cost:2.308813011040911 ms
model to device cuda:3 cost:11.649759981082752 ms
model to device cuda:3 cost:143.4171750152018 ms
model to device cuda:3 cost:42.07298799883574 ms
model to device cuda:3 cost:0.03912401734851301 ms
model to device cuda:3 cost:5.487112997798249 ms
xxx:/workspace/pytorch# python layer.py --cuda_device=3
model to device cuda:3 cost:2506.3964820001274 ms
model to device cuda:3 cost:2.7847559831570834 ms
model to device cuda:3 cost:12.948957009939477 ms
model to device cuda:3 cost:244.6330439997837 ms
model to device cuda:3 cost:26.824778993614018 ms
model to device cuda:3 cost:0.03645301330834627 ms
model to device cuda:3 cost:3.0167640070430934 ms
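A large part of this variance has an explanation: the very first CUDA call in a process lazily initializes the CUDA context, which can take seconds (hence the ~3000 ms first run), and torch.cuda.synchronize() with no argument synchronizes the current device (cuda:0 by default), not necessarily cuda:3. A minimal sketch of a more stable measurement, with a warm-up run, per-device synchronization, and the median over several repeats (the helper name `time_to_device` and the toy model are illustrative, not from this thread):

```python
import statistics
import time

import torch
import torch.nn as nn

def time_to_device(model, device, repeats=5):
    # Measure model.to(device) several times and report the median in ms.
    timings = []
    for _ in range(repeats):
        model.cpu()  # move back so every iteration measures the same transfer
        if device.type == 'cuda':
            # Synchronize the *target* device so previously queued work
            # on that GPU doesn't leak into the measurement.
            torch.cuda.synchronize(device)
        t0 = time.perf_counter()
        model.to(device)
        if device.type == 'cuda':
            torch.cuda.synchronize(device)
        timings.append((time.perf_counter() - t0) * 1000)
    return statistics.median(timings)

device = torch.device('cuda:3' if torch.cuda.is_available() else 'cpu')
if device.type == 'cuda':
    torch.zeros(1, device=device)  # warm-up: pay the context-init cost up front
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10))
print("median cost: %.3f ms" % time_to_device(model, device))
```

Because the first iteration no longer absorbs context creation and the median discards outliers from other processes sharing the machine, repeated runs should agree much more closely.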
