
How to time 'model.to(device)' correctly? #8

Open
juinshell opened this issue Jun 2, 2022 · 2 comments

Comments

@juinshell

I am using PyTorch's API in my Python code to measure how long it takes to move different layers of resnet152 to the device (a V100 GPU). However, I cannot get a stable result.
Here is my code:

import time

import torch
import torch.nn as nn
import torchvision

device = torch.device('cuda:3' if torch.cuda.is_available() else 'cpu')
model = torchvision.models.resnet152(pretrained=True)

def todevice(model, device=device):
    T0 = time.perf_counter()
    model.to(device)
    torch.cuda.synchronize()
    T1 = time.perf_counter()
    print("model to device %s cost: %s ms" % (device, (T1 - T0) * 1000))

model1 = nn.Sequential(*list(model.children())[:6])
todevice(model1)

When I run this code at different times, I always get different answers, and some of them are ridiculous, up to 200 ms.
Also, there are 4 GPUs in my lab, and I don't know whether the extra GPUs will affect my result.
Could you tell me how to time model.to(device) correctly?

@SimonZsx

SimonZsx commented Jun 2, 2022 via email

You need to insert a torch.cuda.synchronize() before and after the operation you want to measure.

@juinshell
Author


Like this?

def todevice(model, device=device):
    torch.cuda.synchronize()
    T0 = time.perf_counter()
    model.to(device)
    torch.cuda.synchronize()
    T1 = time.perf_counter()
    print("model to device %s cost: %s ms" % (device, (T1 - T0) * 1000))

Unfortunately, I still get unstable results from the same program...

xxx:/workspace/pytorch# python layer.py --cuda_device=3
model to device cuda:3 cost:3083.652761997655 ms
model to device cuda:3 cost:2.308813011040911 ms
model to device cuda:3 cost:11.649759981082752 ms
model to device cuda:3 cost:143.4171750152018 ms
model to device cuda:3 cost:42.07298799883574 ms
model to device cuda:3 cost:0.03912401734851301 ms
model to device cuda:3 cost:5.487112997798249 ms
xxx:/workspace/pytorch# python layer.py --cuda_device=3
model to device cuda:3 cost:2506.3964820001274 ms
model to device cuda:3 cost:2.7847559831570834 ms
model to device cuda:3 cost:12.948957009939477 ms
model to device cuda:3 cost:244.6330439997837 ms
model to device cuda:3 cost:26.824778993614018 ms
model to device cuda:3 cost:0.03645301330834627 ms
model to device cuda:3 cost:3.0167640070430934 ms
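A large part of this variance has an explanation: the very first CUDA call in a process lazily initializes the CUDA context, which can take seconds (hence the ~3000 ms first run), and torch.cuda.synchronize() with no argument synchronizes the current device (cuda:0 by default), not necessarily cuda:3. A minimal sketch of a more stable measurement, with a warm-up run, per-device synchronization, and the median over several repeats (the helper name `time_to_device` and the toy model are illustrative, not from this thread):

```python
import statistics
import time

import torch
import torch.nn as nn

def time_to_device(model, device, repeats=5):
    # Measure model.to(device) several times and report the median in ms.
    timings = []
    for _ in range(repeats):
        model.cpu()  # move back so every iteration measures the same transfer
        if device.type == 'cuda':
            # Synchronize the *target* device so previously queued work
            # on that GPU doesn't leak into the measurement.
            torch.cuda.synchronize(device)
        t0 = time.perf_counter()
        model.to(device)
        if device.type == 'cuda':
            torch.cuda.synchronize(device)
        timings.append((time.perf_counter() - t0) * 1000)
    return statistics.median(timings)

device = torch.device('cuda:3' if torch.cuda.is_available() else 'cpu')
if device.type == 'cuda':
    torch.zeros(1, device=device)  # warm-up: pay the context-init cost up front
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10))
print("median cost: %.3f ms" % time_to_device(model, device))
```

Because the first iteration no longer absorbs context creation and the median discards outliers from other processes sharing the machine, repeated runs should agree much more closely.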
