Memory check before inference to avoid VAE Decode using exceeded VRAM. #5745
e85d80f to 58eb317
comfy/sd.py
Outdated
logging.debug(f"Free memory: {free_memory} bytes, predicted memory usage of one batch: {memory_used} bytes")
if free_memory < memory_used:
    logging.debug("Possible out of memory is detected, try to free memory.")
    model_management.free_memory(memory_used, self.device, [self.patcher])
The load_models_gpu function already calls free_memory.
Got it. So there's no need for two checks and an additional free_memory call.
comfy/sd.py
Outdated
logging.debug(f"Free memory: {free_memory} bytes")
if free_memory < memory_used:
    logging.warning("Warning: Out of memory is predicted for regular VAE decoding, directly switch to tiled VAE decoding.")
    predicted_oom = True
The reason for actually trying the regular decode is that the memory estimate might not be accurate and may overestimate the amount of memory needed, so it is better to attempt the decode.
The proper way to solve the issue would be to free the memory properly on OOM before doing tiled decode.
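A minimal sketch of that ordering, assuming nothing about ComfyUI's internals: `decode_full`, `decode_tiled`, and `release_memory` are hypothetical callables supplied by the caller, and `MemoryError` stands in for the backend's OOM exception (with PyTorch on CUDA that would be `torch.cuda.OutOfMemoryError`, and the release step might call `torch.cuda.empty_cache()`). The regular decode is still attempted first. Memory is released only after the exception handler has exited, because the in-flight exception's traceback can keep the failed attempt's buffers alive.

```python
import gc


def decode_with_fallback(decode_full, decode_tiled, release_memory,
                         oom_error=MemoryError):
    """Try the regular decode; on OOM, release memory, then decode tiled.

    All parameters are hypothetical stand-ins, not ComfyUI names.
    """
    try:
        # Attempt the regular decode even if an estimate looked too high.
        return decode_full()
    except oom_error:
        # Do no cleanup here: while the handler runs, the exception's
        # traceback still references the frames of the failed attempt.
        pass

    # The handler has exited, so the exception and its frames can be collected.
    gc.collect()
    release_memory()
    return decode_tiled()
```

Keeping the cleanup outside the `except` block is the design choice that matters here; the fallback itself is otherwise an ordinary try/except.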
OK, so we first need to know how to properly free the VRAM after an OOM...
I don't think it's suitable to just destroy the entire model object and then reload it.
But actually, I sometimes found that OOM didn't occur at all: it just continued running, consumed a lot of shared GPU memory, and became very slow. This happened randomly.
Another point: what's the drawback of using tiled decode? Tiled decode doesn't seem slow, at least on my computer. So why do we care so much about overestimating?
Tiled decode gives lower quality images/videos.
Check that free memory is not less than the predicted requirement before doing the actual decoding; if the check fails, switch to tiled VAE decoding directly. It seems PyTorch may keep occupying memory after an OOM occurs until the model is destroyed. This commit tries to prevent the OOM from happening in the first place for VAE Decode. This is for the case of VAE Decode running with exceeded VRAM, from comfyanonymous#5737.
58eb317 to a3b9b3c
Check that free memory is not less than the predicted requirement before doing the actual decoding. If the check fails, try to free the required amount of memory; if it still fails, switch to tiled VAE decoding directly.
It seems PyTorch may keep occupying memory after an OOM occurs until the model is destroyed. This commit tries to prevent the OOM from happening in the first place for VAE Decode.
This is for the case of VAE Decode running with exceeded VRAM, from #5737.
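The check order in this description can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `get_free_memory` and `try_free_memory` are hypothetical callables standing in for whatever the backend provides, and `memory_needed` is the predicted requirement in bytes. As the review points out, that prediction may be an overestimate, so the "tiled" outcome here can be more conservative than necessary.

```python
def choose_decode_path(memory_needed, get_free_memory, try_free_memory):
    """Return "regular" or "tiled" using the check order described above.

    get_free_memory and try_free_memory are hypothetical callables.
    """
    # First check: is there already enough free memory?
    if get_free_memory() >= memory_needed:
        return "regular"

    # Not enough: ask the backend to free the required amount, then re-check.
    try_free_memory(memory_needed)
    if get_free_memory() >= memory_needed:
        return "regular"

    # Still short after freeing: skip the regular attempt entirely.
    return "tiled"
```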