
GPU memory leak in SentenceTransformerTrainer.train #3204

Open
captify-isemaniuk opened this issue Jan 30, 2025 · 0 comments
For reference I used the toy example from
https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/other/training_wikipedia_sections.py
with small changes:

  • I added multiple iterations:
    for it in range(5)
  • I used only the first 6 training steps
  • at the end of every iteration I added:
    del train_dataset, eval_dataset, test_dataset, dev_evaluator, train_loss, args, model, trainer

    gc.collect()
    torch.cuda.empty_cache()

    print(f'iter: {it} memory_allocated: {torch.cuda.memory_allocated() / 1024**3}')
    print(f'iter: {it} memory_reserved:  {torch.cuda.memory_reserved() / 1024**3}')  
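If memory still grows after `del` + `gc.collect()` + `torch.cuda.empty_cache()`, something is likely keeping references to the model or trainer alive between iterations. A minimal pure-Python sketch (no GPU needed) of that mechanism, using a hypothetical module-level cache as the lingering reference — the names `Model`, `_hidden_cache`, and `train_once` are stand-ins, not sentence-transformers internals:

```python
import gc
import weakref

class Model:
    """Stand-in for a SentenceTransformer model holding GPU buffers."""
    pass

# Hypothetical lingering reference, e.g. a hook, registry, or trainer cache.
_hidden_cache = []

def train_once(leak: bool) -> weakref.ref:
    model = Model()
    if leak:
        _hidden_cache.append(model)  # something retains the model internally
    ref = weakref.ref(model)
    del model
    gc.collect()
    return ref  # ref() is None only if the model was actually collected

# Without a lingering reference, `del` + gc.collect() frees the model:
assert train_once(leak=False)() is None
# With one, the object survives, so its (GPU) memory can never be reclaimed:
assert train_once(leak=True)() is not None
```

This is why `torch.cuda.empty_cache()` alone cannot help here: it only returns cached-but-unused blocks to the driver, while tensors still referenced from Python stay allocated.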

Results:
iter: 0 memory_allocated: 0.2649421691894531
iter: 0 memory_reserved: 0.3046875

iter: 1 memory_allocated: 0.5132217407226562
iter: 1 memory_reserved: 0.548828125

iter: 2 memory_allocated: 0.7610130310058594
iter: 2 memory_reserved: 0.8125

iter: 3 memory_allocated: 1.0092926025390625
iter: 3 memory_reserved: 1.076171875

iter: 4 memory_allocated: 1.2575721740722656
iter: 4 memory_reserved: 1.33984375
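The growth in the numbers above is almost exactly linear: each iteration pins an additional ~0.25 GiB that the cleanup cannot reclaim, which is consistent with one fixed-size object (plausibly the model weights, though that is an assumption) being retained per run:

```python
# memory_allocated values (GiB) reported after each iteration in this issue
allocated = [
    0.2649421691894531,
    0.5132217407226562,
    0.7610130310058594,
    1.0092926025390625,
    1.2575721740722656,
]
deltas = [b - a for a, b in zip(allocated, allocated[1:])]
# every iteration leaks roughly the same ~0.248 GiB
assert all(abs(d - 0.248) < 0.005 for d in deltas)
```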
