
Memory leak in remap #134

Open · liamkinne opened this issue Sep 16, 2024 · 3 comments

Labels: bug (Something isn't working)

Comments

@liamkinne (Contributor) commented Sep 16, 2024

There is some amount of memory being allocated during the remap function that isn't getting cleaned up.

This is with the latest commit on main, so I imagine it's something to do with rayon or the new tensor implementation.

This simple program will slowly grow in memory usage:

use kornia::image::{Image, ImageSize};
use kornia::imgproc::interpolation::{remap, InterpolationMode};

fn main() {
    let size = ImageSize {
        width: 1440,
        height: 1440,
    };

    // `correction_map` builds the (map_x, map_y) pair; its definition is
    // given in a comment further down the thread.
    let correction_map = correction_map(&size);

    loop {
        // Input and output images are reallocated on every iteration.
        let input = Image::<u8, 3>::from_size_val(size, 0).unwrap();
        let mut output = Image::<f32, 3>::from_size_val(size, 0.0).unwrap();

        remap(
            &input.cast().unwrap(),
            &mut output,
            &correction_map.0,
            &correction_map.1,
            InterpolationMode::Bilinear,
        )
        .unwrap();
    }
}
@edgarriba (Member)

That’s weird. However, the idea behind the new function signature allowing mutable outputs is that you allocate once, outside the loop. If you check the benches, that’s how I’m doing it; see the sketch below.
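
A minimal sketch of that allocate-once pattern against the snippet above (illustrative only, not copied from the benches; it reuses the `size` and `correction_map` values defined there):

// Allocate the buffers once, before the loop.
let input = Image::<u8, 3>::from_size_val(size, 0).unwrap();
let input_f32: Image<f32, 3> = input.cast().unwrap();
let mut output = Image::<f32, 3>::from_size_val(size, 0.0).unwrap();

loop {
    // `output` is reused on every iteration instead of being reallocated.
    remap(
        &input_f32,
        &mut output,
        &correction_map.0,
        &correction_map.1,
        InterpolationMode::Bilinear,
    )
    .unwrap();
}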

If you’re up for it, it would be great to provide a bench for remap, also using [flamegraph](https://github.com/flamegraph-rs/flamegraph) or similar, to better understand the memory usage pattern.

BTW, remap is not the fastest implementation for now; it needs some optimisations. For example, the interpolation function currently recomputes, on every call, the offset indices of the tensor to interpolate from; this could be done as a pre-step by iterating row by row with step 1. A second optimisation would be using SIMD to compute the weighted interpolation.
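
To illustrate the first idea, a hypothetical sketch (not kornia's actual internals): for a row-major H×W×C tensor, the flat base offset of each row can be computed once per row instead of once per pixel.

/// Hypothetical helper: precompute the flat base offset of every row of a
/// row-major `height` x `width` x `channels` tensor, so the per-pixel
/// offset lookup becomes a single addition instead of two multiplications.
fn row_offsets(height: usize, width: usize, channels: usize) -> Vec<usize> {
    let row_stride = width * channels;
    (0..height).map(|y| y * row_stride).collect()
}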

@emilmgeorge (Contributor)

I couldn't reproduce the growing memory usage issue while running for about 5 minutes. However, when testing with valgrind, it found an unrelated leak in the correction_map I used (included below); there were no confirmed leaks within the original snippet.

This is the correction_map that I used:

use kornia::image::{Image, ImageSize};
use kornia::core::Tensor2;
use kornia::imgproc::interpolation::{remap, InterpolationMode, grid::meshgrid_from_fn};

fn correction_map(size: &ImageSize) -> (Tensor2<f32>, Tensor2<f32>) {
    // Identity mapping
    let (map_x, map_y) = meshgrid_from_fn(size.width, size.height, |x, y| {
        Ok((x as f32, y as f32))
    }).unwrap();
    (map_x, map_y)
}
Valgrind log
==419002== Memcheck, a memory error detector
==419002== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==419002== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==419002== Command: target/debug/mem_test
==419002== Parent PID: 418734
==419002== 
==419002== 
==419002== HEAP SUMMARY:
==419002==     in use at exit: 16,653,640 bytes in 129 blocks
==419002==   total heap usage: 230 allocs, 101 frees, 229,673,704 bytes allocated
==419002== 
==419002== 2,432 bytes in 8 blocks are possibly lost in loss record 16 of 23
==419002==    at 0x484BC13: calloc (vg_replace_malloc.c:1675)
==419002==    by 0x4011003: calloc (rtld-malloc.h:44)
==419002==    by 0x4011003: allocate_dtv (dl-tls.c:395)
==419002==    by 0x4011AE1: _dl_allocate_tls (dl-tls.c:664)
==419002==    by 0x4A44F14: allocate_stack (allocatestack.c:429)
==419002==    by 0x4A44F14: pthread_create@@GLIBC_2.34 (pthread_create.c:655)
==419002==    by 0x18AA61: std::sys::pal::unix::thread::Thread::new (thread.rs:87)
==419002==    by 0x14F581: std::thread::Builder::spawn_unchecked_ (mod.rs:577)
==419002==    by 0x14EA5E: std::thread::Builder::spawn_unchecked (mod.rs:456)
==419002==    by 0x15013D: std::thread::Builder::spawn (mod.rs:388)
==419002==    by 0x13D92E: <rayon_core::registry::DefaultSpawn as rayon_core::registry::ThreadSpawn>::spawn (registry.rs:98)
==419002==    by 0x13EDFD: rayon_core::registry::Registry::new (registry.rs:304)
==419002==    by 0x13DD5B: rayon_core::registry::default_global_registry (registry.rs:201)
==419002==    by 0x13B9E0: core::ops::function::FnOnce::call_once (function.rs:250)
==419002== 
==419002== 8,294,400 bytes in 1 blocks are definitely lost in loss record 22 of 23
==419002==    at 0x48447A8: malloc (vg_replace_malloc.c:446)
==419002==    by 0x162139: std::sys::pal::unix::alloc::<impl core::alloc::global::GlobalAlloc for std::alloc::System>::alloc (alloc.rs:14)
==419002==    by 0x161F3E: <kornia_core::allocator::CpuAllocator as kornia_core::allocator::TensorAllocator>::alloc (allocator.rs:58)
==419002==    by 0x12FA86: kornia_core::storage::TensorStorage<T,A>::new (storage.rs:55)
==419002==    by 0x12F817: kornia_core::tensor::Tensor<T,_,A>::new_uninitialized (tensor.rs:112)
==419002==    by 0x121EE3: kornia_imgproc::interpolation::grid::meshgrid_from_fn (grid.rs:43)
==419002==    by 0x11A3FF: mem_test::correction_map (main.rs:7)
==419002==    by 0x11A567: mem_test::main (main.rs:19)
==419002==    by 0x11E1FA: core::ops::function::FnOnce::call_once (function.rs:250)
==419002==    by 0x1241ED: std::sys::backtrace::__rust_begin_short_backtrace (backtrace.rs:152)
==419002==    by 0x1253D0: std::rt::lang_start::{{closure}} (rt.rs:162)
==419002==    by 0x18093F: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==419002==    by 0x18093F: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:557)
==419002==    by 0x18093F: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:521)
==419002==    by 0x18093F: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:350)
==419002==    by 0x18093F: {closure#2} (rt.rs:141)
==419002==    by 0x18093F: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:557)
==419002==    by 0x18093F: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:521)
==419002==    by 0x18093F: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:350)
==419002==    by 0x18093F: std::rt::lang_start_internal (rt.rs:141)
==419002== 
==419002== 8,294,400 bytes in 1 blocks are definitely lost in loss record 23 of 23
==419002==    at 0x48447A8: malloc (vg_replace_malloc.c:446)
==419002==    by 0x162139: std::sys::pal::unix::alloc::<impl core::alloc::global::GlobalAlloc for std::alloc::System>::alloc (alloc.rs:14)
==419002==    by 0x161F3E: <kornia_core::allocator::CpuAllocator as kornia_core::allocator::TensorAllocator>::alloc (allocator.rs:58)
==419002==    by 0x12FA86: kornia_core::storage::TensorStorage<T,A>::new (storage.rs:55)
==419002==    by 0x12F817: kornia_core::tensor::Tensor<T,_,A>::new_uninitialized (tensor.rs:112)
==419002==    by 0x121FEE: kornia_imgproc::interpolation::grid::meshgrid_from_fn (grid.rs:44)
==419002==    by 0x11A3FF: mem_test::correction_map (main.rs:7)
==419002==    by 0x11A567: mem_test::main (main.rs:19)
==419002==    by 0x11E1FA: core::ops::function::FnOnce::call_once (function.rs:250)
==419002==    by 0x1241ED: std::sys::backtrace::__rust_begin_short_backtrace (backtrace.rs:152)
==419002==    by 0x1253D0: std::rt::lang_start::{{closure}} (rt.rs:162)
==419002==    by 0x18093F: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==419002==    by 0x18093F: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:557)
==419002==    by 0x18093F: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:521)
==419002==    by 0x18093F: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:350)
==419002==    by 0x18093F: {closure#2} (rt.rs:141)
==419002==    by 0x18093F: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:557)
==419002==    by 0x18093F: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:521)
==419002==    by 0x18093F: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:350)
==419002==    by 0x18093F: std::rt::lang_start_internal (rt.rs:141)
==419002== 
==419002== LEAK SUMMARY:
==419002==    definitely lost: 16,588,800 bytes in 2 blocks
==419002==    indirectly lost: 0 bytes in 0 blocks
==419002==      possibly lost: 2,432 bytes in 8 blocks
==419002==    still reachable: 62,408 bytes in 119 blocks
==419002==         suppressed: 0 bytes in 0 blocks
==419002== Reachable blocks (those to which a pointer was found) are not shown.
==419002== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==419002== 
==419002== For lists of detected and suppressed errors, rerun with: -s
==419002== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

The leak detected by valgrind is here:

Buffer::from_custom_allocation(
    NonNull::new_unchecked(ptr),
    len * std::mem::size_of::<T>(),
    Arc::new(Vec::<T>::with_capacity(len)),
)

The issue is that the memory allocated at ptr won't be freed when the Arc::new(Vec::<T>::with_capacity(len)) owner is dropped: the Vec is a separate, empty allocation, so dropping it releases only the Vec's own capacity, never the buffer behind ptr.
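
One possible direction, as an illustrative sketch only (not the actual patch), is to make the Arc'd owner genuinely own the allocation, e.g. by rebuilding a Vec from the raw pointer so that dropping the owner frees ptr. This assumes ptr came from the global allocator with capacity for len values of T:

// Illustrative sketch: rebuild a Vec that owns the original allocation.
// With length 0, dropping the Vec frees the backing capacity without
// running destructors for uninitialized elements.
let owner = unsafe { Vec::from_raw_parts(ptr as *mut T, 0, len) };
Buffer::from_custom_allocation(
    NonNull::new_unchecked(ptr),
    len * std::mem::size_of::<T>(),
    Arc::new(owner),
)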

@edgarriba added the bug label on Sep 20, 2024
@edgarriba (Member)

@emilmgeorge, thanks for your investigation. It seems there’s definitely an issue with Buffer::from_custom_allocation, or possibly with the way I implemented it initially. I've opened a ticket to address this: kornia-rs/issues/127.

It might be worthwhile to revisit the Tensor constructor API and the use of the Allocator.

Here are my thoughts on the design:

  • Make the Tensor struct extensible via Buffer::Allocation, allowing it to support data from various backends, such as CUDA, wgpu, etc.
  • For instance, one could implement a CudaAllocator and initialize a Tensor by allocating CUDA memory, or by using a pre-allocated CUDA pointer (see the sketch after this list). This could be particularly useful for implementing image operations directly on CUDA, like a remap function.
  • Ideally, the kornia-core crate and the Tensor struct should be lightweight, acting primarily as a managed data container. This would keep them decoupled from operations and allow interchangeable use with other libraries' containers at zero copy cost.
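
A minimal sketch of that allocator abstraction, with an assumed trait shape (the actual kornia_core::allocator::TensorAllocator signature may differ):

use std::alloc::{alloc, dealloc, Layout};

// Assumed trait shape for illustration only; not the real kornia_core trait.
trait TensorAllocator {
    fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);
}

// CPU backend: delegate to the global allocator.
struct CpuAllocator;

impl TensorAllocator for CpuAllocator {
    fn alloc(&self, layout: Layout) -> *mut u8 {
        // Caller is expected to pass a non-zero-sized layout.
        unsafe { alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { dealloc(ptr, layout) }
    }
}

// A hypothetical CudaAllocator would implement the same trait on top of
// cudaMalloc/cudaFree, so a Tensor could hold device memory behind the
// same interface without kornia-core knowing about CUDA.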
