add interleaved versions of phase/cartToPolar/polarToCart #3607
Conversation
This PR is for performance only (at the cost of more template code and increased GPU code size). The additional variants let the caller skip the creation of temporary GPU mats (where memory is more likely to be a critical resource) and can even allow in-place processing. Magnitude/angle/x/y data are often already interleaved when dealing with DFTs.
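For illustration, a minimal sketch of the calling pattern these variants enable. The interleaved overload signatures shown here are assumptions based on this description, not the exact API added by the PR:

```cpp
// Planar API: two temporary single-channel GpuMats are needed even when the
// data starts out interleaved (e.g. the CV_32FC2 output of cv::cuda::dft).
cv::cuda::GpuMat x, y, mag, angle;
cv::cuda::cartToPolar(x, y, mag, angle);

// Assumed interleaved variant: the CV_32FC2 mat is consumed directly, and the
// output could even alias the input for in-place processing.
cv::cuda::GpuMat xy;       // CV_32FC2, e.g. straight out of a DFT
cv::cuda::GpuMat magAngle; // CV_32FC2 interleaved result
cv::cuda::cartToPolar(xy, magAngle);
```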
@cudawarped could you take a look?
Of course, but I may not have time before the release of 4.9.0.
additional "typename" disambiguifiers are required by some compilers
```cpp
GpuMat dst = getOutputMat(_dst, xy.size(), CV_32FC1, stream);

GpuMat_<float2> xyc(xy.reshape(2));
```
I know the existing functions use reshape to convert a `GpuMat` to a `GpuMat_` before being passed to `gridTransformxxx`, but I find this confusing because nothing is being reshaped. Would it not be better to use `globPtr<>` directly on the `GpuMat`s? e.g.

```cpp
if (angleInDegrees)
    gridTransformUnary(globPtr<float2>(xy), globPtr<float>(dst), direction_interleaved_func<float2, true>(), stream);
```

If so, the existing routines could be updated to remove the bloat.
```cpp
GpuMat mag = getOutputMat(_mag, xy.size(), CV_32FC1, stream);
GpuMat angle = getOutputMat(_angle, xy.size(), CV_32FC1, stream);

GpuMat_<float2> xyc(xy.reshape(2));
```
Same here.
```cpp
GpuMat magAngle = getOutputMat(_magAngle, xy.size(), CV_32FC2, stream);

GpuMat_<float2> xyc(xy.reshape(2));
```
And here.
```cpp
@@ -192,6 +276,49 @@ namespace
        ymat(y, x) = mag_val * sin_a;
    }

    template <typename T, bool useMag>
    __global__ void polarToCartDstInterleavedImpl_(const GlobPtr<T> mag, const GlobPtr<T> angle, GlobPtr<typename MakeVec<T, 2>::type> xymat, const T scale, const int rows, const int cols)
```
If you use `PtrStep<T>` for `mag`, `angle` and `xymat` then you can pass them directly to this function instead of using the intermediate `GpuMat_<T>` with reshape. Additionally, if `angle` is a `PtrStepSz<T>` then you don't need to pass `cols` and `rows` separately. e.g.

```cpp
__global__ void polarToCartDstInterleavedImpl_(const PtrStep<T> mag, const PtrStepSz<T> angle, PtrStep<typename MakeVec<T, 2>::type> xymat, const T scale, const int rows, const int cols)
{
    typedef typename MakeVec<T, 2>::type T2;
    const int x = blockDim.x * blockIdx.x + threadIdx.x;
    const int y = blockDim.y * blockIdx.y + threadIdx.y;
    if (x >= angle.cols || y >= angle.rows)
        return;
```

You can also make this adjustment to all the other `polarToCart` kernel calls, including the existing one.
```cpp
void polarToCartDstInterleavedImpl(const GpuMat& mag, const GpuMat& angle, GpuMat& xy, bool angleInDegrees, cudaStream_t& stream)
{
    typedef typename MakeVec<T, 2>::type T2;
    GpuMat_<T2> xyc(xy.reshape(2));
```
If you switch to `PtrStep` and `PtrStepSz` inside `polarToCartDstInterleavedImpl_` then this can be simplified to

```cpp
template <typename T>
void polarToCartDstInterleavedImpl(const GpuMat& mag, const GpuMat& angle, GpuMat& xy, bool angleInDegrees, cudaStream_t& stream)
{
    const dim3 block(32, 8);
    const dim3 grid(divUp(angle.cols, block.x), divUp(angle.rows, block.y));
    const T scale = angleInDegrees ? static_cast<T>(CV_PI / 180.0) : static_cast<T>(1.0);

    if (mag.empty())
        polarToCartDstInterleavedImpl_<T, false><<<grid, block, 0, stream>>>(mag, angle, xy, scale, angle.rows, angle.cols);
    else
        polarToCartDstInterleavedImpl_<T, true><<<grid, block, 0, stream>>>(mag, angle, xy, scale, angle.rows, angle.cols);
}
```
```cpp
cv::cuda::GpuMat dstX1Y1 = createMat(size, CV_32FC1, useRoi);
cv::cuda::GpuMat dstXY2 = createMat(size, CV_32FC1, useRoi);
cv::cuda::phase(loadMat(x, useRoi), loadMat(y, useRoi), dstX1Y1, angleInDegrees);
```
If you have a test case per function and compare the results to the CPU version, it will make it easier to maintain going forward.
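A sketch of what such a per-function test might look like, reusing the helpers already present in this test suite (`randomMat`, `createMat`, `loadMat`, `EXPECT_MAT_NEAR`); the interleaved `phase` overload and the tolerances are assumptions:

```cpp
CUDA_TEST_P(PhaseInterleaved, Accuracy)
{
    // Build an interleaved CV_32FC2 input from two random planes.
    cv::Mat x = randomMat(size, CV_32FC1);
    cv::Mat y = randomMat(size, CV_32FC1);
    cv::Mat xy;
    cv::merge(std::vector<cv::Mat>{x, y}, xy);

    // Run the (assumed) interleaved CUDA overload.
    cv::cuda::GpuMat dst = createMat(size, CV_32FC1, useRoi);
    cv::cuda::phase(loadMat(xy, useRoi), dst, angleInDegrees);

    // Compare directly against the CPU reference.
    cv::Mat dstGold;
    cv::phase(x, y, dstGold, angleInDegrees);

    EXPECT_MAT_NEAR(dstGold, dst, angleInDegrees ? 0.2 : 0.001);
}
```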
```cpp
angle = randomMat(size, type);
cv::Mat magnitudeAngle;
cv::merge(magnitudeAngleChannels, 2, magnitudeAngle);
const double tol = (type == CV_32FC1 ? 1.6e-4 : 1e-4) * (angleInDegrees ? 1.0 : 19.47);
```
Again I would suggest a test case per function, comparing to `cv::polarToCart`, especially if you are going to use these tolerances. In this function you could now be 2*tol away from the CPU result.
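A sketch of the direct comparison being suggested, so the result is at most one `tol` from the CPU reference instead of accumulating error through a round trip; the interleaved `polarToCart` overload here is an assumption:

```cpp
// CPU reference for the same magnitude/angle input.
cv::Mat xGold, yGold, xyGold;
cv::polarToCart(magnitude, angle, xGold, yGold, angleInDegrees);
cv::merge(std::vector<cv::Mat>{xGold, yGold}, xyGold);

// CUDA result via the (assumed) interleaved overload, checked with one tolerance.
cv::cuda::GpuMat xyGpu = createMat(size, CV_32FC2, useRoi);
cv::cuda::polarToCart(loadMat(magnitudeAngle, useRoi), xyGpu, angleInDegrees);
EXPECT_MAT_NEAR(xyGold, xyGpu, tol);
```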
use globPtr() and PtrStepSz<> to bypass confusing reshape() refactor tests
…opencv_contrib into cuda_phase_interleaved
```cpp
const int x = blockDim.x * blockIdx.x + threadIdx.x;
const int y = blockDim.y * blockIdx.y + threadIdx.y;

if (x >= xymat.cols || y >= xymat.rows)
```
Try to keep the out-of-range check consistent. I realize `mag` can be empty, but you're using `xymat`, `angle` and `magAngle`. Maybe stick with `angle` and `magAngle`. Then you only need to use `PtrStepSz` for `angle`/`magAngle`; the other inputs can be `PtrStep`.
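A sketch of the consistent prologue this implies, using a hypothetical interleaved-output `cartToPolar` kernel where `magAngle` is the single `PtrStepSz` parameter that carries the size:

```cpp
// Only the sized output (magAngle) is a PtrStepSz; the interleaved input can
// stay a plain PtrStep, and every kernel bounds its threads the same way.
template <typename T>
__global__ void cartToPolarInterleavedImpl_(const PtrStep<typename MakeVec<T, 2>::type> xymat,
                                            PtrStepSz<typename MakeVec<T, 2>::type> magAngle,
                                            const T scale)
{
    const int x = blockDim.x * blockIdx.x + threadIdx.x;
    const int y = blockDim.y * blockIdx.y + threadIdx.y;
    if (x >= magAngle.cols || y >= magAngle.rows)  // consistent out-of-range check
        return;
    // ... compute magnitude/angle from xymat(y, x) into magAngle(y, x) ...
}
```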
```cpp
GpuMat mag = getOutputMat(_mag, xy.size(), CV_32FC1, stream);
GpuMat angle = getOutputMat(_angle, xy.size(), CV_32FC1, stream);

GpuMat_<float> magc(mag.reshape(1));
```
Can we remove the reshape completely? Looking at it again, it doesn't do anything. i.e.

```cpp
GpuMat_<float> magc(mag);
```
```cpp
@@ -2809,6 +2850,97 @@ INSTANTIATE_TEST_CASE_P(CUDA_Arithm, CartToPolar, testing::Combine(
    testing::Values(AngleInDegrees(false), AngleInDegrees(true)),
    WHOLE_SUBMAT));

PARAM_TEST_CASE(CartToPolarInterleaved1, cv::cuda::DeviceInfo, cv::Size, AngleInDegrees, UseRoi)
```
Can you give these test cases more informative names? Worst case scenario you could use `CartToPolarInputInterleaved`, `CartToPolarInputOutputInterleaved`, `PolarToCartOutputInterleaved`, `PolarToCartInputOutputInterleaved` if you can't think of anything better.
code style and simplifications
LGTM 👍
Passing tests on Windows 11, VS 2022 with CUDA 12.3.
@asmorkalov can you take a look?
@chacha21 You will need to squash and rebase this onto the tip of the 4.x branch, as the CUDA CMake configuration has changed in the main repo since you submitted this PR, so I think it will fail on the CI.
Is this OK after "Merge branch '4.x'"? My brain has never accepted git terminology, so I am not sure it was the right operation (done with GitHub Desktop).
I rebased the patch to the current 4.x and got a build error with CUDA 11.8 on Ubuntu 18.04:
I don't have such a problem with CUDA 12.4 under Visual Studio 2022.
Another issue with CUDA 12.5 on Ubuntu 20.04:
I rebased the local branch to 4.x to include all patches for CUDA 12.x.
I think it must be related to some "tuple" name clashing in my calls to
The `make_tuple` or `tie()` helper returns a `cuda::std::tuple`, but `cuda` is then ambiguous between `::cuda` and `cv::cuda`. Removing `using cv::cuda` will help.
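A minimal reconstruction of the clash being described, with mock namespaces standing in for libcu++'s `::cuda::std` and OpenCV's `cv::cuda` (the exact call site in the PR is not shown here):

```cpp
namespace cuda { namespace std { template <class... Ts> struct tuple {}; } }  // stand-in for libcu++
namespace cv { namespace cuda { struct GpuMat {}; } }                          // stand-in for OpenCV

using namespace cv;  // now the unqualified name "cuda" denotes both ::cuda and cv::cuda

int main()
{
    // cuda::std::tuple<int, float> t;  // error: reference to 'cuda' is ambiguous
    ::cuda::std::tuple<int, float> t;   // fully qualifying (or dropping the using) resolves it
    (void)t;
    return 0;
}
```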
The usage of `typename` seems different among compilers
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.