Does the transformation support MPI parallalization? #597
Replies: 5 comments 7 replies
-
Dear SJH,
No, we do not provide MPI, since we do not know of applications with
problem sizes exceeding one shared-mem node (which can do up to M=N=1e9),
or since bigger HPC applications that we know of only call FINUFFT locally
on each node as part of their distributed code. But many large NUFFT
problems are easily parallelizable (by summing outputs from separate
FINUFFT instances on each node) in at least two ways: over NU pts, and over
Fourier modes (the latter requires rephasing of inputs).
However, if both M and N are too large for a single node, and you want to
retain the quasi-linear scaling O(M + N ln N) then you'd need an MPI
spreader/interp, distributing the fine-grid array, and an MPI FFT (which
exists, eg PFFT). We don't have a lot of pressure to create the former - I
don't have a sense of how needed it is. I am not an MPI user myself.
Let us know your problem sizes (M,N, tol, dim, etc..) and maybe that will
provide motivation.
Best, Alex
…On Mon, Dec 2, 2024 at 11:49 PM SJH ***@***.***> wrote:
Hi,
I'm wondering if FINUFFT supports MPI parallelization? I have a
MPI-parallelized PDE solver (written in python) that I like to incorporate
FINUFFT (if possible) for interpolation. I went through the document,
seems like it only supports multi-threading (such as OpenMP), instead of
MPI?
—
Reply to this email directly, view it on GitHub
<#597>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACNZRSQJQMUOAFGDJKADGCT2DUZZ7AVCNFSM6AAAAABS45BN2OVHI2DSMVQWIX3LMV43ERDJONRXK43TNFXW4OZXGYYDCNRQGQ>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
--
*-------------------------------------------------------------------~^`^~._.~'
|\ Alex Barnett Center for Computational Mathematics, Flatiron Institute
| \ http://users.flatironinstitute.org/~ahb 646-876-5942
|
Beta Was this translation helpful? Give feedback.
-
Dear Sijie, Thanks Libin for the detailed MPI ideas. I will bring it back to basics: N and M are as in the documentation, which you should read :)
This was using FFT=DUCC, which is faster for 3D. Also not upsampfac=1.25 not 2.0. What is your time budget? You see without MPI your task completes in about 0.5 sec. Best, Alex |
Beta Was this translation helpful? Give feedback.
-
Ok, good.
All-to-all will not be the best (but moving your 256^3*16 = 0.25GB should
take <0.1sec), but maybe for single-node you can try the contiguous RAM
layout that Libin explained.
For bigger problems, scaling multinode on MPI, would be to have separate
single-thread 3d2 transforms, each working with a subset of the array, then
you add (or merge) the answers. Exactly how that is done depends on your
layout (is the real space grid distributed over nodes, or the Fourier space
grid?).
Best, Alex
…On Mon, Dec 9, 2024 at 12:43 PM SJH ***@***.***> wrote:
Yes, I've been testing these transforms, and I can see that they are
extremely fast even with a single thread on my machine, and I believe this
will not be the computational bottleneck for my case (and thanks for the
clarification for using type 2; I also figured that out over the past
weekend).
The only reason I asked about MPI is that my PDE code runs in MPI, and I'd
like to incorporate finufft into the code and perform interpolation *on
the fly* while the code solving for PDEs. In other words, I'm not look
for using MPI to speedup finufft. As I mentioned and also by @lu1and10
<https://github.com/lu1and10>, it looks like the only way, for now, is to
perform MPI all to all communication, gather all data to a single core and
then perform type 2 transform on that one.
—
Reply to this email directly, view it on GitHub
<#597 (reply in thread)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACNZRSSF2TIVVIRXVSUC7RT2EXJDVAVCNFSM6AAAAABS45BN2OVHI2DSMVQWIX3LMV43URDJONRXK43TNFXW4Q3PNVWWK3TUHMYTCNJRGEZTANI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***
com>
--
*-------------------------------------------------------------------~^`^~._.~'
|\ Alex Barnett Center for Computational Mathematics, Flatiron Institute
| \ http://users.flatironinstitute.org/~ahb 646-876-5942
|
Beta Was this translation helpful? Give feedback.
-
You have to write out the math of the type 2 transform, to see how to
handle separate blocks of Fourier space.
Eg, you need to multiply the outputs by a phase if the offset of your
Fourier grid is not the usual one (-N/2 <= k < N/2). This would allow
smaller FFTs on each process. You could then add the outputs.
Alternatively you could split by NU points, but then each process has to do
a full-size FFT. This only makes sense if M >> N (which I think is not your
case since M=1e6 for you but N>1e7).
…On Mon, Dec 9, 2024 at 2:40 PM SJH ***@***.***> wrote:
For bigger problems, scaling multinode on MPI, would be to have separate
single-thread 3d2 transforms, each working with a subset of the array, then
you add (or merge) the answers. Exactly how that is done depends on your
layout (is the real space grid distributed over nodes, or the Fourier space
grid?).
The solution (for the PDE) arrays (in both physical and fourier space) are
distributed. So is it doable (simply add/merge the results as you
mentioned)?
—
Reply to this email directly, view it on GitHub
<#597 (reply in thread)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACNZRSSVFZTZJ3M6NTYLNKT2EXW35AVCNFSM6AAAAABS45BN2OVHI2DSMVQWIX3LMV43URDJONRXK43TNFXW4Q3PNVWWK3TUHMYTCNJRGIZTKMI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***
com>
--
*-------------------------------------------------------------------~^`^~._.~'
|\ Alex Barnett Center for Computational Mathematics, Flatiron Institute
| \ http://users.flatironinstitute.org/~ahb 646-876-5942
|
Beta Was this translation helpful? Give feedback.
-
Dear SJH,
If you want to chop up the Fourier grid this way, for a type-1 you have to
"pre-phase" the weights differently for each process, according to
exp(ia.x_j) where a is the k-space offset vector for that grid block. (I
imagine an MPI FFT has to do a similar thing). Otherwise you will just be
repeating the same block on each process :)
Let me know if still stuck, or you want to write/release such an MPI
wrapper - might be useful for the community.
Best, Alex
…On Wed, Dec 18, 2024 at 11:38 PM SJH ***@***.***> wrote:
@DiamonDinoia <https://github.com/DiamonDinoia> Basically I have a PDE
solver solving the incompressible Navier-Stokes equations in Fourier space.
The code is parallelized using MPI. In other words, each MPI process only
owns a portion of the Fourier modes/wavenumbers.
If I simply call finufft on each process, I guess it'll just compute
NUFFT based on corresponding portion of the Fourier modes, which is
incomplete. So I was wondering if finufft has the capability collecting
those "partial" results from each process and then compute the final result
(something like the MPI-parallelized regular FFT).
—
Reply to this email directly, view it on GitHub
<#597 (reply in thread)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACNZRSXL2IONPRW6UHHLTYT2GJETXAVCNFSM6AAAAABS45BN2OVHI2DSMVQWIX3LMV43URDJONRXK43TNFXW4Q3PNVWWK3TUHMYTCNRRGI4TKNY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***
com>
--
*-------------------------------------------------------------------~^`^~._.~'
|\ Alex Barnett Center for Computational Mathematics, Flatiron Institute
| \ http://users.flatironinstitute.org/~ahb 646-876-5942
|
Beta Was this translation helpful? Give feedback.
-
Hi,
I'm wondering if
FINUFFT
supports MPI parallelization? I have a MPI-parallelized PDE solver (written in python) that I like to incorporateFINUFFT
(if possible) for interpolation in physical space. I went through the document, seems like it only supports multi-threading (such as OpenMP), instead of MPI?Beta Was this translation helpful? Give feedback.
All reactions