FFTW with threads #48
Comments
Thank you for that information. I am running 8 instances (72 frequencies) of dumphfdl at a sample rate of 192000, fed by a Red Pitaya (hpsdr). It reduced the CPU usage on my virtual machine. |
FFTW with 4 threads is the only way I can run 2 instances of dumphfdl (one
of them at 6 Msps) on a single board computer like Odroid N2+. With 1
thread I get massive buffer overruns and basically no decodes.
It's not about the overall CPU usage but about not saturating any of the
cores (which aren't too beefy in this type of hardware).
|
So leave threading enabled but make the number of threads a runtime option. Don't turn them off completely, as that will generate wisdom files incompatible with programs that do thread. Setting the number of threads to 1 has essentially the same effect while allowing sharing of a system wisdom file with programs that want to use more.

6 Ms/s seemed high, but then I realized you're doing your own multichannel downconversion internally. I am currently running a separate copy of dumphfdl for every channel, 106 in total, with ka9q-radio doing the downconversion. This works well, and certainly creates a lot of parallelism, but I need to compare total CPU use against fewer but wider channels, perhaps one per band. I use a 12 ks/s IQ input for each SSB signal (8 ks/s didn't work), which means I need a higher total sample rate for a bunch of nearby HFDL channels than one wider channel covering them all.

BTW, if you need an analytic signal you can create one with a half-plane filter using fast convolution. This is how I do it: start with a real-input FFT to create a complex spectrum with Hermitian symmetry (the negative spectrum is the mirror image of the positive spectrum). Then remove the negative frequencies (with windowing to prevent time-domain ripples) and convert back to the time domain with a complex-output FFT. This would permit feeding dumphfdl with a conventional SSB receiver.

KA9Q-radio uses fast convolution with a shared forward FFT to implement a multichannel digital downconverter. Even at a 64.8 Ms/s (all of HF) or 129.6 Ms/s (HF-6m) A/D input sample rate I still use single-threaded FFTs, though I give the option to run several threads each performing independent FFTs, which is faster than multithreading individual FFTs. Usually even this isn't necessary; I have a NUC with an i5-8260U @ 1.60GHz doing 1.62 megapoint real-input FFTs 50 times/sec while using only ~40% of a single core. FFTW is amazing. |
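For readers who want to see the half-plane filter idea in isolation, here is a minimal sketch in Python. It is not code from ka9q-radio or dumphfdl, and it makes several simplifying assumptions: it processes a single block rather than doing streaming fast convolution, it uses a brick-wall cut with none of the windowing mentioned above, and it uses a naive O(N²) DFT (the hypothetical helper `dft`) in place of FFTW so that it runs with the standard library alone. The function name `analytic_signal` is likewise illustrative.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT, standing in for an FFT library call (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(real_samples):
    """Return complex samples whose spectrum has no negative frequencies.

    Assumes an even block length. Brick-wall cut only: the windowing that
    suppresses time-domain ripple is deliberately omitted from this sketch.
    """
    n = len(real_samples)
    half = n // 2
    spectrum = dft(real_samples)      # Hermitian-symmetric, since the input is real
    for k in range(1, half):
        spectrum[k] *= 2              # double the positive frequencies
    for k in range(half + 1, n):
        spectrum[k] = 0               # remove the negative frequencies
    return dft(spectrum, inverse=True)  # complex time-domain output


# A real cosine centered exactly on bin 5 of a 64-point block.
n = 64
tone_bin = 5
x = [math.cos(2 * math.pi * tone_bin * t / n) for t in range(n)]
z = analytic_signal(x)
```

For this test tone the real part of `z` reproduces the input and the imaginary part is the matching sine, so `z` is a unit-magnitude complex exponential. A real implementation would replace `dft` with a real-input forward FFT and a complex inverse FFT, and would taper the cut at the band edges.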
I just ran the experiment with per-band dumphfdl fed from ka9q-radio. It's faster than one per channel, but not dramatically so. I'm using 12 bands with a total sample rate of 1.388 Ms/s vs 106 individual channels @ 12 ks/s per channel = 1.272 Ms/s. I guess it works both ways. |
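The two aggregate sample rates quoted above can be checked with a few lines of arithmetic. This Python sketch only restates the figures from the comment; the variable names are illustrative.

```python
# Figures quoted in the comment above.
channels = 106
per_channel_rate = 12_000          # samples/s of IQ per HFDL channel
per_band_total = 1_388_000         # samples/s summed over the 12 per-band streams

per_channel_total = channels * per_channel_rate
ratio = per_band_total / per_channel_total

print(f"per-channel total: {per_channel_total / 1e6:.3f} Ms/s")
print(f"per-band total:    {per_band_total / 1e6:.3f} Ms/s")
print(f"per-band / per-channel: {ratio:.3f}")
```

In this particular setup the per-band configuration carries roughly 9% more samples in total than the per-channel one, so the two approaches are close in aggregate input rate.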
I've added |
Thanks, that will do it for me! |
I see that by default dumphfdl builds with FFTW's internal multithreading option enabled, with 4 threads specified. Have you benchmarked this?
I also use FFTW heavily to perform fast convolution in ka9q-radio, and I found that internal multithreading didn't buy me much, at least with the huge FFTs I use (e.g., 1,620,000 points). Although it reduced the clock time required to perform a single FFT, the overall CPU utilization went up. Since I already perform a lot of independent FFTs in parallel in separate application threads I found it's better (for me) to have FFTW use only a single thread. Since I'm running lots of parallel copies of dumphfdl (fed by ka9q-radio channel threads) I went into my copy of fft_fftw.c and changed the number of threads to 1, leaving multithreading enabled.
I did this because of a gotcha. Wisdom files written with threads = 1 are NOT compatible with those written with multithreading completely turned off. This means you can't share a system-wide wisdom file (e.g., /etc/fftw/wisdomf) unless everybody agrees to use the same FFTW thread settings. There's a per-application wisdom file, but it doesn't look like you're using one. If you like, I could have it create one in /var/lib/hfdl/wisdom and send a pull request. I've already placed systable.conf in that directory, as this is the standard place in Linux to hold application-specific data files.