
MPI vs pytest #69

Open
inducer opened this issue Aug 13, 2020 · 7 comments

inducer commented Aug 13, 2020

  • We'll need to run (pytest) tests under MPI, to test the distributed-memory functionality.
  • These need to run during CI.
  • We will also want to be able to run this on our target platforms, i.e. the big DOE machines.
  • We need to decide between "pytest inside MPI" (i.e. mpiexec python -m pytest) and "MPI inside pytest" (as meshmode currently does).

If we choose "pytest inside MPI", then pytest-mpi might come in handy.

cc @lukeolson @MTCam
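
For concreteness, here is a minimal sketch of the "MPI inside pytest" pattern, where the pytest test re-launches the containing file under mpiexec. This is only illustrative (not meshmode's actual code); the RUN_WITHIN_MPI environment variable is a made-up marker for the child ranks, and mpi4py is assumed to be available.

    # Minimal sketch of "MPI inside pytest": the pytest process spawns mpiexec,
    # and the spawned ranks re-run this file to execute the distributed check.
    import os
    import subprocess
    import sys

    def _distributed_check():
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        assert comm.size == 2
        print(f"rank {comm.rank} of {comm.size} is alive")

    def test_runs_under_mpi():
        # Parent side: launch two ranks; RUN_WITHIN_MPI (hypothetical) marks the children.
        env = dict(os.environ, RUN_WITHIN_MPI="1")
        subprocess.check_call(
            ["mpiexec", "-n", "2", sys.executable, __file__], env=env)

    if __name__ == "__main__":
        if os.environ.get("RUN_WITHIN_MPI"):
            _distributed_check()

Under "pytest inside MPI", by contrast, the whole pytest session is launched as mpiexec -n 2 python -m pytest, which is where pytest-mpi would come in.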

majosm commented Aug 24, 2020

Update on this: I tried out pytest-mpi as an alternative to the current mpiexec-inside-test approach, and I'm not super happy with it. It seems to produce separate pytest output for each rank (similar to what you would see if you ran multiple separate pytest instances simultaneously), and in some cases it comes out a bit garbled. It also doesn't provide much in the way of options for specifying how many ranks/nodes/etc. to use.

So instead I took a stab at generalizing the current approach so that we can customize the behavior for different platforms. The proof-of-concept code can be seen here and here. Essentially what it does now is check for an environment variable set by the user (MPI_EXECUTOR_TYPE) that specifies which MPI execution method to use ('basic' for mpiexec, 'slurm' for srun, etc.) and then set up the launching command accordingly. (If the environment variable is not set, the tests are skipped.) I also moved the test function call out of main and into little on-the-fly scripts that get passed to python via the -c flag so that multiple MPI tests can be placed in the same file. Seems a little easier to understand what's happening that way too.

I'm not entirely satisfied with this yet (I suspect there's a way to further simplify the test_script stuff; looking for suggestions), but it seems like this could work. I would of course eventually move the executor definitions somewhere else so that they could be used by other packages (is there a good place to put these?).

Edit: I set up the slurm and LC-LSF executors to be used from inside an interactive job submitted by the user with salloc/lalloc.
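
To make the above concrete, here is a rough sketch of the dispatch idea; the MPI_EXECUTOR_TYPE name and the 'basic'/'slurm' values come from the comment above, but the helper functions themselves are hypothetical, not the actual proof-of-concept code.

    # Rough sketch: pick a launcher based on MPI_EXECUTOR_TYPE and run a test
    # function through a small on-the-fly script passed to python -c.
    import os
    import subprocess
    import sys

    import pytest

    def _mpi_launch_command(num_ranks):
        executor = os.environ.get("MPI_EXECUTOR_TYPE")
        if executor == "basic":
            return ["mpiexec", "-n", str(num_ranks)]
        elif executor == "slurm":
            # Assumes an interactive allocation obtained beforehand with salloc.
            return ["srun", "-n", str(num_ranks)]
        else:
            pytest.skip("MPI_EXECUTOR_TYPE not set")

    def run_test_with_mpi(num_ranks, module, func):
        # A tiny generated script lets several MPI tests live in the same file.
        script = f"from {module} import {func}; {func}()"
        subprocess.check_call(
            _mpi_launch_command(num_ranks) + [sys.executable, "-c", script])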

MTCam commented Aug 25, 2020

@majosm Here is what I'm doing currently in TEESD to handle batching, platform-dependent spawn commands, etc.

I would be pretty excited about improving that if you find a better way!

majosm commented Aug 25, 2020

Parsl actually has some infrastructure for this too (Execution Providers and Launchers). I wonder if we could nudge them into splitting it off into a standalone package at some point.
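
For reference, a hedged sketch of what leaning on Parsl's provider/launcher machinery might look like; the partition name and node counts are placeholders, and this is not code from any of the repositories discussed here.

    # Sketch of a Parsl config pairing a Slurm execution provider with an srun launcher.
    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor
    from parsl.launchers import SrunLauncher
    from parsl.providers import SlurmProvider

    config = Config(executors=[
        HighThroughputExecutor(
            provider=SlurmProvider(
                partition="pbatch",      # placeholder partition name
                nodes_per_block=2,
                launcher=SrunLauncher(),
            ),
        ),
    ])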

majosm commented Aug 31, 2020

@inducer Re: Different behavior for subprocess.call vs. os.system on lassen: I set up an example (source here). The script inside that gets executed via MPI prints its rank and also creates some empty files (to check whether it's just a stdout capturing issue or not). I tried two sets of tests; in the first I just print out the script source (command in print_script) in order to try to rule out any formatting issues (since the formatting is a little bit nasty at the moment), and in the second I actually run the script (command in run_script).

Results:

MPI + print_script:

Version 1: Works
Version 2: Works
Version 3: Doesn't work (prints lrun help message; error code 1)

MPI + run_script:

Version 1: Works
Version 2: Doesn't work (no stdout, doesn't create files, no error)
Version 3: Doesn't work (prints lrun help message; error code 1)

Seems like version 2 is having trouble running python from the subprocess for some reason. Not sure what's going on with version 3. Any ideas?

Edit: Version 3 works for both cases if I do " ".join(command). I guess I'm supposed to pass a single string if using shell=True?
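
That matches the subprocess documentation: with shell=True the command should be a single string; if a sequence is passed on POSIX, only the first element is used as the command and the remaining elements become arguments to the shell itself. A minimal illustration, using mpiexec and hostname as stand-ins for the lrun command above:

    import subprocess

    command = ["mpiexec", "-n", "2", "hostname"]

    # Without shell=True, pass the argument list directly.
    subprocess.call(command)

    # With shell=True, pass a single command string instead of a list.
    subprocess.call(" ".join(command), shell=True)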

majosm commented Sep 1, 2020

Also, where would be a good place to stash these executor definitions?

majosm commented Sep 1, 2020

Looks like mpi4py uses unittest, not pytest, and they launch MPI from outside the test scripts. I don't see any launcher-handling code that we could borrow (just some CI configuration scripts for a few different platforms).

I'll see if I can find any other Python codebases that use MPI.

majosm commented Sep 1, 2020

Dang. Well, the pickle version was looking pretty nice until I set up a test that used an array context and ran into this:

        pickled_test = pickle.dumps(test).hex()
>       pickled_args = pickle.dumps(args).hex()
E       AttributeError: Can't pickle local object 'pytest_generate_tests_for_pyopencl_array_context.<locals>.ArrayContextFactory'

test_partition.py:131: AttributeError

Unless someone happens to know a workaround, I think that spells doom for this approach...
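
For context: the stock pickle module serializes classes by reference (an importable module-level name), so anything defined inside a function body, like the ArrayContextFactory above, cannot go through it. Below is a minimal illustration with a made-up local class, plus one possible workaround (untested here) via cloudpickle, which serializes locally defined classes by value.

    import pickle

    import cloudpickle  # third-party; not currently a dependency here

    def make_factory():
        # Stands in for a factory class defined inside another function.
        class LocalFactory:
            pass
        return LocalFactory()

    obj = make_factory()

    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError) as e:
        print("stdlib pickle fails:", e)

    # cloudpickle embeds the class definition in the payload instead of a reference.
    data = cloudpickle.dumps(obj)
    print(type(cloudpickle.loads(data)))

Alternatively, defining the factory at module scope, or parametrizing on something picklable (say, a string key that is resolved inside the spawned process), would keep the plain-pickle approach workable.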
