Final solution dump after exception can run into "file exists" error #167

inducer · 2020-12-03T22:42:03Z

This should not happen IMO.

inducer · 2020-12-03T22:45:39Z

To reproduce, run vortex-mpi.py from b2c9a41 with this diff:

diff --git a/examples/vortex-mpi.py b/examples/vortex-mpi.py
index b61148f..3e265cc 100644
--- a/examples/vortex-mpi.py
+++ b/examples/vortex-mpi.py
@@ -61,8 +61,8 @@ def main(ctx_factory=cl.create_some_context):
                 allocator=cl_tools.MemoryPool(cl_tools.ImmediateAllocator(queue)))
 
     dim = 2
-    nel_1d = 16
-    order = 3
+    nel_1d = 160
+    order = 5
     exittol = .09
     t_final = 0.1
     current_cfl = 1.0

(Observe how it's not adjusting the time step, hence the solution disagrees.)

Status: step=20 t=0.02000000000000001
------- P(-1.71, 1.42)
------- T(-0.00817, 0.00622)
------- dt=0.001 cfl=1.0
------- errors=0.427, 4.97, 0.449, 0.411
Checkpointing final state ...
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/andreask_work/src/env-3.8/lib/python3.8/site-packages/mpi4py/__main__.py", line 7, in <module>
    main()
  File "/home/andreask_work/src/env-3.8/lib/python3.8/site-packages/mpi4py/run.py", line 196, in main
    run_command_line(args)
  File "/home/andreask_work/src/env-3.8/lib/python3.8/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "/usr/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "vortex-mpi.py", line 177, in <module>
    main()
  File "vortex-mpi.py", line 167, in main
    my_checkpoint(current_step, t=current_t,
  File "vortex-mpi.py", line 148, in my_checkpoint
    return exact_sim_checkpoint(discr, initializer, visualizer, eos,
  File "/home/andreask_work/src/mirgecom/mirgecom/simutil.py", line 129, in exact_sim_checkpoint
    visualizer.write_parallel_vtk_file(
  File "/home/andreask_work/src/meshmode/meshmode/discretization/visualization.py", line 468, in write_parallel_vtk_file
    self.write_vtk_file(
  File "/home/andreask_work/src/meshmode/meshmode/discretization/visualization.py", line 626, in write_vtk_file
    raise FileExistsError("output file '%s' already exists"
FileExistsError: output file 'vortex-000020-0000.vtu' already exists

MTCam · 2020-12-03T22:48:15Z

Correct, that's why we need this check:

    if current_t != checkpoint_t:
       if rank == 0:
          logger.info("Checkpointing final state ...")
       my_checkpoint(current_step, t=current_t,
                  dt=(current_t - checkpoint_t),
                  state=current_state)

This check was complained about and commented by someone. But it served a purpose, and that was to make sure we don't try to overwrite the soln file that was just written in the last step in case the time stepping stops right on an output step.

Edit:
We can't always count on the code stopping at an output step. When it doesn't, we want to make sure to write the last dump after stepping exits.

inducer · 2020-12-03T23:16:27Z

IMO we should find a solution that doesn't require repeating this logic in every driver. And without a comment, I (that someone you refer to IIRC) simply didn't get the logic, so it's not impossible that someone else might not either.

MTCam · 2020-12-04T00:31:37Z

IMO we should find a solution that doesn't require repeating this logic in every driver. And without a comment, I (that someone you refer to IIRC) simply didn't get the logic, so it's not impossible that someone else might not either.

Agreed. FWIW, I didn't think it was you personally that commented it. I think it very likely was me responding to complaints about it - but then I never properly fixed it up. Ideally, there would have been a test surrounding this to expose and address the need. Can we unit test the drivers?

I couldn't agree more about not leaving this to the drivers to get right or wrong. I'm also sensitive to exporting too much to the back-end, however.

majosm · 2020-12-04T03:52:37Z

I'm a little unclear on what the desired behavior actually should be in this case.

If the intent is to write a final dump after a mismatch has occurred, isn't this what should happen if the file already exists? The alternative would be to force it to be overwritten, which seems wrong if we're not allowing that behavior in other circumstances. I guess you could do a final dump to a different file on error (casename + "_error" or something) and allow that to be overwritten. Maybe not a great idea either since that would make it harder to load all the files into vis software.

majosm · 2020-12-04T04:08:21Z

I get it now. 🙂 The issue is that it gives that error even if the file didn't exist before the run started.

majosm mentioned this issue Dec 4, 2020

Refactor checkpoint code #168

Closed

majosm mentioned this issue Feb 22, 2021

Refactor checkpoint/simutils (attempt #2) #257

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Final solution dump after exception can run into "file exists" error #167

Final solution dump after exception can run into "file exists" error #167

inducer commented Dec 3, 2020

inducer commented Dec 3, 2020

MTCam commented Dec 3, 2020 •

edited

Loading

inducer commented Dec 3, 2020

MTCam commented Dec 4, 2020

majosm commented Dec 4, 2020

majosm commented Dec 4, 2020

Final solution dump after exception can run into "file exists" error #167

Final solution dump after exception can run into "file exists" error #167

Comments

inducer commented Dec 3, 2020

inducer commented Dec 3, 2020

MTCam commented Dec 3, 2020 • edited Loading

inducer commented Dec 3, 2020

MTCam commented Dec 4, 2020

majosm commented Dec 4, 2020

majosm commented Dec 4, 2020

MTCam commented Dec 3, 2020 •

edited

Loading