Need restart #139
Comments
I'd vote that this is out of scope for round one. It would take a while to implement, especially if we're not willing to centralize the mesh and DOF data on a single rank.
Why do we need to save these?
For multi-step time integration (i.e. not us, yet), this would entail also saving a good chunk of time stepper state (vs. just re-bootstrapping the time stepper). How important is this "exact restart"?
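To make the trade-off concrete, here is a minimal sketch (not code from this project) using a two-step Adams-Bashforth scheme on a toy ODE. The function names `rhs` and `ab2_run` and the forward-Euler bootstrap are assumptions chosen for illustration. Restarting with the saved history reproduces the uninterrupted run exactly, while re-bootstrapping from the solution alone gives a nearby but different answer.

```python
def rhs(y):
    """Toy right-hand side for y' = -y (hypothetical stand-in for the real operator)."""
    return -y


def ab2_run(y, f_prev, nsteps, dt):
    """Advance nsteps with two-step Adams-Bashforth.

    f_prev is the stepper history (the previous RHS evaluation).
    f_prev=None means no history is available, so the first step
    is bootstrapped with forward Euler.
    """
    for _ in range(nsteps):
        f = rhs(y)
        if f_prev is None:
            y = y + dt * f
        else:
            y = y + dt * (1.5 * f - 0.5 * f_prev)
        f_prev = f
    return y, f_prev


dt = 0.1

# Uninterrupted reference run: 10 steps.
y_ref, _ = ab2_run(1.0, None, 10, dt)

# Run 5 steps, then "checkpoint" both the solution and the stepper history.
y_ckpt, f_ckpt = ab2_run(1.0, None, 5, dt)

# Exact restart: solution plus stepper state.
y_exact, _ = ab2_run(y_ckpt, f_ckpt, 5, dt)

# Re-bootstrapped restart: solution only, history rebuilt with forward Euler.
y_reboot, _ = ab2_run(y_ckpt, None, 5, dt)

print(y_ref, y_exact, y_reboot)
```

In this sketch the extra state is a single RHS evaluation; for higher-order multi-step methods it would be several, which is the "good chunk of time stepper state" in question.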
Totally agree. It would be good to leave this on the radar, however. We need to handle changing resource availability for production runs. Even for lead-up science runs; consider the situation where a big resource is used to run several flow-throughs, then a much smaller resource is used to run many "shots" or ignition instances.
That's a great question. We can discuss it with JBF, Anderson, and Esteban (perhaps we can do this better, or we already have), but here is the issue stated in a meta sort of way. Currently the temperature (T) is calculated as a function of the state and the last temperature, i.e. T = temperature(state, Tguess). With Cantera, the user cannot specify Tguess! Cantera just uses the internal state it kept from its last call. Because we use a single instance of Cantera to calculate many points, the answers we get from Cantera depend on partitioning: partitioning affects the point ordering, and each call of Cantera just starts its iterations from Tguess = Tlastpoint. Prometheus does one step better by providing an API to specify Tguess. So for us, Tguess = the temperature that the given point had the last time. Because we store T (at runtime and at I/O time), we have it available to use as Tguess; if we don't store it, our Tguess is lost.
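A rough sketch of the effect described above, under loud assumptions: the energy model, the Newton solve, and the names `energy_of`, `temperature`, and `StatefulSolver` are all hypothetical toys, and none of this is the Cantera or Prometheus API. It only illustrates why an iterative solve that stops at a tolerance gives order-dependent answers when the initial guess is whatever the previous point left behind, and order-independent answers when each point supplies its own stored guess.

```python
A, B = 1.0e3, 0.1  # toy coefficients (assumed): e(T) = A*T + B*T**2


def energy_of(t):
    """Toy energy as a function of temperature."""
    return A * t + B * t * t


def temperature(energy, t_guess, rtol=1e-6, max_iter=50):
    """Newton solve of e(T) = energy, starting from t_guess.

    The iteration stops once the residual is within tolerance, so the
    returned T depends (slightly) on where the iteration started.
    """
    t = t_guess
    for _ in range(max_iter):
        resid = energy_of(t) - energy
        if abs(resid) <= rtol * abs(energy):
            break
        t -= resid / (A + 2.0 * B * t)
    return t


class StatefulSolver:
    """Mimics a solver whose guess is the result of its previous call."""

    def __init__(self, t_init=300.0):
        self._t_last = t_init

    def temperature(self, energy):
        self._t_last = temperature(energy, self._t_last)
        return self._t_last


# Two grid points; the visiting order stands in for the partitioning.
energies = {"p0": energy_of(1500.0), "p1": energy_of(800.0)}


def run_stateful(order):
    solver = StatefulSolver()
    return {name: solver.temperature(energies[name]) for name in order}


def run_with_stored_guess(order, t_stored):
    # Each point starts from its own previously stored temperature.
    return {name: temperature(energies[name], t_stored[name]) for name in order}


stateful_a = run_stateful(["p0", "p1"])
stateful_b = run_stateful(["p1", "p0"])

t_stored = {"p0": 1490.0, "p1": 810.0}  # e.g. T read back from a restart file
stored_a = run_with_stored_guess(["p0", "p1"], t_stored)
stored_b = run_with_stored_guess(["p1", "p0"], t_stored)

print(stateful_a["p0"] - stateful_b["p0"])  # small, but nonzero
print(stored_a == stored_b)
```

Both variants agree to within the solver tolerance; only the stored-guess variant is reproducible under reordering, which is why losing T at restart matters for a deterministic restart.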
Experience tells me that deterministic restart is quite important, but I can also imagine some cases in which that would not be a show-stopper. We should bring this up with the physics guys.
A restart capability will be required in order to run long enough for meaningful flow simulations. We will need these capabilities:
cc: @inducer @anderson2981