Update with some scaling data
MTCam committed Apr 6, 2022
1 parent 906a163 commit d0e5cb4
Showing 55 changed files with 421 additions and 3 deletions.
11 changes: 9 additions & 2 deletions combozzle.py
@@ -242,9 +242,15 @@ def main(ctx_factory=cl.create_some_context, use_logmgr=True,
if input_file:
input_data = None
if rank == 0:
print(f"Reading user input file: {input_file}.")
with open(input_file) as f:
input_data = yaml.load(f, Loader=yaml.FullLoader)
input_data = comm.bcast(input_data, root=0)

try:
casename = input_data["casename"] # fixme: allow cl override
except KeyError:
pass
try:
dim = int(input_data["dim"])
except KeyError:
@@ -435,6 +441,7 @@ def main(ctx_factory=cl.create_some_context, use_logmgr=True,

if rank == 0:
print("#### Simluation control data: ####")
print(f"\tCasename: {casename}")
print(f"----- run control ------")
print(f"\t{grid_only=},{discr_only=},{inert_only=}")
print(f"\t{single_gas_only=},{dummy_rhs_only=}")
@@ -575,7 +582,7 @@ def main(ctx_factory=cl.create_some_context, use_logmgr=True,
generate_mesh)
local_nelements = local_mesh.nelements

- print(f"{rank=},{local_nelements=},{global_nelements=}")
+ print(f"{rank=},{dim=},{order=},{local_nelements=},{global_nelements=}")
if grid_only:
return 0

@@ -617,7 +624,7 @@ def vol_max(x):

if rank == 0:
print(f"----- Discretization info ----")
- print(f"Discr: {nodes.shape=}, {h_min=}, {h_max=}")
+ print(f"Discr: {nodes.shape=}, {order=}, {h_min=}, {h_max=}")
for i in range(nparts):
if rank == i:
print(f"{rank=},{local_nelements=},{global_nelements=}")
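
The input-parsing hunk above repeats one `try`/`except KeyError` block per option. A minimal sketch of the same pattern as a helper, assuming `input_data` is the plain dict returned by `yaml.load`. The name `get_option` and the default values shown are hypothetical and are not part of combozzle.py:

```python
def get_option(input_data, key, default, cast=None):
    """Return input_data[key] (optionally cast), or default if the key is absent."""
    try:
        value = input_data[key]
    except KeyError:
        return default
    return cast(value) if cast is not None else value


# Stand-in for the dict that yaml.load would return for a run_config file.
input_data = {"casename": "combustle", "dim": "3"}

# Hypothetical defaults, chosen only for illustration.
casename = get_option(input_data, "casename", default="mirgecom-case")
dim = get_option(input_data, "dim", default=2, cast=int)
order = get_option(input_data, "order", default=1, cast=int)
```

With an explicit default for every option, later uses such as the `Casename` print do not depend on whether the key was present in the input file.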
12 binary files not shown.
1 change: 1 addition & 0 deletions dist_scale/3d/combozzle.py
120 changes: 120 additions & 0 deletions dist_scale/3d/error_report.txt
@@ -0,0 +1,120 @@
[mtcampbe@lassen30:3d]$ pwd
/g/g17/mtcampbe/ceesd-timing/drivers_bozzle/dist_scale/3d
[mtcampbe@lassen30:3d]$ rm run_config.yaml
[mtcampbe@lassen30:3d]$ ln -s run_config/weak_8.yaml ./run_config.yaml
[mtcampbe@lassen30:3d]$ jsrun -g 1 -a 1 -n 8 bash -c 'POCL_CACHE_DIR=$POCL_CACHE_DIR_ROOT/$$ python -O -m mpi4py ./combozzle.py -i run_config.yaml'
/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/grudge/grudge/array_context.py:59: UserWarning: Your loopy and meshmode branches are mismatched. Please make sure that you have the https://github.com/kaushikcfd/loopy/tree/pytato-array-context-transforms branch of loopy.
warn("Your loopy and meshmode branches are mismatched. "
( .... ) after a while and many msgs
build program: kernel 'axpbyz' was part of a lengthy source build resulting from a binary cache miss (0.51 s)
build program: kernel 'axpbyz' was part of a lengthy source build resulting from a binary cache miss (0.51 s)
build program: kernel 'if_positive' was part of a lengthy source build resulting from a binary cache miss (0.55 s)
build program: kernel 'axpbyz' was part of a lengthy source build resulting from a binary cache miss (0.51 s)
build program: kernel 'axpbyz' was part of a lengthy source build resulting from a binary cache miss (0.52 s)
build program: kernel 'if_positive' was part of a lengthy source build resulting from a binary cache miss (0.51 s)
build program: kernel 'if_positive' was part of a lengthy source build resulting from a binary cache miss (0.51 s)
build program: kernel 'if_positive' was part of a lengthy source build resulting from a binary cache miss (0.51 s)
build program: kernel 'if_positive' was part of a lengthy source build resulting from a binary cache miss (0.50 s)
build program: kernel 'reduce_kernel_stage1' was part of a lengthy source build resulting from a binary cache miss (0.60 s)
build program: kernel 'reduce_kernel_stage1' was part of a lengthy source build resulting from a binary cache miss (0.63 s)
build program: kernel 'reduce_kernel_stage1' was part of a lengthy source build resulting from a binary cache miss (0.64 s)
build program: kernel 'reduce_kernel_stage1' was part of a lengthy source build resulting from a binary cache miss (0.65 s)
build program: kernel 'reduce_kernel_stage1' was part of a lengthy source build resulting from a binary cache miss (0.68 s)
build program: kernel 'reduce_kernel_stage1' was part of a lengthy source build resulting from a binary cache miss (0.69 s)
build program: kernel 'reduce_kernel_stage1' was part of a lengthy source build resulting from a binary cache miss (0.73 s)
build program: kernel 'reduce_kernel_stage1' was part of a lengthy source build resulting from a binary cache miss (0.78 s)
build program: kernel 'reduce_kernel_stage2' was part of a lengthy source build resulting from a binary cache miss (0.60 s)
build program: kernel 'reduce_kernel_stage2' was part of a lengthy source build resulting from a binary cache miss (0.59 s)
build program: kernel 'reduce_kernel_stage2' was part of a lengthy source build resulting from a binary cache miss (0.64 s)
build program: kernel 'reduce_kernel_stage2' was part of a lengthy source build resulting from a binary cache miss (0.64 s)
build program: kernel 'reduce_kernel_stage2' was part of a lengthy source build resulting from a binary cache miss (0.64 s)
build program: kernel 'reduce_kernel_stage2' was part of a lengthy source build resulting from a binary cache miss (0.64 s)
build program: kernel 'reduce_kernel_stage2' was part of a lengthy source build resulting from a binary cache miss (0.64 s)
build program: kernel 'reduce_kernel_stage2' was part of a lengthy source build resulting from a binary cache miss (0.64 s)
[lassen30:57601] *** Process received signal ***
[lassen30:57601] Signal: Segmentation fault (11)
[lassen30:57601] Signal code: Address not mapped (1)
[lassen30:57601] Failing at address: 0x10419d260
<< Rank 6: Generating lwcore_cpu.3434071_2.6 on lassen30 Wed Apr 6 10:19:55 PDT 2022 (LLNL_COREDUMP_FORMAT_CPU=lwcore) >>
<< Rank 6: Generated lwcore_cpu.3434071_2.6 on lassen30 Wed Apr 6 10:19:58 PDT 2022 in 3 secs >>
<< Rank 6: Waiting 60 secs before aborting task on lassen30 Wed Apr 6 10:19:58 PDT 2022 (LLNL_COREDUMP_WAIT_FOR_OTHERS=60) >>

Traceback (most recent call last):
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/runpy.py", line 197, in _run_module_as_main
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/runpy.py", line 87, in _run_code
return _run_code(code, main_globals, None,
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/site-packages/mpi4py/__main__.py", line 7, in <module>
exec(code, run_globals)
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/site-packages/mpi4py/__main__.py", line 7, in <module>
main()
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/site-packages/mpi4py/run.py", line 198, in main
main()
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/site-packages/mpi4py/run.py", line 198, in main
run_command_line(args)
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/site-packages/mpi4py/run.py", line 47, in run_command_line
run_command_line(args)
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/site-packages/mpi4py/run.py", line 47, in run_command_line
run_path(sys.argv[0], run_name='__main__')
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/runpy.py", line 268, in run_path
run_path(sys.argv[0], run_name='__main__')
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/runpy.py", line 268, in run_path
return _run_module_code(code, init_globals, run_name,
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/runpy.py", line 97, in _run_module_code
return _run_module_code(code, init_globals, run_name,
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/runpy.py", line 87, in _run_code
_run_code(code, mod_globals, init_globals,
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "./combozzle.py", line 1164, in <module>
build program: kernel 'scalar_comparison_kernel' was part of a lengthy source build resulting from a binary cache miss (0.55 s)
exec(code, run_globals)
File "./combozzle.py", line 1164, in <module>
main(use_logmgr=args.log, use_leap=args.leap, input_file=input_file,
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/mirgecom/mirgecom/mpi.py", line 157, in wrapped_func
main(use_logmgr=args.log, use_leap=args.leap, input_file=input_file,
File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/mirgecom/mirgecom/mpi.py", line 157, in wrapped_func
func(*args, **kwargs)
File "./combozzle.py", line 680, in main
func(*args, **kwargs)
File "./combozzle.py", line 680, in main
cantera_soln = cantera.Solution(phase_id="gas", source=mech_cti)
File "interfaces/cython/cantera/base.pyx", line 59, in cantera._cantera._SolutionBase.__cinit__
cantera_soln = cantera.Solution(phase_id="gas", source=mech_cti)
File "interfaces/cython/cantera/base.pyx", line 59, in cantera._cantera._SolutionBase.__cinit__
File "interfaces/cython/cantera/base.pyx", line 160, in cantera._cantera._SolutionBase._init_cti_xml
File "interfaces/cython/cantera/base.pyx", line 160, in cantera._cantera._SolutionBase._init_cti_xml
cantera._cantera.CanteraError:
***********************************************************************
CanteraError thrown by call_ctml_writer:
Error executing python while converting input file:
Python command was: '/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/bin/python'
exec_stream_t::start: timeout while waiting for child to report via status_pipe
Success
[code 0x0000 ()]
***********************************************************************

cantera._cantera.CanteraError:
***********************************************************************
CanteraError thrown by call_ctml_writer:
Error executing python while converting input file:
Python command was: '/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge.fusion/miniforge3/envs/timing.fusion/bin/python'
exec_stream_t::start: timeout while waiting for child to report via status_pipe
Success
[code 0x0000 ()]
***********************************************************************

Reading user input from file: run_config.yaml
rank=7,dim=3,order=3,local_nelements=49152,global_nelements=393216
rank=7,local_nelements=49152,global_nelements=393216
Reading user input from file: run_config.yaml
rank=6,dim=3,order=3,local_nelements=49152,global_nelements=393216
rank=6,local_nelements=49152,global_nelements=393216
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 7 in communicator MPI_COMM_WORLD
with errorcode 1.
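
The per-rank status lines in the report above are easy to harvest for scaling data. A minimal sketch, assuming every value of interest is a non-negative integer written as `key=value`. The helper name `parse_rank_line` is hypothetical, and the rank-count check assumes the mesh is partitioned evenly:

```python
import re

# Matches key=value pairs such as rank=7 or local_nelements=49152.
PAIR = re.compile(r"(\w+)=(\d+)")


def parse_rank_line(line):
    """Turn 'rank=7,dim=3,...' into {'rank': 7, 'dim': 3, ...}."""
    return {key: int(value) for key, value in PAIR.findall(line)}


# Two lines copied from the report above.
log_lines = [
    "rank=7,dim=3,order=3,local_nelements=49152,global_nelements=393216",
    "rank=6,dim=3,order=3,local_nelements=49152,global_nelements=393216",
]
records = [parse_rank_line(line) for line in log_lines]

# Assumption: elements are split evenly, so global/local gives the rank count.
nranks = records[0]["global_nelements"] // records[0]["local_nelements"]
```

For the lines shown, `nranks` comes out as 8, which agrees with the `-n 8` passed to `jsrun`.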
1 change: 1 addition & 0 deletions dist_scale/3d/mirge_batch_env.sh
1 change: 1 addition & 0 deletions dist_scale/3d/run_config.yaml
40 changes: 40 additions & 0 deletions dist_scale/3d/run_config/weak_1.yaml
@@ -0,0 +1,40 @@
order: 3
domain_xlen: 1.
domain_ylen: 1.
domain_zlen: 1.
chlen: .25
x_scale: 2
y_scale: 1
z_scale: 1
weak_scale: 4
h_scale: 1
boundary_report: 0
nspecies: 0
init_only: 0
grid_only: 0
discr_only: 0
inviscid_only: 0
inert_only: 0
single_gas_only: 0
dummy_rhs_only: 0
adiabatic_boundary: 0
periodic_boundary: 1
do_checkpoint: 0
nviz: 1000
nrestart: 10000
nstatus: 10000
sponge_on: 1
artificial_viscosity_on: 1
timestepping_on: 1
log_dependent: 0
current_dt: 1e-10
t_final: 2e-9
constant_cfl: 0
cfl: .1
integrator: euler
init_pressure: 101325
init_temperature: 1500
init_density: 1.0
health_pres_min: 1
health_pres_max: 1000000
casename: combustle
40 changes: 40 additions & 0 deletions dist_scale/3d/run_config/weak_16.yaml
@@ -0,0 +1,40 @@
order: 3
domain_xlen: 1.
domain_ylen: 1.
domain_zlen: 1.
chlen: .25
x_scale: 2
y_scale: 2
z_scale: 1
weak_scale: 8
h_scale: 1
boundary_report: 0
nspecies: 0
init_only: 0
grid_only: 0
discr_only: 0
inviscid_only: 0
inert_only: 0
single_gas_only: 0
dummy_rhs_only: 0
adiabatic_boundary: 0
periodic_boundary: 1
do_checkpoint: 0
nviz: 1000
nrestart: 10000
nstatus: 10000
sponge_on: 1
artificial_viscosity_on: 1
timestepping_on: 1
log_dependent: 0
current_dt: 1e-10
t_final: 2e-9
constant_cfl: 0
cfl: .1
integrator: euler
init_pressure: 101325
init_temperature: 1500
init_density: 1.0
health_pres_min: 1
health_pres_max: 1000000
casename: combustle
40 changes: 40 additions & 0 deletions dist_scale/3d/run_config/weak_2.yaml
@@ -0,0 +1,40 @@
order: 3
domain_xlen: 1.
domain_ylen: 1.
domain_zlen: 1.
chlen: .25
x_scale: 2
y_scale: 2
z_scale: 1
weak_scale: 4
h_scale: 1
boundary_report: 0
nspecies: 0
init_only: 0
grid_only: 0
discr_only: 0
inviscid_only: 0
inert_only: 0
single_gas_only: 0
dummy_rhs_only: 0
adiabatic_boundary: 0
periodic_boundary: 1
do_checkpoint: 0
nviz: 1000
nrestart: 10000
nstatus: 10000
sponge_on: 1
artificial_viscosity_on: 1
timestepping_on: 1
log_dependent: 0
current_dt: 1e-10
t_final: 2e-9
constant_cfl: 0
cfl: .1
integrator: euler
init_pressure: 101325
init_temperature: 1500
init_density: 1.0
health_pres_min: 1
health_pres_max: 1000000
casename: combustle
40 changes: 40 additions & 0 deletions dist_scale/3d/run_config/weak_4.yaml
@@ -0,0 +1,40 @@
order: 3
domain_xlen: 1.
domain_ylen: 1.
domain_zlen: 1.
chlen: .25
x_scale: 1
y_scale: 1
z_scale: 1
weak_scale: 8
h_scale: 1
boundary_report: 0
nspecies: 0
init_only: 0
grid_only: 0
discr_only: 0
inviscid_only: 0
inert_only: 0
single_gas_only: 0
dummy_rhs_only: 0
adiabatic_boundary: 0
periodic_boundary: 1
do_checkpoint: 0
nviz: 1000
nrestart: 10000
nstatus: 10000
sponge_on: 1
artificial_viscosity_on: 1
timestepping_on: 1
log_dependent: 0
current_dt: 1e-10
t_final: 2e-9
constant_cfl: 0
cfl: .1
integrator: euler
init_pressure: 101325
init_temperature: 1500
init_density: 1.0
health_pres_min: 1
health_pres_max: 1000000
casename: combustle
40 changes: 40 additions & 0 deletions dist_scale/3d/run_config/weak_8.yaml
@@ -0,0 +1,40 @@
order: 3
domain_xlen: 1.
domain_ylen: 1.
domain_zlen: 1.
chlen: .25
x_scale: 2
y_scale: 1
z_scale: 1
weak_scale: 8
h_scale: 1
boundary_report: 0
nspecies: 0
init_only: 0
grid_only: 0
discr_only: 0
inviscid_only: 0
inert_only: 0
single_gas_only: 0
dummy_rhs_only: 0
adiabatic_boundary: 0
periodic_boundary: 1
do_checkpoint: 0
nviz: 1000
nrestart: 10000
nstatus: 10000
sponge_on: 1
artificial_viscosity_on: 1
timestepping_on: 1
log_dependent: 0
current_dt: 1e-10
t_final: 2e-9
constant_cfl: 0
cfl: .1
integrator: euler
init_pressure: 101325
init_temperature: 1500
init_density: 1.0
health_pres_min: 1
health_pres_max: 1000000
casename: combustle
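
The five weak-scaling inputs in this commit appear to differ only in their scale settings. A small sketch for checking which of those settings vary, with the four scale keys copied by hand from the files above. `varying_keys` is a hypothetical helper, not part of the repository:

```python
# Scale settings copied from the five run_config/weak_*.yaml files in this commit.
configs = {
    "weak_1": {"x_scale": 2, "y_scale": 1, "z_scale": 1, "weak_scale": 4},
    "weak_2": {"x_scale": 2, "y_scale": 2, "z_scale": 1, "weak_scale": 4},
    "weak_4": {"x_scale": 1, "y_scale": 1, "z_scale": 1, "weak_scale": 8},
    "weak_8": {"x_scale": 2, "y_scale": 1, "z_scale": 1, "weak_scale": 8},
    "weak_16": {"x_scale": 2, "y_scale": 2, "z_scale": 1, "weak_scale": 8},
}


def varying_keys(configs):
    """Return the sorted keys whose values are not identical across all configs."""
    keys = set().union(*(cfg.keys() for cfg in configs.values()))
    return sorted(
        key for key in keys
        if len({cfg.get(key) for cfg in configs.values()}) > 1
    )


changed = varying_keys(configs)
```

Here `changed` lists `weak_scale`, `x_scale`, and `y_scale`, while `z_scale` stays at 1 in every file. The same function could be applied to the full dicts loaded from the YAML files to confirm that no other key changes between runs.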
1 change: 1 addition & 0 deletions dist_scale/3d/weak_scale_lazy_bsub.sh
15 changes: 15 additions & 0 deletions dist_scale/README.md
@@ -0,0 +1,15 @@
Each directory here contains a test suite intended to run and capture
grid-scaling experiments for _MIRGE-Com_.

In each experiment directory you should find the following:

- run_config: directory containing all the YAML input files for combozzle
- run_logs: directory containing the data produced by actual runs
- *bsub.sh*: a script containing the Lassen batch job used to create the data
- combozzle.py: a symlink to the main combozzle python driver

In general, to run any given experiment on Lassen:
(warning: this will overwrite any data in the `run_logs` directory.)

> bsub *bsub.sh