
Merge pull request #77 from sintefmath/dev
Add high-level CUDA support, doc improvements, NLDD performance increases, CSR backend as default
moyner authored Nov 12, 2024
2 parents ea3b38d + ff7e0c0 commit 178d314
Showing 46 changed files with 1,346 additions and 343 deletions.
14 changes: 11 additions & 3 deletions Project.toml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name = "JutulDarcy"
uuid = "82210473-ab04-4dce-b31b-11573c4f8e0a"
authors = ["Olav Møyner <[email protected]>"]
version = "0.2.35"
version = "0.2.36"

[deps]
AlgebraicMultigrid = "2169fc97-5a83-5252-b627-83903c6c433c"
@@ -34,20 +34,26 @@ TimerOutputs = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
Tullio = "bc48ee85-29a4-5162-ae0b-a64e1601d4bc"

[weakdeps]
AMGX = "c963dde9-0319-47f5-bf0c-b07d3c80ffa6"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
GLMakie = "e9467ef8-e4e7-5192-8a1a-b1aee30e663a"
HYPRE = "b5ffcf37-a2bd-41ab-a3da-4bd9bc8ad771"
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"
Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
PartitionedArrays = "5a9dfac6-5c52-46f7-8278-5e2210713be9"

[extensions]
JutulDarcyAMGXExt = "AMGX"
JutulDarcyCUDAExt = "CUDA"
JutulDarcyGLMakieExt = "GLMakie"
JutulDarcyMakieExt = "Makie"
JutulDarcyPartitionedArraysExt = ["PartitionedArrays", "MPI", "HYPRE"]

[compat]
AlgebraicMultigrid = "0.5.1, 0.6.0"
Artifacts = "1"
AMGX = "0.2"
CUDA = "5"
DataStructures = "0.18.13"
Dates = "1"
DelimitedFiles = "1.6"
@@ -56,7 +62,7 @@ ForwardDiff = "0.10.30"
GLMakie = "0.10.13"
GeoEnergyIO = "1.1.12"
HYPRE = "1.6.0, 1.7"
Jutul = "0.2.40"
Jutul = "0.2.42"
Krylov = "0.9.1"
LazyArtifacts = "1"
LinearAlgebra = "1"
@@ -78,6 +84,8 @@ Tullio = "0.3.4"
julia = "1.7"

[extras]
AMGX = "c963dde9-0319-47f5-bf0c-b07d3c80ffa6"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
GLMakie = "e9467ef8-e4e7-5192-8a1a-b1aee30e663a"
HYPRE = "b5ffcf37-a2bd-41ab-a3da-4bd9bc8ad771"
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"
@@ -87,4 +95,4 @@ TestItemRunner = "f8b46487-2199-4994-9208-9a1283c18c0a"
TestItems = "1c621080-faea-4a02-84b6-bbd5e436b8fe"

[targets]
test = ["Test", "TestItemRunner", "TestItems", "HYPRE", "MPI", "PartitionedArrays"]
test = ["Test", "CUDA", "TestItemRunner", "TestItems", "HYPRE", "MPI", "PartitionedArrays"]
1 change: 1 addition & 0 deletions docs/make.jl
@@ -137,6 +137,7 @@ function build_jutul_darcy_docs(build_format = nothing;
],
"Further reading" => [
"man/advanced/mpi.md",
"man/advanced/gpu.md",
"man/advanced/compiled.md",
"Jutul functions" => "ref/jutul.md",
"Bibliography" => "extras/refs.md"
4 changes: 2 additions & 2 deletions docs/src/index.md
@@ -33,8 +33,8 @@ features:
link: /examples/intro_sensitivities
- icon: 🏃
title: High performance
details: Fast execution with support for MPI and thread parallelism
title: High performance on CPU & GPU
details: Fast execution with support for MPI, CUDA and thread parallelism
link: /man/advanced/mpi
---
````
57 changes: 57 additions & 0 deletions docs/src/man/advanced/gpu.md
@@ -0,0 +1,57 @@
# GPU support

JutulDarcy includes experimental support for running linear solves on the GPU. For many simulations, the linear systems are the most compute-intensive part and a natural choice for acceleration. At the moment, support is limited to CUDA GPUs through [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl). The most efficient CPR preconditioner additionally requires [AMGX.jl](https://github.com/JuliaGPU/AMGX.jl), which is currently limited to Linux systems. Windows users may have luck running Julia inside [WSL](https://learn.microsoft.com/en-us/windows/wsl/install).

## How to use

If you have installed JutulDarcy, you should start by adding the CUDA and optionally the AMGX packages using the package manager:

```julia
using Pkg
Pkg.add("CUDA") # Requires a CUDA-capable GPU
Pkg.add("AMGX") # Requires CUDA + Linux
```

Once the packages have been added to the same environment as JutulDarcy, you can load them to enable GPU support. Let us grab the first ten steps of the EGG benchmark model:

```julia
using Jutul, JutulDarcy
dpth = JutulDarcy.GeoEnergyIO.test_input_file_path("EGG", "EGG.DATA")
case = setup_case_from_data_file(dpth)
case = case[1:10]
```

### Running on CPU

If we wanted to run this on CPU we would simply call `simulate_reservoir`:

```julia
result_cpu = simulate_reservoir(case);
```

### Running on GPU with block ILU(0)

If we now load `CUDA`, we can run the same simulation using the CUDA-accelerated linear solver. On its own, CUDA only supports the ILU(0) preconditioner. JutulDarcy will automatically pick this preconditioner when CUDA is requested without AMGX, but we write it explicitly here:

```julia
using CUDA
result_ilu0_cuda = simulate_reservoir(case, linear_solver_backend = :cuda, precond = :ilu0);
```

### Running on GPU with CPR AMGX-ILU(0)

Loading the AMGX package makes a pure GPU-based two-stage CPR available. Again, we request CPR explicitly, but if both `CUDA` and `AMGX` are available and functional, this is redundant:

```julia
using AMGX
result_amgx_cuda = simulate_reservoir(case, linear_solver_backend = :cuda, precond = :cpr);
```

In short, load `AMGX` and `CUDA` and run `simulate_reservoir(case, linear_solver_backend = :cuda)` to get GPU results. The EGG model is quite small, so a larger case is necessary to see significant performance gains. `AMGX` also exposes a large number of options that advanced users can configure.

## Technical details and limitations

The GPU implementation relies on assembly on the CPU and pinned memory to transfer data onto the GPU. This means performance can be significantly improved by launching Julia with multiple threads to speed up the non-GPU parts of the code. AMGX is currently single-GPU only and does not work with MPI. At present, only `Float64` is supported for CPR, but pure ILU(0) solves also support `Float32`.
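Since assembly runs on the CPU, it is worth checking that Julia was actually started with more than one thread before a GPU run; a minimal check using only Base Julia:

```julia
# Report the number of threads available for CPU-side assembly.
# If this prints 1, restart Julia with e.g. `julia --threads=8`
# or set the JULIA_NUM_THREADS environment variable.
println("Julia threads: ", Threads.nthreads())
```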

!!! warning "Experimental status"
Multiple successive runs with different `AMGX` instances have resulted in crashes when old instances are garbage collected. This part of the code is still considered experimental, with contributions welcome if you are using it.
4 changes: 1 addition & 3 deletions docs/src/man/advanced/mpi.md
@@ -4,7 +4,7 @@ JutulDarcy can use threads by default, but advanced options can improve performance

## Overview of parallel support

There are two main ways of exploiting multiple cores in Jutul/JutulDarcy: Threads are automatically used for assembly and can be used for parts of the linear solve. If you require the best performance, you have to go to MPI where the linear solvers can use a parallel [BoomerAMG preconditioner](https://hypre.readthedocs.io/en/latest/solvers-boomeramg.html) via [HYPRE.jl](https://github.com/fredrikekre/HYPRE.jl).
There are two main ways of exploiting multiple cores in Jutul/JutulDarcy: Threads are automatically used for assembly and can be used for parts of the linear solve. If you require the best performance, you have to go to MPI where the linear solvers can use a parallel [BoomerAMG preconditioner](https://hypre.readthedocs.io/en/latest/solvers-boomeramg.html) via [HYPRE.jl](https://github.com/fredrikekre/HYPRE.jl). In addition, there is experimental GPU support described in [GPU support](@ref).

### MPI parallelization

@@ -25,7 +25,6 @@ Starting Julia with multiple threads (for example `julia --project. --threads=4`)

Threads are easy to use and can give some benefit for large models.


### Mixed-mode parallelism

You can mix the two approaches: Adding multiple threads to each MPI process can use threads to speed up assembly and property evaluations.
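As a sketch, a mixed-mode launch might look like the following, where the script name and the process and thread counts are placeholders to adapt to your setup:

```bash
# 4 MPI processes, each running Julia with 2 threads
mpiexec -n 4 julia --project=. --threads=2 run_simulation_mpi.jl
```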
@@ -42,7 +41,6 @@ A few hints when you are looking at performance:

Example: a 200k cell model on a laptop: 1 process 235 s -> 4 processes 145 s


## Solving with MPI in practice

There are a few adjustments needed before a script can be run in MPI.
7 changes: 4 additions & 3 deletions examples/co2_sloped.jl
@@ -133,7 +133,7 @@ krog_t = so.^2
krog = PhaseRelativePermeability(so, krog_t, label = :og)

# Higher resolution for second table:
sg = range(0, 1, 50)
sg = range(0, 1, 50);

# Evaluate Brooks-Corey to generate tables:
tab_krg_drain = brooks_corey_relperm.(sg, n = 2, residual = 0.1)
@@ -251,7 +251,7 @@ wd, states, t = simulate_reservoir(state0, model, dt,
parameters = parameters,
forces = forces,
max_timestep = 90day
)
);
# ## Plot the CO2 mole fraction
# We plot the overall CO2 mole fraction. We scale the color range to log10 to
# account for the fact that the mole fraction in cells made up of only the
@@ -339,7 +339,8 @@ for cell in 1:nc
x, y, z = centers[:, cell]
is_inside[cell] = sqrt((x - 720.0)^2 + 20*(z-70.0)^2) < 75
end
plot_cell_data(mesh, is_inside)
fig, ax, plt = plot_cell_data(mesh, is_inside)
fig
# ## Plot inventory in ellipsoid
# Note that a small mobile dip can be seen when free CO2 passes through this region.
inventory = co2_inventory(model, wd, states, t, cells = findall(is_inside))
22 changes: 18 additions & 4 deletions examples/data_input_file.jl
@@ -5,28 +5,42 @@
# regular JutulDarcy simulation, allowing modification and use of the case in
# differentiable workflows.
#
# We begin by loading the SPE9 dataset via the GeoEnergyIO package.
# We begin by loading the SPE9 dataset via the GeoEnergyIO package. This package
# includes a set of open datasets that can be used for testing and benchmarking.
# The SPE9 dataset is a 3D model with a corner-point grid and a set of wells
# produced by the Society of Petroleum Engineers. The specific version of the
# file included here is taken from the [OPM
# tests](https://github.com/OPM/opm-tests) repository.
using JutulDarcy, GeoEnergyIO
pth = GeoEnergyIO.test_input_file_path("SPE9", "SPE9.DATA")
pth = GeoEnergyIO.test_input_file_path("SPE9", "SPE9.DATA");
# ## Set up and run a simulation
# If we do not need the case, we could also have done:
# ws, states = simulate_data_file(pth)
case = setup_case_from_data_file(pth)
ws, states = simulate_reservoir(case)
ws, states = simulate_reservoir(case);
# ## Show the input data
# The input data takes the form of a Dict:
case.input_data
# We can also examine, for example, the RUNSPEC section, which is also
# represented as a Dict.
case.input_data["RUNSPEC"]
# ## Plot the simulation model
# These plot are interactive when run outside of the documentations.
# These plots are normally interactive, but if you are reading the published
# online documentation, static screenshots will be inserted instead.
using GLMakie
plot_reservoir(case.model, states)
# ## Plot the well responses
# We can plot the well responses (rates and pressures) in an interactive viewer.
# Multiple wells can be plotted simultaneously, with options to select which
# units are to be used for plotting.
plot_well_results(ws)
# ## Plot the field responses
# Similar to the wells, we can also plot field-wide measurables. We plot the
# field gas production rate and the average pressure as the initial selection.
# If you are running this case interactively you can select which measurables to
# plot.
#
# We observe that the field pressure steadily decreases over time, as a result
# of the gas production. The drop in pressure is not uniform, as during the
# period where little gas is produced, the decrease in field pressure is slower.
plot_reservoir_measurables(case, ws, states, left = :fgpr, right = :pres)
16 changes: 5 additions & 11 deletions examples/five_spot_ensemble.jl
@@ -4,8 +4,7 @@
# injector in one corner and the producer in the opposing corner, with a
# significant volume of fluids injected into the domain.
using JutulDarcy, Jutul
nx = 50
#-
nx = 50;
# ## Setup
# We define a function that, for a given porosity field, computes a solution
# with an estimated permeability field. For assumptions and derivation of the
@@ -55,10 +54,9 @@ function simulate_qfs(porosity = 0.2)
forces = setup_reservoir_forces(model, control = controls)
return simulate_reservoir(state0, model, dt, parameters = parameters, forces = forces, info_level = -1)
end
#-
# ## Simulate base case
# This will give the solution with uniform porosity of 0.2.
ws, states, report_time = simulate_qfs()
ws, states, report_time = simulate_qfs();
# ### Plot the solution of the base case
# We observe a radial flow pattern initially, before coning occurs near the
# producer well once the fluid has reached the opposite corner. The uniform
@@ -75,7 +73,6 @@ ax = Axis(fig[1, 2])
h = contourf!(ax, get_sat(states[nt]))
Colorbar(fig[1, end+1], h)
fig
#-
# ## Create 10 realizations
# We create a small set of realizations of the same model, with porosity that is
# uniformly varying between 0.05 and 0.3. This is not especially sophisticated
@@ -89,11 +86,10 @@ wells = []
report_step = nt
for i = 1:N
poro = 0.05 .+ 0.25*rand(Float64, (nx*nx))
ws, states, rt = simulate_qfs(poro)
push!(wells, ws)
push!(saturations, get_sat(states[report_step]))
ws_i, states_i, rt = simulate_qfs(poro)
push!(wells, ws_i)
push!(saturations, get_sat(states_i[report_step]))
end
#-
# ### Plot the oil rate at the producer over the ensemble
using Statistics
fig = Figure()
@@ -106,15 +102,13 @@ end
xlims!(ax, [mean(report_time), report_time[end]])
ylims!(ax, 0, 0.0075)
fig
#-
# ### Plot the average saturation over the ensemble
avg = mean(saturations)
fig = Figure()
h = nothing
ax = Axis(fig[1, 1])
h = contourf!(ax, avg)
fig
#-
# ### Plot the isocontour lines over the ensemble
fig = Figure()
h = nothing
2 changes: 1 addition & 1 deletion examples/model_coarsening.jl
@@ -27,7 +27,7 @@ fine_case = setup_case_from_data_file(data_pth);
fine_model = fine_case.model
fine_reservoir = reservoir_domain(fine_model)
fine_mesh = physical_representation(fine_reservoir)
ws, states = simulate_reservoir(fine_case, info_level = 1);
ws, states = simulate_reservoir(fine_case, info_level = -1);
# ## Coarsen the model and plot partition
# We coarsen the model using a partition size of 20x20x2 and the IJK method
# where the underlying structure of the mesh is used to subdivide the blocks.
13 changes: 7 additions & 6 deletions examples/optimize_simple_bl.jl
@@ -1,12 +1,13 @@
using Jutul
using JutulDarcy
using LinearAlgebra
using GLMakie
# # Example demonstrating optimization of parameters against observations
# We create a simple test problem: A 1D nonlinear displacement. The observations
# are generated by solving the same problem with the true parameters. We then
# match the parameters against the observations using a different starting guess
# for the parameters, but otherwise the same physical description of the system.
using Jutul
using JutulDarcy
using LinearAlgebra
using GLMakie

function setup_bl(;nc = 100, time = 1.0, nstep = 100, poro = 0.1, perm = 9.8692e-14)
T = time
tstep = repeat([T/nstep], nstep)
@@ -39,10 +40,10 @@ poro_ref = 0.1
perm_ref = 9.8692e-14
# ## Set up and simulate reference
model_ref, state0_ref, parameters_ref, forces, tstep = setup_bl(nc = N, nstep = Nt, poro = poro_ref, perm = perm_ref)
states_ref, = simulate(state0_ref, model_ref, tstep, parameters = parameters_ref, forces = forces, info_level = -1)
states_ref, = simulate(state0_ref, model_ref, tstep, parameters = parameters_ref, forces = forces, info_level = -1);
# ## Set up another case where the porosity is different
model, state0, parameters, = setup_bl(nc = N, nstep = Nt, poro = 2*poro_ref, perm = 1.0*perm_ref)
states, rep = simulate(state0, model, tstep, parameters = parameters, forces = forces, info_level = -1)
states, rep = simulate(state0, model, tstep, parameters = parameters, forces = forces, info_level = -1);
# ## Plot the results
fig = Figure()
ax = Axis(fig[1, 1], title = "Saturation")
1 change: 0 additions & 1 deletion examples/relperms.jl
@@ -316,7 +316,6 @@ lines(simulate_bl(model, prm), axis = (title = "Parametric LET function simulati
# ### Check out the parameters
# The LET parameters are now numerical parameters in the reservoir:
rmodel = reservoir_model(model)
display(rmodel)

# ## Conclusion
# We have explored a few aspects of relative permeabilities in JutulDarcy. There
2 changes: 0 additions & 2 deletions examples/two_phase_buckley_leverett.jl
@@ -52,7 +52,6 @@ end
n, n_f = 100, 1000
states, model, report = solve_bl(nc = n)
print_stats(report)
#-
# ## Run refined version (1000 cells, 1000 steps)
# Using a grid with 100 cells will not yield a fully converged solution. We can
# increase the number of cells at the cost of increasing the runtime a bit. Note
@@ -61,7 +60,6 @@
# use an iterative solver.
states_refined, _, report_refined = solve_bl(nc = n_f);
print_stats(report_refined)
#-
# ## Plot results
# We plot the saturation front for the base case at different times together
# with the final solution for the refined model. In this case, refining the grid

2 comments on commit 178d314


@moyner moyner commented on 178d314 Nov 13, 2024


@JuliaRegistrator register

Release notes:

Changes

  • Support for linear solves (CPR/ILU) on CUDA GPUs. See the docs for more details.
  • Improved EDIT and TRANX/TRANY/TRANZ support
  • Bug fixes in CPR
  • Significantly improved performance for adaptive NLDD solvers
  • Example updates

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/119311

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.2.36 -m "<description of version>" 178d314d654bf842149b2311465aae350a56457b
git push origin v0.2.36
