add CONUS404 PGW access
amsnyder committed Oct 8, 2024
1 parent 72daccb commit 3741f30
Showing 21 changed files with 600 additions and 30,384 deletions.
32 changes: 22 additions & 10 deletions _sources/dataset_access/CONUS404_ACCESS.md
@@ -1,24 +1,36 @@
# CONUS404 and CONUS404 Bias-Adjusted Data Access
# CONUS404 Products Data Access

This section contains notebooks that demonstrate how to access and perform basic data manipulation for the [CONUS404 dataset](https://doi.org/10.5066/P9PHPK4F). The examples can also be applied to the [CONUS404 bias-adjusted dataset](https://doi.org/10.5066/P9JE61P7).
This section contains notebooks that demonstrate how to access and perform basic data manipulation for the [CONUS404 dataset](https://doi.org/10.5066/P9PHPK4F). The examples can also be applied to the [CONUS404 bias-adjusted dataset](https://doi.org/10.5066/P9JE61P7) and the [CONUS404 pseudo-global warming (PGW) dataset](https://doi.org/10.5066/P9HH85UU).

In the CONUS404 intake sub-catalog (see [here](../dataset_catalog/README.md) for an explainer of our intake data catalog), you will see entries for four CONUS404 datasets: `conus404-hourly`, `conus404-daily`, `conus404-monthly`, and `conus404-daily-diagnostic` data, as well as two CONUS404 bias-adjusted datasets: `conus404-hourly-ba`, `conus404-daily-ba`. Each of these datasets is duplicated in up to three different storage locations (as the [intake catalog section](../dataset_catalog/README.md) also describes).
In the CONUS404 intake sub-catalog (see [here](../dataset_catalog/README.md) for an explainer of our intake data catalog), you will see entries for:
- four CONUS404 datasets: `conus404-hourly`, `conus404-daily`, `conus404-monthly`, and `conus404-daily-diagnostic` data
- two CONUS404 bias-adjusted datasets: `conus404-hourly-ba`, `conus404-daily-ba`
- two CONUS404 PGW datasets: `conus404-pgw-hourly` and `conus404-pgw-daily-diagnostic`

## CONUS404 Data
The `conus404-hourly` data is a subset of the `wrfout` model output. For instantaneous variables, the data value at each time step represents the instantaneous value at the timestep. For accumulated variables, the data value represents the accumulated value up to the timestep (see the `integration_length` attribute attached to each accumulated variable for more details on the accumulation period).
Each of these datasets is duplicated in up to three different storage locations (as the [intake catalog section](../dataset_catalog/README.md) also describes).
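
As a hedged illustration, the sub-catalog entries can be listed with `intake`; the catalog URL and sub-catalog key below are assumptions based on the intake catalog section and may differ from your setup:

```python
import intake

# Assumed catalog URL and sub-catalog key; see the intake catalog section for
# the authoritative paths used by HyTEST.
hytest_cat = intake.open_catalog(
    "https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml"
)
cat = hytest_cat["conus404-catalog"]

# Each dataset name above appears here with a suffix for its storage location
# (e.g. -onprem-hw, -cloud, -osn).
print(list(cat))

# Lazily open one of the zarr stores as an xarray Dataset (backed by dask)
ds = cat["conus404-daily-osn"].to_dask()
print(ds)
```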

The `conus404-daily-diagnostic` data is a subset from the `wrfxtrm` model output. These data represent the results of the past 24 hours, with the timestamp corresponding to the end time of the 24 hour period. Because the CONUS404 started at 1979-10-01_00:00:00, the first timestep (1979-10-01_00:00:00) for each variable is all zeros.

Both of these datasets are described in the official [CONUS404 data release](https://doi.org/10.5066/P9PHPK4F).
**We recommend that you regularly check our [CONUS404 changelog](./CONUS404_CHANGELOG) to see any updates that have been made to the zarr stores.** We do not anticipate regular changes to the dataset, but we may need to fix an occasional bug or update the dataset with additional years of data.

We also have `conus404-daily` and `conus404-monthly` files, which are just resampled from the `conus404-hourly` data. To create the `conus404-daily` zarr, instantaneous variables are aggregated from 00:00:00 UTC to 11:00:00 UTC, while accumulated variables are aggregated from 01:00:00 UTC to 12:00:00 UTC of the next day.
## CONUS404 Data
CONUS404 is a unique, high-resolution hydro-climate dataset appropriate for forcing hydrological models and conducting meteorological analysis over the contiguous United States. Users should review the official [CONUS404 data release](https://doi.org/10.5066/P9PHPK4F) to understand the dataset before working with the zarr stores provided in our intake catalog.

The `conus404-hourly` data is a subset of the wrfout model output and `conus404-daily-diagnostic` is a subset from the wrfxtrm model output, both of which are described in the official data release. We also provide `conus404-daily` and `conus404-monthly` files, which are just resampled from the `conus404-hourly` data.
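
As a rough sketch of what that resampling looks like (the published daily and monthly zarrs are pre-computed, so this is illustrative only; the catalog URL, entry name, and the `T2` variable are assumptions):

```python
import intake

# Assumed sub-catalog URL and entry name; adjust to match the intake setup you use.
cat = intake.open_catalog(
    "https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/subcatalogs/conus404-catalog.yml"
)
ds_hourly = cat["conus404-hourly-osn"].to_dask()

# Illustrative daily mean of an instantaneous variable (2-m temperature, T2).
# This does not reproduce the exact aggregation windows used for the published
# conus404-daily zarr.
t2_daily_mean = ds_hourly["T2"].resample(time="1D").mean()
```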

**Please note that the values in the ACLWDNB, ACLWUPB, ACSWDNB, ACSWDNT, and ACSWUPB variables available in the zarr store differ from the original model output.** These variables have been re-calculated to reflect the accumulated value since the model start, as directed in the WRF manual. An attribute has been added to each of these variables in the zarr store to denote the accumulation period for the variable.
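
If per-interval values are needed instead of running totals, one hedged approach is to difference the accumulated variable along time; this sketch assumes `ds_hourly` from the example above and a time dimension named `time`:

```python
# Recover per-timestep increments from a since-model-start accumulated variable.
# Check the accumulation attributes attached to the variable before relying on
# the result; variable and dimension names here are assumptions.
aclwdnb_increment = ds_hourly["ACLWDNB"].diff(dim="time")
```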

**We recommend that you regularly check our [CONUS404 changelog](./CONUS404_CHANGELOG) to see any updates that have been made to the zarr stores.** We do not anticipate regular changes to the dataset, but we may need to fix an occasional bug or update the dataset with additional years of data.

## CONUS404 Bias-Adjusted Data
The `conus404-hourly-ba` data contains bias-adjusted temperature and precipiation data from the CONUS404 dataset, which is described in the official [CONUS404 bias adjusted data release](https://doi.org/10.5066/P9JE61P7). The `conus404-daily-ba` files are resampled from the `conus404-hourly-ba` data.
The `conus404-hourly-ba` data contains bias-adjusted temperature and precipitation data from the CONUS404 dataset, which is described in the official [CONUS404 bias-adjusted data release](https://doi.org/10.5066/P9JE61P7). Users should review the official data release to understand the dataset before working with the zarr stores provided in our intake catalog.

The `conus404-daily-ba` files are resampled from the `conus404-hourly-ba` data.
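
A minimal sketch of opening the bias-adjusted daily store and inspecting which variables it carries (the catalog URL and entry name are assumptions; confirm variable names against the data release):

```python
import intake

# Assumed sub-catalog URL and entry name for the bias-adjusted daily data.
cat = intake.open_catalog(
    "https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/subcatalogs/conus404-catalog.yml"
)
ds_ba = cat["conus404-daily-ba-osn"].to_dask()

# Inspect the bias-adjusted temperature and precipitation variables available.
print(list(ds_ba.data_vars))
```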

## CONUS404 PGW Data
The CONUS404 pseudo-global warming (PGW) dataset is a future-perturbed hydro-climate dataset, created as a follow-on to the CONUS404 dataset. The CONUS404 PGW dataset represents the weather from 1980 to 2021 under a warmer and wetter climate environment and provides an opportunity to explore event-based climate change impacts when used with the CONUS404 historical data. Users should review the official [CONUS404 PGW data release](https://doi.org/10.5066/P9HH85UU) to understand the dataset before working with the zarr stores provided in our intake catalog.
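
Because the PGW run re-simulates the same 1980-2021 weather under a perturbed climate, one hedged way to explore event-based impacts is to difference the two runs over a common window; the catalog URL, entry names, `T2` variable, and event dates below are all illustrative assumptions:

```python
import intake

# Assumed sub-catalog URL and entry names; adjust to your intake setup.
cat = intake.open_catalog(
    "https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/subcatalogs/conus404-catalog.yml"
)
ds_hist = cat["conus404-hourly-osn"].to_dask()
ds_pgw = cat["conus404-pgw-hourly-osn"].to_dask()

# Hypothetical event window; mean 2-m temperature change over the event.
event = slice("2017-08-25", "2017-08-31")
t2_change = (ds_pgw["T2"].sel(time=event) - ds_hist["T2"].sel(time=event)).mean("time")
```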

The `conus404-pgw-hourly` data is a subset of the wrfout model output and `conus404-pgw-daily-diagnostic` is a subset from the wrfxtrm model output, both of which are described in the official data release.

**Please note that the values in the ACLWDNB, ACLWUPB, ACSWDNB, ACSWDNT, and ACSWUPB variables available in the zarr store differ from the original model output.** These variables have been re-calculated to reflect the accumulated value since the model start, as directed in the WRF manual. An attribute has been added to each of these variables in the zarr store to denote the accumulation period for the variable.

## Example Notebooks
We currently have five notebooks to help demonstrate how to work with these datasets in a Python workflow:
15 changes: 1 addition & 14 deletions _sources/dataset_access/conus404_point_selection.ipynb
@@ -164,19 +164,6 @@
"ds_var = ds[var]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bccdb700-23ef-4a47-bd0e-720d280486c6",
"metadata": {},
"outputs": [],
"source": [
"# get projection info\n",
"ds = ds.metpy.parse_cf()\n",
"crs = ds[var[0]].metpy.cartopy_crs\n",
"#crs"
]
},
{
"cell_type": "markdown",
"id": "92fb005e-b473-4a3d-8594-1d815b04c8b7",
@@ -581,7 +568,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.10.0"
},
"vscode": {
"interpreter": {
30 changes: 17 additions & 13 deletions _sources/dataset_catalog/subcatalogs/README.md
@@ -45,22 +45,26 @@ list(cat)
```
producing a list of all the CONUS404 dataset versions:
```
['conus404-hourly-onprem',
['conus404-hourly-onprem-hw',
'conus404-hourly-cloud',
'conus404-hourly-osn',
'conus404-daily-diagnostic-onprem',
'conus404-daily-diagnostic-onprem-hw',
'conus404-daily-diagnostic-cloud',
'conus404-daily-diagnostic-osn',
'conus404-daily-onprem',
'conus404-daily-onprem-hw',
'conus404-daily-cloud',
'conus404-daily-osn',
'conus404-monthly-onprem',
'conus404-monthly-onprem-hw',
'conus404-monthly-cloud',
'conus404-monthly-osn',
'conus404-hourly-ba-onprem',
'conus404-hourly-ba-onprem-hw',
'conus404-hourly-ba-osn',
'conus404-daily-ba-onprem',
'conus404-daily-ba-osn']
'conus404-daily-ba-osn',
'conus404-pgw-hourly-onprem-hw',
'conus404-pgw-hourly-osn',
'conus404-daily-diagnostic-onprem-hw',
'conus404-daily-diagnostic-osn']
```

The characteristics of individual datasets can be explored:
@@ -75,14 +79,14 @@ conus404-hourly-osn:
storage_options:
anon: true
client_kwargs:
endpoint_url: https://renc.osn.xsede.org
endpoint_url: https://usgs.osn.mghpcc.org/
requester_pays: false
urlpath: s3://rsignellbucket2/hytest/conus404/conus404_hourly_202302.zarr
description: 'CONUS404 Hydro Variable subset, 40 years of hourly values. These files
were created wrfout model output files (see ScienceBase data release for more
details: https://www.sciencebase.gov/catalog/item/6372cd09d34ed907bf6c6ab1). This
dataset is stored on AWS S3 cloud storage in a requester-pays bucket. You can
work with this data for free in any environment (there are no egress fees).'
urlpath: s3://hytest/conus404/conus404_hourly.zarr
description: "CONUS404 Hydro Variable subset, 43 years of hourly values. These files\
\ were created wrfout model output files (see ScienceBase data release for more\
\ details: https://doi.org/10.5066/P9PHPK4F). This data is stored on HyTEST\u2019\
s Open Storage Network (OSN) pod. This data can be read with the S3 API and is\
\ free to work with in any computing environment (there are no egress fees)."
driver: intake_xarray.xzarr.ZarrSource
metadata:
catalog_dir: https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/subcatalogs
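
For reference, the same store could be opened without intake by passing the `urlpath` and `endpoint_url` shown above directly to xarray; this is a sketch, and the exact keyword handling may vary with your xarray/fsspec versions:

```python
import xarray as xr

# Open the OSN-hosted zarr store directly; values taken from the catalog entry above.
ds = xr.open_zarr(
    "s3://hytest/conus404/conus404_hourly.zarr",
    storage_options={
        "anon": True,
        "client_kwargs": {"endpoint_url": "https://usgs.osn.mghpcc.org/"},
    },
)
```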
20 changes: 6 additions & 14 deletions _sources/environment_set_up/README.md
@@ -8,7 +8,7 @@ The HyTEST workflows are built largely on the [Pangeo](https://pangeo.io/) softw

If you do not have access to USGS computational resources, we recommend you follow [these instructions](QuickStart-General.md) to set up a computational environment that can run our workflows.

## USGS Cloud Environment (only avaialble to USGS staff)
## USGS Cloud Environment (only available to USGS staff)

If you are working in the cloud, we expect you to work in a JupyterHub instance that has already been set up. If you are a USGS Water Mission Area (WMA) employee, you have a few
options for accessing a JupyterHub instance. These are described below, and more detailed instructions to set each option up are included in the hyperlinked name of the environment:
@@ -21,12 +21,9 @@ options for accessing a JupyterHub instance. These are described below, and more
(CTek) team in the Enterprise Technology Office (ETO). Anyone can put in a request to update these
shared kernels, but sometimes the update process can take several weeks.
2) [Nebari](QuickStart-Cloud-Nebari.md)<br>
The HyTEST project is currently working on deploying an instance of [Nebari](https://www.nebari.dev/) JupyterHub. HyTEST will
allow some of our priority users to test out this space once it is created. We will also provide instructions for how your project can deploy its own instance of Nebari JupyterHub. Nebari includes a shared space in the JupyterHub deployement, where all users can access and run each other's files/scripts. Any
user can also quickly modify and update a set of shared kernels so that all users are working in the same development environment. This work is currently in development, and these instructions will be updated once this is available.
The HyTEST project has successfully deployed an instance of [Nebari](https://www.nebari.dev/) JupyterHub in the USGS cloud account. Nebari includes both a private (per-user) and a shared file space in the JupyterHub environment. Any user can also quickly modify and update a set of shared kernels so that all members of a team can use the same development environment. HyTEST will allow some of our priority users to test their workflows in this space; please contact Amelia Snyder ([email protected]) if you would like to request access. The HyTEST team is currently working on developing instructions for how your project can deploy its own instance of Nebari JupyterHub in the USGS cloud.
3) Your own JupyterHub Instance<br>
Your can also work with your own deployment of a JupyterHub environment. You will need to
make sure you have a kernel with all of the required packages set up to run these notebooks.
You can also work with your own deployment of a JupyterHub environment. You will need to make sure you have a kernel with all of the required packages set up to run these notebooks.


## USGS HPC Environment (only available to USGS staff)
@@ -38,13 +35,8 @@ Note that to access any of the HPCs, you must be on the DOI Network. If you are
You will need to access an environment which emulates the Jupyter server -- where the notebooks will reside and execute -- using the HPC hardware. There are many ways to do this. Here are three options, in increasing order of complexity and flexibility:

1) [Open OnDemand](OpenOnDemand.md)<br>
This option provides the most effortless access to HPC hardware using a web interface. However, this only runs on the [`Tallgrass`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/tallgrass.html) supercomputer.
This option provides the most effortless access to HPC hardware using a web interface. However, this only runs on the [`Tallgrass`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/tallgrass.html) and [`Hovenweep`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/hovenweep.html) supercomputers.
2) [Jupyter Forward](JupyterForward.md)<br>
This option gives you more control over how to launch the server, and on which host (can be
run on [`Denali`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/denali.html), [`Tallgrass`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/tallgrass.html), or other hosts to which you have a login). This setup requires that you
install the jupyter-forward software on your PC.
This option gives you more control over how to launch the server, and on which host (it can be run on [`Denali`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/denali.html), [`Tallgrass`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/tallgrass.html), [`Hovenweep`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/hovenweep.html), or other hosts to which you have a login). This setup requires that you install the jupyter-forward software on your PC.
3) [Custom Server Script](StartScript.md)<br>
This option lets you completely customize your HPC compute environment and invoke the Jupyter
server from a command shell on the HPC. This requires familiarity with the HPC command line, file
editing, etc. This (can be
run on [`Denali`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/denali.html), [`Tallgrass`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/tallgrass.html), or other hosts to which you have a login).
This option lets you completely customize your HPC compute environment and invoke the Jupyter server from a command shell on the HPC. This requires familiarity with the HPC command line, file editing, etc. It can be run on [`Denali`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/denali.html), [`Tallgrass`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/tallgrass.html), [`Hovenweep`](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/hovenweep.html), or other hosts to which you have a login.