From 519dd8433530ef953bf8c26b0900effdf4019a6b Mon Sep 17 00:00:00 2001 From: Andrew Date: Thu, 3 Nov 2022 10:28:54 -0500 Subject: [PATCH 1/2] Updated data prep and methods for zonal stats --- .../conus404_eval_data_preparation.ipynb | 52 ++++++++----------- .../conus404_zonal_point_statistics.ipynb | 15 +++++- 2 files changed, 37 insertions(+), 30 deletions(-) diff --git a/dataset_preprocessing/tutorials/CONUS404/conus404_eval_data_preparation.ipynb b/dataset_preprocessing/tutorials/CONUS404/conus404_eval_data_preparation.ipynb index b769f323..7321e13c 100644 --- a/dataset_preprocessing/tutorials/CONUS404/conus404_eval_data_preparation.ipynb +++ b/dataset_preprocessing/tutorials/CONUS404/conus404_eval_data_preparation.ipynb @@ -6,28 +6,22 @@ "source": [ "# Preparing CONUS404 and reference data for aggregation and sampling\n", "\n", - "Short paragraph describing what is about to happen\n", - "\n", - "
\n", - " Guide to pre-requisites and learning outcomes...<click to expand>\n", - " \n", - " \n", - " \n", - " \n", - "
Pre-Requisites\n", - " To get the most out of this notebook, you should already have an understanding of these topics: \n", - "
    \n", - "
  • pre-req one\n", - "
  • pre-req two\n", - "
\n", - "
Expected Results\n", - " At the end of this notebook, you should be able to: \n", - "
    \n", - "
  • outcome one\n", - "
  • outcome two\n", - "
\n", - "
\n", - "
" + "\n", + "\n", + "The data preparation step is needed in order to align the datasets for analysis. The specific \n", + "steps needed to prepare a given dataset may differ, depending on the source and the variable of\n", + "interest. \n", + "\n", + "Some steps might include: \n", + "\n", + "* Organizing the time-series index such that the time steps for both simulated and observed are congruent\n", + " * This may involve interpolation to estimate a more granular time-step than is found in the source data\n", + " * More often, an agregating function is used to 'down-sample' the dataset to a coarser time step (days vs hours).\n", + "* Coordinate aggregation units between simulated and observed \n", + " * Gridded data may be sampled per HUC-12, HUC-6, etc. to match modeled data indexed by these units. \n", + "* Re-Chunking the data to make time-series analysis more efficient (see [here](/dev/null) for a primer on re-chunking).\n", + "\n", + "## **Step 0: Importing libaries**" ] }, { @@ -2000,7 +1994,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Retrieving data from HPC or the Cloud\n", + "## **Step 2: Retrieving data from HPC or the Cloud**\n", "#### The process varies based on where the notebook is being run but generally looks this:\n", "1. (Done already) Connect to workspace (local, HPC, or QHUB) and open notebook\n", "2. Start Dask client \n", @@ -2013,7 +2007,7 @@ "metadata": {}, "source": [ "# Update to helper function after repo consolidation\n", - "## **Start a Dask client using an appropriate Dask Cluster** \n", + "### Start a Dask client using an appropriate Dask Cluster\n", "This is an optional step, but can speed up data loading significantly, especially when accessing data from the cloud." 
] }, @@ -2191,7 +2185,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## **Retrieve CONUS404 from source and tranform it to a Dask array**" + "### Retrieve CONUS404 from source and transform it to a Dask array" ] }, { @@ -2245,7 +2239,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## **Explore the data** \n", + "## **Step 3: Explore the data** \n", "(sometimes called exploratory data analysis (EDA) or exploratory spatial data analysis (ESDA) when it contains cartographic data)" ] }, { @@ -2345,7 +2339,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Importing geographic extents\n", + "### Importing geographic extents\n", "Sometimes geographic data is brought in and used to clip a larger dataset to an area of interest (AOI). \n", "\n", "Let's look at two ways this can be done: a user-defined polygon or using the pyNHD package. Data can also be brought in other ways such as a local file or an API request. These are covered in other tutorials such as [Reading and Writing Files (GeoPandas)](https://geopandas.org/en/stable/docs/user_guide/io.html) or using the Python [requests](https://requests.readthedocs.io/en/latest/) package for [Programattically Accesing Geospatial Data Using APIs](https://www.earthdatascience.org/courses/use-data-open-source-python/intro-to-apis/spatial-data-using-apis/).\n", @@ -2521,7 +2515,7 @@ "tags": [] }, "source": [ - "## **Putting it together: Process CONUS404 to variable and research spatial extent**\n", + "## **Step 4: Putting it together: Process CONUS404 to variable and spatial extent**\n", "In this section we are going to put together some skills we have learned so far: bring in CONUS404, select our variables, then clip to our spatial extent. This assumes that the notebook is being run on the ESIP QHub. 
If being run on HPC then comment/uncomment the datasets as needed.\n", "\n", "Variables: Accumulated precipitation (PREC_ACC_NC), air temperature (TK), and (calculated) surface net radiation (RNET)
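The select-then-clip pattern described above can be sketched without any geospatial dependencies. This numpy-only toy stands in for the xarray-based workflow in the notebook: the grid values, coordinates, and bounding box are all invented for illustration, and a real AOI would typically be a polygon rather than a box:

```python
import numpy as np

# Toy stand-in for one selected gridded variable (e.g. TK) on a 6 x 7 lat/lon grid.
lats = np.linspace(25.0, 50.0, 6)     # degrees north
lons = np.linspace(-125.0, -65.0, 7)  # degrees east
tk = np.arange(42, dtype=float).reshape(6, 7)

# Hypothetical area of interest, simplified to a bounding box.
lat_min, lat_max = 30.0, 45.0
lon_min, lon_max = -115.0, -85.0

# Clip: keep only the rows/columns whose coordinates fall inside the AOI.
lat_mask = (lats >= lat_min) & (lats <= lat_max)
lon_mask = (lons >= lon_min) & (lons <= lon_max)
clipped = tk[np.ix_(lat_mask, lon_mask)]
print(clipped.shape)  # → (4, 4)
```

With xarray the equivalent is a one-liner, `ds["TK"].sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))`, which is closer to what the notebook itself does.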
\n", @@ -8876,7 +8870,7 @@ "tags": [] }, "source": [ - "### Prepare reference data\n", + "## **Step 5: Prepare reference data**\n", "\n", "Now that the CONUS404 dataset has been preprocessed, we need to do the same with datasets used for comparison with the forcings data. In this section, data will be brought in from several sources and preprocessed in data-type-appropriate ways.\n", "\n", diff --git a/dataset_preprocessing/tutorials/CONUS404/conus404_zonal_point_statistics.ipynb b/dataset_preprocessing/tutorials/CONUS404/conus404_zonal_point_statistics.ipynb index e64700b3..dea61522 100644 --- a/dataset_preprocessing/tutorials/CONUS404/conus404_zonal_point_statistics.ipynb +++ b/dataset_preprocessing/tutorials/CONUS404/conus404_zonal_point_statistics.ipynb @@ -6,8 +6,21 @@ "source": [ "# Create zonal statistics and point extractions for comparing CONUS404 and reference datasets\n", "\n", - "Short paragraph describing what is about to happen\n", + "\n", "\n", + "The pre-processing step is needed in order to align the two datasets for analysis. The specific \n", + "steps needed to prepare a given dataset may differ, depending on the source and the variable of\n", + "interest. \n", + "\n", + "Some steps might include: \n", + "\n", + "* Organizing the time-series index such that the time steps for both simulated and observed are congruent\n", + " * This may involve interpolation to estimate a more granular time-step than is found in the source data\n", + " * More often, an agregating function is used to 'down-sample' the dataset to a coarser time step (days vs hours).\n", + "* Coordinate aggregation units between simulated and observed \n", + " * Gridded data may be sampled per HUC-12, HUC-6, etc. to match modeled data indexed by these units. \n", + " * Index formats may be adjusted (e.g. 
a 'gage_id' may be 'USGS-01104200' in one dataset vs. '01104200' in another)\n", + "* Rechunking the data to make time-series analysis more efficient (see [here](/dev/null) for a primer on rechunking).\n", "
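The index-format adjustment mentioned in the list above (e.g. 'USGS-01104200' vs. '01104200') usually reduces to a small normalization helper before joining datasets on gage ID. `normalize_gage_id` is a hypothetical name, a sketch of one way to do it rather than an actual HyTEST function:

```python
def normalize_gage_id(gage_id: str) -> str:
    """Strip an agency prefix like 'USGS-' so IDs from different sources match."""
    # Split on the first '-' only; IDs without a prefix pass through unchanged.
    return gage_id.split("-", 1)[-1]

print(normalize_gage_id("USGS-01104200"))  # → 01104200
print(normalize_gage_id("01104200"))       # → 01104200
```

Applied to a pandas index, this would be `df.index.map(normalize_gage_id)` on each dataset before merging.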
\n", " Guide to pre-requisites and learning outcomes...<click to expand>\n", " \n", From abbb504058e0a125117719f22e22bbfd81903e81 Mon Sep 17 00:00:00 2001 From: "g.trantham" Date: Mon, 7 Nov 2022 07:58:08 -0600 Subject: [PATCH 2/2] fix: filename change --- .../tutorials/nwm_streamflow/03_Vizualization.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/model_evaluation/tutorials/nwm_streamflow/03_Vizualization.ipynb b/model_evaluation/tutorials/nwm_streamflow/03_Vizualization.ipynb index 2399f07a..38bd6322 100644 --- a/model_evaluation/tutorials/nwm_streamflow/03_Vizualization.ipynb +++ b/model_evaluation/tutorials/nwm_streamflow/03_Vizualization.ipynb @@ -593,7 +593,7 @@ "outputs": [], "source": [ "#Data is loaded \n", - "DScore = pd.read_csv(r'./DScore_streamflow_benchmark.csv', dtype={'site_no':str} ).set_index('site_no', drop=False)\n", + "DScore = pd.read_csv(r'./DScore_streamflow_example.csv', dtype={'site_no':str} ).set_index('site_no', drop=False)\n", "# Merge benchmarks with cobalt data to form a single table, indexed by site_no\n", "metrics = DScore.columns.tolist()[1:] \n", "DScore = DScore.merge(\n", @@ -750,7 +750,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3.10.6 ('hytest')", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" },