
Commit

Update with New intro notebook
kjdoore committed Oct 23, 2024
1 parent aa65148 commit 48aa24c
Showing 6 changed files with 152 additions and 280 deletions.
350 changes: 110 additions & 240 deletions 101/WhyChunk.ipynb

Large diffs are not rendered by default.

21 changes: 5 additions & 16 deletions 101/index.md
@@ -1,21 +1,10 @@
# Chunking 101
# Introduction to Chunking

A gentle introduction to concepts and workflows.

This introductory chapter will illustrate some key concepts for writing
chunked data (in zarr format) to object storage in 'the cloud'. We will
eventually be writing to an OSN storage device using the S3 API, although
you could, in theory, write anywhere (including a local file system).

The illustration dataset will be PRISM(v2), accessed via its OpenDAP
endpoint at <https://cida.usgs.gov/thredds/dodsC/prism_v2.html>


Buckle up... we will get up to speed fast.
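Below is a minimal sketch of that kind of chunked zarr write, assuming `xarray`, `dask`, `zarr`, and `fsspec`/`s3fs` are installed. The dataset, bucket name, and endpoint URL are all placeholders for illustration, not the PRISM data or a real OSN location.

```python
import numpy as np
import pandas as pd
import xarray as xr
import fsspec

# A small synthetic grid standing in for a PRISM-like dataset.
ds = xr.Dataset(
    {"tmax": (("time", "lat", "lon"), np.random.rand(365, 50, 100).astype("float32"))},
    coords={
        "time": pd.date_range("2020-01-01", periods=365),
        "lat": np.linspace(24.0, 50.0, 50),
        "lon": np.linspace(-125.0, -66.0, 100),
    },
)

# Pick a chunking pattern before writing; these sizes are illustrative only.
ds = ds.chunk({"time": 30, "lat": 25, "lon": 50})

# Placeholder bucket and endpoint for any S3-compatible (e.g. OSN) location
# you have credentials for.
store = fsspec.get_mapper(
    "s3://example-bucket/prism-demo.zarr",
    client_kwargs={"endpoint_url": "https://example-osn-endpoint.invalid"},
)
ds.to_zarr(store, mode="w", consolidated=True)

# Writing to a local file system uses the same call:
# ds.to_zarr("prism-demo.zarr", mode="w")
```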
In this first series of notebooks, we will go over basic introductory topics associated with chunking.
As you will soon learn, "chunking" is an essential part of the data preparation workflow, particularly for large datasets.
The key concepts you should understand after this series include:

```{tableofcontents}
```


The dask performance report for the total conversion workflow is [here](../performance_reports/OpenDAP_to_S3-perfreport.html)
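A report like that can be generated with the `performance_report` context manager from `dask.distributed`. The snippet below is a self-contained sketch using a toy workload rather than the actual OpenDAP-to-S3 conversion:

```python
from dask.distributed import Client, performance_report
import dask.array as da

# Start a local cluster; in the tutorial environment you would likely connect
# to a Nebari/Dask Gateway cluster instead.
client = Client()

# Any dask-backed workload can be profiled; this toy reduction stands in for
# the full conversion workflow.
data = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))

with performance_report(filename="example-perfreport.html"):
    data.mean().compute()

client.close()
```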

Buckle up... we will get up to speed fast.
28 changes: 16 additions & 12 deletions _toc.yml
@@ -2,20 +2,24 @@ format: jb-book
root: index

chapters:
- file: about/index
- file: 101/index
sections:
- file: 101/WhyChunk
- file: 101/ExamineSourceData
- file: 101/EffectSizeShape
- file: 101/OpenDAP_to_S3
- file: 101/Compression
- file: 101/SecondaryExample
# - file: 101/ExamineSourceData
# - file: 101/EffectSizeShape
# - file: 101/ReadWriteChunkedFiles
# - file: 101/Compression
# - file: 101/Rechunking
# - file: 101/OpenDAP_to_S3
# - file: 101/SecondaryExample
- file: 201/index
# sections:
# - file: 201/TBD
- file: back/index
sections:
- file: helpers.md
sections:
- file: utils
- file: AWS
- file: StartNebariCluster
- file: back/Appendix_A
# - file: helpers.md
# sections:
# - file: utils
# - file: AWS
# - file: StartNebariCluster
- file: back/Glossary
3 changes: 0 additions & 3 deletions about/index.md

This file was deleted.

5 changes: 5 additions & 0 deletions back/Glossary.md
@@ -1 +1,6 @@
# Glossary

- **Chunking**: The process of breaking down large amounts of data into smaller, more manageable pieces.
- **Chunk**: A smaller, more manageable piece of a larger dataset.
- **Larger-than-memory**: A dataset whose memory footprint is too large to fit into memory all at once.
- **Rechunking**: The process of changing the current chunking pattern of a dataset to a different one (see the sketch after this list).
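As a rough illustration of these terms (not part of the glossary file itself), here is a small sketch using dask arrays; the sizes are arbitrary:

```python
import dask.array as da

# "Larger-than-memory": ~80 GB of float64 values, far more than most machines
# can hold at once, represented lazily as many smaller chunks.
big = da.random.random((100_000, 100_000), chunks=(10_000, 10_000))
print(f"{big.nbytes / 1e9:.0f} GB split into {big.numblocks} blocks of shape {big.chunksize}")

# "Chunking" lets a reduction stream through the data one ~0.8 GB chunk at a time.
mean = big.mean()      # lazy task graph; nothing is read into memory yet
# mean.compute()       # uncomment to actually run the chunk-by-chunk computation

# "Rechunking": same values, different chunking pattern.
tall = big.rechunk((100_000, 1_000))
```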
25 changes: 16 additions & 9 deletions index.md
@@ -1,17 +1,24 @@
# Data Chunking
# A Data Chunking Tutorial

"Chunking" large datasets is an essential workflow in the data peparation stage of
analysis. Some of the large datasets are written with a chunking pattern which
is optimized for writing (i.e. how they are created -- model outputs, etc), and
performs poorly for reading. This depends on the analysis.
If you have found your way here, then you are probably interested in learning more about data chunking.
In this tutorial, we cover data chunking at all levels,
from a basic introduction to the topic to more advanced methods for selecting optimal chunk sizes and rechunking in the cloud.
Much of what is covered in this tutorial replicates concepts covered in a variety of materials that we cite as we go.
However, that material has been adapted to use data like what you might encounter in a HyTEST workflow.

Re-chunking is a useful strategy for re-writing a dataset in a way that optimizes it
for a particular kind of analysis (e.g. time-series vs spatial), as sketched below.
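A minimal sketch of what that looks like with xarray, using a made-up dataset and chunk sizes chosen purely for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily grid; variable names and sizes are made up for illustration.
ds = xr.Dataset(
    {"precip": (("time", "y", "x"), np.zeros((365, 200, 300), dtype="float32"))},
    coords={"time": pd.date_range("2020-01-01", periods=365)},
)

# Write-optimized layout: one full map per time step. Easy to append as each
# day arrives, but reading a long time series at one point touches every chunk.
write_optimized = ds.chunk({"time": 1, "y": 200, "x": 300})

# Read-optimized layout for time-series analysis: all time steps together in
# small spatial tiles, so a point extraction touches only a single chunk.
timeseries_optimized = ds.chunk({"time": 365, "y": 50, "x": 50})

# For data already on disk, the `rechunker` package (or reading with dask and
# writing back out) applies the same change without loading everything at once.
```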
The content is split into two primary sections:

- [Introduction to Chunking](101/index.md)
- [Advanced Topics in Chunking](201/index.md)

In [Introduction to Chunking](101/index.md), we discuss the basic introductory topics associated with chunking.
In [Advanced Topics in Chunking](201/index.md), we dive into more advanced topics related to chunking,
which require a firm understanding of the introductory material.

Feel free to read this tutorial in order (which has been set up for those new to chunking) or jump directly to the topic that interests you:

```{tableofcontents}
```

-----
Download the environment YAML file [here](env.yml)
If you find any issues or errors in this tutorial, or have any ideas for material that should be included,
please open an issue using the GitHub icon in the upper right.
