Commit

[#7]: describes helpers a little better
g.trantham committed Apr 7, 2023
1 parent 1051c6e commit 769c51b
Showing 4 changed files with 137 additions and 17 deletions.
19 changes: 17 additions & 2 deletions AWS.ipynb
@@ -8,7 +8,14 @@
"# AWS Credentials Helper\n",
"\n",
"This notebook helps set AWS credentials based on already-specified \n",
"environment variables for profile and S3 endpoint. "
"environment variables for profile and S3 endpoint. \n",
"\n",
"Before this notebook is called, you can specify a particular profile and \n",
"endpoint you'd like to use. Do this by setting the appropriate environment\n",
"variables: `AWS_PROFILE` and `AWS_S3_ENDPOINT`. \n",
"\n",
"If these environment variables are not set, defaults will be used (as specified\n",
"in the code block below). "
]
},
{
@@ -39,10 +46,18 @@
" logging.error(\"Problem parsing the AWS credentials file. \")"
]
},
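The profile/endpoint defaults described in the markdown cell above follow a common pattern; a minimal sketch, assuming hypothetical default values (the real defaults live in the notebook's code block):

```python
import os

# Hypothetical defaults -- the actual values are defined in the notebook's
# code cell. setdefault() only writes when the variable is not already set,
# so a user-specified profile or endpoint always wins.
os.environ.setdefault("AWS_PROFILE", "default")
os.environ.setdefault("AWS_S3_ENDPOINT", "s3.amazonaws.com")

profile = os.environ["AWS_PROFILE"]
endpoint = os.environ["AWS_S3_ENDPOINT"]
```

Because `setdefault` never overwrites an existing value, setting the variables before `%run`-ing this notebook is all a user needs to do to select a different profile or endpoint.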
{
"cell_type": "markdown",
"id": "4f7f0646-2a0a-4a1d-aae2-9045dd50622e",
"metadata": {},
"source": [
"It is extremely important that you **never** set any of the access keys or secrets directly -- we never want to include any of those values as string literals in any code. This code is committed to a public repository, so doing this would essentially publish those secrets. **ALWAYS** parse the config file as demonstrated above in order to obtain the access key and the secret access key. "
]
},
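The safe pattern the cell above insists on -- reading keys from the credentials file rather than embedding them -- can be sketched with the standard library's `configparser`. The profile name and key values below are made up for illustration; real code would read `~/.aws/credentials` instead of an inline string:

```python
import configparser

# Stand-in for ~/.aws/credentials; real code would call
# cfg.read(os.path.expanduser("~/.aws/credentials")) instead.
sample = """
[example-profile]
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = wJalrEXAMPLESECRET
"""

cfg = configparser.ConfigParser()
cfg.read_string(sample)

profile = "example-profile"  # hypothetical profile name
access_key = cfg[profile]["aws_access_key_id"]
secret_key = cfg[profile]["aws_secret_access_key"]
```

The secrets never appear in the notebook's source; only the parsing logic does.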
{
"cell_type": "code",
"execution_count": null,
"id": "e446e487-dd85-4d7d-a8b5-10d63a23dde2",
"id": "f473e813-dfff-4e91-a87c-bf7f74a414e8",
"metadata": {},
"outputs": [],
"source": []
99 changes: 91 additions & 8 deletions StartNebariCluster.ipynb
@@ -14,23 +14,72 @@
{
"cell_type": "code",
"execution_count": null,
"id": "71c2ef84-bed9-4dd6-9919-8cab74075035",
"id": "2d30b864-d9d5-4bbc-a679-18796516712d",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import logging \n",
"try:\n",
" from dask_gateway import Gateway\n",
"except ImportError:\n",
" logging.error(\"Unable to import Dask Gateway. Are you running in a cloud compute environment?\\n\")\n",
" raise\n",
"os.environ['DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION'] = \"1.0\"\n",
" raise\n"
]
},
{
"cell_type": "markdown",
"id": "c7b4e8d1-6e81-4592-8331-b61d4ef12cfd",
"metadata": {},
"source": [
"## Dask Gateway Options\n",
"\n",
"The cluster scheduler on nebari makes use of a `Gateway`. This handles the \n",
"instantiation of clusters of workers, and gives us a way to monitor their\n",
"progress. Gateways are not used on all clustered systems (KubeCluster is\n",
"one alternative you might find on other cloud platforms -- such as `pangeo.chs.usgs.gov`). "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6b6c6f4e-a945-43f6-a56c-15d6d1b02a7a",
"metadata": {},
"outputs": [],
"source": [
"gateway = Gateway()\n",
"os.environ['DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION'] = \"1.0\"\n",
"_options = gateway.cluster_options()\n",
"_options.conda_environment='users/users-pangeo' ##<< this is the conda environment we use on nebari.\n",
"_options.profile = 'Medium Worker'\n",
"_options.profile = 'Medium Worker'"
]
},
{
"cell_type": "markdown",
"id": "a895e027-cf9a-4d52-907b-6891339f48c4",
"metadata": {
"tags": []
},
"source": [
"## AWS Environment Variables\n",
"By default, the cluster does not hand the entire set of environment variables to\n",
"each of the workers. This is an important default to override in the case of the\n",
"AWS configuration parameters. \n",
"\n",
"Because individual workers in the cluster do not have access to the standard file\n",
"system (where `~/.aws/credentials` is), the workers do not have a way to obtain\n",
"their AWS credentials unless we hand them over as environment variables. So... we\n",
"have to establish key variables in the environment, and explicitly pass those to\n",
"the cluster workers at the time the cluster is started: "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47b9fc40-c749-4d12-b9fe-a7eccd6b11fa",
"metadata": {},
"outputs": [],
"source": [
"_env_to_add={}\n",
"aws_env_vars=['AWS_ACCESS_KEY_ID',\n",
" 'AWS_SECRET_ACCESS_KEY',\n",
@@ -40,11 +89,45 @@
"for _e in aws_env_vars:\n",
" if _e in os.environ:\n",
" _env_to_add[_e] = os.environ[_e]\n",
"_options.environment_vars = _env_to_add \n",
"_options.environment_vars = _env_to_add "
]
},
{
"cell_type": "markdown",
"id": "ea7a9269-501b-40a7-9478-e42792600549",
"metadata": {},
"source": [
"## Cluster Start"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a59614f-774e-4943-886b-251f628fb042",
"metadata": {},
"outputs": [],
"source": [
"cluster = gateway.new_cluster(_options) ##<< create cluster via the dask gateway\n",
"cluster.adapt(minimum=10, maximum=30) ##<< Sets scaling parameters. \n",
"client = cluster.get_client()\n",
"\n",
"client = cluster.get_client()"
]
},
{
"cell_type": "markdown",
"id": "80bf775c-c8b5-49d8-b31f-bdf734297426",
"metadata": {},
"source": [
"## Notify\n",
"Give the user the link by which they can monitor the cluster workers' progress and status. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "394a8ff6-a154-4679-b217-22366c24ea51",
"metadata": {},
"outputs": [],
"source": [
"print(\"The 'cluster' object can be used to adjust cluster behavior. i.e. 'cluster.adapt(minimum=10)'\")\n",
"print(\"The 'client' object can be used to directly interact with the cluster. i.e. 'client.submit(func)' \")\n",
"print(f\"The link to view the client dashboard is:\\n> {client.dashboard_link}\")"
17 changes: 15 additions & 2 deletions helpers.md
@@ -2,7 +2,7 @@

This is a collection of helpers/demo scripts to show how we like to
perform common tasks that might be replicated within any notebook
in this repository:
in this repository.

```{tableofcontents}
```
@@ -11,7 +11,7 @@ in this repository:

Supposing that you want to start the dask cluster on the `nebari` cloud
hosting environment. You could do that by hand using a dask `Gateway()`,
or you could just:
or you could use our boilerplate:

```
%run StartNebariCluster.ipynb
@@ -20,3 +20,16 @@ or you could just:
That cell "magic" runs the named notebook within the context of the
notebook calling it. It is similar to module loading in Python, but
allows us to document helper operations in their own notebooks.
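Outside of IPython, the namespace-merging behavior of `%run` can be approximated with the standard library's `runpy`; a rough sketch, in which the helper file is fabricated purely for the demonstration:

```python
import os
import runpy
import tempfile

# Write a throwaway "helper" script standing in for a notebook such as
# StartNebariCluster.ipynb.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("cluster_name = 'demo-cluster'\n")
    helper_path = f.name

# Execute the helper and merge its top-level names into our own
# namespace -- roughly what %run does inside IPython.
ns = runpy.run_path(helper_path)
globals().update(ns)
os.unlink(helper_path)

print(cluster_name)  # defined by the helper, now usable here
```

This is why the calling notebook can refer to `cluster` and `client` immediately after the `%run` line, with no explicit import.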

This mechanism allows us to start the cluster in exactly the same way every
time, with only one line of code in each notebook that needs a cluster.
See {doc}`StartNebariCluster` itself for how you might modify its
behavior without having to rewrite or replicate it.

## Why?

This mechanism is similar to a typical Python module import. We prefer
running external notebooks because it lets us write jupyter-book
friendly explanations of each helper and its use. It also keeps
notebooks from having to modify `sys.path` in order to import modules
via the standard mechanism.
19 changes: 14 additions & 5 deletions utils.ipynb
@@ -5,7 +5,16 @@
"id": "5d838fc2-b153-4511-b7a5-b7ce360f72ba",
"metadata": {},
"source": [
"# Utility Functions\n"
"# Utility Functions\n",
"\n",
"This notebook contains a collection of minor utility functions that can help\n",
"with common tasks in other notebooks. To use these helper functions: \n",
"\n",
"```python\n",
"%run utils.ipynb\n",
"```\n",
"Then you will have access to the functions defined below. You may call them \n",
"as if they were defined in the notebook which \"ran\" this notebook. "
]
},
{
@@ -26,16 +35,16 @@
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"from importlib.metadata import version as _v\n",
"from sys import version as _sysver\n",
"from importlib.metadata import version as _ver\n",
"\n",
"def _versions(mlist=None):\n",
" print(\"Python :\", sys.version.replace('\\n', '')) \n",
" print(\"Python :\", _sysver.replace('\\n', '')) \n",
" if not mlist:\n",
" mlist = ['dask', 'xarray', 'fsspec', 'zarr', 's3fs' ]\n",
" for m in sorted(mlist):\n",
" try:\n",
" print(f\"{m:10s} : {_v(m)}\")\n",
" print(f\"{m:10s} : {_ver(m)}\")\n",
" except ModuleNotFoundError:\n",
" print(f\"{m:10s} : --\")"
]
