- Conda package manager (installed through miniforge)
The easiest way to install Miniforge on macOS is with Homebrew:
brew install miniforge
Alternatively, download and run the appropriate installer for your platform from the Miniforge repository.
- Clone this repo
- Navigate to the project directory in a terminal
- Run
conda env create --file ./environment.yml
to install the dependencies in a virtual environment
- Activate the virtual environment:
conda activate climate-data
- Install the Python packages:
python -m pip install -r requirements.txt
To retrieve data from the Climate Data Store (CDS), you first need to create an ECMWF account. Once you have created the account, log in and go to your profile page, where you'll find your API token.
Note: this project uses the newer datapi package instead of cdsapi.
Create a ~/.datapirc configuration file and paste the following into it, replacing the key with your API token from the previous step:
url: https://cds.climate.copernicus.eu/api
key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
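If the credentials check fails later, a quick first step is to confirm that the file really contains the two "key: value" lines. The sketch below is a hypothetical helper (parse_datapirc and missing_keys are illustrative names, not part of guclimate or datapi) and only approximates how datapi reads the file.

```python
from pathlib import Path


def parse_datapirc(text: str) -> dict[str, str]:
    """Parse 'key: value' lines into a dict (hypothetical helper, not datapi's parser)."""
    config: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Split on the first colon only, so the colon in "https://" stays in the value
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def missing_keys(config: dict[str, str]) -> list[str]:
    """Return the required entries (url, key) that are absent or empty."""
    return [name for name in ("url", "key") if not config.get(name)]


if __name__ == "__main__":
    path = Path.home() / ".datapirc"
    if not path.exists():
        print(f"{path} not found")
    else:
        problems = missing_keys(parse_datapirc(path.read_text()))
        print(("missing: " + ", ".join(problems)) if problems else "url and key present")
```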
Once the guclimate CLI is installed (see the installation steps below), run
guclimate retrieve verify
to check your credentials. You should see output similar to this:
Verifying CDS credentials
Credentials verified {...}
Install the guclimate CLI from source:
- Make sure you're in the project directory
- Run
pip install -e .
You should now be able to run the tool. Try:
guclimate --help
- To add a new dependency, add it to environment.yml and run the following command in the project directory:
conda env update --file environment.yml --prune
- To run the test suite, use the following command in the project directory:
python -m run_tests
To retrieve data from the CDS API and process it, you need to create a recipe. Recipes are written in YAML.
At its most basic, a recipe consists of a name and a description. For example:
---
name: "Daily temperatures 2024"
description: "Get daily temperatures (global mean) for 2024"
To retrieve CDS data, add a retrieve key and define a new key under it to refer to your dataset. In this case we'll call it daily_mean_temp:
---
name: "Daily temperatures 2024"
description: "Get daily temperatures (global mean) for 2024"
retrieve:
  daily_mean_temp:
Next we need to define the parameters for the data we want to retrieve. The parameter names and values should match what you find in the CDS web interface (click Show API Request Code).
For instance, this is what we would find in the web interface:
import cdsapi
dataset = "derived-era5-single-levels-daily-statistics"
request = {
"product_type": "reanalysis",
"variable": ["2m_temperature"],
"daily_statistic": "daily_mean",
"time_zone": "utc+00:00",
"frequency": "1_hourly"
}
client = cdsapi.Client()
client.retrieve(dataset, request).download()
And this is what it looks like in our recipe:
---
name: "Daily temperatures chart"
description: "Get daily temperatures (global mean) for 2024"
retrieve:
  daily_mean_temp:
    product: derived-era5-single-levels-daily-statistics
    product_type: reanalysis
    variable: 2m_temperature
    daily_statistic: daily_mean
    time_zone: "utc+00:00"
    frequency: 1_hourly
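To make the correspondence explicit, the hypothetical sketch below splits a retrieve entry into the dataset and request arguments of the snippet above: product becomes the dataset name and every other key goes into the request. It is an illustration, not guclimate's actual code; entry_to_cds_request is a made-up name, and wrapping variable in a list simply mirrors the web snippet.

```python
def entry_to_cds_request(entry: dict) -> tuple[str, dict]:
    """Hypothetical: split a recipe's retrieve entry into (dataset, request)."""
    request = dict(entry)             # copy, so the recipe dict is left untouched
    dataset = request.pop("product")  # "product" names the CDS dataset
    # The web snippet passes variable as a list, so wrap a single string
    if isinstance(request.get("variable"), str):
        request["variable"] = [request["variable"]]
    return dataset, request


# The daily_mean_temp entry from the recipe, written as a Python dict
daily_mean_temp = {
    "product": "derived-era5-single-levels-daily-statistics",
    "product_type": "reanalysis",
    "variable": "2m_temperature",
    "daily_statistic": "daily_mean",
    "time_zone": "utc+00:00",
    "frequency": "1_hourly",
}

dataset, request = entry_to_cds_request(daily_mean_temp)
print(dataset)
print(request)
```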
We also need to define the timeframe we're interested in, using the year, month and day parameters. These parameters are treated slightly differently from the others: they accept ranges, so we can simply write:
month: 1-12
rather than:
month: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
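A minimal sketch of how such a range could be expanded, assuming the recipe is parsed as YAML. expand_range is a hypothetical helper, not guclimate's implementation, which may also format the values differently (the CDS web snippets, for instance, write months as strings such as "01").

```python
def expand_range(value) -> list[int]:
    """Hypothetical: expand a recipe value such as "1-12" into [1, 2, ..., 12].

    YAML reads `month: 1-12` as the string "1-12" and `year: 2024` as the int 2024,
    so both forms are handled; explicit lists are passed through.
    """
    if isinstance(value, int):
        return [value]
    if isinstance(value, list):
        return [int(v) for v in value]
    start, sep, end = str(value).partition("-")
    if not sep:
        return [int(start)]  # a single value written as a string, e.g. "5"
    return list(range(int(start), int(end) + 1))  # upper bound is inclusive


print(expand_range("1-12"))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(expand_range(2024))    # [2024]
```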
Here's what our recipe looks like when we add those parameters:
---
name: "Daily temperatures chart"
description: "Get daily temperatures (global mean) for 2024"
retrieve:
  daily_mean_temp:
    product: derived-era5-single-levels-daily-statistics
    product_type: reanalysis
    variable: 2m_temperature
    daily_statistic: daily_mean
    time_zone: "utc+00:00"
    frequency: 1_hourly
    year: 2024
    month: 1-12
    day: 1-31
By default the CDS data covers the entire world, but for some purposes it is useful to extract a sub-region using the area parameter. C3S (the Copernicus Climate Change Service) defines a few standard European regions, which it uses in its reports.
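The area parameter is a bounding box in degrees, listed as [North, West, South, East] (the order the CDS API uses), so the whole globe is [90, -180, -90, 180]. The sketch below is a hypothetical sanity check, not part of guclimate: Area and parse_area are illustrative names, and it only validates the latitudes.

```python
from typing import NamedTuple


class Area(NamedTuple):
    """A CDS bounding box in degrees, in the order the API expects."""
    north: float
    west: float
    south: float
    east: float


def parse_area(values: list[float]) -> Area:
    """Hypothetical check: unpack [North, West, South, East] and validate latitudes."""
    if len(values) != 4:
        raise ValueError("area needs four values: [North, West, South, East]")
    area = Area(*values)
    if not -90 <= area.south <= area.north <= 90:
        raise ValueError(f"expected -90 <= south <= north <= 90, got {area}")
    return area


world = parse_area([90, -180, -90, 180])
print(world)  # Area(north=90, west=-180, south=-90, east=180)
```

Swapping north and south, a common slip, raises a ValueError instead of silently producing an empty request.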
Sticking with the example from above, we need data for the whole world. If we wanted to make that explicit we could define the request as follows:
---
name: "Daily temperatures chart"
description: "Get daily temperatures (global mean) for 2024"
retrieve:
  daily_mean_temp:
    product: derived-era5-single-levels-daily-statistics
    product_type: reanalysis
    variable: 2m_temperature
    daily_statistic: daily_mean
    time_zone: "utc+00:00"
    frequency: 1_hourly
    year: 2024
    month: 1-12
    day: 1-31
    area: [90, -180, -90, 180]
The benefit of doing this is that the retrieved dataset will have its longitude coordinates converted from the default 0 to 360 scale to a -180 to 180 scale.
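The conversion itself is a simple wrap-around. If you ever need to apply it yourself, for example to data retrieved without an explicit area, a minimal sketch (to_minus180_180 is an illustrative name, not a guclimate function) is:

```python
def to_minus180_180(lon: float) -> float:
    """Map a longitude from the 0..360 scale onto the -180..180 scale."""
    # Shift by 180, wrap into [0, 360), then shift back: 270 -> -90, 359.75 -> -0.25
    return (lon + 180) % 360 - 180


lons = [0.0, 90.0, 180.0, 270.0, 359.75]  # sample values on the 0..360 scale
# After wrapping, the values are no longer increasing, so sort them again
converted = sorted(to_minus180_180(lon) for lon in lons)
print(converted)  # [-180.0, -90.0, -0.25, 0.0, 90.0]
```

Note that 180 maps to -180; both denote the same meridian.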