pycontrails.datalib.ecmwf.arco_era5

Support for ARCO ERA5.

This module supports:

  • Downloading ARCO ERA5 model level data for specific times and pressure level variables.

  • Downloading ARCO ERA5 single level data for specific times and single level variables.

  • Interpolating model level data to a target lat-lon grid and pressure levels.

  • Local caching of the downloaded and interpolated data as netCDF files.

  • Opening cached data as a pycontrails.MetDataset object.

This module requires the following additional dependencies:

Functions

open_arco_era5_model_level_data(times, ...)

Open ARCO ERA5 model level data for a specific time and variables.

open_arco_era5_single_level(times, variables)

Open ARCO ERA5 single level data for a specific date and variables.

Classes

ERA5ARCO(time, variables[, pressure_levels, ...])

ARCO ERA5 data accessed remotely through Google Cloud Storage.

class pycontrails.datalib.ecmwf.arco_era5.ERA5ARCO(time, variables, pressure_levels=None, cachestore=<object object>)

Bases: ECMWFAPI

ARCO ERA5 data accessed remotely through Google Cloud Storage.

This is a high-level interface to access and cache ARCO ERA5 for a predefined set of times, variables, and pressure levels.

Added in version 0.50.0.

Parameters:
  • time (TimeInput) – Time of the data to open.

  • variables (VariableInput) – List of variables to open.

  • pressure_levels (PressureLevelInput, optional) – Target pressure levels, [hPa]. For pressure level data, this should be a sorted (increasing or decreasing) list of integers. For single level data, this should be -1. By default, the pressure levels are set to the pressure levels at each model level between 20,000 and 50,000 ft assuming a constant surface pressure.

  • cachestore (CacheStore, optional) – Cache store to use. By default, a new disk cache store is used. If None, no caching is done. In this case, the data returned by open_metdataset() is not loaded into memory.
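To give a feel for the altitude band behind the default pressure_levels, the sketch below converts 20,000 ft and 50,000 ft to pressure using the International Standard Atmosphere. This is an illustration only: the constants and the isa_pressure_hpa helper are assumptions introduced here, and the library itself derives its defaults from model level pressures at a constant surface pressure, not from this formula.

```python
import math

# International Standard Atmosphere constants (not taken from pycontrails).
P0_HPA = 1013.25           # sea-level pressure [hPa]
T0_K = 288.15              # sea-level temperature [K]
LAPSE_K_PER_M = 0.0065     # tropospheric lapse rate [K m^-1]
G = 9.80665                # gravitational acceleration [m s^-2]
R_DRY = 287.05287          # specific gas constant of dry air [J kg^-1 K^-1]
H_TROPOPAUSE_M = 11_000.0  # top of the constant-lapse-rate layer [m]
FT_TO_M = 0.3048


def isa_pressure_hpa(altitude_ft: float) -> float:
    """Return the ISA pressure [hPa] at an altitude given in feet."""
    h = altitude_ft * FT_TO_M
    exponent = G / (R_DRY * LAPSE_K_PER_M)
    if h <= H_TROPOPAUSE_M:
        return P0_HPA * (1.0 - LAPSE_K_PER_M * h / T0_K) ** exponent
    # Above the tropopause the ISA temperature is constant (isothermal layer).
    t_trop = T0_K - LAPSE_K_PER_M * H_TROPOPAUSE_M
    p_trop = P0_HPA * (t_trop / T0_K) ** exponent
    return p_trop * math.exp(-G * (h - H_TROPOPAUSE_M) / (R_DRY * t_trop))


lower = isa_pressure_hpa(20_000)  # highest pressure in the band
upper = isa_pressure_hpa(50_000)  # lowest pressure in the band
print(f"{lower:.1f} hPa down to {upper:.1f} hPa")
```

Under these assumptions the band spans a few hundred hPa, which is why the default covers typical cruise altitudes without any explicit pressure_levels argument.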

References

[Carver and Merose, 2023]
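Examples

A minimal usage sketch, assuming pycontrails is installed together with the module's additional dependencies and that network access to the remote store is available. The dates, variable names, and pressure levels are arbitrary illustration values, and the helper names example_request and open_example are hypothetical. It also assumes that a two-element time input is interpreted as a (start, end) range and that variables can be given by standard name.

```python
from datetime import datetime


def example_request() -> dict:
    """Build keyword arguments for a small, hypothetical ERA5ARCO request."""
    return {
        # Assumed to be read as a (start, end) pair; dates are arbitrary.
        "time": (datetime(2022, 3, 1, 0), datetime(2022, 3, 1, 2)),
        # Assumed to be accepted as standard names.
        "variables": ["air_temperature", "specific_humidity"],
        # Sorted target pressure levels in hPa.
        "pressure_levels": [300, 250, 200],
    }


def open_example():
    """Open the request as a MetDataset (needs pycontrails and network access)."""
    # Imported lazily so this sketch can be imported without pycontrails.
    from pycontrails.datalib.ecmwf.arco_era5 import ERA5ARCO

    era5 = ERA5ARCO(**example_request())
    return era5.open_metdataset()
```

Calling open_example() triggers the download and, with the default cachestore, writes the result to the local disk cache.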

cache_dataset(dataset)

Cache data from data source.

Parameters:

dataset (xarray.Dataset) – Dataset loaded from remote API or local files. The dataset must have the same format as the original data source API or files.

cachestore

Cache store for intermediates while processing data source. If None, cache is turned off.

create_cachepath(t)

Return cachepath to local data file based on datetime.

Parameters:

t (datetime) – Datetime of datafile

Returns:

str – Path to cached data file

download(**xr_kwargs)

Confirm all data files are downloaded and available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().

download_dataset(times)

Download data from data source for input times.

Parameters:

times (list[datetime]) – List of datetimes to download and store in cache

grid

Lat / Lon grid spacing

property hash

Generate a unique hash for this datasource.

Returns:

str – Unique hash for met instance (sha1)
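The idea of a configuration hash can be sketched with the standard library. This is not the library's implementation: the datasource_hash helper and the fields it serialises are assumptions chosen to show how a SHA-1 digest can identify one combination of timesteps, variables, pressure levels, and grid.

```python
import hashlib
import json
from datetime import datetime


def datasource_hash(timesteps, variables, pressure_levels, grid) -> str:
    """Return a SHA-1 hex digest summarising a data source configuration."""
    payload = json.dumps(
        {
            "timesteps": [t.isoformat() for t in timesteps],
            "variables": sorted(variables),  # order-insensitive
            "pressure_levels": list(pressure_levels),
            "grid": grid,
        },
        sort_keys=True,
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


h = datasource_hash([datetime(2022, 3, 1)], ["t", "q"], [300, 250], 0.25)
print(h)
```

Because the digest depends only on the configuration, two instances built with the same inputs would map to the same cache entries in a scheme like this one.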

is_datafile_cached(t, **xr_kwargs)

Check datafile defined by datetime for variables and pressure levels in class.

If using a cloud cache store (i.e. cache.GCPCacheStore), this is where the datafile will be mirrored to a local file for access.

Parameters:
  • t (datetime) – Datetime of datafile

  • **xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:

    • chunks: {“time”: 1}

    • engine: “netcdf4”

    • parallel: False

Returns:

bool – True if data file exists for datetime with all variables and pressure levels, False otherwise

property is_single_level

Return True if the datasource is single level data.

Added in version 0.50.0.

list_timesteps_cached(**xr_kwargs)

Get a list of data files available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().

list_timesteps_not_cached(**xr_kwargs)

Get a list of data files not available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
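The relationship between list_timesteps_cached() and list_timesteps_not_cached() can be pictured as a partition of the requested timesteps. The sketch below is a stand-in, not the library's code: split_by_cache is a hypothetical helper, and the is_cached predicate plays the role that is_datafile_cached() plays in the class.

```python
from datetime import datetime


def split_by_cache(timesteps, is_cached):
    """Partition timesteps into (cached, missing) using a predicate."""
    cached = [t for t in timesteps if is_cached(t)]
    missing = [t for t in timesteps if not is_cached(t)]
    return cached, missing


# Four hourly timesteps, two of which are pretended to be on disk already.
steps = [datetime(2022, 3, 1, hour) for hour in range(4)]
on_disk = {steps[0], steps[2]}

cached, missing = split_by_cache(steps, lambda t: t in on_disk)
print(len(cached), "cached;", len(missing), "still to download")
```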

open_dataset(disk_paths, **xr_kwargs)

Open multi-file dataset in xarray.

Parameters:
  • disk_paths (str | list[str] | pathlib.Path | list[pathlib.Path]) – list of string paths to local files to open

  • **xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:

    • chunks: {“time”: 1}

    • engine: “netcdf4”

    • parallel: False

    • lock: False

Returns:

xarray.Dataset – Open xarray dataset
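The "used if not specified" behaviour of **xr_kwargs amounts to merging user-supplied keywords over a set of defaults. A minimal sketch of that pattern, using the default values listed above; the with_default_xr_kwargs helper is hypothetical and is not part of pycontrails.

```python
import copy
from typing import Any

# Defaults as listed in the open_dataset() parameter description.
DEFAULT_XR_KWARGS: dict[str, Any] = {
    "chunks": {"time": 1},
    "engine": "netcdf4",
    "parallel": False,
    "lock": False,
}


def with_default_xr_kwargs(**xr_kwargs: Any) -> dict[str, Any]:
    """Return xr_kwargs with any unspecified defaults filled in."""
    # Deep copy so callers cannot mutate the shared defaults by accident.
    return {**copy.deepcopy(DEFAULT_XR_KWARGS), **xr_kwargs}


merged = with_default_xr_kwargs(chunks={"time": 4})
print(merged)
```

Keywords supplied by the caller win; everything else falls back to the defaults.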

open_metdataset(dataset=None, xr_kwargs=None, **kwargs)

Open MetDataset from data source.

This method should download or load any required datafiles and return a MetDataset of the multi-file dataset opened by xarray.

Parameters:
  • dataset (xr.Dataset | None, optional) – Input xr.Dataset loaded manually. The dataset must have the same format as the original data source API or files.

  • xr_kwargs (dict[str, Any] | None, optional) – Dictionary of keyword arguments passed into xarray.open_mfdataset() when opening files. Examples include “chunks”, “engine”, “parallel”, etc. Ignored if dataset is input.

  • **kwargs (Any) – Keyword arguments passed through directly into MetDataset constructor.

Returns:

MetDataset – Meteorology dataset

paths

Path to local source files to load. Set to the paths of files cached in cachestore if no paths input is provided on init.

property pressure_level_variables

Variables available in the ARCO ERA5 model level data.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

pressure_levels

List of pressure levels. Set to [-1] for data without level coordinate. Use parse_pressure_levels() to handle PressureLevelInput.

set_metadata(ds)

Set met source metadata on ds.attrs.

This is called within the open_metdataset() method to set metadata on the returned MetDataset instance.

Parameters:

ds (xr.Dataset | MetDataset) – Dataset to set metadata on. Mutated in place.

property single_level_variables

Variables available in the ARCO ERA5 single level data.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

property supported_pressure_levels

Pressure levels available from datasource.

Returns:

list[int] | None – List of integer pressure levels for class. If None, no pressure level information available for class.

property supported_variables

Parameters available from data source.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

timesteps

List of individual timesteps from data source derived from time. Use parse_time() to handle TimeInput.

property variable_ecmwfids

Return a list of variable ecmwf_ids.

Returns:

list[int] – List of int ECMWF param ids.

property variable_shortnames

Return a list of variable short names.

Returns:

list[str] – List of variable short names.

property variable_standardnames

Return a list of variable standard names.

Returns:

list[str] – List of variable standard names.

variables

Variables requested from data source. Use parse_variables() to handle VariableInput.

pycontrails.datalib.ecmwf.arco_era5.open_arco_era5_model_level_data(times, variables, pressure_levels)

Open ARCO ERA5 model level data for a specific time and variables.

Data is not loaded into memory, and the data is not cached.

Parameters:
  • times (list[datetime.datetime]) – Time of the data to open.

  • variables (list[met_var.MetVariable]) – List of variables to open. Unsupported variables are ignored.

  • pressure_levels (npt.ArrayLike) – Target pressure levels, [hPa].

Returns:

xarray.Dataset – Dataset with the requested variables on the target grid and pressure levels. Data is reformatted for MetDataset conventions.
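A minimal calling sketch, assuming pycontrails and the module's additional dependencies are installed and the remote store is reachable. The hourly_times and open_model_level_example helpers are hypothetical, and the dates, variables, and pressure levels are arbitrary illustration values.

```python
from datetime import datetime, timedelta


def hourly_times(start: datetime, n_hours: int) -> list[datetime]:
    """Return n_hours consecutive hourly datetimes beginning at start."""
    return [start + timedelta(hours=i) for i in range(n_hours)]


def open_model_level_example():
    """Open three hours of model level data (needs pycontrails and network)."""
    # Imported lazily so this sketch can be imported without pycontrails.
    from pycontrails.core import met_var
    from pycontrails.datalib.ecmwf.arco_era5 import open_arco_era5_model_level_data

    times = hourly_times(datetime(2022, 3, 1), 3)
    variables = [met_var.AirTemperature, met_var.SpecificHumidity]
    # Target pressure levels in hPa.
    return open_arco_era5_model_level_data(times, variables, [300, 250, 200])
```

Because the returned dataset is neither loaded into memory nor cached, repeated calls fetch from the remote store again.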

pycontrails.datalib.ecmwf.arco_era5.open_arco_era5_single_level(times, variables)

Open ARCO ERA5 single level data for a specific date and variables.

Data is not loaded into memory, and the data is not cached.

Parameters:
  • times (list[datetime.date]) – Time of the data to open.

  • variables (list[met_var.MetVariable]) – List of variables to open.

Returns:

xarray.Dataset – Dataset with the requested variables. Data is reformatted for MetDataset conventions.

Raises:

FileNotFoundError – If the variable is not found at the requested date. This could indicate that the variable is not available in the ARCO ERA5 dataset, or that the time requested is outside the available range.
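Since a missing variable or an out-of-range date surfaces as FileNotFoundError, callers may want to handle that case explicitly. The sketch below assumes pycontrails is installed and the remote store is reachable; dates_between and try_open_single_level are hypothetical helpers, and the choice of variables is left to the caller.

```python
from datetime import date, timedelta


def dates_between(first: date, last: date) -> list[date]:
    """Return every date from first to last, inclusive."""
    n_days = (last - first).days
    return [first + timedelta(days=i) for i in range(n_days + 1)]


def try_open_single_level(times, variables):
    """Open single level data, returning None if it is not available."""
    # Imported lazily so this sketch can be imported without pycontrails.
    from pycontrails.datalib.ecmwf.arco_era5 import open_arco_era5_single_level

    try:
        return open_arco_era5_single_level(times, variables)
    except FileNotFoundError:
        # Variable missing from ARCO ERA5, or date outside the available range.
        return None
```

Returning None is only one option; re-raising with a message that names the requested dates may be more helpful in a pipeline.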