pycontrails.datalib.ecmwf.HRES

class pycontrails.datalib.ecmwf.HRES(time, variables, pressure_levels=-1, paths=None, cachepath=None, grid=0.25, stream='oper', field_type='fc', forecast_time=None, cachestore=<object object>, url=None, key=None, email=None)

Bases: ECMWFAPI

Class to support HRES data access, download, and organization.

Requires an account with ECMWF and an API key.

API credentials are set in a local ~/.ecmwfapirc file:

{
    "url": "https://api.ecmwf.int/v1",
    "email": "<email>",
    "key": "<key>"
}

Credentials can also be provided directly via the url, key, and email keyword args.

See ecmwf-api-client documentation for more information.
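As a minimal sketch, the credentials file shown above can be written with the standard library. The helper name `write_ecmwfapirc` is hypothetical and not part of pycontrails or ecmwf-api-client. The example writes to a temporary directory so it does not touch a real ~/.ecmwfapirc.

```python
import json
import tempfile
from pathlib import Path


def write_ecmwfapirc(
    path: Path,
    email: str,
    key: str,
    url: str = "https://api.ecmwf.int/v1",
) -> Path:
    """Write ECMWF API credentials as JSON (hypothetical helper)."""
    credentials = {"url": url, "email": email, "key": key}
    path.write_text(json.dumps(credentials, indent=4))
    return path


# Written to a temporary directory here; the real file lives at ~/.ecmwfapirc.
rc_path = write_ecmwfapirc(Path(tempfile.mkdtemp()) / ".ecmwfapirc", "<email>", "<key>")
loaded = json.loads(rc_path.read_text())
```

Replace the `<email>` and `<key>` placeholders with the values from your ECMWF account before using the file.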

Parameters:
  • time (metsource.TimeInput | None) – The time range for data retrieval, either a single datetime or (start, end) datetime range. Input must be a datetime-like or tuple of datetime-like (datetime, pandas.Timestamp, numpy.datetime64) specifying the (start, end) of the date range, inclusive. If forecast_time is unspecified, the forecast time will be assumed to be the nearest synoptic hour: 00, 06, 12, 18. All subsequent times will be downloaded relative to forecast_time. If None, paths must be defined and all time coordinates will be loaded from files.

  • variables (metsource.VariableInput) – Variable name or list of variable names (e.g. “air_temperature”, [“air_temperature”, “relative_humidity”]). See pressure_level_variables for the list of available variables.

  • pressure_levels (metsource.PressureLevelInput, optional) – Pressure levels for data, in hPa (mbar). Set to -1 to download surface level parameters. Defaults to -1.

  • paths (str | list[str] | pathlib.Path | list[pathlib.Path] | None, optional) – Path to CDS NetCDF files to load manually. Can include glob patterns to load specific files. Defaults to None, which looks for files in the cachestore or CDS.

  • grid (float, optional) – Specify latitude/longitude grid spacing in data. Defaults to 0.25.

  • stream (str, optional) – “oper” = atmospheric model/HRES, “enfo” = ensemble forecast. Defaults to “oper” (HRES).

  • field_type (str, optional) – Field type can be e.g. forecast (fc), perturbed forecast (pf), control forecast (cf), analysis (an). Defaults to “fc”.

  • forecast_time (DatetimeLike, optional) – Specify forecast run by runtime. Defaults to None.

  • cachestore (cache.CacheStore | None, optional) – Cache data store for staging data files. Defaults to cache.DiskCacheStore. If None, cache is turned off.

  • url (str) – Override ecmwf-api-client url

  • key (str) – Override ecmwf-api-client key

  • email (str) – Override ecmwf-api-client email
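The default forecast_time can be illustrated with a short sketch. The time parameter speaks of the "nearest" synoptic hour, while the forecast_time attribute describes the "closest previous forecast run". This example assumes the latter and floors to a 6-hour boundary. The function name `previous_synoptic_hour` is hypothetical and is not the library's implementation.

```python
from datetime import datetime


def previous_synoptic_hour(t: datetime, interval_hours: int = 6) -> datetime:
    """Floor ``t`` to the closest earlier multiple of ``interval_hours``."""
    floored_hour = (t.hour // interval_hours) * interval_hours
    return t.replace(hour=floored_hour, minute=0, second=0, microsecond=0)


start = datetime(2021, 5, 1, 2)
forecast_time = previous_synoptic_hour(start)  # 2021-05-01 00:00
```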

Notes

MARS keyword definitions

  • class: in most cases this will be operational data, or “od”

  • stream: “enfo” = ensemble forecast, “oper” = atmospheric model/HRES

  • expver: experimental version, production data is 1 or 2

  • date: there are numerous acceptable date formats

  • time: forecast base time, always in synoptic time (0,6,12,18 UTC)

  • type: forecast (oper), perturbed or control forecast (enfo only), or analysis

  • levtype: options include surface, pressure levels, or model levels

  • levelist: list of levels in format specified by levtype

  • param: list of variables in catalog number, long name or short name

  • step: hourly time steps from base forecast time

  • number: for ensemble forecasts, ensemble numbers

  • format: specify netcdf instead of default grib, DEPRECATED format

  • grid: specify model return grid spacing
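To make the keywords above concrete, here is a rough sketch that renders a keyword dictionary in MARS request syntax, where list values are joined with "/". The helper `format_mars_request` and the example values are illustrative assumptions. They do not reproduce the output of generate_mars_request().

```python
def format_mars_request(keywords: dict[str, object], verb: str = "retrieve") -> str:
    """Render keyword/value pairs in MARS request syntax (hypothetical helper)."""
    lines = [f"{verb},"]
    items = list(keywords.items())
    for i, (name, value) in enumerate(items):
        # MARS separates the entries of a list value with "/"
        if isinstance(value, (list, tuple)):
            value = "/".join(str(v) for v in value)
        # Every keyword line except the last ends with a comma
        separator = "," if i < len(items) - 1 else ""
        lines.append(f"    {name}={value}{separator}")
    return "\n".join(lines)


# Illustrative values only
request = format_mars_request(
    {
        "class": "od",
        "stream": "oper",
        "expver": 1,
        "date": "2021-05-01",
        "time": "00",
        "type": "fc",
        "levtype": "pl",
        "levelist": [300, 250],
        "param": ["t"],
        "step": [2, 3],
        "grid": "0.25/0.25",
    }
)
```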

Local paths are loaded using xarray.open_mfdataset(). Pass xr_kwargs inputs to open_metdataset() to customize file loading.

Examples

>>> from datetime import datetime
>>> from pycontrails import GCPCacheStore
>>> from pycontrails.datalib.ecmwf import HRES
>>> # Store data files to local disk (default behavior)
>>> times = (datetime(2021, 5, 1, 2), datetime(2021, 5, 1, 3))
>>> hres = HRES(times, variables="air_temperature", pressure_levels=[300, 250])
>>> # Cache files to google cloud storage
>>> gcp_cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="ecmwf",
... )
>>> hres = HRES(
...     times,
...     variables="air_temperature",
...     pressure_levels=[300, 250],
...     cachestore=gcp_cache
... )

__init__(time, variables, pressure_levels=-1, paths=None, cachepath=None, grid=0.25, stream='oper', field_type='fc', forecast_time=None, cachestore=<object object>, url=None, key=None, email=None)

Methods

__init__(time, variables[, pressure_levels, ...])

cache_dataset(dataset)

Cache data from data source.

create_cachepath(t)

Return cachepath to local data file based on datetime.

create_synoptic_time_ranges(timesteps)

Create synoptic time bounds encompassing date range.

download(**xr_kwargs)

Confirm all data files are downloaded and available locally in the cachestore.

download_dataset(times)

Download data from data source for input times.

generate_mars_request([forecast_time, ...])

Generate MARS request in MARS request syntax.

is_datafile_cached(t, **xr_kwargs)

Check datafile defined by datetime for variables and pressure levels in class.

list_from_mars()

List metadata on query from MARS.

list_timesteps_cached(**xr_kwargs)

Get a list of data files available locally in the cachestore.

list_timesteps_not_cached(**xr_kwargs)

Get a list of data files not available locally in the cachestore.

open_dataset(disk_paths, **xr_kwargs)

Open multi-file dataset in xarray.

open_metdataset([dataset, xr_kwargs])

Open MetDataset from data source.

set_metadata(ds)

Set met source metadata on ds.attrs.

Attributes

email

field_type

Field type, forecast ("fc"), perturbed forecast ("pf"), control forecast ("cf"), analysis ("an").

forecast_time

Forecast run time, either specified or assigned by the closest previous forecast run

key

server

Handle to ECMWFService client

stream

Stream type, "oper" = atmospheric model/HRES, "enfo" = ensemble forecast.

url

grid

Lat / Lon grid spacing

hash

Generate a unique hash for this datasource.

is_single_level

Return True if the datasource is single level data.

paths

Path to local source files to load.

pressure_level_variables

ECMWF pressure level parameters.

pressure_levels

List of pressure levels.

single_level_variables

ECMWF surface level parameters.

step_offset

Difference between forecast_time and first timestep.

steps

Forecast steps from forecast_time corresponding to the input time.

supported_pressure_levels

Get pressure levels available from MARS.

supported_variables

Parameters available from data source.

timesteps

List of individual timesteps from data source derived from time. Use parse_time() to handle TimeInput.

variable_ecmwfids

Return a list of variable ecmwf_ids.

variable_shortnames

Return a list of variable short names.

variable_standardnames

Return a list of variable standard names.

variables

Variables requested from data source. Use parse_variables() to handle VariableInput.

cachestore

Cache store for intermediates while processing data source. If None, cache is turned off.

cache_dataset(dataset)

Cache data from data source.

Parameters:

dataset (xarray.Dataset) – Dataset loaded from remote API or local files. The dataset must have the same format as the original data source API or files.

cachestore

Cache store for intermediates while processing data source. If None, cache is turned off.

create_cachepath(t)

Return cachepath to local data file based on datetime.

Parameters:

t (datetime) – Datetime of datafile

Returns:

str – Path to cached data file

static create_synoptic_time_ranges(timesteps)

Create synoptic time bounds encompassing date range.

Extracts time bounds for synoptic time range ([00:00, 11:59], [12:00, 23:59]) for a list of input timesteps.

Parameters:

timesteps (list[pd.Timestamp]) – List of timesteps formatted as pd.Timestamps. Often this is the output from pd.date_range()

Returns:

list[tuple[pd.Timestamp, pd.Timestamp]] – List of tuple time bounds that can be used as inputs to HRES(time=...)
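The grouping into 12-hour synoptic windows can be sketched as follows. This is an illustration under assumptions. It uses standard-library datetime objects in place of pd.Timestamp, and it clamps each range to the timesteps actually present. The function name `synoptic_time_ranges` is hypothetical and is not the library's implementation.

```python
from datetime import datetime, timedelta


def synoptic_time_ranges(timesteps: list[datetime]) -> list[tuple[datetime, datetime]]:
    """Group timesteps into 12-hour windows and return (first, last) per window."""
    windows: dict[datetime, list[datetime]] = {}
    for t in sorted(timesteps):
        # Window starts at 00:00 or 12:00 on the same day
        window_start = t.replace(hour=12 * (t.hour // 12), minute=0, second=0, microsecond=0)
        windows.setdefault(window_start, []).append(t)
    return [(ts[0], ts[-1]) for ts in windows.values()]


hourly = [datetime(2021, 5, 1, 10) + timedelta(hours=h) for h in range(4)]  # 10:00 .. 13:00
ranges = synoptic_time_ranges(hourly)
```

Each resulting tuple stays within one synoptic window, so it could be passed as a (start, end) time input.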

download(**xr_kwargs)

Confirm all data files are downloaded and available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().

download_dataset(times)

Download data from data source for input times.

Parameters:

times (list[datetime]) – List of datetimes to download and store in cache datastore

email

field_type

Field type, forecast (“fc”), perturbed forecast (“pf”), control forecast (“cf”), analysis (“an”).

forecast_time

Forecast run time, either specified or assigned by the closest previous forecast run

generate_mars_request(forecast_time=None, steps=None, request_type='retrieve', request_format='mars')

Generate MARS request in MARS request syntax.

Parameters:
  • forecast_time (datetime, optional) – Base datetime for the forecast. Defaults to forecast_time.

  • steps (list[int], optional) – list of steps. Defaults to steps.

  • request_type (str, optional) – “retrieve” for download request or “list” for metadata request. Defaults to “retrieve”.

  • request_format (str, optional) – “mars” for MARS string format, or “dict” for dict version. Defaults to “mars”.

Returns:

str | dict[str, Any] – Returns MARS query string if request_format is “mars”. Returns dict query if request_format is “dict”

Notes

Brief overview of MARS request syntax

grid

Lat / Lon grid spacing

property hash

Generate a unique hash for this datasource.

Returns:

str – Unique hash for met instance (sha1)
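A SHA-1 identifier of this kind can be sketched with hashlib. Which fields the library actually feeds into the hash is not stated here, so the inputs below (timesteps, variables, pressure levels, grid) and the function name `datasource_hash` are assumptions for illustration only.

```python
import hashlib


def datasource_hash(timesteps, variables, pressure_levels, grid) -> str:
    """Return a SHA-1 hex digest over a string describing the request."""
    description = f"{timesteps}{variables}{pressure_levels}{grid}"
    return hashlib.sha1(description.encode("utf-8")).hexdigest()


first = datasource_hash(["2021-05-01T02"], ["air_temperature"], [300, 250], 0.25)
same = datasource_hash(["2021-05-01T02"], ["air_temperature"], [300, 250], 0.25)
other = datasource_hash(["2021-05-01T02"], ["air_temperature"], [300, 250], 0.5)
```

Identical inputs give the same digest, and changing any input (here the grid spacing) gives a different one.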

is_datafile_cached(t, **xr_kwargs)

Check datafile defined by datetime for variables and pressure levels in class.

If using a cloud cache store (e.g. cache.GCPCacheStore), this is where the datafile will be mirrored to a local file for access.

Parameters:
  • t (datetime) – Datetime of datafile

  • **xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:

    • chunks: {“time”: 1}

    • engine: “netcdf4”

    • parallel: False

Returns:

bool – True if data file exists for datetime with all variables and pressure levels, False otherwise

property is_single_level

Return True if the datasource is single level data.

Added in version 0.50.0.

key

list_from_mars()

List metadata on query from MARS.

Returns:

str – Metadata for MARS request. Note this is queued the same as data requests.

list_timesteps_cached(**xr_kwargs)

Get a list of data files available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().

list_timesteps_not_cached(**xr_kwargs)

Get a list of data files not available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
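The relationship between the cached and not-cached listings can be sketched as a partition of the requested timesteps by a per-timestep check. The helper `partition_by_cache` and the `is_cached` callable are hypothetical stand-ins. In the class itself, that role is played by is_datafile_cached().

```python
from datetime import datetime
from typing import Callable


def partition_by_cache(
    timesteps: list[datetime],
    is_cached: Callable[[datetime], bool],
) -> tuple[list[datetime], list[datetime]]:
    """Split timesteps into (cached, not cached) using a per-timestep check."""
    cached = [t for t in timesteps if is_cached(t)]
    not_cached = [t for t in timesteps if not is_cached(t)]
    return cached, not_cached


# Stand-in for a cache lookup: only the 02:00 file is "on disk"
already_on_disk = {datetime(2021, 5, 1, 2)}
cached, missing = partition_by_cache(
    [datetime(2021, 5, 1, 2), datetime(2021, 5, 1, 3)],
    lambda t: t in already_on_disk,
)
```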

open_dataset(disk_paths, **xr_kwargs)

Open multi-file dataset in xarray.

Parameters:
  • disk_paths (str | list[str] | pathlib.Path | list[pathlib.Path]) – list of string paths to local files to open

  • **xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:

    • chunks: {“time”: 1}

    • engine: “netcdf4”

    • parallel: False

    • lock: False

Returns:

xarray.Dataset – Open xarray dataset
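The "used if not specified" behavior of the defaults listed above can be sketched as a merge in which caller-supplied keys win. The names `DEFAULT_XR_KWARGS` and `with_default_xr_kwargs` are hypothetical. Only the default values themselves come from the parameter description.

```python
from typing import Any

# Defaults as listed in the parameter description
DEFAULT_XR_KWARGS: dict[str, Any] = {
    "chunks": {"time": 1},
    "engine": "netcdf4",
    "parallel": False,
    "lock": False,
}


def with_default_xr_kwargs(**xr_kwargs: Any) -> dict[str, Any]:
    """Fill in defaults only for keys the caller did not supply."""
    merged = dict(xr_kwargs)
    for name, value in DEFAULT_XR_KWARGS.items():
        merged.setdefault(name, value)
    return merged


kwargs = with_default_xr_kwargs(parallel=True)
```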

open_metdataset(dataset=None, xr_kwargs=None, **kwargs)

Open MetDataset from data source.

This method should download / load any required datafiles and return a MetDataset of the multi-file dataset opened by xarray.

Parameters:
  • dataset (xr.Dataset | None, optional) – Input xr.Dataset loaded manually. The dataset must have the same format as the original data source API or files.

  • xr_kwargs (dict[str, Any] | None, optional) – Dictionary of keyword arguments passed into xarray.open_mfdataset() when opening files. Examples include “chunks”, “engine”, “parallel”, etc. Ignored if dataset is input.

  • **kwargs (Any) – Keyword arguments passed through directly into MetDataset constructor.

Returns:

MetDataset – Meteorology dataset

paths

Path to local source files to load. Set to the paths of files cached in cachestore if no paths input is provided on init.

property pressure_level_variables

ECMWF pressure level parameters.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

pressure_levels

List of pressure levels. Set to [-1] for data without level coordinate. Use parse_pressure_levels() to handle PressureLevelInput.
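The [-1] convention can be sketched as follows. Both function names are hypothetical illustrations, not the library's parse_pressure_levels() or its is_single_level property. They only show how a scalar input might be normalized to a list and how the -1 marker identifies surface data.

```python
def normalize_pressure_levels(levels) -> list[int]:
    """Return pressure levels as a list; a scalar becomes a one-element list."""
    if isinstance(levels, (int, float)):
        return [int(levels)]
    return [int(level) for level in levels]


def is_single_level(levels: list[int]) -> bool:
    """True when the only level is the -1 marker for data without a level coordinate."""
    return levels == [-1]


surface = normalize_pressure_levels(-1)
upper_air = normalize_pressure_levels([300, 250])
```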

server

Handle to ECMWFService client

set_metadata(ds)

Set met source metadata on ds.attrs.

This is called within the open_metdataset() method to set metadata on the returned MetDataset instance.

Parameters:

ds (xr.Dataset | MetDataset) – Dataset to set metadata on. Mutated in place.

property single_level_variables

ECMWF surface level parameters.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

property step_offset

Difference between forecast_time and first timestep.

Returns:

int – Number of steps to offset in order to retrieve data starting from input time. Returns 0 if timesteps is empty when loading from paths.

property steps

Forecast steps from forecast_time corresponding to the input time.

Returns:

list[int] – List of forecast steps relative to forecast_time
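As a rough sketch, assuming hourly steps counted from the forecast base time, the steps and the step offset for a request could be derived like this. The function name `forecast_steps` is hypothetical and is not the library's implementation.

```python
from datetime import datetime, timedelta


def forecast_steps(forecast_time: datetime, timesteps: list[datetime]) -> list[int]:
    """Whole hours between the forecast base time and each requested timestep."""
    return [int((t - forecast_time) / timedelta(hours=1)) for t in timesteps]


base = datetime(2021, 5, 1, 0)
requested = [datetime(2021, 5, 1, 2), datetime(2021, 5, 1, 3)]
steps = forecast_steps(base, requested)
# Offset is the first step, or 0 when there are no timesteps
step_offset = steps[0] if steps else 0
```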

stream

Stream type, “oper” = atmospheric model/HRES, “enfo” = ensemble forecast.

property supported_pressure_levels

Get pressure levels available from MARS.

Returns:

list[int] – List of integer pressure level values

property supported_variables

Parameters available from data source.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

timesteps

List of individual timesteps from data source derived from time. Use parse_time() to handle TimeInput.

url

property variable_ecmwfids

Return a list of variable ecmwf_ids.

Returns:

list[int] – List of int ECMWF param ids.

property variable_shortnames

Return a list of variable short names.

Returns:

list[str] – List of variable short names.

property variable_standardnames

Return a list of variable standard names.

Returns:

list[str] – List of variable standard names.

variables

Variables requested from data source. Use parse_variables() to handle VariableInput.