pycontrails.core.datalib#

Datalib utilities.

Module Attributes

NETCDF_ENGINE

NetCDF engine to use for parsing netcdf files

DEFAULT_CHUNKS

Default chunking strategy when opening datasets with xarray

OPEN_IN_PARALLEL

Whether to open multi-file datasets in parallel

OPEN_WITH_LOCK

Whether to use file locking when opening multi-file datasets

Functions

parse_grid(grid, supported)

Parse input grid spacing.

parse_pressure_levels(pressure_levels[, ...])

Check that input pressure levels have a consistent type and ensure the levels exist in the ECMWF data source.

parse_timesteps(time[, freq])

Parse time input into set of time steps.

parse_variables(variables, supported)

Parse input variables.

round_hour(time, hour)

Round time to the nearest whole hour before input time.

Classes

MetDataSource(time, variables[, ...])

Abstract class for wrapping meteorology data sources.

pycontrails.core.datalib.DEFAULT_CHUNKS = {'time': 1}#

Default chunking strategy when opening datasets with xarray

class pycontrails.core.datalib.MetDataSource(time, variables, pressure_levels=[-1], paths=None, grid=None, **kwargs)#

Bases: ABC

Abstract class for wrapping meteorology data sources.

abstract cache_dataset(dataset)#

Cache data from data source.

Parameters:

dataset (xarray.Dataset) – Dataset loaded from remote API or local files. The dataset must have the same format as the original data source API or files.

cachestore#

Cache store for intermediates while processing data source. If None, cache is turned off.

abstract create_cachepath(t)#

Return cachepath to local data file based on datetime.

Parameters:

t (datetime) – Datetime of datafile

Returns:

str – Path to cached data file

download(**xr_kwargs)#

Confirm all data files are downloaded and available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().

abstract download_dataset(times)#

Download data from data source for input times.

Parameters:

times (list[datetime]) – List of datetimes to download and store in cache

grid#

Lat / Lon grid spacing

property hash#

Generate a unique hash for this datasource.

Returns:

str – Unique hash for met instance (sha1)
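As a rough illustration of what a SHA-1 based identifier looks like, the sketch below hashes a string built from a source's defining attributes. The choice of attributes and the helper name `source_hash_sketch` are assumptions made for illustration; the page does not state which fields the actual property hashes.

```python
import hashlib
from datetime import datetime


def source_hash_sketch(timesteps, variables, pressure_levels, grid):
    """Hypothetical helper: SHA-1 hex digest of a source's defining attributes."""
    # Assumption: these four attributes identify the source.
    key = "|".join(
        [
            ",".join(t.isoformat() for t in timesteps),
            ",".join(variables),
            ",".join(str(p) for p in pressure_levels),
            str(grid),
        ]
    )
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


h1 = source_hash_sketch([datetime(2022, 1, 1)], ["air_temperature"], [250], 0.25)
h2 = source_hash_sketch([datetime(2022, 1, 1)], ["air_temperature"], [300], 0.25)
```

Identical inputs give the same 40-character hex string, while changing any attribute (here the pressure level) gives a different one.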

is_datafile_cached(t, **xr_kwargs)#

Check datafile defined by datetime for variables and pressure levels in class.

If using a cloud cache store (e.g. cache.GCPCacheStore), this is where the datafile will be mirrored to a local file for access.

Parameters:
  • t (datetime) – Datetime of datafile

  • **xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:

    • chunks: {“time”: 1}

    • engine: “netcdf4”

    • parallel: False

Returns:

bool – True if data file exists for datetime with all variables and pressure levels, False otherwise

list_timesteps_cached(**xr_kwargs)#

Get a list of data files available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().

list_timesteps_not_cached(**xr_kwargs)#

Get a list of data files not available locally in the cachestore.

Parameters:

**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
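The relationship between list_timesteps_cached() and list_timesteps_not_cached() can be pictured as a partition of the source's timesteps by a per-timestep cache check. The sketch below is a minimal stand-in, not the library's code: `split_by_cache` and the `is_cached` callable are hypothetical names, and the callable plays the role of is_datafile_cached().

```python
from datetime import datetime


def split_by_cache(timesteps, is_cached):
    """Hypothetical helper: partition timesteps into (cached, not cached)."""
    cached = [t for t in timesteps if is_cached(t)]
    missing = [t for t in timesteps if not is_cached(t)]
    return cached, missing


steps = [datetime(2022, 1, 1, h) for h in range(4)]
# Pretend that only the first two hours are already in the cache store.
already_cached = set(steps[:2])
cached, missing = split_by_cache(steps, lambda t: t in already_cached)
```

Under this reading, download() only needs to fetch the timesteps in the second list.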

open_dataset(disk_paths, **xr_kwargs)#

Open multi-file dataset in xarray.

Parameters:
  • disk_paths (str | list[str] | pathlib.Path | list[pathlib.Path]) – Path or list of paths to local files to open

  • **xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:

    • chunks: {“time”: 1}

    • engine: “netcdf4”

    • parallel: False

    • lock: False

Returns:

xarray.Dataset – Open xarray dataset
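The "used if not specified" behavior described above can be sketched with `dict.setdefault`, so that caller-supplied values always win. The helper name `with_default_xr_kwargs` is hypothetical; the constants copy the module attribute values documented on this page, and the real method may combine them differently.

```python
from typing import Any

# Values copied from the module attributes documented on this page.
NETCDF_ENGINE = "netcdf4"
DEFAULT_CHUNKS = {"time": 1}
OPEN_IN_PARALLEL = False
OPEN_WITH_LOCK = False


def with_default_xr_kwargs(**xr_kwargs: Any) -> dict:
    """Hypothetical helper: fill in defaults without overriding caller values."""
    xr_kwargs.setdefault("engine", NETCDF_ENGINE)
    xr_kwargs.setdefault("chunks", DEFAULT_CHUNKS)
    xr_kwargs.setdefault("parallel", OPEN_IN_PARALLEL)
    xr_kwargs.setdefault("lock", OPEN_WITH_LOCK)
    return xr_kwargs


# The caller overrides only "parallel"; the other keys fall back to defaults.
merged = with_default_xr_kwargs(parallel=True)
```

The merged dictionary would then be forwarded as keyword arguments to xarray.open_mfdataset().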

abstract open_metdataset(dataset=None, xr_kwargs=None, **kwargs)#

Open MetDataset from data source.

This method should download / load any required datafiles and return a MetDataset of the multi-file dataset opened by xarray.

Parameters:
  • dataset (xr.Dataset | None, optional) – Input xr.Dataset loaded manually. The dataset must have the same format as the original data source API or files.

  • xr_kwargs (dict[str, int] | None, optional) – Dictionary of keyword arguments passed into xarray.open_mfdataset() when opening files. Examples include “chunks”, “engine”, “parallel”, etc. Ignored if dataset is input.

  • **kwargs (Any) – Keyword arguments passed through directly into MetDataset constructor.

Returns:

MetDataset – Meteorology dataset

paths#

Path to local source files to load. Set to the paths of files cached in cachestore if no paths input is provided on init.

property pressure_level_variables#

Parameters available from data source.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

pressure_levels#

List of pressure levels. Set to [-1] for data without level coordinate. Use parse_pressure_levels() to handle PressureLevelInput.

abstract set_metadata(ds)#

Set met source metadata on ds.attrs.

This is called within the open_metdataset() method to set metadata on the returned MetDataset instance.

Parameters:

ds (xr.Dataset | MetDataset) – Dataset to set metadata on. Mutated in place.

property single_level_variables#

Parameters available from data source.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

property supported_pressure_levels#

Pressure levels available from datasource.

Returns:

list[int] | None – List of integer pressure levels for class. If None, no pressure level information available for class.

property supported_variables#

Parameters available from data source.

Returns:

list[MetVariable] | None – List of MetVariable available in datasource

timesteps#

List of individual timesteps from data source derived from time. Use parse_timesteps() to handle TimeInput.

property variable_shortnames#

Return a list of variable short names.

Returns:

list[str] – List of variable short names.

property variable_standardnames#

Return a list of variable standard names.

Returns:

list[str] – List of variable standard names.

variables#

Variables requested from data source. Use parse_variables() to handle VariableInput.

pycontrails.core.datalib.NETCDF_ENGINE = 'netcdf4'#

NetCDF engine to use for parsing netcdf files

pycontrails.core.datalib.OPEN_IN_PARALLEL = False#

Whether to open multi-file datasets in parallel

pycontrails.core.datalib.OPEN_WITH_LOCK = False#

Whether to use file locking when opening multi-file datasets

pycontrails.core.datalib.parse_grid(grid, supported)#

Parse input grid spacing.

Parameters:
  • grid (float) – Input grid float

  • supported (Sequence[float]) – Sequence of supported grid values

Returns:

float – Parsed grid spacing

Raises:

ValueError – Raises ValueError when grid is not in supported
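A minimal sketch of the documented contract (return the grid if it is supported, otherwise raise ValueError) might look like the following. `parse_grid_sketch` is a hypothetical stand-in written for illustration, and the error message wording is an assumption.

```python
from collections.abc import Sequence


def parse_grid_sketch(grid: float, supported: Sequence[float]) -> float:
    """Hypothetical stand-in: validate grid spacing against supported values."""
    if grid not in supported:
        raise ValueError(
            f"Grid spacing {grid} is not supported. "
            f"Supported values: {list(supported)}"
        )
    return grid


spacing = parse_grid_sketch(0.25, [0.25, 0.5, 1.0])

try:
    parse_grid_sketch(0.3, [0.25, 0.5, 1.0])
    rejected = False
except ValueError:
    rejected = True
```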

pycontrails.core.datalib.parse_pressure_levels(pressure_levels, supported=None)#

Check that input pressure levels have a consistent type and ensure the levels exist in the ECMWF data source.

Parameters:
  • pressure_levels (PressureLevelInput) – Input pressure levels for data, in hPa (mbar). Set to [-1] to represent surface level.

  • supported (list[int], optional) – List of supported pressure levels in data source

Returns:

list[int] – List of integer pressure levels supported by ECMWF data source

Raises:

ValueError – Raises ValueError if pressure level is not supported by ECMWF data source
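The normalization described above can be sketched as: wrap a scalar in a list, coerce each level to int, then check membership when a supported list is given. This is an illustrative stand-in under the assumption that PressureLevelInput accepts a number or a sequence of numbers; `parse_pressure_levels_sketch` is a hypothetical name.

```python
def parse_pressure_levels_sketch(pressure_levels, supported=None):
    """Hypothetical stand-in: normalize pressure levels to a list of ints."""
    # Assumption: a single number is accepted and treated as a one-element list.
    if isinstance(pressure_levels, (int, float)):
        pressure_levels = [pressure_levels]
    levels = [int(p) for p in pressure_levels]

    # Only validate when the caller provides the supported levels.
    if supported is not None:
        missing = sorted(set(levels) - set(supported))
        if missing:
            raise ValueError(f"Pressure levels {missing} are not supported")
    return levels


single = parse_pressure_levels_sketch(250, supported=[200, 250, 300])
mixed = parse_pressure_levels_sketch([300.0, 250], supported=[200, 250, 300])
surface = parse_pressure_levels_sketch([-1])
```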

pycontrails.core.datalib.parse_timesteps(time, freq='1H')#

Parse time input into set of time steps.

If input time is length 2, this creates a range of equally spaced time points between [start, end] with interval freq.

Parameters:
  • time (TimeInput | None) – Input datetime(s) specifying the time or time range of the data [start, end]. Either a single datetime-like or tuple of datetime-like with the first value the start of the date range and second value the end of the time range. Input values can be any type compatible with pandas.to_datetime().

  • freq (str | None, optional) – Timestep interval in range. See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases for a list of frequency aliases. If None, returns input time as a list. Defaults to “1H”.

Returns:

list[datetime] – List of unique datetimes. If input time is None, returns an empty list

Raises:

ValueError – Raised when time has more than two elements or when its elements cannot be parsed by pandas.to_datetime()
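To make the [start, end] expansion concrete, here is a standard-library sketch that uses a fixed `timedelta` step in place of a pandas frequency alias. It is an approximation of the documented behavior, not the library's implementation: `parse_timesteps_sketch` is a hypothetical name, it accepts only `datetime` objects, and it does not round the endpoints.

```python
from datetime import datetime, timedelta


def parse_timesteps_sketch(time, step=timedelta(hours=1)):
    """Hypothetical stand-in: expand a time or (start, end) pair into timesteps."""
    if time is None:
        return []
    if isinstance(time, datetime):
        return [time]
    if len(time) > 2:
        raise ValueError("time must be a single datetime or a (start, end) pair")
    if len(time) == 1:
        return [time[0]]

    start, end = time
    timesteps = []
    current = start
    # Both endpoints are included, matching the closed interval [start, end].
    while current <= end:
        timesteps.append(current)
        current += step
    return timesteps


hourly = parse_timesteps_sketch((datetime(2022, 1, 1, 0), datetime(2022, 1, 1, 3)))
```

Here `hourly` holds four datetimes, one for each hour from 00:00 through 03:00 inclusive.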

pycontrails.core.datalib.parse_variables(variables, supported)#

Parse input variables.

Parameters:
  • variables (VariableInput) – Variable name, or sequence of variable names, e.g. "air_temperature", ["air_temperature", "relative_humidity"], [130], [AirTemperature], [[EastwardWind, NorthwardWind]]. If an element is a list of MetVariable, the first MetVariable that is supported will be chosen.

  • supported (list[MetVariable]) – Supported MetVariable.

Returns:

list[MetVariable] – List of MetVariable

Raises:

ValueError – Raises ValueError if variable is not supported
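The matching rules described above (by name, by numeric id, by object, and "first supported" for nested lists) can be sketched with a small stand-in class. Everything here is illustrative: `Var` is a hypothetical substitute for MetVariable, its field names are assumptions, and `parse_variables_sketch` is not the library function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for MetVariable with assumed field names."""
    short_name: str
    standard_name: str
    param_id: int


def parse_variables_sketch(variables, supported):
    """Hypothetical stand-in: resolve inputs to entries of ``supported``."""
    if isinstance(variables, (str, int, Var)):
        variables = [variables]

    def match(item):
        # Accept the object itself, its standard name, or its numeric id.
        for s in supported:
            if item == s or item == s.standard_name or item == s.param_id:
                return s
        return None

    parsed = []
    for item in variables:
        # A nested list means "use the first candidate that is supported".
        candidates = item if isinstance(item, (list, tuple)) else [item]
        found = next((m for m in map(match, candidates) if m is not None), None)
        if found is None:
            raise ValueError(f"{item!r} is not supported")
        parsed.append(found)
    return parsed


T = Var("t", "air_temperature", 130)
U = Var("u", "eastward_wind", 131)
V = Var("v", "northward_wind", 132)
supported = [T, V]  # U is deliberately left out

by_name = parse_variables_sketch("air_temperature", supported)
by_id = parse_variables_sketch([130], supported)
first_supported = parse_variables_sketch([[U, V]], supported)
```

Because `U` is not in `supported`, the nested list `[[U, V]]` resolves to `V`.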

pycontrails.core.datalib.round_hour(time, hour)#

Round time to the nearest whole hour before input time.

Parameters:
  • time (datetime) – Input time

  • hour (int) – Hour increment to round time down to

Returns:

datetime – Rounded time

Raises:

ValueError – Raised when hour is not a valid value
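One plausible reading of "round to the nearest whole hour before input time" is a floor to the previous multiple of `hour` within the same day. The sketch below implements that reading only; `round_hour_sketch` is a hypothetical name, and both the accepted range for `hour` and the flooring rule are assumptions rather than confirmed behavior.

```python
from datetime import datetime


def round_hour_sketch(time: datetime, hour: int) -> datetime:
    """Hypothetical stand-in: floor ``time`` to a multiple of ``hour`` hours."""
    # Assumption: valid increments are whole hours from 1 to 23.
    if hour not in range(1, 24):
        raise ValueError(f"hour must be between 1 and 23, got {hour}")
    floored_hour = (time.hour // hour) * hour
    # Minutes, seconds and microseconds are dropped.
    return datetime(time.year, time.month, time.day, floored_hour)


six_hourly = round_hour_sketch(datetime(2022, 1, 1, 14, 37), 6)
hourly_floor = round_hour_sketch(datetime(2022, 1, 1, 14, 37), 1)
```

With a 6-hour increment, 14:37 floors to 12:00; with a 1-hour increment it floors to 14:00.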