pycontrails.MetDataset

class pycontrails.MetDataset(data, cachestore=None, wrap_longitude=False, copy=True, attrs=None, **attrs_kwargs)

Bases: MetBase

Meteorological dataset with multiple variables.

Composition around xarray.Dataset that enforces certain variables and dimensions for internal usage.

Parameters:
  • data (xarray.Dataset) – xarray.Dataset containing meteorological variables and coordinates

  • cachestore (CacheStore, optional) – Cache datastore for staging intermediates with save(). Defaults to None.

  • wrap_longitude (bool, optional) – Wrap data along the longitude dimension. If True, duplicate and shift longitude values (i.e., -180 -> 180) to ensure that the longitude dimension covers the entire interval [-180, 180]. Defaults to False.

  • copy (bool, optional) – Copy data on construction. Defaults to True.

  • attrs (dict[str, Any], optional) – Attributes to add to data.attrs. Defaults to None. Generally, pycontrails Models may use the following attributes:

    • provider: Name of the data provider (e.g. “ECMWF”).

    • dataset: Name of the dataset (e.g. “ERA5”).

    • product: Name of the product type (e.g. “reanalysis”).

  • **attrs_kwargs (Any) – Keyword arguments to add to data.attrs.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from pycontrails import MetDataset
>>> from pycontrails.datalib.ecmwf import ERA5
>>> time = ("2022-03-01T00", "2022-03-01T02")
>>> variables = ["air_temperature", "specific_humidity"]
>>> pressure_levels = [200, 250, 300]
>>> era5 = ERA5(time, variables, pressure_levels)
>>> # Open directly as `MetDataset`
>>> met = era5.open_metdataset()
>>> # Use `data` attribute to access `xarray` object
>>> assert isinstance(met.data, xr.Dataset)
>>> # Alternatively, open with `xarray` and cast to `MetDataset`
>>> ds = xr.open_mfdataset(era5._cachepaths)
>>> met = MetDataset(ds)
>>> # Access sub-`DataArrays`
>>> mda = met["t"]  # `MetDataArray` instance, needed for interpolation operations
>>> da = mda.data  # Underlying `xarray` object
>>> # Check out a few values
>>> da[5:8, 5:8, 1, 1].values
array([[224.08959005, 224.41374427, 224.75945349],
       [224.09456429, 224.42037658, 224.76525676],
       [224.10036756, 224.42617985, 224.77106004]])
>>> # Mean temperature over entire array
>>> da.mean().load().item()
223.5083
__init__(data, cachestore=None, wrap_longitude=False, copy=True, attrs=None, **attrs_kwargs)

Methods

__init__(data[, cachestore, wrap_longitude, ...])

broadcast_coords(name)

Broadcast coordinates along other dimensions.

copy()

Create a shallow copy of the current class.

downselect(bbox)

Downselect met data within spatial bounding box.

downselect_met(met, *[, longitude_buffer, ...])

Downselect met to encompass a spatiotemporal region of the data.

ensure_vars(vars[, raise_error])

Ensure variables exist in xr.Dataset.

from_coords(longitude, latitude, level, time)

Create a MetDataset containing a coordinate skeleton from coordinate arrays.

from_zarr(store, **kwargs)

Create a MetDataset from a path to a Zarr store.

get(key[, default_value])

Shortcut to the data.get(key, default_value) method.

load(hash[, cachestore, chunks])

Load saved intermediate from cachestore.

save(**kwargs)

Save intermediate to cachestore as netcdf.

standardize_variables(variables)

Standardize variables in-place.

to_vector([transfer_attrs])

Convert a MetDataset to a GeoVectorDataset by raveling data.

update([other])

Shortcut to data.update().

wrap_longitude()

Wrap longitude coordinates.

Attributes

attrs

Pass through to self.data.attrs.

cachestore

Cache datastore to use for save() or load()

coords

Get the coordinates of the underlying data.

data

DataArray or Dataset

dataset_attr

Look up the 'dataset' attribute with a custom error message.

dim_order

Default dimension order for DataArray or Dataset (x, y, z, t)

hash

Generate a unique hash for this met instance.

indexes

Low level access to underlying data indexes.

is_single_level

Check if instance contains "single level" or "surface level" data.

is_wrapped

Check if the longitude dimension covers the closed interval [-180, 180].

is_zarr

Check if underlying data is sourced from a Zarr group.

product_attr

Look up the 'product' attribute with a custom error message.

provider_attr

Look up the 'provider' attribute with a custom error message.

shape

Return the shape of the dimensions.

size

Return the size of (each) array in underlying data.

property attrs

Pass through to self.data.attrs.

broadcast_coords(name)

Broadcast coordinates along other dimensions.

Parameters:

name (str) – Coordinate/dimension name to broadcast. Can be a dimension or non-dimension coordinates.

Returns:

xarray.DataArray – DataArray of the coordinate broadcasted along all other dimensions. The DataArray will have the same shape as the gridded data.
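Conceptually, broadcasting a coordinate replicates its values along every other dimension so the result has the full grid shape. A minimal numpy sketch of this idea (not the pycontrails implementation, which operates on the underlying xarray objects):

```python
import numpy as np

# Sketch: broadcast the "level" coordinate of a
# (longitude, latitude, level, time) grid to the full grid shape.
longitude = np.arange(4.0)
latitude = np.arange(3.0)
level = np.array([200.0, 250.0])
time = np.array([0])

shape = (longitude.size, latitude.size, level.size, time.size)
# Align "level" with the third axis via singleton dimensions,
# then broadcast to the grid shape
level_broadcast = np.broadcast_to(level[None, None, :, None], shape)
print(level_broadcast.shape)  # (4, 3, 2, 1)
```

Every grid point in the result carries the level value of its vertical slice, matching the shape of any gridded variable.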

cachestore

Cache datastore to use for save() or load()

property coords

Get the coordinates of the underlying data.

Only dimension coordinates are returned.

See: http://xarray.pydata.org/en/stable/user-guide/data-structures.html#coordinates

Returns:

dict[str, np.ndarray] – Dictionary of coordinates

copy()

Create a shallow copy of the current class.

See xarray.Dataset.copy() for reference.

Returns:

Self – Copy of the current class

data

DataArray or Dataset

property dataset_attr

Look up the ‘dataset’ attribute with a custom error message.

Returns:

str – Dataset of the data. If not one of ‘ERA5’, ‘HRES’, ‘IFS’, or ‘GFS’, a warning is issued.

dim_order = ('longitude', 'latitude', 'level', 'time')

Default dimension order for DataArray or Dataset (x, y, z, t)

downselect(bbox)

Downselect met data within spatial bounding box.

Parameters:

bbox (list[float]) – List of coordinates defining a spatial bounding box in WGS84 coordinates. For 2D queries, the list is [west, south, east, north]. For 3D queries, the list is [west, south, min-level, east, north, max-level], with level defined in [hPa].

Returns:

Self – Return downselected data
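The 2D selection keeps every grid point whose coordinates fall inside the box. A mask-based numpy sketch of the same logic, assuming bbox = [west, south, east, north] (the actual method slices the underlying xarray.Dataset):

```python
import numpy as np

# Sketch of 2D bounding-box downselection on raw coordinate arrays
longitude = np.arange(-180.0, 180.0, 30.0)
latitude = np.arange(-90.0, 91.0, 30.0)

west, south, east, north = -60.0, 0.0, 60.0, 60.0  # bbox
lon_sel = longitude[(longitude >= west) & (longitude <= east)]
lat_sel = latitude[(latitude >= south) & (latitude <= north)]

print(lon_sel.tolist())  # [-60.0, -30.0, 0.0, 30.0, 60.0]
print(lat_sel.tolist())  # [0.0, 30.0, 60.0]
```

A 3D bbox would add an analogous mask on the level coordinate between min-level and max-level.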

downselect_met(met, *, longitude_buffer=(0.0, 0.0), latitude_buffer=(0.0, 0.0), level_buffer=(0.0, 0.0), time_buffer=(np.timedelta64(0, 'h'), np.timedelta64(0, 'h')))

Downselect met to encompass a spatiotemporal region of the data.

Warning

This method is analogous to GeoVectorDataset.downselect_met(). It does not change the instance data, but instead operates on the met input. This method is different from downselect() which operates on the instance data.

Changed in version 0.54.5: Data is no longer copied when downselecting.

Parameters:
  • met (MetDataset | MetDataArray) – MetDataset or MetDataArray to downselect.

  • longitude_buffer (tuple[float, float], optional) – Extend the longitude domain by longitude_buffer[0] on the low side and longitude_buffer[1] on the high side. Units must be the same as the class coordinates. Defaults to (0, 0) degrees.

  • latitude_buffer (tuple[float, float], optional) – Extend the latitude domain by latitude_buffer[0] on the low side and latitude_buffer[1] on the high side. Units must be the same as the class coordinates. Defaults to (0, 0) degrees.

  • level_buffer (tuple[float, float], optional) – Extend the level domain by level_buffer[0] on the low side and level_buffer[1] on the high side. Units must be the same as the class coordinates. Defaults to (0, 0) [hPa].

  • time_buffer (tuple[np.timedelta64, np.timedelta64], optional) – Extend the time domain by time_buffer[0] on the low side and time_buffer[1] on the high side. Units must be the same as the class coordinates. Defaults to (np.timedelta64(0, "h"), np.timedelta64(0, "h")).

Returns:

MetDataset | MetDataArray – Copy of downselected MetDataset or MetDataArray.
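The buffers widen the instance's coordinate extent before the met input is sliced. A small sketch of how a single dimension's bounds are computed (the real method does this for all four dimensions and then selects from met):

```python
import numpy as np

# Sketch: widen the data's min/max by the low/high buffer before slicing
longitude = np.array([-30.0, 0.0, 45.0])  # instance coordinates
longitude_buffer = (1.0, 2.0)

lon_min = longitude.min() - longitude_buffer[0]
lon_max = longitude.max() + longitude_buffer[1]
print(lon_min, lon_max)  # -31.0 47.0
```

The met input would then be restricted to longitudes in [lon_min, lon_max].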

ensure_vars(vars, raise_error=True)

Ensure variables exist in xr.Dataset.

Parameters:
  • vars (MetVariable | str | Sequence[MetVariable | str | list[MetVariable]]) – List of MetVariable (or string key), or individual MetVariable (or string key). If vars contains an element that is a list[MetVariable], then it is sufficient for any one variable in that list to be present in the dataset.

  • raise_error (bool, optional) – Raise KeyError if data does not contain variables. Defaults to True.

Returns:

list[str] – List of met keys verified in MetDataset. Returns an empty list if any MetVariable is missing.

Raises:

KeyError – Raises when dataset does not contain variable in vars
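The "any one of a list" semantics can be illustrated with plain string keys. This is a minimal sketch of the lookup behavior described above, not the pycontrails implementation (which also handles MetVariable objects):

```python
# Sketch of ensure_vars lookup semantics using string keys only
def ensure_vars(available, vars, raise_error=True):
    found = []
    for v in vars:
        # A list element means "any one of these is sufficient"
        options = v if isinstance(v, list) else [v]
        match = next((o for o in options if o in available), None)
        if match is None:
            if raise_error:
                raise KeyError(f"Data does not contain any of {options}")
            return []  # empty list when a variable is missing
        found.append(match)
    return found

available = {"air_temperature", "specific_humidity"}
print(ensure_vars(available, ["air_temperature",
                              ["geopotential", "specific_humidity"]]))
# ['air_temperature', 'specific_humidity']
```

With raise_error=False, a missing variable yields an empty list instead of a KeyError.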

classmethod from_coords(longitude, latitude, level, time)

Create a MetDataset containing a coordinate skeleton from coordinate arrays.

Parameters:
  • longitude, latitude (npt.ArrayLike | float) – Horizontal coordinates, in [deg]

  • level (npt.ArrayLike | float) – Vertical coordinate, in [hPa]

  • time (npt.ArrayLike | np.datetime64) – Temporal coordinates, in [UTC]. Will be sorted.

Returns:

Self – MetDataset with no variables.

Examples

>>> # Create skeleton MetDataset
>>> longitude = np.arange(0, 10, 0.5)
>>> latitude = np.arange(0, 10, 0.5)
>>> level = [250, 300]
>>> time = np.datetime64("2019-01-01")
>>> met = MetDataset.from_coords(longitude, latitude, level, time)
>>> met
MetDataset with data:
<xarray.Dataset> Size: 360B
Dimensions:       (longitude: 20, latitude: 20, level: 2, time: 1)
Coordinates:
  * longitude     (longitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * latitude      (latitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * level         (level) float64 16B 250.0 300.0
  * time          (time) datetime64[ns] 8B 2019-01-01
    air_pressure  (level) float32 8B 2.5e+04 3e+04
    altitude      (level) float32 8B 1.036e+04 9.164e+03
Data variables:
    *empty*
>>> met.shape
(20, 20, 2, 1)
>>> met.size
800
>>> # Fill it up with some constant data
>>> met["temperature"] = xr.DataArray(np.full(met.shape, 234.5), coords=met.coords)
>>> met["humidity"] = xr.DataArray(np.full(met.shape, 0.5), coords=met.coords)
>>> met
MetDataset with data:
<xarray.Dataset> Size: 13kB
Dimensions:       (longitude: 20, latitude: 20, level: 2, time: 1)
Coordinates:
  * longitude     (longitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * latitude      (latitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * level         (level) float64 16B 250.0 300.0
  * time          (time) datetime64[ns] 8B 2019-01-01
    air_pressure  (level) float32 8B 2.5e+04 3e+04
    altitude      (level) float32 8B 1.036e+04 9.164e+03
Data variables:
    temperature   (longitude, latitude, level, time) float64 6kB 234.5 ... 234.5
    humidity      (longitude, latitude, level, time) float64 6kB 0.5 0.5 ... 0.5
>>> # Convert to a GeoVectorDataset
>>> vector = met.to_vector()
>>> vector.dataframe.head()
   longitude  latitude  level       time  temperature  humidity
0        0.0       0.0  250.0 2019-01-01        234.5       0.5
1        0.0       0.0  300.0 2019-01-01        234.5       0.5
2        0.0       0.5  250.0 2019-01-01        234.5       0.5
3        0.0       0.5  300.0 2019-01-01        234.5       0.5
4        0.0       1.0  250.0 2019-01-01        234.5       0.5
classmethod from_zarr(store, **kwargs)

Create a MetDataset from a path to a Zarr store.

Parameters:
  • store (Any) – Path to Zarr store. Passed into xarray.open_zarr().

  • **kwargs (Any) – Other keyword-only arguments passed into xarray.open_zarr().

Returns:

Self – MetDataset with data from Zarr store.

get(key, default_value=None)

Shortcut to the data.get(key, default_value) method.

Parameters:
  • key (str) – Key to get from data

  • default_value (Any, optional) – Return default_value if key not in data, by default None

Returns:

Any – Values returned from data.get(key, default_value)

property hash

Generate a unique hash for this met instance.

Note this is not as robust as it could be since the repr output is truncated.

Returns:

str – Unique hash for met instance (sha1)
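A sha1 digest of a repr string, as the note above describes, looks like the following sketch. The exact payload pycontrails hashes is an implementation detail; the dictionary below is purely illustrative:

```python
import hashlib

# Sketch: sha1 of a repr-based payload, as the `hash` property describes
payload = repr({"level": [250, 300], "time": "2019-01-01"}).encode()
digest = hashlib.sha1(payload).hexdigest()
print(len(digest))  # 40 hex characters
```

Because repr output is truncated for large arrays, two distinct datasets with identical reprs would collide, which is the robustness caveat noted above.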

property indexes

Low level access to underlying data indexes.

This property is typically faster for accessing coordinate indexes.

Added in version 0.25.2.

Returns:

dict[Hashable, pd.Index] – Dictionary of indexes.

Examples

>>> from pycontrails.datalib.ecmwf import ERA5
>>> from datetime import datetime
>>> times = (datetime(2022, 3, 1, 12), datetime(2022, 3, 1, 13))
>>> variables = "air_temperature", "specific_humidity"
>>> levels = [200, 300]
>>> era5 = ERA5(times, variables, levels)
>>> mds = era5.open_metdataset()
>>> mds.indexes["level"].to_numpy()
array([200., 300.])
>>> mda = mds["air_temperature"]
>>> mda.indexes["level"].to_numpy()
array([200., 300.])
property is_single_level

Check if instance contains “single level” or “surface level” data.

This property checks if the level dimension contains a single value equal to -1, the pycontrails convention for surface-only data.

Returns:

bool – If instance contains single level data.

property is_wrapped

Check if the longitude dimension covers the closed interval [-180, 180].

Assumes the longitude dimension is sorted (this is established by the MetDataset or MetDataArray constructor).

Returns:

bool – True if longitude coordinates cover [-180, 180]
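The wrapping convention can be sketched with plain numpy: a grid built on [-180, 180) is not wrapped, and duplicating the -180 column at +180 (the shift described in the constructor's wrap_longitude parameter) closes the interval. This is a conceptual sketch, not the pycontrails implementation:

```python
import numpy as np

# Sketch of the wrapping convention on a sorted longitude grid
longitude = np.arange(-180.0, 180.0, 1.0)  # [-180, ..., 179]
is_wrapped = longitude.min() <= -180.0 and longitude.max() >= 180.0
print(is_wrapped)  # False: no column at +180

# Duplicate the -180 column at +180 (shift -180 -> 180)
wrapped = np.append(longitude, longitude[0] + 360.0)
print(wrapped.min() <= -180.0 and wrapped.max() >= 180.0)  # True
```

Having data on both sides of the antimeridian is what makes interpolation near longitude ±180 well defined.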

property is_zarr

Check if underlying data is sourced from a Zarr group.

Implementation is very brittle, and may break as external libraries change.

Some dask intermediate artifact is cached when this is called. Typically, subsequent calls to this method are much faster than the initial call.

Added in version 0.26.0.

Returns:

bool – If data is based on a Zarr group.

classmethod load(hash, cachestore=None, chunks=None)

Load saved intermediate from cachestore.

Parameters:
  • hash (str) – Saved hash to load.

  • cachestore (CacheStore, optional) – Cache datastore to use for sourcing files. Defaults to DiskCacheStore.

  • chunks (dict[str, int], optional) – Chunks kwarg passed to xarray.open_mfdataset() when opening files.

Returns:

Self – New MetDataset with loaded data.

property product_attr

Look up the ‘product’ attribute with a custom error message.

Returns:

str – Product of the data. If not one of ‘forecast’, ‘ensemble’, or ‘reanalysis’, a warning is issued.

property provider_attr

Look up the ‘provider’ attribute with a custom error message.

Returns:

str – Provider of the data. If not one of ‘ECMWF’ or ‘NCEP’, a warning is issued.

save(**kwargs)

Save intermediate to cachestore as netcdf.

Load and restore using load().

Parameters:

**kwargs (Any) – Keyword arguments passed directly to xarray.Dataset.to_netcdf()

Returns:

list[str] – Returns filenames saved

property shape

Return the shape of the dimensions.

Returns:

tuple[int, int, int, int] – Shape of underlying data

property size

Return the size of (each) array in underlying data.

Returns:

int – Total number of grid points in underlying data

standardize_variables(variables)

Standardize variables in-place.

Parameters:

variables (Iterable[MetVariable]) – Data source variables
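Standardization amounts to renaming the dataset's short keys to each matching variable's standard name. A dictionary-based sketch of that renaming; the short-name mapping here is an assumption for illustration, not the pycontrails variable table:

```python
# Sketch: rename short dataset keys to standard names (mapping is illustrative)
short_to_standard = {"t": "air_temperature", "q": "specific_humidity"}

data = {"t": [220.0], "q": [0.001], "w": [0.0]}
standardized = {short_to_standard.get(k, k): v for k, v in data.items()}
print(sorted(standardized))  # ['air_temperature', 'specific_humidity', 'w']
```

Keys without a match (here "w") are left unchanged; the real method mutates the underlying xarray.Dataset in place.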

to_vector(transfer_attrs=True)

Convert a MetDataset to a GeoVectorDataset by raveling data.

If data is lazy, it will be loaded.

Parameters:

transfer_attrs (bool, optional) – Transfer attributes from data to output GeoVectorDataset. By default, True, meaning that attributes are transferred.

Returns:

GeoVectorDataset – Converted GeoVectorDataset. The variables on the returned instance include all of those on the input instance, plus the four core spatial temporal variables.

Examples

>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = "2022-03-01",  "2022-03-01T01"
>>> variables = ["air_temperature", "specific_humidity"]
>>> levels = [250, 200]
>>> era5 = ERA5(time=times, variables=variables, pressure_levels=levels)
>>> met = era5.open_metdataset()
>>> met.to_vector(transfer_attrs=False)
GeoVectorDataset [6 keys x 4152960 length, 0 attributes]
    Keys: longitude, latitude, level, time, air_temperature, ..., specific_humidity
    Attributes:
    time                [2022-03-01 00:00:00, 2022-03-01 01:00:00]
    longitude           [-180.0, 179.75]
    latitude            [-90.0, 90.0]
    altitude            [10362.8, 11783.9]
update(other=None, **kwargs)

Shortcut to data.update().

See xarray.Dataset.update() for reference.

Parameters:
  • other (MutableMapping) – Variables with which to update this dataset

  • **kwargs (Any) – Variables defined by keyword arguments. If a variable exists both in other and as a keyword argument, the keyword argument takes precedence.

See also

xarray.Dataset.update()

wrap_longitude()

Wrap longitude coordinates.

Returns:

Self – Copy of the instance with wrapped longitude values. Returns a copy of the data when longitude values are already wrapped.