pycontrails.MetDataset

class pycontrails.MetDataset(data, cachestore=None, wrap_longitude=False, copy=True, attrs=None, **attrs_kwargs)

Bases: MetBase

Meteorological dataset with multiple variables.

Composition around xr.Dataset to enforce certain variables and dimensions for internal usage

Parameters:
  • data (xarray.Dataset) – xarray.Dataset containing meteorological variables and coordinates

  • cachestore (CacheStore, optional) – Cache datastore for staging intermediates with save(). Defaults to None.

  • wrap_longitude (bool, optional) – Wrap data along the longitude dimension. If True, duplicate and shift longitude values (ie, -180 -> 180) to ensure that the longitude dimension covers the entire interval [-180, 180]. Defaults to False.

  • copy (bool, optional) – Copy data on construction. Defaults to True.

  • attrs (dict[str, Any], optional) – Attributes to add to data.attrs. Defaults to None. Generally, pycontrails Models may use the following attributes:

    • provider: Name of the data provider (e.g. “ECMWF”).

    • dataset: Name of the dataset (e.g. “ERA5”).

    • product: Name of the product type (e.g. “reanalysis”).

  • **attrs_kwargs (Any) – Keyword arguments to add to data.attrs. Defaults to None.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from pycontrails.datalib.ecmwf import ERA5
>>> time = ("2022-03-01T00", "2022-03-01T02")
>>> variables = ["air_temperature", "specific_humidity"]
>>> pressure_levels = [200, 250, 300]
>>> era5 = ERA5(time, variables, pressure_levels)
>>> # Open directly as `MetDataset`
>>> met = era5.open_metdataset()
>>> # Use `data` attribute to access `xarray` object
>>> assert isinstance(met.data, xr.Dataset)
>>> # Alternatively, open with `xarray` and cast to `MetDataset`
>>> ds = xr.open_mfdataset(era5._cachepaths)
>>> met = MetDataset(ds)
>>> # Access sub-`DataArrays`
>>> mda = met["t"]  # `MetDataArray` instance, needed for interpolation operations
>>> da = mda.data  # Underlying `xarray` object
>>> # Check out a few values
>>> da[5:8, 5:8, 1, 1].values
array([[224.08959005, 224.41374427, 224.75945349],
       [224.09456429, 224.42037658, 224.76525676],
       [224.10036756, 224.42617985, 224.77106004]])
>>> # Mean temperature over entire array
>>> da.mean().load().item()
223.5083
__init__(data, cachestore=None, wrap_longitude=False, copy=True, attrs=None, **attrs_kwargs)

Methods

__init__(data[, cachestore, wrap_longitude, ...])

broadcast_coords(name)

Broadcast coordinates along other dimensions.

copy()

Create a copy of the current class.

downselect(bbox)

Downselect met data within spatial bounding box.

downselect_met(met, *[, longitude_buffer, ...])

Downselect met to encompass a spatiotemporal region of the data.

ensure_vars(vars[, raise_error])

Ensure variables exist in xr.Dataset.

from_coords(longitude, latitude, level, time)

Create a MetDataset containing a coordinate skeleton from coordinate arrays.

from_zarr(store, **kwargs)

Create a MetDataset from a path to a Zarr store.

get(key[, default_value])

Shortcut to data.get(k, v)() method.

load(hash[, cachestore, chunks])

Load saved intermediate from cachestore.

save(**kwargs)

Save intermediate to cachestore as netcdf.

standardize_variables(variables)

Standardize variables in-place.

to_vector([transfer_attrs])

Convert a MetDataset to a GeoVectorDataset by raveling data.

update([other])

Shortcut to data.update().

wrap_longitude()

Wrap longitude coordinates.

Attributes

attrs

Pass through to self.data.attrs.

coords

Get coordinates of underlying data coordinates.

dataset_attr

Look up the 'dataset' attribute with a custom error message.

dim_order

Default dimension order for DataArray or Dataset (x, y, z, t)

hash

Generate a unique hash for this met instance.

indexes

Low level access to underlying data indexes.

is_single_level

Check if instance contains "single level" or "surface level" data.

is_wrapped

Check if the longitude dimension covers the closed interval [-180, 180].

is_zarr

Check if underlying data is sourced from a Zarr group.

product_attr

Look up the 'product' attribute with a custom error message.

provider_attr

Look up the 'provider' attribute with a custom error message.

shape

Return the shape of the dimensions.

size

Return the size of (each) array in underlying data.

variables

See indexes.

data

DataArray or Dataset

cachestore

Cache datastore to use for save() or load()

broadcast_coords(name)

Broadcast coordinates along other dimensions.

Parameters:

name (str) – Coordinate/dimension name to broadcast. Can be a dimension or non-dimension coordinates.

Returns:

xarray.DataArray – DataArray of the coordinate broadcasted along all other dimensions. The DataArray will have the same shape as the gridded data.

copy()

Create a copy of the current class.

Returns:

MetDataset – MetDataset copy

data

DataArray or Dataset

property dataset_attr

Look up the ‘dataset’ attribute with a custom error message.

Returns:

str – Dataset of the data. If not one of ‘ERA5’, ‘HRES’, ‘IFS’, or ‘GFS’, a warning is issued.

downselect(bbox)

Downselect met data within spatial bounding box.

Parameters:

bbox (list[float]) – List of coordinates defining a spatial bounding box in WGS84 coordinates. For 2D queries, list is [west, south, east, north]. For 3D queries, list is [west, south, min-level, east, north, max-level] with level defined in [\(hPa\)].

Returns:

MetBase – Return downselected data

ensure_vars(vars, raise_error=True)

Ensure variables exist in xr.Dataset.

Parameters:
  • vars (MetVariable | str | Sequence[MetVariable | str | list[MetVariable]]) – List of MetVariable (or string key), or individual MetVariable (or string key). If vars contains an element with a list[MetVariable], then only one variable in the list must be present in dataset.

  • raise_error (bool, optional) – Raise KeyError if data does not contain variables. Defaults to True.

Returns:

list[str] – List of met keys verified in MetDataset. Returns an empty list if any MetVariable is missing.

Raises:

KeyError – Raises when dataset does not contain variable in vars

classmethod from_coords(longitude, latitude, level, time)

Create a MetDataset containing a coordinate skeleton from coordinate arrays.

Parameters:
  • longitude, latitude (npt.ArrayLike | float) – Horizontal coordinates, in [\(\deg\)]

  • level (npt.ArrayLike | float) – Vertical coordinate, in [\(hPa\)]

  • time (npt.ArrayLike | np.datetime64,) – Temporal coordinates, in [\(UTC\)]. Will be sorted.

Returns:

MetDataset – MetDataset with no variables.

Examples

>>> # Create skeleton MetDataset
>>> longitude = np.arange(0, 10, 0.5)
>>> latitude = np.arange(0, 10, 0.5)
>>> level = [250, 300]
>>> time = np.datetime64("2019-01-01")
>>> met = MetDataset.from_coords(longitude, latitude, level, time)
>>> met
MetDataset with data:
<xarray.Dataset> Size: 360B
Dimensions:       (longitude: 20, latitude: 20, level: 2, time: 1)
Coordinates:
  * longitude     (longitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * latitude      (latitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * level         (level) float64 16B 250.0 300.0
  * time          (time) datetime64[ns] 8B 2019-01-01
    air_pressure  (level) float32 8B 2.5e+04 3e+04
    altitude      (level) float32 8B 1.036e+04 9.164e+03
Data variables:
    *empty*
>>> met.shape
(20, 20, 2, 1)
>>> met.size
800
>>> # Fill it up with some constant data
>>> met["temperature"] = xr.DataArray(np.full(met.shape, 234.5), coords=met.coords)
>>> met["humidity"] = xr.DataArray(np.full(met.shape, 0.5), coords=met.coords)
>>> met
MetDataset with data:
<xarray.Dataset> Size: 13kB
Dimensions:       (longitude: 20, latitude: 20, level: 2, time: 1)
Coordinates:
  * longitude     (longitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * latitude      (latitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * level         (level) float64 16B 250.0 300.0
  * time          (time) datetime64[ns] 8B 2019-01-01
    air_pressure  (level) float32 8B 2.5e+04 3e+04
    altitude      (level) float32 8B 1.036e+04 9.164e+03
Data variables:
    temperature   (longitude, latitude, level, time) float64 6kB 234.5 ... 234.5
    humidity      (longitude, latitude, level, time) float64 6kB 0.5 0.5 ... 0.5
>>> # Convert to a GeoVectorDataset
>>> vector = met.to_vector()
>>> vector.dataframe.head()
longitude  latitude  level       time  temperature  humidity
0        0.0       0.0  250.0 2019-01-01        234.5       0.5
1        0.0       0.0  300.0 2019-01-01        234.5       0.5
2        0.0       0.5  250.0 2019-01-01        234.5       0.5
3        0.0       0.5  300.0 2019-01-01        234.5       0.5
4        0.0       1.0  250.0 2019-01-01        234.5       0.5
classmethod from_zarr(store, **kwargs)

Create a MetDataset from a path to a Zarr store.

Parameters:
Returns:

MetDataset – MetDataset with data from Zarr store.

get(key, default_value=None)

Shortcut to data.get(k, v)() method.

Parameters:
  • key (str) – Key to get from data

  • default_value (Any, optional) – Return default_value if key not in data, by default None

Returns:

Any – Values returned from data.get(key, default_value)

classmethod load(hash, cachestore=None, chunks=None)

Load saved intermediate from cachestore.

Parameters:
  • hash (str) – Saved hash to load.

  • cachestore (CacheStore, optional) – Cache datastore to use for sourcing files. Defaults to DiskCacheStore.

  • chunks (dict[str: int], optional) – Chunks kwarg passed to xarray.open_mfdataset() when opening files.

Returns:

MetDataset – New MetDataArray with loaded data.

property product_attr

Look up the ‘product’ attribute with a custom error message.

Returns:

str – Product of the data. If not one of ‘forecast’, ‘ensemble’, or ‘reanalysis’, a warning is issued.

property provider_attr

Look up the ‘provider’ attribute with a custom error message.

Returns:

str – Provider of the data. If not one of ‘ECMWF’ or ‘NCEP’, a warning is issued.

save(**kwargs)

Save intermediate to cachestore as netcdf.

Load and restore using load().

Parameters:

**kwargs (Any) – Keyword arguments passed directly to xarray.Dataset.to_netcdf()

Returns:

list[str] – Returns filenames saved

property shape

Return the shape of the dimensions.

Returns:

tuple[int, int, int, int] – Shape of underlying data

property size

Return the size of (each) array in underlying data.

Returns:

int – Total number of grid points in underlying data

standardize_variables(variables)

Standardize variables in-place.

Parameters:

variables (Iterable[MetVariable]) – Data source variables

to_vector(transfer_attrs=True)

Convert a MetDataset to a GeoVectorDataset by raveling data.

If data is lazy, it will be loaded.

Parameters:

transfer_attrs (bool, optional) – Transfer attributes from data to output GeoVectorDataset. By default, True, meaning that attributes are transferred.

Returns:

GeoVectorDataset – Converted GeoVectorDataset. The variables on the returned instance include all of those on the input instance, plus the four core spatial temporal variables.

Examples

>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = "2022-03-01",  "2022-03-01T01"
>>> variables = ["air_temperature", "specific_humidity"]
>>> levels = [250, 200]
>>> era5 = ERA5(time=times, variables=variables, pressure_levels=levels)
>>> met = era5.open_metdataset()
>>> met.to_vector(transfer_attrs=False)
GeoVectorDataset [6 keys x 4152960 length, 1 attributes]
    Keys: longitude, latitude, level, time, air_temperature, ..., specific_humidity
    Attributes:
    time                [2022-03-01 00:00:00, 2022-03-01 01:00:00]
    longitude           [-180.0, 179.75]
    latitude            [-90.0, 90.0]
    altitude            [10362.8, 11783.9]
    crs                 EPSG:4326
update(other=None, **kwargs)

Shortcut to data.update().

See xarray.Dataset.update() for reference.

Parameters:
  • other (MutableMapping) – Variables with which to update this dataset

  • **kwargs (Any) – Variables defined by keyword arguments. If a variable exists both in other and as a keyword argument, the keyword argument takes precedence.

See also

-

meth:xarray.Dataset.update

wrap_longitude()

Wrap longitude coordinates.

Returns:

MetDataset – Copy of MetDataset with wrapped longitude values. Returns copy of current MetDataset when longitude values are already wrapped