pycontrails.MetDataset¶
- class pycontrails.MetDataset(data, cachestore=None, wrap_longitude=False, copy=True, attrs=None, **attrs_kwargs)¶
Bases: MetBase
Meteorological dataset with multiple variables.
Composition around xr.Dataset to enforce certain variables and dimensions for internal usage.
- Parameters:
  data (xarray.Dataset) – An xarray.Dataset containing meteorological variables and coordinates.
  cachestore (CacheStore, optional) – Cache datastore for staging intermediates with save(). Defaults to None.
  wrap_longitude (bool, optional) – Wrap data along the longitude dimension. If True, duplicate and shift longitude values (i.e., -180 -> 180) to ensure that the longitude dimension covers the entire interval [-180, 180]. Defaults to False.
  copy (bool, optional) – Copy data on construction. Defaults to True.
  attrs (dict[str, Any], optional) – Attributes to add to data.attrs. Defaults to None. Generally, pycontrails Models may use the following attributes:
    - provider: Name of the data provider (e.g. "ECMWF").
    - dataset: Name of the dataset (e.g. "ERA5").
    - product: Name of the product type (e.g. "reanalysis").
  **attrs_kwargs (Any) – Keyword arguments to add to data.attrs. Defaults to None.
Examples

>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from pycontrails.datalib.ecmwf import ERA5

>>> time = ("2022-03-01T00", "2022-03-01T02")
>>> variables = ["air_temperature", "specific_humidity"]
>>> pressure_levels = [200, 250, 300]
>>> era5 = ERA5(time, variables, pressure_levels)

>>> # Open directly as `MetDataset`
>>> met = era5.open_metdataset()

>>> # Use `data` attribute to access `xarray` object
>>> assert isinstance(met.data, xr.Dataset)

>>> # Alternatively, open with `xarray` and cast to `MetDataset`
>>> ds = xr.open_mfdataset(era5._cachepaths)
>>> met = MetDataset(ds)

>>> # Access sub-`DataArrays`
>>> mda = met["t"]  # `MetDataArray` instance, needed for interpolation operations
>>> da = mda.data  # Underlying `xarray` object

>>> # Check out a few values
>>> da[5:8, 5:8, 1, 1].values
array([[224.08959005, 224.41374427, 224.75945349],
       [224.09456429, 224.42037658, 224.76525676],
       [224.10036756, 224.42617985, 224.77106004]])

>>> # Mean temperature over entire array
>>> da.mean().load().item()
223.5083
- __init__(data, cachestore=None, wrap_longitude=False, copy=True, attrs=None, **attrs_kwargs)¶
Methods

__init__(data[, cachestore, wrap_longitude, ...])
broadcast_coords(name) – Broadcast coordinates along other dimensions.
copy() – Create a copy of the current class.
downselect(bbox) – Downselect met data within spatial bounding box.
downselect_met(met, *[, longitude_buffer, ...]) – Downselect met to encompass a spatiotemporal region of the data.
ensure_vars(vars[, raise_error]) – Ensure variables exist in xr.Dataset.
from_coords(longitude, latitude, level, time) – Create a MetDataset containing a coordinate skeleton from coordinate arrays.
from_zarr(store, **kwargs) – Create a MetDataset from a path to a Zarr store.
get(key[, default_value]) – Shortcut to data.get(k, v) method.
load(hash[, cachestore, chunks]) – Load saved intermediate from cachestore.
save(**kwargs) – Save intermediate to cachestore as netcdf.
standardize_variables(variables) – Standardize variables in-place.
to_vector([transfer_attrs]) – Convert a MetDataset to a GeoVectorDataset by raveling data.
update([other]) – Shortcut to data.update().
wrap_longitude() – Wrap longitude coordinates.
Attributes

attrs – Pass through to self.data.attrs.
coords – Get coordinates of underlying data coordinates.
dataset_attr – Look up the 'dataset' attribute with a custom error message.
dim_order – Default dimension order for DataArray or Dataset (x, y, z, t).
hash – Generate a unique hash for this met instance.
indexes – Low level access to underlying data indexes.
is_single_level – Check if instance contains "single level" or "surface level" data.
is_wrapped – Check if the longitude dimension covers the closed interval [-180, 180].
is_zarr – Check if underlying data is sourced from a Zarr group.
product_attr – Look up the 'product' attribute with a custom error message.
provider_attr – Look up the 'provider' attribute with a custom error message.
shape – Return the shape of the dimensions.
size – Return the size of (each) array in underlying data.
data – DataArray or Dataset.
cachestore – Cache datastore for staging intermediates.
- broadcast_coords(name)¶
Broadcast coordinates along other dimensions.
- Parameters:
  name (str) – Coordinate/dimension name to broadcast. Can be a dimension or non-dimension coordinate.
- Returns:
  xarray.DataArray – DataArray of the coordinate broadcast along all other dimensions. The DataArray will have the same shape as the gridded data.
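In plain NumPy terms, broadcasting a 1-D coordinate against the full grid reshapes it to align with its own axis, then expands it to the grid shape. A minimal sketch with a hypothetical 4-D grid (raw NumPy, not the pycontrails implementation):

```python
import numpy as np

# Hypothetical grid ordered (longitude, latitude, level, time)
shape = (4, 3, 2, 1)
level = np.array([250.0, 300.0])  # 1-D "level" coordinate

# Align `level` with its own axis (axis 2), then broadcast against the
# full grid shape -- this mirrors the effect of broadcast_coords, which
# uses xarray's broadcasting machinery.
broadcast = np.broadcast_to(level.reshape(1, 1, -1, 1), shape)

print(broadcast.shape)  # (4, 3, 2, 1)
```

Every slice along the other three axes now carries the same level values, so the result has the same shape as the gridded data.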
- copy()¶
Create a copy of the current class.
- Returns:
  MetDataset – Copy of the current MetDataset.
- data¶
DataArray or Dataset
- property dataset_attr¶
Look up the ‘dataset’ attribute with a custom error message.
- Returns:
  str – Dataset of the data. If not one of 'ERA5', 'HRES', 'IFS', or 'GFS', a warning is issued.
- downselect(bbox)¶
Downselect met data within spatial bounding box.
- Parameters:
  bbox (list[float]) – List of coordinates defining a spatial bounding box in WGS84 coordinates. For 2D queries, the list is [west, south, east, north]. For 3D queries, the list is [west, south, min-level, east, north, max-level], with level defined in [hPa].
- Returns:
  MetBase – Downselected data.
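For 2D queries, the selection reduces to an inclusive coordinate mask. A pure-Python sketch of the bounding-box test (hypothetical point data; the real method subsets the gridded coordinates):

```python
def in_bbox_2d(lon, lat, bbox):
    """Return True if (lon, lat) falls inside [west, south, east, north]."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north

points = [(-120.0, 35.0), (10.0, 48.0), (150.0, -30.0)]
bbox = [-130.0, 30.0, 20.0, 60.0]  # west, south, east, north

selected = [p for p in points if in_bbox_2d(*p, bbox)]
print(selected)  # [(-120.0, 35.0), (10.0, 48.0)]
```

A 3D query adds the analogous inclusive check on level between min-level and max-level.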
- ensure_vars(vars, raise_error=True)¶
Ensure variables exist in xr.Dataset.
- Parameters:
  vars (MetVariable | str | Sequence[MetVariable | str | list[MetVariable]]) – List of MetVariable (or string keys), or an individual MetVariable (or string key). If vars contains an element with a list[MetVariable], then only one variable in that list must be present in the dataset.
  raise_error (bool, optional) – Raise KeyError if data does not contain variables. Defaults to True.
- Returns:
  list[str] – List of met keys verified in MetDataset. Returns an empty list if any MetVariable is missing.
- Raises:
  KeyError – Raised when the dataset does not contain a variable in vars.
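The any-of semantics for list elements can be sketched in plain Python (hypothetical string keys; the real method also accepts MetVariable objects):

```python
def ensure_vars(dataset_keys, vars, raise_error=True):
    """Check each requirement; a list element is satisfied if ANY of
    its members is present in the dataset."""
    found = []
    for v in vars:
        candidates = v if isinstance(v, list) else [v]
        match = next((c for c in candidates if c in dataset_keys), None)
        if match is None:
            if raise_error:
                raise KeyError(f"Dataset does not contain any of {candidates}")
            return []  # mirror the documented empty-list behavior
        found.append(match)
    return found

keys = {"air_temperature", "specific_humidity"}
# Either "q" or "specific_humidity" satisfies the second requirement
print(ensure_vars(keys, ["air_temperature", ["q", "specific_humidity"]]))
# ['air_temperature', 'specific_humidity']
```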
- classmethod from_coords(longitude, latitude, level, time)¶
Create a MetDataset containing a coordinate skeleton from coordinate arrays.
- Parameters:
  longitude, latitude (npt.ArrayLike | float) – Horizontal coordinates, in [deg].
  level (npt.ArrayLike | float) – Vertical coordinate, in [hPa].
  time (npt.ArrayLike | np.datetime64) – Temporal coordinates, in [UTC]. Will be sorted.
- Returns:
  Self – MetDataset with no variables.
Examples
>>> # Create skeleton MetDataset
>>> longitude = np.arange(0, 10, 0.5)
>>> latitude = np.arange(0, 10, 0.5)
>>> level = [250, 300]
>>> time = np.datetime64("2019-01-01")
>>> met = MetDataset.from_coords(longitude, latitude, level, time)
>>> met
MetDataset with data:
<xarray.Dataset> Size: 360B
Dimensions:       (longitude: 20, latitude: 20, level: 2, time: 1)
Coordinates:
  * longitude     (longitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * latitude      (latitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * level         (level) float64 16B 250.0 300.0
  * time          (time) datetime64[ns] 8B 2019-01-01
    air_pressure  (level) float32 8B 2.5e+04 3e+04
    altitude      (level) float32 8B 1.036e+04 9.164e+03
Data variables:
    *empty*

>>> met.shape
(20, 20, 2, 1)

>>> met.size
800

>>> # Fill it up with some constant data
>>> met["temperature"] = xr.DataArray(np.full(met.shape, 234.5), coords=met.coords)
>>> met["humidity"] = xr.DataArray(np.full(met.shape, 0.5), coords=met.coords)
>>> met
MetDataset with data:
<xarray.Dataset> Size: 13kB
Dimensions:       (longitude: 20, latitude: 20, level: 2, time: 1)
Coordinates:
  * longitude     (longitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * latitude      (latitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * level         (level) float64 16B 250.0 300.0
  * time          (time) datetime64[ns] 8B 2019-01-01
    air_pressure  (level) float32 8B 2.5e+04 3e+04
    altitude      (level) float32 8B 1.036e+04 9.164e+03
Data variables:
    temperature   (longitude, latitude, level, time) float64 6kB 234.5 ... 234.5
    humidity      (longitude, latitude, level, time) float64 6kB 0.5 0.5 ... 0.5

>>> # Convert to a GeoVectorDataset
>>> vector = met.to_vector()
>>> vector.dataframe.head()
   longitude  latitude  level       time  temperature  humidity
0        0.0       0.0  250.0 2019-01-01        234.5       0.5
1        0.0       0.0  300.0 2019-01-01        234.5       0.5
2        0.0       0.5  250.0 2019-01-01        234.5       0.5
3        0.0       0.5  300.0 2019-01-01        234.5       0.5
4        0.0       1.0  250.0 2019-01-01        234.5       0.5
- classmethod from_zarr(store, **kwargs)¶
Create a MetDataset from a path to a Zarr store.
- Parameters:
  store (Any) – Path to Zarr store. Passed into xarray.open_zarr().
  **kwargs (Any) – Other keyword-only arguments passed into xarray.open_zarr().
- Returns:
  Self – MetDataset with data from Zarr store.
- get(key, default_value=None)¶
Shortcut to the data.get(k, v) method.
- classmethod load(hash, cachestore=None, chunks=None)¶
Load saved intermediate from cachestore.
- Parameters:
  hash (str) – Saved hash to load.
  cachestore (CacheStore, optional) – Cache datastore to use for sourcing files. Defaults to DiskCacheStore.
  chunks (dict[str, int], optional) – Chunks kwarg passed to xarray.open_mfdataset() when opening files.
- Returns:
  Self – New MetDataset with loaded data.
- property product_attr¶
Look up the ‘product’ attribute with a custom error message.
- Returns:
  str – Product of the data. If not one of 'forecast', 'ensemble', or 'reanalysis', a warning is issued.
- property provider_attr¶
Look up the ‘provider’ attribute with a custom error message.
- Returns:
  str – Provider of the data. If not one of 'ECMWF' or 'NCEP', a warning is issued.
- save(**kwargs)¶
Save intermediate to cachestore as netcdf.
Load and restore using load().
- Parameters:
  **kwargs (Any) – Keyword arguments passed directly to xarray.Dataset.to_netcdf().
- Returns:
  list[str] – Filenames saved.
- property shape¶
Return the shape of the dimensions.
- property size¶
Return the size of (each) array in underlying data.
- Returns:
  int – Total number of grid points in underlying data.
- standardize_variables(variables)¶
Standardize variables in-place.
- Parameters:
  variables (Iterable[MetVariable]) – Data source variables.
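Conceptually, standardization renames source-specific short names to pycontrails standard names. A dict-based sketch (the name table below is a hypothetical example, not the actual MetVariable machinery):

```python
# Hypothetical mapping from ECMWF short names to standard names
SHORT_TO_STANDARD = {"t": "air_temperature", "q": "specific_humidity"}

def standardize(data_vars):
    """Rename any recognized short name to its standard name;
    unrecognized variables pass through unchanged."""
    return {SHORT_TO_STANDARD.get(name, name): values
            for name, values in data_vars.items()}

ds = {"t": [224.1], "q": [0.001], "custom_var": [1.0]}
print(standardize(ds))
# {'air_temperature': [224.1], 'specific_humidity': [0.001], 'custom_var': [1.0]}
```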
- to_vector(transfer_attrs=True)¶
Convert a MetDataset to a GeoVectorDataset by raveling data.
If data is lazy, it will be loaded.
- Parameters:
  transfer_attrs (bool, optional) – Transfer attributes from data to the output GeoVectorDataset. By default True, meaning attributes are transferred.
- Returns:
  GeoVectorDataset – Converted GeoVectorDataset. The variables on the returned instance include all of those on the input instance, plus the four core spatiotemporal variables.
Examples

>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = "2022-03-01", "2022-03-01T01"
>>> variables = ["air_temperature", "specific_humidity"]
>>> levels = [250, 200]
>>> era5 = ERA5(time=times, variables=variables, pressure_levels=levels)
>>> met = era5.open_metdataset()
>>> met.to_vector(transfer_attrs=False)
GeoVectorDataset [6 keys x 4152960 length, 0 attributes]
    Keys: longitude, latitude, level, time, air_temperature, ..., specific_humidity
    Attributes:
    time                [2022-03-01 00:00:00, 2022-03-01 01:00:00]
    longitude           [-180.0, 179.75]
    latitude            [-90.0, 90.0]
    altitude            [10362.8, 11783.9]
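Raveling enumerates every grid point as one row, with each coordinate expanded to the full grid first. A NumPy sketch of the grid-to-vector transformation on a small hypothetical grid (independent of pycontrails):

```python
import numpy as np

longitude = np.array([0.0, 0.5])
latitude = np.array([10.0, 10.5])
level = np.array([250.0, 300.0])

# Expand the 1-D coordinates to the full (lon, lat, level) grid, then
# ravel each to 1-D -- one row per grid point, as in to_vector().
lon_g, lat_g, lev_g = np.meshgrid(longitude, latitude, level, indexing="ij")
rows = np.column_stack([lon_g.ravel(), lat_g.ravel(), lev_g.ravel()])

print(rows.shape)          # (8, 3): 2 * 2 * 2 grid points, 3 coordinates each
print(rows[0].tolist())    # [0.0, 10.0, 250.0]
```

Any gridded variable raveled the same way lines up row-for-row with these coordinates, which is exactly the vector layout shown in the example above.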
- update(other=None, **kwargs)¶
Shortcut to data.update().
See xarray.Dataset.update() for reference.
- Parameters:
  other (MutableMapping) – Variables with which to update this dataset.
  **kwargs (Any) – Variables defined by keyword arguments. If a variable exists both in other and as a keyword argument, the keyword argument takes precedence.
See also
xarray.Dataset.update()
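The keyword-over-other precedence matches a plain dict merge, which can be sketched as (hypothetical variable names, not MetDataset itself):

```python
# `other` supplies one value for "t"; the keyword argument overrides it,
# mirroring the documented precedence of update(other, **kwargs).
other = {"t": [1.0]}
kwargs = {"t": [2.0], "q": [3.0]}

merged = {**other, **kwargs}  # keyword arguments win on conflict
print(merged)  # {'t': [2.0], 'q': [3.0]}
```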
- wrap_longitude()¶
Wrap longitude coordinates.
- Returns:
  MetDataset – Copy of MetDataset with wrapped longitude values. Returns a copy of the current MetDataset when longitude values are already wrapped.
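The wrapping step can be sketched with plain Python lists: when the grid includes -180 but not 180, the values at -180 are duplicated at +180 so the closed interval [-180, 180] is covered (a simplified sketch, not the pycontrails implementation):

```python
def wrap(longitudes):
    """Duplicate the -180 endpoint at +180 so the closed interval
    [-180, 180] is covered; leave already-wrapped grids unchanged."""
    lons = list(longitudes)
    if -180.0 in lons and 180.0 not in lons:
        lons.append(180.0)  # data at 180 is a copy of the data at -180
    return lons

grid = [-180.0, -90.0, 0.0, 90.0]
print(wrap(grid))  # [-180.0, -90.0, 0.0, 90.0, 180.0]
```

This guarantees interpolation near the antimeridian has data on both sides of the boundary.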