pycontrails.MetDataset¶
- class pycontrails.MetDataset(data, cachestore=None, wrap_longitude=False, copy=True, attrs=None, **attrs_kwargs)¶
Bases:
MetBase
Meteorological dataset with multiple variables.
Composition around xr.Dataset to enforce certain variables and dimensions for internal usage.
- Parameters:
  - data (xarray.Dataset) – xarray.Dataset containing meteorological variables and coordinates.
  - cachestore (CacheStore, optional) – Cache datastore for staging intermediates with save(). Defaults to None.
  - wrap_longitude (bool, optional) – Wrap data along the longitude dimension. If True, duplicate and shift longitude values (i.e. -180 -> 180) to ensure that the longitude dimension covers the entire interval [-180, 180]. Defaults to False.
  - copy (bool, optional) – Copy data on construction. Defaults to True.
  - attrs (dict[str, Any], optional) – Attributes to add to data.attrs. Defaults to None. Generally, pycontrails Models may use the following attributes:
    - provider: Name of the data provider (e.g. "ECMWF").
    - dataset: Name of the dataset (e.g. "ERA5").
    - product: Name of the product type (e.g. "reanalysis").
  - **attrs_kwargs (Any) – Keyword arguments to add to data.attrs. Defaults to None.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from pycontrails.datalib.ecmwf import ERA5

>>> time = ("2022-03-01T00", "2022-03-01T02")
>>> variables = ["air_temperature", "specific_humidity"]
>>> pressure_levels = [200, 250, 300]
>>> era5 = ERA5(time, variables, pressure_levels)

>>> # Open directly as `MetDataset`
>>> met = era5.open_metdataset()
>>> # Use `data` attribute to access `xarray` object
>>> assert isinstance(met.data, xr.Dataset)

>>> # Alternatively, open with `xarray` and cast to `MetDataset`
>>> ds = xr.open_mfdataset(era5._cachepaths)
>>> met = MetDataset(ds)

>>> # Access sub-`DataArrays`
>>> mda = met["t"]  # `MetDataArray` instance, needed for interpolation operations
>>> da = mda.data  # Underlying `xarray` object

>>> # Check out a few values
>>> da[5:8, 5:8, 1, 1].values
array([[224.08959005, 224.41374427, 224.75945349],
       [224.09456429, 224.42037658, 224.76525676],
       [224.10036756, 224.42617985, 224.77106004]])

>>> # Mean temperature over entire array
>>> da.mean().load().item()
223.5083
- __init__(data, cachestore=None, wrap_longitude=False, copy=True, attrs=None, **attrs_kwargs)¶
Methods
- __init__(data[, cachestore, wrap_longitude, ...])
- broadcast_coords(name) – Broadcast coordinates along other dimensions.
- copy() – Create a shallow copy of the current class.
- downselect(bbox) – Downselect met data within spatial bounding box.
- downselect_met(met, *[, longitude_buffer, ...]) – Downselect met to encompass a spatiotemporal region of the data.
- ensure_vars(vars[, raise_error]) – Ensure variables exist in xr.Dataset.
- from_coords(longitude, latitude, level, time) – Create a MetDataset containing a coordinate skeleton from coordinate arrays.
- from_zarr(store, **kwargs) – Create a MetDataset from a path to a Zarr store.
- get(key[, default_value]) – Shortcut to data.get(k, v) method.
- load(hash[, cachestore, chunks]) – Load saved intermediate from cachestore.
- save(**kwargs) – Save intermediate to cachestore as netcdf.
- standardize_variables(variables) – Standardize variables in-place.
- to_vector([transfer_attrs]) – Convert a MetDataset to a GeoVectorDataset by raveling data.
- update([other]) – Shortcut to data.update().
- wrap_longitude() – Wrap longitude coordinates.
Attributes
- attrs – Pass through to self.data.attrs.
- coords – Get coordinates of the underlying data.
- data – DataArray or Dataset
- dataset_attr – Look up the 'dataset' attribute with a custom error message.
- dim_order – Default dimension order for DataArray or Dataset (x, y, z, t)
- hash – Generate a unique hash for this met instance.
- indexes – Low level access to underlying data indexes.
- is_single_level – Check if instance contains "single level" or "surface level" data.
- is_wrapped – Check if the longitude dimension covers the closed interval [-180, 180].
- is_zarr – Check if underlying data is sourced from a Zarr group.
- product_attr – Look up the 'product' attribute with a custom error message.
- provider_attr – Look up the 'provider' attribute with a custom error message.
- shape – Return the shape of the dimensions.
- size – Return the size of (each) array in underlying data.
.- property attrs¶
Pass through to self.data.attrs.
- broadcast_coords(name)¶
Broadcast coordinates along other dimensions.
- Parameters:
  - name (str) – Coordinate/dimension name to broadcast. Can be a dimension or non-dimension coordinate.
- Returns:
  - xarray.DataArray – DataArray of the coordinate broadcast along all other dimensions. The DataArray will have the same shape as the gridded data.
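For example, a minimal sketch using a skeleton dataset built with from_coords() (the coordinate values are arbitrary):

>>> import numpy as np
>>> met = MetDataset.from_coords(
...     longitude=np.arange(0.0, 5.0, 1.0),
...     latitude=np.arange(-2.0, 2.0, 1.0),
...     level=[250, 300],
...     time=np.datetime64("2019-01-01"),
... )
>>> da = met.broadcast_coords("level")
>>> da.shape == met.shape
True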
- property coords¶
Get coordinates of the underlying data.
Only dimension coordinates are returned.
See: http://xarray.pydata.org/en/stable/user-guide/data-structures.html#coordinates
- Returns:
  - dict[str, np.ndarray] – Dictionary of coordinates
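A short sketch (coordinate values are arbitrary); each dimension name maps to a numpy array:

>>> import numpy as np
>>> met = MetDataset.from_coords(
...     longitude=[0.0, 1.0],
...     latitude=[10.0, 11.0],
...     level=[250, 300],
...     time=np.datetime64("2019-01-01"),
... )
>>> met.coords["level"]
array([250., 300.])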
- copy()¶
Create a shallow copy of the current class.
See xarray.Dataset.copy() for reference.
- Returns:
  - Self – Copy of the current class
- data¶
DataArray or Dataset
- property dataset_attr¶
Look up the ‘dataset’ attribute with a custom error message.
- Returns:
  - str – Dataset of the data. If not one of ‘ERA5’, ‘HRES’, ‘IFS’, or ‘GFS’, a warning is issued.
- dim_order = ('longitude', 'latitude', 'level', 'time')¶
Default dimension order for DataArray or Dataset (x, y, z, t)
- downselect(bbox)¶
Downselect met data within spatial bounding box.
- Parameters:
  - bbox (list[float]) – List of coordinates defining a spatial bounding box in WGS84 coordinates. For 2D queries, the list is [west, south, east, north]. For 3D queries, the list is [west, south, min-level, east, north, max-level] with level defined in [hPa].
- Returns:
  - Self – Return downselected data
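A sketch of a 2D query on a skeleton dataset (grid spacing and bounding box are illustrative, and grid points on the box boundary are assumed to be retained):

>>> import numpy as np
>>> met = MetDataset.from_coords(
...     longitude=np.arange(-10.0, 10.5, 0.5),
...     latitude=np.arange(30.0, 50.5, 0.5),
...     level=[200, 250, 300],
...     time=np.datetime64("2022-03-01"),
... )
>>> met.shape
(41, 41, 3, 1)
>>> # 2D bbox is [west, south, east, north]; level and time are unchanged
>>> met.downselect([-5.0, 35.0, 5.0, 45.0]).shape
(21, 21, 3, 1)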
- downselect_met(met, *, longitude_buffer=(0.0, 0.0), latitude_buffer=(0.0, 0.0), level_buffer=(0.0, 0.0), time_buffer=(np.timedelta64(0, 'h'), np.timedelta64(0, 'h')))¶
Downselect met to encompass a spatiotemporal region of the data.

Warning
This method is analogous to GeoVectorDataset.downselect_met(). It does not change the instance data, but instead operates on the met input. This method is different from downselect(), which operates on the instance data.

Changed in version 0.54.5: Data is no longer copied when downselecting.
- Parameters:
  - met (MetDataset | MetDataArray) – MetDataset or MetDataArray to downselect.
  - longitude_buffer (tuple[float, float], optional) – Extend the longitude domain by longitude_buffer[0] on the low side and longitude_buffer[1] on the high side. Units must be the same as the class coordinates. Defaults to (0, 0) degrees.
  - latitude_buffer (tuple[float, float], optional) – Extend the latitude domain by latitude_buffer[0] on the low side and latitude_buffer[1] on the high side. Units must be the same as the class coordinates. Defaults to (0, 0) degrees.
  - level_buffer (tuple[float, float], optional) – Extend the level domain by level_buffer[0] on the low side and level_buffer[1] on the high side. Units must be the same as the class coordinates. Defaults to (0, 0) [hPa].
  - time_buffer (tuple[np.timedelta64, np.timedelta64], optional) – Extend the time domain by time_buffer[0] on the low side and time_buffer[1] on the high side. Units must be the same as the class coordinates. Defaults to (np.timedelta64(0, "h"), np.timedelta64(0, "h")).
- Returns:
  - MetDataset | MetDataArray – Copy of downselected MetDataset or MetDataArray.
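For example, a sketch in which a small instance downselects a coarser global skeleton (all coordinates are arbitrary; the exact grid points retained near the domain edges depend on the buffer handling):

>>> import numpy as np
>>> region = MetDataset.from_coords(
...     longitude=np.arange(-5.0, 5.5, 0.5),
...     latitude=np.arange(35.0, 45.5, 0.5),
...     level=[250, 300],
...     time=np.datetime64("2022-03-01"),
... )
>>> globe = MetDataset.from_coords(
...     longitude=np.arange(-180.0, 180.0, 1.0),
...     latitude=np.arange(-90.0, 90.5, 1.0),
...     level=[200, 250, 300],
...     time=np.datetime64("2022-03-01"),
... )
>>> sub = region.downselect_met(globe, longitude_buffer=(2.0, 2.0), latitude_buffer=(2.0, 2.0))
>>> # `sub` covers the longitude/latitude span of `region` plus the 2 degree buffers;
>>> # `region` itself is unchanged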
- ensure_vars(vars, raise_error=True)¶
Ensure variables exist in xr.Dataset.
- Parameters:
  - vars (MetVariable | str | Sequence[MetVariable | str | list[MetVariable]]) – List of MetVariable (or string key), or individual MetVariable (or string key). If vars contains an element that is a list[MetVariable], then only one variable in that list must be present in the dataset.
  - raise_error (bool, optional) – Raise KeyError if data does not contain variables. Defaults to True.
- Returns:
  - list[str] – List of met keys verified in the MetDataset. Returns an empty list if any MetVariable is missing.
- Raises:
  - KeyError – Raised when the dataset does not contain a variable in vars.
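A short sketch using string keys on a skeleton dataset (variable names and values are illustrative):

>>> import numpy as np
>>> import xarray as xr
>>> met = MetDataset.from_coords(
...     longitude=[0.0, 1.0],
...     latitude=[0.0, 1.0],
...     level=[250, 300],
...     time=np.datetime64("2022-03-01"),
... )
>>> met["air_temperature"] = xr.DataArray(np.full(met.shape, 230.0), coords=met.coords)
>>> met.ensure_vars(["air_temperature"])
['air_temperature']
>>> met.ensure_vars(["air_temperature", "specific_humidity"], raise_error=False)
[]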
- classmethod from_coords(longitude, latitude, level, time)¶
Create a MetDataset containing a coordinate skeleton from coordinate arrays.
- Parameters:
  - longitude, latitude (npt.ArrayLike | float) – Horizontal coordinates, in [deg]
  - level (npt.ArrayLike | float) – Vertical coordinate, in [hPa]
  - time (npt.ArrayLike | np.datetime64) – Temporal coordinates, in [UTC]. Will be sorted.
- Returns:
  - Self – MetDataset with no variables.
Examples
>>> # Create skeleton MetDataset
>>> longitude = np.arange(0, 10, 0.5)
>>> latitude = np.arange(0, 10, 0.5)
>>> level = [250, 300]
>>> time = np.datetime64("2019-01-01")
>>> met = MetDataset.from_coords(longitude, latitude, level, time)
>>> met
MetDataset with data:
<xarray.Dataset> Size: 360B
Dimensions:       (longitude: 20, latitude: 20, level: 2, time: 1)
Coordinates:
  * longitude     (longitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * latitude      (latitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * level         (level) float64 16B 250.0 300.0
  * time          (time) datetime64[ns] 8B 2019-01-01
    air_pressure  (level) float32 8B 2.5e+04 3e+04
    altitude      (level) float32 8B 1.036e+04 9.164e+03
Data variables:
    *empty*

>>> met.shape
(20, 20, 2, 1)

>>> met.size
800

>>> # Fill it up with some constant data
>>> met["temperature"] = xr.DataArray(np.full(met.shape, 234.5), coords=met.coords)
>>> met["humidity"] = xr.DataArray(np.full(met.shape, 0.5), coords=met.coords)
>>> met
MetDataset with data:
<xarray.Dataset> Size: 13kB
Dimensions:       (longitude: 20, latitude: 20, level: 2, time: 1)
Coordinates:
  * longitude     (longitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * latitude      (latitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * level         (level) float64 16B 250.0 300.0
  * time          (time) datetime64[ns] 8B 2019-01-01
    air_pressure  (level) float32 8B 2.5e+04 3e+04
    altitude      (level) float32 8B 1.036e+04 9.164e+03
Data variables:
    temperature   (longitude, latitude, level, time) float64 6kB 234.5 ... 234.5
    humidity      (longitude, latitude, level, time) float64 6kB 0.5 0.5 ... 0.5

>>> # Convert to a GeoVectorDataset
>>> vector = met.to_vector()
>>> vector.dataframe.head()
   longitude  latitude  level       time  temperature  humidity
0        0.0       0.0  250.0 2019-01-01        234.5       0.5
1        0.0       0.0  300.0 2019-01-01        234.5       0.5
2        0.0       0.5  250.0 2019-01-01        234.5       0.5
3        0.0       0.5  300.0 2019-01-01        234.5       0.5
4        0.0       1.0  250.0 2019-01-01        234.5       0.5
- classmethod from_zarr(store, **kwargs)¶
Create a MetDataset from a path to a Zarr store.
- Parameters:
  - store (Any) – Path to Zarr store. Passed into xarray.open_zarr().
  - **kwargs (Any) – Other keyword-only arguments passed into xarray.open_zarr().
- Returns:
  - Self – MetDataset with data from Zarr store.
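A sketch with a hypothetical store path; extra keyword arguments such as chunks are forwarded to xarray.open_zarr():

>>> # "path/to/era5.zarr" is a placeholder for a real local or cloud Zarr store
>>> met = MetDataset.from_zarr("path/to/era5.zarr", chunks={"time": 1})  # doctest: +SKIP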
- get(key, default_value=None)¶
Shortcut to the data.get(k, v) method.
- property hash¶
Generate a unique hash for this met instance.
Note this is not as robust as it could be since the repr output is truncated.
- Returns:
  - str – Unique hash for met instance (sha1)
- property indexes¶
Low level access to underlying data indexes.
This method is typically faster for accessing coordinate indexes.
Added in version 0.25.2.
- Returns:
  - dict[Hashable, pd.Index] – Dictionary of indexes.
Examples
>>> from datetime import datetime
>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = (datetime(2022, 3, 1, 12), datetime(2022, 3, 1, 13))
>>> variables = "air_temperature", "specific_humidity"
>>> levels = [200, 300]
>>> era5 = ERA5(times, variables, levels)
>>> mds = era5.open_metdataset()
>>> mds.indexes["level"].to_numpy()
array([200., 300.])

>>> mda = mds["air_temperature"]
>>> mda.indexes["level"].to_numpy()
array([200., 300.])
- property is_single_level¶
Check if instance contains “single level” or “surface level” data.
This method checks if the level dimension contains a single value equal to -1, the pycontrails convention for surface-only data.
- Returns:
  - bool – If instance contains single level data.
- property is_wrapped¶
Check if the longitude dimension covers the closed interval [-180, 180].
Assumes the longitude dimension is sorted (this is established by the MetDataset or MetDataArray constructor).
- Returns:
  - bool – True if longitude coordinates cover [-180, 180]
- property is_zarr¶
Check if underlying data is sourced from a Zarr group.
Implementation is very brittle, and may break as external libraries change.
Some dask intermediate artifact is cached when this is called. Typically, subsequent calls to this method are much faster than the initial call.
Added in version 0.26.0.
- Returns:
  - bool – If data is based on a Zarr group.
- classmethod load(hash, cachestore=None, chunks=None)¶
Load saved intermediate from cachestore.
- Parameters:
  - hash (str) – Saved hash to load.
  - cachestore (CacheStore, optional) – Cache datastore to use for sourcing files. Defaults to DiskCacheStore.
  - chunks (dict[str, int], optional) – Chunks kwarg passed to xarray.open_mfdataset() when opening files.
- Returns:
  - Self – New MetDataset with loaded data.
- property product_attr¶
Look up the ‘product’ attribute with a custom error message.
- Returns:
  - str – Product of the data. If not one of ‘forecast’, ‘ensemble’, or ‘reanalysis’, a warning is issued.
- property provider_attr¶
Look up the ‘provider’ attribute with a custom error message.
- Returns:
  - str – Provider of the data. If not one of ‘ECMWF’ or ‘NCEP’, a warning is issued.
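For example, a sketch setting the provider, dataset, and product attributes by hand on a skeleton dataset and reading them back (the values follow the conventions listed in the constructor parameters above):

>>> import numpy as np
>>> met = MetDataset.from_coords(
...     longitude=[0.0],
...     latitude=[0.0],
...     level=[250],
...     time=np.datetime64("2022-03-01"),
... )
>>> met.attrs.update(provider="ECMWF", dataset="ERA5", product="reanalysis")
>>> met.provider_attr, met.dataset_attr, met.product_attr
('ECMWF', 'ERA5', 'reanalysis')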
- save(**kwargs)¶
Save intermediate to cachestore as netcdf.
Load and restore using load().
- Parameters:
  - **kwargs (Any) – Keyword arguments passed directly to xarray.Dataset.to_netcdf()
- Returns:
  - list[str] – Returns filenames saved
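A minimal save/load round trip sketch, assuming a local DiskCacheStore and that the instance hash is the key written by save() (the cache directory and variable values are illustrative):

>>> import numpy as np
>>> import xarray as xr
>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir=".pycontrails-cache")
>>> skeleton = MetDataset.from_coords(
...     longitude=[0.0, 1.0],
...     latitude=[0.0, 1.0],
...     level=[250, 300],
...     time=np.datetime64("2022-03-01"),
... )
>>> skeleton["air_temperature"] = xr.DataArray(np.full(skeleton.shape, 230.0), coords=skeleton.coords)
>>> met = MetDataset(skeleton.data, cachestore=cache)
>>> paths = met.save()  # list of netcdf file path(s) written into the cache
>>> reloaded = MetDataset.load(met.hash, cachestore=cache)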
- property shape¶
Return the shape of the dimensions.
- property size¶
Return the size of (each) array in underlying
data
.- Returns:
int
– Total number of grid points in underlying data
- standardize_variables(variables)¶
Standardize variables in-place.
- Parameters:
  - variables (Iterable[MetVariable]) – Data source variables
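A sketch renaming an ECMWF-style short name to its standard name. It assumes a MetVariable (exported at the package top level) can be constructed from just short_name and standard_name; in practice the predefined variables shipped with pycontrails would normally be passed here:

>>> import numpy as np
>>> import xarray as xr
>>> from pycontrails import MetVariable
>>> met = MetDataset.from_coords(
...     longitude=[0.0, 1.0],
...     latitude=[0.0, 1.0],
...     level=[250, 300],
...     time=np.datetime64("2022-03-01"),
... )
>>> met["t"] = xr.DataArray(np.full(met.shape, 230.0), coords=met.coords)
>>> air_temperature = MetVariable(short_name="t", standard_name="air_temperature")
>>> met.standardize_variables([air_temperature])
>>> "air_temperature" in met.data
True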
- to_vector(transfer_attrs=True)¶
Convert a MetDataset to a GeoVectorDataset by raveling data.
If data is lazy, it will be loaded.
- Parameters:
  - transfer_attrs (bool, optional) – Transfer attributes from data to the output GeoVectorDataset. By default True, meaning that attributes are transferred.
- Returns:
  - GeoVectorDataset – Converted GeoVectorDataset. The variables on the returned instance include all of those on the input instance, plus the four core spatiotemporal variables.
Examples
>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = "2022-03-01", "2022-03-01T01"
>>> variables = ["air_temperature", "specific_humidity"]
>>> levels = [250, 200]
>>> era5 = ERA5(time=times, variables=variables, pressure_levels=levels)
>>> met = era5.open_metdataset()
>>> met.to_vector(transfer_attrs=False)
GeoVectorDataset [6 keys x 4152960 length, 0 attributes]
    Keys: longitude, latitude, level, time, air_temperature, ..., specific_humidity
    Attributes:
    time                [2022-03-01 00:00:00, 2022-03-01 01:00:00]
    longitude           [-180.0, 179.75]
    latitude            [-90.0, 90.0]
    altitude            [10362.8, 11783.9]
- update(other=None, **kwargs)¶
Shortcut to data.update().
See xarray.Dataset.update() for reference.
- Parameters:
  - other (MutableMapping) – Variables with which to update this dataset
  - **kwargs (Any) – Variables defined by keyword arguments. If a variable exists both in other and as a keyword argument, the keyword argument takes precedence.

See also
- xarray.Dataset.update()
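For example, a sketch adding a variable by keyword (names and values are illustrative):

>>> import numpy as np
>>> import xarray as xr
>>> met = MetDataset.from_coords(
...     longitude=[0.0, 1.0],
...     latitude=[0.0, 1.0],
...     level=[250, 300],
...     time=np.datetime64("2022-03-01"),
... )
>>> met.update(air_temperature=xr.DataArray(np.full(met.shape, 230.0), coords=met.coords))
>>> "air_temperature" in met.data
True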
- wrap_longitude()¶
Wrap longitude coordinates.
- Returns:
  - Self – Copy of instance with wrapped longitude values. Returns a copy of the data when longitude values are already wrapped.
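A sketch on a skeleton dataset whose longitude grid stops at 179 degrees (grid values are illustrative):

>>> import numpy as np
>>> met = MetDataset.from_coords(
...     longitude=np.arange(-180.0, 180.0, 1.0),  # -180, ..., 179
...     latitude=[0.0, 1.0],
...     level=[250],
...     time=np.datetime64("2022-03-01"),
... )
>>> met.is_wrapped
False
>>> met.wrap_longitude().is_wrapped
True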