pycontrails.core.met

Meteorology data models.

Functions

downselect(data, bbox)

Downselect xr.Dataset or xr.DataArray with spatial bounding box.

maybe_downselect_mds(big_mds, little_mds, t0, t1)

Possibly downselect big_mds in the time domain to cover [t0, t1].

originates_from_ecmwf(met)

Check if data appears to have originated from an ECMWF source.

shift_longitude(data[, bound])

Shift longitude values from any input domain to [bound, 360 + bound) domain.

standardize_variables(ds, variables)

Rename all variables in dataset from short name to standard name.

Classes

MetBase()

Abstract class for building Meteorology Data handling classes.

MetDataArray(data[, cachestore, ...])

Meteorological DataArray of single variable.

MetDataset(data[, cachestore, ...])

Meteorological dataset with multiple variables.

class pycontrails.core.met.MetBase

Bases: ABC, Generic[XArrayType]

Abstract class for building Meteorology Data handling classes.

All support here should be generic to work on xr.DataArray and xr.Dataset.

property attrs

Pass through to self.data.attrs.

abstract broadcast_coords(name)

Broadcast coordinates along other dimensions.

Parameters:

name (str) – Coordinate/dimension name to broadcast. Can be a dimension or non-dimension coordinates.

Returns:

xarray.DataArray – DataArray of the coordinate broadcasted along all other dimensions. The DataArray will have the same shape as the gridded data.

cachestore

Cache datastore to use for save() or load()

property coords

Get coordinates of underlying data coordinates.

Only return non-dimension coordinates.

See: http://xarray.pydata.org/en/stable/user-guide/data-structures.html#coordinates

Returns:

dict[str, np.ndarray] – Dictionary of coordinates

copy()

Create a shallow copy of the current class.

See xarray.Dataset.copy() for reference.

Returns:

Self – Copy of the current class

data

DataArray or Dataset

dim_order = ('longitude', 'latitude', 'level', 'time')

Default dimension order for DataArray or Dataset (x, y, z, t)

downselect(bbox)

Downselect met data within spatial bounding box.

Parameters:

bbox (list[float]) – List of coordinates defining a spatial bounding box in WGS84 coordinates. For 2D queries, list is [west, south, east, north]. For 3D queries, list is [west, south, min-level, east, north, max-level] with level defined in [\(hPa\)].

Returns:

Self – Return downselected data

downselect_met(met, *, longitude_buffer=(0.0, 0.0), latitude_buffer=(0.0, 0.0), level_buffer=(0.0, 0.0), time_buffer=(np.timedelta64(0, 'h'), np.timedelta64(0, 'h')))

Downselect met to encompass a spatiotemporal region of the data.

Warning

This method is analogous to GeoVectorDataset.downselect_met(). It does not change the instance data, but instead operates on the met input. This method is different from downselect() which operates on the instance data.

Changed in version 0.54.5: Data is no longer copied when downselecting.

Parameters:
  • met (MetDataset | MetDataArray) – MetDataset or MetDataArray to downselect.

  • longitude_buffer (tuple[float, float], optional) – Extend longitude domain past by longitude_buffer[0] on the low side and longitude_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) degrees.

  • latitude_buffer (tuple[float, float], optional) – Extend latitude domain past by latitude_buffer[0] on the low side and latitude_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) degrees.

  • level_buffer (tuple[float, float], optional) – Extend level domain past by level_buffer[0] on the low side and level_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) [\(hPa\)].

  • time_buffer (tuple[np.timedelta64, np.timedelta64], optional) – Extend time domain past by time_buffer[0] on the low side and time_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (np.timedelta64(0, "h"), np.timedelta64(0, "h")).

Returns:

MetDataset | MetDataArray – Copy of downselected MetDataset or MetDataArray.

property hash

Generate a unique hash for this met instance.

Note this is not as robust as it could be since repr cuts off.

Returns:

str – Unique hash for met instance (sha1)

property indexes

Low level access to underlying data indexes.

This method is typically is faster for accessing coordinate indexes.

Added in version 0.25.2.

Returns:

dict[Hashable, pd.Index] – Dictionary of indexes.

Examples

>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = (datetime(2022, 3, 1, 12),  datetime(2022, 3, 1, 13))
>>> variables = "air_temperature", "specific_humidity"
>>> levels = [200, 300]
>>> era5 = ERA5(times, variables, levels)
>>> mds = era5.open_metdataset()
>>> mds.indexes["level"].to_numpy()
array([200., 300.])
>>> mda = mds["air_temperature"]
>>> mda.indexes["level"].to_numpy()
array([200., 300.])
property is_single_level

Check if instance contains “single level” or “surface level” data.

This method checks if level dimension contains a single value equal to -1, the pycontrails convention for surface only data.

Returns:

bool – If instance contains single level data.

property is_wrapped

Check if the longitude dimension covers the closed interval [-180, 180].

Assumes the longitude dimension is sorted (this is established by the MetDataset or MetDataArray constructor).

Returns:

bool – True if longitude coordinates cover [-180, 180]

property is_zarr

Check if underlying data is sourced from a Zarr group.

Implementation is very brittle, and may break as external libraries change.

Some dask intermediate artifact is cached when this is called. Typically, subsequent calls to this method are much faster than the initial call.

Added in version 0.26.0.

Returns:

bool – If data is based on a Zarr group.

abstract property shape

Return the shape of the dimensions.

Returns:

tuple[int, int, int, int] – Shape of underlying data

abstract property size

Return the size of (each) array in underlying data.

Returns:

int – Total number of grid points in underlying data

wrap_longitude()

Wrap longitude coordinates.

Returns:

Self – Copy of instance with wrapped longitude values. Returns copy of data when longitude values are already wrapped

class pycontrails.core.met.MetDataArray(data, cachestore=None, wrap_longitude=False, copy=True, name=None)

Bases: MetBase

Meteorological DataArray of single variable.

Wrapper around xarray.DataArray to enforce certain variables and dimensions for internal usage.

Changed in version 0.54.4: Remove validate parameter. Validation is now always performed.

Parameters:
  • data (ArrayLike) – xr.DataArray or other array-like data source. When array-like input is provided, input **kwargs passed directly to xr.DataArray constructor.

  • cachestore (CacheStore, optional) – Cache datastore for staging intermediates with save(). Defaults to DiskCacheStore.

  • wrap_longitude (bool, optional) – Wrap data along the longitude dimension. If True, duplicate and shift longitude values (ie, -180 -> 180) to ensure that the longitude dimension covers the entire interval [-180, 180]. Defaults to False.

  • copy (bool, optional) – Copy data parameter on construction, by default True. If data is lazy-loaded via dask, this parameter has no effect. If data is already loaded into memory, a copy of the data (rather than a view) may be created if True.

  • name (Hashable, optional) – Name of the data variable. If not specified, the name will be set to “met”.

Examples

>>> import numpy as np
>>> import xarray as xr
>>> rng = np.random.default_rng(seed=456)
>>> # Cook up random xarray object
>>> coords = {
...     "longitude": np.arange(-20, 20),
...     "latitude": np.arange(-30, 30),
...     "level": [220, 240, 260, 280],
...     "time": [np.datetime64("2021-08-01T12", "ns"), np.datetime64("2021-08-01T16", "ns")]
...     }
>>> da = xr.DataArray(rng.random((40, 60, 4, 2)), dims=coords.keys(), coords=coords)
>>> # Cast to `MetDataArray` in order to interpolate
>>> from pycontrails import MetDataArray
>>> mda = MetDataArray(da)
>>> mda.interpolate(-11.4, 5.7, 234, np.datetime64("2021-08-01T13"))
array([0.52358215])
>>> mda.interpolate(-11.4, 5.7, 234, np.datetime64("2021-08-01T13"), method='nearest')
array([0.4188465])
>>> da.sel(longitude=-11, latitude=6, level=240, time=np.datetime64("2021-08-01T12")).item()
0.41884649899766946
property attrs

Pass through to self.data.attrs.

property binary

Determine if all data is a binary value (0, 1).

Returns:

bool – True if all data values are binary value (0, 1)

broadcast_coords(name)

Broadcast coordinates along other dimensions.

Parameters:

name (str) – Coordinate/dimension name to broadcast. Can be a dimension or non-dimension coordinates.

Returns:

xarray.DataArray – DataArray of the coordinate broadcasted along all other dimensions. The DataArray will have the same shape as the gridded data.

cachestore

Cache datastore to use for save() or load()

property coords

Get coordinates of underlying data coordinates.

Only return non-dimension coordinates.

See: http://xarray.pydata.org/en/stable/user-guide/data-structures.html#coordinates

Returns:

dict[str, np.ndarray] – Dictionary of coordinates

copy()

Create a shallow copy of the current class.

See xarray.Dataset.copy() for reference.

Returns:

Self – Copy of the current class

data

DataArray or Dataset

dim_order = ('longitude', 'latitude', 'level', 'time')

Default dimension order for DataArray or Dataset (x, y, z, t)

downselect(bbox)

Downselect met data within spatial bounding box.

Parameters:

bbox (list[float]) – List of coordinates defining a spatial bounding box in WGS84 coordinates. For 2D queries, list is [west, south, east, north]. For 3D queries, list is [west, south, min-level, east, north, max-level] with level defined in [\(hPa\)].

Returns:

Self – Return downselected data

downselect_met(met, *, longitude_buffer=(0.0, 0.0), latitude_buffer=(0.0, 0.0), level_buffer=(0.0, 0.0), time_buffer=(np.timedelta64(0, 'h'), np.timedelta64(0, 'h')))

Downselect met to encompass a spatiotemporal region of the data.

Warning

This method is analogous to GeoVectorDataset.downselect_met(). It does not change the instance data, but instead operates on the met input. This method is different from downselect() which operates on the instance data.

Changed in version 0.54.5: Data is no longer copied when downselecting.

Parameters:
  • met (MetDataset | MetDataArray) – MetDataset or MetDataArray to downselect.

  • longitude_buffer (tuple[float, float], optional) – Extend longitude domain past by longitude_buffer[0] on the low side and longitude_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) degrees.

  • latitude_buffer (tuple[float, float], optional) – Extend latitude domain past by latitude_buffer[0] on the low side and latitude_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) degrees.

  • level_buffer (tuple[float, float], optional) – Extend level domain past by level_buffer[0] on the low side and level_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) [\(hPa\)].

  • time_buffer (tuple[np.timedelta64, np.timedelta64], optional) – Extend time domain past by time_buffer[0] on the low side and time_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (np.timedelta64(0, "h"), np.timedelta64(0, "h")).

Returns:

MetDataset | MetDataArray – Copy of downselected MetDataset or MetDataArray.

find_edges()

Find edges of regions.

Returns:

Self – MetDataArray with a binary field, 1 on the edge of the regions, 0 outside and inside the regions.

Raises:

NotImplementedError – If the instance is not binary.

property hash

Generate a unique hash for this met instance.

Note this is not as robust as it could be since repr cuts off.

Returns:

str – Unique hash for met instance (sha1)

property in_memory

Check if underlying data is loaded into memory.

This method uses protected attributes of underlying xarray objects, and may be subject to deprecation.

Changed in version 0.26.0: Rename from is_loaded to in_memory.

Returns:

bool – If underlying data exists as an np.ndarray in memory.

property indexes

Low level access to underlying data indexes.

This method is typically is faster for accessing coordinate indexes.

Added in version 0.25.2.

Returns:

dict[Hashable, pd.Index] – Dictionary of indexes.

Examples

>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = (datetime(2022, 3, 1, 12),  datetime(2022, 3, 1, 13))
>>> variables = "air_temperature", "specific_humidity"
>>> levels = [200, 300]
>>> era5 = ERA5(times, variables, levels)
>>> mds = era5.open_metdataset()
>>> mds.indexes["level"].to_numpy()
array([200., 300.])
>>> mda = mds["air_temperature"]
>>> mda.indexes["level"].to_numpy()
array([200., 300.])
interpolate(longitude, latitude, level, time, *, method='linear', bounds_error=False, fill_value=nan, localize=False, lowmem=False, indices=None, return_indices=False)

Interpolate values over underlying DataArray.

Zero dimensional coordinates are reshaped to 1D arrays.

If lowmem == False, method automatically loads underlying data into memory. Otherwise, method iterates through smaller subsets of data and releases subsets from memory once interpolation against each subset is finished.

If method == "nearest", the out array will have the same dtype as the underlying data.

If method == "linear", the out array will be promoted to the most precise dtype of:

  • underlying data

  • data.longitude

  • data.latitude

  • data.level

  • longitude

  • latitude

Added in version 0.24: This method can now handle singleton dimensions with method == "linear". Previously these degenerate dimensions caused nan values to be returned.

Parameters:
  • longitude (float | npt.NDArray[np.floating]) – Longitude values to interpolate. Assumed to be 0 or 1 dimensional.

  • latitude (float | npt.NDArray[np.floating]) – Latitude values to interpolate. Assumed to be 0 or 1 dimensional.

  • level (float | npt.NDArray[np.floating]) – Level values to interpolate. Assumed to be 0 or 1 dimensional.

  • time (np.datetime64 | npt.NDArray[np.datetime64]) – Time values to interpolate. Assumed to be 0 or 1 dimensional.

  • method (str, optional) – Additional keyword arguments to pass to scipy.interpolate.RegularGridInterpolator. Defaults to “linear”.

  • bounds_error (bool, optional) – Additional keyword arguments to pass to scipy.interpolate.RegularGridInterpolator. Defaults to False.

  • fill_value (float | np.float64, optional) – Additional keyword arguments to pass to scipy.interpolate.RegularGridInterpolator. Set to None to extrapolate outside the boundary when method is nearest. Defaults to np.nan.

  • localize (bool, optional) – Experimental. If True, downselect gridded data to smallest bounding box containing all points. By default False.

  • lowmem (bool, optional) – Experimental. If True, iterate through points binned by the time coordinate of the grided data, and downselect gridded data to the smallest bounding box containing each binned set of point before loading into memory. This can significantly reduce memory consumption with large numbers of points at the cost of increased runtime. By default False.

  • indices (tuple | None, optional) – Experimental. See interpolation.interp(). None by default.

  • return_indices (bool, optional) – Experimental. See interpolation.interp(). False by default. Note that values returned differ when lowmem=True and lowmem=False, so output should only be re-used in calls with the same lowmem value.

Returns:

numpy.ndarray – Interpolated values

See also

GeoVectorDataset.intersect_met()

Examples

>>> from datetime import datetime
>>> import numpy as np
>>> import pandas as pd
>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = (datetime(2022, 3, 1, 12),  datetime(2022, 3, 1, 15))
>>> variables = "air_temperature"
>>> levels = [200, 250, 300]
>>> era5 = ERA5(times, variables, levels)
>>> met = era5.open_metdataset()
>>> mda = met["air_temperature"]
>>> # Interpolation at a grid point agrees with value
>>> mda.interpolate(1, 2, 300, np.datetime64('2022-03-01T14:00'))
array([241.91972984])
>>> da = mda.data
>>> da.sel(longitude=1, latitude=2, level=300, time=np.datetime64('2022-03-01T14')).item()
241.9197298421629
>>> # Interpolation off grid
>>> mda.interpolate(1.1, 2.1, 290, np.datetime64('2022-03-01 13:10'))
array([239.83793798])
>>> # Interpolate along path
>>> longitude = np.linspace(1, 2, 10)
>>> latitude = np.linspace(2, 3, 10)
>>> level = np.linspace(200, 300, 10)
>>> time = pd.date_range("2022-03-01T14", periods=10, freq="5min")
>>> mda.interpolate(longitude, latitude, level, time)
array([220.44347694, 223.08900738, 225.74338924, 228.41642088,
       231.10858599, 233.54857391, 235.71504913, 237.86478872,
       239.99274623, 242.10792167])
>>> # Can easily switch to alternative low-memory implementation
>>> mda.interpolate(longitude, latitude, level, time, lowmem=True)
array([220.44347694, 223.08900738, 225.74338924, 228.41642088,
       231.10858599, 233.54857391, 235.71504913, 237.86478872,
       239.99274623, 242.10792167])
property is_single_level

Check if instance contains “single level” or “surface level” data.

This method checks if level dimension contains a single value equal to -1, the pycontrails convention for surface only data.

Returns:

bool – If instance contains single level data.

property is_wrapped

Check if the longitude dimension covers the closed interval [-180, 180].

Assumes the longitude dimension is sorted (this is established by the MetDataset or MetDataArray constructor).

Returns:

bool – True if longitude coordinates cover [-180, 180]

property is_zarr

Check if underlying data is sourced from a Zarr group.

Implementation is very brittle, and may break as external libraries change.

Some dask intermediate artifact is cached when this is called. Typically, subsequent calls to this method are much faster than the initial call.

Added in version 0.26.0.

Returns:

bool – If data is based on a Zarr group.

classmethod load(hash, cachestore=None, chunks=None)

Load saved intermediate from cachestore.

Parameters:
  • hash (str) – Saved hash to load.

  • cachestore (CacheStore, optional) – Cache datastore to use for sourcing files. Defaults to DiskCacheStore.

  • chunks (dict[str, int], optional) – Chunks kwarg passed to xarray.open_mfdataset() when opening files.

Returns:

MetDataArray – New MetDataArray with loaded data.

property name

Return the DataArray name.

Returns:

Hashable – DataArray name

property proportion

Compute proportion of points with value 1.

Returns:

float – Proportion of points with value 1

Raises:

NotImplementedError – If instance does not contain binary data.

save(**kwargs)

Save intermediate to cachestore as netcdf.

Load and restore using load().

Parameters:

**kwargs (Any) – Keyword arguments passed directly to xarray.save_mfdataset()

Returns:

list[str] – Returns filenames of saved files

property shape

Return the shape of the dimensions.

Returns:

tuple[int, int, int, int] – Shape of underlying data

property size

Return the size of (each) array in underlying data.

Returns:

int – Total number of grid points in underlying data

to_polygon_feature(level=None, time=None, fill_value=nan, iso_value=None, min_area=0.0, epsilon=0.0, lower_bound=True, precision=None, interiors=True, convex_hull=False, include_altitude=False, properties=None)

Create GeoJSON Feature artifact from spatial array on a single level and time slice.

Computed polygons always contain an exterior linear ring as defined by the GeoJSON Polygon specification <https://www.rfc-editor.org/rfc/rfc7946.html#section-3.1.6>. Polygons may also contain interior linear rings (holes). This method does not support nesting beyond the GeoJSON specification. See the pycontrails.core.polygon for additional polygon support.

Changed in version 0.25.12: Previous implementation include several additional parameters which have been removed:

New implementation includes new parameters previously lacking:

  • fill_value

  • min_area

  • include_altitude

Changed in version 0.38.0: Change default value of epsilon from 0.15 to 0.

Changed in version 0.41.0: Convert continuous fields to binary fields before computing polygons. The parameters max_area and epsilon are now expressed in terms of longitude/latitude units instead of pixels.

Parameters:
  • level (float, optional) – Level slice to create polygons. If the “level” coordinate is length 1, then the single level slice will be selected automatically.

  • time (datetime, optional) – Time slice to create polygons. If the “time” coordinate is length 1, then the single time slice will be selected automatically.

  • fill_value (float, optional) – Value used for filling missing data and for padding the underlying data array. Set to np.nan by default, which ensures that regions with missing data are never included in polygons.

  • iso_value (float, optional) – Value in field to create iso-surface. Defaults to the average of the min and max value of the array. (This is the same convention as used by skimage.)

  • min_area (float, optional) – Minimum area of each polygon. Polygons with area less than min_area are not included in the output. The unit of this parameter is in longitude/latitude degrees squared. Set to 0 to omit any polygon filtering based on a minimal area conditional. By default, 0.0.

  • epsilon (float, optional) – Control the extent to which the polygon is simplified. A value of 0 does not alter the geometry of the polygon. The unit of this parameter is in longitude/latitude degrees. By default, 0.0.

  • lower_bound (bool, optional) – Whether to use iso_value as a lower or upper bound on values in polygon interiors. By default, True.

  • precision (int, optional) – Number of decimal places to round coordinates to. If None, no rounding is performed.

  • interiors (bool, optional) – If True, include interior linear rings (holes) in the output. True by default.

  • convex_hull (bool, optional) – EXPERIMENTAL. If True, compute the convex hull of each polygon. Only implemented for depth=1. False by default. A warning is issued if the underlying algorithm fails to make valid polygons after computing the convex hull.

  • include_altitude (bool, optional) – If True, include the array altitude [\(m\)] as a z-coordinate in the GeoJSON output <https://www.rfc-editor.org/rfc/rfc7946#section-3.1.1>. False by default.

  • properties (dict, optional) – Additional properties to include in the GeoJSON output. By default, None.

Returns:

dict[str, Any] – Python representation of GeoJSON Feature with MultiPolygon geometry.

Notes

Cocip and CocipGrid set some quantities to 0 and other quantities to np.nan in regions where no contrails form. When computing polygons from Cocip or CocipGrid output, take care that the choice of fill_value correctly includes or excludes contrail-free regions. See the Cocip documentation for details about np.nan in model output.

See also

to_polyhedra(), polygons.find_multipolygons()

Examples

>>> from pprint import pprint
>>> from pycontrails.datalib.ecmwf import ERA5
>>> era5 = ERA5("2022-03-01", variables="air_temperature", pressure_levels=250)
>>> mda = era5.open_metdataset()["air_temperature"]
>>> mda.shape
(1440, 721, 1, 1)
>>> pprint(mda.to_polygon_feature(iso_value=239.5, precision=2, epsilon=0.1))
{'geometry': {'coordinates': [[[[167.88, -22.5],
                                [167.75, -22.38],
                                [167.62, -22.5],
                                [167.75, -22.62],
                                [167.88, -22.5]]],
                              [[[43.38, -33.5],
                                [43.5, -34.12],
                                [43.62, -33.5],
                                [43.5, -33.38],
                                [43.38, -33.5]]]],
              'type': 'MultiPolygon'},
 'properties': {},
 'type': 'Feature'}
to_polygon_feature_collection(time=None, fill_value=nan, iso_value=None, min_area=0.0, epsilon=0.0, lower_bound=True, precision=None, interiors=True, convex_hull=False, include_altitude=False, properties=None)

Create GeoJSON FeatureCollection artifact from spatial array at time slice.

See the to_polygon_feature() method for a description of the parameters.

Returns:

dict[str, Any] – Python representation of GeoJSON FeatureCollection. This dictionary is comprised of individual GeoJON Features, one per self.data["level"].

to_polyhedra(*, time=None, iso_value=0.0, simplify_fraction=1.0, lower_bound=True, return_type='geojson', path=None, altitude_scale=1.0, output_vertex_normals=False, closed=True)

Create a collection of polyhedra from spatial array corresponding to a single time slice.

Parameters:
  • time (datetime, optional) – Time slice to create mesh. If the “time” coordinate is length 1, then the single time slice will be selected automatically.

  • iso_value (float, optional) – Value in field to create iso-surface. Defaults to 0.

  • simplify_fraction (float, optional) – Apply open3d simplify_quadric_decimation method to simplify the polyhedra geometry. This parameter must be in the half-open interval (0.0, 1.0]. Defaults to 1.0, corresponding to no reduction.

  • lower_bound (bool, optional) – Whether to use iso_value as a lower or upper bound on values in polyhedra interiors. By default, True.

  • return_type (str, optional) – Must be one of “geojson” or “mesh”. Defaults to “geojson”. If “geojson”, this method returns a dictionary representation of a geojson MultiPolygon object whose polygons are polyhedra faces. If “mesh”, this method returns an open3d TriangleMesh instance.

  • path (str, optional) – Output geojson or mesh to file. If return_type is “mesh”, see Open3D File I/O for Mesh for file type options.

  • altitude_scale (float, optional) – Rescale the altitude dimension of the mesh, [\(m\)]

  • output_vertex_normals (bool, optional) – If path is defined, write out vertex normals. Defaults to False.

  • closed (bool, optional) – If True, pad spatial array along all axes to ensure polyhedra are “closed”. This flag often gives rise to cleaner visualizations. Defaults to True.

Returns:

dict | open3d.geometry.TriangleMesh – Python representation of geojson object or Open3D Triangle Mesh depending on the return_type parameter.

Raises:

Notes

Uses the scikit-image Marching Cubes algorithm to reconstruct a surface from the point-cloud like arrays.

property values

Return underlying numpy array.

This methods loads data if it is not already in memory.

Returns:

numpy.ndarray – Underlying numpy array

wrap_longitude()

Wrap longitude coordinates.

Returns:

Self – Copy of instance with wrapped longitude values. Returns copy of data when longitude values are already wrapped

class pycontrails.core.met.MetDataset(data, cachestore=None, wrap_longitude=False, copy=True, attrs=None, **attrs_kwargs)

Bases: MetBase

Meteorological dataset with multiple variables.

Composition around xr.Dataset to enforce certain variables and dimensions for internal usage

Parameters:
  • data (xarray.Dataset) – xarray.Dataset containing meteorological variables and coordinates

  • cachestore (CacheStore, optional) – Cache datastore for staging intermediates with save(). Defaults to None.

  • wrap_longitude (bool, optional) – Wrap data along the longitude dimension. If True, duplicate and shift longitude values (ie, -180 -> 180) to ensure that the longitude dimension covers the entire interval [-180, 180]. Defaults to False.

  • copy (bool, optional) – Copy data on construction. Defaults to True.

  • attrs (dict[str, Any], optional) – Attributes to add to data.attrs. Defaults to None. Generally, pycontrails Models may use the following attributes:

    • provider: Name of the data provider (e.g. “ECMWF”).

    • dataset: Name of the dataset (e.g. “ERA5”).

    • product: Name of the product type (e.g. “reanalysis”).

  • **attrs_kwargs (Any) – Keyword arguments to add to data.attrs. Defaults to None.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from pycontrails.datalib.ecmwf import ERA5
>>> time = ("2022-03-01T00", "2022-03-01T02")
>>> variables = ["air_temperature", "specific_humidity"]
>>> pressure_levels = [200, 250, 300]
>>> era5 = ERA5(time, variables, pressure_levels)
>>> # Open directly as `MetDataset`
>>> met = era5.open_metdataset()
>>> # Use `data` attribute to access `xarray` object
>>> assert isinstance(met.data, xr.Dataset)
>>> # Alternatively, open with `xarray` and cast to `MetDataset`
>>> ds = xr.open_mfdataset(era5._cachepaths)
>>> met = MetDataset(ds)
>>> # Access sub-`DataArrays`
>>> mda = met["t"]  # `MetDataArray` instance, needed for interpolation operations
>>> da = mda.data  # Underlying `xarray` object
>>> # Check out a few values
>>> da[5:8, 5:8, 1, 1].values
array([[224.08959005, 224.41374427, 224.75945349],
       [224.09456429, 224.42037658, 224.76525676],
       [224.10036756, 224.42617985, 224.77106004]])
>>> # Mean temperature over entire array
>>> da.mean().load().item()
223.5083
property attrs

Pass through to self.data.attrs.

broadcast_coords(name)

Broadcast coordinates along other dimensions.

Parameters:

name (str) – Coordinate/dimension name to broadcast. Can be a dimension or non-dimension coordinates.

Returns:

xarray.DataArray – DataArray of the coordinate broadcasted along all other dimensions. The DataArray will have the same shape as the gridded data.

cachestore

Cache datastore to use for save() or load()

property coords

Get coordinates of underlying data coordinates.

Only return non-dimension coordinates.

See: http://xarray.pydata.org/en/stable/user-guide/data-structures.html#coordinates

Returns:

dict[str, np.ndarray] – Dictionary of coordinates

copy()

Create a shallow copy of the current class.

See xarray.Dataset.copy() for reference.

Returns:

Self – Copy of the current class

data

DataArray or Dataset

property dataset_attr

Look up the ‘dataset’ attribute with a custom error message.

Returns:

str – Dataset of the data. If not one of ‘ERA5’, ‘HRES’, ‘IFS’, or ‘GFS’, a warning is issued.

dim_order = ('longitude', 'latitude', 'level', 'time')

Default dimension order for DataArray or Dataset (x, y, z, t)

downselect(bbox)

Downselect met data within spatial bounding box.

Parameters:

bbox (list[float]) – List of coordinates defining a spatial bounding box in WGS84 coordinates. For 2D queries, list is [west, south, east, north]. For 3D queries, list is [west, south, min-level, east, north, max-level] with level defined in [\(hPa\)].

Returns:

Self – Return downselected data

downselect_met(met, *, longitude_buffer=(0.0, 0.0), latitude_buffer=(0.0, 0.0), level_buffer=(0.0, 0.0), time_buffer=(np.timedelta64(0, 'h'), np.timedelta64(0, 'h')))

Downselect met to encompass a spatiotemporal region of the data.

Warning

This method is analogous to GeoVectorDataset.downselect_met(). It does not change the instance data, but instead operates on the met input. This method is different from downselect() which operates on the instance data.

Changed in version 0.54.5: Data is no longer copied when downselecting.

Parameters:
  • met (MetDataset | MetDataArray) – MetDataset or MetDataArray to downselect.

  • longitude_buffer (tuple[float, float], optional) – Extend longitude domain past by longitude_buffer[0] on the low side and longitude_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) degrees.

  • latitude_buffer (tuple[float, float], optional) – Extend latitude domain past by latitude_buffer[0] on the low side and latitude_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) degrees.

  • level_buffer (tuple[float, float], optional) – Extend level domain past by level_buffer[0] on the low side and level_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) [\(hPa\)].

  • time_buffer (tuple[np.timedelta64, np.timedelta64], optional) – Extend time domain past by time_buffer[0] on the low side and time_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (np.timedelta64(0, "h"), np.timedelta64(0, "h")).

Returns:

MetDataset | MetDataArray – Copy of downselected MetDataset or MetDataArray.

ensure_vars(vars, raise_error=True)

Ensure variables exist in xr.Dataset.

Parameters:
  • vars (MetVariable | str | Sequence[MetVariable | str | list[MetVariable]]) – List of MetVariable (or string key), or individual MetVariable (or string key). If vars contains an element with a list[MetVariable], then only one variable in the list must be present in dataset.

  • raise_error (bool, optional) – Raise KeyError if data does not contain variables. Defaults to True.

Returns:

list[str] – List of met keys verified in MetDataset. Returns an empty list if any MetVariable is missing.

Raises:

KeyError – Raises when dataset does not contain variable in vars

classmethod from_coords(longitude, latitude, level, time)

Create a MetDataset containing a coordinate skeleton from coordinate arrays.

Parameters:
  • longitude, latitude (npt.ArrayLike | float) – Horizontal coordinates, in [\(\deg\)]

  • level (npt.ArrayLike | float) – Vertical coordinate, in [\(hPa\)]

  • time (npt.ArrayLike | np.datetime64,) – Temporal coordinates, in [\(UTC\)]. Will be sorted.

Returns:

Self – MetDataset with no variables.

Examples

>>> # Create skeleton MetDataset
>>> longitude = np.arange(0, 10, 0.5)
>>> latitude = np.arange(0, 10, 0.5)
>>> level = [250, 300]
>>> time = np.datetime64("2019-01-01")
>>> met = MetDataset.from_coords(longitude, latitude, level, time)
>>> met
MetDataset with data:
<xarray.Dataset> Size: 360B
Dimensions:       (longitude: 20, latitude: 20, level: 2, time: 1)
Coordinates:
  * longitude     (longitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * latitude      (latitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * level         (level) float64 16B 250.0 300.0
  * time          (time) datetime64[ns] 8B 2019-01-01
    air_pressure  (level) float32 8B 2.5e+04 3e+04
    altitude      (level) float32 8B 1.036e+04 9.164e+03
Data variables:
    *empty*
>>> met.shape
(20, 20, 2, 1)
>>> met.size
800
>>> # Fill it up with some constant data
>>> met["temperature"] = xr.DataArray(np.full(met.shape, 234.5), coords=met.coords)
>>> met["humidity"] = xr.DataArray(np.full(met.shape, 0.5), coords=met.coords)
>>> met
MetDataset with data:
<xarray.Dataset> Size: 13kB
Dimensions:       (longitude: 20, latitude: 20, level: 2, time: 1)
Coordinates:
  * longitude     (longitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * latitude      (latitude) float64 160B 0.0 0.5 1.0 1.5 ... 8.0 8.5 9.0 9.5
  * level         (level) float64 16B 250.0 300.0
  * time          (time) datetime64[ns] 8B 2019-01-01
    air_pressure  (level) float32 8B 2.5e+04 3e+04
    altitude      (level) float32 8B 1.036e+04 9.164e+03
Data variables:
    temperature   (longitude, latitude, level, time) float64 6kB 234.5 ... 234.5
    humidity      (longitude, latitude, level, time) float64 6kB 0.5 0.5 ... 0.5
>>> # Convert to a GeoVectorDataset
>>> vector = met.to_vector()
>>> vector.dataframe.head()
longitude  latitude  level       time  temperature  humidity
0        0.0       0.0  250.0 2019-01-01        234.5       0.5
1        0.0       0.0  300.0 2019-01-01        234.5       0.5
2        0.0       0.5  250.0 2019-01-01        234.5       0.5
3        0.0       0.5  300.0 2019-01-01        234.5       0.5
4        0.0       1.0  250.0 2019-01-01        234.5       0.5
classmethod from_zarr(store, **kwargs)

Create a MetDataset from a path to a Zarr store.

Parameters:
Returns:

Self – MetDataset with data from Zarr store.

get(key, default_value=None)

Shortcut to data.get(k, v)() method.

Parameters:
  • key (str) – Key to get from data

  • default_value (Any, optional) – Return default_value if key not in data, by default None

Returns:

Any – Values returned from data.get(key, default_value)

property hash

Generate a unique hash for this met instance.

Note this is not as robust as it could be since repr cuts off.

Returns:

str – Unique hash for met instance (sha1)

property indexes

Low level access to underlying data indexes.

This method is typically is faster for accessing coordinate indexes.

Added in version 0.25.2.

Returns:

dict[Hashable, pd.Index] – Dictionary of indexes.

Examples

>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = (datetime(2022, 3, 1, 12),  datetime(2022, 3, 1, 13))
>>> variables = "air_temperature", "specific_humidity"
>>> levels = [200, 300]
>>> era5 = ERA5(times, variables, levels)
>>> mds = era5.open_metdataset()
>>> mds.indexes["level"].to_numpy()
array([200., 300.])
>>> mda = mds["air_temperature"]
>>> mda.indexes["level"].to_numpy()
array([200., 300.])
property is_single_level

Check if instance contains “single level” or “surface level” data.

This method checks if level dimension contains a single value equal to -1, the pycontrails convention for surface only data.

Returns:

bool – If instance contains single level data.

property is_wrapped

Check if the longitude dimension covers the closed interval [-180, 180].

Assumes the longitude dimension is sorted (this is established by the MetDataset or MetDataArray constructor).

Returns:

bool – True if longitude coordinates cover [-180, 180]

property is_zarr

Check if underlying data is sourced from a Zarr group.

Implementation is very brittle, and may break as external libraries change.

Some dask intermediate artifact is cached when this is called. Typically, subsequent calls to this method are much faster than the initial call.

Added in version 0.26.0.

Returns:

bool – If data is based on a Zarr group.

classmethod load(hash, cachestore=None, chunks=None)

Load saved intermediate from cachestore.

Parameters:
  • hash (str) – Saved hash to load.

  • cachestore (CacheStore, optional) – Cache datastore to use for sourcing files. Defaults to DiskCacheStore.

  • chunks (dict[str: int], optional) – Chunks kwarg passed to xarray.open_mfdataset() when opening files.

Returns:

Self – New MetDataArray with loaded data.

property product_attr

Look up the ‘product’ attribute with a custom error message.

Returns:

str – Product of the data. If not one of ‘forecast’, ‘ensemble’, or ‘reanalysis’, a warning is issued.

property provider_attr

Look up the ‘provider’ attribute with a custom error message.

Returns:

str – Provider of the data. If not one of ‘ECMWF’ or ‘NCEP’, a warning is issued.

save(**kwargs)

Save intermediate to cachestore as netcdf.

Load and restore using load().

Parameters:

**kwargs (Any) – Keyword arguments passed directly to xarray.Dataset.to_netcdf()

Returns:

list[str] – Returns filenames saved

property shape

Return the shape of the dimensions.

Returns:

tuple[int, int, int, int] – Shape of underlying data

property size

Return the size of (each) array in underlying data.

Returns:

int – Total number of grid points in underlying data

standardize_variables(variables)

Standardize variables in-place.

Parameters:

variables (Iterable[MetVariable]) – Data source variables

to_vector(transfer_attrs=True)

Convert a MetDataset to a GeoVectorDataset by raveling data.

If data is lazy, it will be loaded.

Parameters:

transfer_attrs (bool, optional) – Transfer attributes from data to output GeoVectorDataset. By default, True, meaning that attributes are transferred.

Returns:

GeoVectorDataset – Converted GeoVectorDataset. The variables on the returned instance include all of those on the input instance, plus the four core spatial temporal variables.

Examples

>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = "2022-03-01",  "2022-03-01T01"
>>> variables = ["air_temperature", "specific_humidity"]
>>> levels = [250, 200]
>>> era5 = ERA5(time=times, variables=variables, pressure_levels=levels)
>>> met = era5.open_metdataset()
>>> met.to_vector(transfer_attrs=False)
GeoVectorDataset [6 keys x 4152960 length, 0 attributes]
    Keys: longitude, latitude, level, time, air_temperature, ..., specific_humidity
    Attributes:
    time                [2022-03-01 00:00:00, 2022-03-01 01:00:00]
    longitude           [-180.0, 179.75]
    latitude            [-90.0, 90.0]
    altitude            [10362.8, 11783.9]
update(other=None, **kwargs)

Shortcut to data.update().

See xarray.Dataset.update() for reference.

Parameters:
  • other (MutableMapping) – Variables with which to update this dataset

  • **kwargs (Any) – Variables defined by keyword arguments. If a variable exists both in other and as a keyword argument, the keyword argument takes precedence.

See also

-

meth:xarray.Dataset.update

wrap_longitude()

Wrap longitude coordinates.

Returns:

Self – Copy of instance with wrapped longitude values. Returns copy of data when longitude values are already wrapped

pycontrails.core.met.downselect(data, bbox)

Downselect xr.Dataset or xr.DataArray with spatial bounding box.

Parameters:
  • data (XArrayType) – xr.Dataset or xr.DataArray to downselect

  • bbox (tuple[float, ]) – Tuple of coordinates defining a spatial bounding box in WGS84 coordinates.

    • For 2D queries, bbox takes the form (west, south, east, north)

    • For 3D queries, bbox takes the form (west, south, min-level, east, north, max-level)

    with level defined in [\(hPa\)].

Returns:

XArrayType – Downselected xr.Dataset or xr.DataArray

Raises:

ValueError – If parameter bbox has wrong length.

pycontrails.core.met.maybe_downselect_mds(big_mds, little_mds, t0, t1)

Possibly downselect big_mds in the time domain to cover [t0, t1].

If possible, little_mds is recycled to avoid re-loading data.

This implementation assumes t0 <= t1, but this is not enforced.

If little_mds already covers the time range, it is returned as-is.

If big_mds doesn’t cover the time range, no error is raised.

Parameters:
  • big_mds (MetDataset) – Larger MetDataset

  • little_mds (MetDataset | None) – Smaller MetDataset. This is assumed to be a subset of big_mds, though the implementation may work if this is not the case.

  • t0, t1 (numpy.datetime64) – Time range to cover

Returns:

MetDataset – MetDataset covering the time range [t0, t1] comprised of data from little_mds when possible, otherwise from big_mds.

pycontrails.core.met.originates_from_ecmwf(met)

Check if data appears to have originated from an ECMWF source.

Added in version 0.27.0: Experimental. Implementation is brittle.

Parameters:

met (MetDataset | MetDataArray) – Dataset or array to inspect.

Returns:

bool – True if data appears to be derived from an ECMWF source.

See also

-

class:ERA5

-

class:HRES

pycontrails.core.met.shift_longitude(data, bound=-180.0)

Shift longitude values from any input domain to [bound, 360 + bound) domain.

Sorts data by ascending longitude values.

Parameters:
  • data (XArrayType) – xr.Dataset or xr.DataArray with longitude dimension

  • bound (float, optional) – Lower bound of the domain. Output domain will be [bound, 360 + bound). Defaults to -180, which results in longitude domain [-180, 180).

Returns:

XArrayTypexr.Dataset or xr.DataArray with longitude values on [a, 360 + a).

pycontrails.core.met.standardize_variables(ds, variables)

Rename all variables in dataset from short name to standard name.

This function does not change any variables in ds that are not found in variables.

When there are multiple variables with the same short name, the last one is used.

Parameters:
  • ds (DatasetType) – An xr.Dataset or MetDataset. When a MetDataset is passed, the underlying xr.Dataset is modified in place.

  • variables (Iterable[MetVariable]) – Data source variables

Returns:

DatasetType – Dataset with variables renamed to standard names