pycontrails.MetDataArray

class pycontrails.MetDataArray(data, cachestore=None, wrap_longitude=False, copy=True, name=None)

Bases: MetBase

Meteorological DataArray of single variable.

Wrapper around xarray.DataArray to enforce certain variables and dimensions for internal usage.

Changed in version 0.54.4: Remove validate parameter. Validation is now always performed.

Parameters:
  • data (ArrayLike) – xr.DataArray or other array-like data source. When array-like input is provided, input **kwargs passed directly to xr.DataArray constructor.

  • cachestore (CacheStore, optional) – Cache datastore for staging intermediates with save(). Defaults to DiskCacheStore.

  • wrap_longitude (bool, optional) – Wrap data along the longitude dimension. If True, duplicate and shift longitude values (ie, -180 -> 180) to ensure that the longitude dimension covers the entire interval [-180, 180]. Defaults to False.

  • copy (bool, optional) – Copy data parameter on construction, by default True. If data is lazy-loaded via dask, this parameter has no effect. If data is already loaded into memory, a copy of the data (rather than a view) may be created if True.

  • name (Hashable, optional) – Name of the data variable. If not specified, the name will be set to “met”.

Examples

>>> import numpy as np
>>> import xarray as xr
>>> rng = np.random.default_rng(seed=456)
>>> # Cook up random xarray object
>>> coords = {
...     "longitude": np.arange(-20, 20),
...     "latitude": np.arange(-30, 30),
...     "level": [220, 240, 260, 280],
...     "time": [np.datetime64("2021-08-01T12", "ns"), np.datetime64("2021-08-01T16", "ns")]
...     }
>>> da = xr.DataArray(rng.random((40, 60, 4, 2)), dims=coords.keys(), coords=coords)
>>> # Cast to `MetDataArray` in order to interpolate
>>> from pycontrails import MetDataArray
>>> mda = MetDataArray(da)
>>> mda.interpolate(-11.4, 5.7, 234, np.datetime64("2021-08-01T13"))
array([0.52358215])
>>> mda.interpolate(-11.4, 5.7, 234, np.datetime64("2021-08-01T13"), method='nearest')
array([0.4188465])
>>> da.sel(longitude=-11, latitude=6, level=240, time=np.datetime64("2021-08-01T12")).item()
0.41884649899766946
__init__(data, cachestore=None, wrap_longitude=False, copy=True, name=None)

Methods

__init__(data[, cachestore, wrap_longitude, ...])

broadcast_coords(name)

Broadcast coordinates along other dimensions.

copy()

Create a shallow copy of the current class.

downselect(bbox)

Downselect met data within spatial bounding box.

downselect_met(met, *[, longitude_buffer, ...])

Downselect met to encompass a spatiotemporal region of the data.

find_edges()

Find edges of regions.

interpolate(longitude, latitude, level, time, *)

Interpolate values over underlying DataArray.

load(hash[, cachestore, chunks])

Load saved intermediate from cachestore.

save(**kwargs)

Save intermediate to cachestore as netcdf.

to_polygon_feature([level, time, ...])

Create GeoJSON Feature artifact from spatial array on a single level and time slice.

to_polygon_feature_collection([time, ...])

Create GeoJSON FeatureCollection artifact from spatial array at time slice.

to_polyhedra(*[, time, iso_value, ...])

Create a collection of polyhedra from spatial array corresponding to a single time slice.

wrap_longitude()

Wrap longitude coordinates.

Attributes

attrs

Pass through to self.data.attrs.

binary

Determine if all data is a binary value (0, 1).

cachestore

Cache datastore to use for save() or load()

coords

Get coordinates of underlying data coordinates.

data

DataArray or Dataset

dim_order

Default dimension order for DataArray or Dataset (x, y, z, t)

hash

Generate a unique hash for this met instance.

in_memory

Check if underlying data is loaded into memory.

indexes

Low level access to underlying data indexes.

is_single_level

Check if instance contains "single level" or "surface level" data.

is_wrapped

Check if the longitude dimension covers the closed interval [-180, 180].

is_zarr

Check if underlying data is sourced from a Zarr group.

name

Return the DataArray name.

proportion

Compute proportion of points with value 1.

shape

Return the shape of the dimensions.

size

Return the size of (each) array in underlying data.

values

Return underlying numpy array.

property attrs

Pass through to self.data.attrs.

property binary

Determine if all data is a binary value (0, 1).

Returns:

bool – True if all data values are binary value (0, 1)

broadcast_coords(name)

Broadcast coordinates along other dimensions.

Parameters:

name (str) – Coordinate/dimension name to broadcast. Can be a dimension or non-dimension coordinates.

Returns:

xarray.DataArray – DataArray of the coordinate broadcasted along all other dimensions. The DataArray will have the same shape as the gridded data.

cachestore

Cache datastore to use for save() or load()

property coords

Get coordinates of underlying data coordinates.

Only return non-dimension coordinates.

See: http://xarray.pydata.org/en/stable/user-guide/data-structures.html#coordinates

Returns:

dict[str, np.ndarray] – Dictionary of coordinates

copy()

Create a shallow copy of the current class.

See xarray.Dataset.copy() for reference.

Returns:

Self – Copy of the current class

data

DataArray or Dataset

dim_order = ('longitude', 'latitude', 'level', 'time')

Default dimension order for DataArray or Dataset (x, y, z, t)

downselect(bbox)

Downselect met data within spatial bounding box.

Parameters:

bbox (list[float]) – List of coordinates defining a spatial bounding box in WGS84 coordinates. For 2D queries, list is [west, south, east, north]. For 3D queries, list is [west, south, min-level, east, north, max-level] with level defined in [\(hPa\)].

Returns:

Self – Return downselected data

downselect_met(met, *, longitude_buffer=(0.0, 0.0), latitude_buffer=(0.0, 0.0), level_buffer=(0.0, 0.0), time_buffer=(np.timedelta64(0, 'h'), np.timedelta64(0, 'h')))

Downselect met to encompass a spatiotemporal region of the data.

Warning

This method is analogous to GeoVectorDataset.downselect_met(). It does not change the instance data, but instead operates on the met input. This method is different from downselect() which operates on the instance data.

Changed in version 0.54.5: Data is no longer copied when downselecting.

Parameters:
  • met (MetDataset | MetDataArray) – MetDataset or MetDataArray to downselect.

  • longitude_buffer (tuple[float, float], optional) – Extend longitude domain past by longitude_buffer[0] on the low side and longitude_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) degrees.

  • latitude_buffer (tuple[float, float], optional) – Extend latitude domain past by latitude_buffer[0] on the low side and latitude_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) degrees.

  • level_buffer (tuple[float, float], optional) – Extend level domain past by level_buffer[0] on the low side and level_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) [\(hPa\)].

  • time_buffer (tuple[np.timedelta64, np.timedelta64], optional) – Extend time domain past by time_buffer[0] on the low side and time_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (np.timedelta64(0, "h"), np.timedelta64(0, "h")).

Returns:

MetDataset | MetDataArray – Copy of downselected MetDataset or MetDataArray.

find_edges()

Find edges of regions.

Returns:

Self – MetDataArray with a binary field, 1 on the edge of the regions, 0 outside and inside the regions.

Raises:

NotImplementedError – If the instance is not binary.

property hash

Generate a unique hash for this met instance.

Note this is not as robust as it could be since repr cuts off.

Returns:

str – Unique hash for met instance (sha1)

property in_memory

Check if underlying data is loaded into memory.

This method uses protected attributes of underlying xarray objects, and may be subject to deprecation.

Changed in version 0.26.0: Rename from is_loaded to in_memory.

Returns:

bool – If underlying data exists as an np.ndarray in memory.

property indexes

Low level access to underlying data indexes.

This method is typically is faster for accessing coordinate indexes.

Added in version 0.25.2.

Returns:

dict[Hashable, pd.Index] – Dictionary of indexes.

Examples

>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = (datetime(2022, 3, 1, 12),  datetime(2022, 3, 1, 13))
>>> variables = "air_temperature", "specific_humidity"
>>> levels = [200, 300]
>>> era5 = ERA5(times, variables, levels)
>>> mds = era5.open_metdataset()
>>> mds.indexes["level"].to_numpy()
array([200., 300.])
>>> mda = mds["air_temperature"]
>>> mda.indexes["level"].to_numpy()
array([200., 300.])
interpolate(longitude, latitude, level, time, *, method='linear', bounds_error=False, fill_value=nan, localize=False, lowmem=False, indices=None, return_indices=False)

Interpolate values over underlying DataArray.

Zero dimensional coordinates are reshaped to 1D arrays.

If lowmem == False, method automatically loads underlying data into memory. Otherwise, method iterates through smaller subsets of data and releases subsets from memory once interpolation against each subset is finished.

If method == "nearest", the out array will have the same dtype as the underlying data.

If method == "linear", the out array will be promoted to the most precise dtype of:

  • underlying data

  • data.longitude

  • data.latitude

  • data.level

  • longitude

  • latitude

Added in version 0.24: This method can now handle singleton dimensions with method == "linear". Previously these degenerate dimensions caused nan values to be returned.

Parameters:
  • longitude (float | npt.NDArray[np.floating]) – Longitude values to interpolate. Assumed to be 0 or 1 dimensional.

  • latitude (float | npt.NDArray[np.floating]) – Latitude values to interpolate. Assumed to be 0 or 1 dimensional.

  • level (float | npt.NDArray[np.floating]) – Level values to interpolate. Assumed to be 0 or 1 dimensional.

  • time (np.datetime64 | npt.NDArray[np.datetime64]) – Time values to interpolate. Assumed to be 0 or 1 dimensional.

  • method (str, optional) – Additional keyword arguments to pass to scipy.interpolate.RegularGridInterpolator. Defaults to “linear”.

  • bounds_error (bool, optional) – Additional keyword arguments to pass to scipy.interpolate.RegularGridInterpolator. Defaults to False.

  • fill_value (float | np.float64, optional) – Additional keyword arguments to pass to scipy.interpolate.RegularGridInterpolator. Set to None to extrapolate outside the boundary when method is nearest. Defaults to np.nan.

  • localize (bool, optional) – Experimental. If True, downselect gridded data to smallest bounding box containing all points. By default False.

  • lowmem (bool, optional) – Experimental. If True, iterate through points binned by the time coordinate of the grided data, and downselect gridded data to the smallest bounding box containing each binned set of point before loading into memory. This can significantly reduce memory consumption with large numbers of points at the cost of increased runtime. By default False.

  • indices (tuple | None, optional) – Experimental. See interpolation.interp(). None by default.

  • return_indices (bool, optional) – Experimental. See interpolation.interp(). False by default. Note that values returned differ when lowmem=True and lowmem=False, so output should only be re-used in calls with the same lowmem value.

Returns:

numpy.ndarray – Interpolated values

Examples

>>> from datetime import datetime
>>> import numpy as np
>>> import pandas as pd
>>> from pycontrails.datalib.ecmwf import ERA5
>>> times = (datetime(2022, 3, 1, 12),  datetime(2022, 3, 1, 15))
>>> variables = "air_temperature"
>>> levels = [200, 250, 300]
>>> era5 = ERA5(times, variables, levels)
>>> met = era5.open_metdataset()
>>> mda = met["air_temperature"]
>>> # Interpolation at a grid point agrees with value
>>> mda.interpolate(1, 2, 300, np.datetime64('2022-03-01T14:00'))
array([241.91972984])
>>> da = mda.data
>>> da.sel(longitude=1, latitude=2, level=300, time=np.datetime64('2022-03-01T14')).item()
241.9197298421629
>>> # Interpolation off grid
>>> mda.interpolate(1.1, 2.1, 290, np.datetime64('2022-03-01 13:10'))
array([239.83793798])
>>> # Interpolate along path
>>> longitude = np.linspace(1, 2, 10)
>>> latitude = np.linspace(2, 3, 10)
>>> level = np.linspace(200, 300, 10)
>>> time = pd.date_range("2022-03-01T14", periods=10, freq="5min")
>>> mda.interpolate(longitude, latitude, level, time)
array([220.44347694, 223.08900738, 225.74338924, 228.41642088,
       231.10858599, 233.54857391, 235.71504913, 237.86478872,
       239.99274623, 242.10792167])
>>> # Can easily switch to alternative low-memory implementation
>>> mda.interpolate(longitude, latitude, level, time, lowmem=True)
array([220.44347694, 223.08900738, 225.74338924, 228.41642088,
       231.10858599, 233.54857391, 235.71504913, 237.86478872,
       239.99274623, 242.10792167])
property is_single_level

Check if instance contains “single level” or “surface level” data.

This method checks if level dimension contains a single value equal to -1, the pycontrails convention for surface only data.

Returns:

bool – If instance contains single level data.

property is_wrapped

Check if the longitude dimension covers the closed interval [-180, 180].

Assumes the longitude dimension is sorted (this is established by the MetDataset or MetDataArray constructor).

Returns:

bool – True if longitude coordinates cover [-180, 180]

property is_zarr

Check if underlying data is sourced from a Zarr group.

Implementation is very brittle, and may break as external libraries change.

Some dask intermediate artifact is cached when this is called. Typically, subsequent calls to this method are much faster than the initial call.

Added in version 0.26.0.

Returns:

bool – If data is based on a Zarr group.

classmethod load(hash, cachestore=None, chunks=None)

Load saved intermediate from cachestore.

Parameters:
  • hash (str) – Saved hash to load.

  • cachestore (CacheStore, optional) – Cache datastore to use for sourcing files. Defaults to DiskCacheStore.

  • chunks (dict[str, int], optional) – Chunks kwarg passed to xarray.open_mfdataset() when opening files.

Returns:

MetDataArray – New MetDataArray with loaded data.

property name

Return the DataArray name.

Returns:

Hashable – DataArray name

property proportion

Compute proportion of points with value 1.

Returns:

float – Proportion of points with value 1

Raises:

NotImplementedError – If instance does not contain binary data.

save(**kwargs)

Save intermediate to cachestore as netcdf.

Load and restore using load().

Parameters:

**kwargs (Any) – Keyword arguments passed directly to xarray.save_mfdataset()

Returns:

list[str] – Returns filenames of saved files

property shape

Return the shape of the dimensions.

Returns:

tuple[int, int, int, int] – Shape of underlying data

property size

Return the size of (each) array in underlying data.

Returns:

int – Total number of grid points in underlying data

to_polygon_feature(level=None, time=None, fill_value=nan, iso_value=None, min_area=0.0, epsilon=0.0, lower_bound=True, precision=None, interiors=True, convex_hull=False, include_altitude=False, properties=None)

Create GeoJSON Feature artifact from spatial array on a single level and time slice.

Computed polygons always contain an exterior linear ring as defined by the GeoJSON Polygon specification <https://www.rfc-editor.org/rfc/rfc7946.html#section-3.1.6>. Polygons may also contain interior linear rings (holes). This method does not support nesting beyond the GeoJSON specification. See the pycontrails.core.polygon for additional polygon support.

Changed in version 0.25.12: Previous implementation include several additional parameters which have been removed:

New implementation includes new parameters previously lacking:

  • fill_value

  • min_area

  • include_altitude

Changed in version 0.38.0: Change default value of epsilon from 0.15 to 0.

Changed in version 0.41.0: Convert continuous fields to binary fields before computing polygons. The parameters max_area and epsilon are now expressed in terms of longitude/latitude units instead of pixels.

Parameters:
  • level (float, optional) – Level slice to create polygons. If the “level” coordinate is length 1, then the single level slice will be selected automatically.

  • time (datetime, optional) – Time slice to create polygons. If the “time” coordinate is length 1, then the single time slice will be selected automatically.

  • fill_value (float, optional) – Value used for filling missing data and for padding the underlying data array. Set to np.nan by default, which ensures that regions with missing data are never included in polygons.

  • iso_value (float, optional) – Value in field to create iso-surface. Defaults to the average of the min and max value of the array. (This is the same convention as used by skimage.)

  • min_area (float, optional) – Minimum area of each polygon. Polygons with area less than min_area are not included in the output. The unit of this parameter is in longitude/latitude degrees squared. Set to 0 to omit any polygon filtering based on a minimal area conditional. By default, 0.0.

  • epsilon (float, optional) – Control the extent to which the polygon is simplified. A value of 0 does not alter the geometry of the polygon. The unit of this parameter is in longitude/latitude degrees. By default, 0.0.

  • lower_bound (bool, optional) – Whether to use iso_value as a lower or upper bound on values in polygon interiors. By default, True.

  • precision (int, optional) – Number of decimal places to round coordinates to. If None, no rounding is performed.

  • interiors (bool, optional) – If True, include interior linear rings (holes) in the output. True by default.

  • convex_hull (bool, optional) – EXPERIMENTAL. If True, compute the convex hull of each polygon. Only implemented for depth=1. False by default. A warning is issued if the underlying algorithm fails to make valid polygons after computing the convex hull.

  • include_altitude (bool, optional) – If True, include the array altitude [\(m\)] as a z-coordinate in the GeoJSON output <https://www.rfc-editor.org/rfc/rfc7946#section-3.1.1>. False by default.

  • properties (dict, optional) – Additional properties to include in the GeoJSON output. By default, None.

Returns:

dict[str, Any] – Python representation of GeoJSON Feature with MultiPolygon geometry.

Notes

Cocip and CocipGrid set some quantities to 0 and other quantities to np.nan in regions where no contrails form. When computing polygons from Cocip or CocipGrid output, take care that the choice of fill_value correctly includes or excludes contrail-free regions. See the Cocip documentation for details about np.nan in model output.

See also

to_polyhedra(), polygons.find_multipolygons()

Examples

>>> from pprint import pprint
>>> from pycontrails.datalib.ecmwf import ERA5
>>> era5 = ERA5("2022-03-01", variables="air_temperature", pressure_levels=250)
>>> mda = era5.open_metdataset()["air_temperature"]
>>> mda.shape
(1440, 721, 1, 1)
>>> pprint(mda.to_polygon_feature(iso_value=239.5, precision=2, epsilon=0.1))
{'geometry': {'coordinates': [[[[167.88, -22.5],
                                [167.75, -22.38],
                                [167.62, -22.5],
                                [167.75, -22.62],
                                [167.88, -22.5]]],
                              [[[43.38, -33.5],
                                [43.5, -34.12],
                                [43.62, -33.5],
                                [43.5, -33.38],
                                [43.38, -33.5]]]],
              'type': 'MultiPolygon'},
 'properties': {},
 'type': 'Feature'}
to_polygon_feature_collection(time=None, fill_value=nan, iso_value=None, min_area=0.0, epsilon=0.0, lower_bound=True, precision=None, interiors=True, convex_hull=False, include_altitude=False, properties=None)

Create GeoJSON FeatureCollection artifact from spatial array at time slice.

See the to_polygon_feature() method for a description of the parameters.

Returns:

dict[str, Any] – Python representation of GeoJSON FeatureCollection. This dictionary is comprised of individual GeoJON Features, one per self.data["level"].

to_polyhedra(*, time=None, iso_value=0.0, simplify_fraction=1.0, lower_bound=True, return_type='geojson', path=None, altitude_scale=1.0, output_vertex_normals=False, closed=True)

Create a collection of polyhedra from spatial array corresponding to a single time slice.

Parameters:
  • time (datetime, optional) – Time slice to create mesh. If the “time” coordinate is length 1, then the single time slice will be selected automatically.

  • iso_value (float, optional) – Value in field to create iso-surface. Defaults to 0.

  • simplify_fraction (float, optional) – Apply open3d simplify_quadric_decimation method to simplify the polyhedra geometry. This parameter must be in the half-open interval (0.0, 1.0]. Defaults to 1.0, corresponding to no reduction.

  • lower_bound (bool, optional) – Whether to use iso_value as a lower or upper bound on values in polyhedra interiors. By default, True.

  • return_type (str, optional) – Must be one of “geojson” or “mesh”. Defaults to “geojson”. If “geojson”, this method returns a dictionary representation of a geojson MultiPolygon object whose polygons are polyhedra faces. If “mesh”, this method returns an open3d TriangleMesh instance.

  • path (str, optional) – Output geojson or mesh to file. If return_type is “mesh”, see Open3D File I/O for Mesh for file type options.

  • altitude_scale (float, optional) – Rescale the altitude dimension of the mesh, [\(m\)]

  • output_vertex_normals (bool, optional) – If path is defined, write out vertex normals. Defaults to False.

  • closed (bool, optional) – If True, pad spatial array along all axes to ensure polyhedra are “closed”. This flag often gives rise to cleaner visualizations. Defaults to True.

Returns:

dict | open3d.geometry.TriangleMesh – Python representation of geojson object or Open3D Triangle Mesh depending on the return_type parameter.

Raises:

Notes

Uses the scikit-image Marching Cubes algorithm to reconstruct a surface from the point-cloud like arrays.

property values

Return underlying numpy array.

This methods loads data if it is not already in memory.

Returns:

numpy.ndarray – Underlying numpy array

wrap_longitude()

Wrap longitude coordinates.

Returns:

Self – Copy of instance with wrapped longitude values. Returns copy of data when longitude values are already wrapped