pycontrails.MetDataArray¶
- class pycontrails.MetDataArray(data, cachestore=None, wrap_longitude=False, copy=True, name=None)¶
Bases:
MetBase
Meteorological DataArray of single variable.
Wrapper around
xarray.DataArray
to enforce certain variables and dimensions for internal usage.Changed in version 0.54.4: Remove
validate
parameter. Validation is now always performed.- Parameters:
data (
ArrayLike
) – xr.DataArray or other array-like data source. When array-like input is provided, input**kwargs
passed directly to xr.DataArray constructor.cachestore (
CacheStore
, optional) – Cache datastore for staging intermediates withsave()
. Defaults to DiskCacheStore.wrap_longitude (
bool
, optional) – Wrap data along the longitude dimension. If True, duplicate and shift longitude values (ie, -180 -> 180) to ensure that the longitude dimension covers the entire interval[-180, 180]
. Defaults to False.copy (
bool
, optional) – Copy data parameter on construction, by default True. If data is lazy-loaded via dask, this parameter has no effect. If data is already loaded into memory, a copy of the data (rather than a view) may be created if True.name (
Hashable
, optional) – Name of the data variable. If not specified, the name will be set to “met”.
Examples
>>> import numpy as np >>> import xarray as xr >>> rng = np.random.default_rng(seed=456)
>>> # Cook up random xarray object >>> coords = { ... "longitude": np.arange(-20, 20), ... "latitude": np.arange(-30, 30), ... "level": [220, 240, 260, 280], ... "time": [np.datetime64("2021-08-01T12", "ns"), np.datetime64("2021-08-01T16", "ns")] ... } >>> da = xr.DataArray(rng.random((40, 60, 4, 2)), dims=coords.keys(), coords=coords)
>>> # Cast to `MetDataArray` in order to interpolate >>> from pycontrails import MetDataArray >>> mda = MetDataArray(da) >>> mda.interpolate(-11.4, 5.7, 234, np.datetime64("2021-08-01T13")) array([0.52358215])
>>> mda.interpolate(-11.4, 5.7, 234, np.datetime64("2021-08-01T13"), method='nearest') array([0.4188465])
>>> da.sel(longitude=-11, latitude=6, level=240, time=np.datetime64("2021-08-01T12")).item() 0.41884649899766946
- __init__(data, cachestore=None, wrap_longitude=False, copy=True, name=None)¶
Methods
__init__
(data[, cachestore, wrap_longitude, ...])broadcast_coords
(name)Broadcast coordinates along other dimensions.
copy
()Create a shallow copy of the current class.
downselect
(bbox)Downselect met data within spatial bounding box.
downselect_met
(met, *[, longitude_buffer, ...])Downselect
met
to encompass a spatiotemporal region of the data.Find edges of regions.
interpolate
(longitude, latitude, level, time, *)Interpolate values over underlying DataArray.
load
(hash[, cachestore, chunks])Load saved intermediate from
cachestore
.save
(**kwargs)Save intermediate to
cachestore
as netcdf.to_polygon_feature
([level, time, ...])Create GeoJSON Feature artifact from spatial array on a single level and time slice.
to_polygon_feature_collection
([time, ...])Create GeoJSON FeatureCollection artifact from spatial array at time slice.
to_polyhedra
(*[, time, iso_value, ...])Create a collection of polyhedra from spatial array corresponding to a single time slice.
Wrap longitude coordinates.
Attributes
Pass through to
self.data.attrs
.Determine if all data is a binary value (0, 1).
Get coordinates of underlying
data
coordinates.DataArray or Dataset
Default dimension order for DataArray or Dataset (x, y, z, t)
Generate a unique hash for this met instance.
Check if underlying
data
is loaded into memory.Low level access to underlying
data
indexes.Check if instance contains "single level" or "surface level" data.
Check if the longitude dimension covers the closed interval
[-180, 180]
.Check if underlying
data
is sourced from a Zarr group.Return the DataArray name.
Compute proportion of points with value 1.
Return the shape of the dimensions.
Return the size of (each) array in underlying
data
.Return underlying numpy array.
- property attrs¶
Pass through to
self.data.attrs
.
- property binary¶
Determine if all data is a binary value (0, 1).
- Returns:
bool
– True if all data values are binary value (0, 1)
- broadcast_coords(name)¶
Broadcast coordinates along other dimensions.
- Parameters:
name (
str
) – Coordinate/dimension name to broadcast. Can be a dimension or non-dimension coordinates.- Returns:
xarray.DataArray
– DataArray of the coordinate broadcasted along all other dimensions. The DataArray will have the same shape as the gridded data.
- property coords¶
Get coordinates of underlying
data
coordinates.Only return non-dimension coordinates.
See: http://xarray.pydata.org/en/stable/user-guide/data-structures.html#coordinates
- Returns:
dict[str
,np.ndarray]
– Dictionary of coordinates
- copy()¶
Create a shallow copy of the current class.
See
xarray.Dataset.copy()
for reference.- Returns:
Self
– Copy of the current class
- data¶
DataArray or Dataset
- dim_order = ('longitude', 'latitude', 'level', 'time')¶
Default dimension order for DataArray or Dataset (x, y, z, t)
- downselect(bbox)¶
Downselect met data within spatial bounding box.
- Parameters:
bbox (
list[float]
) – List of coordinates defining a spatial bounding box in WGS84 coordinates. For 2D queries, list is [west, south, east, north]. For 3D queries, list is [west, south, min-level, east, north, max-level] with level defined in [\(hPa\)].- Returns:
Self
– Return downselected data
- downselect_met(met, *, longitude_buffer=(0.0, 0.0), latitude_buffer=(0.0, 0.0), level_buffer=(0.0, 0.0), time_buffer=(np.timedelta64(0, 'h'), np.timedelta64(0, 'h')))¶
Downselect
met
to encompass a spatiotemporal region of the data.Warning
This method is analogous to
GeoVectorDataset.downselect_met()
. It does not change the instance data, but instead operates on themet
input. This method is different fromdownselect()
which operates on the instance data.Changed in version 0.54.5: Data is no longer copied when downselecting.
- Parameters:
met (
MetDataset | MetDataArray
) – MetDataset or MetDataArray to downselect.longitude_buffer (
tuple[float
,float]
, optional) – Extend longitude domain past bylongitude_buffer[0]
on the low side andlongitude_buffer[1]
on the high side. Units must be the same as class coordinates. Defaults to(0, 0)
degrees.latitude_buffer (
tuple[float
,float]
, optional) – Extend latitude domain past bylatitude_buffer[0]
on the low side andlatitude_buffer[1]
on the high side. Units must be the same as class coordinates. Defaults to(0, 0)
degrees.level_buffer (
tuple[float
,float]
, optional) – Extend level domain past bylevel_buffer[0]
on the low side andlevel_buffer[1]
on the high side. Units must be the same as class coordinates. Defaults to(0, 0)
[\(hPa\)].time_buffer (
tuple[np.timedelta64
,np.timedelta64]
, optional) – Extend time domain past bytime_buffer[0]
on the low side andtime_buffer[1]
on the high side. Units must be the same as class coordinates. Defaults to(np.timedelta64(0, "h"), np.timedelta64(0, "h"))
.
- Returns:
MetDataset | MetDataArray
– Copy of downselected MetDataset or MetDataArray.
- find_edges()¶
Find edges of regions.
- Returns:
Self
– MetDataArray with a binary field, 1 on the edge of the regions, 0 outside and inside the regions.- Raises:
NotImplementedError – If the instance is not binary.
- property hash¶
Generate a unique hash for this met instance.
Note this is not as robust as it could be since repr cuts off.
- Returns:
str
– Unique hash for met instance (sha1)
- property in_memory¶
Check if underlying
data
is loaded into memory.This method uses protected attributes of underlying xarray objects, and may be subject to deprecation.
Changed in version 0.26.0: Rename from
is_loaded
toin_memory
.- Returns:
bool
– If underlying data exists as an np.ndarray in memory.
- property indexes¶
Low level access to underlying
data
indexes.This method is typically is faster for accessing coordinate indexes.
Added in version 0.25.2.
- Returns:
dict[Hashable
,pd.Index]
– Dictionary of indexes.
Examples
>>> from pycontrails.datalib.ecmwf import ERA5 >>> times = (datetime(2022, 3, 1, 12), datetime(2022, 3, 1, 13)) >>> variables = "air_temperature", "specific_humidity" >>> levels = [200, 300] >>> era5 = ERA5(times, variables, levels) >>> mds = era5.open_metdataset() >>> mds.indexes["level"].to_numpy() array([200., 300.])
>>> mda = mds["air_temperature"] >>> mda.indexes["level"].to_numpy() array([200., 300.])
- interpolate(longitude, latitude, level, time, *, method='linear', bounds_error=False, fill_value=nan, localize=False, lowmem=False, indices=None, return_indices=False)¶
Interpolate values over underlying DataArray.
Zero dimensional coordinates are reshaped to 1D arrays.
If
lowmem == False
, method automatically loads underlyingdata
into memory. Otherwise, method iterates through smaller subsets ofdata
and releases subsets from memory once interpolation against each subset is finished.If
method == "nearest"
, the out array will have the samedtype
as the underlyingdata
.If
method == "linear"
, the out array will be promoted to the most precisedtype
of:underlying
data
data.longitude
data.latitude
data.level
longitude
latitude
Added in version 0.24: This method can now handle singleton dimensions with
method == "linear"
. Previously these degenerate dimensions caused nan values to be returned.- Parameters:
longitude (
float | npt.NDArray[np.floating]
) – Longitude values to interpolate. Assumed to be 0 or 1 dimensional.latitude (
float | npt.NDArray[np.floating]
) – Latitude values to interpolate. Assumed to be 0 or 1 dimensional.level (
float | npt.NDArray[np.floating]
) – Level values to interpolate. Assumed to be 0 or 1 dimensional.time (
np.datetime64 | npt.NDArray[np.datetime64]
) – Time values to interpolate. Assumed to be 0 or 1 dimensional.method (
str
, optional) – Additional keyword arguments to pass toscipy.interpolate.RegularGridInterpolator
. Defaults to “linear”.bounds_error (
bool
, optional) – Additional keyword arguments to pass toscipy.interpolate.RegularGridInterpolator
. Defaults toFalse
.fill_value (
float | np.float64
, optional) – Additional keyword arguments to pass toscipy.interpolate.RegularGridInterpolator
. Set to None to extrapolate outside the boundary whenmethod
isnearest
. Defaults tonp.nan
.localize (
bool
, optional) – Experimental. If True, downselect gridded data to smallest bounding box containing all points. By default False.lowmem (
bool
, optional) – Experimental. If True, iterate through points binned by the time coordinate of the grided data, and downselect gridded data to the smallest bounding box containing each binned set of point before loading into memory. This can significantly reduce memory consumption with large numbers of points at the cost of increased runtime. By default False.indices (
tuple | None
, optional) – Experimental. Seeinterpolation.interp()
. None by default.return_indices (
bool
, optional) – Experimental. Seeinterpolation.interp()
. False by default. Note that values returned differ whenlowmem=True
andlowmem=False
, so output should only be re-used in calls with the samelowmem
value.
- Returns:
numpy.ndarray
– Interpolated values
See also
Examples
>>> from datetime import datetime >>> import numpy as np >>> import pandas as pd >>> from pycontrails.datalib.ecmwf import ERA5
>>> times = (datetime(2022, 3, 1, 12), datetime(2022, 3, 1, 15)) >>> variables = "air_temperature" >>> levels = [200, 250, 300] >>> era5 = ERA5(times, variables, levels) >>> met = era5.open_metdataset() >>> mda = met["air_temperature"]
>>> # Interpolation at a grid point agrees with value >>> mda.interpolate(1, 2, 300, np.datetime64('2022-03-01T14:00')) array([241.91972984])
>>> da = mda.data >>> da.sel(longitude=1, latitude=2, level=300, time=np.datetime64('2022-03-01T14')).item() 241.9197298421629
>>> # Interpolation off grid >>> mda.interpolate(1.1, 2.1, 290, np.datetime64('2022-03-01 13:10')) array([239.83793798])
>>> # Interpolate along path >>> longitude = np.linspace(1, 2, 10) >>> latitude = np.linspace(2, 3, 10) >>> level = np.linspace(200, 300, 10) >>> time = pd.date_range("2022-03-01T14", periods=10, freq="5min") >>> mda.interpolate(longitude, latitude, level, time) array([220.44347694, 223.08900738, 225.74338924, 228.41642088, 231.10858599, 233.54857391, 235.71504913, 237.86478872, 239.99274623, 242.10792167])
>>> # Can easily switch to alternative low-memory implementation >>> mda.interpolate(longitude, latitude, level, time, lowmem=True) array([220.44347694, 223.08900738, 225.74338924, 228.41642088, 231.10858599, 233.54857391, 235.71504913, 237.86478872, 239.99274623, 242.10792167])
- property is_single_level¶
Check if instance contains “single level” or “surface level” data.
This method checks if
level
dimension contains a single value equal to -1, the pycontrails convention for surface only data.- Returns:
bool
– If instance contains single level data.
- property is_wrapped¶
Check if the longitude dimension covers the closed interval
[-180, 180]
.Assumes the longitude dimension is sorted (this is established by the
MetDataset
orMetDataArray
constructor).- Returns:
bool
– True if longitude coordinates cover[-180, 180]
- property is_zarr¶
Check if underlying
data
is sourced from a Zarr group.Implementation is very brittle, and may break as external libraries change.
Some
dask
intermediate artifact is cached when this is called. Typically, subsequent calls to this method are much faster than the initial call.Added in version 0.26.0.
- Returns:
bool
– Ifdata
is based on a Zarr group.
- classmethod load(hash, cachestore=None, chunks=None)¶
Load saved intermediate from
cachestore
.- Parameters:
hash (
str
) – Saved hash to load.cachestore (
CacheStore
, optional) – Cache datastore to use for sourcing files. Defaults to DiskCacheStore.chunks (
dict[str
,int]
, optional) – Chunks kwarg passed toxarray.open_mfdataset()
when opening files.
- Returns:
MetDataArray
– New MetDataArray with loaded data.
- property name¶
Return the DataArray name.
- Returns:
Hashable
– DataArray name
- property proportion¶
Compute proportion of points with value 1.
- Returns:
float
– Proportion of points with value 1- Raises:
NotImplementedError – If instance does not contain binary data.
- save(**kwargs)¶
Save intermediate to
cachestore
as netcdf.Load and restore using
load()
.- Parameters:
**kwargs (
Any
) – Keyword arguments passed directly toxarray.save_mfdataset()
- Returns:
list[str]
– Returns filenames of saved files
- property shape¶
Return the shape of the dimensions.
- property size¶
Return the size of (each) array in underlying
data
.- Returns:
int
– Total number of grid points in underlying data
- to_polygon_feature(level=None, time=None, fill_value=nan, iso_value=None, min_area=0.0, epsilon=0.0, lower_bound=True, precision=None, interiors=True, convex_hull=False, include_altitude=False, properties=None)¶
Create GeoJSON Feature artifact from spatial array on a single level and time slice.
Computed polygons always contain an exterior linear ring as defined by the GeoJSON Polygon specification <https://www.rfc-editor.org/rfc/rfc7946.html#section-3.1.6>. Polygons may also contain interior linear rings (holes). This method does not support nesting beyond the GeoJSON specification. See the
pycontrails.core.polygon
for additional polygon support.Changed in version 0.25.12: Previous implementation include several additional parameters which have been removed:
The
approximate
parameterAn
path
parameter to save output as JSONPassing arbitrary kwargs to
skimage.measure.find_contours()
.
New implementation includes new parameters previously lacking:
fill_value
min_area
include_altitude
Changed in version 0.38.0: Change default value of
epsilon
from 0.15 to 0.Changed in version 0.41.0: Convert continuous fields to binary fields before computing polygons. The parameters
max_area
andepsilon
are now expressed in terms of longitude/latitude units instead of pixels.- Parameters:
level (
float
, optional) – Level slice to create polygons. If the “level” coordinate is length 1, then the single level slice will be selected automatically.time (
datetime
, optional) – Time slice to create polygons. If the “time” coordinate is length 1, then the single time slice will be selected automatically.fill_value (
float
, optional) – Value used for filling missing data and for padding the underlying data array. Set tonp.nan
by default, which ensures that regions with missing data are never included in polygons.iso_value (
float
, optional) – Value in field to create iso-surface. Defaults to the average of the min and max value of the array. (This is the same convention as used byskimage
.)min_area (
float
, optional) – Minimum area of each polygon. Polygons with area less thanmin_area
are not included in the output. The unit of this parameter is in longitude/latitude degrees squared. Set to 0 to omit any polygon filtering based on a minimal area conditional. By default, 0.0.epsilon (
float
, optional) – Control the extent to which the polygon is simplified. A value of 0 does not alter the geometry of the polygon. The unit of this parameter is in longitude/latitude degrees. By default, 0.0.lower_bound (
bool
, optional) – Whether to useiso_value
as a lower or upper bound on values in polygon interiors. By default, True.precision (
int
, optional) – Number of decimal places to round coordinates to. If None, no rounding is performed.interiors (
bool
, optional) – If True, include interior linear rings (holes) in the output. True by default.convex_hull (
bool
, optional) – EXPERIMENTAL. If True, compute the convex hull of each polygon. Only implemented for depth=1. False by default. A warning is issued if the underlying algorithm fails to make valid polygons after computing the convex hull.include_altitude (
bool
, optional) – If True, include the array altitude [\(m\)] as a z-coordinate in the GeoJSON output <https://www.rfc-editor.org/rfc/rfc7946#section-3.1.1>. False by default.properties (
dict
, optional) – Additional properties to include in the GeoJSON output. By default, None.
- Returns:
dict[str
,Any]
– Python representation of GeoJSON Feature with MultiPolygon geometry.
Notes
Cocip
andCocipGrid
set some quantities to 0 and other quantities tonp.nan
in regions where no contrails form. When computing polygons fromCocip
orCocipGrid
output, take care that the choice offill_value
correctly includes or excludes contrail-free regions. See theCocip
documentation for details aboutnp.nan
in model output.See also
to_polyhedra()
,polygons.find_multipolygons()
Examples
>>> from pprint import pprint >>> from pycontrails.datalib.ecmwf import ERA5 >>> era5 = ERA5("2022-03-01", variables="air_temperature", pressure_levels=250) >>> mda = era5.open_metdataset()["air_temperature"] >>> mda.shape (1440, 721, 1, 1)
>>> pprint(mda.to_polygon_feature(iso_value=239.5, precision=2, epsilon=0.1)) {'geometry': {'coordinates': [[[[167.88, -22.5], [167.75, -22.38], [167.62, -22.5], [167.75, -22.62], [167.88, -22.5]]], [[[43.38, -33.5], [43.5, -34.12], [43.62, -33.5], [43.5, -33.38], [43.38, -33.5]]]], 'type': 'MultiPolygon'}, 'properties': {}, 'type': 'Feature'}
- to_polygon_feature_collection(time=None, fill_value=nan, iso_value=None, min_area=0.0, epsilon=0.0, lower_bound=True, precision=None, interiors=True, convex_hull=False, include_altitude=False, properties=None)¶
Create GeoJSON FeatureCollection artifact from spatial array at time slice.
See the
to_polygon_feature()
method for a description of the parameters.- Returns:
dict[str
,Any]
– Python representation of GeoJSON FeatureCollection. This dictionary is comprised of individual GeoJON Features, one perself.data["level"]
.
- to_polyhedra(*, time=None, iso_value=0.0, simplify_fraction=1.0, lower_bound=True, return_type='geojson', path=None, altitude_scale=1.0, output_vertex_normals=False, closed=True)¶
Create a collection of polyhedra from spatial array corresponding to a single time slice.
- Parameters:
time (
datetime
, optional) – Time slice to create mesh. If the “time” coordinate is length 1, then the single time slice will be selected automatically.iso_value (
float
, optional) – Value in field to create iso-surface. Defaults to 0.simplify_fraction (
float
, optional) – Apply open3d simplify_quadric_decimation method to simplify the polyhedra geometry. This parameter must be in the half-open interval (0.0, 1.0]. Defaults to 1.0, corresponding to no reduction.lower_bound (
bool
, optional) – Whether to useiso_value
as a lower or upper bound on values in polyhedra interiors. By default, True.return_type (
str
, optional) – Must be one of “geojson” or “mesh”. Defaults to “geojson”. If “geojson”, this method returns a dictionary representation of a geojson MultiPolygon object whose polygons are polyhedra faces. If “mesh”, this method returns an open3d TriangleMesh instance.path (
str
, optional) – Output geojson or mesh to file. If return_type is “mesh”, see Open3D File I/O for Mesh for file type options.altitude_scale (
float
, optional) – Rescale the altitude dimension of the mesh, [\(m\)]output_vertex_normals (
bool
, optional) – Ifpath
is defined, write out vertex normals. Defaults to False.closed (
bool
, optional) – If True, pad spatial array along all axes to ensure polyhedra are “closed”. This flag often gives rise to cleaner visualizations. Defaults to True.
- Returns:
dict | open3d.geometry.TriangleMesh
– Python representation of geojson object or Open3D Triangle Mesh depending on the return_type parameter.- Raises:
ModuleNotFoundError – Method requires the vis optional dependencies
ValueError – If input parameters are invalid.
Notes
Uses the scikit-image Marching Cubes algorithm to reconstruct a surface from the point-cloud like arrays.
- property values¶
Return underlying numpy array.
This methods loads
data
if it is not already in memory.- Returns:
numpy.ndarray
– Underlying numpy array
See also
- wrap_longitude()¶
Wrap longitude coordinates.
- Returns:
Self
– Copy of instance with wrapped longitude values. Returns copy of data when longitude values are already wrapped