pycontrails.core.vector

Lightweight data structures for vector paths.

Module Attributes

VectorDatasetType

Vector types

Functions

vector_to_lon_lat_grid(vector, agg, *[, ...])

Convert vectors to a longitude-latitude grid.

Classes

AttrDict

Thin wrapper around dict to warn when setting a key that already exists.

GeoVectorDataset([data, longitude, ...])

Base class to hold 1D geospatial arrays of consistent size.

VectorDataDict([data])

Thin wrapper around dict[str, np.ndarray] to ensure consistency.

VectorDataset([data, attrs, copy])

Base class to hold 1D arrays of consistent size.

class pycontrails.core.vector.AttrDict

Bases: dict[str, Any]

Thin wrapper around dict to warn when setting a key that already exists.

setdefault(k, default=None)

Thin wrapper around dict.setdefault.

Overwrites the existing value at k if that value is None.

Parameters:
  • k (str) – Key

  • default (Any, optional) – Default value for key k

Returns:

Any – Value at k
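
The behaviour described above can be sketched with a small stand-alone dict subclass. The class name WarnOnOverwriteDict and the warning text are hypothetical, and this is an assumption about the behaviour rather than the pycontrails implementation.

```python
import warnings
from typing import Any


class WarnOnOverwriteDict(dict):
    """Hypothetical stand-in for AttrDict (not the pycontrails code)."""

    def __setitem__(self, k: str, v: Any) -> None:
        # Warn when the key is already present, then store the new value.
        if k in self:
            warnings.warn(f"Overwriting existing key '{k}'", UserWarning, stacklevel=2)
        super().__setitem__(k, v)

    def setdefault(self, k: str, default: Any = None) -> Any:
        # Unlike dict.setdefault, a stored None is treated as missing.
        current = self.get(k)
        if current is not None:
            return current
        # Bypass the warning: replacing None is the documented intent.
        super().__setitem__(k, default)
        return default


attrs = WarnOnOverwriteDict(flight_id=None, aircraft_type="B737")
attrs.setdefault("flight_id", "abc")       # replaces the stored None
attrs.setdefault("aircraft_type", "A320")  # keeps "B737"
```

With the builtin dict.setdefault, the stored None at "flight_id" would have been returned unchanged.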

class pycontrails.core.vector.GeoVectorDataset(data=None, *, longitude=None, latitude=None, altitude=None, altitude_ft=None, level=None, time=None, attrs=None, copy=True, **attrs_kwargs)

Bases: VectorDataset

Base class to hold 1D geospatial arrays of consistent size.

GeoVectorDataset is required to have geospatial coordinate keys defined in required_keys.

Expect latitude-longitude CRS in WGS 84. Expect altitude in [\(m\)]. Expect level in [\(hPa\)].

Each spatial variable is expected to have “float32” or “float64” dtype. The time variable is expected to have “datetime64[ns]” dtype.

Use the attribute attrs["crs"] to specify coordinate reference system using PROJ or EPSG syntax.

Parameters:
  • data (dict[str, npt.ArrayLike] | pd.DataFrame | VectorDataDict | VectorDataset | None, optional) – Data dictionary or pandas.DataFrame. Must include keys/columns time, latitude, longitude, altitude or level. Keyword arguments for time, latitude, longitude, altitude or level override data inputs. Expects altitude in meters and time as a DatetimeLike (or array that can be processed with pd.to_datetime()). Additional waypoint-specific data can be included as additional keys/columns.

  • longitude (npt.ArrayLike, optional) – Longitude data. Defaults to None.

  • latitude (npt.ArrayLike, optional) – Latitude data. Defaults to None.

  • altitude (npt.ArrayLike, optional) – Altitude data, [\(m\)]. Defaults to None.

  • altitude_ft (npt.ArrayLike, optional) – Altitude data, [\(ft\)]. Defaults to None.

  • level (npt.ArrayLike, optional) – Level data, [\(hPa\)]. Defaults to None.

  • time (npt.ArrayLike, optional) – Time data. Expects an array of DatetimeLike values, or array that can be processed with pd.to_datetime(). Defaults to None.

  • attrs (dict[Hashable, Any] | AttrDict, optional) – Additional properties as a dictionary. Defaults to {}.

  • copy (bool, optional) – Copy data on class creation. Defaults to True.

  • **attrs_kwargs (Any) – Additional properties passed as keyword arguments.

Raises:

KeyError – Raises if data input does not contain at least time, latitude, longitude, (altitude or level).

T_isa()

Calculate the ICAO standard atmosphere temperature at each point.

Returns:

npt.NDArray[np.float64] – ISA temperature, [\(K\)]
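
For orientation, the ISA temperature profile below 20 km is a linear decrease in the troposphere followed by an isothermal layer. The following scalar sketch uses the standard ISA constants; the function name isa_temperature is hypothetical, and whether T_isa() is implemented exactly this way is an assumption.

```python
def isa_temperature(altitude_m: float) -> float:
    """ISA temperature [K] for altitudes up to 20 km (hypothetical helper)."""
    t_msl = 288.15         # ISA sea-level temperature, [K]
    lapse_rate = 0.0065    # tropospheric lapse rate, [K/m]
    t_tropopause = 216.65  # isothermal layer between 11 km and 20 km, [K]
    # Linear decrease with height, floored at the tropopause temperature.
    return max(t_msl - lapse_rate * altitude_m, t_tropopause)
```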

property air_pressure

Get air_pressure values for points.

Returns:

npt.NDArray[np.float64] – Point air pressure values, [\(Pa\)]

property altitude

Get altitude.

Automatically calculates altitude from the level key using units.pl_to_m().

Note that if altitude key exists in data, the data at the altitude key will be returned. This allows an override of the default calculation of altitude from pressure level.

Returns:

npt.NDArray[np.float64] – Altitude, [\(m\)]

property altitude_ft

Get altitude in feet.

Returns:

npt.NDArray[np.float64] – Altitude, [\(ft\)]

property constants

Return a dictionary of constant attributes and data values.

Includes attrs and values from columns in data with a unique value.

Returns:

dict[str, Any] – Properties and their constant values
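
A minimal sketch of the idea, using plain dicts and lists in place of the dataset. The function name constants_sketch is hypothetical, and the precedence when a key appears in both attrs and data is an assumption.

```python
def constants_sketch(data: dict, attrs: dict) -> dict:
    """Collect attrs plus every data column that holds a single unique value."""
    out = dict(attrs)
    for key, values in data.items():
        unique = set(values)
        if len(unique) == 1:
            # The column is constant, so report its single value.
            out[key] = unique.pop()
    return out


result = constants_sketch(
    data={"engine_count": [2, 2, 2], "altitude": [10000.0, 10500.0, 11000.0]},
    attrs={"aircraft_type": "B737"},
)
```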

property coords

Get geospatial coordinates for compatibility with MetDataArray.

Returns:

pandas.DataFrame – pd.DataFrame with columns longitude, latitude, level, and time.

coords_intersect_met(met)

Return boolean mask of data inside the bounding box defined by met.

Parameters:

met (MetDataset | MetDataArray) – MetDataset or MetDataArray to compare.

Returns:

npt.NDArray[np.bool_] – True if point is inside the bounding box defined by met.
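
The bounding-box test amounts to a per-dimension range check combined with a logical AND. The sketch below uses plain lists; the function name bbox_mask and the choice of inclusive bounds are assumptions, not the pycontrails implementation.

```python
def bbox_mask(coords: dict, bounds: dict) -> list:
    """Return True for each point lying inside every (lower, upper) range in bounds."""
    per_dim = [
        [lower <= value <= upper for value in coords[dim]]
        for dim, (lower, upper) in bounds.items()
    ]
    # A point is inside the box only if it is inside along every dimension.
    return [all(flags) for flags in zip(*per_dim)]


mask = bbox_mask(
    coords={"longitude": [0.0, 10.0, 60.0], "latitude": [0.0, 5.0, 5.0]},
    bounds={"longitude": (-5.0, 50.0), "latitude": (1.0, 10.0)},
)
```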

classmethod create_empty(keys=None, attrs=None, **attrs_kwargs)

Create instance with variables defined by keys and size 0.

If the instance requires additional variables to be defined, these keys will automatically be attached to the returned instance.

Parameters:
  • keys (Iterable[str]) – Keys to include in empty VectorDataset instance.

  • attrs (dict[str, Any] | None, optional) – Attributes to attach to the instance.

  • **attrs_kwargs (Any) – Define attributes as keyword arguments.

Returns:

VectorDatasetType – Empty VectorDataset instance.

downselect_met(met, *, longitude_buffer=(0.0, 0.0), latitude_buffer=(0.0, 0.0), level_buffer=(0.0, 0.0), time_buffer=(numpy.timedelta64(0, 'h'), numpy.timedelta64(0, 'h')), copy=True)

Downselect met to encompass a spatiotemporal region of the data.

Parameters:
  • met (MetDataset | MetDataArray) – MetDataset or MetDataArray to downselect.

  • longitude_buffer (tuple[float, float], optional) – Extend the longitude domain by longitude_buffer[0] on the low side and longitude_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) degrees.

  • latitude_buffer (tuple[float, float], optional) – Extend the latitude domain by latitude_buffer[0] on the low side and latitude_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) degrees.

  • level_buffer (tuple[float, float], optional) – Extend the level domain by level_buffer[0] on the low side and level_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (0, 0) [\(hPa\)].

  • time_buffer (tuple[np.timedelta64, np.timedelta64], optional) – Extend the time domain by time_buffer[0] on the low side and time_buffer[1] on the high side. Units must be the same as class coordinates. Defaults to (np.timedelta64(0, "h"), np.timedelta64(0, "h")).

  • copy (bool) – Whether the returned object is a copy or a view of the original. True by default.

Returns:

MetDataset | MetDataArray – Copy of downselected MetDataset or MetDataArray.
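
The effect of a buffer tuple on one coordinate can be sketched as follows. The helper buffered_domain is hypothetical and only illustrates how the low-side and high-side buffers widen the range spanned by the data; the actual selection on met is not shown.

```python
from datetime import datetime, timedelta


def buffered_domain(values, buffer):
    """Return (min - buffer[0], max + buffer[1]) for a sequence of coordinates."""
    return min(values) - buffer[0], max(values) + buffer[1]


longitude = [10.0, 20.0, 15.0]
time = [datetime(2022, 3, 1, 0), datetime(2022, 3, 1, 2)]

# Widen longitude by 1 degree on the low side and 2 degrees on the high side.
lon_domain = buffered_domain(longitude, (1.0, 2.0))
# Widen time by one hour on each side.
time_domain = buffered_domain(time, (timedelta(hours=1), timedelta(hours=1)))
```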

intersect_met(mda, *, longitude=None, latitude=None, level=None, time=None, use_indices=False, **interp_kwargs)

Intersect waypoints with MetDataArray.

Parameters:
  • mda (MetDataArray) – MetDataArray containing a meteorological variable at spatio-temporal coordinates.

  • longitude (npt.NDArray[np.float64], optional) – Override existing coordinates for met interpolation

  • latitude (npt.NDArray[np.float64], optional) – Override existing coordinates for met interpolation

  • level (npt.NDArray[np.float64], optional) – Override existing coordinates for met interpolation

  • time (npt.NDArray[np.datetime64], optional) – Override existing coordinates for met interpolation

  • use_indices (bool, optional) – Experimental.

  • **interp_kwargs (Any) – Additional keyword arguments to pass to MetDataArray.intersect_met(). Examples include method, bounds_error, and fill_value. If an error such as

    ValueError: One of the requested xi is out of bounds in dimension 2
    

    occurs, try calling this function with bounds_error=False. In addition, setting fill_value=0.0 will replace NaN values with 0.0.

Returns:

npt.NDArray[np.float64] – Interpolated values

Examples

>>> from datetime import datetime
>>> import pandas as pd
>>> import numpy as np
>>> from pycontrails.datalib.ecmwf import ERA5
>>> from pycontrails import Flight
>>> # Get met data
>>> times = (datetime(2022, 3, 1, 0),  datetime(2022, 3, 1, 3))
>>> variables = ["air_temperature", "specific_humidity"]
>>> levels = [300, 250, 200]
>>> era5 = ERA5(time=times, variables=variables, pressure_levels=levels)
>>> met = era5.open_metdataset()
>>> # Example flight
>>> df = pd.DataFrame()
>>> df['longitude'] = np.linspace(0, 50, 10)
>>> df['latitude'] = np.linspace(0, 10, 10)
>>> df['altitude'] = 11000
>>> df['time'] = pd.date_range("2022-03-01T00", "2022-03-01T02", periods=10)
>>> fl = Flight(df)
>>> # Intersect
>>> fl.intersect_met(met['air_temperature'], method='nearest')
array([231.62969892, 230.72604651, 232.24318771, 231.88338483,
       231.06429438, 231.59073409, 231.65125393, 231.93064004,
       232.03344087, 231.65954432])
>>> fl.intersect_met(met['air_temperature'], method='linear')
array([225.77794552, 225.13908414, 226.231218  , 226.31831528,
       225.56102321, 225.81192149, 226.03192642, 226.22056121,
       226.03770174, 225.63226188])
>>> # Interpolate and attach to `Flight` instance
>>> for key in met:
...     fl[key] = fl.intersect_met(met[key])
>>> # Show the final three columns of the dataframe
>>> fl.dataframe.iloc[:, -3:].head()
                 time  air_temperature  specific_humidity
0 2022-03-01 00:00:00       225.777946           0.000132
1 2022-03-01 00:13:20       225.139084           0.000132
2 2022-03-01 00:26:40       226.231218           0.000107
3 2022-03-01 00:40:00       226.318315           0.000171
4 2022-03-01 00:53:20       225.561022           0.000109

property level

Get pressure level values for points.

Automatically calculates pressure level from the altitude key using units.m_to_pl().

Note that if level key exists in data, the data at the level key will be returned. This allows an override of the default calculation of pressure level from altitude.

Returns:

npt.NDArray[np.float64] – Point pressure level values, [\(hPa\)]
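
The altitude and level properties are two views of the same vertical coordinate. A scalar sketch of the conversion in both directions, based on the ISA barometric formulas below 20 km, is given here. The function names are hypothetical, and it is an assumption that units.m_to_pl() and units.pl_to_m() follow the same formulas and constants.

```python
import math

P_MSL = 1013.25    # ISA sea-level pressure, [hPa]
T_MSL = 288.15     # ISA sea-level temperature, [K]
LAPSE = 0.0065     # tropospheric lapse rate, [K/m]
H_TROP = 11000.0   # tropopause altitude, [m]
T_TROP = 216.65    # tropopause temperature, [K]
G = 9.80665        # gravitational acceleration, [m/s^2]
R_D = 287.05287    # specific gas constant of dry air, [J/(kg K)]
EXPONENT = G / (R_D * LAPSE)
P_TROP = P_MSL * (T_TROP / T_MSL) ** EXPONENT  # pressure at the tropopause, [hPa]


def altitude_to_level(altitude_m: float) -> float:
    """Pressure level [hPa] at a given ISA altitude [m]."""
    if altitude_m <= H_TROP:
        return P_MSL * (1.0 - LAPSE * altitude_m / T_MSL) ** EXPONENT
    # Isothermal layer: pressure decays exponentially with height.
    return P_TROP * math.exp(-G * (altitude_m - H_TROP) / (R_D * T_TROP))


def level_to_altitude(level_hpa: float) -> float:
    """ISA altitude [m] at a given pressure level [hPa]."""
    if level_hpa >= P_TROP:
        return (T_MSL / LAPSE) * (1.0 - (level_hpa / P_MSL) ** (1.0 / EXPONENT))
    return H_TROP - (R_D * T_TROP / G) * math.log(level_hpa / P_TROP)
```

Under these assumptions, a level of 200 hPa maps to roughly 38,661 ft, in line with the to_dict() example elsewhere on this page.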

required_keys = ('longitude', 'latitude', 'time')

Required keys for creating GeoVectorDataset

to_geojson_points()

Return dataset as GeoJSON FeatureCollection of Points.

Each Feature has a properties attribute that includes time and other data besides latitude, longitude, and altitude in data.

Returns:

dict[str, Any] – Python representation of GeoJSON FeatureCollection
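
The shape of the returned structure can be sketched with the standard library alone. The function name points_feature_collection is hypothetical, and the exact property names and rounding used by to_geojson_points() are not reproduced here.

```python
def points_feature_collection(data: dict) -> dict:
    """Build a GeoJSON FeatureCollection of Points from column-oriented data."""
    coord_keys = ("longitude", "latitude", "altitude")
    features = []
    for i in range(len(data["longitude"])):
        features.append(
            {
                "type": "Feature",
                # GeoJSON positions are ordered [longitude, latitude, altitude].
                "geometry": {
                    "type": "Point",
                    "coordinates": [data[k][i] for k in coord_keys],
                },
                # Everything that is not a coordinate becomes a property.
                "properties": {
                    k: v[i] for k, v in data.items() if k not in coord_keys
                },
            }
        )
    return {"type": "FeatureCollection", "features": features}


fc = points_feature_collection(
    {
        "longitude": [-100.0, -110.0],
        "latitude": [40.0, 50.0],
        "altitude": [11000.0, 11000.0],
        "time": [1577869200, 1577871000],
    }
)
```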

to_lon_lat_grid(agg, *, spatial_bbox=(-180.0, -90.0, 180.0, 90.0), spatial_grid_res=0.5)

Convert vectors to a longitude-latitude grid.

to_pseudo_mercator(copy=True)

Convert data from attrs["crs"] to Pseudo Mercator (EPSG:3857).

Parameters:

copy (bool, optional) – Copy data on transformation. Defaults to True.

Returns:

GeoVectorDatasetType
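
For reference, the forward projection from WGS 84 longitude and latitude to Pseudo Mercator can be written in closed form. The sketch below is a scalar illustration with a hypothetical function name; the method itself is documented as going through a CRS transformation rather than this formula.

```python
import math

EARTH_RADIUS = 6378137.0  # WGS 84 semi-major axis, [m]


def lon_lat_to_pseudo_mercator(lon_deg: float, lat_deg: float) -> tuple:
    """Project WGS 84 degrees to EPSG:3857 metres (spherical Mercator)."""
    x = EARTH_RADIUS * math.radians(lon_deg)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y
```

The formula diverges at the poles, which is why Pseudo Mercator is normally restricted to latitudes within about 85 degrees of the equator.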

transform_crs(crs, copy=True)

Transform trajectory data from one coordinate reference system (CRS) to another.

Parameters:
  • crs (str) – Target CRS. Passed to pyproj.Transformer. The source CRS is inferred from the attrs["crs"] attribute.

  • copy (bool, optional) – Copy data on transformation. Defaults to True.

Returns:

GeoVectorDatasetType – Converted dataset with new coordinate reference system. attrs["crs"] reflects new crs.

vertical_keys = ('altitude', 'level', 'altitude_ft')

At least one of these vertical-coordinate keys must also be included

class pycontrails.core.vector.VectorDataDict(data=None)

Bases: dict[str, ndarray]

Thin wrapper around dict[str, np.ndarray] to ensure consistency.

Parameters:

data (dict[str, np.ndarray], optional) – Dictionary input

setdefault(k, default=None)

Thin wrapper around dict.setdefault.

The main purpose of overriding is to run _validate_array() on set.

Parameters:
  • k (str) – Key

  • default (npt.ArrayLike, optional) – Default value for key k

Returns:

Any – Value at k

update(other=None, **kwargs)

Update values without warning if overwriting.

This method casts values in other to numpy.ndarray and ensures that the array sizes are consistent with the instance.

Parameters:
  • other (dict[str, npt.ArrayLike] | None, optional) – Fields to update as dict

  • **kwargs (npt.ArrayLike) – Fields to update as kwargs
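
The size-consistency check can be illustrated with a list-backed sketch. The class SizedDict is hypothetical and uses lists where VectorDataDict uses numpy.ndarray; the error type and message are assumptions.

```python
class SizedDict(dict):
    """Hypothetical dict that requires every value to have the same length."""

    def __init__(self, data=None):
        super().__init__()
        self._size = None
        for key, values in (data or {}).items():
            self[key] = values

    def _validate(self, values):
        values = list(values)  # cast the input, analogous to casting to an array
        if self._size is None:
            self._size = len(values)  # the first value fixes the size
        elif len(values) != self._size:
            raise ValueError(f"Expected length {self._size}, got {len(values)}")
        return values

    def __setitem__(self, key, values):
        super().__setitem__(key, self._validate(values))

    def update(self, other=None, **kwargs):
        # Route every item through __setitem__ so that it gets validated.
        for key, values in {**(other or {}), **kwargs}.items():
            self[key] = values


d = SizedDict({"a": [1, 2, 3]})
d.update(b=(4, 5, 6))  # accepted: same length as "a"
```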

class pycontrails.core.vector.VectorDataset(data=None, *, attrs=None, copy=True, **attrs_kwargs)

Bases: object

Base class to hold 1D arrays of consistent size.

Parameters:
  • data (dict[str, npt.ArrayLike] | pd.DataFrame | VectorDataDict | VectorDataset | None, optional) – Initial data, by default None

  • attrs (dict[str, Any] | AttrDict, optional) – Dictionary of attributes, by default None

  • copy (bool, optional) – Copy data on class creation, by default True

  • **attrs_kwargs (Any) – Additional attributes passed as keyword arguments

Raises:

ValueError – If “time” variable cannot be converted to numpy array.

attrs

Generic dataset attributes

broadcast_attrs(keys, overwrite=False, raise_error=True)

Attach values from keys in attrs onto data.

If possible, use dtype = np.float32 when broadcasting. If not possible, use whatever dtype is inferred from the data by numpy.full().

Parameters:
  • keys (str | Iterable[str]) – Keys to broadcast

  • overwrite (bool, optional) – If True, overwrite existing values in data. By default False.

  • raise_error (bool, optional) – Raise KeyError if self.attrs does not contain some of keys.

Raises:

KeyError – Not all keys found in attrs.
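
Broadcasting here means repeating a scalar attribute once per data point. The sketch below uses plain dicts and lists, so the float32 preference is not modelled; the function name broadcast_attrs_sketch is hypothetical, and silently skipping existing keys when overwrite is False is an assumption.

```python
def broadcast_attrs_sketch(data, attrs, keys, overwrite=False, raise_error=True):
    """Copy scalar values from attrs into data as constant columns (in place)."""
    keys = (keys,) if isinstance(keys, str) else tuple(keys)
    missing = [k for k in keys if k not in attrs]
    if missing and raise_error:
        raise KeyError(f"Keys not found in attrs: {missing}")
    # All columns share one length; an empty dataset has size 0.
    size = len(next(iter(data.values()), []))
    for key in keys:
        if key in missing or (key in data and not overwrite):
            continue
        data[key] = [attrs[key]] * size


data = {"longitude": [0.0, 1.0, 2.0], "wingspan": [1.0, 1.0, 1.0]}
attrs = {"wingspan": 35.0, "n_engine": 2}
broadcast_attrs_sketch(data, attrs, ["wingspan", "n_engine"])
```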

broadcast_numeric_attrs(ignore_keys=None, overwrite=False)

Attach numeric values in attrs onto data.

Iterate through values in attrs and attach float and int values to data.

This method modifies object in place.

Parameters:
  • ignore_keys (str | Iterable[str], optional) – Do not broadcast selected keys. Defaults to None.

  • overwrite (bool, optional) – If True, overwrite existing values in data. By default False.

copy(**kwargs)

Return a copy of this VectorDatasetType class.

Parameters:

**kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.

Returns:

VectorDatasetType – Copy of class

classmethod create_empty(keys, attrs=None, **attrs_kwargs)

Create instance with variables defined by keys and size 0.

If the instance requires additional variables to be defined, these keys will automatically be attached to the returned instance.

Parameters:
  • keys (Iterable[str]) – Keys to include in empty VectorDataset instance.

  • attrs (dict[str, Any] | None, optional) – Attributes to attach to the instance.

  • **attrs_kwargs (Any) – Define attributes as keyword arguments.

Returns:

VectorDatasetType – Empty VectorDataset instance.

data

Vector data with labels as keys and numpy.ndarray as values

property dataframe

Shorthand property to access to_dataframe() with copy=False.

Returns:

pandas.DataFrame – Equivalent to the output from to_dataframe()

ensure_vars(vars, raise_error=True)

Ensure variables exist as columns of data or as keys of attrs.

Parameters:
  • vars (str | Iterable[str]) – A single string variable name or a sequence of string variable names.

  • raise_error (bool, optional) – Raise KeyError if data does not contain variables. Defaults to True.

Returns:

bool – True if all variables exist. False otherwise.

Raises:

KeyError – Raises when dataset does not contain variable in vars

filter(mask, copy=True, **kwargs)

Filter data according to a boolean array mask.

Entries corresponding to mask == True are kept.

Parameters:
  • mask (npt.NDArray[np.bool_]) – Boolean array with compatible shape.

  • copy (bool, optional) – Copy data on filter. Defaults to True. See numpy best practices for insight into whether copy is appropriate.

  • **kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.

Returns:

VectorDatasetType – Containing filtered data

Raises:

TypeError – If mask is not a boolean array.
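
The mask semantics can be sketched with the standard library. The function filter_columns is hypothetical and operates on a plain dict of lists instead of a VectorDataset.

```python
from itertools import compress


def filter_columns(data: dict, mask: list) -> dict:
    """Keep the entries of every column where the mask is True."""
    if not all(isinstance(flag, bool) for flag in mask):
        raise TypeError("mask must contain only boolean values")
    # compress() yields the items whose corresponding mask entry is True.
    return {key: list(compress(values, mask)) for key, values in data.items()}


filtered = filter_columns(
    {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]},
    [True, False, True],
)
```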

classmethod from_dict(obj, copy=True, **obj_kwargs)

Create instance from dict representation containing data and attrs.

Parameters:
  • obj (dict[str, Any]) – Dict representation of VectorDataset (e.g. to_dict())

  • copy (bool, optional) – Passed to VectorDataset constructor. Defaults to True.

  • **obj_kwargs (Any) – Additional properties passed as keyword arguments.

Returns:

VectorDatasetType – VectorDataset instance.

See also

to_dict()

generate_splits(n_splits, copy=True)

Split instance into n_splits sub-vectors.

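Parameters:
  • n_splits (int) – Number of sub-vectors to generate.

  • copy (bool, optional) – Copy data when splitting. Defaults to True.

Returns: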

Generator[VectorDatasetType, None, None] – Generator of split vectors.
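
One way to picture the splitting is as contiguous chunks whose sizes differ by at most one, in the manner of numpy.array_split. The sketch below is written under that assumption, with hypothetical function names and a plain dict of lists as input.

```python
def split_indices(size: int, n_splits: int):
    """Yield n_splits contiguous index ranges covering range(size)."""
    base, extra = divmod(size, n_splits)
    start = 0
    for i in range(n_splits):
        # The first `extra` chunks receive one additional element.
        stop = start + base + (1 if i < extra else 0)
        yield range(start, stop)
        start = stop


def generate_splits_sketch(data: dict, n_splits: int):
    """Lazily yield one dict of columns per split."""
    size = len(next(iter(data.values()), []))
    for indices in split_indices(size, n_splits):
        yield {key: [values[i] for i in indices] for key, values in data.items()}


splits = list(generate_splits_sketch({"a": list(range(7))}, 3))
```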

get(key, default_value=None)

Get values from data with default_value if key not in data.

Parameters:
  • key (str) – Key to get from data

  • default_value (Any, optional) – Return default_value if key not in data, by default None

Returns:

Any – Values at data[key] or default_value

get_data_or_attr(key, default=<object object>)

Get value from data or attrs.

This method first checks if key is in data and returns the value if so. If key is not in data, then this method checks if key is in attrs and returns the value if so. If key is not in data or attrs, then the default value is returned if provided. Otherwise a KeyError is raised.

Parameters:
  • key (str) – Key to get from data or attrs

  • default (Any, optional) – Default value to return if key is not in data or attrs.

Returns:

Any – Value at data[key] or attrs[key]

Raises:

KeyError – If key is not in data or attrs and default is not provided.

Examples

>>> from pycontrails import VectorDataset
>>> vector = VectorDataset({"a": [1, 2, 3]}, attrs={"b": 4})
>>> vector.get_data_or_attr("a")
array([1, 2, 3])
>>> vector.get_data_or_attr("b")
4
>>> vector.get_data_or_attr("c")
Traceback (most recent call last):
...
KeyError: "Key 'c' not found in data or attrs."
>>> vector.get_data_or_attr("c", default=5)
5
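
The default=<object object> in the signature is the usual sentinel pattern: a private object marks "no default supplied", so that None remains a legitimate default. A stand-alone sketch with plain dicts follows; the names _MISSING and lookup are hypothetical.

```python
_MISSING = object()  # unique sentinel, distinct from every user-supplied value


def lookup(data: dict, attrs: dict, key: str, default=_MISSING):
    """Return data[key], else attrs[key], else default; raise if no default was given."""
    if key in data:
        return data[key]
    if key in attrs:
        return attrs[key]
    if default is not _MISSING:
        return default
    raise KeyError(f"Key '{key}' not found in data or attrs.")


value = lookup({"a": [1, 2, 3]}, {"b": 4}, "c", default=None)  # None is returned, no error
```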

property hash

Generate a unique hash for this class instance.

Returns:

str – Unique hash for flight instance (sha1)
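
A content hash of this kind can be sketched with hashlib. What exactly pycontrails feeds into the hash is not stated here, so the JSON serialisation below is purely an assumption, and data_hash is a hypothetical name.

```python
import hashlib
import json


def data_hash(data: dict) -> str:
    """Return a SHA-1 hex digest of column data, independent of key order."""
    # sort_keys makes the serialisation deterministic across insertion orders.
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


h1 = data_hash({"a": [1, 2, 3], "b": [4, 5, 6]})
h2 = data_hash({"b": [4, 5, 6], "a": [1, 2, 3]})
h3 = data_hash({"a": [1, 2, 3], "b": [4, 5, 7]})
```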

select(keys, copy=True)

Return new class instance only containing specified keys.

Parameters:
  • keys (Iterable[str]) – An iterable of keys to filter by.

  • copy (bool, optional) – Copy data on selection. Defaults to True.

Returns:

VectorDataset – VectorDataset containing only data associated to keys. Note that this method always returns a VectorDataset, even if the calling class is a proper subclass of VectorDataset.

setdefault(key, default=None)

Shortcut to VectorDataDict.setdefault().

Parameters:
  • key (str) – Key in data dict.

  • default (npt.ArrayLike, optional) – Values to use as default, if key is not defined

Returns:

numpy.ndarray – Values at key

property shape

Shape of each array in data.

Returns:

tuple[int] – Shape of each array in data.

property size

Length of each array in data.

Returns:

int – Length of each array in data.

sort(by)

Sort data by key(s).

This method always creates a copy of the data by calling pandas.DataFrame.sort_values().

Parameters:

by (str | list[str]) – Key or list of keys to sort by.

Returns:

VectorDatasetType – Instance with sorted data.
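
Sorting several same-length columns by one or more keys reduces to computing a single row order and applying it to every column. The sketch below does this with plain lists; sort_columns is a hypothetical name, and the documented method delegates to pandas instead.

```python
def sort_columns(data: dict, by) -> dict:
    """Return a new dict of columns with rows ordered by the key(s) in `by`."""
    keys = [by] if isinstance(by, str) else list(by)
    size = len(next(iter(data.values()), []))
    # Row order: compare rows by the tuple of their values in the sort keys.
    order = sorted(range(size), key=lambda i: tuple(data[k][i] for k in keys))
    return {key: [values[i] for i in order] for key, values in data.items()}


original = {"time": [3, 1, 2], "label": ["x", "y", "z"]}
ordered = sort_columns(original, "time")
```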

classmethod sum(vectors, infer_attrs=True, fill_value=None)

Sum (concatenate) a list of VectorDataset instances.

Parameters:
  • vectors (Sequence[VectorDataset]) – List of VectorDataset instances to concatenate.

  • infer_attrs (bool, optional) – If True, infer attributes from the first element in the sequence.

  • fill_value (float, optional) – Fill value to use when concatenating arrays. By default None, which raises an error if incompatible keys are found.

Returns:

VectorDataset – Sum of all instances in vectors.

Raises:

KeyError – If incompatible data keys are found among vectors.

Examples

>>> from pycontrails import VectorDataset
>>> v1 = VectorDataset({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> v2 = VectorDataset({"a": [7, 8, 9], "b": [10, 11, 12]})
>>> v3 = VectorDataset({"a": [13, 14, 15], "b": [16, 17, 18]})
>>> v = VectorDataset.sum([v1, v2, v3])
>>> v.dataframe
    a   b
0   1   4
1   2   5
2   3   6
3   7  10
4   8  11
5   9  12
6  13  16
7  14  17
8  15  18
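
The role of fill_value when the inputs do not share the same keys can be sketched with plain dicts of lists. The function concat_columns is hypothetical and only mirrors the documented behaviour: raise KeyError when fill_value is None, otherwise pad the missing column.

```python
def concat_columns(vectors: list, fill_value=None) -> dict:
    """Concatenate column dicts row-wise, padding missing keys with fill_value."""
    all_keys = []
    for vector in vectors:
        for key in vector:
            if key not in all_keys:
                all_keys.append(key)
    out = {key: [] for key in all_keys}
    for vector in vectors:
        size = len(next(iter(vector.values()), []))
        for key in all_keys:
            if key in vector:
                out[key].extend(vector[key])
            elif fill_value is None:
                raise KeyError(f"Incompatible data keys: '{key}' is missing")
            else:
                # Pad with one fill value per row of this vector.
                out[key].extend([fill_value] * size)
    return out


combined = concat_columns(
    [{"a": [1, 2]}, {"a": [3], "b": [4.0]}],
    fill_value=float("nan"),
)
```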

to_dataframe(copy=True)

Create pd.DataFrame in which each key-value pair in data is a column.

With the default copy=True, data values are copied on DataFrame creation. Pass copy=False to avoid the copy.

Parameters:

copy (bool, optional) – Copy data on DataFrame creation.

Returns:

pandas.DataFrame – DataFrame holding key-values as columns.

to_dict()

Create dictionary with data and attrs.

If geo-spatial coordinates (e.g. "latitude", "longitude", "altitude") are present, round to a reasonable precision. If a "time" variable is present, round to unix seconds. When the instance is a GeoVectorDataset, disregard any "altitude" or "level" coordinate and only include "altitude_ft" in the output.

Returns:

dict[str, Any] – Dictionary with data and attrs.

See also

from_dict()

Examples

>>> import pprint
>>> import numpy as np
>>> from pycontrails import Flight
>>> fl = Flight(
...     longitude=[-100, -110],
...     latitude=[40, 50],
...     level=[200, 200],
...     time=[np.datetime64("2020-01-01T09"), np.datetime64("2020-01-01T09:30")],
...     aircraft_type="B737",
... )
>>> fl = fl.resample_and_fill("5min")
>>> pprint.pprint(fl.to_dict())
{'aircraft_type': 'B737',
 'altitude_ft': [38661.0, 38661.0, 38661.0, 38661.0, 38661.0, 38661.0, 38661.0],
 'crs': 'EPSG:4326',
 'latitude': [40.0, 41.724, 43.428, 45.111, 46.769, 48.399, 50.0],
 'longitude': [-100.0,
               -101.441,
               -102.959,
               -104.563,
               -106.267,
               -108.076,
               -110.0],
 'time': [1577869200,
          1577869500,
          1577869800,
          1577870100,
          1577870400,
          1577870700,
          1577871000]}

update(other=None, **kwargs)

Update values in data dict without warning if overwriting.

Parameters:
  • other (dict[str, npt.ArrayLike] | None, optional) – Fields to update as dict

  • **kwargs (npt.ArrayLike) – Fields to update as kwargs

class pycontrails.core.vector.VectorDatasetType

Vector types

alias of TypeVar('VectorDatasetType', bound=VectorDataset)

pycontrails.core.vector.vector_to_lon_lat_grid(vector, agg, *, spatial_bbox=(-180.0, -90.0, 180.0, 90.0), spatial_grid_res=0.5)

Convert vectors to a longitude-latitude grid.

Parameters:
  • vector (GeoVectorDataset) – Contains the longitude, latitude and variables for aggregation.

  • agg (dict[str, str]) – Variable name and the function selected for aggregation, e.g. {"segment_length": "sum"}.

  • spatial_bbox (tuple[float, float, float, float]) – Spatial bounding box, (lon_min, lat_min, lon_max, lat_max), [\(\deg\)]. By default, the entire globe is used.

  • spatial_grid_res (float) – Spatial grid resolution, [\(\deg\)]

Returns:

xarray.Dataset – Aggregated variables in a longitude-latitude grid.

Examples

>>> import numpy as np
>>> from pycontrails import GeoVectorDataset
>>> rng = np.random.default_rng(234)
>>> vector = GeoVectorDataset(
...     longitude=rng.uniform(-10, 10, 10000),
...     latitude=rng.uniform(-10, 10, 10000),
...     altitude=np.zeros(10000),
...     time=np.zeros(10000).astype("datetime64[ns]"),
... )
>>> vector["foo"] = rng.uniform(0, 1, 10000)
>>> ds = vector.to_lon_lat_grid({"foo": "sum"}, spatial_bbox=(-10, -10, 9.5, 9.5))
>>> da = ds["foo"]
>>> da.coords
Coordinates:
  * longitude  (longitude) float64 320B -10.0 -9.5 -9.0 -8.5 ... 8.0 8.5 9.0 9.5
  * latitude   (latitude) float64 320B -10.0 -9.5 -9.0 -8.5 ... 8.0 8.5 9.0 9.5
>>> da.values.round(2)
array([[2.23, 0.67, 1.29, ..., 4.66, 3.91, 1.93],
       [4.1 , 3.84, 1.34, ..., 3.24, 1.71, 4.55],
       [0.78, 3.25, 2.33, ..., 3.78, 2.93, 2.33],
       ...,
       [1.97, 3.02, 1.84, ..., 2.37, 3.87, 2.09],
       [3.74, 1.6 , 4.01, ..., 4.6 , 4.27, 3.4 ],
       [2.97, 0.12, 1.33, ..., 3.54, 0.74, 2.59]])
>>> da.sum().item() == vector["foo"].sum()
True
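
The aggregation step can be pictured as assigning each point to a grid cell and reducing per cell. The pure-Python sketch below implements only the "sum" aggregation and labels each cell by its lower-left corner; the function name sum_on_grid and the binning rule are assumptions, and bounding-box clipping is not modelled.

```python
import math


def sum_on_grid(lon, lat, values, origin=(-180.0, -90.0), res=0.5) -> dict:
    """Sum values per grid cell, keyed by the cell's lower-left (lon, lat) corner."""
    lon_min, lat_min = origin
    grid = {}
    for x, y, value in zip(lon, lat, values):
        # Snap each coordinate down to the nearest grid line at or below it.
        cell = (
            lon_min + res * math.floor((x - lon_min) / res),
            lat_min + res * math.floor((y - lat_min) / res),
        )
        grid[cell] = grid.get(cell, 0.0) + value
    return grid


grid = sum_on_grid(
    lon=[0.1, 0.4, 0.6],
    lat=[0.1, 0.2, 0.1],
    values=[1.0, 2.0, 4.0],
)
```

As in the doctest above, the total over all cells equals the total of the input values.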