pycontrails.VectorDataset

class pycontrails.VectorDataset(data=None, *, attrs=None, copy=True, **attrs_kwargs)

Bases: object

Base class to hold 1D arrays of consistent size.

Parameters:
  • data (dict[str, npt.ArrayLike] | pd.DataFrame | VectorDataDict | VectorDataset | None, optional) – Initial data, by default None

  • attrs (dict[str, Any] | AttrDict, optional) – Dictionary of attributes, by default None

  • copy (bool, optional) – Copy data on class creation, by default True

  • **attrs_kwargs (Any) – Additional attributes passed as keyword arguments

Raises:

ValueError – If the "time" variable cannot be converted to a numpy array.

__init__(data=None, *, attrs=None, copy=True, **attrs_kwargs)

Methods

__init__([data, attrs, copy])

broadcast_attrs(keys[, overwrite, raise_error])

Attach values from keys in attrs onto data.

broadcast_numeric_attrs([ignore_keys, overwrite])

Attach numeric values in attrs onto data.

copy(**kwargs)

Return a copy of this instance.

create_empty(keys[, attrs])

Create instance with variables defined by keys and size 0.

ensure_vars(vars[, raise_error])

Ensure variables exist as keys in data or attrs.

filter(mask[, copy])

Filter data according to a boolean array mask.

from_dict(obj[, copy])

Create instance from dict representation containing data and attrs.

generate_splits(n_splits[, copy])

Split instance into n_splits sub-vectors.

get(key[, default_value])

Get values from data with default_value if key not in data.

get_data_or_attr(key[, default])

Get value from data or attrs.

select(keys[, copy])

Return new class instance only containing specified keys.

setdefault(key[, default])

Shortcut to VectorDataDict.setdefault().

sort(by)

Sort data by key(s).

sum(vectors[, infer_attrs, fill_value])

Sum a list of VectorDataset instances.

to_dataframe([copy])

Create pd.DataFrame in which each key-value pair in data is a column.

to_dict()

Create dictionary with data and attrs.

update([other])

Update values in data dict without warning if overwriting.

Attributes

data

Vector data with labels as keys and numpy.ndarray as values

attrs

Generic dataset attributes

dataframe

Shorthand property to access to_dataframe() with copy=False.

hash

Generate a unique hash for this class instance.

shape

Shape of each array in data.

size

Length of each array in data.

attrs

Generic dataset attributes

broadcast_attrs(keys, overwrite=False, raise_error=True)

Attach values from keys in attrs onto data.

If possible, use dtype = np.float32 when broadcasting. If not possible, use whatever dtype is inferred from the data by numpy.full().

Parameters:
  • keys (str | Iterable[str]) – Keys to broadcast

  • overwrite (bool, optional) – If True, overwrite existing values in data. By default False.

  • raise_error (bool, optional) – Raise KeyError if self.attrs does not contain all of keys. By default True.

Raises:

KeyError – Not all keys found in attrs.

broadcast_numeric_attrs(ignore_keys=None, overwrite=False)

Attach numeric values in attrs onto data.

Iterate through values in attrs and attach float and int values to data.

This method modifies the object in place.

Parameters:
  • ignore_keys (str | Iterable[str], optional) – Do not broadcast selected keys. Defaults to None.

  • overwrite (bool, optional) – If True, overwrite existing values in data. By default False.

copy(**kwargs)

Return a copy of this instance.

Parameters:

**kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.

Returns:

Self – Copy of this instance

classmethod create_empty(keys, attrs=None, **attrs_kwargs)

Create instance with variables defined by keys and size 0.

If instance requires additional variables to be defined, these keys will automatically be attached to returned instance.

Parameters:
  • keys (Iterable[str]) – Keys to include in empty VectorDataset instance.

  • attrs (dict[str, Any] | None, optional) – Attributes to attach to instance.

  • **attrs_kwargs (Any) – Define attributes as keyword arguments.

Returns:

Self – Empty VectorDataset instance.

data

Vector data with labels as keys and numpy.ndarray as values

property dataframe

Shorthand property to access to_dataframe() with copy=False.

Returns:

pandas.DataFrame – Equivalent to the output from to_dataframe()

ensure_vars(vars, raise_error=True)

Ensure variables exist as keys in data or attrs.

Parameters:
  • vars (str | Iterable[str]) – A single string variable name or a sequence of string variable names.

  • raise_error (bool, optional) – Raise KeyError if any variable is missing from both data and attrs. Defaults to True.

Returns:

bool – True if all variables exist. False otherwise.

Raises:

KeyError – Raised when the dataset does not contain a variable in vars and raise_error is True.

filter(mask, copy=True, **kwargs)

Filter data according to a boolean array mask.

Entries corresponding to mask == True are kept.

Parameters:
  • mask (npt.NDArray[np.bool_]) – Boolean array with compatible shape.

  • copy (bool, optional) – Copy data on filter. Defaults to True. See numpy best practices for insight into whether copy is appropriate.

  • **kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.

Returns:

Self – Instance containing the filtered data

Raises:

TypeError – If mask is not a boolean array.

classmethod from_dict(obj, copy=True, **obj_kwargs)

Create instance from dict representation containing data and attrs.

Parameters:
  • obj (dict[str, Any]) – Dict representation of VectorDataset (e.g. to_dict())

  • copy (bool, optional) – Passed to VectorDataset constructor. Defaults to True.

  • **obj_kwargs (Any) – Additional properties passed as keyword arguments.

Returns:

Self – VectorDataset instance.

See also

to_dict()

generate_splits(n_splits, copy=True)

Split instance into n_splits sub-vectors.

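Parameters:
  • n_splits (int) – Number of sub-vectors to generate.

  • copy (bool, optional) – Copy data when building each sub-vector. Defaults to True.

Returns: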

Generator[Self, None, None] – Generator of split vectors.

get(key, default_value=None)

Get values from data with default_value if key not in data.

Parameters:
  • key (str) – Key to get from data

  • default_value (Any, optional) – Return default_value if key not in data, by default None

Returns:

Any – Values at data[key] or default_value

get_data_or_attr(key, default=<object object>)

Get value from data or attrs.

This method first checks if key is in data and returns the value if so. If key is not in data, then this method checks if key is in attrs and returns the value if so. If key is not in data or attrs, then the default value is returned if provided. Otherwise a KeyError is raised.

Parameters:
  • key (str) – Key to get from data or attrs

  • default (Any, optional) – Default value to return if key is not in data or attrs.

Returns:

Any – Value at data[key] or attrs[key]

Raises:

KeyError – If key is not in data or attrs and default is not provided.

Examples

>>> from pycontrails import VectorDataset
>>> vector = VectorDataset({"a": [1, 2, 3]}, attrs={"b": 4})
>>> vector.get_data_or_attr("a")
array([1, 2, 3])
>>> vector.get_data_or_attr("b")
4
>>> vector.get_data_or_attr("c")
Traceback (most recent call last):
...
KeyError: "Key 'c' not found in data or attrs."
>>> vector.get_data_or_attr("c", default=5)
5

property hash

Generate a unique hash for this class instance.

Returns:

str – Unique hash for this instance (sha1)

select(keys, copy=True)

Return new class instance only containing specified keys.

Parameters:
  • keys (Iterable[str]) – An iterable of keys to filter by.

  • copy (bool, optional) – Copy data on selection. Defaults to True.

Returns:

VectorDataset – VectorDataset containing only data associated with keys. Note that this method always returns a VectorDataset, even if the calling class is a proper subclass of VectorDataset.

setdefault(key, default=None)

Shortcut to VectorDataDict.setdefault().

Parameters:
  • key (str) – Key in data dict.

  • default (npt.ArrayLike, optional) – Values to use as default, if key is not defined

Returns:

numpy.ndarray – Values at key

property shape

Shape of each array in data.

Returns:

tuple[int] – Shape of each array in data.

property size

Length of each array in data.

Returns:

int – Length of each array in data.

sort(by)

Sort data by key(s).

This method always creates a copy of the data by calling pandas.DataFrame.sort_values().

Parameters:

by (str | list[str]) – Key or list of keys to sort by.

Returns:

Self – Instance with sorted data.

classmethod sum(vectors, infer_attrs=True, fill_value=None)

Sum a list of VectorDataset instances. Here, summing means concatenating the data arrays of each instance end to end.

Parameters:
  • vectors (Sequence[VectorDataset]) – List of VectorDataset instances to concatenate.

  • infer_attrs (bool, optional) – If True, infer attributes from the first element in the sequence.

  • fill_value (float, optional) – Fill value to use when concatenating arrays. By default None, which raises an error if incompatible keys are found.

Returns:

VectorDataset – Sum of all instances in vectors.

Raises:

KeyError – If incompatible data keys are found among vectors.

Examples

>>> from pycontrails import VectorDataset
>>> v1 = VectorDataset({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> v2 = VectorDataset({"a": [7, 8, 9], "b": [10, 11, 12]})
>>> v3 = VectorDataset({"a": [13, 14, 15], "b": [16, 17, 18]})
>>> v = VectorDataset.sum([v1, v2, v3])
>>> v.dataframe
    a   b
0   1   4
1   2   5
2   3   6
3   7  10
4   8  11
5   9  12
6  13  16
7  14  17
8  15  18

to_dataframe(copy=True)

Create pd.DataFrame in which each key-value pair in data is a column.

Data values are copied by default, since copy defaults to True. Pass copy=False to avoid the copy where possible.

Parameters:

copy (bool, optional) – Copy data on DataFrame creation. Defaults to True.

Returns:

pandas.DataFrame – DataFrame holding key-values as columns.

to_dict()

Create dictionary with data and attrs.

If geo-spatial coordinates (e.g. "latitude", "longitude", "altitude") are present, round to a reasonable precision. If a "time" variable is present, round to unix seconds. When the instance is a GeoVectorDataset, disregard any "altitude" or "level" coordinate and only include "altitude_ft" in the output.

Returns:

dict[str, Any] – Dictionary with data and attrs.

See also

from_dict()

Examples

>>> import pprint
>>> import numpy as np
>>> from pycontrails import Flight
>>> fl = Flight(
...     longitude=[-100, -110],
...     latitude=[40, 50],
...     level=[200, 200],
...     time=[np.datetime64("2020-01-01T09"), np.datetime64("2020-01-01T09:30")],
...     aircraft_type="B737",
... )
>>> fl = fl.resample_and_fill("5min")
>>> pprint.pprint(fl.to_dict())
{'aircraft_type': 'B737',
 'altitude_ft': [38661.0, 38661.0, 38661.0, 38661.0, 38661.0, 38661.0, 38661.0],
 'latitude': [40.0, 41.724, 43.428, 45.111, 46.769, 48.399, 50.0],
 'longitude': [-100.0,
               -101.441,
               -102.959,
               -104.563,
               -106.267,
               -108.076,
               -110.0],
 'time': [1577869200,
          1577869500,
          1577869800,
          1577870100,
          1577870400,
          1577870700,
          1577871000]}

update(other=None, **kwargs)

Update values in data dict without warning if overwriting.

Parameters:
  • other (dict[str, npt.ArrayLike] | None, optional) – Fields to update as dict

  • **kwargs (npt.ArrayLike) – Fields to update as kwargs