pycontrails.VectorDataset

class pycontrails.VectorDataset(data=None, *, attrs=None, copy=True, **attrs_kwargs)

Bases: object

Base class to hold 1D arrays of consistent size.

Parameters:

data (dict[str, npt.ArrayLike] | pd.DataFrame | VectorDataset | None, optional) – Initial data, by default None. A shallow copy is always made. Use the copy parameter to copy the underlying array data.
attrs (dict[str, Any] | None, optional) – Dictionary of attributes, by default None. A shallow copy is always made.
copy (bool, optional) – Copy individual arrays on instantiation, by default True.
**attrs_kwargs (Any) – Additional attributes passed as keyword arguments.

Raises:

ValueError – If the “time” variable cannot be converted to a numpy array.

__init__(data=None, *, attrs=None, copy=True, **attrs_kwargs)

Methods

`__init__`([data, attrs, copy])
`broadcast_attrs`(keys[, overwrite, raise_error])	Attach values from `keys` in `attrs` onto `data`.
`broadcast_numeric_attrs`([ignore_keys, overwrite])	Attach numeric values in `attrs` onto `data`.
`copy`(**kwargs)	Return a copy of this instance.
`create_empty`(keys[, attrs])	Create instance with variables defined by `keys` and size 0.
`ensure_vars`(vars[, raise_error])	Ensure variables exist in column of `data` or `attrs`.
`filter`(mask[, copy])	Filter `data` according to a boolean array `mask`.
`from_dict`(obj[, copy])	Create instance from dict representation containing data and attrs.
`generate_splits`(n_splits[, copy])	Split instance into `n_splits` sub-vectors.
`get`(key[, default_value])	Get values from `data` with `default_value` if `key` not in `data`.
`get_constant`(key[, default])	Get a constant value from `attrs` or `data`.
`get_data_or_attr`(key[, default])	Get value from `data` or `attrs`.
`select`(keys[, copy])	Return new class instance only containing specified keys.
`setdefault`(key[, default])	Shortcut to `VectorDataDict.setdefault()`.
`sort`(by)	Sort data by key(s).
`sum`(vectors[, infer_attrs, fill_value])	Sum a list of `VectorDataset` instances.
`to_dataframe`([copy])	Create `pd.DataFrame` in which each key-value pair in `data` is a column.
`to_dict`()	Create dictionary with `data` and `attrs`.
`update`([other])	Update values in `data` dict without warning if overwriting.

Attributes

`attrs`	Generic dataset attributes
`data`	Vector data with labels as keys and `numpy.ndarray` as values
`dataframe`	Shorthand property to access `to_dataframe()` with `copy=False`.
`hash`	Generate a unique hash for this class instance.
`shape`	Shape of each array in `data`.
`size`	Length of each array in `data`.

attrs: Generic dataset attributes

broadcast_attrs(keys, overwrite=False, raise_error=True)

Attach values from keys in attrs onto data.

If possible, use dtype = np.float32 when broadcasting. If not possible, use whatever dtype is inferred from the data by numpy.full().

Parameters:

keys (str | Iterable[str]) – Keys to broadcast
overwrite (bool, optional) – If True, overwrite existing values in data. By default False.
raise_error (bool, optional) – Raise KeyError if self.attrs does not contain all of keys. By default True.

Raises:

KeyError – If raise_error is True and not all keys are found in attrs.
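The dtype rule described above can be sketched with plain numpy. This is a stand-in under stated assumptions, not the library's implementation: `broadcast_value` is a hypothetical helper, and the attribute names are made up for illustration.

```python
import numpy as np


def broadcast_value(value, size):
    """Hypothetical helper: expand a scalar into a 1D array of length ``size``."""
    try:
        # Prefer float32 whenever the value can be stored as a float
        return np.full(size, value, dtype=np.float32)
    except (TypeError, ValueError):
        # Otherwise let numpy.full infer the dtype from the value itself
        return np.full(size, value)


size = 3
attrs = {"wingspan": 35.8, "aircraft_type": "B737"}  # illustrative attrs
data = {"segment": np.arange(size)}

for key in ("wingspan", "aircraft_type"):
    data[key] = broadcast_value(attrs[key], size)

print(data["wingspan"].dtype)       # float32
print(data["aircraft_type"].dtype)  # a numpy unicode string dtype
```

Each broadcast array has the same length as the existing arrays, which keeps the "1D arrays of consistent size" invariant intact.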

broadcast_numeric_attrs(ignore_keys=None, overwrite=False)

Attach numeric values in attrs onto data.

Iterate through values in attrs and attach float and int values to data.

This method modifies the object in place.

Parameters:

ignore_keys (str | Iterable[str] | None, optional) – Do not broadcast selected keys. Defaults to None.
overwrite (bool, optional) – If True, overwrite existing values in data. By default False.
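As a rough illustration of the selection logic, the following numpy-only sketch picks the int and float entries of a plain dict, skips ignored keys, and leaves existing arrays alone when overwriting is disabled. `numeric_attr_keys` is a hypothetical helper, and excluding `bool` values is a choice made for this sketch rather than documented behavior.

```python
import numpy as np


def numeric_attr_keys(attrs, ignore_keys=None):
    """Hypothetical helper: keys in ``attrs`` whose values are int or float."""
    if ignore_keys is None:
        ignore = set()
    elif isinstance(ignore_keys, str):
        ignore = {ignore_keys}
    else:
        ignore = set(ignore_keys)
    return [
        key
        for key, value in attrs.items()
        # bool is a subclass of int; it is excluded here by assumption
        if isinstance(value, (int, float))
        and not isinstance(value, bool)
        and key not in ignore
    ]


size = 3
overwrite = False
attrs = {"wingspan": 35.8, "n_engine": 2, "aircraft_type": "B737", "flight_id": 17}
data = {"n_engine": np.array([4, 4, 4])}

for key in numeric_attr_keys(attrs, ignore_keys="flight_id"):
    if key in data and not overwrite:
        continue  # keep the array that is already present
    data[key] = np.full(size, attrs[key])

print(sorted(data))  # ['n_engine', 'wingspan']
```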

copy(**kwargs)

Return a copy of this instance.

Parameters:

**kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.

Returns:

Self – Copy of this instance.

classmethod create_empty(keys, attrs=None, **kwargs)

Create instance with variables defined by keys and size 0.

If instance requires additional variables to be defined, these keys will automatically be attached to returned instance.

Parameters:

keys (Iterable[str]) – Keys to include in empty VectorDataset instance.
attrs (dict[str, Any] | None, optional) – Attributes to attach to the instance.
**kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.

Returns:

Self – Empty VectorDataset instance.

data: Vector data with labels as keys and numpy.ndarray as values

property dataframe

Shorthand property to access to_dataframe() with copy=False.

Returns:

pandas.DataFrame – Equivalent to the output from to_dataframe()

ensure_vars(vars, raise_error=True)

Ensure variables exist as columns of data or as keys of attrs.

Parameters:

vars (str | Iterable[str]) – A single string variable name or a sequence of string variable names.
raise_error (bool, optional) – Raise KeyError if data does not contain variables. Defaults to True.

Returns:

bool – True if all variables exist. False otherwise.

Raises:

KeyError – If raise_error is True and the dataset does not contain a variable in vars.

filter(mask, copy=True, **kwargs)

Filter data according to a boolean array mask.

Entries corresponding to mask == True are kept.

Parameters:

mask (npt.NDArray[np.bool_]) – Boolean array with compatible shape.
copy (bool, optional) – Copy data on filter. Defaults to True. See numpy best practices for insight into whether copy is appropriate.
**kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.

Returns:

Self – Containing filtered data

Raises:

TypeError – If mask is not a boolean array.
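The boolean requirement matters because numpy treats an integer array as a list of positions rather than a keep/drop flag per entry. The sketch below shows this with a plain dict of arrays; `filter_arrays` is a hypothetical stand-in for the documented behavior, not the library code.

```python
import numpy as np


def filter_arrays(data, mask):
    """Hypothetical stand-in: keep entries of every array where ``mask`` is True."""
    mask = np.asarray(mask)
    if mask.dtype != bool:
        raise TypeError("mask must be a boolean array")
    # Boolean indexing applies the same selection to every array
    return {key: arr[mask] for key, arr in data.items()}


data = {
    "altitude": np.array([9000.0, 10000.0, 11000.0]),
    "segment": np.array([0, 1, 2]),
}

mask = data["altitude"] >= 10000.0
filtered = filter_arrays(data, mask)
print(filtered["segment"])  # [1 2]

# An integer "mask" would silently select positions 1, 0, 1 instead
print(data["segment"][np.array([1, 0, 1])])  # [1 0 1]
```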

classmethod from_dict(obj, copy=True, **obj_kwargs)

Create instance from dict representation containing data and attrs.

Parameters:

obj (dict[str, Any]) – Dict representation of VectorDataset (e.g. to_dict())
copy (bool, optional) – Passed to VectorDataset constructor. Defaults to True.
**obj_kwargs (Any) – Additional properties passed as keyword arguments.

Returns:

Self – VectorDataset instance.

See also

to_dict()

generate_splits(n_splits, copy=True)

Split instance into n_splits sub-vectors.

Parameters:

n_splits (int) – Number of splits.
copy (bool, optional) – Passed into filter(). Defaults to True. Keeping this as True is recommended, in line with numpy best practices.

Yields:

Self – Generator of split vectors.

See also

numpy.array_split()
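Assuming the splits follow the conventions of numpy.array_split(), as the cross-reference suggests, the sizes of the sub-vectors for an uneven split can be previewed with numpy alone. The sizes below describe numpy's behavior; whether the method matches it exactly is an assumption.

```python
import numpy as np

size = 7       # length of each array in the dataset
n_splits = 3

# array_split accepts sizes that do not divide evenly,
# giving the leading groups one extra element each
index_groups = np.array_split(np.arange(size), n_splits)
sizes = [len(group) for group in index_groups]

print(sizes)  # [3, 2, 2]
```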

get(key, default_value=None)

Get values from data with default_value if key not in data.

Parameters:

key (str) – Key to get from data
default_value (Any, optional) – Return default_value if key not in data, by default None

Returns:

Any – Values at data[key] or default_value

get_constant(key, default=<object object>)

Get a constant value from attrs or data.

If key is found in attrs, the value is returned.
If key is found in data, the common value is returned if all values are equal.
If key is not found in attrs or data and a default is provided, the default is returned.
Otherwise, a KeyError is raised.

Parameters:

key (str) – Key to look for.
default (Any, optional) – Default value to return if key is not found in attrs or data.

Returns:

Any – The constant value for key.

Raises:

KeyError – If default is not provided and either key is not found in attrs or data, or the values in data[key] are not all equal.

Examples

>>> from pycontrails import VectorDataset
>>> vector = VectorDataset({"a": [1, 1, 1], "b": [2, 2, 3]})
>>> vector.get_constant("a")
np.int64(1)
>>> vector.get_constant("b")
Traceback (most recent call last):
...
KeyError: "A constant key 'b' not found in attrs or data"
>>> vector.get_constant("b", 3)
3
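The `default=<object object>` in the signature indicates a sentinel default, which lets an explicit `default=None` be told apart from "no default given". The lookup order described above can be sketched as follows; `lookup_constant` and `_MISSING` are hypothetical names, and the code works on plain dicts rather than on a VectorDataset.

```python
import numpy as np

_MISSING = object()  # sentinel: distinguishes "not provided" from None


def lookup_constant(attrs, data, key, default=_MISSING):
    """Hypothetical stand-in for the documented lookup order."""
    if key in attrs:
        return attrs[key]
    if key in data:
        values = np.asarray(data[key])
        # Only a column whose entries are all equal counts as constant
        if values.size and np.all(values == values[0]):
            return values[0]
    if default is not _MISSING:
        return default
    raise KeyError(f"A constant key '{key}' not found in attrs or data")


attrs = {"aircraft_type": "B737"}
data = {"a": [1, 1, 1], "b": [2, 2, 3]}

print(lookup_constant(attrs, data, "aircraft_type"))    # B737
print(lookup_constant(attrs, data, "a"))                # 1
print(lookup_constant(attrs, data, "b", default=None))  # None
```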

get_data_or_attr(key, default=<object object>)

Get value from data or attrs.

This method first checks if key is in data and returns the value if so. If key is not in data, then this method checks if key is in attrs and returns the value if so. If key is not in data or attrs, then the default value is returned if provided. Otherwise a KeyError is raised.

Parameters:

key (str) – Key to get from data or attrs
default (Any, optional) – Default value to return if key is not in data or attrs.

Returns:

Any – Value at data[key] or attrs[key]

Raises:

KeyError – If key is not in data or attrs and default is not provided.

Examples

>>> from pycontrails import VectorDataset
>>> vector = VectorDataset({"a": [1, 2, 3]}, attrs={"b": 4})
>>> vector.get_data_or_attr("a")
array([1, 2, 3])

>>> vector.get_data_or_attr("b")
4

>>> vector.get_data_or_attr("c")
Traceback (most recent call last):
...
KeyError: "Key 'c' not found in data or attrs."

>>> vector.get_data_or_attr("c", default=5)
5

See also

get_constant

property hash

Generate a unique hash for this class instance.

Returns:

str – Unique hash for this instance (sha1)
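The documentation states only that the hash is a SHA-1 string; how the instance is serialized before hashing is not specified here. As a minimal sketch under that caveat, one way to derive a stable SHA-1 digest from a dict of numeric arrays is to hash a key-sorted JSON dump. `data_digest` is a hypothetical helper, and it handles numeric arrays only.

```python
import hashlib
import json

import numpy as np


def data_digest(data):
    """Hypothetical helper: SHA-1 hex digest of a dict of numeric arrays."""
    serializable = {key: np.asarray(arr).tolist() for key, arr in data.items()}
    # sort_keys makes the digest independent of dict insertion order
    payload = json.dumps(serializable, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


first = data_digest({"a": np.array([1, 2, 3]), "b": np.array([4.0, 5.0, 6.0])})
same = data_digest({"b": np.array([4.0, 5.0, 6.0]), "a": np.array([1, 2, 3])})
other = data_digest({"a": np.array([1, 2, 4]), "b": np.array([4.0, 5.0, 6.0])})

print(len(first))      # 40 hexadecimal characters
print(first == same)   # True
print(first == other)  # False
```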

select(keys, copy=True)

Return new class instance only containing specified keys.

Parameters:

keys (Iterable[str]) – An iterable of keys to filter by.
copy (bool, optional) – Copy data on selection. Defaults to True.

Returns:

VectorDataset – VectorDataset containing only data associated to keys. Note that this method always returns a VectorDataset, even if the calling class is a proper subclass of VectorDataset.

setdefault(key, default=None)

Shortcut to VectorDataDict.setdefault().

Parameters:

key (str) – Key in data dict.
default (npt.ArrayLike, optional) – Values to use as default, if key is not defined

Returns:

numpy.ndarray – Values at key

property shape

Shape of each array in data.

Returns:

tuple[int] – Shape of each array in data.

property size

Length of each array in data.

Returns:

int – Length of each array in data.

sort(by)

Sort data by key(s).

This method always creates a copy of the data by calling pandas.DataFrame.sort_values().

Parameters:

by (str | list[str]) – Key or list of keys to sort by.

Returns:

Self – Instance with sorted data.

classmethod sum(vectors, infer_attrs=True, fill_value=None)

Sum a list of VectorDataset instances.

Parameters:

vectors (Sequence[VectorDataset]) – List of VectorDataset instances to concatenate.
infer_attrs (bool, optional) – If True, infer attributes from the first element in the sequence. By default True.
fill_value (float | None, optional) – Fill value to use when concatenating arrays. By default None, which raises an error if incompatible keys are found.

Returns:

Self – Sum of all instances in vectors.

Raises:

KeyError – If incompatible data keys are found among vectors.

Examples

>>> from pycontrails import VectorDataset
>>> v1 = VectorDataset({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> v2 = VectorDataset({"a": [7, 8, 9], "b": [10, 11, 12]})
>>> v3 = VectorDataset({"a": [13, 14, 15], "b": [16, 17, 18]})
>>> v = VectorDataset.sum([v1, v2, v3])
>>> v.dataframe
    a   b
0   1   4
1   2   5
2   3   6
3   7  10
4   8  11
5   9  12
6  13  16
7  14  17
8  15  18
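The example above covers vectors that share the same keys. The role of fill_value when keys differ can be illustrated with a numpy-only sketch on plain dicts; `concat_with_fill` is a hypothetical stand-in for the documented behavior, not the library's implementation.

```python
import numpy as np


def concat_with_fill(datasets, fill_value=None):
    """Hypothetical stand-in: concatenate dicts of arrays key by key."""
    # Union of keys, in order of first appearance
    keys = list(dict.fromkeys(key for d in datasets for key in d))
    out = {}
    for key in keys:
        parts = []
        for d in datasets:
            if key in d:
                parts.append(np.asarray(d[key]))
            elif fill_value is None:
                raise KeyError(f"Key '{key}' is missing from one of the datasets")
            else:
                # Pad the missing column so every array keeps the same length
                size = len(next(iter(d.values())))
                parts.append(np.full(size, fill_value))
        out[key] = np.concatenate(parts)
    return out


d1 = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
d2 = {"a": [5.0]}  # no "b" column

merged = concat_with_fill([d1, d2], fill_value=np.nan)
print(merged["a"])  # [1. 2. 5.]
print(merged["b"])  # [ 3.  4. nan]
```

Without a fill value, the same call raises a KeyError, mirroring the error documented for incompatible keys.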

to_dataframe(copy=True)

Create pd.DataFrame in which each key-value pair in data is a column.

With copy=True, the default in the signature above, data values are copied on DataFrame creation. Pass copy=False to skip the copy.

Parameters:

copy (bool, optional) – Copy data on DataFrame creation. By default True.

Returns:

pandas.DataFrame – DataFrame holding key-values as columns.

to_dict()

Create dictionary with data and attrs.

If geo-spatial coordinates (e.g. "latitude", "longitude", "altitude") are present, round to a reasonable precision. If a "time" variable is present, round to unix seconds. When the instance is a GeoVectorDataset, disregard any "altitude" or "level" coordinate and only include "altitude_ft" in the output.

Returns:

dict[str, Any] – Dictionary with data and attrs.

See also

from_dict()

Examples

>>> import pprint
>>> import numpy as np
>>> from pycontrails import Flight
>>> fl = Flight(
...     longitude=[-100, -110],
...     latitude=[40, 50],
...     level=[200, 200],
...     time=[np.datetime64("2020-01-01T09"), np.datetime64("2020-01-01T09:30")],
...     aircraft_type="B737",
... )
>>> fl = fl.resample_and_fill("5min")
>>> pprint.pprint(fl.to_dict())
{'aircraft_type': 'B737',
 'altitude_ft': [38661.0, 38661.0, 38661.0, 38661.0, 38661.0, 38661.0, 38661.0],
 'latitude': [40.0, 41.724, 43.428, 45.111, 46.769, 48.399, 50.0],
 'longitude': [-100.0,
               -101.441,
               -102.959,
               -104.563,
               -106.267,
               -108.076,
               -110.0],
 'time': [1577869200,
          1577869500,
          1577869800,
          1577870100,
          1577870400,
          1577870700,
          1577871000]}
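The time and coordinate values in the output above can be reproduced with numpy alone. The sketch below converts timestamps to whole unix seconds and rounds a coordinate; the three-decimal precision is an assumption read off the example output, and the conversion shown here truncates sub-second parts rather than rounding them.

```python
import numpy as np

time = np.array(["2020-01-01T09:00", "2020-01-01T09:05"], dtype="datetime64[ns]")

# Drop sub-second precision, then view the timestamps as integer unix seconds
unix_seconds = time.astype("datetime64[s]").astype(np.int64).tolist()

latitude = np.array([40.0, 41.72438])  # illustrative values
rounded_latitude = np.round(latitude, 3).tolist()

record = {"time": unix_seconds, "latitude": rounded_latitude}
print(record["time"])      # [1577869200, 1577869500]
print(record["latitude"])  # [40.0, 41.724]
```

Plain Python lists of ints and floats like these are JSON-friendly, which is what makes the dictionary a convenient input for from_dict().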

update(other=None, **kwargs)

Update values in data dict without warning if overwriting.

Parameters:

other (dict[str, npt.ArrayLike] | None, optional) – Fields to update as dict
**kwargs (npt.ArrayLike) – Fields to update as kwargs