pycontrails.VectorDataset¶
- class pycontrails.VectorDataset(data=None, *, attrs=None, copy=True, **attrs_kwargs)¶
Bases: object
Base class to hold 1D arrays of consistent size.
- Parameters:
data (dict[str, npt.ArrayLike] | pd.DataFrame | VectorDataset | None, optional) – Initial data, by default None. A shallow copy is always made. Use the copy parameter to copy the underlying array data.
attrs (dict[str, Any] | None, optional) – Dictionary of attributes, by default None. A shallow copy is always made.
copy (bool, optional) – Copy individual arrays on instantiation, by default True.
**attrs_kwargs (Any) – Additional attributes passed as keyword arguments.
- Raises:
ValueError – If “time” variable cannot be converted to numpy array.
- __init__(data=None, *, attrs=None, copy=True, **attrs_kwargs)¶
Methods
__init__([data, attrs, copy])
broadcast_attrs(keys[, overwrite, raise_error])
broadcast_numeric_attrs([ignore_keys, overwrite])
copy(**kwargs) – Return a copy of this instance.
create_empty(keys[, attrs]) – Create instance with variables defined by keys and size 0.
ensure_vars(vars[, raise_error])
filter(mask[, copy]) – Filter data according to a boolean array mask.
from_dict(obj[, copy]) – Create instance from dict representation containing data and attrs.
generate_splits(n_splits[, copy]) – Split instance into n_splits sub-vectors.
get(key[, default_value])
get_constant(key[, default])
get_data_or_attr(key[, default])
select(keys[, copy]) – Return new class instance only containing specified keys.
setdefault(key[, default]) – Shortcut to VectorDataDict.setdefault().
sort(by) – Sort data by key(s).
sum(vectors[, infer_attrs, fill_value]) – Sum a list of VectorDataset instances.
to_dataframe([copy]) – Create pd.DataFrame in which each key-value pair in data is a column.
to_dict()
update([other]) – Update values in data dict without warning if overwriting.
Attributes
attrs – Generic dataset attributes
data – Vector data with labels as keys and numpy.ndarray as values
dataframe – Shorthand property to access to_dataframe() with copy=False.
hash – Generate a unique hash for this class instance.
shape – Shape of each array in data.
size – Length of each array in data.
- attrs¶
Generic dataset attributes
- broadcast_attrs(keys, overwrite=False, raise_error=True)¶
Attach values from keys in attrs onto data.
If possible, use dtype = np.float32 when broadcasting. If not possible, use whatever dtype is inferred from the data by numpy.full().
- broadcast_numeric_attrs(ignore_keys=None, overwrite=False)¶
Attach numeric values in attrs onto data.
Iterate through values in attrs and attach float and int values to data.
This method modifies the object in place.
- copy(**kwargs)¶
Return a copy of this instance.
- Parameters:
**kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.
- Returns:
Self – Copy of class
- classmethod create_empty(keys, attrs=None, **kwargs)¶
Create instance with variables defined by keys and size 0.
If the instance requires additional variables to be defined, these keys will automatically be attached to the returned instance.
- Parameters:
keys (Iterable[str]) – Keys to include in empty VectorDataset instance.
attrs (dict[str, Any] | None, optional) – Attributes to attach to the instance.
**kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.
- Returns:
Self – Empty VectorDataset instance.
- data¶
Vector data with labels as keys and numpy.ndarray as values
- property dataframe¶
Shorthand property to access to_dataframe() with copy=False.
- Returns:
pandas.DataFrame – Equivalent to the output from to_dataframe()
- ensure_vars(vars, raise_error=True)¶
Ensure variables exist in column of data or attrs.
- Parameters:
vars (str | Iterable[str]) – A single string variable name or a sequence of string variable names.
raise_error (bool, optional) – Raise KeyError if data does not contain variables. Defaults to True.
- Returns:
bool – True if all variables exist. False otherwise.
- Raises:
KeyError – Raises when dataset does not contain variable in vars
- filter(mask, copy=True, **kwargs)¶
Filter data according to a boolean array mask.
Entries corresponding to mask == True are kept.
- Parameters:
mask (npt.NDArray[np.bool_]) – Boolean array with compatible shape.
copy (bool, optional) – Copy data on filter. Defaults to True. See numpy best practices for insight into whether copy is appropriate.
**kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.
- Returns:
Self – Containing filtered data
- Raises:
TypeError – If mask is not a boolean array.
- classmethod from_dict(obj, copy=True, **obj_kwargs)¶
Create instance from dict representation containing data and attrs.
- Parameters:
obj (dict[str, Any]) – Dict representation of VectorDataset (e.g. to_dict())
copy (bool, optional) – Passed to VectorDataset constructor. Defaults to True.
**obj_kwargs (Any) – Additional properties passed as keyword arguments.
- Returns:
Self – VectorDataset instance.
- generate_splits(n_splits, copy=True)¶
Split instance into n_splits sub-vectors.
- Parameters:
n_splits (int) – Number of splits.
copy (bool, optional) – Passed into filter(). Defaults to True. Recommend to keep as True based on numpy best practices.
- Yields:
Self – Generator of split vectors.
- get(key, default_value=None)¶
- get_constant(key, default=<object object>)¶
Get a constant value from attrs or data.
If key is found in attrs, the value is returned.
If key is found in data, the common value is returned if all values are equal.
If key is not found in attrs or data and a default is provided, the default is returned.
Otherwise, a KeyError is raised.
- Parameters:
- Returns:
Any – The constant value for key.
- Raises:
KeyError – If key is not found in attrs or the values in data are not equal and default is not provided.
Examples
>>> vector = VectorDataset({"a": [1, 1, 1], "b": [2, 2, 3]})
>>> vector.get_constant("a")
np.int64(1)
>>> vector.get_constant("b")
Traceback (most recent call last):
...
KeyError: "A constant key 'b' not found in attrs or data"
>>> vector.get_constant("b", 3)
3
- get_data_or_attr(key, default=<object object>)¶
This method first checks if key is in data and returns the value if so. If key is not in data, then this method checks if key is in attrs and returns the value if so. If key is not in data or attrs, then the default value is returned if provided. Otherwise a KeyError is raised.
- Parameters:
- Returns:
Any – Value at data[key] or attrs[key]
- Raises:
KeyError – If key is not in data or attrs and default is not provided.
Examples
>>> vector = VectorDataset({"a": [1, 2, 3]}, attrs={"b": 4})
>>> vector.get_data_or_attr("a")
array([1, 2, 3])
>>> vector.get_data_or_attr("b")
4
>>> vector.get_data_or_attr("c")
Traceback (most recent call last):
...
KeyError: "Key 'c' not found in data or attrs."
>>> vector.get_data_or_attr("c", default=5)
5
- property hash¶
Generate a unique hash for this class instance.
- Returns:
str – Unique hash for flight instance (sha1)
- select(keys, copy=True)¶
Return new class instance only containing specified keys.
- Parameters:
keys (Iterable[str]) – An iterable of keys to filter by.
copy (bool, optional) – Copy data on selection. Defaults to True.
- Returns:
VectorDataset – VectorDataset containing only data associated to keys. Note that this method always returns a VectorDataset, even if the calling class is a proper subclass of VectorDataset.
- setdefault(key, default=None)¶
Shortcut to VectorDataDict.setdefault().
- Parameters:
- Returns:
numpy.ndarray – Values at key
- sort(by)¶
Sort data by key(s).
This method always creates a copy of the data by calling pandas.DataFrame.sort_values().
- Parameters:
by (str | list[str]) – Key or list of keys to sort by.
- Returns:
Self – Instance with sorted data.
- classmethod sum(vectors, infer_attrs=True, fill_value=None)¶
Sum a list of VectorDataset instances.
- Parameters:
vectors (Sequence[VectorDataset]) – List of VectorDataset instances to concatenate.
infer_attrs (bool, optional) – If True, infer attributes from the first element in the sequence.
fill_value (float | None, optional) – Fill value to use when concatenating arrays. By default None, which raises an error if incompatible keys are found.
- Returns:
Self – Sum of all instances in vectors.
- Raises:
KeyError – If incompatible data keys are found among vectors.
Examples
>>> from pycontrails import VectorDataset
>>> v1 = VectorDataset({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> v2 = VectorDataset({"a": [7, 8, 9], "b": [10, 11, 12]})
>>> v3 = VectorDataset({"a": [13, 14, 15], "b": [16, 17, 18]})
>>> v = VectorDataset.sum([v1, v2, v3])
>>> v.dataframe
    a   b
0   1   4
1   2   5
2   3   6
3   7  10
4   8  11
5   9  12
6  13  16
7  14  17
8  15  18
- to_dataframe(copy=True)¶
Create pd.DataFrame in which each key-value pair in data is a column.
DataFrame does not copy data by default. Use the copy parameter to copy data values on creation.
- Parameters:
copy (bool, optional) – Copy data on DataFrame creation.
- Returns:
pandas.DataFrame – DataFrame holding key-values as columns.
- to_dict()¶
Create dictionary with data and attrs.
If geo-spatial coordinates (e.g. "latitude", "longitude", "altitude") are present, round to a reasonable precision. If a "time" variable is present, round to unix seconds. When the instance is a GeoVectorDataset, disregard any "altitude" or "level" coordinate and only include "altitude_ft" in the output.
Examples
>>> import pprint
>>> import numpy as np
>>> from pycontrails import Flight
>>> fl = Flight(
...     longitude=[-100, -110],
...     latitude=[40, 50],
...     level=[200, 200],
...     time=[np.datetime64("2020-01-01T09"), np.datetime64("2020-01-01T09:30")],
...     aircraft_type="B737",
... )
>>> fl = fl.resample_and_fill("5min")
>>> pprint.pprint(fl.to_dict())
{'aircraft_type': 'B737',
 'altitude_ft': [38661.0, 38661.0, 38661.0, 38661.0, 38661.0, 38661.0, 38661.0],
 'latitude': [40.0, 41.724, 43.428, 45.111, 46.769, 48.399, 50.0],
 'longitude': [-100.0,
               -101.441,
               -102.959,
               -104.563,
               -106.267,
               -108.076,
               -110.0],
 'time': [1577869200,
          1577869500,
          1577869800,
          1577870100,
          1577870400,
          1577870700,
          1577871000]}