pycontrails.VectorDataset
- class pycontrails.VectorDataset(data=None, *, attrs=None, copy=True, **attrs_kwargs)
Bases: object
Base class to hold 1D arrays of consistent size.
- Parameters:
data (dict[str, npt.ArrayLike] | pd.DataFrame | VectorDataDict | VectorDataset | None, optional) – Initial data, by default None
attrs (dict[str, Any] | AttrDict, optional) – Dictionary of attributes, by default None
copy (bool, optional) – Copy data on class creation, by default True
**attrs_kwargs (Any) – Additional attributes passed as keyword arguments
- Raises:
ValueError – If “time” variable cannot be converted to numpy array.
- __init__(data=None, *, attrs=None, copy=True, **attrs_kwargs)
Methods
__init__([data, attrs, copy])
broadcast_attrs(keys[, overwrite, raise_error])
broadcast_numeric_attrs([ignore_keys, overwrite])
copy(**kwargs) – Return a copy of this instance.
create_empty(keys[, attrs]) – Create instance with variables defined by keys and size 0.
ensure_vars(vars[, raise_error])
filter(mask[, copy]) – Filter data according to a boolean array mask.
from_dict(obj[, copy]) – Create instance from dict representation containing data and attrs.
generate_splits(n_splits[, copy]) – Split instance into n_splits sub-vectors.
get(key[, default_value])
get_data_or_attr(key[, default])
select(keys[, copy]) – Return new class instance only containing specified keys.
setdefault(key[, default]) – Shortcut to VectorDataDict.setdefault().
sort(by) – Sort data by key(s).
sum(vectors[, infer_attrs, fill_value]) – Sum a list of VectorDataset instances.
to_dataframe([copy]) – Create pd.DataFrame in which each key-value pair in data is a column.
to_dict()
update([other]) – Update values in data dict without warning if overwriting.
Attributes
data – Vector data with labels as keys and numpy.ndarray as values
attrs – Generic dataset attributes
dataframe – Shorthand property to access to_dataframe() with copy=False.
hash – Generate a unique hash for this class instance.
shape – Shape of each array in data.
size – Length of each array in data.
- attrs
Generic dataset attributes
- broadcast_attrs(keys, overwrite=False, raise_error=True)
Attach values from keys in attrs onto data.
If possible, use dtype = np.float32 when broadcasting. If not possible, use whatever dtype is inferred from the data by numpy.full().
- broadcast_numeric_attrs(ignore_keys=None, overwrite=False)
Attach numeric values in attrs onto data.
Iterate through values in attrs and attach float and int values to data.
This method modifies the object in place.
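The broadcasting step can be pictured with a standalone, pure-Python sketch. It is not the pycontrails implementation: `broadcast_numeric` is a hypothetical helper, plain lists stand in for numpy.ndarray, and two behaviors are assumptions made for illustration (bool values are skipped, and existing keys are left untouched unless overwrite=True).

```python
def broadcast_numeric(data, attrs, ignore_keys=(), overwrite=False):
    """Attach int/float attrs to ``data`` as constant columns (hypothetical helper)."""
    # All columns share one length; an empty dataset has length 0.
    size = len(next(iter(data.values()), []))
    for key, value in attrs.items():
        if key in ignore_keys:
            continue
        # Assumption: only int and float count as numeric; bool is skipped.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        # Assumption: existing columns are kept unless overwrite is requested.
        if key in data and not overwrite:
            continue
        data[key] = [value] * size  # modifies ``data`` in place


# Made-up inputs for illustration only.
data = {"longitude": [-100.0, -105.0, -110.0]}
attrs = {"wingspan": 35.8, "n_engine": 2, "aircraft_type": "B737"}
broadcast_numeric(data, attrs, ignore_keys=["n_engine"])
print(data)
```

Only `wingspan` becomes a column here: `n_engine` is ignored explicitly and `aircraft_type` is not numeric.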
- copy(**kwargs)
Return a copy of this instance.
- Parameters:
**kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.
- Returns:
Self – Copy of class
- classmethod create_empty(keys, attrs=None, **attrs_kwargs)
Create instance with variables defined by keys and size 0.
If the instance requires additional variables to be defined, these keys are automatically attached to the returned instance.
- Parameters:
keys (Iterable[str]) – Keys to include in empty VectorDataset instance.
attrs (dict[str, Any] | None, optional) – Attributes to attach to the instance.
**attrs_kwargs (Any) – Define attributes as keyword arguments.
- Returns:
Self – Empty VectorDataset instance.
- data
Vector data with labels as keys and numpy.ndarray as values
- property dataframe
Shorthand property to access to_dataframe() with copy=False.
- Returns:
pandas.DataFrame – Equivalent to the output from to_dataframe()
- ensure_vars(vars, raise_error=True)
Ensure variables exist as columns of data or as keys of attrs.
- Parameters:
vars (str | Iterable[str]) – A single string variable name or a sequence of string variable names.
raise_error (bool, optional) – Raise KeyError if data does not contain variables. Defaults to True.
- Returns:
bool – True if all variables exist. False otherwise.
- Raises:
KeyError – If raise_error is True and the dataset does not contain a variable in vars.
- filter(mask, copy=True, **kwargs)
Filter data according to a boolean array mask.
Entries corresponding to mask == True are kept.
- Parameters:
mask (npt.NDArray[np.bool_]) – Boolean array with compatible shape.
copy (bool, optional) – Copy data on filter. Defaults to True. See numpy best practices for insight into whether copy is appropriate.
**kwargs (Any) – Additional keyword arguments passed into the constructor of the returned class.
- Returns:
Self – Containing filtered data
- Raises:
TypeError – If mask is not a boolean array.
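As an illustration of the masking semantics, here is a minimal pure-Python sketch. `filter_columns` is a hypothetical stand-in, not part of pycontrails, and lists replace numpy arrays. It keeps the entries where the mask is True and rejects non-boolean masks, mirroring the documented TypeError.

```python
from itertools import compress


def filter_columns(data, mask):
    """Keep the entries of every column where ``mask`` is True (hypothetical helper)."""
    # Mirror the documented TypeError for non-boolean masks.
    if not all(isinstance(m, bool) for m in mask):
        raise TypeError("mask must contain only booleans")
    # Building new lists is the analogue of copy=True.
    return {key: list(compress(values, mask)) for key, values in data.items()}


data = {"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]}
kept = filter_columns(data, [True, False, True, False])
print(kept)  # {'a': [1, 3], 'b': [10.0, 30.0]}
```

The same mask is applied to every column, which is why all arrays in the result still share one length.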
- classmethod from_dict(obj, copy=True, **obj_kwargs)
Create instance from dict representation containing data and attrs.
- Parameters:
obj (dict[str, Any]) – Dict representation of VectorDataset (e.g. to_dict())
copy (bool, optional) – Passed to VectorDataset constructor. Defaults to True.
**obj_kwargs (Any) – Additional properties passed as keyword arguments.
- Returns:
Self – VectorDataset instance.
- generate_splits(n_splits, copy=True)
Split instance into n_splits sub-vectors.
- Parameters:
n_splits (int) – Number of splits.
copy (bool, optional) – Passed into filter(). Defaults to True. Keeping this as True is recommended, based on numpy best practices.
- Returns:
Generator[Self, None, None] – Generator of split vectors.
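Because copy is passed into filter(), one way to picture the splitting is as a sequence of boolean masks, one per sub-vector. The sketch below is pure Python and `split_masks` is a hypothetical helper. How pycontrails actually sizes the splits is not stated here, so the contiguous, near-equal chunking is an assumption.

```python
def split_masks(size, n_splits):
    """Yield one boolean mask per split (hypothetical helper).

    Assumption: splits are contiguous and as equal in length as possible,
    with the earlier splits taking any remainder.
    """
    base, extra = divmod(size, n_splits)
    start = 0
    for i in range(n_splits):
        stop = start + base + (1 if i < extra else 0)
        yield [start <= j < stop for j in range(size)]
        start = stop


data = {"a": [1, 2, 3, 4, 5, 6, 7]}
splits = [
    {key: [v for v, keep in zip(values, mask) if keep] for key, values in data.items()}
    for mask in split_masks(7, 3)
]
print(splits)  # [{'a': [1, 2, 3]}, {'a': [4, 5]}, {'a': [6, 7]}]
```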
- get(key, default_value=None)
- get_data_or_attr(key, default=<object object>)
This method first checks if key is in data and returns the value if so. If key is not in data, then this method checks if key is in attrs and returns the value if so. If key is not in data or attrs, then the default value is returned if provided. Otherwise a KeyError is raised.
- Parameters:
key – Key to look up in data or attrs.
default (optional) – Value returned if key is in neither data nor attrs.
- Returns:
Any – Value at data[key] or attrs[key]
- Raises:
KeyError – If key is not in data or attrs and default is not provided.
Examples
>>> from pycontrails import VectorDataset
>>> vector = VectorDataset({"a": [1, 2, 3]}, attrs={"b": 4})
>>> vector.get_data_or_attr("a")
array([1, 2, 3])
>>> vector.get_data_or_attr("b")
4
>>> vector.get_data_or_attr("c")
Traceback (most recent call last):
...
KeyError: "Key 'c' not found in data or attrs."
>>> vector.get_data_or_attr("c", default=5)
5
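The `<object object>` default in the signature suggests a sentinel, which lets the method tell "no default given" apart from an explicit `default=None`. The following standalone sketch shows that pattern with plain dicts. The `_MISSING` name and the free function are illustrative assumptions, not pycontrails internals.

```python
_MISSING = object()  # sentinel: distinguishes "no default given" from default=None


def get_data_or_attr(data, attrs, key, default=_MISSING):
    """Look up ``key`` in ``data`` first, then ``attrs`` (standalone sketch)."""
    if key in data:
        return data[key]
    if key in attrs:
        return attrs[key]
    if default is not _MISSING:
        return default
    raise KeyError(f"Key '{key}' not found in data or attrs.")


data = {"a": [1, 2, 3]}
attrs = {"b": 4}
print(get_data_or_attr(data, attrs, "a"))  # [1, 2, 3]
print(get_data_or_attr(data, attrs, "b"))  # 4
print(get_data_or_attr(data, attrs, "c", default=None))  # None
```

With a sentinel, `None` remains a legitimate default value rather than a signal to raise.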
- property hash
Generate a unique hash for this class instance.
- Returns:
str – Unique hash for this instance (sha1)
- select(keys, copy=True)
Return new class instance only containing specified keys.
- Parameters:
keys (Iterable[str]) – An iterable of keys to filter by.
copy (bool, optional) – Copy data on selection. Defaults to True.
- Returns:
VectorDataset – VectorDataset containing only data associated to keys. Note that this method always returns a VectorDataset, even if the calling class is a proper subclass of VectorDataset.
- setdefault(key, default=None)
Shortcut to VectorDataDict.setdefault().
- Parameters:
- Returns:
numpy.ndarray – Values at key
- sort(by)
Sort data by key(s).
This method always creates a copy of the data by calling pandas.DataFrame.sort_values().
- Parameters:
by (str | list[str]) – Key or list of keys to sort by.
- Returns:
Self – Instance with sorted data.
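To show what sorting by one or several keys does to every column at once, here is a pure-Python sketch. `sort_columns` is a hypothetical helper that reorders plain lists. The documented method delegates to pandas.DataFrame.sort_values() instead, so this only mirrors the row-reordering idea.

```python
def sort_columns(data, by):
    """Return a new dict with all columns reordered by ``by`` (hypothetical helper)."""
    keys = [by] if isinstance(by, str) else list(by)
    size = len(next(iter(data.values()), []))
    # Compute one row order from the sort keys, then apply it to every column.
    order = sorted(range(size), key=lambda i: tuple(data[k][i] for k in keys))
    return {key: [values[i] for i in order] for key, values in data.items()}


data = {"time": [3, 1, 2, 1], "level": [250, 300, 200, 200]}
print(sort_columns(data, ["time", "level"]))
# {'time': [1, 1, 2, 3], 'level': [200, 300, 200, 250]}
```

The input dict is left unchanged, in line with the note that sorting always produces a copy.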
- classmethod sum(vectors, infer_attrs=True, fill_value=None)
Sum a list of VectorDataset instances.
- Parameters:
vectors (Sequence[VectorDataset]) – List of VectorDataset instances to concatenate.
infer_attrs (bool, optional) – If True, infer attributes from the first element in the sequence.
fill_value (float, optional) – Fill value to use when concatenating arrays. By default None, which raises an error if incompatible keys are found.
- Returns:
VectorDataset – Sum of all instances in vectors.
- Raises:
KeyError – If incompatible data keys are found among vectors.
Examples
>>> from pycontrails import VectorDataset
>>> v1 = VectorDataset({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> v2 = VectorDataset({"a": [7, 8, 9], "b": [10, 11, 12]})
>>> v3 = VectorDataset({"a": [13, 14, 15], "b": [16, 17, 18]})
>>> v = VectorDataset.sum([v1, v2, v3])
>>> v.dataframe
    a   b
0   1   4
1   2   5
2   3   6
3   7  10
4   8  11
5   9  12
6  13  16
7  14  17
8  15  18
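Here "sum" means row-wise concatenation, as the example output shows. The role of fill_value can be sketched in pure Python as below. `concat_columns` is a hypothetical helper, and padding a missing column with fill_value is an assumption about how incompatible keys are reconciled, not a description of the pycontrails code.

```python
def concat_columns(vectors, fill_value=None):
    """Concatenate dicts of equal-length lists key by key (hypothetical helper)."""
    # Collect the union of keys in first-seen order.
    keys = []
    for vector in vectors:
        keys.extend(k for k in vector if k not in keys)

    out = {key: [] for key in keys}
    for vector in vectors:
        size = len(next(iter(vector.values()), []))
        for key in keys:
            if key in vector:
                out[key].extend(vector[key])
            elif fill_value is None:
                raise KeyError(f"Key '{key}' is missing from one of the vectors")
            else:
                # Assumption: a missing column is padded with fill_value.
                out[key].extend([fill_value] * size)
    return out


v1 = {"a": [1, 2], "b": [3, 4]}
v2 = {"a": [5, 6]}
print(concat_columns([v1, v2], fill_value=0.0))
# {'a': [1, 2, 5, 6], 'b': [3, 4, 0.0, 0.0]}
```

Calling `concat_columns([v1, v2])` without a fill value raises KeyError in this sketch, matching the documented default behavior.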
- to_dataframe(copy=True)
Create pd.DataFrame in which each key-value pair in data is a column.
DataFrame does not copy data by default. Use the copy parameter to copy data values on creation.
- Parameters:
copy (bool, optional) – Copy data on DataFrame creation.
- Returns:
pandas.DataFrame – DataFrame holding key-values as columns.
- to_dict()
Create dictionary with data and attrs.
If geo-spatial coordinates (e.g. "latitude", "longitude", "altitude") are present, round to a reasonable precision. If a "time" variable is present, round to unix seconds. When the instance is a GeoVectorDataset, disregard any "altitude" or "level" coordinate and only include "altitude_ft" in the output.
Examples
>>> import pprint
>>> import numpy as np
>>> from pycontrails import Flight
>>> fl = Flight(
...     longitude=[-100, -110],
...     latitude=[40, 50],
...     level=[200, 200],
...     time=[np.datetime64("2020-01-01T09"), np.datetime64("2020-01-01T09:30")],
...     aircraft_type="B737",
... )
>>> fl = fl.resample_and_fill("5min")
>>> pprint.pprint(fl.to_dict())
{'aircraft_type': 'B737',
 'altitude_ft': [38661.0, 38661.0, 38661.0, 38661.0, 38661.0, 38661.0, 38661.0],
 'latitude': [40.0, 41.724, 43.428, 45.111, 46.769, 48.399, 50.0],
 'longitude': [-100.0,
               -101.441,
               -102.959,
               -104.563,
               -106.267,
               -108.076,
               -110.0],
 'time': [1577869200,
          1577869500,
          1577869800,
          1577870100,
          1577870400,
          1577870700,
          1577871000]}
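The two rounding rules can be illustrated with the standard library alone. This is a sketch under assumptions rather than the pycontrails code: the three-decimal precision is inferred from the output shown above, and the unrounded latitude inputs are made up for illustration.

```python
import json
from datetime import datetime, timezone

# Made-up, unrounded inputs for illustration.
latitude = [40.0, 41.72449, 43.42801]
times = [
    datetime(2020, 1, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2020, 1, 1, 9, 5, tzinfo=timezone.utc),
    datetime(2020, 1, 1, 9, 10, tzinfo=timezone.utc),
]

record = {
    # Assumption: "reasonable precision" means three decimal places.
    "latitude": [round(lat, 3) for lat in latitude],
    # Unix seconds: whole seconds since 1970-01-01T00:00:00Z.
    "time": [int(t.timestamp()) for t in times],
}
print(json.dumps(record))
```

After rounding, every value is a plain float or int, so the result can be passed straight to `json.dumps`.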