pycontrails.core.datalib
Datalib utilities.
Module Attributes
NETCDF_ENGINE | NetCDF engine to use for parsing netCDF files
DEFAULT_CHUNKS | Default chunking strategy when opening datasets with xarray
OPEN_IN_PARALLEL | Whether to open multi-file datasets in parallel
OPEN_WITH_LOCK | Whether to use file locking when opening multi-file datasets
Functions
parse_grid | Parse input grid spacing.
parse_pressure_levels | Check that input pressure levels are of a consistent type and that the levels exist in the ECMWF data source.
parse_timesteps | Parse time input into a set of time steps.
parse_variables | Parse input variables.
round_hour | Round time to the nearest whole hour before the input time.
Classes
MetDataSource | Abstract class for wrapping meteorology data sources.
- pycontrails.core.datalib.DEFAULT_CHUNKS = {'time': 1}
Default chunking strategy when opening datasets with xarray
- class pycontrails.core.datalib.MetDataSource(time, variables, pressure_levels=[-1], paths=None, grid=None, **kwargs)
Bases: ABC
Abstract class for wrapping meteorology data sources.
- abstract cache_dataset(dataset)
Cache data from the data source.
- Parameters:
dataset (xarray.Dataset) – Dataset loaded from a remote API or local files. The dataset must have the same format as the original data source API or files.
- cachestore
Cache store for intermediates while processing the data source. If None, caching is turned off.
- abstract create_cachepath(t)
Return the cache path to the local data file for a given datetime.
- Parameters:
t (datetime) – Datetime of the datafile
- Returns:
str – Path to the cached data file
- download(**xr_kwargs)
Confirm all data files are downloaded and available locally in the cachestore.
- Parameters:
**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
- abstract download_dataset(times)
Download data from the data source for the input times.
- Parameters:
times (list[datetime]) – List of datetimes to download and store in the cache
- grid
Lat / Lon grid spacing
- property hash
Generate a unique hash for this data source.
- Returns:
str – Unique hash for the met instance (sha1)
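As an illustration only, a sha1 digest of this kind can be built with the standard library hashlib module. The helper name datasource_hash and the choice of fields it hashes (timesteps, variable short names, pressure levels, grid) are assumptions of this sketch, not necessarily what the property hashes.

```python
from __future__ import annotations

import hashlib
from datetime import datetime


def datasource_hash(
    timesteps: list[datetime],
    variable_shortnames: list[str],
    pressure_levels: list[int],
    grid: float | None,
) -> str:
    """Hypothetical sketch: sha1 hex digest of the fields that define a request."""
    # Build a deterministic string key; isoformat keeps datetimes unambiguous.
    key = repr(
        (
            [t.isoformat() for t in timesteps],
            variable_shortnames,
            pressure_levels,
            grid,
        )
    )
    return hashlib.sha1(key.encode("utf-8")).hexdigest()
```

Two requests with identical fields yield the same 40-character hex string, and changing any field (for example one timestep) changes the digest, which is what makes such a hash usable as a stable identifier.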
- is_datafile_cached(t, **xr_kwargs)
Check that the datafile for a given datetime contains the variables and pressure levels requested by this instance.
If using a cloud cache store (i.e. cache.GCPCacheStore), this is where the datafile is mirrored to a local file for access.
- Parameters:
t (datetime) – Datetime of the datafile
**xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:
chunks: {"time": 1}
engine: "netcdf4"
parallel: True
- Returns:
bool – True if a data file exists for the datetime with all variables and pressure levels, False otherwise
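The "all variables and pressure levels" condition reduces to two subset tests. In this hypothetical sketch, the sets of variables and levels found in the file are passed in directly; reading them from an opened dataset is not attempted, and the function name has_requested_contents is illustrative only.

```python
from __future__ import annotations


def has_requested_contents(
    file_variables: set[str],
    file_levels: set[int],
    requested_variables: list[str],
    requested_levels: list[int],
) -> bool:
    """Hypothetical sketch: True only if every requested item is in the file."""
    # Subset tests: each requested variable and each requested level must be present.
    return set(requested_variables) <= file_variables and set(requested_levels) <= file_levels
```

A file that holds extra variables or levels still passes; a file missing any single requested variable or level fails.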
- list_timesteps_cached(**xr_kwargs)
Get a list of data files available locally in the cachestore.
- Parameters:
**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
- list_timesteps_not_cached(**xr_kwargs)
Get a list of data files not available locally in the cachestore.
- Parameters:
**xr_kwargs – Passed into xarray.open_dataset() via is_datafile_cached().
- open_dataset(disk_paths, **xr_kwargs)
Open a multi-file dataset in xarray.
- Parameters:
disk_paths (str | list[str] | pathlib.Path | list[pathlib.Path]) – Path or list of paths to local files to open
**xr_kwargs (Any) – Additional kwargs passed directly to xarray.open_mfdataset() when opening files. By default, the following values are used if not specified:
chunks: {"time": 1}
engine: "netcdf4"
parallel: False
lock: False
- Returns:
xarray.Dataset – Open xarray dataset
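The "used if not specified" rule amounts to merging the caller's kwargs over a dictionary of defaults. The sketch below assumes the defaults come from the module attributes documented on this page (their values match the list above); merge_open_kwargs is a hypothetical helper, not part of pycontrails. The merged dictionary would then be passed on as xarray.open_mfdataset(disk_paths, **kwargs).

```python
from __future__ import annotations

from typing import Any

# Values copied from the module attributes documented on this page.
NETCDF_ENGINE = "netcdf4"
DEFAULT_CHUNKS = {"time": 1}
OPEN_IN_PARALLEL = False
OPEN_WITH_LOCK = False


def merge_open_kwargs(**xr_kwargs: Any) -> dict[str, Any]:
    """Hypothetical helper: fill in defaults the caller did not set."""
    defaults = {
        "chunks": DEFAULT_CHUNKS,
        "engine": NETCDF_ENGINE,
        "parallel": OPEN_IN_PARALLEL,
        "lock": OPEN_WITH_LOCK,
    }
    # Later keys win, so caller-supplied values override the defaults.
    return {**defaults, **xr_kwargs}
```

For example, merge_open_kwargs(engine="h5netcdf") keeps chunks, parallel, and lock at their defaults and only swaps the engine.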
- abstract open_metdataset(dataset=None, xr_kwargs=None, **kwargs)
Open MetDataset from the data source.
This method should download or load any required datafiles and return a MetDataset of the multi-file dataset opened by xarray.
- Parameters:
dataset (xr.Dataset | None, optional) – Input xr.Dataset loaded manually. The dataset must have the same format as the original data source API or files.
xr_kwargs (dict[str, int] | None, optional) – Dictionary of keyword arguments passed into xarray.open_mfdataset() when opening files. Examples include "chunks", "engine", "parallel", etc. Ignored if dataset is provided.
**kwargs (Any) – Keyword arguments passed through directly into the MetDataset constructor.
- Returns:
MetDataset – Meteorology dataset
See also
- paths
Paths to local source files to load. If no paths input is provided on init, this is set to the paths of files cached in the cachestore.
- property pressure_level_variables
Pressure-level parameters available from the data source.
- Returns:
list[MetVariable] | None – List of MetVariable available in the data source
- pressure_levels
List of pressure levels. Set to [-1] for data without a level coordinate. Use parse_pressure_levels() to handle PressureLevelInput.
- abstract set_metadata(ds)
Set met source metadata on ds.attrs.
This is called within the open_metdataset() method to set metadata on the returned MetDataset instance.
- Parameters:
ds (xr.Dataset | MetDataset) – Dataset to set metadata on. Mutated in place.
- property single_level_variables
Single-level parameters available from the data source.
- Returns:
list[MetVariable] | None – List of MetVariable available in the data source
- property supported_pressure_levels
Pressure levels available from the data source.
- Returns:
list[int] | None – List of integer pressure levels for the class. If None, no pressure level information is available for the class.
- property supported_variables
Parameters available from the data source.
- Returns:
list[MetVariable] | None – List of MetVariable available in the data source
- timesteps
List of individual timesteps from the data source, derived from time. Use parse_timesteps() to handle TimeInput.
- property variable_shortnames
Return a list of variable short names.
- Returns:
list[str] – List of variable short names.
- property variable_standardnames
Return a list of variable standard names.
- Returns:
list[str] – List of variable standard names.
- variables
Variables requested from the data source. Use parse_variables() to handle VariableInput.
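The abstract methods above leave data access to subclasses, while the cache-listing and download methods are shared. The toy classes below are a hypothetical, heavily simplified stand-in for MetDataSource: the cachestore is replaced by an in-memory set of paths, xr_kwargs are dropped, the file naming scheme is invented, and it is assumed that download() fetches only the timesteps reported by list_timesteps_not_cached(). This sketches the division of labor, not the library's implementation.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from datetime import datetime


class ToySource(ABC):
    """Hypothetical, simplified stand-in for MetDataSource (caching flow only)."""

    def __init__(self, timesteps: list[datetime]) -> None:
        self.timesteps = timesteps
        # In-memory "cache store": the set of cache paths that exist.
        self.cache: set[str] = set()

    @abstractmethod
    def create_cachepath(self, t: datetime) -> str:
        """Return the cache path for the datafile at time t."""

    @abstractmethod
    def download_dataset(self, times: list[datetime]) -> None:
        """Fetch data for the given times and store it in the cache."""

    def is_datafile_cached(self, t: datetime) -> bool:
        return self.create_cachepath(t) in self.cache

    def list_timesteps_cached(self) -> list[datetime]:
        return [t for t in self.timesteps if self.is_datafile_cached(t)]

    def list_timesteps_not_cached(self) -> list[datetime]:
        return [t for t in self.timesteps if not self.is_datafile_cached(t)]

    def download(self) -> None:
        # Request only the timesteps that are missing from the cache.
        missing = self.list_timesteps_not_cached()
        if missing:
            self.download_dataset(missing)


class HourlyToySource(ToySource):
    """Concrete toy subclass: one (pretend) file per hour."""

    def __init__(self, timesteps: list[datetime]) -> None:
        super().__init__(timesteps)
        self.download_calls: list[list[datetime]] = []  # record of requests

    def create_cachepath(self, t: datetime) -> str:
        # Invented naming scheme for illustration.
        return f"toy-{t:%Y%m%d-%H}.nc"

    def download_dataset(self, times: list[datetime]) -> None:
        # A real source would call a remote API here; we only mark files as cached.
        self.download_calls.append(list(times))
        for t in times:
            self.cache.add(self.create_cachepath(t))
```

For example, with hours 0 to 2 requested and hour 0 already cached, download() calls download_dataset() once with hours 1 and 2 only, and a second download() call requests nothing.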
- pycontrails.core.datalib.NETCDF_ENGINE = 'netcdf4'
NetCDF engine to use for parsing netcdf files
- pycontrails.core.datalib.OPEN_IN_PARALLEL = False
Whether to open multi-file datasets in parallel
- pycontrails.core.datalib.OPEN_WITH_LOCK = False
Whether to use file locking when opening multi-file datasets
- pycontrails.core.datalib.parse_grid(grid, supported)
Parse input grid spacing.
- Parameters:
grid (float) – Input grid spacing
supported (Sequence[float]) – Sequence of supported grid values
- Returns:
float – Parsed grid spacing
- Raises:
ValueError – Raised when grid is not in supported
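A minimal sketch of the documented contract: return the grid spacing if it is supported, otherwise raise ValueError. The function name parse_grid_sketch and the error message are hypothetical; note that this sketch compares floats with plain equality via `in`.

```python
from __future__ import annotations

from collections.abc import Sequence


def parse_grid_sketch(grid: float, supported: Sequence[float]) -> float:
    """Hypothetical re-implementation of the documented parse_grid contract."""
    # Membership uses ==, so 0.25 matches 0.25 exactly but 0.3 does not.
    if grid not in supported:
        raise ValueError(f"Grid input {grid} must be one of {list(supported)}")
    return grid
```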
- pycontrails.core.datalib.parse_pressure_levels(pressure_levels, supported=None)
Check that input pressure levels are of a consistent type and that the levels exist in the ECMWF data source.
- Parameters:
pressure_levels (PressureLevelInput) – Input pressure levels for data, in hPa (mbar). Set to [-1] to represent the surface level.
supported (list[int], optional) – List of pressure levels supported by the data source
- Returns:
list[int] – List of integer pressure levels supported by the ECMWF data source
- Raises:
ValueError – Raised if a pressure level is not supported by the ECMWF data source
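The sketch below illustrates the documented checks under stated assumptions: a single integer is wrapped in a list, every level must be an int (the "consistent type" check), duplicates are dropped, the result is sorted ascending, and -1 is validated against supported like any other value. The name parse_pressure_levels_sketch is hypothetical, and the real function may differ in ordering and accepted input types.

```python
from __future__ import annotations

from collections.abc import Sequence


def parse_pressure_levels_sketch(
    pressure_levels: int | Sequence[int],
    supported: list[int] | None = None,
) -> list[int]:
    """Hypothetical sketch: normalize pressure levels (hPa) to a sorted list of ints."""
    # Wrap a single level so the rest of the function sees a sequence.
    levels = [pressure_levels] if isinstance(pressure_levels, int) else list(pressure_levels)

    # "Consistent type": every element must be an integer level.
    if not all(isinstance(p, int) for p in levels):
        raise ValueError(f"Pressure levels must all be integers, got {levels}")

    # Drop duplicates and sort ascending (an assumption of this sketch).
    parsed = sorted(set(levels))

    # -1 (surface / single level) is checked like any other value here.
    if supported is not None:
        unsupported = [p for p in parsed if p not in supported]
        if unsupported:
            raise ValueError(f"Pressure levels {unsupported} are not in {supported}")
    return parsed
```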
- pycontrails.core.datalib.parse_timesteps(time, freq='1H')
Parse time input into a set of time steps.
If the input time has length 2, this creates a range of equally spaced time points between [start, end] with interval freq.
- Parameters:
time (TimeInput | None) – Input datetime(s) specifying the time or time range of the data [start, end]. Either a single datetime-like or a tuple of datetime-likes, with the first value the start of the time range and the second value the end. Input values can be any type compatible with pandas.to_datetime().
freq (str | None, optional) – Timestep interval in the range. See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases for a list of frequency aliases. If None, returns the input time as a list. Defaults to "1H".
- Returns:
list[datetime] – List of unique datetimes. If the input time is None, returns an empty list.
- Raises:
ValueError – Raised when time has length > 2 or when time elements fail to be parsed with pd.to_datetime
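The range rule can be sketched with the standard library alone. This is a hypothetical, simplified version: parse_timesteps_sketch accepts only datetime objects (not everything pandas.to_datetime can parse) and takes freq as a timedelta instead of a pandas frequency alias.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def parse_timesteps_sketch(
    time: datetime | tuple[datetime, ...] | None,
    freq: timedelta | None = timedelta(hours=1),
) -> list[datetime]:
    """Hypothetical stdlib sketch of the documented parse_timesteps rules."""
    if time is None:
        return []
    # A single datetime becomes a one-element sequence.
    times = [time] if isinstance(time, datetime) else list(time)
    if len(times) > 2:
        raise ValueError(f"time must have length 1 or 2, got {len(times)}")

    # Without a frequency (or without a range) return the unique inputs.
    if freq is None or len(times) == 1:
        return sorted(set(times))
    if freq <= timedelta(0):
        raise ValueError("freq must be a positive interval")

    # Build an inclusive range [start, end] with step freq.
    start, end = times
    steps = []
    current = start
    while current <= end:
        steps.append(current)
        current += freq
    return steps
```

With pandas available, pd.date_range(start, end, freq="1h") produces the same inclusive hourly range. Recent pandas releases (2.2 and later) deprecate the uppercase "H" alias in favor of "h", which is worth keeping in mind given the "1H" default above.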
- pycontrails.core.datalib.parse_variables(variables, supported)
Parse input variables.
- Parameters:
variables (VariableInput) – Variable name, or sequence of variable names, e.g. "air_temperature", ["air_temperature", "relative_humidity"], [130], [AirTemperature], [[EastwardWind, NorthwardWind]]. If an element is a list of MetVariable, the first MetVariable that is supported will be chosen.
supported (list[MetVariable]) – Supported MetVariable.
- Returns:
list[MetVariable] – List of MetVariable
- Raises:
ValueError – Raised if a variable is not supported
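The matching rules can be sketched with a hypothetical stand-in for MetVariable. In this sketch, Var carries only a short and a standard name, strings match either name, a nested list picks its first supported option, and anything unmatched raises ValueError. Integer parameter IDs such as 130 are not modeled, and the real matching logic may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for MetVariable: only a short and a standard name."""

    short_name: str
    standard_name: str


def parse_variables_sketch(variables, supported: list[Var]) -> list[Var]:
    """Hypothetical sketch of the matching rules described for parse_variables."""
    # A single name or variable is treated as a one-element list.
    if isinstance(variables, (str, Var)):
        variables = [variables]

    # Look up supported variables by either of their names.
    by_name = {}
    for var in supported:
        by_name[var.short_name] = var
        by_name[var.standard_name] = var

    def resolve(option):
        if isinstance(option, Var):
            return option if option in supported else None
        return by_name.get(option)

    parsed = []
    for item in variables:
        # A nested list holds alternatives; take the first supported one.
        options = item if isinstance(item, list) else [item]
        match = next((m for m in map(resolve, options) if m is not None), None)
        if match is None:
            raise ValueError(f"{item!r} is not supported by this data source")
        parsed.append(match)
    return parsed
```

For instance, if only temperature and eastward wind are supported, the input ["t", [NorthwardWind, EastwardWind]] resolves to the temperature variable followed by eastward wind, mirroring the "first supported MetVariable" rule.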
- pycontrails.core.datalib.round_hour(time, hour)
Round time to the nearest whole hour before the input time.
- Parameters:
time (datetime) – Input time
hour (int) – Hour to round down time
- Returns:
datetime – Rounded time
- Raises:
ValueError – Description
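One reading of the hour parameter is that time is floored to the previous multiple of hour hours within the same day. The sketch below implements that reading; the name round_hour_sketch is hypothetical, and since the docstring does not say when ValueError is raised, the 1 to 23 bound here is an assumption.

```python
from __future__ import annotations

from datetime import datetime


def round_hour_sketch(time: datetime, hour: int) -> datetime:
    """Hypothetical sketch: floor time to the previous multiple of `hour` hours."""
    # Assumed valid range; also guards against division by zero.
    if not 1 <= hour <= 23:
        raise ValueError(f"hour must be between 1 and 23, got {hour}")
    floored_hour = (time.hour // hour) * hour
    # Drop minutes and below; the date is unchanged.
    return time.replace(hour=floored_hour, minute=0, second=0, microsecond=0)
```

Under this reading, 07:45 rounds to 06:00 with hour=6 and to 07:00 with hour=1.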