pycontrails.core.cache¶
Pycontrails Caching Support.
Classes
CacheStore | Abstract cache storage class for storing staged and intermediate data.
DiskCacheStore | Cache that uses a folder on the local filesystem.
GCPCacheStore | Google Cloud Platform (Storage) Cache.
- class pycontrails.core.cache.CacheStore¶
Bases:
ABC
Abstract cache storage class for storing staged and intermediate data.
- allow_clear¶
- cache_dir¶
- abstract exists(cache_path)¶
Check if a path in cache exists.
- Parameters:
cache_path (str) – Path to directory or file in cache
- Returns:
bool – True if directory or file exists
Examples
>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.exists("file.nc")
False
- abstract get(cache_path)¶
Get data from cache.
- abstract listdir(path='')¶
List the contents of a directory in the cache.
- Parameters:
path (str) – Path to the directory to list
- Returns:
list[str] – List of files in the directory
- abstract path(cache_path)¶
Return a full filepath in cache.
- Parameters:
cache_path (str) – String path or filepath to create in cache. If parent directories do not exist, they will be created.
- Returns:
str – Full path string to subdirectory or object in cache directory
Examples
>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.path("file.nc")
'cache/file.nc'
>>> cache.clear()  # cleanup
- abstract put(data, cache_path=None)¶
Save data to cache.
- put_multiple(data_path, cache_path)¶
Put multiple files into the cache at once.
- Parameters:
- Returns:
list[str] – Returns a list of relative paths to the stored files in the cache
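The abstract interface above can be illustrated with a small standard-library sketch. This is not pycontrails code: `SketchStore` and `FolderStore` are hypothetical names, and building `put_multiple` on top of `put` is an assumption about how a non-abstract helper could work.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Optional


class SketchStore(ABC):
    """Hypothetical stand-in for an abstract cache store (not pycontrails code)."""

    @abstractmethod
    def exists(self, cache_path: str) -> bool: ...

    @abstractmethod
    def path(self, cache_path: str) -> str: ...

    @abstractmethod
    def put(self, data_path: str, cache_path: Optional[str] = None) -> str: ...

    def put_multiple(self, data_paths: List[str], cache_paths: List[str]) -> List[str]:
        # Assumption: the concrete helper simply calls the abstract put() per file
        return [self.put(d, c) for d, c in zip(data_paths, cache_paths)]


class FolderStore(SketchStore):
    """Hypothetical folder-backed implementation of the sketch interface."""

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = cache_dir

    def exists(self, cache_path: str) -> bool:
        return (Path(self.cache_dir) / cache_path).exists()

    def path(self, cache_path: str) -> str:
        full = Path(self.cache_dir) / cache_path
        # Create missing parent directories, as the path() description states
        full.parent.mkdir(parents=True, exist_ok=True)
        return str(full)

    def put(self, data_path: str, cache_path: Optional[str] = None) -> str:
        # Default to the source filename when no cache path is given
        cache_path = cache_path or Path(data_path).name
        shutil.copy(data_path, self.path(cache_path))
        return cache_path


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "a.txt"
    source.write_text("hello")
    store = FolderStore(str(Path(tmp) / "cache"))
    stored = store.put_multiple([str(source)], ["nested/a.txt"])
    found = store.exists("nested/a.txt")
    missing = store.exists("other.txt")
```

A subclass only has to supply the abstract methods; helpers written against the interface then work for any backend.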
- class pycontrails.core.cache.DiskCacheStore(cache_dir=None, allow_clear=False)¶
Bases:
CacheStore
Cache that uses a folder on the local filesystem.
- Parameters:
Examples
>>> from pycontrails import DiskCacheStore
>>> disk_cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> disk_cache.cache_dir
'cache'
>>> disk_cache.clear() # cleanup
- allow_clear¶
- cache_dir¶
- clear(cache_path='')¶
Delete all files and folders within cache_path.
If no cache_path is provided, this will clear the entire cache.
If allow_clear is set to False, nothing is deleted and a RuntimeError is raised.
- Parameters:
cache_path (str, optional) – Path to subdirectory or file in cache
- Raises:
RuntimeError – Raised when allow_clear is set to False
Examples
>>> from pycontrails import DiskCacheStore
>>> disk_cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> # Write some data to the cache
>>> disk_cache.put("README.md", "test/example.txt")
'test/example.txt'
>>> disk_cache.exists("test/example.txt")
True
>>> # clear a specific path
>>> disk_cache.clear("test/example.txt")
>>> # clear the whole cache
>>> disk_cache.clear()
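The guard described in the Raises entry above can be sketched with the standard library. `guarded_clear` is a hypothetical helper, not the pycontrails implementation; removing the whole directory tree when `cache_path` is empty is an assumption of this sketch.

```python
import shutil
import tempfile
from pathlib import Path


def guarded_clear(cache_dir: str, cache_path: str = "", allow_clear: bool = False) -> None:
    """Hypothetical helper: delete cache_path under cache_dir only when allowed."""
    if not allow_clear:
        raise RuntimeError("Clearing is disabled; set allow_clear=True")
    target = Path(cache_dir) / cache_path
    if target.is_file():
        target.unlink()
    elif target.is_dir():
        # Assumption: a directory target is removed together with its contents
        shutil.rmtree(target)


root = tempfile.mkdtemp()
(Path(root) / "test").mkdir()
(Path(root) / "test" / "example.txt").write_text("data")

# With allow_clear=False the call refuses and leaves the file in place
try:
    guarded_clear(root, "test/example.txt", allow_clear=False)
    refused = False
except RuntimeError:
    refused = True
still_there = (Path(root) / "test" / "example.txt").exists()

# Clear a specific path, then the whole cache directory
guarded_clear(root, "test/example.txt", allow_clear=True)
file_gone = not (Path(root) / "test" / "example.txt").exists()
guarded_clear(root, allow_clear=True)
root_gone = not Path(root).exists()
```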
- exists(cache_path)¶
Check if a path in cache exists.
- Parameters:
cache_path (str) – Path to directory or file in cache
- Returns:
bool – True if directory or file exists
Examples
>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.exists("file.nc")
False
- get(cache_path)¶
Get data path from the local cache store.
Alias for path().
- Parameters:
cache_path (str) – Cache path to retrieve
- Returns:
str – Returns the relative path in the cache to the stored file
Examples
>>> from pycontrails import DiskCacheStore
>>> disk_cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>>
>>> # returns a path
>>> disk_cache.get("test/file.md")
'cache/test/file.md'
- listdir(path='')¶
List the contents of a directory in the cache.
- Parameters:
path (str) – Path to the directory to list
- Returns:
list[str] – List of files in the directory
- path(cache_path)¶
Return a full filepath in cache.
- Parameters:
cache_path (str) – String path or filepath to create in cache. If parent directories do not exist, they will be created.
- Returns:
str – Full path string to subdirectory or object in cache directory
Examples
>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.path("file.nc")
'cache/file.nc'
>>> cache.clear()  # cleanup
- put(data_path, cache_path=None)¶
Save data to the local cache store.
- Parameters:
data_path (str | pathlib.Path) – Path to data to cache.
cache_path (str | None, optional) – Path in cache store to save data. Defaults to the same filename as data_path.
- Returns:
str – Returns the relative path in the cache to the stored file
- Raises:
FileNotFoundError – Raised if data_path is a string and no file is found at that path
Examples
>>> from pycontrails import DiskCacheStore
>>> disk_cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>>
>>> # put a file directly
>>> disk_cache.put("README.md", "test/file.md")
'test/file.md'
- put_multiple(data_path, cache_path)¶
Put multiple files into the cache at once.
- Parameters:
- Returns:
list[str] – Returns a list of relative paths to the stored files in the cache
- class pycontrails.core.cache.GCPCacheStore(cache_dir='', project=None, bucket=None, disk_cache=None, read_only=True, allow_clear=False, timeout=300, show_progress=False, chunk_size=16777216)¶
Bases:
CacheStore
Google Cloud Platform (Storage) Cache.
This class downloads files from Google Cloud Storage to a local DiskCacheStore initialized with cache_dir=".gcp" to avoid re-downloading files. If the source files on GCP change, the local mirror must be cleared by initializing this class and running clear_disk().
Note that by default, GCP Cache Store is read only. When put() is called and read_only is set to True, the cache raises a RuntimeError. Set read_only to False to enable writing to the cache store.
- Parameters:
cache_dir (str, optional) – Root object prefix within bucket. Defaults to the PYCONTRAILS_CACHE_DIR environment variable, or the root of the bucket. The full GCP URI (i.e., “gs://<MY_BUCKET>/<PREFIX>”) can be used here.
project (str, optional) – GCP Project. Defaults to the current active project set in the google-cloud-sdk environment.
bucket (str, optional) – GCP Bucket to use for cache. Defaults to the PYCONTRAILS_CACHE_BUCKET environment variable.
read_only (bool, optional) – Only enable reading from cache. Defaults to True.
allow_clear (bool, optional) – Allow this cache to be cleared using clear(). Defaults to False.
disk_cache (DiskCacheStore, optional) – Specify a custom local disk cache store to mirror files. Defaults to DiskCacheStore(cache_dir="{user_cache_dir}/.gcp/{bucket}").
show_progress (bool, optional) – Show progress bar on cache put(). Defaults to False.
chunk_size (int, optional) – Chunk size for uploads and downloads with progress. Set a smaller size to see more granular progress, and set a larger size for faster transfers. Chunk size must be a multiple of 262144 (e.g., 10 * 262144). The default in the signature above is 16777216 (64 * 262144); small values such as 8 * 262144 will throttle fast download speeds.
Examples
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.cache_dir
'cache/'
>>> cache.bucket
'contrails-301217-unit-test'
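The chunk_size constraint (a multiple of 262144) is easy to check before constructing the cache. The helpers below are a hypothetical sketch, not part of pycontrails; only the constant 262144 and the default 16777216 come from the text and signature above.

```python
CHUNK_UNIT = 262144  # required multiple stated in the chunk_size description


def is_valid_chunk_size(chunk_size: int) -> bool:
    """Hypothetical check: positive and a whole multiple of CHUNK_UNIT."""
    return chunk_size > 0 and chunk_size % CHUNK_UNIT == 0


def round_up_chunk_size(requested: int) -> int:
    """Hypothetical helper: round a requested size up to the next valid multiple."""
    units = max(1, -(-requested // CHUNK_UNIT))  # ceiling division, at least one unit
    return units * CHUNK_UNIT


default_chunk = 16777216  # default from the class signature
default_units = default_chunk // CHUNK_UNIT
rounded = round_up_chunk_size(1_000_000)

print(default_units)                   # 64
print(is_valid_chunk_size(1_000_000))  # False
print(rounded)                         # 1048576
```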
- allow_clear¶
- bucket¶
- cache_dir¶
- chunk_size¶
- clear_disk(cache_path='')¶
Clear the local disk cache mirror of the GCP Cache Store.
- Parameters:
cache_path (str, optional) – Path in mirrored cache store. Passed into _disk_clear.clear(). By default, this method will clear the entire mirrored cache store.
Examples
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.clear_disk()
- property client¶
Handle to Google Cloud Storage client.
- Returns:
google.cloud.storage.Client – Handle to Google Cloud Storage client
- exists(cache_path)¶
Check if a path in cache exists.
- Parameters:
cache_path (str) – Path to directory or file in cache
- Returns:
bool – True if directory or file exists
Examples
>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.exists("file.nc")
False
- get(cache_path)¶
Get data from the local cache store.
- Parameters:
cache_path (str) – Path in cache store to get data
- Returns:
str – Returns path to downloaded local file
- Raises:
ValueError – Raised if cache_path refers to a directory
Examples
>>> import pathlib
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
...     read_only=False,
... )
>>> cache.put("README.md", "example/file.md")
'example/file.md'
>>> # returns a full path to local copy of the file
>>> path = cache.get("example/file.md")
>>> pathlib.Path(path).is_file()
True
>>> pathlib.Path(path).read_text()[17:69]
'Python library for modeling aviation climate impacts'
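The idea of serving repeated reads from a local mirror can be sketched without any cloud dependency. Everything here is hypothetical: `FAKE_BUCKET` stands in for remote storage, `get_with_mirror` is not the pycontrails implementation, and treating a trailing slash as "directory" is an assumption of the sketch.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for a remote bucket: object name -> contents
FAKE_BUCKET = {"cache/example/file.md": b"remote contents"}
download_count = 0


def get_with_mirror(mirror_dir: str, object_name: str) -> str:
    """Return a local path for object_name, downloading only on the first request."""
    global download_count
    if object_name.endswith("/"):
        # Assumption: names ending in "/" denote directories, which cannot be fetched
        raise ValueError(f"{object_name} refers to a directory")
    local = Path(mirror_dir) / object_name
    if not local.is_file():
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(FAKE_BUCKET[object_name])  # simulated download
        download_count += 1
    return str(local)


with tempfile.TemporaryDirectory() as mirror:
    first = get_with_mirror(mirror, "cache/example/file.md")
    second = get_with_mirror(mirror, "cache/example/file.md")  # served from the mirror
    content = Path(first).read_bytes()
```

Because the mirror is only populated when a file is missing, a changed remote object is not picked up until the mirror is cleared, which is why the class description points to clear_disk().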
- gs_path(cache_path)¶
Return a full Google Storage (gs://) URI to object.
- Parameters:
cache_path (str) – String path to object in cache
- Returns:
str – Google Storage URI (gs://) to object in cache
Examples
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.path("file.nc")
'cache/file.nc'
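For orientation, the gs:// URI form mentioned above ("gs://<MY_BUCKET>/<PREFIX>") can be split and reassembled with plain string handling. `split_gs_uri` and `join_gs_uri` are hypothetical helpers, and `my-bucket` is a placeholder name; this is a sketch of the URI layout, not the library's own logic.

```python
from typing import Tuple


def split_gs_uri(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'gs://bucket/prefix' into (bucket, prefix)."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"Not a gs:// URI: {uri}")
    bucket, _, prefix = uri[len(scheme):].partition("/")
    return bucket, prefix


def join_gs_uri(bucket: str, prefix: str, cache_path: str) -> str:
    """Hypothetical helper: build a gs:// URI from bucket, prefix, and object path."""
    # Drop empty segments and stray slashes so the URI has single separators
    parts = [p.strip("/") for p in (prefix, cache_path) if p.strip("/")]
    return "gs://" + "/".join([bucket] + parts)


bucket, prefix = split_gs_uri("gs://my-bucket/cache")
uri = join_gs_uri(bucket, prefix + "/", "file.nc")
print(uri)  # gs://my-bucket/cache/file.nc
```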
- listdir(path='')¶
List the contents of a directory in the cache.
- Parameters:
path (str) – Path to the directory to list
- Returns:
list[str] – List of files in the directory
- path(cache_path)¶
Return a full filepath in cache.
- Parameters:
cache_path (str) – String path or filepath to create in cache. If parent directories do not exist, they will be created.
- Returns:
str – Full path string to subdirectory or object in cache directory
Examples
>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.path("file.nc")
'cache/file.nc'
>>> cache.clear()  # cleanup
- project¶
- put(data_path, cache_path=None)¶
Save data to the GCP cache store.
If read_only is True, this method will return the path in the local disk cache store.
- Parameters:
data_path (str | pathlib.Path) – Data to save to GCP cache store.
cache_path (str, optional) – Path in cache store to save data. Defaults to the same filename as data_path.
- Returns:
str – Returns the path in the cache to the stored file
- Raises:
RuntimeError – Raised if read_only is True
FileNotFoundError – Raised if data_path is a string and no file is found at that path
Examples
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
...     read_only=False,
... )
>>> # put a file directly
>>> cache.put("README.md", "test/file.md")
'test/file.md'
- put_multiple(data_path, cache_path)¶
Put multiple files into the cache at once.
- Parameters:
- Returns:
list[str] – Returns a list of relative paths to the stored files in the cache
- read_only¶
- show_progress¶
- property size¶
Return the disk size (in MBytes) of the local cache.
- Returns:
float – Size of the disk cache store in MB
Examples
>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.size
0.0...
>>> cache.clear()  # cleanup
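A directory size in MB can be computed by summing file sizes recursively. `dir_size_mb` is a hypothetical helper, and the choice of 1 MB = 10^6 bytes is an assumption; the property above does not state which definition it uses.

```python
import tempfile
from pathlib import Path


def dir_size_mb(directory: str) -> float:
    """Hypothetical helper: total size of all files under directory, in MB."""
    total_bytes = sum(f.stat().st_size for f in Path(directory).rglob("*") if f.is_file())
    return total_bytes / 1e6  # assumption: 1 MB = 10**6 bytes


with tempfile.TemporaryDirectory() as tmp:
    empty_size = dir_size_mb(tmp)
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "blob.bin").write_bytes(b"\0" * 2_500_000)
    filled_size = dir_size_mb(tmp)

print(empty_size)   # 0.0
print(filled_size)  # 2.5
```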
- timeout¶