pycontrails.core.cache

Pycontrails Caching Support.

Classes

CacheStore()

Abstract cache storage class for storing staged and intermediate data.

DiskCacheStore([cache_dir, allow_clear])

Cache that uses a folder on the local filesystem.

GCPCacheStore([cache_dir, project, bucket, ...])

Google Cloud Platform (Storage) Cache.

class pycontrails.core.cache.CacheStore

Bases: ABC

Abstract cache storage class for storing staged and intermediate data.

allow_clear
cache_dir
abstract exists(cache_path)

Check if a path in cache exists.

Parameters:

cache_path (str) – Path to directory or file in cache

Returns:

bool – True if directory or file exists

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.exists("file.nc")
False
abstract get(cache_path)

Get data from cache.

abstract listdir(path='')

List the contents of a directory in the cache.

Parameters:

path (str) – Path to the directory to list

Returns:

list[str] – List of files in the directory

abstract path(cache_path)

Return a full filepath in cache.

Parameters:

cache_path (str) – String path or filepath to create in cache. If parent directories do not exist, they will be created.

Returns:

str – Full path string to subdirectory or object in cache directory

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.path("file.nc")
'cache/file.nc'
>>> cache.clear()  # cleanup
abstract put(data, cache_path=None)

Save data to cache.

put_multiple(data_path, cache_path)

Put multiple files into the cache at once.

Parameters:
  • data_path (Sequence[str | pathlib.Path]) – List of data files to cache. Each member is passed directly on to put().

  • cache_path (list[str]) – List of cache paths corresponding to each element in the data_path list. Each member is passed directly on to put().

Returns:

list[str] – Returns a list of relative paths to the stored files in the cache
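
The delegation described for put_multiple(), where each pair of entries is handed to put(), can be sketched with a small stand-in class. This is not the pycontrails implementation: `SketchStore` and `InMemoryStore` are hypothetical names, and the element-by-element pairing with a length check is an assumption.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Optional, Sequence, Union


class SketchStore(ABC):
    """Hypothetical stand-in for an abstract cache store (not pycontrails code)."""

    @abstractmethod
    def put(self, data_path: Union[str, Path], cache_path: Optional[str] = None) -> str:
        """Store one file and return its path in the cache."""

    def put_multiple(
        self, data_path: Sequence[Union[str, Path]], cache_path: List[str]
    ) -> List[str]:
        # Assumption: the two sequences are paired element by element.
        if len(data_path) != len(cache_path):
            raise ValueError("data_path and cache_path must have the same length")
        return [self.put(d, c) for d, c in zip(data_path, cache_path)]


class InMemoryStore(SketchStore):
    """Records puts in a dict instead of touching disk."""

    def __init__(self) -> None:
        self.stored: Dict[str, str] = {}

    def put(self, data_path: Union[str, Path], cache_path: Optional[str] = None) -> str:
        # Default to the source filename when no cache path is given.
        key = cache_path if cache_path is not None else Path(data_path).name
        self.stored[key] = str(data_path)
        return key


store = InMemoryStore()
stored_paths = store.put_multiple(["a.nc", "b.nc"], ["met/a.nc", "met/b.nc"])
```

Because only put() is abstract, a subclass in this sketch gets put_multiple() without further work.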

abstract property size

Return the disk size (in MBytes) of the local cache.

Returns:

float – Size of the disk cache store in MB

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.size
0.0...
>>> cache.clear()  # cleanup
class pycontrails.core.cache.DiskCacheStore(cache_dir=None, allow_clear=False)

Bases: CacheStore

Cache that uses a folder on the local filesystem.

Parameters:
  • allow_clear (bool, optional) – Allow this cache to be cleared using clear(). Defaults to False.

  • cache_dir (str | pathlib.Path, optional) – Root cache directory. By default, looks first for PYCONTRAILS_CACHE_DIR environment variable, then uses the OS specific platformdirs.user_cache_dir() function.
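
The lookup order for cache_dir described above (explicit argument, then the PYCONTRAILS_CACHE_DIR environment variable, then an OS specific default) can be sketched as follows. `resolve_cache_dir` is a hypothetical helper, not a pycontrails function, and the `fallback` callable is only a stand-in for platformdirs.user_cache_dir().

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional, Union


def resolve_cache_dir(
    cache_dir: Optional[Union[str, Path]] = None,
    env: Mapping[str, str] = os.environ,
    fallback: Callable[[], str] = lambda: str(Path.home() / ".cache" / "pycontrails"),
) -> str:
    """Hypothetical helper: explicit argument, then env var, then a fallback."""
    if cache_dir is not None:
        return str(cache_dir)
    from_env = env.get("PYCONTRAILS_CACHE_DIR")
    if from_env:
        return from_env
    # Stand-in for platformdirs.user_cache_dir(); the real location is OS specific.
    return fallback()


explicit = resolve_cache_dir("cache", env={})
from_environment = resolve_cache_dir(env={"PYCONTRAILS_CACHE_DIR": "/tmp/pc"})
```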

Examples

>>> from pycontrails import DiskCacheStore
>>> disk_cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> disk_cache.cache_dir
'cache'
>>> disk_cache.clear()  # cleanup
allow_clear
cache_dir
clear(cache_path='')

Delete all files and folders within cache_path.

If no cache_path is provided, this will clear the entire cache.

If allow_clear is set to False, this method will not clear anything and raises a RuntimeError instead.

Parameters:

cache_path (str, optional) – Path to subdirectory or file in cache

Raises:

RuntimeError – Raises a RuntimeError when allow_clear is set to False

Examples

>>> from pycontrails import DiskCacheStore
>>> disk_cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> # Write some data to the cache
>>> disk_cache.put("README.md", "test/example.txt")
'test/example.txt'
>>> disk_cache.exists("test/example.txt")
True
>>> # clear a specific path
>>> disk_cache.clear("test/example.txt")
>>> # clear the whole cache
>>> disk_cache.clear()
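
The allow_clear guard listed under Raises can be illustrated with a stand-in class built on the standard library. `GuardedDir` is a hypothetical name and this is only a sketch of the guard pattern, not the pycontrails code. Recreating the root directory after a full clear is an assumption.

```python
import shutil
import tempfile
from pathlib import Path


class GuardedDir:
    """Hypothetical stand-in showing an allow_clear guard (not pycontrails code)."""

    def __init__(self, cache_dir: str, allow_clear: bool = False) -> None:
        self.cache_dir = Path(cache_dir)
        self.allow_clear = allow_clear
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def clear(self, cache_path: str = "") -> None:
        if not self.allow_clear:
            raise RuntimeError("Cache is not allowed to be cleared")
        target = self.cache_dir / cache_path
        if target.is_file():
            target.unlink()
        elif target.is_dir():
            shutil.rmtree(target)
            # Assumption: recreate the root so the cache stays usable after a full clear.
            if cache_path == "":
                target.mkdir(parents=True, exist_ok=True)


root = tempfile.mkdtemp()
protected = GuardedDir(root, allow_clear=False)
try:
    protected.clear()
    guard_raised = False
except RuntimeError:
    guard_raised = True

clearable = GuardedDir(root, allow_clear=True)
(clearable.cache_dir / "example.txt").write_text("data")
clearable.clear("example.txt")
file_removed = not (clearable.cache_dir / "example.txt").exists()
shutil.rmtree(root, ignore_errors=True)
```
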
exists(cache_path)

Check if a path in cache exists.

Parameters:

cache_path (str) – Path to directory or file in cache

Returns:

bool – True if directory or file exists

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.exists("file.nc")
False
get(cache_path)

Get data path from the local cache store.

Alias for path()

Parameters:

cache_path (str) – Cache path to retrieve

Returns:

str – Returns the relative path in the cache to the stored file

Examples

>>> from pycontrails import DiskCacheStore
>>> disk_cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>>
>>> # returns a path
>>> disk_cache.get("test/file.md")
'cache/test/file.md'
listdir(path='')

List the contents of a directory in the cache.

Parameters:

path (str) – Path to the directory to list

Returns:

list[str] – List of files in the directory

path(cache_path)

Return a full filepath in cache.

Parameters:

cache_path (str) – String path or filepath to create in cache. If parent directories do not exist, they will be created.

Returns:

str – Full path string to subdirectory or object in cache directory

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.path("file.nc")
'cache/file.nc'
>>> cache.clear()  # cleanup
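
The note that missing parent directories are created can be sketched with pathlib. `cache_file_path` is a hypothetical helper. It assumes cache_path names a file, so only its parent directories are created.

```python
import tempfile
from pathlib import Path


def cache_file_path(cache_root: str, cache_path: str) -> str:
    """Hypothetical helper: join onto the cache root and create missing parents."""
    full = Path(cache_root) / cache_path
    # Only the parent directories are created; the file itself is not.
    full.parent.mkdir(parents=True, exist_ok=True)
    return str(full)


with tempfile.TemporaryDirectory() as root:
    full_path = cache_file_path(root, "met/2022/file.nc")
    parent_created = Path(full_path).parent.is_dir()
    file_created = Path(full_path).exists()
```
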
put(data_path, cache_path=None)

Save data to the local cache store.

Parameters:
  • data_path (str | pathlib.Path) – Path to data to cache.

  • cache_path (str | None, optional) – Path in cache store to save data. Defaults to the same filename as data_path.

Returns:

str – Returns the relative path in the cache to the stored file

Raises:

FileNotFoundError – Raised if data_path is a string and no file is found at that path

Examples

>>> from pycontrails import DiskCacheStore
>>> disk_cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>>
>>> # put a file directly
>>> disk_cache.put("README.md", "test/file.md")
'test/file.md'
put_multiple(data_path, cache_path)

Put multiple files into the cache at once.

Parameters:
  • data_path (Sequence[str | pathlib.Path]) – List of data files to cache. Each member is passed directly on to put().

  • cache_path (list[str]) – List of cache paths corresponding to each element in the data_path list. Each member is passed directly on to put().

Returns:

list[str] – Returns a list of relative paths to the stored files in the cache

property size

Return the disk size (in MBytes) of the local cache.

Returns:

float – Size of the disk cache store in MB

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.size
0.0...
>>> cache.clear()  # cleanup
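
One way to compute a directory size in MB, as the size property reports, is to sum the sizes of all regular files under the cache root. `dir_size_mb` is a hypothetical helper, and the choice of 1 MB = 1e6 bytes is an assumption; the library may use a different convention.

```python
import tempfile
from pathlib import Path


def dir_size_mb(root: str) -> float:
    """Hypothetical helper: total size of regular files under root, in MB."""
    total_bytes = sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())
    # Assumption: 1 MB = 1e6 bytes; a 1024 ** 2 convention would differ slightly.
    return total_bytes / 1e6


with tempfile.TemporaryDirectory() as root:
    empty_size = dir_size_mb(root)
    (Path(root) / "sub").mkdir()
    (Path(root) / "sub" / "blob.bin").write_bytes(b"\0" * 2_500_000)
    filled_size = dir_size_mb(root)
```
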
class pycontrails.core.cache.GCPCacheStore(cache_dir='', project=None, bucket=None, disk_cache=None, read_only=True, allow_clear=False, timeout=300, show_progress=False, chunk_size=16777216)

Bases: CacheStore

Google Cloud Platform (Storage) Cache.

This class downloads files from Google Cloud Storage to a local DiskCacheStore initialized with cache_dir=".gcp" to avoid re-downloading files. If the source files on GCP change, the local mirror must be cleared by initializing this class and running clear_disk().

Note that by default, GCPCacheStore is read only. When put() is called and read_only is set to True, the cache raises a RuntimeError. Set read_only to False to enable writing to the cache store.
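
The mirroring and read-only behavior described above can be sketched without any cloud dependency. In this stand-in, a dict plays the role of the bucket and a temporary directory plays the role of the local mirror. `MirroredStore` and its `downloads` counter are hypothetical and only illustrate why a stale mirror must be cleared.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


class MirroredStore:
    """Hypothetical stand-in for a remote store with a local mirror."""

    def __init__(self, remote: Dict[str, bytes], mirror_dir: str, read_only: bool = True):
        self.remote = remote  # stands in for objects in a bucket
        self.mirror_dir = Path(mirror_dir)
        self.read_only = read_only
        self.downloads = 0

    def get(self, cache_path: str) -> str:
        local = self.mirror_dir / cache_path
        if not local.exists():
            # Download only when the mirror has no copy yet.
            local.parent.mkdir(parents=True, exist_ok=True)
            local.write_bytes(self.remote[cache_path])
            self.downloads += 1
        return str(local)

    def put(self, cache_path: str, data: bytes) -> str:
        if self.read_only:
            raise RuntimeError("store is read only; set read_only=False to write")
        self.remote[cache_path] = data
        return cache_path

    def clear_disk(self) -> None:
        # Drop the mirror so the next get() fetches fresh copies.
        shutil.rmtree(self.mirror_dir, ignore_errors=True)
        self.mirror_dir.mkdir(parents=True, exist_ok=True)


mirror_root = tempfile.mkdtemp()
store = MirroredStore({"a.txt": b"v1"}, mirror_root)
first = Path(store.get("a.txt")).read_bytes()
store.get("a.txt")  # served from the mirror, no new download
store.remote["a.txt"] = b"v2"  # the remote object changes
stale = Path(store.get("a.txt")).read_bytes()
store.clear_disk()
fresh = Path(store.get("a.txt")).read_bytes()
shutil.rmtree(mirror_root, ignore_errors=True)
```

In this sketch the third get() still returns the old bytes, because the mirror is never compared against the remote. Only clearing the mirror forces a fresh download.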

Parameters:
  • cache_dir (str, optional) – Root object prefix within bucket. Defaults to PYCONTRAILS_CACHE_DIR environment variable, or the root of the bucket. The full GCP URI (i.e., “gs://<MY_BUCKET>/<PREFIX>”) can be used here.

  • project (str, optional) – GCP Project. Defaults to the current active project set in the google-cloud-sdk environment.

  • bucket (str, optional) – GCP Bucket to use for cache. Defaults to PYCONTRAILS_CACHE_BUCKET environment variable.

  • read_only (bool, optional) – Only enable reading from cache. Defaults to True.

  • allow_clear (bool, optional) – Allow this cache to be cleared using clear(). Defaults to False.

  • disk_cache (DiskCacheStore, optional) – Specify a custom local disk cache store to mirror files. Defaults to DiskCacheStore(cache_dir="{user_cache_dir}/.gcp/{bucket}")

  • show_progress (bool, optional) – Show progress bar on cache put(). Defaults to False

  • chunk_size (int, optional) – Chunk size for uploads and downloads with progress. Set a smaller size to see more granular progress, and set a larger size for faster downloads. Chunk size must be a multiple of 262144 (e.g., 10 * 262144). Default value is 64 * 262144 (16777216, as shown in the class signature), which may throttle fast download speeds.
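
The chunk_size constraint can be checked with simple arithmetic. The default in the class signature, 16777216, equals 64 * 262144. The helpers below are hypothetical (not part of pycontrails) and show both the multiple-of-262144 check and how the chunk size determines the number of transfer steps, and therefore how often a progress bar can update.

```python
GCS_CHUNK_MULTIPLE = 262144  # 256 KiB, the multiple named in the parameter description


def validate_chunk_size(chunk_size: int) -> int:
    """Hypothetical check: return chunk_size if it is a positive multiple of 262144."""
    if chunk_size <= 0 or chunk_size % GCS_CHUNK_MULTIPLE != 0:
        raise ValueError(f"chunk_size must be a positive multiple of {GCS_CHUNK_MULTIPLE}")
    return chunk_size


def n_chunks(file_size: int, chunk_size: int) -> int:
    """Number of chunks needed to transfer a file of file_size bytes."""
    return -(-file_size // validate_chunk_size(chunk_size))  # ceiling division


default_multiple = 16777216 // GCS_CHUNK_MULTIPLE
steps_default = n_chunks(100_000_000, 16777216)
steps_small = n_chunks(100_000_000, GCS_CHUNK_MULTIPLE)
```

For a 100 MB file, the signature default gives 6 steps while the minimum chunk size gives 382, which is the trade-off between progress granularity and per-request overhead.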

Examples

>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.cache_dir
'cache/'
>>> cache.bucket
'contrails-301217-unit-test'
allow_clear
bucket
cache_dir
chunk_size
clear_disk(cache_path='')

Clear the local disk cache mirror of the GCP Cache Store.

Parameters:

cache_path (str, optional) – Path in mirrored cache store. Passed into _disk_clear.clear(). By default, this method will clear the entire mirrored cache store.

Examples

>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.clear_disk()
property client

Handle to Google Cloud Storage client.

Returns:

google.cloud.storage.Client – Handle to Google Cloud Storage client

exists(cache_path)

Check if a path in cache exists.

Parameters:

cache_path (str) – Path to directory or file in cache

Returns:

bool – True if directory or file exists

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.exists("file.nc")
False
get(cache_path)

Get data from the cache store, returning the path to a local copy of the file.

Parameters:

cache_path (str) – Path in cache store to get data

Returns:

str – Returns path to downloaded local file

Raises:

ValueError – Raised if cache_path refers to a directory

Examples

>>> import pathlib
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
...     read_only=False,
... )
>>> cache.put("README.md", "example/file.md")
'example/file.md'
>>> # returns a full path to local copy of the file
>>> path = cache.get("example/file.md")
>>> pathlib.Path(path).is_file()
True
>>> pathlib.Path(path).read_text()[17:69]
'Python library for modeling aviation climate impacts'
gs_path(cache_path)

Return a full Google Storage (gs://) URI to object.

Parameters:

cache_path (str) – string path to object in cache

Returns:

str – Google Storage URI (gs://) to object in cache

Examples

>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.path("file.nc")
'cache/file.nc'
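
A gs:// URI of the form given for cache_dir ("gs://<MY_BUCKET>/<PREFIX>") can be composed and split with the standard library. `build_gs_uri` and `split_gs_uri` are hypothetical helpers, and the exact string returned by gs_path() is not confirmed here. The sketch only shows the general bucket/prefix/object layout.

```python
from typing import Tuple
from urllib.parse import urlparse


def build_gs_uri(bucket: str, cache_dir: str, cache_path: str) -> str:
    """Hypothetical helper: compose gs://<bucket>/<prefix>/<object>."""
    parts = [p.strip("/") for p in (cache_dir, cache_path) if p.strip("/")]
    return f"gs://{bucket}/" + "/".join(parts)


def split_gs_uri(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split a gs:// URI into (bucket, object prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"not a gs:// URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


uri = build_gs_uri("my-bucket", "cache/", "file.nc")
bucket, prefix = split_gs_uri(uri)
```
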
listdir(path='')

List the contents of a directory in the cache.

Parameters:

path (str) – Path to the directory to list

Returns:

list[str] – List of files in the directory

path(cache_path)

Return a full filepath in cache.

Parameters:

cache_path (str) – String path or filepath to create in cache. If parent directories do not exist, they will be created.

Returns:

str – Full path string to subdirectory or object in cache directory

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.path("file.nc")
'cache/file.nc'
>>> cache.clear()  # cleanup
project
put(data_path, cache_path=None)

Save data to the GCP cache store.

If read_only is True, this method does not write to the GCP cache store; as noted in the class description, a RuntimeError is raised instead.

Parameters:
  • data_path (str | pathlib.Path) – Data to save to GCP cache store.

  • cache_path (str, optional) – Path in cache store to save data. Defaults to the same filename as data_path.

Returns:

str – Returns the path in the cache to the stored file

Raises:

RuntimeError – Raised if read_only is True

Examples

>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
...     read_only=False,
... )
>>> # put a file directly
>>> cache.put("README.md", "test/file.md")
'test/file.md'
put_multiple(data_path, cache_path)

Put multiple files into the cache at once.

Parameters:
  • data_path (Sequence[str | pathlib.Path]) – List of data files to cache. Each member is passed directly on to put().

  • cache_path (list[str]) – List of cache paths corresponding to each element in the data_path list. Each member is passed directly on to put().

Returns:

list[str] – Returns a list of relative paths to the stored files in the cache

read_only
show_progress
property size

Return the disk size (in MBytes) of the local cache.

Returns:

float – Size of the disk cache store in MB

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.size
0.0...
>>> cache.clear()  # cleanup
timeout