pycontrails.GCPCacheStore

class pycontrails.GCPCacheStore(cache_dir='', project=None, bucket=None, disk_cache=None, read_only=True, allow_clear=False, timeout=300, show_progress=False, chunk_size=16777216)

Bases: CacheStore

Google Cloud Platform (Storage) Cache.

This class downloads files from Google Cloud Storage to a local DiskCacheStore initialized with cache_dir=".gcp" to avoid re-downloading them. If the source files on GCP change, the local mirror must be cleared by initializing this class and running clear_disk().

Note that, by default, the GCP cache store is read-only. Calling put() while read_only is True raises a RuntimeError. Set read_only to False to enable writing to the cache store.
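
The read-only behavior described above can be handled defensively in calling code. The following is a minimal sketch, not part of pycontrails: try_put is a hypothetical helper, and ReadOnlyStub is a hypothetical stand-in that only mimics the documented read_only attribute and put() behavior so the snippet runs without GCP credentials.

```python
from __future__ import annotations


class ReadOnlyStub:
    """Hypothetical stand-in mimicking the documented read-only behavior."""

    def __init__(self, read_only: bool = True) -> None:
        self.read_only = read_only

    def put(self, data_path: str, cache_path: str | None = None) -> str:
        if self.read_only:
            # The note above states that put() raises RuntimeError
            # when read_only is True.
            raise RuntimeError("cache store is read only")
        return cache_path if cache_path is not None else data_path


def try_put(cache, data_path: str, cache_path: str | None = None) -> str | None:
    """Attempt a put(); return None instead of raising on a read-only store."""
    try:
        return cache.put(data_path, cache_path)
    except RuntimeError:
        return None


print(try_put(ReadOnlyStub(read_only=True), "README.md", "example/file.md"))
print(try_put(ReadOnlyStub(read_only=False), "README.md", "example/file.md"))
```

Catching RuntimeError this broadly can also hide unrelated failures, so checking the read_only attribute before calling put() may be the cleaner choice in real code.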

Parameters:
  • cache_dir (str, optional) – Root object prefix within the bucket. Defaults to the PYCONTRAILS_CACHE_DIR environment variable, or the root of the bucket. The full GCP URI (i.e., "gs://<MY_BUCKET>/<PREFIX>") can be used here.

  • project (str, optional) – GCP project. Defaults to the current active project set in the google-cloud-sdk environment.

  • bucket (str, optional) – GCP Bucket to use for cache. Defaults to PYCONTRAILS_CACHE_BUCKET environment variable.

  • read_only (bool, optional) – Only enable reading from cache. Defaults to True.

  • allow_clear (bool, optional) – Allow this cache to be cleared using clear(). Defaults to False.

  • disk_cache (DiskCacheStore, optional) – Specify a custom local disk cache store to mirror files. Defaults to DiskCacheStore(cache_dir="{user_cache_dir}/.gcp/{bucket}").

  • show_progress (bool, optional) – Show progress bar on cache put(). Defaults to False.

  • chunk_size (int, optional) – Chunk size in bytes for uploads and downloads with progress. Set a smaller size to see more granular progress, and a larger size for better transfer speed. Chunk size must be a multiple of 262144 (e.g., 10 * 262144). The default value is 64 * 262144 (16777216, as in the signature above), which may throttle fast download speeds.
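
The multiple-of-262144 constraint on chunk_size can be checked with simple arithmetic before constructing the store. This is a minimal sketch; round_chunk_size and GCS_CHUNK_MULTIPLE are hypothetical names introduced here for illustration and are not part of pycontrails.

```python
GCS_CHUNK_MULTIPLE = 262144  # 256 KiB, the multiple stated in the parameter docs


def round_chunk_size(requested_bytes: int) -> int:
    """Round a requested size up to the nearest multiple of 262144."""
    if requested_bytes <= 0:
        raise ValueError("chunk size must be positive")
    # Ceiling division gives the number of 256 KiB units needed.
    n_units = -(-requested_bytes // GCS_CHUNK_MULTIPLE)
    return n_units * GCS_CHUNK_MULTIPLE


default_chunk = 16777216  # default taken from the class signature
# Number of 256 KiB units in the default, and the remainder.
print(default_chunk // GCS_CHUNK_MULTIPLE, default_chunk % GCS_CHUNK_MULTIPLE)
print(round_chunk_size(5_000_000))
```

The rounded value could then be passed as the chunk_size argument when constructing GCPCacheStore.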

Examples

>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.cache_dir
'cache/'
>>> cache.bucket
'contrails-301217-unit-test'

__init__(cache_dir='', project=None, bucket=None, disk_cache=None, read_only=True, allow_clear=False, timeout=300, show_progress=False, chunk_size=16777216)

Methods

__init__([cache_dir, project, bucket, ...])

clear_disk([cache_path])

Clear the local disk cache mirror of the GCP Cache Store.

exists(cache_path)

Check if a path in cache exists.

get(cache_path)

Get data from the local cache store.

gs_path(cache_path)

Return a full Google Storage (gs://) URI to object.

listdir([path])

List the contents of a directory in the cache.

path(cache_path)

Return a full filepath in cache.

put(data_path[, cache_path])

Save data to the GCP cache store.

put_multiple(data_path, cache_path)

Put multiple files into the cache at once.

Attributes

bucket

chunk_size

project

read_only

show_progress

timeout

allow_clear

cache_dir

client

Handle to Google Cloud Storage client.

size

Return the disk size (in MBytes) of the local cache.

allow_clear
bucket
cache_dir
chunk_size
clear_disk(cache_path='')

Clear the local disk cache mirror of the GCP Cache Store.

Parameters:

cache_path (str, optional) – Path in the mirrored cache store, passed to the clear() method of the local disk cache. By default, this method clears the entire mirrored cache store.

Examples

>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.clear_disk()

property client

Handle to Google Cloud Storage client.

Returns:

google.cloud.storage.Client – Handle to Google Cloud Storage client

exists(cache_path)

Check if a path in cache exists.

Parameters:

cache_path (str) – Path to directory or file in cache

Returns:

bool – True if directory or file exists

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.exists("file.nc")
False

get(cache_path)

Get data from the local cache store.

Parameters:

cache_path (str) – Path in cache store to get data

Returns:

str – Path to the downloaded local file

Raises:

ValueError – If cache_path refers to a directory

Examples

>>> import pathlib
>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
...     read_only=False,
... )
>>> cache.put("README.md", "example/file.md")
'example/file.md'
>>> # returns a full path to local copy of the file
>>> path = cache.get("example/file.md")
>>> pathlib.Path(path).is_file()
True
>>> pathlib.Path(path).read_text()[17:69]
'Python library for modeling aviation climate impacts'
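
A common pattern combines exists(), put() and get(): upload a file only if it is not already cached, then fetch the local copy. The sketch below assumes only the method signatures documented on this page. get_or_put is a hypothetical helper, and InMemoryStub is a hypothetical stand-in so the snippet runs without GCP access; with a real GCPCacheStore, the store would need read_only=False for the put() branch.

```python
from __future__ import annotations


class InMemoryStub:
    """Hypothetical stand-in exposing exists(), put() and get() only."""

    def __init__(self) -> None:
        self._objects: dict[str, str] = {}

    def exists(self, cache_path: str) -> bool:
        return cache_path in self._objects

    def put(self, data_path: str, cache_path: str | None = None) -> str:
        key = cache_path if cache_path is not None else data_path
        self._objects[key] = data_path
        return key

    def get(self, cache_path: str) -> str:
        # A real store returns the path of the downloaded local copy;
        # the stub simply returns the path that was stored.
        return self._objects[cache_path]


def get_or_put(cache, data_path: str, cache_path: str) -> str:
    """Return a local path for cache_path, storing data_path first if absent."""
    if not cache.exists(cache_path):
        cache.put(data_path, cache_path)
    return cache.get(cache_path)


stub = InMemoryStub()
print(get_or_put(stub, "README.md", "example/file.md"))
```
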
gs_path(cache_path)

Return a full Google Storage (gs://) URI to object.

Parameters:

cache_path (str) – string path to object in cache

Returns:

str – Google Storage URI (gs://) to object in cache

Examples

>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
... )
>>> cache.gs_path("file.nc")
'gs://contrails-301217-unit-test/cache/file.nc'
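
A gs:// URI has the form gs://<bucket>/<object path>. As a rough illustration of how the bucket, the cache_dir prefix, and a cache path relate to such a URI, the following sketch uses plain string handling. compose_gs_uri and split_gs_uri are hypothetical helpers, not pycontrails functions, and the library's own implementation may differ.

```python
from __future__ import annotations


def compose_gs_uri(bucket: str, cache_dir: str, cache_path: str) -> str:
    """Join bucket, prefix, and object path into a gs:// URI."""
    # Drop empty parts and stray slashes so the object path has no "//".
    parts = [p.strip("/") for p in (cache_dir, cache_path) if p.strip("/")]
    return "gs://" + "/".join([bucket.strip("/"), *parts])


def split_gs_uri(uri: str) -> tuple[str, str]:
    """Split a gs:// URI into (bucket, object path)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_path = uri[len("gs://"):].partition("/")
    return bucket, object_path


uri = compose_gs_uri("contrails-301217-unit-test", "cache/", "file.nc")
print(uri)
print(split_gs_uri(uri))
```
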
listdir(path='')

List the contents of a directory in the cache.

Parameters:

path (str) – Path to the directory to list

Returns:

list[str] – List of files in the directory

path(cache_path)

Return a full filepath in cache.

Parameters:

cache_path (str) – String path or filepath to create in cache. If parent directories do not exist, they will be created.

Returns:

str – Full path string to the subdirectory or object in the cache directory

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.path("file.nc")
'cache/file.nc'
>>> cache.clear()  # cleanup
project
put(data_path, cache_path=None)

Save data to the GCP cache store.

If read_only is True, this method raises a RuntimeError instead of writing to the cache store.

Parameters:
  • data_path (str | pathlib.Path) – Data to save to GCP cache store.

  • cache_path (str, optional) – Path in cache store to save data. Defaults to the same filename as data_path.

Returns:

str – Path in the cache to the stored file

Raises:

RuntimeError – If read_only is True

Examples

>>> from pycontrails import GCPCacheStore
>>> cache = GCPCacheStore(
...     bucket="contrails-301217-unit-test",
...     cache_dir="cache",
...     read_only=False,
... )
>>> # put a file directly
>>> cache.put("README.md", "test/file.md")
'test/file.md'

put_multiple(data_path, cache_path)

Put multiple files into the cache at once.

Parameters:
  • data_path (Sequence[str | pathlib.Path]) – List of data files to cache. Each member is passed directly on to put().

  • cache_path (list[str]) – List of cache paths corresponding to each element in the data_path list. Each member is passed directly on to put().

Returns:

list[str] – List of relative paths to the stored files in the cache
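
Because put_multiple() takes two parallel lists, it can help to build them together so that they stay aligned. The sketch below pairs each matching local file with a cache path under a common prefix. build_put_lists, the "*.nc" pattern, and the "met/" prefix are illustrative assumptions, not part of pycontrails.

```python
from __future__ import annotations

import pathlib
import tempfile


def build_put_lists(
    directory: str | pathlib.Path, prefix: str, pattern: str = "*.nc"
) -> tuple[list[pathlib.Path], list[str]]:
    """Pair each matching file in directory with a cache path under prefix."""
    # Sort so the pairing is deterministic across runs.
    data_paths = sorted(pathlib.Path(directory).glob(pattern))
    base = prefix.strip("/")
    cache_paths = [f"{base}/{p.name}" if base else p.name for p in data_paths]
    return data_paths, cache_paths


# Demonstration on a temporary directory with two matching files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.nc", "a.nc", "notes.txt"):
        (pathlib.Path(tmp) / name).write_text("placeholder")
    data_paths, cache_paths = build_put_lists(tmp, "met/")
    print([p.name for p in data_paths], cache_paths)
```

The two lists could then be passed as cache.put_multiple(data_paths, cache_paths) on a store created with read_only=False.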

read_only
show_progress
property size

Return the disk size (in MBytes) of the local cache.

Returns:

float – Size of the disk cache store in MB

Examples

>>> from pycontrails import DiskCacheStore
>>> cache = DiskCacheStore(cache_dir="cache", allow_clear=True)
>>> cache.size
0.0...
>>> cache.clear()  # cleanup
timeout