cfdm.Data.nc_dataset_shards

Data.nc_dataset_shards()[source]

Get the Zarr dataset shard size for the data.

When writing to a Zarr dataset, sharding provides a mechanism to store multiple dataset chunks in a single storage object or file. Without sharding, each dataset chunk is written to its own file. Traditional file systems and object storage systems may have performance issues when storing and accessing large numbers of files, and files that are smaller than the file system's block size can be inefficient to store. Sharding can improve performance by creating fewer, larger files for storing the dataset chunks.
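
As a rough illustration of the file-count reduction described above, the following sketch counts storage objects with and without sharding using plain arithmetic. It does not call cfdm: the function names `count_chunks` and `count_shards` are hypothetical, and the array shape, chunk sizes and chunks-per-shard values are illustrative assumptions.

```python
import math


def count_chunks(shape, chunksizes):
    """Number of dataset chunks needed to cover an array of the given shape."""
    return math.prod(math.ceil(n / c) for n, c in zip(shape, chunksizes))


def count_shards(shape, chunksizes, chunks_per_shard):
    """Number of shards when each shard holds a block of chunks.

    chunks_per_shard gives the number of chunks grouped into one shard
    along each dimension (an assumption of this sketch).
    """
    return math.prod(
        math.ceil(math.ceil(n / c) / s)
        for n, c, s in zip(shape, chunksizes, chunks_per_shard)
    )


shape = (1, 100, 200)
chunksizes = (1, 30, 50)

# Without sharding: one storage object per chunk
n_unsharded = count_chunks(shape, chunksizes)

# With sharding: a 1 x 2 x 2 block of chunks per storage object
n_sharded = count_shards(shape, chunksizes, (1, 2, 2))

print(n_unsharded, n_sharded)
```

With these values, the 16 chunks would be grouped into 4 shards, so a quarter as many storage objects would be written.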

The sharding strategy is ignored when writing to a non-Zarr dataset.

Added in version (cfdm): 1.13.0.0

See also

nc_clear_dataset_shards, nc_set_dataset_shards, nc_dataset_chunksizes, cfdm.write

Returns:
None or int or sequence of int

The current sharding strategy. One of:

  • None

    No sharding.

  • int

    The integer number of dataset chunks to be stored in a single shard, favouring an equal number of dataset chunks along each shard dimension.

  • sequence of int

    The number of chunks along each shard dimension.
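
To make the sequence form more concrete, the sketch below converts a chunks-per-shard sequence into a shard extent measured in array elements. It assumes that a shard spans a whole number of chunks along each dimension, as in the Zarr version 3 sharding codec. The helper `shard_shape` is hypothetical and is not part of cfdm.

```python
def shard_shape(chunksizes, chunks_per_shard):
    """Shard extent in array elements along each dimension.

    Assumes each shard covers a whole number of chunks per dimension.
    """
    return tuple(c * s for c, s in zip(chunksizes, chunks_per_shard))


# Chunks of (1, 30, 50) elements, grouped 1 x 2 x 2 into each shard
shape_in_elements = shard_shape((1, 30, 50), (1, 2, 2))
print(shape_in_elements)
```

Under these assumptions, each shard would cover a (1, 60, 100) block of array elements.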

Examples

>>> d.shape
(1, 100, 200)
>>> d.nc_dataset_chunksizes()
(1, 30, 50)
>>> d.nc_set_dataset_shards(4)
>>> d.nc_dataset_shards()
4
>>> d.nc_clear_dataset_shards()
4
>>> print(d.nc_dataset_shards())
None
>>> d.nc_set_dataset_shards((5, 4))
>>> d.nc_dataset_shards()
(5, 4)