cf.Data.nc_set_dataset_shards

Data.nc_set_dataset_shards(shards)

Set the Zarr dataset sharding strategy for the data.

When writing to a Zarr dataset, sharding provides a mechanism to store multiple dataset chunks in a single storage object or file. Without sharding, each dataset chunk is written to its own file. Traditional file systems and object storage systems may perform poorly when storing and accessing large numbers of files, and files smaller than the file system's block size are stored inefficiently. Sharding can improve performance by storing the dataset chunks in fewer, larger files.
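The effect on the number of files can be estimated with simple arithmetic. The sketch below is illustrative only and does not use cf: the helper `count_storage_objects` is a hypothetical name, it assumes a sequence-style sharding strategy (chunks per shard along each dimension), and it counts only chunk or shard objects, not any metadata files that a Zarr store also contains.

```python
import math


def count_storage_objects(shape, chunk_shape, chunks_per_shard=None):
    """Estimate how many chunk (or shard) objects a dataset needs.

    Hypothetical helper for illustration; not part of cf or cfdm.
    """
    # Number of dataset chunks along each dimension (edge chunks count as whole)
    chunks_per_dim = [math.ceil(n / c) for n, c in zip(shape, chunk_shape)]

    if chunks_per_shard is None:
        # No sharding: one storage object per dataset chunk
        return math.prod(chunks_per_dim)

    # Sharding: one storage object per shard, each holding a block of chunks
    return math.prod(
        math.ceil(n / s) for n, s in zip(chunks_per_dim, chunks_per_shard)
    )


shape = (1, 100, 200)
chunk_shape = (1, 30, 50)

unsharded = count_storage_objects(shape, chunk_shape)
sharded = count_storage_objects(shape, chunk_shape, chunks_per_shard=(1, 2, 2))
print(unsharded, sharded)
```

With these assumed values the data splits into 1 x 4 x 4 dataset chunks, so grouping 2 x 2 chunks per shard along the last two dimensions reduces the object count by a factor of four.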

The sharding strategy is ignored when writing to a non-Zarr dataset.

Added in version (cfdm): 1.13.0.0

Parameters:

shards: None or int or sequence of int

The new sharding strategy. One of:

None

No sharding.

int

The integer number of dataset chunks to be stored in a single shard, favouring an equal number of dataset chunks along each shard dimension.

sequence of int

The number of chunks along each shard dimension.

Example: For two-dimensional data, 25 and (5, 5) are equivalent.
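The equivalence between an integer and a sequence can be sketched for the simplest case, where the integer is a perfect power of the number of dimensions. This is an assumption-laden illustration, not cf's algorithm: `even_shards` is a hypothetical helper, and how cf resolves integers that do not divide evenly across the dimensions is not described here.

```python
def even_shards(n_chunks, ndim):
    """Split a chunks-per-shard count equally across dimensions.

    Hypothetical helper for illustration; handles only perfect powers.
    """
    # Candidate number of chunks along each shard dimension
    side = round(n_chunks ** (1 / ndim))

    if side**ndim != n_chunks:
        # Not a perfect power: this sketch does not guess a split
        raise ValueError(
            f"{n_chunks} is not a perfect power for {ndim} dimensions"
        )

    return (side,) * ndim


print(even_shards(25, 2))
```

For two-dimensional data this maps 25 to (5, 5), matching the equivalence stated above.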

Returns:

None

Examples

>>> d.shape
(1, 100, 200)
>>> d.nc_dataset_chunksizes()
(1, 30, 50)
>>> d.nc_set_dataset_shards(4)
>>> d.nc_dataset_shards()
4
>>> d.nc_clear_dataset_shards()
4
>>> print(d.nc_dataset_shards())
None
>>> d.nc_set_dataset_shards((5, 4))
>>> d.nc_dataset_shards()
(5, 4)
>>> d.nc_set_dataset_shards(None)
>>> print(d.nc_dataset_shards())
None

cf 3.19.0
