cf.Data.rechunk

Data.rechunk(chunks='auto', threshold=None, block_size_limit=None, balance=False, inplace=False)

Change the chunk structure of the data.

Performance

Rechunking can sometimes be expensive and incur significant communication overhead.
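As a rough illustration of where this cost comes from, the sketch below counts how many existing chunks each new chunk must gather data from. It is a simplified proxy written for this page, not dask's cost model. The helpers `chunk_bounds`, `overlaps_1d` and `block_pairs` are hypothetical, and chunks are given in the explicit tuple-of-tuples form of the `.chunks` attribute.

```python
from itertools import accumulate
from math import prod


def chunk_bounds(chunks):
    """Return the (start, stop) index range of each chunk along one dimension."""
    edges = [0, *accumulate(chunks)]
    return list(zip(edges[:-1], edges[1:]))


def overlaps_1d(old, new):
    """For each new chunk, count the old chunks that it overlaps."""
    old_bounds = chunk_bounds(old)
    return [
        sum(1 for a, b in old_bounds if a < stop and b > start)
        for start, stop in chunk_bounds(new)
    ]


def block_pairs(old_chunks, new_chunks):
    """Count (old block, new block) pairs that share data, over all dimensions.

    This is only a rough proxy for the number of block-to-block transfers.
    """
    return prod(
        sum(overlaps_1d(old, new)) for old, new in zip(old_chunks, new_chunks)
    )


# Chunks of a (1000, 1000) array stored as (100, 100) blocks
old = ((100,) * 10, (100,) * 10)

print(overlaps_1d(old[0], (400, 400, 200)))  # [4, 4, 2]
print(block_pairs(old, ((1000,), (10,) * 100)))  # 1000
```

The original chunking has only 100 blocks, but rechunking it to (1000, 10) blocks involves 1000 overlapping pairs, because every new block draws on ten old blocks.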

New in version 3.14.0.

Parameters
chunks: int, tuple, dict or str, optional

Specify the chunking of the underlying dask array.

Any value accepted by the chunks parameter of the dask.array.from_array function is allowed.

By default, "auto" is used, which chooses a chunk size in bytes given by the cf.chunksize function and prefers square-like chunk shapes.

Parameter example:

A blocksize like 1000.

Parameter example:

A blockshape like (1000, 1000).

Parameter example:

Explicit sizes of all blocks along all dimensions like ((1000, 1000, 500), (400, 400)).

Parameter example:

A size in bytes, like "100MiB", which will choose a uniform block-like shape, preferring square-like chunk shapes.

Parameter example:

A blocksize of -1 or None in a tuple or dictionary indicates that the block spans the full size of the corresponding dimension.

Parameter example:

Blocksizes of some or all dimensions mapped to dimension positions, like {1: 200}, or {0: -1, 1: (400, 400)}.
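The integer, tuple and dictionary forms above can be made concrete with a sketch that expands them into explicit block sizes for every dimension. It illustrates the documented behaviour and is not cf-python or dask code: `expand_chunks` and `regular_chunks` are hypothetical helpers, the "auto" and byte-size forms and None are not handled, and, as an assumption modelled on dask's rechunk, dimensions missing from a dictionary keep their current chunks.

```python
def regular_chunks(size, blocksize):
    """Split a dimension into blocks of `blocksize`, leaving any remainder last."""
    full, remainder = divmod(size, blocksize)
    return (blocksize,) * full + ((remainder,) if remainder else ())


def expand_chunks(chunks, shape, current):
    """Expand an int, tuple or dict chunks specification into explicit sizes.

    `current` holds the existing explicit chunks, used for any dimension that
    a dictionary does not mention.
    """
    ndim = len(shape)
    if isinstance(chunks, int):
        # A single blocksize applies to every dimension
        chunks = (chunks,) * ndim
    elif isinstance(chunks, dict):
        chunks = tuple(chunks.get(axis, current[axis]) for axis in range(ndim))

    result = []
    for size, spec in zip(shape, chunks):
        if isinstance(spec, tuple):
            # Explicit block sizes must add up to the dimension size
            if sum(spec) != size:
                raise ValueError(f"chunks {spec} do not sum to {size}")
            result.append(spec)
        elif spec == -1:
            # -1 means a single block spanning the whole dimension
            result.append((size,))
        else:
            result.append(regular_chunks(size, spec))
    return tuple(result)


shape = (1000, 800)
current = ((100,) * 10, (100,) * 8)

print(expand_chunks(1000, shape, current))  # ((1000,), (800,))
print(expand_chunks((400, -1), shape, current))  # ((400, 400, 200), (800,))
print(expand_chunks({1: 200}, shape, current))
print(expand_chunks({0: -1, 1: (400, 400)}, shape, current))  # ((1000,), (400, 400))
```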

threshold: int, optional

The graph growth factor below which no intermediate rechunking step is introduced. See dask.array.rechunk for details.

block_size_limit: int, optional

The maximum block size (in bytes) to produce, as defined by the cf.chunksize function.
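The block size that block_size_limit caps is the number of bytes in a single block. A minimal sketch of that arithmetic, assuming explicit chunks as returned by the .chunks attribute and an 8-byte (float64) data type; `largest_block_nbytes` is a hypothetical helper:

```python
from math import prod


def largest_block_nbytes(chunks, itemsize):
    """Return the size in bytes of the largest block for explicit chunks."""
    # The largest block combines the largest chunk along every dimension
    return prod(max(dim_chunks) for dim_chunks in chunks) * itemsize


chunks = ((400, 400, 200), (1000,))
nbytes = largest_block_nbytes(chunks, itemsize=8)
print(nbytes)  # 3200000
print(nbytes <= 1e8)  # True, so these blocks fit within a 100 MB limit
```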

balance: bool, optional

If True, try to make each chunk the same size. By default this is not attempted.

This means balance=True will remove any small leftover chunks, so using d.rechunk(chunks=len(d) // N, balance=True) will almost certainly result in N chunks.
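The effect of balancing can be illustrated with a simple heuristic: absorb a leftover of at most half a block into the other chunks, then spread the dimension as evenly as possible over the resulting number of chunks. This hypothetical `balanced_chunks` helper is written for illustration and is not the algorithm dask uses, although it does reproduce the (500, 500) result in the examples below.

```python
def balanced_chunks(size, blocksize):
    """Split `size` into near-equal chunks close to `blocksize` (a heuristic)."""
    nchunks, leftover = divmod(size, blocksize)
    if nchunks == 0:
        # The block size exceeds the dimension: use a single chunk
        return (size,)
    if leftover > blocksize // 2:
        # A large leftover becomes a chunk of its own; a small one is absorbed
        nchunks += 1
    # Spread the elements so that chunk sizes differ by at most one
    base, extra = divmod(size, nchunks)
    return (base + 1,) * extra + (base,) * (nchunks - extra)


print(balanced_chunks(1000, 400))  # (500, 500)
print(balanced_chunks(1000, 1000 // 3))  # (334, 333, 333)
```

The second call mirrors the len(d) // N pattern with N = 3: the single-element leftover is absorbed, leaving exactly three chunks.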

inplace: bool, optional

If True then do the operation in-place and return None.

Returns
Data or None

The rechunked data, or None if the operation was in-place.
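The Data-or-None return convention can be sketched with a hypothetical stand-in class, `ToyData`, which only records its chunks. It is not cf.Data and does no rechunking; it only shows that inplace=True modifies the object itself and returns None, while the default returns a modified copy and leaves the original unchanged.

```python
import copy


class ToyData:
    """Hypothetical stand-in for cf.Data that only stores explicit chunks."""

    def __init__(self, chunks):
        self.chunks = chunks

    def rechunk(self, chunks, inplace=False):
        # Modify this object when inplace is True, otherwise a deep copy
        d = self if inplace else copy.deepcopy(self)
        d.chunks = chunks
        # In-place operations return None; otherwise return the new object
        return None if inplace else d


x = ToyData(((100,) * 10,))
y = x.rechunk(((1000,),))
print(y.chunks, x.chunks == ((100,) * 10,))  # ((1000,),) True
print(x.rechunk(((500, 500),), inplace=True), x.chunks)  # None ((500, 500),)
```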

Examples

>>> x = cf.Data.ones((1000, 1000), chunks=(100, 100))

Specify uniform chunk sizes with a tuple

>>> y = x.rechunk((1000, 10))

Or chunk only specific dimensions with a dictionary

>>> y = x.rechunk({0: 1000})

Use the value -1 to specify a single chunk along a dimension, or the value "auto" to let dask freely rechunk a dimension to attain blocks of a uniform block size.

>>> y = x.rechunk({0: -1, 1: 'auto'}, block_size_limit=1e8)

If a chunk size does not evenly divide the dimension size, then rechunk leaves any unevenness in the last chunk.

>>> x.rechunk(chunks=(400, -1)).chunks
((400, 400, 200), (1000,))

However, if you want more balanced chunks and don’t mind dask choosing a different chunk size for you, then you can use the balance=True option.

>>> x.rechunk(chunks=(400, -1), balance=True).chunks
((500, 500), (1000,))