cf.Data.rechunk
Data.rechunk(chunks='auto', threshold=None, block_size_limit=None, balance=False, inplace=False)
Change the chunk structure of the data.
Performance
Rechunking can sometimes be expensive and incur a lot of communication overheads.
New in version 3.14.0.
See also
- Parameters
- chunks: int, tuple, dict or str, optional
  Specify the chunking of the underlying dask array. Any value accepted by the chunks parameter of the dask.array.from_array function is allowed.
  By default, "auto" is used to specify the array chunking, which uses a chunk size in bytes defined by the cf.chunksize function, preferring square-like chunk shapes.
  - Parameter example: A blocksize like 1000.
  - Parameter example: A blockshape like (1000, 1000).
  - Parameter example: Explicit sizes of all blocks along all dimensions like ((1000, 1000, 500), (400, 400)).
  - Parameter example: A size in bytes, like "100MiB" which will choose a uniform block-like shape, preferring square-like chunk shapes.
  - Parameter example: A blocksize of -1 or None in a tuple or dictionary indicates the size of the corresponding dimension.
  - Parameter example: Blocksizes of some or all dimensions mapped to dimension positions, like {1: 200}, or {0: -1, 1: (400, 400)}.
- threshold: int, optional
  The graph growth factor under which we don’t bother introducing an intermediate step. See dask.array.rechunk for details.
- block_size_limit: int, optional
  The maximum block size (in bytes) we want to produce, as defined by the cf.chunksize function.
- balance: bool, optional
  If True, try to make each chunk the same size. By default this is not attempted.
  This means balance=True will remove any small leftover chunks, so using d.rechunk(chunks=len(d) // N, balance=True) will almost certainly result in N chunks.
- Returns
Examples
>>> x = cf.Data.ones((1000, 1000), chunks=(100, 100))
Specify uniform chunk sizes with a tuple
>>> y = x.rechunk((1000, 10))
Or chunk only specific dimensions with a dictionary
>>> y = x.rechunk({0: 1000})
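As a rough illustration of the dictionary form, the sketch below expands a mapping of dimension positions to chunk sizes into a full per-dimension chunk layout. It assumes, as in dask.array.rechunk, that dimensions missing from the mapping keep their existing chunks. The helper name expand_dict_spec is hypothetical and is not part of cf or dask; it handles only integer sizes and -1.

```python
def expand_dict_spec(old_chunks, spec):
    """Expand {axis: size} into explicit chunk tuples for every axis.

    Hypothetical helper for illustration only; supports int sizes and -1.
    """
    new_chunks = []
    for axis, axis_chunks in enumerate(old_chunks):
        if axis not in spec:
            # Assumption: axes missing from the mapping keep their chunks.
            new_chunks.append(tuple(axis_chunks))
            continue
        length = sum(axis_chunks)
        # -1 means a single chunk spanning the whole dimension.
        size = length if spec[axis] == -1 else spec[axis]
        full, rest = divmod(length, size)
        new_chunks.append((size,) * full + ((rest,) if rest else ()))
    return tuple(new_chunks)


# Chunk layout of x from the examples: a 10 x 10 grid of (100, 100) blocks.
old = ((100,) * 10, (100,) * 10)
print(expand_dict_spec(old, {0: 1000}))
```

Here only dimension 0 is merged into one chunk of 1000, while dimension 1 stays as ten chunks of 100.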
Use the value -1 to specify that you want a single chunk along a dimension or the value "auto" to specify that dask can freely rechunk a dimension to attain blocks of a uniform block size.
>>> y = x.rechunk({0: -1, 1: 'auto'}, block_size_limit=1e8)
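Because block_size_limit is expressed in bytes, it can help to estimate how large a candidate chunk is. The following sketch does that arithmetic by hand; chunk_nbytes is a hypothetical helper, and the 8-byte item size assumes 64-bit floating point data.

```python
import math


def chunk_nbytes(chunk_shape, itemsize=8):
    """Return the size in bytes of one chunk (hypothetical helper).

    itemsize=8 assumes 64-bit floats; adjust for other data types.
    """
    return math.prod(chunk_shape) * itemsize


limit = 1e8  # same value as block_size_limit in the example above
for shape in [(100, 100), (1000, 1000), (4000, 4000)]:
    nbytes = chunk_nbytes(shape)
    print(shape, nbytes, "fits" if nbytes <= limit else "too large")
```

Under these assumptions a (1000, 1000) chunk occupies 8,000,000 bytes, well under the limit, whereas a (4000, 4000) chunk would exceed it.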
If a chunk size does not divide the dimension then rechunk will leave any unevenness to the last chunk.
>>> x.rechunk(chunks=(400, -1)).chunks
((400, 400, 200), (1000,))
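The layout above follows from simple division. A minimal sketch of that arithmetic, using a hypothetical helper split_dimension that is not part of cf or dask:

```python
def split_dimension(length, size):
    """Split a dimension into chunks of `size`, remainder in the last chunk."""
    full, rest = divmod(length, size)
    return (size,) * full + ((rest,) if rest else ())


print(split_dimension(1000, 400))   # (400, 400, 200)
print(split_dimension(1000, 1000))  # (1000,)
```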
However, if you want more balanced chunks and don’t mind dask choosing a different chunk size for you, you can use the balance=True option.
>>> x.rechunk(chunks=(400, -1), balance=True).chunks
((500, 500), (1000,))
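One way to picture the balanced result is as an even split into roughly length // size chunks. The sketch below is a simplified heuristic written for illustration; it is an assumption, not dask's actual balancing algorithm, although it reproduces the (500, 500) layout shown above. The name balanced_split is hypothetical.

```python
def balanced_split(length, size):
    """Split `length` into about length // size near-equal chunks.

    Simplified illustration only; dask's own balancing logic may differ.
    """
    n_chunks = max(1, length // size)
    base, extra = divmod(length, n_chunks)
    # The first `extra` chunks take one additional element each.
    return tuple(base + 1 if i < extra else base for i in range(n_chunks))


print(balanced_split(1000, 400))  # (500, 500)
print(balanced_split(1000, 300))  # (334, 333, 333)
```

The small leftover chunk of 200 disappears because its elements are shared among fewer, larger chunks.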