cfdm.Data¶

class cfdm.Data(array=None, units=None, calendar=None, fill_value=None, hardmask=True, chunks='auto', dt=False, source=None, copy=True, dtype=None, mask=None, mask_value=None, to_memory=False, init_options=None, _use_array=True)[source]¶

Bases: cfdm.mixin.container.Container, cfdm.mixin.netcdf.NetCDFAggregation, cfdm.mixin.netcdf.NetCDFHDF5, cfdm.mixin.files.Files, cfdm.core.data.data.Data
An N-dimensional data array with units and masked values.

- Contains an N-dimensional, indexable and broadcastable array with many similarities to a numpy array.
- Contains the units of the array elements.
- Supports masked arrays, regardless of whether or not it was initialised with a masked array.
- Stores and operates on data arrays which are larger than the available memory.
Indexing

A data array is indexable in a similar way to a numpy array:

>>> d.shape
(12, 19, 73, 96)
>>> d[...].shape
(12, 19, 73, 96)
>>> d[slice(0, 9), 10:0:-2, :, :].shape
(9, 5, 73, 96)
There are three extensions to the numpy indexing functionality:

- Size 1 dimensions are never removed by indexing. An integer index i takes the i-th element but does not reduce the rank of the output array by one:

>>> d.shape
(12, 19, 73, 96)
>>> d[0, ...].shape
(1, 19, 73, 96)
>>> d[:, 3, slice(10, 0, -2), 95].shape
(12, 1, 5, 1)

Size 1 dimensions may be removed with the squeeze method.

- The indices for each axis work independently.
When more than one dimension’s slice is a 1-d Boolean sequence or 1-d sequence of integers, then these indices work independently along each dimension (similar to the way vector subscripts work in Fortran), rather than by their elements:

>>> d.shape
(12, 19, 73, 96)
>>> d[0, :, [0, 1], [0, 13, 27]].shape
(1, 19, 2, 3)
- Boolean indices may be any object which exposes the numpy array interface:

>>> d.shape
(12, 19, 73, 96)
>>> d[..., d[0, 0, 0] > d[0, 0, 0].min()]
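These extensions can be checked with a small in-memory array. The following is a minimal sketch based on the rules above; the shape and values are illustrative and not part of the original documentation:

>>> import numpy
>>> import cfdm
>>> d = cfdm.Data(numpy.arange(24).reshape(2, 3, 4), units='K')
>>> d[0, ...].shape  # an integer index keeps a size 1 axis
(1, 3, 4)
>>> d[:, [0, 2], [1, 3]].shape  # sequence indices act independently per axis
(2, 2, 2)
>>> d[:, numpy.array([True, False, True]), :].shape  # Boolean index via the numpy array interface
(2, 2, 4)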
Initialisation

- Parameters

- array: optional
The array of values. May be a scalar or array-like object, including another Data instance, anything with a to_dask_array method, a numpy array, a dask array, an xarray array, a cfdm.Array subclass, a list, a tuple, or a scalar.
- Parameter example: array=34.6
- Parameter example: array=[[1, 2], [3, 4]]
- Parameter example: array=numpy.ma.arange(10).reshape(2, 1, 5)
- units: str or Units, optional
The physical units of the data. If a Units object is provided then this can also set the calendar.
The units (without the calendar) may also be set after initialisation with the set_units method.
- Parameter example: units='km hr-1'
- Parameter example: units='days since 2018-12-01'
- calendar: str, optional
The calendar for reference time units.
The calendar may also be set after initialisation with the set_calendar method.
- Parameter example: calendar='360_day'
- fill_value: optional
The fill value of the data. By default, or if set to None, the numpy fill value appropriate to the array’s data-type will be used (see numpy.ma.default_fill_value).
The fill value may also be set after initialisation with the set_fill_value method.
- Parameter example: fill_value=-999.
- dtype: data-type, optional
The desired data-type for the data. By default the data-type will be inferred from the array parameter.
The data-type may also be set after initialisation with the dtype attribute.
- Parameter example: dtype=float
- Parameter example: dtype='float32'
- Parameter example: dtype=numpy.dtype('i2')
New in version 3.0.4.
- mask: optional
Apply this mask to the data given by the array parameter. By default, or if mask is None, no mask is applied. May be any scalar or array-like object (such as a list, numpy array or Data instance) that is broadcastable to the shape of array. Masking will be carried out where the mask elements evaluate to True.
This mask will be applied in addition to any mask already defined by the array parameter.

- mask_value: scalar array_like
Mask the array where it is equal to mask_value, using numerically tolerant floating point equality.
New in version (cfdm): 1.11.0.0
- hardmask: bool, optional
If True (the default) then the mask is hard. If False then the mask is soft.
- dt: bool, optional
If True then strings (such as '1990-12-01 12:00') given by the array parameter are re-interpreted as date-time objects. By default they are not.

- source: optional
Convert source, which can be any type of object, to a Data instance.
All other parameters, apart from copy, are ignored and their values are instead inferred from source by assuming that it has the Data API. Any parameters that can not be retrieved from source in this way are assumed to have their default value.
Note that if x is also a Data instance then cfdm.Data(source=x) is equivalent to x.copy().

- copy: bool, optional
If True (the default) then the input parameters are deep copied prior to initialisation. If False then they are not deep copied.
- chunks: int, tuple, dict or str, optional
Specify the chunking of the underlying dask array.
Any value accepted by the chunks parameter of the dask.array.from_array function is allowed.
By default, "auto" is used to specify the array chunking, which uses a chunk size in bytes defined by the cfdm.chunksize function, preferring square-like chunk shapes.
- Parameter example: A blocksize like 1000.
- Parameter example: A blockshape like (1000, 1000).
- Parameter example: Explicit sizes of all blocks along all dimensions like ((1000, 1000, 500), (400, 400)).
- Parameter example: A size in bytes, like "100MiB", which will choose a uniform block-like shape, preferring square-like chunk shapes.
- Parameter example: A blocksize of -1 or None in a tuple or dictionary indicates the full size of the corresponding dimension.
- Parameter example: Blocksizes of some or all dimensions mapped to dimension positions, like {1: 200}, or {0: -1, 1: (400, 400)}.
New in version (cfdm): 1.11.2.0
- to_memory: bool, optional
If True then ensure that the original data are in memory, rather than on disk.
If the original data are on disk, then reading data into memory during initialisation will slow down the initialisation process, but can considerably improve downstream performance by avoiding the need for independent reads for every dask chunk, each time the data are computed.
In general, setting to_memory to True is not the same as calling the persist method of the newly created Data object, which also decompresses data compressed by convention and computes any data type, mask and date-time modifications.
If the input array is a dask.array.Array object then to_memory is ignored.
New in version (cfdm): 1.11.2.0
- init_options: dict, optional
Provide optional keyword arguments to methods and functions called during the initialisation process. A dictionary key identifies a method or function. The corresponding value is another dictionary whose key/value pairs are the keyword parameter names and values to be applied.
Supported keys are:
- 'from_array': Provide keyword arguments to the dask.array.from_array function. This is used when initialising data that is not already a dask array and is not compressed by convention.
- 'first_non_missing_value': Provide keyword arguments to the cfdm.data.utils.first_non_missing_value function. This is used when the input array contains date-time strings or objects, and may affect performance.
- Parameter example: {'from_array': {'inline_array': True}}
Examples

>>> d = cfdm.Data(5)
>>> d = cfdm.Data([1, 2, 3], units='K')
>>> import numpy
>>> d = cfdm.Data(numpy.arange(10).reshape(2, 5),
...               units='m/s', fill_value=-999)
>>> d = cfdm.Data('fly')
>>> d = cfdm.Data(tuple('fly'))
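A slightly fuller sketch combining several of the optional parameters described above (units, dtype, mask and chunks). The values are illustrative only and are not taken from the original documentation:

>>> import cfdm
>>> d = cfdm.Data([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
...               units='km hr-1', dtype='float32',
...               mask=[[0, 1, 0], [0, 0, 1]], chunks=(1, 3))
>>> d.shape
(2, 3)
>>> str(d.dtype)
'float32'
>>> print(d.array)  # masked where the mask elements evaluate to True
[[1.0 -- 3.0]
 [4.0 5.0 --]]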
Inspection¶

Attributes

- A numpy array copy of the data.
- Return an independent …
- The …
- Number of dimensions in the data array.
- Tuple of the data array’s dimension sizes.
- Number of elements in the data array.
- Total number of bytes consumed by the elements of the array.
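A minimal sketch of reading the basic inspection attributes summarised above. The attribute names (shape, ndim, size, dtype, nbytes and array) are assumed from the standard cfdm.Data API rather than given explicitly in this table:

>>> import numpy
>>> import cfdm
>>> d = cfdm.Data(numpy.arange(12.0).reshape(3, 4), units='m')
>>> print(d.ndim, d.shape, d.size)  # dimensions, sizes and number of elements
2 (3, 4) 12
>>> print(d.dtype)
float64
>>> print(d.nbytes)  # 12 elements of 8 bytes each
96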
Units¶

- Delete the units.
- Return the units.
- Whether units have been set.
- Set the units.

Attributes

- The …
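A minimal sketch of the units handling summarised above. The set_units method is named in the parameter documentation; the get_units, has_units and del_units accessors are assumed from the standard cfdm.Data API:

>>> import cfdm
>>> d = cfdm.Data([1.0, 2.0, 3.0], units='km hr-1')
>>> d.get_units()
'km hr-1'
>>> d.has_units()
True
>>> d.set_units('m s-1')  # replace the units
>>> d.del_units()         # remove and return the units
'm s-1'
>>> d.has_units()
False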
Date-time support¶

- Delete the calendar.
- Return the calendar.
- Whether a calendar has been set.
- Set the calendar.

Attributes

- An independent numpy array of date-time objects.
- Alias for …
- Returns an independent numpy array with datetimes as strings.
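A minimal sketch of reference-time data, assuming the datetime_array attribute summarised above and a get_calendar accessor matching the standard cfdm.Data API; the exact string form of the date-time objects depends on the installed cftime version:

>>> import cfdm
>>> d = cfdm.Data([0.5, 1.5, 2.5], units='days since 2018-12-01',
...               calendar='standard')
>>> d.get_calendar()
'standard'
>>> print(d.datetime_array[0])  # first value as a date-time object
2018-12-01 12:00:00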
Dask¶

- A view of the computed data.
- Persist data into memory.
- Remove unnecessary tasks from the dask graph in-place.
- Returns a dask array of the compressed data.
- Change the chunk structure of the data.
- Return indices of the data that define each dask chunk.
- Return a dictionary of the dask graph key/value pairs.
- Convert the data to a …
- Get the deterministic name for the data.
- Whether there is a deterministic name for the data.

Attributes

- The …
- The largest …
- Find the position of each chunk.
- The total number of chunks.
- The number of chunks along each dimension.
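A minimal sketch of interacting with the underlying dask array, assuming the to_dask_array and persist methods summarised above (persist is also named in the to_memory parameter documentation, and is assumed here to return a new Data object by default):

>>> import numpy
>>> import cfdm
>>> d = cfdm.Data(numpy.arange(20.0).reshape(4, 5), chunks=(2, 5))
>>> dx = d.to_dask_array()  # the underlying dask array
>>> dx.chunks
((2, 2), (5,))
>>> dx.npartitions
2
>>> d = d.persist()  # compute now and keep the result in memory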
Data creation routines¶

Ones and zeros¶

- Return a new array, without initialising entries.
- Returns a new array filled with ones of set shape and type.
- Returns a new array filled with zeros of set shape and type.
- Return new data filled with a fill value.

From existing data¶

- Convert the input to a …
- Return a deep copy of the data.
Data manipulation routines¶

Changing data shape¶

- Flatten specified axes of the data.
- Change the shape of the data without changing its values.

Transpose-like operations¶

- Permute the axes of the data array.

Changing number of dimensions¶

- Expand the shape of the data array in place.
- Remove size 1 axes from the data array.

Joining data¶

- Join a sequence of data arrays together.

Adding and removing elements¶

- The unique elements of the data.

Expanding the data¶

- Pad an axis with missing data.
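A minimal sketch of the shape manipulations summarised above. The squeeze method is named in the indexing notes; transpose is assumed from the standard cfdm.Data API, and both are assumed to return a new Data object by default:

>>> import numpy
>>> import cfdm
>>> d = cfdm.Data(numpy.arange(12.0).reshape(1, 3, 4))
>>> d.transpose([2, 1, 0]).shape  # permute the axes
(4, 3, 1)
>>> d.squeeze().shape  # remove all size 1 axes
(3, 4)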
Indexing routines¶

Single value selection¶

- Return the first element of the data as a scalar.
- Return the second element of the data as a scalar.
- Return the last element of the data as a scalar.
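A minimal sketch of single value selection, assuming the first_element and last_element methods that the summaries above describe:

>>> import cfdm
>>> d = cfdm.Data([[1, 2, 3], [4, 5, 6]], units='K')
>>> print(d.first_element(), d.last_element())
1 6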
Logic functions¶

Truth value testing¶

- Test whether all data array elements evaluate to True.
- Test whether any data array elements evaluate to True.

Comparison¶

- True if two data arrays are logically equal, False otherwise.
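A minimal sketch of the comparison summarised above, assuming an equals method, together with the copy method mentioned in the source parameter documentation and the set_units method from the units documentation:

>>> import cfdm
>>> d = cfdm.Data([1.0, 2.0, 3.0], units='m')
>>> e = d.copy()  # an independent deep copy
>>> d.equals(e)
True
>>> e.set_units('km')
>>> d.equals(e)  # the units now differ
False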
Mask support¶

- Force the mask to hard.
- Force the mask to soft.
- Apply masking.
- Mask the data where a condition is met.
- Replace masked elements with a fill value.
- Mask using floating point equality.
- Delete the fill value.
- Return the missing data value.
- Whether a fill value has been set.
- Set the missing data value.

Attributes

- Hardness of the mask.
- The Boolean missing data mask of the data array.
- The data array missing data value.
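A minimal sketch of the hard mask behaviour described by the hardmask parameter above, assuming a hardmask attribute and the convention that assignment cannot unmask elements while the mask is hard:

>>> import numpy
>>> import cfdm
>>> d = cfdm.Data(numpy.ma.masked_values([1.0, -999.0, 3.0], -999.0))
>>> d.hardmask  # the mask is hard by default
True
>>> d[...] = [9.0, 9.0, 9.0]
>>> print(d.array)  # the masked element stays masked
[9.0 -- 9.0]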
Mathematical functions¶

Sums, products, differences¶

- Calculate sum values.

Set routines¶

Making proper sets¶

- The unique elements of the data.
Sorting, searching, and counting¶
Statistics¶

Order statistics¶

- Calculate maximum values.
- Calculate minimum values.

Sums¶

- Calculate sum values.
Compression by convention¶

- Returns the dimensions that are compressed in the array.
- Returns the compressed dimension’s array position.
- Returns the type of compression applied to the array.
- Return the count variable for a compressed array.
- Return the index variable for a compressed array.
- Return the list variable for a compressed array.
- Uncompress the data.

Attributes

- Returns an independent numpy array of the compressed data.
Miscellaneous¶

- Return the commands that would create the data object.
- Returns the data.
- The names of files containing parts of the data array.
- The names of files containing the original data and metadata.
- Return the underlying array object.
- Return indices of the data that define each dask chunk.

Attributes

- The data as an object identity.
- Return the data as a scalar or (nested) list.
Performance¶

- Clear the HDF5 chunking strategy for the data.
- Get the HDF5 chunking strategy for the data.
- Set the HDF5 chunking strategy for the data.
Aggregation¶

- The directories of files containing parts of the data.
- Replace file directories in-place.
- Replace file locations in-place.
- Remove the netCDF aggregated_data terms.
- Set the netCDF aggregation write status to …
- Return the netCDF aggregated data terms.
- The type of fragments in the aggregated data.
- Get the netCDF aggregation write status.
- Whether any netCDF aggregated_data terms have been set.
- Set the netCDF aggregated_data elements.
- Set the netCDF aggregation write status.
Special¶

- The numpy array interface.
- Called by the …
- Return a subspace of the data defined by indices.
- Called to implement the built-in function …
- Called when an iterator is required.
- Called by the …
- Implement indexed assignment.
- Called by the …
Docstring substitutions¶

Methods

- Return the special docstring substitutions.
- Returns the substitutions that apply to methods of the class.
- Returns the class {{package}} substitutions package depth.
- Returns method names excluded in the class substitutions.