cf.Data

class cf.Data(*args, **kwargs)

An N-dimensional data array with units and masked values.

Contains an N-dimensional, indexable and broadcastable array with many similarities to a numpy array.
Contains the units of the array elements.
Supports masked arrays, regardless of whether or not it was initialised with a masked array.
Stores and operates on data arrays, including those that are larger than the available memory.

Indexing

A data array is indexable in a similar way to a numpy array:

>>> d.shape
(12, 19, 73, 96)
>>> d[...].shape
(12, 19, 73, 96)
>>> d[slice(0, 9), 10:0:-2, :, :].shape
(9, 5, 73, 96)

There are three extensions to the numpy indexing functionality:

Size 1 dimensions are never removed by indexing.

An integer index i takes the i-th element but does not reduce the rank of the output array by one:
```
>>> d.shape
(12, 19, 73, 96)
>>> d[0, ...].shape
(1, 19, 73, 96)
>>> d[:, 3, slice(10, 0, -2), 95].shape
(12, 1, 5, 1)
```
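Plain numpy drops an axis that is indexed by an integer. The rank-preserving rule described above can be emulated in numpy by rewriting each integer `i` as the length-1 slice `slice(i, i + 1)`. A minimal sketch, assuming a hypothetical helper `keep_dims` (an illustration of the indexing rule, not cf's implementation):

```python
import numpy as np


def keep_dims(indices):
    """Hypothetical helper: replace integer indices with length-1 slices.

    Slices, Ellipsis and sequences are passed through unchanged.
    """
    if not isinstance(indices, tuple):
        indices = (indices,)
    out = []
    for i in indices:
        if isinstance(i, (int, np.integer)) and not isinstance(i, bool):
            # -1 + 1 == 0 would give an empty slice, so stop at None instead
            stop = i + 1 if i != -1 else None
            out.append(slice(i, stop))
        else:
            out.append(i)
    return tuple(out)


a = np.zeros((12, 19, 73, 96), dtype=np.int8)
print(a[0, ...].shape)                    # (19, 73, 96): numpy drops the axis
print(a[keep_dims((0, Ellipsis))].shape)  # (1, 19, 73, 96)
print(a[keep_dims((slice(None), 3, slice(10, 0, -2), 95))].shape)  # (12, 1, 5, 1)
```

The special case for `-1` is needed because its length-1 slice must stop at `None` rather than at `0`.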
Size 1 dimensions may be removed with the squeeze method.
The indices for each axis work independently.

When more than one dimension’s index is a 1-d Boolean sequence or a 1-d sequence of integers, these indices work independently along each dimension (similar to the way vector subscripts work in Fortran), rather than being paired element by element as in numpy:
```
>>> d.shape
(12, 19, 73, 96)
>>> d[0, :, [0, 1], [0, 13, 27]].shape
(1, 19, 2, 3)
```
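For comparison, numpy's advanced indexing pairs integer sequences element by element, so the same index fails in plain numpy. numpy's `np.ix_` gives the independent, Fortran-like behaviour. A numpy-only sketch:

```python
import numpy as np

a = np.zeros((12, 19, 73, 96), dtype=np.int8)

# numpy broadcasts [0, 1] against [0, 13, 27] element by element,
# which fails because the two sequences have different lengths
try:
    a[0, :, [0, 1], [0, 13, 27]]
except IndexError as err:
    print("numpy fancy indexing:", err)

# np.ix_ builds an open mesh so that each sequence acts on its own axis;
# every axis needs an explicit sequence, so ":" becomes range(19)
outer = a[np.ix_([0], range(19), [0, 1], [0, 13, 27])]
print(outer.shape)  # (1, 19, 2, 3)
```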

Boolean indices may be any object which exposes the numpy array interface.

>>> d.shape
(12, 19, 73, 96)
>>> d[..., d[0, 0, 0]>d[0, 0, 0].min()]
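A numpy analogue of this example, on a small illustrative array. In numpy `x[0, 0]` is already 1-d, whereas the corresponding cf.Data subspace keeps its size 1 axes, so the shapes here are a simplification:

```python
import numpy as np

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Boolean condition along the last axis: True where the value
# exceeds the minimum of the first row
cond = x[0, 0] > x[0, 0].min()
print(cond)                # [False  True  True  True]
print(x[..., cond].shape)  # (2, 3, 3)
```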

Cyclic axes

Initialisation

Parameters:

array: optional

The array of values. May be a scalar or array-like object, including another Data instance, anything with a to_dask_array method, numpy array, dask array, xarray array, cf.Array subclass, list, tuple, scalar.

Parameter example: array=34.6
Parameter example: array=[[1, 2], [3, 4]]
Parameter example: array=numpy.ma.arange(10).reshape(2, 1, 5)

units: str or Units, optional

The physical units of the data. If a Units object is provided then this can also set the calendar.

The units (without the calendar) may also be set after initialisation with the set_units method.

Parameter example: units='km hr-1'
Parameter example: units='days since 2018-12-01'

calendar: str, optional

The calendar for reference time units.

The calendar may also be set after initialisation with the set_calendar method.

Parameter example: calendar='360_day'
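The calendar matters because the same reference-time value decodes to different dates in different calendars. A minimal pure-Python sketch for whole-day offsets, using a hypothetical helper `decode_360_day` for the 360_day calendar (real code would use a calendar-aware library such as cftime) and the standard library for the Gregorian case:

```python
import datetime


def decode_360_day(ref, days):
    """Hypothetical helper: add a whole number of days to a
    (year, month, day) reference date in the 360_day calendar,
    in which every month has exactly 30 days."""
    year, month, day = ref
    ordinal = year * 360 + (month - 1) * 30 + (day - 1) + days
    year, rem = divmod(ordinal, 360)
    month0, day0 = divmod(rem, 30)
    return (year, month0 + 1, day0 + 1)


# The value 60 with units='days since 2018-12-01'
print(decode_360_day((2018, 12, 1), 60))  # (2019, 2, 1)

# The standard (Gregorian) calendar gives a different date
print(datetime.date(2018, 12, 1) + datetime.timedelta(days=60))  # 2019-01-30
```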

fill_value: optional

The fill value of the data. By default, or if set to None, the numpy fill value appropriate to the array’s data-type will be used (see numpy.ma.default_fill_value).

The fill value may also be set after initialisation with the set_fill_value method.

Parameter example: fill_value=-999.
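The default fill values come from `numpy.ma.default_fill_value`, and a fill value is what masked elements are replaced by when the data are filled. A numpy-only illustration:

```python
import numpy as np

# numpy's default fill values depend on the kind of data-type
print(np.ma.default_fill_value(np.dtype("float64")))  # 1e+20
print(np.ma.default_fill_value(np.dtype("int32")))    # 999999

# Masked elements take the fill value when the data are filled
m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
print(m.filled(-999.0).tolist())  # [1.0, -999.0, 3.0]
```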

dtype: data-type, optional

The desired data-type for the data. By default the data-type will be inferred from the array parameter.

The data-type may also be set after initialisation with the dtype attribute.

Parameter example: dtype=float
Parameter example: dtype='float32'
Parameter example: dtype=numpy.dtype('i2')

Added in version 3.0.4.

mask: optional

Apply this mask to the data given by the array parameter. By default, or if mask is None, no mask is applied. May be any scalar or array-like object (such as a list, numpy array or Data instance) that is broadcastable to the shape of array. Masking will be carried out where the mask elements evaluate to True.

This mask will be applied in addition to any mask already defined by the array parameter.
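In numpy terms, applying an extra mask amounts to a logical OR of the existing mask with the broadcast new one. A sketch under that assumption, using a hypothetical helper `apply_extra_mask` (not part of cf):

```python
import numpy as np


def apply_extra_mask(array, mask):
    """Hypothetical helper: mask `array` wherever `mask` is True,
    keeping any mask the array already has."""
    existing = np.ma.getmaskarray(array)   # full-shape boolean mask
    extra = np.asarray(mask, dtype=bool)   # broadcast by logical_or below
    combined = np.logical_or(existing, extra)
    return np.ma.masked_array(np.ma.getdata(array), mask=combined)


a = np.ma.masked_array([[1, 2, 3], [4, 5, 6]],
                       mask=[[True, False, False], [False, False, False]])

# A 1-d mask broadcasts across the rows of the 2-d array
b = apply_extra_mask(a, [False, False, True])
print(b.mask.tolist())  # [[True, False, True], [False, False, True]]
```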

mask_value: scalar array_like

Mask array where it is equal to mask_value, using numerically tolerant floating point equality.

Added in version (cfdm): 1.11.0.0
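numpy provides the same kind of tolerant comparison in `numpy.ma.masked_values`, which uses `numpy.isclose` for floating-point data. Whether cf.Data uses the same default tolerances is not assumed here:

```python
import numpy as np

x = np.array([1.0, 1.1, 2.0, 1.0 + 1e-10])

# 1.0 + 1e-10 is masked as well, because the comparison is tolerant
m = np.ma.masked_values(x, 1.0)
print(m.mask.tolist())  # [True, False, False, True]
```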

hardmask: bool, optional

If True (the default) then the mask is hard. If False then the mask is soft.

dt: bool, optional

If True then strings (such as '1990-12-01 12:00') given by the array parameter are re-interpreted as date-time objects. By default they are not.

source: optional

Convert source, which can be any type of object, to a Data instance.

All other parameters, apart from copy, are ignored and their values are instead inferred from source by assuming that it has the Data API. Any parameters that cannot be retrieved from source in this way are assumed to have their default value.

Note that if x is also a Data instance then cf.Data(source=x) is equivalent to x.copy().

copy: bool, optional

If True (the default) then deep copy the input parameters prior to initialisation. If False then the parameters are not deep copied.

chunks: int, tuple, dict or str, optional

Specify the chunking of the underlying dask array.

Any value accepted by the chunks parameter of the dask.array.from_array function is allowed.

By default, "auto" is used to specify the array chunking, which uses a chunk size in bytes defined by the cf.chunksize function, preferring square-like chunk shapes.

Parameter example: A blocksize like 1000.
Parameter example: A blockshape like (1000, 1000).
Parameter example: Explicit sizes of all blocks along all dimensions like ((1000, 1000, 500), (400, 400)).
Parameter example: A size in bytes, like "100MiB" which will choose a uniform block-like shape, preferring square-like chunk shapes.
Parameter example: A blocksize of -1 or None in a tuple or dictionary indicates the size of the corresponding dimension.
Parameter example: Blocksizes of some or all dimensions mapped to dimension positions, like {1: 200}, or {0: -1, 1: (400, 400)}.

Added in version (cfdm): 1.11.2.0
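The integer and tuple forms above are shorthand for explicit per-dimension chunk sizes. A simplified pure-Python sketch of that expansion, using a hypothetical helper `expand_chunks` that handles only integers, -1/None and explicit tuples (dask's own chunk normalisation, `dask.array.core.normalize_chunks`, also handles "auto", byte sizes and dictionaries):

```python
def expand_chunks(shape, chunks):
    """Hypothetical helper: expand a blocksize or blockshape into
    explicit chunk sizes for every dimension."""
    if not isinstance(chunks, tuple):
        chunks = (chunks,) * len(shape)  # one blocksize for all axes
    if len(chunks) != len(shape):
        raise ValueError("need one chunk specification per dimension")
    out = []
    for size, c in zip(shape, chunks):
        if isinstance(c, tuple):
            # Explicit block sizes must cover the whole dimension
            if sum(c) != size:
                raise ValueError(f"chunks {c} do not sum to {size}")
            out.append(c)
            continue
        if size == 0:
            out.append((0,))  # an empty axis still has one empty block
            continue
        if c is None or c == -1:
            c = size  # a single block spanning the whole axis
        full, rem = divmod(size, c)
        out.append((c,) * full + ((rem,) if rem else ()))
    return tuple(out)


print(expand_chunks((2500, 800), 1000))        # ((1000, 1000, 500), (800,))
print(expand_chunks((2500, 800), (1000, -1)))  # ((1000, 1000, 500), (800,))
```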

to_memory: bool, optional

If True then ensure that the original data are in memory, rather than on disk.

If the original data are on disk, then reading data into memory during initialisation will slow down the initialisation process, but can considerably improve downstream performance by avoiding the need for independent reads for every dask chunk, each time the data are computed.

In general, setting to_memory to True is not the same as calling the persist method of the newly created Data object, which also decompresses data compressed by convention and computes any data type, mask and date-time modifications.

If the input array is a dask.array.Array object then to_memory is ignored.

Added in version (cfdm): 1.11.2.0

init_options: dict, optional

Provide optional keyword arguments to methods and functions called during the initialisation process. A dictionary key identifies a method or function. The corresponding value is another dictionary whose key/value pairs are the keyword parameter names and values to be applied.

Supported keys are:

'from_array': Provide keyword arguments to the dask.array.from_array function. This is used when initialising data that is not already a dask array and is not compressed by convention.
'first_non_missing_value': Provide keyword arguments to the cfdm.data.utils.first_non_missing_value function. This is used when the input array contains date-time strings or objects, and may affect performance.

Parameter example:
{'from_array': {'inline_array': True}}

Examples

>>> d = cf.Data(5)
>>> d = cf.Data([1,2,3], units='K')
>>> import numpy
>>> d = cf.Data(numpy.arange(10).reshape(2,5),
...             units='m/s', fill_value=-999)
>>> d = cf.Data('fly')
>>> d = cf.Data(tuple('fly'))

Inspection

Attributes

`array`	A numpy array copy of the data.
`dtype`	The `numpy` data-type of the data.
`ndim`	Number of dimensions in the data array.
`shape`	Tuple of the data array's dimension sizes.
`size`	Number of elements in the data array.
`nbytes`	Total number of bytes consumed by the elements of the array.
`dump`	Return a string containing a full description of the instance.
`inspect`	Inspect the object for debugging.
`isscalar`	True if the data is a 0-d scalar array.
`sparse_array`	Return an independent `scipy` sparse array of the data.

Units

`del_units`	Delete the units.
`get_units`	Return the units.
`has_units`	Whether units have been set.
`set_units`	Set the units.
`to_units`	Change the data array units.
`override_units`	Override the units.
`del_calendar`	Delete the calendar.
`get_calendar`	Return the calendar.
`has_calendar`	Whether a calendar has been set.
`set_calendar`	Set the calendar.
`override_calendar`	Override the calendar of date-time units.
`change_calendar`	Change the calendar of date-time array elements.

Attributes

Units

The Units object containing the units of the data array.

Dask

`compute`	A view of the computed data.
`cull_graph`	Remove unnecessary tasks from the dask graph in-place.
`dask_compressed_array`	Returns a dask array of the compressed data.
`rechunk`	Change the chunk structure of the data.
`chunk_indices`	Return indices of the data that define each dask chunk.
`todict`	Return a dictionary of the dask graph key/value pairs.
`to_dask_array`	Convert the data to a `dask` array.
`get_deterministic_name`	Get the deterministic name for the data.
`has_deterministic_name`	Whether there is a deterministic name for the data.

Attributes

`chunks`	The `dask` chunk sizes for each dimension.
`chunksize`	The largest `dask` chunk size for each dimension.
`chunk_positions`	Find the position of each chunk.
`npartitions`	The total number of chunks.
`numblocks`	The number of chunks along each dimension.
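All of these attributes can be derived from the `chunks` tuple. A pure-Python sketch, assuming the dask convention of one tuple of block sizes per dimension; the helper `chunk_summary` is hypothetical, and its list of slices only illustrates the kind of information `chunk_indices` describes:

```python
import itertools
import math


def chunk_summary(chunks):
    """Hypothetical helper: derive chunk metadata from dask-style chunks."""
    numblocks = tuple(len(c) for c in chunks)
    npartitions = math.prod(numblocks)
    chunksize = tuple(max(c) for c in chunks)

    # Slices covering each block along each dimension
    bounds = []
    for c in chunks:
        starts = itertools.accumulate((0,) + c[:-1])
        bounds.append([slice(s, s + n) for s, n in zip(starts, c)])

    # One tuple of slices per chunk, in row-major (C) order
    indices = list(itertools.product(*bounds))
    return numblocks, npartitions, chunksize, indices


numblocks, npartitions, chunksize, indices = chunk_summary(((2, 2, 1), (3,)))
print(numblocks, npartitions, chunksize)  # (3, 1) 3 (2, 3)
print(indices[-1])                        # (slice(4, 5, None), slice(0, 3, None))
```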

Data creation routines

Ones and zeros

`empty`	Return a new array, without initialising entries.
`full`	Return new data filled with a fill value.
`ones`	Returns a new array filled with ones of set shape and type.
`zeros`	Returns a new array filled with zeros of set shape and type.
`masked_all`	Return an empty masked array with all elements masked.

From existing data

`copy`	Return a deep copy of the data.
`asdata`	Convert the input to a `Data` object.
`loadd`	Reset the data in place from a dictionary serialisation.
`loads`	Reset the data in place from a string serialisation.

Data manipulation routines

Changing data shape

`flatten`	Flatten specified axes of the data.
`reshape`	Change the shape of the data without changing its values.

Transpose-like operations

`swapaxes`	Interchange two axes of an array.
`transpose`	Permute the axes of the data array.

Changing number of dimensions

`insert_dimension`	Expand the shape of the data array in place.
`reshape`	Change the shape of the data without changing its values.
`squeeze`	Remove size 1 axes from the data array.

Joining data

`concatenate`	Join a sequence of data arrays together.
`concatenate_data`	Concatenates a list of Data objects along the specified axis.

Adding and removing elements

unique

The unique elements of the data.

Rearranging elements

`flip`	Reverse the direction of axes of the data array.
`roll`	Roll array elements along one or more axes.

Expanding the data

`halo`	Expand the data by adding a halo.
`pad_missing`	Pad an axis with missing data.

Binary operations

Date-time support

Attributes

`change_calendar`	Change the calendar of date-time array elements.
`convert_reference_time`	Convert reference time data values to have new units.
`datetime_array`	An independent numpy array of date-time objects.
`datetime_as_string`	Returns an independent numpy array with datetimes as strings.
`day`	The day of each date-time value.
`dtarray`	Alias for `datetime_array`
`hour`	The hour of each date-time value.
`minute`	The minute of each date-time value.
`month`	The month of each date-time value.
`second`	The second of each date-time value.
`year`	The year of each date-time value.

Indexing routines

Single value selection

`datum`	Return an element of the data array as a standard Python scalar.
`first_element`	Return the first element of the data as a scalar.
`second_element`	Return the second element of the data as a scalar.
`last_element`	Return the last element of the data as a scalar.

Iterating over data

`flat`	Return a flat iterator over elements of the data array.
`ndindex`	Return an iterator over the N-dimensional indices of the data array.

Cyclic axes

cyclic

Get or set the cyclic axes.

Input and output

`dumpd`	Return a serialisation of the data array.
`dumps`	Return a JSON string serialisation of the data array.
`tolist`	Return the data as a scalar or (nested) list.

Linear algebra

outerproduct

Compute the outer product with another data array.

Logic functions

Truth value testing

`all`	Test whether all data array elements evaluate to True.
`any`	Test whether any data array elements evaluate to True.

Comparison

`allclose`	Whether an array is element-wise equal within a tolerance.
`isclose`	Return where data are element-wise equal within a tolerance.
`equals`	True if two data arrays are logically equal, False otherwise.
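For reference, numpy's `isclose` treats `a` and `b` as close when `abs(a - b) <= atol + rtol * abs(b)`. A pure-Python sketch of that test using numpy's default tolerances; cf's own default tolerances may differ:

```python
def isclose(a, b, rtol=1e-05, atol=1e-08):
    """Element-wise tolerance test with numpy's (asymmetric) formula."""
    return [abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b)]


def allclose(a, b, rtol=1e-05, atol=1e-08):
    """True only if every pair of elements is close."""
    return all(isclose(a, b, rtol=rtol, atol=atol))


print(isclose([1.0, 100.0], [1.0 + 1e-9, 100.01]))  # [True, False]
print(allclose([1.0, 2.0], [1.0, 2.0 + 1e-7]))      # True
```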

Mask support

`apply_masking`	Apply masking.
`count`	Count the non-masked elements of the data.
`count_masked`	Count the masked elements of the data.
`compressed`	Return all non-masked values in a one dimensional data array.
`filled`	Replace masked elements with a fill value.
`harden_mask`	Force the mask to hard.
`masked_invalid`	Mask the array where invalid values occur (NaN or inf).
`masked_values`	Mask using floating point equality.
`del_fill_value`	Delete the fill value.
`get_fill_value`	Return the missing data value.
`has_fill_value`	Whether a fill value has been set.
`set_fill_value`	Set the missing data value.
`soften_mask`	Force the mask to soft.
`masked_where`	Mask the data where a condition is met.

Attributes

`binary_mask`	A binary (0 and 1) mask of the data array.
`hardmask`	Hardness of the mask.
`is_masked`	True if the data array has any masked values.
`mask`	The Boolean missing data mask of the data array.
`fill_value`	The data array missing data value.
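Most of these operations have direct numpy.ma counterparts. A numpy-only sketch of the corresponding behaviour for `masked_invalid`, `count`, `count_masked`, `compressed`, `filled` and a 0/1 mask similar to `binary_mask`; the cf.Data methods are not reproduced exactly:

```python
import numpy as np

m = np.ma.masked_invalid([1.0, np.nan, 3.0, np.inf, 5.0])

print(m.count())                  # 3 non-masked elements
print(np.ma.count_masked(m))      # 2 masked elements
print(m.compressed())             # [1. 3. 5.]
print(m.filled(-999.0).tolist())  # [1.0, -999.0, 3.0, -999.0, 5.0]

# A 0/1 version of the mask
print(np.ma.getmaskarray(m).astype(int).tolist())  # [0, 1, 0, 1, 0]
```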

Mathematical functions

Trigonometric functions

`sin`	Take the trigonometric sine of the data element-wise.
`cos`	Take the trigonometric cosine of the data element-wise.
`tan`	Take the trigonometric tangent of the data element-wise.
`arcsin`	Take the trigonometric inverse sine of the data element-wise.
`arccos`	Take the trigonometric inverse cosine of the data element-wise.
`arctan`	Take the trigonometric inverse tangent of the data element-wise.
`arctan2`	Element-wise arc tangent of `x1/x2` with correct quadrant.

Hyperbolic functions

`sinh`	Take the hyperbolic sine of the data element-wise.
`cosh`	Take the hyperbolic cosine of the data element-wise.
`tanh`	Take the hyperbolic tangent of the data element-wise.
`arcsinh`	Take the inverse hyperbolic sine of the data element-wise.
`arccosh`	Take the inverse hyperbolic cosine of the data element-wise.
`arctanh`	Take the inverse hyperbolic tangent of the data element-wise.

Rounding

`ceil`	The ceiling of the data, element-wise.
`floor`	Return the floor of the data array.
`rint`	Round the data to the nearest integer, element-wise.
`round`	Evenly round elements of the data array to the given number of decimals.
`trunc`	Return the truncated values of the data array.

Sums, products, differences, powers

`cumsum`	Return the data cumulatively summed along the given axis.
`diff`	Calculate the n-th discrete difference along the given axis.
`square`	Calculate the element-wise square.
`sqrt`	Calculate the non-negative square root.
`sum`	Calculate sum values.

Convolution filters

convolution_filter

Return the data convolved along the given axis with the specified filter.
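To illustrate what convolving along an axis means, here is a numpy-only sketch of a three-point running mean applied along the last axis. It uses `numpy.convolve` through `numpy.apply_along_axis` and is not how cf implements `convolution_filter`:

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [2.0, 4.0, 6.0, 8.0, 10.0]])
window = np.ones(3) / 3  # equal weights: a running mean

# mode="valid" keeps only points where the window fits entirely,
# so each row shrinks from 5 to 3 elements
smoothed = np.apply_along_axis(np.convolve, 1, x, window, mode="valid")
print(smoothed.shape)  # (2, 3)
```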

Exponents and logarithms

`exp`	Take the exponential of the data array.
`log`	Takes the logarithm of the data array.

Miscellaneous

`clip`	Clip (limit) the values in the data array in place.
`func`	Apply an element-wise array operation to the data array.

Set routines

Making proper sets

unique

The unique elements of the data.

Sorting, searching, and counting

Searching

`argmax`	Return the indices of the maximum values along an axis.
`argmin`	Return the indices of the minimum values along an axis.
`where`	Assign array elements depending on a condition.

Counting

`count`	Count the non-masked elements of the data.
`count_masked`	Count the masked elements of the data.

Statistics

Order statistics

`maximum`	Alias for `max`
`maximum_absolute_value`	Calculate maximum absolute values.
`minimum`	Alias for `min`
`minimum_absolute_value`	Calculate minimum absolute values.
`percentile`	Compute percentiles of the data along the specified axes.
`max`	Calculate maximum values.
`min`	Calculate minimum values.

Averages and variances

`mean`	Calculate mean values.
`mean_absolute_value`	Calculate mean absolute values.
`mean_of_upper_decile`	Mean of values defined by the upper tenth of their distribution.
`median`	Calculate median values.
`mid_range`	Calculate mid-range values.
`range`	Calculate range values.
`root_mean_square`	Calculate root mean square (RMS) values.
`standard_deviation`	Alias for `std`
`variance`	Alias for `var`
`sd`	Alias for `std`
`std`	Calculate standard deviations.
`var`	Calculate variances.
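Several of the less familiar statistics above have simple definitions. A numpy sketch over a whole array, ignoring masking, weights and axis selection; the formulas are the usual textbook ones and are assumed rather than taken from the cf source:

```python
import numpy as np

x = np.array([2.0, -4.0, 6.0, 8.0])

mid_range = (x.max() + x.min()) / 2     # (8 + -4) / 2
value_range = x.max() - x.min()         # 8 - (-4)
rms = np.sqrt(np.mean(x ** 2))          # sqrt((4 + 16 + 36 + 64) / 4)
mean_abs = np.mean(np.abs(x))           # (2 + 4 + 6 + 8) / 4
var = np.var(x)                         # population variance (ddof=0)

print(mid_range, value_range, mean_abs)  # 2.0 12.0 5.0
```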

Sums

`integral`	Calculate summed values.
`sum`	Calculate sum values.
`sum_of_squares`	Calculate sums of squares.

Histograms

digitize

Return the indices of the bins to which each value belongs.
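numpy's `np.digitize` illustrates the idea: it returns, for each value, the index `i` such that `bins[i-1] <= v < bins[i]`. A sketch of that convention only; cf's digitize may number bins and treat out-of-range values differently:

```python
import numpy as np

values = np.array([0.5, 1.5, 2.5, 3.5])
bins = np.array([0.0, 1.0, 2.0, 3.0])

# 3.5 lies beyond the last edge, so numpy reports len(bins)
print(np.digitize(values, bins))  # [1 2 3 4]
```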

Miscellaneous

`sample_size`	Calculate sample size values.
`stats`	Calculate statistics of the data.
`sum_of_weights`	Calculate sums of weights.
`sum_of_weights2`	Calculate sums of squares of weights.

Error handling

seterr

Set how floating-point errors in the results of arithmetic operations are handled.

Compression by convention

`get_compressed_axes`	Returns the dimensions that are compressed in the array.
`get_compressed_dimension`	Returns the compressed dimension's array position.
`get_compression_type`	Returns the type of compression applied to the array.
`get_count`	Return the count variable for a compressed array.
`get_index`	Return the index variable for a compressed array.
`get_list`	Return the list variable for a compressed array.
`get_dependent_tie_points`	Return the dependent tie point variables for a compressed array.
`get_interpolation_parameters`	Return the interpolation parameter variables for a compressed array.
`get_tie_point_indices`	Return the tie point index variables for a compressed array.
`uncompress`	Uncompress the data.

Attributes

compressed_array

Returns an independent numpy array of the compressed data.
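One compression type defined by the CF conventions is the contiguous ragged array, in which a count variable gives the number of elements belonging to each instance. A numpy sketch of uncompressing such an array into a masked 2-d array, using a hypothetical helper `uncompress_contiguous` (this is not the implementation behind the `uncompress` method):

```python
import numpy as np


def uncompress_contiguous(data, count):
    """Hypothetical helper: expand a contiguous ragged array.

    Row i of the result holds the count[i] values that follow the
    values of the preceding rows in `data`; unused positions are masked.
    """
    data = np.asarray(data)
    count = np.asarray(count)
    out = np.ma.masked_all((count.size, count.max()), dtype=data.dtype)
    start = 0
    for i, n in enumerate(count):
        out[i, :n] = data[start:start + n]  # assigning unmasks these cells
        start += n
    return out


u = uncompress_contiguous([1, 2, 3, 4, 5, 6], count=[2, 3, 1])
print(u.tolist())  # [[1, 2, None], [3, 4, 5], [6, None, None]]
```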

Active storage

Attributes

Miscellaneous

`creation_commands`	Return the commands that would create the data object.
`get_data`	Returns the data.
`get_filenames`	The names of files containing parts of the data array.
`get_original_filenames`	The names of files containing the original data and metadata.
`source`	Return the underlying array object.

Attributes

data

The data as an object identity.

Performance

`nc_clear_dataset_chunksizes`	Clear the dataset chunking strategy for the data.
`nc_dataset_chunksizes`	Get the dataset chunking strategy for the data.
`nc_set_dataset_chunksizes`	Set the dataset chunking strategy for the data.
`rechunk`	Change the chunk structure of the data.
`close`	Close all files referenced by the data array.
`chunks`	The `dask` chunk sizes for each dimension.
`add_partitions`	Add partition boundaries.
`partition_boundaries`	Return the partition boundaries for each partition matrix dimension.
`ispartitioned`	True if the data array is partitioned.
`to_disk`	Store the data array on disk.
`to_memory`	Bring data on disk into memory.
`in_memory`	True if the array is retained in memory.
`fits_in_memory`	Return True if the array is small enough to be retained in memory.
`section`	Returns a dictionary of sections of the `Data` object.
`persist`	Persist data into memory.

Attributes

`chunks`	The `dask` chunk sizes for each dimension.
`npartitions`	The total number of chunks.
`numblocks`	The number of chunks along each dimension.

Aggregation

`file_directories`	The directories of files containing parts of the data.
`replace_directory`	Replace file directories in-place.
`replace_filenames`	Replace file locations in-place.
`nc_del_aggregated_data`	Remove the netCDF aggregated_data terms.
`nc_del_aggregation_write_status`	Set the netCDF aggregation write status to `False`.
`nc_get_aggregated_data`	Return the netCDF aggregated data terms.
`nc_get_aggregation_fragment_type`	The type of fragments in the aggregated data.
`nc_get_aggregation_write_status`	Get the netCDF aggregation write status.
`nc_has_aggregated_data`	Whether any netCDF aggregated_data terms have been set.
`nc_set_aggregated_data`	Set the netCDF aggregated_data elements.
`nc_set_aggregation_write_status`	Set the netCDF aggregation write status.

Element-wise arithmetic, bit and comparison operations

Arithmetic, bit and comparison operations are defined as element-wise data array operations which yield a new cf.Data object or, for augmented assignments, modify the data in-place.
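The split between new objects and in-place updates follows Python's data model: a binary operator such as `+` calls `__add__` (or the reflected `__radd__` when the left operand does not support the operation), while `+=` calls `__iadd__`, which may modify the object itself. A toy sketch of that protocol using a hypothetical `Toy` class wrapping a list (it has none of cf.Data's units, masking or dask behaviour):

```python
class Toy:
    """Hypothetical minimal array-like used only to show the protocol."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # Binary operation: build and return a new object
        if isinstance(other, (int, float)):
            return Toy(v + other for v in self.values)
        return NotImplemented

    def __radd__(self, other):
        # Reflected operation, used for e.g. 2 + toy
        return self.__add__(other)

    def __iadd__(self, other):
        # Augmented assignment: update this object in place
        if isinstance(other, (int, float)):
            self.values = [v + other for v in self.values]
            return self
        return NotImplemented


t = Toy([1, 2, 3])
s = 2 + t                      # int.__add__ fails, so Toy.__radd__ is called
print(s.values, s is t)        # [3, 4, 5] False
before = t
t += 10
print(t.values, t is before)   # [11, 12, 13] True
```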

Comparison operators

`__lt__`	The rich comparison operator `<`
`__le__`	The rich comparison operator `<=`
`__eq__`	The rich comparison operator `==`
`__ne__`	The rich comparison operator `!=`
`__gt__`	The rich comparison operator `>`
`__ge__`	The rich comparison operator `>=`

Truth value of an array

__bool__

Truth value testing and the built-in operation bool

Binary arithmetic operators

`__add__`	The binary arithmetic operation `+`
`__sub__`	The binary arithmetic operation `-`
`__mul__`	The binary arithmetic operation `*`
`__div__`	The binary arithmetic operation `/`
`__truediv__`	The binary arithmetic operation `/` (true division)
`__floordiv__`	The binary arithmetic operation `//`
`__pow__`	The binary arithmetic operations `**` and `pow`
`__mod__`	The binary arithmetic operation `%`

Binary arithmetic operators with reflected (swapped) operands

`__radd__`	The binary arithmetic operation `+` with reflected operands.
`__rsub__`	The binary arithmetic operation `-` with reflected operands.
`__rmul__`	The binary arithmetic operation `*` with reflected operands.
`__rdiv__`	The binary arithmetic operation `/` with reflected operands.
`__rtruediv__`	The binary arithmetic operation `/` (true division) with reflected operands.
`__rfloordiv__`	The binary arithmetic operation `//` with reflected operands.
`__rpow__`	The binary arithmetic operations `**` and `pow` with reflected operands.
`__rmod__`	The binary arithmetic operation `%` with reflected operands.

Augmented arithmetic assignments

`__iadd__`	The augmented arithmetic assignment `+=`
`__isub__`	The augmented arithmetic assignment `-=`
`__imul__`	The augmented arithmetic assignment `*=`
`__idiv__`	The augmented arithmetic assignment `/=`
`__itruediv__`	The augmented arithmetic assignment `/=` (true division)
`__ifloordiv__`	The augmented arithmetic assignment `//=`
`__ipow__`	The augmented arithmetic assignment `**=`
`__imod__`	The augmented arithmetic assignment `%=`

Unary arithmetic operators

`__neg__`	The unary arithmetic operation `-`
`__pos__`	The unary arithmetic operation `+`
`__abs__`	The unary arithmetic operation `abs`

Binary bitwise operators

`__and__`	The binary bitwise operation `&`
`__or__`	The binary bitwise operation `|`
`__xor__`	The binary bitwise operation `^`
`__lshift__`	The binary bitwise operation `<<`
`__rshift__`	The binary bitwise operation `>>`

Binary bitwise operators with reflected (swapped) operands

`__rand__`	The binary bitwise operation `&` with reflected operands.
`__ror__`	The binary bitwise operation `|` with reflected operands.
`__rxor__`	The binary bitwise operation `^` with reflected operands.
`__rlshift__`	The binary bitwise operation `<<` with reflected operands.
`__rrshift__`	The binary bitwise operation `>>` with reflected operands.

Augmented bitwise assignments

`__iand__`	The augmented bitwise assignment `&=`
`__ior__`	The augmented bitwise assignment `|=`
`__ixor__`	The augmented bitwise assignment `^=`
`__ilshift__`	The augmented bitwise assignment `<<=`
`__irshift__`	The augmented bitwise assignment `>>=`

Unary bitwise operators

__invert__

The unary bitwise operation ~

Special

`__contains__`	Membership test operator `in`
`__deepcopy__`	Called by the `copy.deepcopy` function.
`__getitem__`	Return a subspace of the data defined by indices.
`__hash__`	The built-in function `hash`.
`__iter__`	Called when an iterator is required.
`__len__`	Called to implement the built-in function `len`.
`__repr__`	Called by the `repr` built-in function.
`__setitem__`	Implement indexed assignment.
`__str__`	Called by the `str` built-in function.
`__array__`	The numpy array interface.
`__data__`	Returns a new reference to self.
`__query_isclose__`	Query interface method for an "is close" condition.

Deprecated

Methods

`chunk`	Partition the data array.
`Data`	Deprecated at version 3.0.0, use attribute `data` instead.
`dtvarray`	Deprecated at version 3.0.0.
`dumpd`	Return a serialisation of the data array.
`dumps`	Return a JSON string serialisation of the data array.
`expand_dims`	Deprecated at version 3.0.0, use method `insert_dimension` instead.
`files`	Deprecated at version 3.4.0, consider using method `get_filenames` instead.
`fits_in_one_chunk_in_memory`	Return True if the master array is small enough to be retained in memory.
`HDF_chunks`	Get or set HDF chunk sizes.
`in_memory`	True if the array is retained in memory.
`ismasked`	True if the data array has any masked values.
`mask_fpe`	Masking of floating-point errors in the results of arithmetic operations.
`mask_invalid`	Mask the array where invalid values occur (NaN or inf).
`partition_boundaries`	Return the partition boundaries for each partition matrix dimension.
`reconstruct_sectioned_data`	Expects a dictionary of Data objects with ordering information as keys, as output by the section method when called with a Data object.
`save_to_disk`	Deprecated.
`seterr`	Set how floating-point errors in the results of arithmetic operations are handled.
`to_disk`	Store the data array on disk.
`to_memory`	Bring data on disk into memory.
`unsafe_array`	Deprecated at version 3.0.0.
`nc_clear_hdf5_chunksizes`	Clear the HDF5 chunking strategy for the data.
`nc_hdf5_chunksizes`	Get the HDF5 chunking strategy for the data.
`nc_set_hdf5_chunksizes`	Set the HDF5 chunking strategy for the data.

Attributes

`ispartitioned`	True if the data array is partitioned.
`varray`	A numpy array view of the data array.