cf.Data

class cf.Data(array=None, units=None, calendar=None, fill_value=None, hardmask=True, chunks='auto', dt=False, source=None, copy=True, dtype=None, mask=None, mask_value=None, to_memory=False, init_options=None, _use_array=True)[source]

Bases: cf.data.mixin.deprecations.DataClassDeprecationsMixin, cf.mixin2.cfanetcdf.CFANetCDF, cf.mixin2.container.Container, cfdm.data.data.Data

An N-dimensional data array with units and masked values.

  • Contains an N-dimensional, indexable and broadcastable array with many similarities to a numpy array.

  • Contains the units of the array elements.

  • Supports masked arrays, regardless of whether or not it was initialised with a masked array.

  • Stores and operates on data arrays which are larger than the available memory.

Indexing

A data array is indexable in a similar way to a numpy array:

>>> d.shape
(12, 19, 73, 96)
>>> d[...].shape
(12, 19, 73, 96)
>>> d[slice(0, 9), 10:0:-2, :, :].shape
(9, 5, 73, 96)

There are three extensions to the numpy indexing functionality:

  • Size 1 dimensions are never removed by indexing.

    An integer index i takes the i-th element but does not reduce the rank of the output array by one:

    >>> d.shape
    (12, 19, 73, 96)
    >>> d[0, ...].shape
    (1, 19, 73, 96)
    >>> d[:, 3, slice(10, 0, -2), 95].shape
    (12, 1, 5, 1)
    

    Size 1 dimensions may be removed with the squeeze method, as illustrated in the example after this list.

  • The indices for each axis work independently.

    When the indices for more than one dimension are 1-d boolean sequences or 1-d sequences of integers, these indices work independently along each dimension (similar to the way vector subscripts work in Fortran), rather than being combined element-wise as in numpy fancy indexing:

    >>> d.shape
    (12, 19, 73, 96)
    >>> d[0, :, [0, 1], [0, 13, 27]].shape
    (1, 19, 2, 3)
    
  • Boolean indices may be any object which exposes the numpy array interface.

    >>> d.shape
    (12, 19, 73, 96)
    >>> d[..., d[0, 0, 0]>d[0, 0, 0].min()]
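
For example (an illustrative sketch, reusing the shape from the examples above), the size 1 dimension left by an integer index can be removed with the squeeze method:

>>> d[0, ...].shape
(1, 19, 73, 96)
>>> d[0, ...].squeeze().shape
(19, 73, 96)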
    

Cyclic axes

Initialisation

Parameters
array: optional

The array of values. May be a scalar or array-like object, including another Data instance, anything with a to_dask_array method, a numpy array, a dask array, an xarray array, a cf.Array subclass, a list, or a tuple.

Parameter example:

array=34.6

Parameter example:

array=[[1, 2], [3, 4]]

Parameter example:

array=numpy.ma.arange(10).reshape(2, 1, 5)

units: str or Units, optional

The physical units of the data. If a Units object is provided then this can also set the calendar.

The units (without the calendar) may also be set after initialisation with the set_units method.

Parameter example:

units='km hr-1'

Parameter example:

units='days since 2018-12-01'

calendar: str, optional

The calendar for reference time units.

The calendar may also be set after initialisation with the set_calendar method.

Parameter example:

calendar='360_day'

fill_value: optional

The fill value of the data. By default, or if set to None, the numpy fill value appropriate to the array’s data-type will be used (see numpy.ma.default_fill_value).

The fill value may also be set after initialisation with the set_fill_value method.

Parameter example:

fill_value=-999.

dtype: data-type, optional

The desired data-type for the data. By default the data-type will be inferred from the array parameter.

The data-type may also be set after initialisation with the dtype attribute.

Parameter example:

dtype=float

Parameter example:

dtype='float32'

Parameter example:

dtype=numpy.dtype('i2')

New in version 3.0.4.

mask: optional

Apply this mask to the data given by the array parameter. By default, or if mask is None, no mask is applied. May be any scalar or array-like object (such as a list, numpy array or Data instance) that is broadcastable to the shape of array. Masking will be carried out where the mask elements evaluate to True.

This mask will be applied in addition to any mask already defined by the array parameter.
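
As an illustrative sketch (the values are assumptions, not taken from the cf documentation), elements where the mask evaluates to True become masked:

>>> d = cf.Data([1, 2, 3, 4], mask=[0, 1, 0, 0])
>>> print(d.array)
[1 -- 3 4]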

New in version 3.0.5.

mask_value: scalar array_like

Mask the array where it is equal to mask_value, using numerically tolerant floating point equality.
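
As an illustrative sketch (the values are assumptions), elements equal to mask_value are masked on initialisation:

>>> d = cf.Data([1.0, -999.0, 3.0], mask_value=-999.0)
>>> print(d.array)
[1.0 -- 3.0]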

New in version 3.16.0.

source: optional

Convert source, which can be any type of object, to a Data instance.

All other parameters, apart from copy, are ignored and their values are instead inferred from source by assuming that it has the Data API. Any parameters that can not be retrieved from source in this way are assumed to have their default value.

Note that if x is also a Data instance then cf.Data(source=x) is equivalent to x.copy().

hardmask: bool, optional

If False then the mask is soft. By default the mask is hard.
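
As an illustrative sketch (the values are assumptions, and it is assumed that a hard mask prevents assignment from unmasking elements, as with a numpy hard mask):

>>> d = cf.Data([1, 2, 3], mask=[0, 1, 0], hardmask=True)
>>> d[1] = 99
>>> print(d.array)
[1 -- 3]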

dt: bool, optional

If True then strings (such as '1990-12-01 12:00') given by the array parameter are re-interpreted as date-time objects. By default they are not.

copy: bool, optional

If True (the default) then deep copy the input parameters prior to initialisation. If False then the input parameters are not deep copied.

chunks: int, tuple, dict or str, optional

Specify the chunking of the underlying dask array.

Any value accepted by the chunks parameter of the dask.array.from_array function is allowed.

By default, "auto" is used to specify the array chunking, which uses a chunk size in bytes defined by the cf.chunksize function, preferring square-like chunk shapes.

Parameter example:

A blocksize like 1000.

Parameter example:

A blockshape like (1000, 1000).

Parameter example:

Explicit sizes of all blocks along all dimensions like ((1000, 1000, 500), (400, 400)).

Parameter example:

A size in bytes, like "100MiB" which will choose a uniform block-like shape, preferring square-like chunk shapes.

Parameter example:

A blocksize of -1 or None in a tuple or dictionary indicates the full size of the corresponding dimension.

Parameter example:

Blocksizes of some or all dimensions mapped to dimension positions, like {1: 200}, or {0: -1, 1: (400, 400)}.
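
As an illustrative sketch (the array values and chunk sizes are assumptions), the resulting dask chunk structure may be inspected with the chunks and npartitions attributes:

>>> import numpy
>>> d = cf.Data(numpy.arange(12).reshape(3, 4), chunks=(2, 2))
>>> d.chunks
((2, 1), (2, 2))
>>> d.npartitions
4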

New in version 3.14.0.

to_memory: bool, optional

If True then ensure that the original data are in memory, rather than on disk.

If the original data are on disk, then reading data into memory during initialisation will slow down the initialisation process, but can considerably improve downstream performance by avoiding the need for independent reads for every dask chunk, each time the data are computed.

In general, setting to_memory to True is not the same as calling the persist method of the newly created Data object, which also decompresses data compressed by convention and computes any data type, mask and date-time modifications.

If the input array is a dask.array.Array object then to_memory is ignored.

New in version 3.14.0.

init_options: dict, optional

Provide optional keyword arguments to methods and functions called during the initialisation process. A dictionary key identifies a method or function. The corresponding value is another dictionary whose key/value pairs are the keyword parameter names and values to be applied.

Supported keys are:

  • 'from_array': Provide keyword arguments to the dask.array.from_array function. This is used when initialising data that is not already a dask array and is not compressed by convention.

  • 'first_non_missing_value': Provide keyword arguments to the cf.data.utils.first_non_missing_value function. This is used when the input array contains date-time strings or objects, and may affect performance.

Parameter example:

{'from_array': {'inline_array': True}}

chunk: deprecated at version 3.14.0

Use the chunks parameter instead.

Examples

>>> d = cf.Data(5)
>>> d = cf.Data([1,2,3], units='K')
>>> import numpy
>>> d = cf.Data(numpy.arange(10).reshape(2,5),
...             units=cf.Units('m/s'), fill_value=-999)
>>> d = cf.Data('fly')
>>> d = cf.Data(tuple('fly'))

Inspection

Attributes

array

A numpy array copy of the data.

dtype

The numpy data-type of the data.

ndim

Number of dimensions in the data array.

shape

Tuple of the data array’s dimension sizes.

size

Number of elements in the data array.

nbytes

Total number of bytes consumed by the elements of the array.

dump

Return a string containing a full description of the instance.

inspect

Inspect the object for debugging.

isscalar

True if the data is a 0-d scalar array.

sparse_array

Return an independent scipy sparse array of the data.

Units

del_units

Delete the units.

get_units

Return the units.

has_units

Whether units have been set.

set_units

Set the units.

override_units

Override the data array units.

del_calendar

Delete the calendar.

get_calendar

Return the calendar.

has_calendar

Whether a calendar has been set.

set_calendar

Set the calendar.

override_calendar

Override the calendar of the data array elements.

change_calendar

Change the calendar of date-time array elements.

Attributes

Units

The cf.Units object containing the units of the data array.

Dask

compute

A view of the computed data.

cull_graph

Remove unnecessary tasks from the dask graph in-place.

dask_compressed_array

Returns a dask array of the compressed data.

rechunk

Change the chunk structure of the data.

chunk_indices

Return indices that define each dask compute chunk.

todict

Return a dictionary of the dask graph key/value pairs.

to_dask_array

Convert the data to a dask array.

get_deterministic_name

Get the deterministic name for the data.

has_deterministic_name

Whether there is a deterministic name for the data.

Attributes

chunks

The chunk sizes for each dimension.

npartitions

The total number of chunks.

numblocks

The number of chunks along each dimension.

Data creation routines

Ones and zeros

empty

Return a new array of given shape and type, without initialising entries.

full

Return a new array of given shape and type, filled with a fill value.

ones

Returns a new array filled with ones of set shape and type.

zeros

Returns a new array filled with zeros of set shape and type.

masked_all

Return an empty masked array with all elements masked.

From existing data

copy

Return a deep copy.

asdata

Convert the input to a Data object.

loadd

Reset the data in place from a dictionary serialisation.

loads

Reset the data in place from a string serialisation.

Data manipulation routines

Changing data shape

flatten

Flatten specified axes of the data.

reshape

Change the shape of the data without changing its values.

Transpose-like operations

swapaxes

Interchange two axes of an array.

transpose

Permute the axes of the data array.

Changing number of dimensions

insert_dimension

Expand the shape of the data array in place.

reshape

Change the shape of the data without changing its values.

squeeze

Remove size 1 axes from the data array.

Joining data

concatenate

Join a sequence of data arrays together.

concatenate_data

Concatenates a list of Data objects along the specified axis.

Adding and removing elements

unique

The unique elements of the data.

Rearranging elements

flip

Reverse the direction of axes of the data array.

roll

Roll array elements along one or more axes.

Expanding the data

halo

Expand the data by adding a halo.

pad_missing

Pad an axis with missing data.

Binary operations

Date-time support

Attributes

change_calendar

Change the calendar of date-time array elements.

convert_reference_time

Convert reference time data values to have new units.

datetime_array

An independent numpy array of date-time objects.

datetime_as_string

Returns an independent numpy array with datetimes as strings.

day

The day of each date-time value.

dtarray

Alias for datetime_array

hour

The hour of each date-time value.

minute

The minute of each date-time value.

month

The month of each date-time value.

second

The second of each date-time value.

year

The year of each date-time value.

Indexing routines

Single value selection

datum

Return an element of the data array as a standard Python scalar.

first_element

Return the first element of the data as a scalar.

second_element

Return the second element of the data as a scalar.

last_element

Return the last element of the data as a scalar.

Iterating over data

flat

Return a flat iterator over elements of the data array.

ndindex

Return an iterator over the N-dimensional indices of the data array.

Cyclic axes

cyclic

Get or set the cyclic axes.

Input and output

dumpd

Return a serialisation of the data array.

dumps

Return a JSON string serialisation of the data array.

tolist

Return the data as a scalar or (nested) list.

Linear algebra

outerproduct

Compute the outer product with another data array.

Logic functions

Truth value testing

all

Test whether all data array elements evaluate to True.

any

Test whether any data array elements evaluate to True.

Comparison

allclose

Whether an array is element-wise equal within a tolerance.

isclose

Return where data are element-wise equal within a tolerance.

equals

True if two data arrays are logically equal, False otherwise.

Mask support

apply_masking

Apply masking.

count

Count the non-masked elements of the data.

count_masked

Count the masked elements of the data.

compressed

Return all non-masked values in a one dimensional data array.

filled

Replace masked elements with a fill value.

harden_mask

Force the mask to hard.

masked_invalid

Mask the array where invalid values occur (NaN or inf).

masked_values

Mask using floating point equality.

del_fill_value

Delete the fill value.

get_fill_value

Return the missing data value.

has_fill_value

Whether a fill value has been set.

set_fill_value

Set the missing data value.

soften_mask

Force the mask to soft.

Attributes

binary_mask

A binary (0 and 1) mask of the data array.

hardmask

Hardness of the mask.

is_masked

True if the data array has any masked values.

mask

The Boolean missing data mask of the data array.

fill_value

The data array missing data value.

Mathematical functions

Trigonometric functions

sin

Take the trigonometric sine of the data element-wise.

cos

Take the trigonometric cosine of the data element-wise.

tan

Take the trigonometric tangent of the data element-wise.

arcsin

Take the trigonometric inverse sine of the data element-wise.

arccos

Take the trigonometric inverse cosine of the data element-wise.

arctan

Take the trigonometric inverse tangent of the data element-wise.

arctan2

Element-wise arc tangent of x1/x2 with correct quadrant.

Hyperbolic functions

sinh

Take the hyperbolic sine of the data element-wise.

cosh

Take the hyperbolic cosine of the data element-wise.

tanh

Take the hyperbolic tangent of the data element-wise.

arcsinh

Take the inverse hyperbolic sine of the data element-wise.

arccosh

Take the inverse hyperbolic cosine of the data element-wise.

arctanh

Take the inverse hyperbolic tangent of the data element-wise.

Rounding

ceil

The ceiling of the data, element-wise.

floor

Return the floor of the data array.

rint

Round the data to the nearest integer, element-wise.

round

Evenly round elements of the data array to the given number of decimals.

trunc

Return the truncated values of the data array.

Sums, products, differences, powers

cumsum

Return the data cumulatively summed along the given axis.

diff

Calculate the n-th discrete difference along the given axis.

square

Calculate the element-wise square.

sqrt

Calculate the non-negative square root.

sum

Calculate sum values.

Convolution filters

convolution_filter

Return the data convolved along the given axis with the specified filter.

Exponents and logarithms

exp

Take the exponential of the data array.

log

Takes the logarithm of the data array.

Miscellaneous

clip

Clip (limit) the values in the data array in place.

func

Apply an element-wise array operation to the data array.

Set routines

Making proper sets

unique

The unique elements of the data.

Sorting, searching, and counting

Searching

argmax

Return the indices of the maximum values along an axis.

argmin

Return the indices of the minimum values along an axis.

where

Assign array elements depending on a condition.

Counting

count

Count the non-masked elements of the data.

count_masked

Count the masked elements of the data.

Statistics

Order statistics

maximum

Alias for max

maximum_absolute_value

Calculate maximum absolute values.

minimum

Alias for min

minimum_absolute_value

Calculate minimum absolute values.

percentile

Compute percentiles of the data along the specified axes.

max

Calculate maximum values.

min

Calculate minimum values.

Averages and variances

mean

Calculate mean values.

mean_absolute_value

Calculate mean absolute values.

mean_of_upper_decile

Mean of values defined by the upper tenth of their distribution.

median

Calculate median values.

mid_range

Calculate mid-range values.

range

Calculate range values.

root_mean_square

Calculate root mean square (RMS) values.

standard_deviation

Alias for std

variance

Alias for var

sd

Alias for std

std

Calculate standard deviations.

var

Calculate variances.

Sums

integral

Calculate summed values.

sum

Calculate sum values.

sum_of_squares

Calculate sums of squares.

Histograms

digitize

Return the indices of the bins to which each value belongs.

Miscellaneous

sample_size

Calculate sample size values.

stats

Calculate statistics of the data.

sum_of_weights

Calculate sums of weights.

sum_of_weights2

Calculate sums of squares of weights.

Error handling

seterr

Set how floating-point errors in the results of arithmetic operations are handled.

Compression by convention

get_compressed_axes

Returns the dimensions that are compressed in the array.

get_compressed_dimension

Returns the compressed dimension’s array position.

get_compression_type

Returns the type of compression applied to the array.

get_count

Return the count variable for a compressed array.

get_index

Return the index variable for a compressed array.

get_list

Return the list variable for a compressed array.

get_dependent_tie_points

Return the dependent tie point variables for a compressed array.

get_interpolation_parameters

Return the interpolation parameter variables for a compressed array.

get_tie_point_indices

Return the tie point index variables for a compressed array.

uncompress

Uncompress the data.

Attributes

compressed_array

Returns an independent numpy array of the compressed data.

Miscellaneous

creation_commands

Return the commands that would create the data object.

get_data

Returns the data.

get_filenames

The names of files containing parts of the data array.

get_original_filenames

The names of files containing the original data and metadata.

source

Return the underlying array object.

get_deterministic_name

Get the deterministic name for the data.

has_deterministic_name

Whether there is a deterministic name for the data.

Attributes

data

The data as an object identity.

Performance

nc_clear_hdf5_chunksizes

Clear the HDF5 chunksizes for the data.

nc_hdf5_chunksizes

Return the HDF5 chunksizes for the data.

nc_set_hdf5_chunksizes

Set the HDF5 chunksizes for the data.

rechunk

Change the chunk structure of the data.

close

Close all files referenced by the data array.

chunks

The chunk sizes for each dimension.

rechunk

Change the chunk structure of the data.

cull

add_partitions

Add partition boundaries.

partition_boundaries

Return the partition boundaries for each partition matrix dimension.

partition_configuration

partitions

ispartitioned

True if the data array is partitioned.

to_disk

Store the data array on disk.

to_memory

Bring data on disk into memory.

in_memory

True if the array is retained in memory.

fits_in_memory

Return True if the array is small enough to be retained in memory.

section

Returns a dictionary of sections of the Data object.

persist

Persist the underlying dask array into memory.

Attributes

chunks

The chunk sizes for each dimension.

npartitions

The total number of chunks.

numblocks

The number of chunks along each dimension.

CFA

file_locations

The locations of files containing parts of the data.

del_file_location

Remove a file location in-place.

add_file_location

Add a new file location in-place.

cfa_clear_file_substitutions

Remove all of the CFA-netCDF file name substitutions.

cfa_file_substitutions

Return the CFA-netCDF file name substitutions.

cfa_update_file_substitutions

Set CFA-netCDF file name substitutions.

cfa_del_file_substitution

Remove a CFA-netCDF file name substitution.

cfa_has_file_substitutions

Whether any CFA-netCDF file name substitutions have been set.

cfa_del_aggregated_data

Remove the CFA-netCDF aggregation instruction terms.

cfa_get_aggregated_data

Return the CFA-netCDF aggregation instruction terms.

cfa_has_aggregated_data

Whether any CFA-netCDF aggregation instruction terms have been set.

cfa_set_aggregated_data

Set the CFA-netCDF aggregation instruction terms.

cfa_get_term

The CFA aggregation instruction term status.

cfa_get_write

The CFA write status of the data.

cfa_set_term

Set the CFA aggregation instruction term status.

cfa_set_write

Set the CFA write status of the data.

Element-wise arithmetic, bit and comparison operations

Arithmetic, bit and comparison operations are defined as element-wise data array operations which yield a new cf.Data object or, for augmented assignments, modify the data in-place.
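
For example (an illustrative sketch; the values and units are assumptions), a binary operation returns a new Data object, while an augmented assignment modifies the data in-place:

>>> d = cf.Data([1.0, 2.0, 3.0], units='km')
>>> e = d + d
>>> print(e.array)
[2. 4. 6.]
>>> d *= 2
>>> print(d.array)
[2. 4. 6.]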

Comparison operators

__lt__

The rich comparison operator <

__le__

The rich comparison operator <=

__eq__

The rich comparison operator ==

__ne__

The rich comparison operator !=

__gt__

The rich comparison operator >

__ge__

The rich comparison operator >=

Truth value of an array

__bool__

Truth value testing and the built-in operation bool

Binary arithmetic operators

__add__

The binary arithmetic operation +

__sub__

The binary arithmetic operation -

__mul__

The binary arithmetic operation *

__div__

The binary arithmetic operation /

__truediv__

The binary arithmetic operation / (true division)

__floordiv__

The binary arithmetic operation //

__pow__

The binary arithmetic operations ** and pow

__mod__

The binary arithmetic operation %

Binary arithmetic operators with reflected (swapped) operands

__radd__

The binary arithmetic operation + with reflected operands.

__rsub__

The binary arithmetic operation - with reflected operands.

__rmul__

The binary arithmetic operation * with reflected operands.

__rdiv__

The binary arithmetic operation / with reflected operands.

__rtruediv__

The binary arithmetic operation / (true division) with reflected operands.

__rfloordiv__

The binary arithmetic operation // with reflected operands.

__rpow__

The binary arithmetic operations ** and pow with reflected operands.

__rmod__

The binary arithmetic operation % with reflected operands.

Augmented arithmetic assignments

__iadd__

The augmented arithmetic assignment +=

__isub__

The augmented arithmetic assignment -=

__imul__

The augmented arithmetic assignment *=

__idiv__

The augmented arithmetic assignment /=

__itruediv__

The augmented arithmetic assignment /= (true division)

__ifloordiv__

The augmented arithmetic assignment //=

__ipow__

The augmented arithmetic assignment **=

__imod__

The binary arithmetic operation %=

Unary arithmetic operators

__neg__

The unary arithmetic operation -

__pos__

The unary arithmetic operation +

__abs__

The unary arithmetic operation abs

Binary bitwise operators

__and__

The binary bitwise operation &

__or__

The binary bitwise operation |

__xor__

The binary bitwise operation ^

__lshift__

The binary bitwise operation <<

__rshift__

The binary bitwise operation >>

Binary bitwise operators with reflected (swapped) operands

__rand__

The binary bitwise operation & with reflected operands.

__ror__

The binary bitwise operation | with reflected operands.

__rxor__

The binary bitwise operation ^ with reflected operands.

__rlshift__

The binary bitwise operation << with reflected operands.

__rrshift__

The binary bitwise operation >> with reflected operands.

Augmented bitwise assignments

__iand__

The augmented bitwise assignment &=

__ior__

The augmented bitwise assignment |=

__ixor__

The augmented bitwise assignment ^=

__ilshift__

The augmented bitwise assignment <<=

__irshift__

The augmented bitwise assignment >>=

Unary bitwise operators

__invert__

The unary bitwise operation ~

Special

__contains__

Membership test operator in

__deepcopy__

Called by the copy.deepcopy function.

__getitem__

Return a subspace of the data defined by indices.

__hash__

The built-in function hash.

__iter__

Called when an iterator is required.

__len__

Called to implement the built-in function len.

__repr__

Called by the repr built-in function.

__setitem__

Implement indexed assignment.

__str__

Called by the str built-in function.

__array__

The numpy array interface.

__data__

Returns a new reference to self.

__query_isclose__

Query interface method for an “is close” condition.

Deprecated

Methods

chunk

Partition the data array.

Data

Deprecated at version 3.0.0, use attribute data instead.

dtvarray

Deprecated at version 3.0.0.

dumpd

Return a serialisation of the data array.

dumps

Return a JSON string serialisation of the data array.

expand_dims

Deprecated at version 3.0.0, use method insert_dimension instead.

files

Deprecated at version 3.4.0, consider using method get_filenames instead.

fits_in_one_chunk_in_memory

Return True if the master array is small enough to be retained in memory.

HDF_chunks

Get or set HDF chunk sizes.

in_memory

True if the array is retained in memory.

ismasked

True if the data array has any masked values.

mask_fpe

Masking of floating-point errors in the results of arithmetic operations.

mask_invalid

Mask the array where invalid values occur (NaN or inf).

partition_boundaries

Return the partition boundaries for each partition matrix dimension.

partition_configuration

partitions

reconstruct_sectioned_data

Expects a dictionary of Data objects with ordering information as keys, as output by the section method when called with a Data object.

save_to_disk

Deprecated.

seterr

Set how floating-point errors in the results of arithmetic operations are handled.

to_disk

Store the data array on disk.

to_memory

Bring data on disk into memory.

unsafe_array

Deprecated at version 3.0.0.

Attributes

ispartitioned

True if the data array is partitioned.

varray

A numpy array view of the data array.