cf.Data.stats

Data.stats(all=False, compute=True, minimum=True, mean=True, median=True, maximum=True, range=True, mid_range=True, standard_deviation=True, root_mean_square=True, sample_size=True, minimum_absolute_value=False, maximum_absolute_value=False, mean_absolute_value=False, mean_of_upper_decile=False, sum=False, sum_of_squares=False, variance=False, weights=None)

Calculate statistics of the data.

By default the minimum, mean, median, maximum, range, mid-range, standard deviation, root mean square, and sample size are calculated. Each of these may be switched off, and further metrics switched on, with the corresponding boolean parameter.

Parameters
all: bool, optional

Calculate all possible statistics, regardless of the value of individual metric parameters.

compute: bool, optional

If True (the default), each value in the output dictionary is the computed result of its statistical calculation. Otherwise each value is given as a delayed Data operation.

minimum: bool, optional

Calculate the minimum of the values.

maximum: bool, optional

Calculate the maximum of the values.

maximum_absolute_value: bool, optional

Calculate the maximum of the absolute values.

minimum_absolute_value: bool, optional

Calculate the minimum of the absolute values.

mid_range: bool, optional

Calculate the average of the maximum and the minimum of the values.

median: bool, optional

Calculate the median of the values.

range: bool, optional

Calculate the absolute difference between the maximum and the minimum of the values.

sum: bool, optional

Calculate the sum of the values.

sum_of_squares: bool, optional

Calculate the sum of the squares of values.

sample_size: bool, optional

Calculate the sample size, i.e. the number of non-missing values.

mean: bool, optional

Calculate the weighted or unweighted mean of the values.

mean_absolute_value: bool, optional

Calculate the mean of the absolute values.

mean_of_upper_decile: bool, optional

Calculate the mean of the upper decile, i.e. of the group of data values that lie in the upper tenth of their distribution.

variance: bool, optional

Calculate the weighted or unweighted variance of the values, with a given number of degrees of freedom.

standard_deviation: bool, optional

Calculate the square root of the weighted or unweighted variance.

root_mean_square: bool, optional

Calculate the square root of the weighted or unweighted mean of the squares of the values.

weights: data_like, dict, or None, optional

Weights associated with values of the data. By default weights is None, meaning that all non-missing elements of the data have a weight of 1 and all missing elements have a weight of 0.

If weights is a data_like object then it must be broadcastable to the array.

If weights is a dictionary then each key specifies axes of the data (an int or tuple of int), with a corresponding value of data_like weights for those axes. The dimensions of a weights value must correspond to its key axes in the same order. Not all of the axes need weights assigned to them. The weights used are the outer product of the dictionary’s values.

However they are specified, the weights are internally broadcast to the shape of the data. Any weights that are themselves missing data, or that correspond to missing elements of the data, are assigned a weight of 0.
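
The weighting rules above can be illustrated without the library. The following is a minimal pure-Python sketch, not the library's implementation: the weights dictionary and the variable names are hypothetical, the data are those of the Examples section, and None stands in for the masked element.

```python
import math

# Data from the Examples section; None marks the masked (missing) element.
data = [[0, 1, 2],
        [3, None, 5]]

# Hypothetical weights dictionary: axis index -> 1-d weights for that axis.
weights = {0: [1, 2], 1: [1, 1, 2]}

# Outer product of the per-axis weights: one weight per data element.
full = [[w0 * w1 for w1 in weights[1]] for w0 in weights[0]]

# Weights that correspond to missing data elements are set to 0.
full = [[0 if x is None else w for x, w in zip(drow, wrow)]
        for drow, wrow in zip(data, full)]

# Pair each non-missing value with its weight.
pairs = [(x, w)
         for drow, wrow in zip(data, full)
         for x, w in zip(drow, wrow)
         if x is not None]

weight_sum = sum(w for _, w in pairs)

# Weighted mean: sum(w * x) / sum(w).
weighted_mean = sum(w * x for x, w in pairs) / weight_sum

# Weighted root mean square: sqrt(sum(w * x**2) / sum(w)).
weighted_rms = math.sqrt(sum(w * x * x for x, w in pairs) / weight_sum)

print(full)           # [[1, 1, 2], [2, 0, 4]]
print(weighted_mean)  # 3.1
```

With these assumed weights the masked element contributes nothing to either the numerator or the weight sum, which is the effect of assigning it a weight of 0.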

Returns
dict

The statistics, with keys giving the operation names and values giving the results of the corresponding statistical calculations. The values are the computed numerical results if compute is True, otherwise the delayed Data operations that encapsulate them.

Examples

>>> d = cf.Data([[0, 1, 2], [3, -99, 5]], mask=[[0, 0, 0], [0, 1, 0]])
>>> print(d.array)
[[0  1  2]
 [3 --  5]]
>>> d.stats()
{'minimum': 0,
 'mean': 2.2,
 'median': 2.0,
 'maximum': 5,
 'range': 5,
 'mid_range': 2.5,
 'standard_deviation': 1.7204650534085255,
 'root_mean_square': 2.792848008753788,
 'sample_size': 5}
>>> d.stats(all=True)
{'minimum': 0,
 'mean': 2.2,
 'median': 2.0,
 'maximum': 5,
 'range': 5,
 'mid_range': 2.5,
 'standard_deviation': 1.7204650534085255,
 'root_mean_square': 2.792848008753788,
 'minimum_absolute_value': 0,
 'maximum_absolute_value': 5,
 'mean_absolute_value': 2.2,
 'mean_of_upper_decile': 5.0,
 'sum': 11,
 'sum_of_squares': 39,
 'variance': 2.9600000000000004,
 'sample_size': 5}
>>> d.stats(mean_of_upper_decile=True, range=False)
{'minimum': 0,
 'mean': 2.2,
 'median': 2.0,
 'maximum': 5,
 'mid_range': 2.5,
 'standard_deviation': 1.7204650534085255,
 'root_mean_square': 2.792848008753788,
 'mean_of_upper_decile': 5.0,
 'sample_size': 5}

To ask for delayed operations instead of computed values:

>>> d.stats(compute=False)
{'minimum': <CF Data(): 0>,
 'mean': <CF Data(): 2.2>,
 'median': <CF Data(): 2.0>,
 'maximum': <CF Data(): 5>,
 'range': <CF Data(): 5>,
 'mid_range': <CF Data(): 2.5>,
 'standard_deviation': <CF Data(): 1.7204650534085255>,
 'root_mean_square': <CF Data(): 2.792848008753788>,
 'sample_size': <CF Data(1, 1): [[5]]>}
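
As a cross-check, most of the unweighted values shown above can be reproduced from the five unmasked elements using only the Python standard library. This is an independent sketch rather than a description of how the library computes them, and the dictionary name `expected` is hypothetical. In this example the variance and standard deviation agree with the population forms (division by the sample size).

```python
import math
import statistics

# The five non-missing elements of d from the examples above.
values = [0, 1, 2, 3, 5]

n = len(values)
sum_of_squares = sum(x * x for x in values)

expected = {
    "minimum": min(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "maximum": max(values),
    "range": max(values) - min(values),
    "mid_range": (max(values) + min(values)) / 2,
    # Population variance and standard deviation (divide by n).
    "variance": statistics.pvariance(values),
    "standard_deviation": statistics.pstdev(values),
    # Square root of the mean of the squares.
    "root_mean_square": math.sqrt(sum_of_squares / n),
    "minimum_absolute_value": min(abs(x) for x in values),
    "maximum_absolute_value": max(abs(x) for x in values),
    "mean_absolute_value": statistics.mean(abs(x) for x in values),
    "sum": sum(values),
    "sum_of_squares": sum_of_squares,
    "sample_size": n,
}

for name, value in expected.items():
    print(f"{name}: {value}")
```

The printed values may differ from the library's output in the last floating-point digit, so any comparison should use a tolerance.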