cf.Data.mean_of_upper_decile

Data.mean_of_upper_decile(axes=None, weights=None, method='linear', squeeze=False, mtol=1, include_decile=True, split_every=None, inplace=False)
Mean of values defined by the upper tenth of their distribution.
For the values defined by the upper tenth of their distribution, calculates their mean, or their mean along axes.
See https://ncascms.github.io/cfpython/analysis.html#collapsemethods for mathematical definitions.
See also: mean, median, percentile
 Parameters
 axes: (sequence of) int, optional
The axes to be collapsed. By default all axes are collapsed, resulting in output with size 1. Each axis is identified by its integer position. If axes is an empty sequence then the collapse is applied to each scalar element and the result has the same shape as the input data.
 weights: data_like, dict, or None, optional
Weights associated with values of the data. By default weights is None, meaning that all non-missing elements of the data have a weight of 1 and all missing elements have a weight of 0.
If weights is a data_like object then it must be broadcastable to the array.
If weights is a dictionary then each key specifies axes of the data (an int or tuple of int), with a corresponding value of data_like weights for those axes. The dimensions of a weights value must correspond to its key axes in the same order. Not all of the axes need weights assigned to them. The weights that will be used will be an outer product of the dictionary's values.
However they are specified, the weights are internally broadcast to the shape of the data, and those weights that are missing data, or that correspond to the missing elements of the data, are assigned a weight of 0.
Note
weights only applies to the calculation of the mean defined by the upper tenth of their distribution.
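The dictionary form of weights can be pictured as an outer product of per-axis weights, with missing elements contributing a weight of 0. The following is a minimal sketch using plain Python lists rather than cf objects; the helper names `outer_weights` and `weighted_mean` and the sample values are illustrative assumptions, not part of the cf API.

```python
def outer_weights(w_axis0, w_axis1):
    """Outer product of two 1-d weight sequences, giving 2-d weights."""
    return [[a * b for b in w_axis1] for a in w_axis0]


def weighted_mean(values, weights):
    """Weighted mean of a 2-d list; None marks missing data (weight 0)."""
    total = 0.0
    weight_sum = 0.0
    for row_v, row_w in zip(values, weights):
        for v, w in zip(row_v, row_w):
            if v is None or w is None:
                continue  # missing value or missing weight contributes nothing
            total += v * w
            weight_sum += w
    return total / weight_sum


# Hypothetical per-axis weights, analogous to a dictionary {0: ..., 1: ...}
weights = outer_weights([1.0, 2.0], [1.0, 1.0, 2.0])
values = [[1.0, 2.0, None], [4.0, 5.0, 6.0]]

print(weights)                         # [[1.0, 1.0, 2.0], [2.0, 2.0, 4.0]]
print(weighted_mean(values, weights))  # 4.5
```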
 method: str, optional
Specify the interpolation method to use when the percentile lies between two data values. The methods are listed here, but their definitions must be referenced from the documentation for numpy.percentile.
For the default 'linear' method, if the percentile lies between two adjacent data values i < j then the percentile is calculated as i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
'inverted_cdf'
'averaged_inverted_cdf'
'closest_observation'
'interpolated_inverted_cdf'
'hazen'
'weibull'
'linear' (default)
'median_unbiased'
'normal_unbiased'
'lower'
'higher'
'nearest'
'midpoint'
New in version 3.14.0.
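The 'linear' rule described above can be sketched in a few lines of plain Python. This is an illustration of the formula only, not the code that cf or NumPy runs; the function name `linear_percentile` is an assumption made for this example.

```python
import math


def linear_percentile(values, q):
    """q-th percentile (0 to 100) by linear interpolation between neighbours."""
    ordered = sorted(values)
    index = (len(ordered) - 1) * q / 100.0  # fractional position in sorted data
    lower = math.floor(index)
    upper = math.ceil(index)
    fraction = index - lower                # fractional part of the index
    i, j = ordered[lower], ordered[upper]
    return i + (j - i) * fraction


# For the values 0..19 the index is 17.1, so the result lies between 17 and 18.
p90 = linear_percentile(range(20), 90)
print(round(p90, 6))  # 17.1
```

With this threshold, the values 18 and 19 lie in the upper tenth, which is consistent with the mean of 18.5 shown in the Examples.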
 squeeze: bool, optional
By default, the axes which are collapsed are left in the result as dimensions with size one, so that the result will broadcast correctly against the input array. If set to True then collapsed axes are removed from the data.
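The combined effect of axes and squeeze on the shape of the result can be summarised with a small helper. This is a sketch of the shape rule as described in the text, assuming non-negative integer axis positions; `collapsed_shape` is a hypothetical name and not a cf function.

```python
def collapsed_shape(shape, axes=None, squeeze=False):
    """Shape of the result after collapsing `axes` of data with `shape`."""
    if axes is None:
        axes = range(len(shape))  # default: collapse every axis
    elif isinstance(axes, int):
        axes = [axes]             # a single axis position
    axes = set(axes)
    if squeeze:
        # Collapsed axes are removed entirely
        return tuple(n for pos, n in enumerate(shape) if pos not in axes)
    # Collapsed axes are kept as size-one dimensions
    return tuple(1 if pos in axes else n for pos, n in enumerate(shape))


print(collapsed_shape((4, 5)))                        # (1, 1)
print(collapsed_shape((4, 5), axes=0))                # (1, 5)
print(collapsed_shape((4, 5), axes=0, squeeze=True))  # (5,)
print(collapsed_shape((4, 5), axes=[]))               # (4, 5)
```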
 mtol: number, optional
The sample size threshold below which collapsed values are set to missing data. It is defined as a fraction (between 0 and 1 inclusive) of the contributing input data values.
The default of mtol is 1, meaning that a missing datum in the output array occurs whenever all of its contributing input array elements are missing data.
For other values, a missing datum in the output array occurs whenever more than 100*mtol% of its contributing input array elements are missing data.
Note that for non-zero values of mtol, different collapsed elements may have different sample sizes, depending on the distribution of missing data in the input data.
Note
mtol only applies to the calculation of the location of the 90th percentile.
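The mtol rule can be expressed as a simple predicate on the counts of missing and contributing elements. This is a minimal sketch of the rule as worded above, not cf's internal logic; the function name `is_output_missing` is an assumption.

```python
def is_output_missing(n_missing, n_total, mtol=1):
    """Return True if the collapsed value should be missing data."""
    if n_missing == n_total:
        return True  # nothing left to compute from
    # Missing when more than a fraction mtol of the inputs are missing
    return n_missing / n_total > mtol


print(is_output_missing(3, 4))            # False: default mtol=1 tolerates this
print(is_output_missing(4, 4))            # True: every input is missing
print(is_output_missing(1, 4, mtol=0))    # True: any missing input masks output
print(is_output_missing(2, 4, mtol=0.5))  # False: exactly 50% is not "more than"
```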
 include_decile: bool, optional
If True then include in the mean any values that are equal to the 90th percentile; if False then exclude them. By default these are included, in line with the default include_decile=True in the signature.
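The choice only matters when some data values coincide exactly with the 90th percentile. The sketch below takes the threshold as a given number and shows both outcomes; `mean_above` and its `include_threshold` argument are hypothetical names used for illustration.

```python
def mean_above(values, threshold, include_threshold):
    """Mean of the values above `threshold`, optionally including ties."""
    if include_threshold:
        selected = [v for v in values if v >= threshold]
    else:
        selected = [v for v in values if v > threshold]
    return sum(selected) / len(selected)


# For the values 1..11, the linearly interpolated 90th percentile is exactly 10.
values = list(range(1, 12))
print(mean_above(values, 10, include_threshold=True))   # 10.5 (mean of 10, 11)
print(mean_above(values, 10, include_threshold=False))  # 11.0 (mean of 11)
```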
 split_every: int or dict, optional
Determines the depth of the recursive aggregation. If set to or more than the number of input chunks, the aggregation will be performed in two steps, one partial collapse per input chunk and a single aggregation at the end. If set to less than that, an intermediate aggregation step will be used, so that any of the intermediate or final aggregation steps operates on no more than split_every inputs. The depth of the aggregation graph will be \(\log_{\text{split\_every}}(\text{input chunks along reduced axes})\). Setting to a low value can reduce cache size and network transfers, at the cost of more CPU and a larger dask graph.
By default, dask heuristically decides on a good value. A default can also be set globally with the split_every key in dask.config. See dask.array.reduction for details.
New in version 3.14.0.
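To get a feel for how split_every sets the depth of the aggregation tree, the grouping can be simulated by counting how many rounds are needed to reduce the chunks to one. This is a rough sketch for intuition, assuming split_every is at least 2; it does not reproduce dask's scheduling, and `aggregation_depth` is a hypothetical helper.

```python
def aggregation_depth(n_chunks, split_every):
    """Number of aggregation rounds needed to combine n_chunks into one."""
    if split_every < 2:
        raise ValueError("split_every must be at least 2")
    depth = 0
    while n_chunks > 1:
        # Each round combines groups of up to split_every partial results
        n_chunks = -(-n_chunks // split_every)  # ceiling division
        depth += 1
    return depth


print(aggregation_depth(16, 16))  # 1: a single final aggregation
print(aggregation_depth(16, 4))   # 2: 16 -> 4 -> 1
print(aggregation_depth(16, 2))   # 4: 16 -> 8 -> 4 -> 2 -> 1
```

Smaller values give a deeper tree with smaller steps, which matches the trade-off described above between memory or transfer cost and graph size.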
 inplace: bool, optional
If True then do the operation in-place and return None.
 Returns
Examples
>>> d = cf.Data(np.arange(20).reshape(4, 5), 'm')
>>> print(d.array)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
>>> e = d.mean_of_upper_decile()
>>> e
<CF Data(1, 1): [[18.5]] m>