cf.Data.digitize

Data.digitize(bins, upper=False, open_ends=False, closed_ends=None, return_bins=False)
Return the indices of the bins to which each value belongs.
Values (including masked values) that do not belong to any bin result in masked values in the output data.
Bins defined by percentiles are easily created with the percentile method.
Example:
Find the indices for bins defined by the 10th, 50th and 90th percentiles:
>>> bins = d.percentile([0, 10, 50, 90, 100], squeeze=True)
>>> i = d.digitize(bins, closed_ends=True)
New in version 3.0.2.
 Parameters
 bins: array_like
The bin boundaries. One of:
An integer.
Create this many equally sized, contiguous bins spanning the range of the data. I.e. the smallest bin boundary is the minimum of the data and the largest bin boundary is the maximum of the data. In order to guarantee that each data value lies inside a bin, the closed_ends parameter is assumed to be True.
A 1d array of numbers.
When sorted into a monotonically increasing sequence, each boundary, with the exception of the two end boundaries, counts as the upper boundary of one bin and the lower boundary of the next. If the open_ends parameter is True then the lowest lower bin boundary also defines a left-open (i.e. not bounded below) bin, and the largest upper bin boundary also defines a right-open (i.e. not bounded above) bin.
A 2d array of numbers.
The second dimension, which must have size 2, contains the lower and upper bin boundaries. Different bins may share a boundary, but may not overlap. If the open_ends parameter is True then the lowest lower bin boundary also defines a left-open (i.e. not bounded below) bin, and the largest upper bin boundary also defines a right-open (i.e. not bounded above) bin.
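All three accepted forms of bins describe a list of (lower, upper) pairs. The following is a minimal pure-Python sketch of that reduction, not the library's implementation; the helper names equal_width_boundaries and boundaries_to_bins are hypothetical and exist only for illustration.

```python
import math


def equal_width_boundaries(n, data_min, data_max):
    """Return the n + 1 boundaries of n equally sized, contiguous bins
    spanning the range [data_min, data_max] (the integer form of bins)."""
    width = (data_max - data_min) / n
    # Use data_max itself as the last boundary to avoid rounding drift.
    return [data_min + i * width for i in range(n)] + [data_max]


def boundaries_to_bins(boundaries, open_ends=False):
    """Turn 1-d boundaries into (lower, upper) pairs, i.e. the 2-d form.

    Hypothetical helper: each interior boundary is the upper boundary of
    one bin and the lower boundary of the next.
    """
    b = sorted(boundaries)
    bins = list(zip(b[:-1], b[1:]))
    if open_ends:
        # Add a bin not bounded below and a bin not bounded above.
        bins = [(-math.inf, b[0])] + bins + [(b[-1], math.inf)]
    return bins


print(boundaries_to_bins([2, 6, 10]))
print(boundaries_to_bins([2, 6, 10], open_ends=True))
print(equal_width_boundaries(4, 0, 11))
```

With open_ends=True the three boundaries yield four bins rather than two, which is why the bin indices shift by one in that case.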
 upper: bool, optional
If True then each bin includes its upper bound but not its lower bound. By default the opposite is applied, i.e. each bin includes its lower bound but not its upper bound.
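The effect of upper on values that fall exactly on a boundary can be illustrated with a small pure-Python sketch. This is an illustration of the documented semantics only, not how cf implements them; digitize_sketch is a hypothetical name, and None stands in for a masked element.

```python
def digitize_sketch(values, bins, upper=False):
    """Return, for each value, the index of the bin containing it.

    bins is a list of non-overlapping (lower, upper) pairs. A value that
    lies in no bin, or that is None (standing in for masked), maps to None.
    """
    out = []
    for v in values:
        index = None
        if v is not None:
            for i, (lo, hi) in enumerate(bins):
                # upper=True: (lo, hi]; upper=False: [lo, hi)
                inside = (lo < v <= hi) if upper else (lo <= v < hi)
                if inside:
                    index = i
                    break
        out.append(index)
    return out


bins = [(2, 6), (6, 10)]
print(digitize_sketch([0, 2, 6, 10, None], bins))              # [None, 0, 1, None, None]
print(digitize_sketch([0, 2, 6, 10, None], bins, upper=True))  # [None, None, 0, 1, None]
```

Note how the boundary values 2, 6 and 10 each move to a different bin, or out of all bins, when upper is switched.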
 open_ends: bool, optional
If True then create left-open (i.e. not bounded below) and right-open (i.e. not bounded above) bins from the lowest lower bin boundary and largest upper bin boundary respectively. By default these bins are not created.
 closed_ends: bool, optional
If True then extend the most extreme open boundary by a small amount so that its bin includes values that are equal to the unadjusted boundary value. This is done by multiplying it by 1.0 - epsilon or 1.0 + epsilon, whichever extends the boundary in the appropriate direction, where epsilon is the smallest positive 64-bit float such that 1.0 + epsilon != 1.0. I.e. if upper is False then the largest upper bin boundary is made slightly larger and if upper is True then the lowest lower bin boundary is made slightly lower.
By default closed_ends is assumed to be True if bins is a scalar and False otherwise.
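The boundary adjustment described for closed_ends can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: widen_upper and widen_lower are hypothetical helpers, and sys.float_info.epsilon is taken as the 64-bit machine epsilon. The sign check chooses the factor that moves the boundary in the required direction; a boundary of exactly zero is not changed by multiplication and is not handled here.

```python
import sys

EPS = sys.float_info.epsilon  # machine epsilon for 64-bit floats


def widen_upper(boundary):
    """Nudge an upper boundary slightly upwards (hypothetical helper)."""
    factor = 1.0 + EPS if boundary > 0 else 1.0 - EPS
    return boundary * factor


def widen_lower(boundary):
    """Nudge a lower boundary slightly downwards (hypothetical helper)."""
    factor = 1.0 - EPS if boundary > 0 else 1.0 + EPS
    return boundary * factor


hi = widen_upper(10.0)
# With upper=False the last bin is [6, hi); the value 10.0 now falls inside it.
print(6 <= 10.0 < 10.0)  # False: 10.0 is excluded before the adjustment
print(6 <= 10.0 < hi)    # True: 10.0 is included after the adjustment
```

Because the adjustment is relative, the widened boundary differs from the original by about one unit in the last place, so values meaningfully beyond the boundary remain outside the bin.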
 return_bins: bool, optional
If True then also return the bins in their 2d form.
 Returns
Examples:
>>> d = cf.Data(numpy.arange(12).reshape(3, 4))
>>> print(d.array)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Equivalent ways to create indices for the four bins [-inf, 2), [2, 6), [6, 10), [10, inf)
>>> e = d.digitize([2, 6, 10], open_ends=True)
>>> e = d.digitize([[2, 6], [6, 10]], open_ends=True)
>>> print(e.array)
[[0 0 1 1]
 [1 1 2 2]
 [2 2 3 3]]
Equivalent ways to create indices for the two bins (2, 6], (6, 10]
>>> e = d.digitize([2, 6, 10], upper=True, open_ends=False)
>>> e = d.digitize([[2, 6], [6, 10]], upper=True, open_ends=False)
>>> print(e.array)
[[-- -- -- 0]
 [0 0 0 1]
 [1 1 1 --]]
Create indices for the two bins [2, 6), [8, 10), which are non-contiguous
>>> e = d.digitize([[2, 6], [8, 10]])
>>> print(e.array)
[[-- -- 0 0]
 [0 0 -- --]
 [1 1 -- --]]
Masked values result in masked indices in the output array.
>>> d[1, 1] = cf.masked
>>> print(d.array)
[[ 0  1  2  3]
 [ 4 --  6  7]
 [ 8  9 10 11]]
>>> print(d.digitize([2, 6, 10], open_ends=True).array)
[[0 0 1 1]
 [1 -- 2 2]
 [2 2 3 3]]
>>> print(d.digitize([2, 6, 10]).array)
[[-- -- 0 0]
 [0 -- 1 1]
 [1 1 -- --]]
>>> print(d.digitize([2, 6, 10], closed_ends=True).array)
[[-- -- 0 0]
 [0 -- 1 1]
 [1 1 1 --]]