cf.Data.digitize

Data.digitize(bins, upper=False, open_ends=False, closed_ends=None, return_bins=False, inplace=False)

Return the indices of the bins to which each value belongs.

Values (including masked values) that do not belong to any bin result in masked values in the output data.

Bins defined by percentiles are easily created with the percentile method.

Example:

Find the indices for bins defined by the 10th, 50th and 90th percentiles:

>>> bins = d.percentile([0, 10, 50, 90, 100], squeeze=True)
>>> i = d.digitize(bins, closed_ends=True)

New in version 3.0.2.

See also

percentile

Parameters
bins: array_like

The bin boundaries. One of:

  • An integer.

    Create this many equally sized, contiguous bins spanning the range of the data. I.e. the smallest bin boundary is the minimum of the data and the largest bin boundary is the maximum of the data. In order to guarantee that each data value lies inside a bin, the closed_ends parameter is assumed to be True.

  • A 1-d array of numbers.

    When sorted into a monotonically increasing sequence, each boundary, with the exception of the two end boundaries, counts as the upper boundary of one bin and the lower boundary of the next. If the open_ends parameter is True then the lowest lower bin boundary also defines a left-open (i.e. not bounded below) bin, and the largest upper bin boundary also defines a right-open (i.e. not bounded above) bin.

  • A 2-d array of numbers.

    The second dimension, which must have size 2, contains the lower and upper bin boundaries. Different bins may share a boundary, but may not overlap. If the open_ends parameter is True then the lowest lower bin boundary also defines a left-open (i.e. not bounded below) bin, and the largest upper bin boundary also defines a right-open (i.e. not bounded above) bin.
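
The three forms of bins described above can be related to one another with plain NumPy. The following is only a sketch of the described semantics, not cf's implementation; the helper names bins_from_count and boundaries_to_pairs are hypothetical and exist only for this illustration.

```python
import numpy as np

def bins_from_count(data, n):
    """Return the n + 1 boundaries of n equally sized, contiguous bins
    spanning the range of the data (hypothetical helper)."""
    return np.linspace(np.min(data), np.max(data), n + 1)

def boundaries_to_pairs(boundaries):
    """Turn a 1-d sequence of boundaries into 2-d (lower, upper) pairs
    (hypothetical helper)."""
    b = np.sort(np.asarray(boundaries, dtype=float))
    return np.column_stack((b[:-1], b[1:]))

data = np.arange(12).reshape(3, 4)

# An integer: four equally sized bins from the data minimum to its maximum
edges = bins_from_count(data, 4)

# A 1-d array: each interior boundary closes one bin and opens the next
pairs = boundaries_to_pairs([2, 6, 10])
print(pairs)
```

Here pairs has shape (2, 2), i.e. the two bins with boundaries 2 to 6 and 6 to 10, which is the same layout as the 2-d form of bins.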

upper: bool, optional

If True then each bin includes its upper bound but not its lower bound. By default the opposite is applied, i.e. each bin includes its lower bound but not its upper bound.
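
The effect of upper on values that lie exactly on a bin boundary can be illustrated with the right parameter of numpy.digitize, which draws the same distinction. This is a sketch by analogy and assumes nothing about how cf implements the option.

```python
import numpy as np

boundaries = np.array([2, 6, 10])   # two bins: 2 to 6 and 6 to 10
values = np.array([2, 6, 10])       # each value sits exactly on a boundary

# numpy.digitize counts from 1 for the first bin, so subtract 1.
# An index of -1 or 2 (the number of bins) means "outside every bin".
bin_lower = np.digitize(values, boundaries, right=False) - 1  # [2, 6), [6, 10)
bin_upper = np.digitize(values, boundaries, right=True) - 1   # (2, 6], (6, 10]

print(bin_lower)  # 10 falls outside: it is an excluded upper bound
print(bin_upper)  # 2 falls outside: it is an excluded lower bound
```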

open_ends: bool, optional

If True then create left-open (i.e. not bounded below) and right-open (i.e. not bounded above) bins from the lowest lower bin boundary and largest upper bin boundary respectively. By default these bins are not created.
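
A minimal sketch of how open_ends changes the result for contiguous bins, written with numpy.digitize and NumPy masked arrays. The function digitize_sketch is hypothetical and only mimics the behaviour described here for the default upper=False case.

```python
import numpy as np

def digitize_sketch(values, boundaries, open_ends=False):
    """Bin indices for the contiguous bins [b0, b1), [b1, b2), ...
    (hypothetical helper, not part of cf)."""
    boundaries = np.sort(np.asarray(boundaries))
    # Raw index 0 means "below every boundary" and
    # boundaries.size means "at or above the last boundary"
    raw = np.digitize(values, boundaries)
    if open_ends:
        # The two outer regions count as bins in their own right
        return np.ma.array(raw)
    # Otherwise values outside every bin are masked and numbering starts at 0
    outside = (raw == 0) | (raw == boundaries.size)
    return np.ma.array(raw - 1, mask=outside)

data = np.arange(12).reshape(3, 4)
with_open = digitize_sketch(data, [2, 6, 10], open_ends=True)
without_open = digitize_sketch(data, [2, 6, 10])
print(with_open)
print(without_open)
```

With open_ends=True there are four bins and no masked values; without it there are two bins, and the values 0, 1, 10 and 11 are masked.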

closed_ends: bool, optional

If True then extend the most extreme open boundary by a small amount so that its bin includes values that are equal to the unadjusted boundary value. This is done by multiplying it by 1.0 - epsilon or 1.0 + epsilon, whichever extends the boundary in the appropriate direction, where epsilon is the smallest positive 64-bit float such that 1.0 + epsilon != 1.0. I.e. if upper is False then the largest upper bin boundary is made slightly larger and if upper is True then the lowest lower bin boundary is made slightly lower.

By default closed_ends is assumed to be True if bins is a scalar and False otherwise.
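
The boundary adjustment can be sketched as follows, assuming that epsilon corresponds to numpy.finfo(numpy.float64).eps and that the boundaries are positive. The variable names are illustrative only.

```python
import numpy as np

eps = np.finfo(np.float64).eps

# upper=False: the largest upper boundary is made slightly larger
upper_boundary = 10.0
extended_upper = upper_boundary * (1.0 + eps)

# upper=True: the lowest lower boundary is made slightly lower
lower_boundary = 2.0
extended_lower = lower_boundary * (1.0 - eps)

# With the unadjusted boundary, 10.0 lies outside the last bin [6, 10):
# the raw index equals the number of boundaries
raw_before = np.digitize([10.0], [2.0, 6.0, upper_boundary])

# With the extended boundary, 10.0 falls inside the last bin
raw_after = np.digitize([10.0], [2.0, 6.0, extended_upper])

print(extended_upper > upper_boundary, extended_lower < lower_boundary)
print(raw_before, raw_after)
```

For a negative boundary the factors swap roles, which is why the description speaks of whichever factor extends the boundary in the appropriate direction.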

return_bins: bool, optional

If True then also return the bins in their 2-d form.

inplace: bool, optional

If True then do the operation in-place and return None.

Returns
Data, [Data]

The indices of the bins to which each value belongs.

If return_bins is True then also return the bins in their 2-d form.

Examples

>>> import numpy
>>> import cf
>>> d = cf.Data(numpy.arange(12).reshape(3, 4))
>>> print(d.array)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Equivalent ways to create indices for the four bins [-inf, 2), [2, 6), [6, 10), [10, inf):

>>> e = d.digitize([2, 6, 10], open_ends=True)
>>> e = d.digitize([[2, 6], [6, 10]], open_ends=True)
>>> print(e.array)
[[0 0 1 1]
 [1 1 2 2]
 [2 2 3 3]]

Equivalent ways to create indices for the two bins (2, 6], (6, 10]:

>>> e = d.digitize([2, 6, 10], upper=True, open_ends=False)
>>> e = d.digitize([[2, 6], [6, 10]], upper=True, open_ends=False)
>>> print(e.array)
[[-- -- --  0]
 [ 0  0  0  1]
 [ 1  1  1 --]]

Create indices for the two bins [2, 6), [8, 10), which are non-contiguous:

>>> e = d.digitize([[2, 6], [8, 10]])
>>> print(e.array)
[[-- --  0  0]
 [ 0  0 -- --]
 [ 1  1 -- --]]

Masked values result in masked indices in the output array.

>>> d[1, 1] = cf.masked
>>> print(d.array)
[[ 0  1  2  3]
 [ 4 --  6  7]
 [ 8  9 10 11]]
>>> print(d.digitize([2, 6, 10], open_ends=True).array)
[[ 0  0  1  1]
 [ 1 --  2  2]
 [ 2  2  3  3]]
>>> print(d.digitize([2, 6, 10]).array)
[[-- --  0  0]
 [ 0 --  1  1]
 [ 1  1 -- --]]
>>> print(d.digitize([2, 6, 10], closed_ends=True).array)
[[-- --  0  0]
 [ 0 --  1  1]
 [ 1  1  1 --]]