cf.Field.digitize

Field.digitize(bins, upper=False, open_ends=False, closed_ends=None, return_bins=False, inplace=False)

Return the indices of the bins to which each value belongs.

Masked values, and values that do not belong to any bin, are masked in the output field construct of indices.
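To illustrate this rule, the following pure-Python sketch (not cf's implementation) maps each value of a flat list to the index of the non-overlapping, lower-closed bin that contains it. `None` stands in for a masked value in both the input and the output, and the function name `bin_index` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Bin = Tuple[float, float]


def bin_index(values: Sequence[Optional[float]],
              bins: Sequence[Bin]) -> List[Optional[int]]:
    """Return the index of the bin [lower, upper) holding each value.

    None stands in for a masked value: masked inputs, and values that
    fall outside every bin, both produce None in the output.
    """
    out: List[Optional[int]] = []
    for v in values:
        index = None
        if v is not None:
            for i, (lower, upper) in enumerate(bins):
                if lower <= v < upper:  # lower bound closed, upper open
                    index = i
                    break
        out.append(index)
    return out


# Second row of the 2-d bins example, followed by one masked value
row = [23, 36, 45, 62, 46, 73, 6, 66, None]
print(bin_index(row, [(10, 20), (40, 60), (100, 140)]))
```

Only 45 and 46 fall inside a bin (the second one, index 1); every other value, and the masked one, comes back as `None`.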

Bins defined by percentiles are easily created with the percentile method.

Example:

Find the indices for bins defined by the 10th, 50th and 90th percentiles:

>>> bins = f.percentile([0, 10, 50, 90, 100], squeeze=True)
>>> i = f.digitize(bins, closed_ends=True)

The output field construct is given a long_name property, and some or all of the following properties that define the bins:

  • bin_count

    An integer giving the number of bins.

  • bin_bounds

    A 1-d vector giving the bin bounds. The first two numbers are the lower and upper boundaries of the first bin, the next two numbers are the lower and upper boundaries of the second bin, and so on. The presence of left-unbounded and right-unbounded bins (see the bins and open_ends parameters) is deduced from the bin_count property: if the bin_bounds vector has 2N elements, then bin_count is N+2 if there are left-unbounded and right-unbounded bins, or N if no such bins are present.

  • bin_interval_type

    A string that specifies whether the bin boundaries are closed or open. For example, if the lower boundary is closed and the upper boundary is open (which is the case when the upper parameter is False), then bin_interval_type has the value 'lower: closed upper: open'.

  • bin_units

    A string giving the units of the bin boundary values (e.g. 'Kelvin'). If the bins parameter is a Data object with units, then these are used to set this property; otherwise the field construct’s units are used.

  • bin_calendar

    A string giving the calendar of reference date-time units for the bin boundary values (e.g. 'noleap'). This property is omitted if the units are not reference date-time units, and may be omitted if the calendar is the CF default calendar. If the bins parameter is a Data object with a calendar, then this is used to set this property; otherwise the field construct’s calendar is used.

  • bin_standard_name

    A string giving the standard name of the bin boundaries (e.g. 'air_temperature'). If there is no standard name, then this property is omitted.

  • bin_long_name

    A string giving the long name of the bin boundaries (e.g. 'Air Temperature'). This property is omitted if there is no long name, or if bin_standard_name is present.

Of these properties, bin_count and bin_bounds are always set; whether the others are present depends on the available metadata.
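Because bin_count and bin_bounds are always present, a consumer of the output field can rebuild the bins from them using the 2N versus N+2 rule described for bin_bounds. The sketch below is a hypothetical plain-Python helper, `decode_bins`; it assumes bin_bounds has already been read as a flat sequence of numbers and uses `None` for the missing side of an unbounded bin.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[Optional[float], Optional[float]]


def decode_bins(bin_bounds: Sequence[float], bin_count: int) -> List[Interval]:
    """Rebuild the bins described by the bin_bounds and bin_count properties.

    None stands for a missing bound, i.e. the open side of an
    unbounded bin.
    """
    if len(bin_bounds) == 0 or len(bin_bounds) % 2:
        raise ValueError("bin_bounds must have a non-zero, even length")
    n = len(bin_bounds) // 2
    # Consecutive pairs of numbers are (lower, upper) for each bounded bin
    bins: List[Interval] = [(bin_bounds[2 * i], bin_bounds[2 * i + 1])
                            for i in range(n)]
    if bin_count == n:
        return bins
    if bin_count == n + 2:
        # Left-unbounded bin first, right-unbounded bin last
        return [(None, bins[0][0])] + bins + [(bins[-1][1], None)]
    raise ValueError(f"bin_count must be {n} or {n + 2}, got {bin_count}")


print(decode_bins([10, 20, 40, 60, 100, 140], 5))
```

With the bin_bounds and bin_count of the open_ends=True example, this yields the three bounded bins wrapped by one left-unbounded and one right-unbounded bin.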

New in version 3.0.2.

See also

bin, histogram, percentile

Parameters
bins: array_like

The bin boundaries. One of:

  • An integer

    Create this many equally sized, contiguous bins spanning the range of the data. I.e. the smallest bin boundary is the minimum of the data and the largest bin boundary is the maximum of the data. In order to guarantee that each data value lies inside a bin, the closed_ends parameter is assumed to be True.

  • A 1-d array

    When sorted into a monotonically increasing sequence, each boundary, with the exception of the two end boundaries, counts as the upper boundary of one bin and the lower boundary of the next. If the open_ends parameter is True then the lowest lower bin boundary also defines a left-unbounded (i.e. not bounded below) bin, and the largest upper bin boundary also defines a right-unbounded (i.e. not bounded above) bin.

  • A 2-d array

    The second dimension, which must have size 2, contains the lower and upper boundaries of each bin. The bins do not have to be contiguous, but must not overlap. If the open_ends parameter is True then the lowest lower bin boundary also defines a left-unbounded (i.e. not bounded below) bin, and the largest upper bin boundary also defines a right-unbounded (i.e. not bounded above) bin.
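All three forms of bins reduce to a list of (lower, upper) pairs, which is the 2-d form that return_bins asks for. The following plain-Python sketch shows one way to do that normalisation. The function `to_2d_bins` and its `data_min`/`data_max` arguments are hypothetical, and the sketch deliberately leaves out the closed_ends adjustment.

```python
from typing import List, Optional, Sequence, Tuple, Union

Bin = Tuple[float, float]


def to_2d_bins(bins: Union[int, Sequence], data_min: Optional[float] = None,
               data_max: Optional[float] = None) -> List[Bin]:
    """Convert the three accepted forms of `bins` to (lower, upper) pairs."""
    if isinstance(bins, int):
        # n equally sized, contiguous bins spanning [data_min, data_max]
        if data_min is None or data_max is None:
            raise ValueError("data_min and data_max are needed for integer bins")
        width = (data_max - data_min) / bins
        # Use data_max itself as the last edge to avoid round-off there
        edges = [data_min + i * width for i in range(bins)] + [data_max]
        return list(zip(edges[:-1], edges[1:]))

    if all(isinstance(b, (int, float)) for b in bins):
        # 1-d: each interior boundary is the upper bound of one bin
        # and the lower bound of the next
        edges = sorted(bins)
        return list(zip(edges[:-1], edges[1:]))

    # 2-d: explicit pairs, possibly non-contiguous but never overlapping
    pairs = sorted((float(lo), float(hi)) for lo, hi in bins)
    for (_, prev_hi), (lo, _) in zip(pairs, pairs[1:]):
        if lo < prev_hi:
            raise ValueError("2-d bins must not overlap")
    return pairs


print(to_2d_bins([150, 0, 100, 50]))
print(to_2d_bins([[100, 140], [10, 20], [40, 60]]))
```

Sorting first means the 1-d and 2-d inputs may be given in any order, mirroring the "when sorted into a monotonically increasing sequence" wording above.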

upper: bool, optional

If True then each bin includes its upper bound but not its lower bound. By default the opposite is applied, i.e. each bin includes its lower bound but not its upper bound.
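For contiguous bins given as a sorted 1-d sequence of boundaries, the effect of upper can be sketched with Python's standard bisect module: a lower-closed bin search uses bisect_right and an upper-closed one uses bisect_left. This is an illustration of the interval rule only; the name `contiguous_index` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence


def contiguous_index(value: float, edges: Sequence[float],
                     upper: bool = False) -> Optional[int]:
    """Index of the contiguous bin holding `value`, or None if outside.

    `edges` is a sorted 1-d sequence of bin boundaries.
    upper=False: bins are [lower, upper)  (lower bound included)
    upper=True:  bins are (lower, upper]  (upper bound included)
    """
    if upper:
        i = bisect_left(edges, value) - 1
    else:
        i = bisect_right(edges, value) - 1
    # Valid bin indices run from 0 to len(edges) - 2
    return i if 0 <= i < len(edges) - 1 else None


edges = [2, 6, 45, 100]
print(contiguous_index(45, edges), contiguous_index(45, edges, upper=True))
```

A value sitting exactly on an interior boundary moves down one bin when upper is True: here 45 is in bin 2 by default but in bin 1 with upper=True, as in the upper=True example.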

open_ends: bool, optional

If True then create left-unbounded (i.e. not bounded below) and right-unbounded (i.e. not bounded above) bins from the lowest lower bin boundary and largest upper bin boundary respectively. By default these bins are not created.
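With open_ends=True the left-unbounded bin takes index 0, the given bins are shifted up by one, and the right-unbounded bin comes last, giving N+2 bins in total (compare the bin_count of 5 in the open_ends=True example). A minimal sketch of that numbering, assuming the default lower-closed bins (upper=False), a hypothetical function name `open_ended_index`, and `None` for masked values and for values in a gap between non-contiguous bins:

```python
from typing import Optional, Sequence, Tuple

Bin = Tuple[float, float]


def open_ended_index(value: Optional[float],
                     bins: Sequence[Bin]) -> Optional[int]:
    """Bin index with left- and right-unbounded bins added.

    `bins` are sorted, non-overlapping [lower, upper) pairs.
    Index 0 is (-inf, lowest lower), indices 1..N are the given
    bins, and index N + 1 is [largest upper, +inf).
    """
    if value is None:  # masked input stays masked
        return None
    if value < bins[0][0]:
        return 0
    if value >= bins[-1][1]:
        return len(bins) + 1
    for i, (lower, upper) in enumerate(bins):
        if lower <= value < upper:
            return i + 1
    return None  # value lies in a gap between non-contiguous bins


bins = [(10, 20), (40, 60), (100, 140)]
print([open_ended_index(v, bins) for v in [7, 34, 3, 14, 18, 37, 24, 29]])
```

The values used are the first row of the example data, so the result can be compared with the first row printed in the open_ends=True example.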

closed_ends: bool, optional

If True then extend the most extreme open boundary by a small amount so that its bin includes values that are equal to the unadjusted boundary value. This is done by multiplying it by 1.0 - epsilon or 1.0 + epsilon, whichever extends the boundary in the appropriate direction, where epsilon is the smallest positive 64-bit float such that 1.0 + epsilon != 1.0. I.e. if upper is False then the largest upper bin boundary is made slightly larger, and if upper is True then the lowest lower bin boundary is made slightly lower.

By default closed_ends is assumed to be True if bins is a scalar and False otherwise.
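The adjustment described above can be sketched in a few lines of Python. This sketch assumes machine epsilon as reported by sys.float_info.epsilon, and the function name `close_end` is hypothetical. Both candidate products are computed and the one lying in the requested direction is kept.

```python
import sys

# Machine epsilon: the gap between 1.0 and the next larger 64-bit float
EPSILON = sys.float_info.epsilon


def close_end(boundary: float, extend_up: bool) -> float:
    """Nudge a bin boundary outwards by a relative factor of epsilon.

    Whichever of boundary * (1 - eps) and boundary * (1 + eps) lies in
    the requested direction is returned, so the sign of `boundary` does
    not matter. A boundary of exactly 0.0 cannot be moved this way.
    """
    candidates = (boundary * (1.0 - EPSILON), boundary * (1.0 + EPSILON))
    return max(candidates) if extend_up else min(candidates)


# upper=False: extend the largest upper boundary upwards
print(close_end(146.0, extend_up=True))
# upper=True: extend the lowest lower boundary downwards
print(close_end(3.0, extend_up=False))
```

The two nudged values are consistent with the end boundaries 146.00000000000003 and 2.999999999999999 shown in the return_bins examples.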

return_bins: bool, optional

If True then also return the bins in their 2-d form.

inplace: bool, optional

If True then do the operation in-place and return None.

Returns
Field or None, [Data]

The field construct containing indices of the bins to which each value belongs, or None if the operation was in-place.

If return_bins is True then also return the bins in their 2-d form.

Examples

>>> f = cf.example_field(0)
>>> f
<CF Field: specific_humidity(latitude(5), longitude(8)) 0.001 1>
>>> f.properties()
{'Conventions': 'CF-1.7',
 'standard_name': 'specific_humidity',
 'units': '0.001 1'}
>>> print(f.array)
[[  7.  34.   3.  14.  18.  37.  24.  29.]
 [ 23.  36.  45.  62.  46.  73.   6.  66.]
 [110. 131. 124. 146.  87. 103.  57.  11.]
 [ 29.  59.  39.  70.  58.  72.   9.  17.]
 [  6.  36.  19.  35.  18.  37.  34.  13.]]
>>> g = f.digitize([0, 50, 100, 150])
>>> g
<CF Field: long_name=Bin index to which each 'specific_humidity' value belongs(latitude(5), longitude(8))>
>>> print(g.array)
[[0 0 0 0 0 0 0 0]
 [0 0 0 1 0 1 0 1]
 [2 2 2 2 1 2 1 0]
 [0 1 0 1 1 1 0 0]
 [0 0 0 0 0 0 0 0]]
>>> g.properties()
{'Conventions': 'CF-1.7',
 'long_name': "Bin index to which each 'specific_humidity' value belongs",
 'bin_bounds': array([  0,  50,  50, 100, 100, 150]),
 'bin_count': 3,
 'bin_interval_type': 'lower: closed upper: open',
 'bin_standard_name': 'specific_humidity',
 'bin_units': '0.001 1'}
>>> g = f.digitize([[10, 20], [40, 60], [100, 140]])
>>> print(g.array)
[[-- -- --  0  0 -- -- --]
 [-- --  1 --  1 -- -- --]
 [ 2  2  2 -- --  2  1  0]
 [--  1 -- --  1 -- --  0]
 [-- --  0 --  0 -- --  0]]
>>> g.properties()
{'Conventions': 'CF-1.7',
 'long_name': "Bin index to which each 'specific_humidity' value belongs",
 'bin_bounds': array([ 10,  20,  40,  60, 100, 140]),
 'bin_count': 3,
 'bin_interval_type': 'lower: closed upper: open',
 'bin_standard_name': 'specific_humidity',
 'bin_units': '0.001 1'}
>>> g = f.digitize([[10, 20], [40, 60], [100, 140]], open_ends=True)
>>> print(g.array)
[[ 0 --  0  1  1 -- -- --]
 [-- --  2 --  2 --  0 --]
 [ 3  3  3 -- --  3  2  1]
 [--  2 -- --  2 --  0  1]
 [ 0 --  1 --  1 -- --  1]]
>>> g.properties()
{'Conventions': 'CF-1.7',
 'long_name': "Bin index to which each 'specific_humidity' value belongs",
 'bin_bounds': array([ 10,  20,  40,  60, 100, 140]),
 'bin_count': 5,
 'bin_interval_type': 'lower: closed upper: open',
 'bin_standard_name': 'specific_humidity',
 'bin_units': '0.001 1'}
>>> g = f.digitize([2, 6, 45, 100], upper=True)
>>> g
<CF Field: long_name=Bin index to which each 'specific_humidity' value belongs(latitude(5), longitude(8))>
>>> print(g.array)
[[ 1  1  0  1  1  1  1  1]
 [ 1  1  1  2  2  2  0  2]
 [-- -- -- --  2 --  2  1]
 [ 1  2  1  2  2  2  1  1]
 [ 0  1  1  1  1  1  1  1]]
>>> g.properties()
{'Conventions': 'CF-1.7',
 'long_name': "Bin index to which each 'specific_humidity' value belongs",
 'bin_bounds': array([  2,   6,   6,  45,  45, 100]),
 'bin_count': 3,
 'bin_interval_type': 'lower: open upper: closed',
 'bin_standard_name': 'specific_humidity',
 'bin_units': '0.001 1'}
>>> g, bins = f.digitize(10, return_bins=True)
>>> bins
<CF Data(10, 2): [[3.0, ..., 146.00000000000003]] 0.001 1>
>>> g, bins = f.digitize(10, upper=True, return_bins=True)
>>> bins
<CF Data(10, 2): [[2.999999999999999, ..., 146.0]] 0.001 1>
>>> print(g.array)
[[0 2 0 0 1 2 1 1]
 [1 2 2 4 3 4 0 4]
 [7 8 8 9 5 6 3 0]
 [1 3 2 4 3 4 0 0]
 [0 2 1 2 1 2 2 0]]
>>> f[1, [2, 5]] = cf.masked
>>> print(f.array)
[[  7.  34.   3.  14.  18.  37.  24.  29.]
 [ 23.  36.   --  62.  46.   --   6.  66.]
 [110. 131. 124. 146.  87. 103.  57.  11.]
 [ 29.  59.  39.  70.  58.  72.   9.  17.]
 [  6.  36.  19.  35.  18.  37.  34.  13.]]
>>> g = f.digitize(10)
>>> print(g.array)
[[ 0  2  0  0  1  2  1  1]
 [ 1  2 --  4  3 --  0  4]
 [ 7  8  8  9  5  6  3  0]
 [ 1  3  2  4  3  4  0  0]
 [ 0  2  1  2  1  2  2  0]]
>>> g.properties()
{'Conventions': 'CF-1.7',
 'long_name': "Bin index to which each 'specific_humidity' value belongs",
 'bin_bounds': array([  3. ,  17.3,  17.3,  31.6,  31.6,  45.9,  45.9,  60.2,
        60.2,  74.5,  74.5,  88.8,  88.8, 103.1, 103.1, 117.4, 117.4, 131.7,
        131.7, 146. ]),
 'bin_count': 10,
 'bin_interval_type': 'lower: closed upper: open',
 'bin_standard_name': 'specific_humidity',
 'bin_units': '0.001 1'}
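The last example can be cross-checked without cf. The pure-Python sketch below, with the hypothetical name `digitize_equal_bins`, follows the rules described for an integer bins argument: equal bins spanning the range of the unmasked data, closed_ends assumed True, and lower-closed bins. It uses `None` for masked values and is meant only to illustrate those rules, not to mirror cf's implementation.

```python
import sys
from bisect import bisect_right
from typing import List, Optional, Sequence


def digitize_equal_bins(values: Sequence[Optional[float]],
                        n: int) -> List[Optional[int]]:
    """Index values into n equal, lower-closed bins spanning their range.

    None marks a masked value: masked values are ignored when finding
    the range and stay masked in the output. The top edge is multiplied
    by (1 + epsilon) so that the maximum lands in the last bin; this
    assumes the maximum is positive.
    """
    valid = [v for v in values if v is not None]
    lo, hi = min(valid), max(valid)
    width = (hi - lo) / n
    edges = [lo + i * width for i in range(n)]
    edges.append(hi * (1.0 + sys.float_info.epsilon))  # closed upper end
    out: List[Optional[int]] = []
    for v in values:
        out.append(None if v is None else bisect_right(edges, v) - 1)
    return out


# The masked example data, flattened row by row, with None for masked values
data = [7, 34, 3, 14, 18, 37, 24, 29,
        23, 36, None, 62, 46, None, 6, 66,
        110, 131, 124, 146, 87, 103, 57, 11,
        29, 59, 39, 70, 58, 72, 9, 17,
        6, 36, 19, 35, 18, 37, 34, 13]
result = digitize_equal_bins(data, 10)
for r in range(5):
    print(result[8 * r: 8 * (r + 1)])
```

Read row by row, the printed indices match g.array above, including the two masked positions in the second row.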