cf.Field.digitize
Field.digitize(bins, upper=False, open_ends=False, closed_ends=None, return_bins=False, inplace=False)
Return the indices of the bins to which each value belongs.
Values (including masked values) that do not belong to any bin result in masked values in the output field construct of indices.
Bins defined by percentiles are easily created with the percentile method.
Example:
Find the indices for bins defined by the 10th, 50th and 90th percentiles:
>>> bins = f.percentile([0, 10, 50, 90, 100], squeeze=True)
>>> i = f.digitize(bins, closed_ends=True)
The output field construct is given a long_name property, and some or all of the following properties that define the bins:
- bin_count: An integer giving the number of bins.
- bin_bounds: A 1-d vector giving the bin bounds. The first two numbers describe the lower and upper boundaries of the first bin, the second two numbers describe the lower and upper boundaries of the second bin, and so on. The presence of left-unbounded and right-unbounded bins (see the bins and open_ends parameters) is deduced from the bin_count property. If the bin_bounds vector has 2N elements then the bin_count property will be N+2 if there are left-unbounded and right-unbounded bins, or N if no such bins are present.
- bin_interval_type: A string that specifies the nature of the bin boundaries, i.e. if they are closed or open. For example, if the lower boundary is closed and the upper boundary is open (which is the case when the upper parameter is False) then bin_interval_type will have the value 'lower: closed upper: open'.
- bin_units: A string giving the units of the bin boundary values (e.g. 'Kelvin'). If the bins parameter is a Data object with units then these are used to set this property, otherwise the field construct's units are used.
- bin_calendar: A string giving the calendar of reference date-time units for the bin boundary values (e.g. 'noleap'). If the units are not reference date-time units this property will be omitted. If the calendar is the CF default calendar, then this property may be omitted. If the bins parameter is a Data object with a calendar then this is used to set this property, otherwise the field construct's calendar is used.
- bin_standard_name: A string giving the standard name of the bin boundaries (e.g. 'air_temperature'). If there is no standard name then this property will be omitted.
- bin_long_name: A string giving the long name of the bin boundaries (e.g. 'Air Temperature'). If there is no long name, or the bin_standard_name is present, then this property will be omitted.
Of these properties, bin_count and bin_bounds are guaranteed to be output, with the others being dependent on the available metadata.
New in version 3.0.2.
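As a rough illustration of how bin_bounds and bin_count fit together, the following pure-Python sketch splits a flat bounds vector into (lower, upper) pairs and infers whether unbounded end bins are present. The helper name decode_bins is hypothetical and is not part of cf; the input values are copied from the property outputs shown in the Examples section.

```python
def decode_bins(bin_bounds, bin_count):
    """Split a flat bin_bounds vector into (lower, upper) pairs.

    Hypothetical helper, not part of cf. Returns the list of pairs and
    a flag saying whether left- and right-unbounded bins are implied.
    """
    bounds = list(bin_bounds)
    if len(bounds) % 2:
        raise ValueError("bin_bounds must have an even number of elements")

    # Consecutive pairs of numbers are the lower and upper bin boundaries
    pairs = [(bounds[i], bounds[i + 1]) for i in range(0, len(bounds), 2)]

    # 2N bounds with N bins: no unbounded bins; with N+2 bins: both present
    extra = bin_count - len(pairs)
    if extra not in (0, 2):
        raise ValueError("bin_count is inconsistent with bin_bounds")

    return pairs, extra == 2


# Values taken from the g.properties() outputs in the Examples section
pairs, has_open_ends = decode_bins([10, 20, 40, 60, 100, 140], 5)
pairs_closed, has_open_ends_closed = decode_bins([0, 50, 50, 100, 100, 150], 3)
```

With a real field construct the two inputs would come from the bin_bounds and bin_count properties of the output of digitize.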
See also
- Parameters
- bins: array_like
The bin boundaries. One of:
An integer
Create this many equally sized, contiguous bins spanning the range of the data. I.e. the smallest bin boundary is the minimum of the data and the largest bin boundary is the maximum of the data. In order to guarantee that each data value lies inside a bin, the closed_ends parameter is assumed to be True.
A 1-d array
When sorted into a monotonically increasing sequence, each boundary, with the exception of the two end boundaries, counts as the upper boundary of one bin and the lower boundary of the next. If the open_ends parameter is True then the lowest lower bin boundary also defines a left-unbounded (i.e. not bounded below) bin, and the largest upper bin boundary also defines a right-unbounded (i.e. not bounded above) bin.
A 2-d array
The second dimension, which must have size 2, contains the lower and upper boundaries of each bin. The bins do not have to be contiguous, but must not overlap. If the open_ends parameter is True then the lowest lower bin boundary also defines a left-unbounded (i.e. not bounded below) bin, and the largest upper bin boundary also defines a right-unbounded (i.e. not bounded above) bin.
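A minimal sketch, not cf's implementation, of how an integer or a 1-d array of boundaries could be expanded into the 2-d (lower, upper) form described above. The helper names bins_from_count and bins_from_edges are hypothetical.

```python
def bins_from_edges(edges):
    """Turn 1-d boundaries into contiguous (lower, upper) pairs.

    Each interior boundary is the upper boundary of one bin and the
    lower boundary of the next. Hypothetical helper, not part of cf.
    """
    edges = sorted(edges)
    return list(zip(edges[:-1], edges[1:]))


def bins_from_count(n, data_min, data_max):
    """Create n equally sized, contiguous bins spanning the data range.

    Hypothetical helper, not part of cf.
    """
    width = (data_max - data_min) / n
    # Use data_max itself for the final edge to avoid rounding drift
    edges = [data_min + i * width for i in range(n)] + [data_max]
    return list(zip(edges[:-1], edges[1:]))


contiguous = bins_from_edges([100, 0, 150, 50])
ten_bins = bins_from_count(10, 3.0, 146.0)
```

Here 3.0 and 146.0 are the minimum and maximum of the example field's data, so ten_bins spans the same range as the bins produced by f.digitize(10) in the Examples section.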
- upper: bool, optional
If True then each bin includes its upper bound but not its lower bound. By default the opposite is applied, i.e. each bin includes its lower bound but not its upper bound.
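The effect of upper on boundary values can be illustrated with a small pure-Python stand-in. This is a sketch of the interval rule only, not cf's implementation: the function digitize_1d is hypothetical, and None is used in place of a masked value.

```python
def digitize_1d(values, bins, upper=False):
    """Return the bin index of each value, or None if it is in no bin.

    bins is a sequence of (lower, upper) pairs. Hypothetical helper,
    not part of cf; None stands in for a masked output value.
    """
    indices = []
    for value in values:
        index = None
        for i, (lower, upper_bound) in enumerate(bins):
            if upper:
                # lower: open, upper: closed
                inside = lower < value <= upper_bound
            else:
                # lower: closed, upper: open (the default)
                inside = lower <= value < upper_bound
            if inside:
                index = i
                break
        indices.append(index)
    return indices


bins = [(2, 6), (6, 45), (45, 100)]
with_upper = digitize_1d([3, 6, 45, 46, 110], bins, upper=True)
default = digitize_1d([2, 6, 45, 100], bins)
```

With upper=True the values 6 and 45 fall in the bins that they close (indices 0 and 1), whereas by default they fall in the bins that they open (indices 1 and 2), and 100 belongs to no bin.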
- open_ends: bool, optional
If True then create left-unbounded (i.e. not bounded below) and right-unbounded (i.e. not bounded above) bins from the lowest lower bin boundary and largest upper bin boundary respectively. By default these bins are not created.
- closed_ends: bool, optional
If True then extend the most extreme open boundary by a small amount so that its bin includes values that are equal to the unadjusted boundary value. This is done by multiplying it by 1.0 - epsilon or 1.0 + epsilon, whichever extends the boundary in the appropriate direction, where epsilon is the smallest positive 64-bit float such that 1.0 + epsilon != 1.0. I.e. if upper is False then the largest upper bin boundary is made slightly larger and if upper is True then the lowest lower bin boundary is made slightly lower.
By default closed_ends is assumed to be True if bins is a scalar and False otherwise.
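The boundary adjustment described for closed_ends can be sketched in pure Python. This is an illustration under the assumption that epsilon is the usual double-precision machine epsilon (sys.float_info.epsilon); the function extend_boundary is hypothetical and is not part of cf.

```python
import sys

# Assumption: epsilon is the double-precision machine epsilon
EPSILON = sys.float_info.epsilon


def extend_boundary(value, upwards):
    """Nudge a bin boundary outwards by a relative factor of epsilon.

    upwards=True makes the boundary slightly larger (used for the
    largest upper boundary when upper is False); upwards=False makes
    it slightly lower (used for the lowest lower boundary when upper
    is True). Hypothetical helper, not part of cf.
    """
    if value == 0:
        # Multiplication cannot move a zero boundary; not handled here
        return value

    # Growing the magnitude moves positive values up and negative values down
    grow_magnitude = (value > 0) == upwards
    factor = 1.0 + EPSILON if grow_magnitude else 1.0 - EPSILON
    return value * factor


largest_upper = extend_boundary(146.0, upwards=True)
lowest_lower = extend_boundary(3.0, upwards=False)
```

After the adjustment a value of exactly 146.0 satisfies value < largest_upper, so it falls inside the last bin even though that bin's upper boundary is open. This is consistent with the slightly widened end boundaries shown by the return_bins example in the Examples section.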
- return_bins: bool, optional
If True then also return the bins in their 2-d form.
- inplace: bool, optional
If True then do the operation in-place and return None.
- Returns
Examples
>>> f = cf.example_field(0)
>>> f
<CF Field: specific_humidity(latitude(5), longitude(8)) 0.001 1>
>>> f.properties()
{'Conventions': 'CF-1.7',
 'standard_name': 'specific_humidity',
 'units': '0.001 1'}
>>> print(f.array)
[[  7.  34.   3.  14.  18.  37.  24.  29.]
 [ 23.  36.  45.  62.  46.  73.   6.  66.]
 [110. 131. 124. 146.  87. 103.  57.  11.]
 [ 29.  59.  39.  70.  58.  72.   9.  17.]
 [  6.  36.  19.  35.  18.  37.  34.  13.]]
>>> g = f.digitize([0, 50, 100, 150])
>>> g
<CF Field: long_name=Bin index to which each 'specific_humidity' value belongs(latitude(5), longitude(8))>
>>> print(g.array)
[[0 0 0 0 0 0 0 0]
 [0 0 0 1 0 1 0 1]
 [2 2 2 2 1 2 1 0]
 [0 1 0 1 1 1 0 0]
 [0 0 0 0 0 0 0 0]]
>>> g.properties()
{'Conventions': 'CF-1.7',
 'long_name': "Bin index to which each 'specific_humidity' value belongs",
 'bin_bounds': array([  0,  50,  50, 100, 100, 150]),
 'bin_count': 3,
 'bin_interval_type': 'lower: closed upper: open',
 'bin_standard_name': 'specific_humidity',
 'bin_units': '0.001 1'}
>>> g = f.digitize([[10, 20], [40, 60], [100, 140]])
>>> print(g.array)
[[-- -- --  0  0 -- -- --]
 [-- --  1 --  1 -- -- --]
 [ 2  2  2 -- --  2  1  0]
 [--  1 -- --  1 -- --  0]
 [-- --  0 --  0 -- --  0]]
>>> g.properties()
{'Conventions': 'CF-1.7',
 'long_name': "Bin index to which each 'specific_humidity' value belongs",
 'bin_bounds': array([ 10,  20,  40,  60, 100, 140]),
 'bin_count': 3,
 'bin_interval_type': 'lower: closed upper: open',
 'bin_standard_name': 'specific_humidity',
 'bin_units': '0.001 1'}
>>> g = f.digitize([[10, 20], [40, 60], [100, 140]], open_ends=True)
>>> print(g.array)
[[ 0 --  0  1  1 -- -- --]
 [-- --  2 --  2 --  0 --]
 [ 3  3  3 -- --  3  2  1]
 [--  2 -- --  2 --  0  1]
 [ 0 --  1 --  1 -- --  1]]
>>> g.properties()
{'Conventions': 'CF-1.7',
 'long_name': "Bin index to which each 'specific_humidity' value belongs",
 'bin_bounds': array([ 10,  20,  40,  60, 100, 140]),
 'bin_count': 5,
 'bin_interval_type': 'lower: closed upper: open',
 'bin_standard_name': 'specific_humidity',
 'bin_units': '0.001 1'}
>>> g = f.digitize([2, 6, 45, 100], upper=True)
>>> g
<CF Field: long_name=Bin index to which each 'specific_humidity' value belongs(latitude(5), longitude(8))>
>>> print(g.array)
[[ 1  1  0  1  1  1  1  1]
 [ 1  1  1  2  2  2  0  2]
 [-- -- -- --  2 --  2  1]
 [ 1  2  1  2  2  2  1  1]
 [ 0  1  1  1  1  1  1  1]]
>>> g.properties()
{'Conventions': 'CF-1.7',
 'long_name': "Bin index to which each 'specific_humidity' value belongs",
 'bin_bounds': array([  2,   6,   6,  45,  45, 100]),
 'bin_count': 3,
 'bin_interval_type': 'lower: open upper: closed',
 'bin_standard_name': 'specific_humidity',
 'bin_units': '0.001 1'}
>>> g, bins = f.digitize(10, return_bins=True)
>>> bins
<CF Data(10, 2): [[3.0, ..., 146.00000000000003]] 0.001 1>
>>> g, bins = f.digitize(10, upper=True, return_bins=True)
>>> bins
<CF Data(10, 2): [[2.999999999999999, ..., 146.0]] 0.001 1>
>>> print(g.array)
[[0 2 0 0 1 2 1 1]
 [1 2 2 4 3 4 0 4]
 [7 8 8 9 5 6 3 0]
 [1 3 2 4 3 4 0 0]
 [0 2 1 2 1 2 2 0]]
>>> f[1, [2, 5]] = cf.masked
>>> print(f.array)
[[  7.  34.   3.  14.  18.  37.  24.  29.]
 [ 23.  36.   --  62.  46.   --   6.  66.]
 [110. 131. 124. 146.  87. 103.  57.  11.]
 [ 29.  59.  39.  70.  58.  72.   9.  17.]
 [  6.  36.  19.  35.  18.  37.  34.  13.]]
>>> g = f.digitize(10)
>>> print(g.array)
[[ 0  2  0  0  1  2  1  1]
 [ 1  2 --  4  3 --  0  4]
 [ 7  8  8  9  5  6  3  0]
 [ 1  3  2  4  3  4  0  0]
 [ 0  2  1  2  1  2  2  0]]
>>> g.properties()
{'Conventions': 'CF-1.7',
 'long_name': "Bin index to which each 'specific_humidity' value belongs",
 'bin_bounds': array([  3. ,  17.3,  17.3,  31.6,  31.6,  45.9,  45.9,  60.2,
         60.2,  74.5,  74.5,  88.8,  88.8, 103.1, 103.1, 117.4,
        117.4, 131.7, 131.7, 146. ]),
 'bin_count': 10,
 'bin_interval_type': 'lower: closed upper: open',
 'bin_standard_name': 'specific_humidity',
 'bin_units': '0.001 1'}