cf.Field.bin
Field.bin(method, digitized, weights=None, measure=False, scale=None, mtol=1, ddof=1, radius='earth', great_circle=False, return_indices=False, verbose=None)
Collapse the data values that lie in N-dimensional bins.
The data values of the field construct are binned according to how they correspond to the N-dimensional histogram bins of another set of variables (see cf.histogram for details), and each bin of values is collapsed with one of the collapse methods allowed by the method parameter.
The number of dimensions of the output binned data is equal to the number of field constructs provided by the digitized argument. Each such field construct defines a sequence of bins and provides, for each value of another field construct, the index of the bin to which that value belongs. There is no upper limit to the number of dimensions of the output binned data.
The output bins are defined by the Cartesian (outer) product of the one-dimensional bins of each digitized field construct. For example, if only one digitized field construct is provided then the output bins simply comprise its one-dimensional bins; if there are two digitized field constructs then the output bins comprise the two-dimensional matrix formed by all possible combinations of the two sets of one-dimensional bins; and so on.
An output value for a bin is formed by collapsing, with the method given by the method parameter, those data elements whose corresponding locations in the digitized field constructs, taken together, index that bin. Not every output bin is necessarily indexed by the digitized field constructs; missing data is returned for any bin that is not.
The returned field construct will have a domain axis construct for each dimension of the output bins, with a corresponding dimension coordinate construct that defines the bin boundaries.
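A minimal NumPy sketch of this binning, under stated assumptions: the hypothetical function bin_collapse takes plain integer index arrays (zero-based, with -1 for a value outside every bin) in place of digitized field constructs, and a reducer function in place of the method parameter. It illustrates how the per-axis indices combine over the product of the one-dimensional bins and how unindexed bins end up missing; it is not cf-python's implementation.

```python
import numpy as np


def bin_collapse(data, digitized, nbins, reducer):
    """Collapse `data` into the N-dimensional bins indexed by `digitized`.

    data      : array of values to collapse
    digitized : sequence of integer arrays, each the same shape as `data`,
                holding zero-based bin indices (-1 marks a value outside
                every bin)
    nbins     : tuple giving the number of bins along each output axis
    reducer   : function mapping a 1-D array of values to a scalar
    """
    values = np.asarray(data, dtype=float).ravel()
    idx = [np.asarray(d).ravel() for d in digitized]

    # Keep only elements that fall inside a bin along every output axis
    inside = np.all([(i >= 0) & (i < n) for i, n in zip(idx, nbins)], axis=0)

    # Combine the per-axis indices into one flat index over the
    # product of the one-dimensional bins
    flat = np.ravel_multi_index(tuple(i[inside] for i in idx), nbins)
    values = values[inside]

    # Start with every output bin missing, then fill the indexed ones
    out = np.ma.masked_all(int(np.prod(nbins)), dtype=float)
    for b in np.unique(flat):
        out[b] = reducer(values[flat == b])
    return out.reshape(nbins)


def value_range(v):
    """Absolute difference between the maximum and minimum (like 'range')."""
    return v.max() - v.min()


data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
t_idx = np.array([[0, 0, 1], [1, 1, 1]])  # bins along the first output axis
s_idx = np.array([[0, 1, 1], [0, 0, 1]])  # bins along the second output axis
result = bin_collapse(data, [t_idx, s_idx], (2, 3), value_range)
print(result)  # third column is missing: no element indexes those bins
```

Here the output has 2 x 3 bins, but the index arrays only ever point at the first two columns, so the third column is returned as missing data.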
New in version 3.0.2.
- Parameters
- method: str
The collapse method used to combine values that map to each cell of the output field construct. The following methods are available (see https://ncas-cms.github.io/cf-python/analysis.html#collapse-methods for precise definitions):
| method | Description | Weighted |
|---|---|---|
| 'maximum' | The maximum of the values. | Never |
| 'minimum' | The minimum of the values. | Never |
| 'maximum_absolute_value' | The maximum of the absolute values. | Never |
| 'minimum_absolute_value' | The minimum of the absolute values. | Never |
| 'mid_range' | The average of the maximum and the minimum of the values. | Never |
| 'range' | The absolute difference between the maximum and the minimum of the values. | Never |
| 'median' | The median of the values. | Never |
| 'sum' | The sum of the values. | Never |
| 'sum_of_squares' | The sum of the squares of values. | Never |
| 'sample_size' | The sample size, i.e. the number of non-missing values. | Never |
| 'sum_of_weights' | The sum of weights, as would be used for other calculations. | Never |
| 'sum_of_weights2' | The sum of squares of weights, as would be used for other calculations. | Never |
| 'mean' | The weighted or unweighted mean of the values. | May be |
| 'mean_absolute_value' | The mean of the absolute values. | May be |
| 'mean_of_upper_decile' | The mean of the upper group of data values defined by the upper tenth of their distribution. | May be |
| 'variance' | The weighted or unweighted variance of the values, with a given number of degrees of freedom. | May be |
| 'standard_deviation' | The square root of the weighted or unweighted variance. | May be |
| 'root_mean_square' | The square root of the weighted or unweighted mean of the squares of the values. | May be |
| 'integral' | The integral of values. | Always |
Collapse methods that are “Never” weighted ignore the weights parameter, even if it is set.
Collapse methods that “May be” weighted will only be weighted if the weights parameter is set.
Collapse methods that are “Always” weighted require the weights parameter to be set.
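As a rough illustration of what several of these methods compute for the values that land in one bin (missing data already removed), here are plain NumPy equivalents. The values and weights are made up, the weighted variance and 'mean_of_upper_decile' are deliberately left out, and the precise definitions remain those on the collapse-methods page linked above; this is not how cf-python implements them.

```python
import numpy as np

# Made-up values falling in one bin, and weights for those values
values = np.array([2.0, -4.0, 6.0, 8.0])
weights = np.array([1.0, 1.0, 2.0, 4.0])

# Unweighted results (variance and standard deviation use ddof=1)
unweighted = {
    "maximum": values.max(),
    "minimum": values.min(),
    "maximum_absolute_value": np.abs(values).max(),
    "minimum_absolute_value": np.abs(values).min(),
    "mid_range": (values.max() + values.min()) / 2,
    "range": values.max() - values.min(),
    "median": np.median(values),
    "sum": values.sum(),
    "sum_of_squares": (values**2).sum(),
    "sample_size": values.size,
    "mean": values.mean(),
    "variance": values.var(ddof=1),
    "standard_deviation": values.std(ddof=1),
    "root_mean_square": np.sqrt((values**2).mean()),
}

# Weighted quantities: a weighted mean, and the sums of the weights
# and of their squares
weighted_mean = np.average(values, weights=weights)
sum_of_weights = weights.sum()
sum_of_weights2 = (weights**2).sum()

for name, result in unweighted.items():
    print(f"{name}: {result}")
print("weighted mean:", weighted_mean)
```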
- digitized: (sequence of) Field
One or more field constructs that contain digitized data with corresponding metadata, as would be output by cf.Field.digitize. Each field construct contains indices to the one-dimensional bins to which each value of an original field construct belongs; there must be bin_count and bin_bounds properties as defined by the digitize method (and any of the extra properties defined by that method are also recommended).
The bins defined by the bin_count and bin_bounds properties are used to create a dimension coordinate construct for the output field construct.
Each digitized field construct must be transformable so that it is broadcastable to the input field construct’s data. This is done by using the metadata constructs of the two field constructs to create a mapping of physically compatible dimensions between the fields, and then manipulating the dimensions of the digitized field construct’s data to ensure that broadcasting can occur.
- weights: optional
Specify the weights for the collapse calculations. The weights are those that would be returned by this call of the field construct’s weights method: f.weights(weights, measure=measure, scale=scale, radius=radius, great_circle=great_circle, components=True). See the measure, scale, radius and great_circle parameters and cf.Field.weights for details.
Note
By default weights is None, resulting in unweighted calculations.
Note
Setting weights to True is generally a good way to ensure that all collapses are appropriately weighted according to the field construct’s metadata. In this case, if it is not possible to create weights for any axis then an exception will be raised.
However, care needs to be taken if weights is True when cell volume weights are desired. The volume weights will be taken from a “volume” cell measure construct if one exists, otherwise the cell volumes will be calculated as being proportional to the sizes of one-dimensional vertical coordinate cells. In the latter case, if the vertical dimension coordinates do not define the actual height or depth thickness of every cell in the domain then the weights will be incorrect.
If weights is the boolean True then weights are calculated for all of the domain axis constructs.
- Parameter example:
To specify weights based on the field construct’s metadata for all axes use weights=True.
- Parameter example:
To specify weights based on cell areas, leaving all other axes unweighted, use weights='area'.
- Parameter example:
To specify weights based on cell areas and linearly in time, leaving all other axes unweighted, you could set weights=('area', 'T').
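To see why area weighting matters, this NumPy sketch compares an unweighted mean with an area-weighted mean for three cells that fall in the same bin. The cell bounds and values are made up, every cell is assumed to have the same longitude width, and the weights are computed by hand from the standard spherical result (cell area proportional to the difference of the sines of the latitude bounds) rather than by cf.Field.weights.

```python
import numpy as np

# Hypothetical latitude bounds (degrees) of three cells in one bin,
# all assumed to share the same longitude width
lat_bounds = np.array([[0.0, 30.0], [30.0, 60.0], [60.0, 90.0]])
values = np.array([10.0, 20.0, 30.0])

# On a sphere, cell area is proportional to sin(upper) - sin(lower)
sin_bounds = np.sin(np.deg2rad(lat_bounds))
area_weights = sin_bounds[:, 1] - sin_bounds[:, 0]

unweighted_mean = values.mean()
area_weighted_mean = np.average(values, weights=area_weights)
print(unweighted_mean, area_weighted_mean)
```

The high-latitude cell is small, so its large value counts for less and the area-weighted mean falls below the unweighted one.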
- measure: bool, optional
Create weights, as defined by the weights parameter, which are cell measures, i.e. which describe actual cell sizes (e.g. cell areas) with appropriate units (e.g. metres squared). By default the weights are scaled to lie between 0 and 1 and have arbitrary units (see the scale parameter).
Cell measures can be created for any combination of axes. For example, cell measures for a time axis are the time span for each cell with canonical units of seconds; cell measures for the combination of four axes representing time and three-dimensional space could have canonical units of metres cubed seconds.
When collapsing with the 'integral' method, measure must be True, and the units of the weights are incorporated into the units of the returned field construct.
Note
Specifying cell volume weights via weights=['X', 'Y', 'Z'] or weights=['area', 'Z'] (or other equivalents) will produce an incorrect result if the vertical dimension coordinates do not define the actual height or depth thickness of every cell in the domain. In this case, weights='volume' should be used instead, which requires the field construct to have a “volume” cell measure construct.
If weights=True then care also needs to be taken, as a “volume” cell measure construct will be used if present, otherwise the cell volumes will be calculated using the size of the vertical coordinate cells.
- scale: number, optional
If set to a positive number then scale the weights, as defined by the weights parameter, so that they are less than or equal to that number. By default the weights are scaled to lie between 0 and 1 (i.e. scale is 1).
- Parameter example:
To scale all weights so that they lie between 0 and 0.5, use scale=0.5.
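A minimal sketch of the scaling idea, assuming the weights are rescaled by dividing by their maximum and multiplying by scale, so that the largest weight equals scale and all others are smaller. The function scale_weights is hypothetical, and cf-python's own scaling step may differ in detail.

```python
import numpy as np


def scale_weights(weights, scale=1.0):
    """Rescale positive weights so that the largest one equals `scale`."""
    weights = np.asarray(weights, dtype=float)
    return weights * (scale / weights.max())


w = np.array([2.0, 5.0, 10.0])
scaled = scale_weights(w, scale=0.5)  # largest weight becomes 0.5
print(scaled)
```

Relative sizes are preserved, so scaling changes neither a weighted mean nor any other ratio of weighted sums.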
- mtol: number, optional
Set the fraction of input data elements which is allowed to contain missing data when contributing to an individual output data element. Where this fraction exceeds mtol, missing data is returned. The default is 1, meaning that a missing datum in the output array occurs when its contributing input array elements are all missing data. A value of 0 means that a missing datum in the output array occurs whenever any of its contributing input array elements are missing data. Any intermediate value is permitted.
- Parameter example:
To ensure that an output array element is a missing datum if more than 25% of its input array elements are missing data, use mtol=0.25.
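The mtol rule for a single output bin can be sketched as follows. The function apply_mtol is hypothetical and uses a mean as the collapse method; it returns missing data when the fraction of missing inputs exceeds mtol, and also when every input is missing, which is what makes the default mtol=1 behave as described above.

```python
import numpy as np


def apply_mtol(values, mtol=1.0):
    """Mean of one bin's masked input values, honouring an mtol threshold."""
    values = np.ma.asarray(values)
    n_missing = np.ma.count_masked(values)
    fraction_missing = n_missing / values.size
    # Missing if too large a fraction is missing, or if nothing is left
    if fraction_missing > mtol or n_missing == values.size:
        return np.ma.masked
    return values.mean()


# One of four inputs (25%) is missing
bin_values = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[0, 0, 0, 1])
print(apply_mtol(bin_values, mtol=1))     # mean of the non-missing inputs
print(apply_mtol(bin_values, mtol=0.25))  # 25% does not exceed 0.25
print(apply_mtol(bin_values, mtol=0))     # any missing input is too many
```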
- ddof: number, optional
The delta degrees of freedom in the calculation of a standard deviation or variance. The number of degrees of freedom used in the calculation is (N-ddof), where N is the number of non-missing elements contributing to the calculation. By default ddof is 1, meaning that the standard deviation and variance of the population are estimated according to the usual formula with (N-1) in the denominator, to avoid the bias caused by the use of the sample mean (Bessel’s correction).
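The role of ddof can be checked with a few lines of NumPy: the sum of squared deviations from the mean is divided by N - ddof, which is also the convention of NumPy's own ddof argument. The data are arbitrary.

```python
import numpy as np

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = values.size
squared_deviations = ((values - values.mean()) ** 2).sum()  # 32.0

for ddof in (0, 1):
    # The denominator is the number of degrees of freedom, N - ddof
    manual = squared_deviations / (n - ddof)
    assert np.isclose(manual, values.var(ddof=ddof))
    print(ddof, manual)
```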
- radius: optional
Specify the radius used for calculating the areas of cells defined in spherical polar coordinates. The radius is that which would be returned by this call of the field construct’s radius method: f.radius(radius). See cf.Field.radius for details.
By default radius is 'earth', which means that if and only if the radius cannot be found from the datums of any coordinate reference constructs, then the default radius is taken as 6371229 metres.
- great_circle: bool, optional
If True then allow, if required, the derivation of i) area weights from polygon geometry cells by assuming that each cell part is a spherical polygon composed of great circle segments; and ii) line-length weights from line geometry cells by assuming that each line part is composed of great circle segments.
New in version 3.2.0.
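For cells defined in spherical polar coordinates, the radius enters area weights through the standard result that a latitude-longitude cell on a sphere of radius R has area R² (λ₁ − λ₀)(sin φ₁ − sin φ₀), with angles in radians. This sketch applies that formula with the default 6371229 metre radius quoted under the radius parameter; the function cell_area is hypothetical and cf-python's internal calculation may be organised differently.

```python
import numpy as np

EARTH_RADIUS = 6371229.0  # metres: the default used when radius='earth'


def cell_area(lat_bounds, lon_bounds, radius=EARTH_RADIUS):
    """Area in m2 of a latitude-longitude cell on a sphere.

    lat_bounds, lon_bounds: (lower, upper) cell bounds in degrees.
    """
    lat0, lat1 = np.deg2rad(lat_bounds)
    lon0, lon1 = np.deg2rad(lon_bounds)
    return radius**2 * (lon1 - lon0) * (np.sin(lat1) - np.sin(lat0))


# A 1 degree x 1 degree cell at the equator
print(cell_area((0.0, 1.0), (0.0, 1.0)))

# Sanity check: the whole sphere has area 4 * pi * R**2
assert np.isclose(cell_area((-90.0, 90.0), (0.0, 360.0)),
                  4 * np.pi * EARTH_RADIUS**2)
```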
- verbose: int or str or None, optional
If an integer from -1 to 3, or an equivalent string equal ignoring case to one of:
  - 'DISABLE' (0)
  - 'WARNING' (1)
  - 'INFO' (2)
  - 'DETAIL' (3)
  - 'DEBUG' (-1)
set for the duration of the method call only as the minimum cut-off for the verboseness level of displayed output (log) messages, regardless of the globally-configured cf.log_level. Note that increasing numerical value corresponds to increasing verbosity, with the exception of -1 as a special case of maximal and extreme verbosity.
Otherwise, if None (the default value), output messages will be shown according to the value of the cf.log_level setting.
Overall, the higher a non-negative integer or equivalent string that is set (up to a maximum of 3/'DETAIL') for increasing verbosity, the more description that is printed to convey information about the operation.
- Returns
Field
The field construct containing the binned values.
Examples
Find the range of values that lie in each bin:
```python
>>> print(q)
Field: specific_humidity (ncvar%q)
----------------------------------
Data            : specific_humidity(latitude(5), longitude(8)) 0.001 1
Cell methods    : area: mean
Dimension coords: latitude(5) = [-75.0, ..., 75.0] degrees_north
                : longitude(8) = [22.5, ..., 337.5] degrees_east
                : time(1) = [2019-01-01 00:00:00]
>>> print(q.array)
[[  7.  34.   3.  14.  18.  37.  24.  29.]
 [ 23.  36.  45.  62.  46.  73.   6.  66.]
 [110. 131. 124. 146.  87. 103.  57.  11.]
 [ 29.  59.  39.  70.  58.  72.   9.  17.]
 [  6.  36.  19.  35.  18.  37.  34.  13.]]
>>> indices = q.digitize(10)
>>> b = q.bin('range', digitized=indices)
>>> print(b)
Field: specific_humidity
------------------------
Data            : specific_humidity(specific_humidity(10)) 0.001 1
Cell methods    : latitude: longitude: range
Dimension coords: specific_humidity(10) = [10.15, ..., 138.85000000000002] 0.001 1
>>> print(b.array)
[14. 11. 11. 13. 11.  0.  0.  0.  7.  0.]
```
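The range values in this example can be reproduced with plain NumPy, assuming that digitize(10) creates ten equal-width bins spanning the data range. That assumption matches the reported bin centres of 10.15 and 138.85 for data running from 3 to 146, but it is an inference from the output rather than a statement of how cf.Field.digitize works.

```python
import numpy as np

# Data of the field q in the example above
q = np.array([
    [7, 34, 3, 14, 18, 37, 24, 29],
    [23, 36, 45, 62, 46, 73, 6, 66],
    [110, 131, 124, 146, 87, 103, 57, 11],
    [29, 59, 39, 70, 58, 72, 9, 17],
    [6, 36, 19, 35, 18, 37, 34, 13],
], dtype=float)

bin_count = 10
# Assumption: equal-width bins spanning the data range [min, max]
edges = np.linspace(q.min(), q.max(), bin_count + 1)
centres = (edges[:-1] + edges[1:]) / 2

# Zero-based bin index of each value; the maximum goes in the last bin
indices = np.clip(np.digitize(q, edges) - 1, 0, bin_count - 1)

# Collapse each bin with 'range' (every bin is non-empty for this data)
ranges = np.array([
    q[indices == k].max() - q[indices == k].min() for k in range(bin_count)
])
print(centres[0], centres[-1])
print(ranges)
```

The resulting ranges agree with the printed b.array; bins holding a single value have a range of 0.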
Find various metrics describing how tendency_of_sea_water_potential_temperature_expressed_as_heat_content data varies with sea_water_potential_temperature and sea_water_salinity:
```python
>>> t
Field: sea_water_potential_temperature (ncvar%sea_water_potential_temperature)
------------------------------------------------------------------------------
Data            : sea_water_potential_temperature(time(1), depth(1), latitude(5), longitude(8)) K
Cell methods    : area: mean time(1): mean
Dimension coords: time(1) = [2290-06-01 00:00:00] 360_day
                : depth(1) = [3961.89990234375] m
                : latitude(5) = [-1.875, ..., 3.125] degrees_north
                : longitude(8) = [75.0, ..., 83.75] degrees_east
Auxiliary coords: model_level_number(depth(1)) = [18]
>>> s
Field: sea_water_salinity (ncvar%sea_water_salinity)
----------------------------------------------------
Data            : sea_water_salinity(time(1), depth(1), latitude(5), longitude(8)) psu
Cell methods    : area: mean time(1): mean
Dimension coords: time(1) = [2290-06-01 00:00:00] 360_day
                : depth(1) = [3961.89990234375] m
                : latitude(5) = [-1.875, ..., 3.125] degrees_north
                : longitude(8) = [75.0, ..., 83.75] degrees_east
Auxiliary coords: model_level_number(depth(1)) = [18]
>>> x
Field: tendency_of_sea_water_potential_temperature_expressed_as_heat_content (ncvar%tend)
-----------------------------------------------------------------------------------------
Data            : tendency_of_sea_water_potential_temperature_expressed_as_heat_content(time(1), depth(1), latitude(5), longitude(8)) W m-2
Cell methods    : area: mean time(1): mean
Dimension coords: time(1) = [2290-06-01 00:00:00] 360_day
                : depth(1) = [3961.89990234375] m
                : latitude(5) = [-1.875, ..., 3.125] degrees_north
                : longitude(8) = [75.0, ..., 83.75] degrees_east
Auxiliary coords: model_level_number(depth(1)) = [18]
>>> print(x.array)
[[[[-209.72  340.86   94.75  154.21   38.54 -262.75  158.22  154.58]
   [ 311.67  245.91 -168.16   47.61 -219.66 -270.33  226.1    52.0 ]
   [     --  -112.34  271.67  189.22    9.92  232.39  221.17  206.0 ]
   [     --       --  -92.31 -285.57  161.55  195.89 -258.29    8.35]
   [     --       --   -7.82 -299.79  342.32 -169.38  254.5   -75.4 ]]]]
```
```python
>>> t_indices = t.digitize(6)
>>> s_indices = s.digitize(4)
```
```python
>>> n = x.bin('sample_size', [t_indices, s_indices])
>>> print(n)
Field: number_of_observations
-----------------------------
Data            : number_of_observations(sea_water_salinity(4), sea_water_potential_temperature(6)) 1
Cell methods    : latitude: longitude: point
Dimension coords: sea_water_salinity(4) = [6.3054151982069016, ..., 39.09366758167744] psu
                : sea_water_potential_temperature(6) = [278.1569468180338, ..., 303.18466695149743] K
>>> print(n.array)
[[ 1  2  2  2 --  2]
 [ 2  1  3  3  3  2]
 [-- --  3 --  1 --]
 [ 1 --  1  3  2  1]]
```
```python
>>> m = x.bin('mean', [t_indices, s_indices], weights=['X', 'Y', 'Z', 'T'])
>>> print(m)
Field: tendency_of_sea_water_potential_temperature_expressed_as_heat_content
----------------------------------------------------------------------------
Data            : tendency_of_sea_water_potential_temperature_expressed_as_heat_content(sea_water_salinity(4), sea_water_potential_temperature(6)) W m-2
Cell methods    : latitude: longitude: mean
Dimension coords: sea_water_salinity(4) = [6.3054151982069016, ..., 39.09366758167744] psu
                : sea_water_potential_temperature(6) = [278.1569468180338, ..., 303.18466695149743] K
>>> print(m.array)
[[ 189.22  131.36    6.75  -41.61      --  100.04]
 [-116.73  232.38   -4.82  180.47  134.25 -189.55]
 [     --      --  180.69      --   47.61      --]
 [ 158.22      -- -262.75   64.12  -51.83 -219.66]]
```
```python
>>> i = x.bin(
...     'integral', [t_indices, s_indices],
...     weights=['X', 'Y', 'Z', 'T'], measure=True
... )
>>> print(i)
Field: long_name=integral of tendency_of_sea_water_potential_temperature_expressed_as_heat_content
--------------------------------------------------------------------------------------------------
Data            : long_name=integral of tendency_of_sea_water_potential_temperature_expressed_as_heat_content(sea_water_salinity(4), sea_water_potential_temperature(6)) 86400 m3.kg.s-2
Cell methods    : latitude: longitude: sum
Dimension coords: sea_water_salinity(4) = [6.3054151982069016, ..., 39.09366758167744] psu
                : sea_water_potential_temperature(6) = [278.1569468180338, ..., 303.18466695149743] K
>>> print(i.array)
[[ 3655558758400.0  5070927691776.0   260864491520.0 -1605439586304.0  -- 3863717609472.0]
 [-4509735059456.0  4489564127232.0  -280126521344.0 10454746267648.0  7777254113280.0 -7317268463616.0]
 [ -- -- 10470463373312.0 -- 919782031360.0 --]
 [ 3055211773952.0 -- -5073676009472.0  3715958833152.0 -2000787079168.0 -4243632160768.0]]
```
```python
>>> w = x.bin('sum_of_weights', [t_indices, s_indices], weights=['X', 'Y', 'Z', 'T'], measure=True)
>>> print(w)
Field: long_name=sum_of_weights of tendency_of_sea_water_potential_temperature_expressed_as_heat_content
--------------------------------------------------------------------------------------------------------
Data            : long_name=sum_of_weights of tendency_of_sea_water_potential_temperature_expressed_as_heat_content(sea_water_salinity(4), sea_water_potential_temperature(6)) 86400 m3.s
Cell methods    : latitude: longitude: sum
Dimension coords: sea_water_salinity(4) = [7.789749830961227, ..., 36.9842486679554] psu
                : sea_water_potential_temperature(6) = [274.50717671712243, ..., 302.0188242594401] K
>>> print(w.array)
[[19319093248.0 38601412608.0 38628990976.0 38583025664.0 -- 38619795456.0]
 [38628990976.0 19319093248.0 57957281792.0 57929699328.0 57929695232.0 38601412608.0]
 [ -- -- 57948086272.0 -- 19319093248.0 --]
 [19309897728.0 -- 19309897728.0 57948086272.0 38601412608.0 19319093248.0]]
```
Demonstrate that the integral divided by the sum of the cell measures is equal to the mean:
```python
>>> print(i/w)
Field:
-------
Data            : (sea_water_salinity(4), sea_water_potential_temperature(6)) kg.s-3
Cell methods    : latitude: longitude: sum
Dimension coords: sea_water_salinity(4) = [7.789749830961227, ..., 36.9842486679554] psu
                : sea_water_potential_temperature(6) = [274.50717671712243, ..., 302.0188242594401] K
>>> (i/w == m).all()
True
```
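The identity holds because, when the weights are cell measures, the integral over a bin is the sum of each value times its cell measure, the sum of weights is the sum of the cell measures, and the weighted mean is exactly their ratio. A minimal NumPy check with made-up numbers for a single bin (not cf-python code):

```python
import numpy as np

# Made-up values in one bin and their cell measures (the weights)
values = np.array([1.5, 3.0, -2.0])
measures = np.array([2.0, 1.0, 1.0])

integral = (values * measures).sum()  # sum of value x cell measure
sum_of_weights = measures.sum()
weighted_mean = np.average(values, weights=measures)

# integral / sum_of_weights is the weighted mean
assert np.isclose(integral / sum_of_weights, weighted_mean)
print(integral, sum_of_weights, weighted_mean)
```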