cf.histogram

cf.histogram(*digitized)

Return the distribution of a set of variables in the form of an N-dimensional histogram.

The number of dimensions of the histogram is equal to the number of field constructs provided by the digitized argument. Each such field construct defines a sequence of bins and provides the indices of the bins to which each value of one of the variables belongs. There is no upper limit to the number of dimensions of the histogram.

The output histogram bins are defined by the exterior product of the one-dimensional bins of each digitized field construct. For example, if only one digitized field construct is provided then the histogram bins simply comprise its one-dimensional bins; if there are two digitized field constructs then the histogram bins comprise the two-dimensional matrix formed by all possible combinations of the two sets of one-dimensional bins; etc.
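
As a minimal sketch of this combination of bins, the following plain-Python snippet pairs every bin of one variable with every bin of another. It does not use cf itself, and the bin bounds and variable names are hypothetical values chosen only for illustration.

```python
from itertools import product

# Hypothetical one-dimensional bin bounds for two variables
# (illustrative values only, not taken from any real field construct).
humidity_bins = [(0.0, 0.05), (0.05, 0.10), (0.10, 0.15)]
temperature_bins = [(270.0, 290.0), (290.0, 310.0)]

# Each two-dimensional histogram bin pairs one temperature bin with
# one humidity bin, so all possible combinations are enumerated.
bins_2d = list(product(temperature_bins, humidity_bins))

# The histogram has one dimension per variable.
histogram_shape = (len(temperature_bins), len(humidity_bins))
```

With three humidity bins and two temperature bins, the two-dimensional histogram has 2 x 3 = 6 bins.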

An output value for a histogram bin is formed by counting the number of cells for which the digitized field constructs, taken together, index that bin. Note that some output bins may not be indexed by any cell of the digitized field constructs; missing data is returned for these bins.
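
The counting rule can be sketched in plain Python as follows. This illustrates the idea only and is not how cf implements it: the bin indices are hypothetical, and None stands in for missing data.

```python
from collections import Counter

# Hypothetical digitized values for six cells of two variables: each
# entry is the index of the one-dimensional bin containing that cell.
humidity_index = [0, 0, 1, 2, 2, 2]
temperature_index = [0, 1, 1, 0, 0, 1]
n_humidity_bins, n_temperature_bins = 3, 2

# Count the cells that index each (temperature, humidity) bin pair.
counts = Counter(zip(temperature_index, humidity_index))

# Bins that no cell indexes are stored as None, which stands in for
# the missing data that the histogram would contain there.
histogram = [
    [counts.get((t, q)) for q in range(n_humidity_bins)]
    for t in range(n_temperature_bins)
]
```

Here the bin with temperature index 0 and humidity index 1 is never indexed, so it holds None rather than a zero count.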

The returned field construct will have a domain axis construct for each dimension of the histogram, with a corresponding dimension coordinate construct that defines the bin boundaries.

New in version 3.0.2.

Parameters
digitized: one or more Field

One or more field constructs that contain digitized data with corresponding metadata, as would be output by cf.Field.digitize. Each field construct contains the indices of the one-dimensional bins to which each value of an original field construct belongs, and must have the bin_count and bin_bounds properties defined by the cf.Field.digitize method. The extra properties defined by that method are also recommended.

The bins defined by the bin_count and bin_bounds properties are used to create a dimension coordinate construct for the output field construct.

Each digitized field construct must be transformable so that its data is broadcastable to any other digitized field construct’s data. This is done by using the metadata constructs of the field constructs to create a mapping of physically compatible dimensions between the fields, and then manipulating the dimensions of each digitized field construct’s data to ensure that broadcasting can occur.
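
A rough sketch of the idea in plain Python follows. It assumes that dimensions have already been matched by name, and the helpers align_shape and broadcastable are hypothetical illustrations, not part of cf. It reorders each field's axes to a common order and inserts size-1 dimensions for axes that a field lacks.

```python
def align_shape(axes, shape, target_axes):
    """Return the data shape once its axes follow target_axes.

    Axes that the field does not span become size-1 dimensions.
    """
    size_of = dict(zip(axes, shape))
    return tuple(size_of.get(axis, 1) for axis in target_axes)


def broadcastable(shape_a, shape_b):
    """Return True if two equal-length shapes can be broadcast together."""
    return all(a == b or 1 in (a, b) for a, b in zip(shape_a, shape_b))


target_axes = ("latitude", "longitude")

# Hypothetical digitized fields: one stored as (longitude, latitude),
# the other spanning latitude only.
shape_q = align_shape(("longitude", "latitude"), (8, 5), target_axes)
shape_t = align_shape(("latitude",), (5,), target_axes)

compatible = broadcastable(shape_q, shape_t)
```

After alignment the two shapes are (5, 8) and (5, 1), which can be broadcast against each other.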

Returns
Field

The field construct containing the histogram.

Examples

Create a one-dimensional histogram based on 10 equally-sized bins that exactly span the data range:

>>> f = cf.example_field(0)
>>> print(f)
Field: specific_humidity (ncvar%q)
----------------------------------
Data            : specific_humidity(latitude(5), longitude(8)) 1
Cell methods    : area: mean
Dimension coords: latitude(5) = [-75.0, ..., 75.0] degrees_north
                : longitude(8) = [22.5, ..., 337.5] degrees_east
                : time(1) = [2019-01-01 00:00:00]
>>> print(f.array)
[[0.007 0.034 0.003 0.014 0.018 0.037 0.024 0.029]
 [0.023 0.036 0.045 0.062 0.046 0.073 0.006 0.066]
 [0.11  0.131 0.124 0.146 0.087 0.103 0.057 0.011]
 [0.029 0.059 0.039 0.07  0.058 0.072 0.009 0.017]
 [0.006 0.036 0.019 0.035 0.018 0.037 0.034 0.013]]
>>> indices, bins = f.digitize(10, return_bins=True)
>>> print(indices)
Field: long_name=Bin index to which each 'specific_humidity' value belongs (ncvar%q)
------------------------------------------------------------------------------------
Data            : long_name=Bin index to which each 'specific_humidity' value belongs(latitude(5), longitude(8))
Cell methods    : area: mean
Dimension coords: latitude(5) = [-75.0, ..., 75.0] degrees_north
                : longitude(8) = [22.5, ..., 337.5] degrees_east
                : time(1) = [2019-01-01 00:00:00]
>>> print(bins.array)
[[0.003  0.0173]
 [0.0173 0.0316]
 [0.0316 0.0459]
 [0.0459 0.0602]
 [0.0602 0.0745]
 [0.0745 0.0888]
 [0.0888 0.1031]
 [0.1031 0.1174]
 [0.1174 0.1317]
 [0.1317 0.146 ]]
>>> h = cf.histogram(indices)
>>> print(h)
Field: number_of_observations
-----------------------------
Data            : number_of_observations(specific_humidity(10)) 1
Cell methods    : latitude: longitude: point
Dimension coords: specific_humidity(10) = [0.01015, ..., 0.13885] 1
>>> print(h.array)
[9 7 9 4 5 1 1 1 2 1]
>>> print(h.coordinate('specific_humidity').bounds.array)
[[0.003  0.0173]
 [0.0173 0.0316]
 [0.0316 0.0459]
 [0.0459 0.0602]
 [0.0602 0.0745]
 [0.0745 0.0888]
 [0.0888 0.1031]
 [0.1031 0.1174]
 [0.1174 0.1317]
 [0.1317 0.146 ]]
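
As a cross-check that does not depend on cf, the bin bounds, bin centres and counts above can be reproduced in plain Python from the printed data values. This sketch assumes that the rounded values shown by print(f.array) are representative, and that each bin includes its lower bound, with the maximum value assigned to the last bin. The exact convention used by cf.Field.digitize may differ.

```python
# Specific humidity values as printed in the example (rounded for display).
values = [
    0.007, 0.034, 0.003, 0.014, 0.018, 0.037, 0.024, 0.029,
    0.023, 0.036, 0.045, 0.062, 0.046, 0.073, 0.006, 0.066,
    0.11, 0.131, 0.124, 0.146, 0.087, 0.103, 0.057, 0.011,
    0.029, 0.059, 0.039, 0.07, 0.058, 0.072, 0.009, 0.017,
    0.006, 0.036, 0.019, 0.035, 0.018, 0.037, 0.034, 0.013,
]

n_bins = 10
lo, hi = min(values), max(values)
width = (hi - lo) / n_bins

# Bounds and centres of equal-width bins that exactly span the data range.
bounds = [(lo + i * width, lo + (i + 1) * width) for i in range(n_bins)]
centres = [(lower + upper) / 2 for lower, upper in bounds]

# Assign each value to a bin (assumed convention: lower bound inclusive,
# with the maximum value placed in the last bin) and count per bin.
counts = [0] * n_bins
for v in values:
    index = min(int((v - lo) / width), n_bins - 1)
    counts[index] += 1
```

Under these assumptions, counts matches the printed h.array, and the first and last bin centres match the 0.01015 and 0.13885 shown for the specific_humidity dimension coordinate.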

Create a two-dimensional histogram based on specific humidity and temperature bins. The temperature bins in this example are derived from a dummy temperature field construct with the same shape as the specific humidity field construct already in use:

>>> g = f.copy()
>>> g.standard_name = 'air_temperature'
>>> import numpy
>>> g[...] = numpy.random.normal(loc=290, scale=10, size=40).reshape(5, 8)
>>> g.override_units('K', inplace=True)
>>> print(g)
Field: air_temperature (ncvar%q)
--------------------------------
Data            : air_temperature(latitude(5), longitude(8)) K
Cell methods    : area: mean
Dimension coords: latitude(5) = [-75.0, ..., 75.0] degrees_north
                : longitude(8) = [22.5, ..., 337.5] degrees_east
                : time(1) = [2019-01-01 00:00:00]
>>> indices_t = g.digitize(5)
>>> h = cf.histogram(indices, indices_t)
>>> print(h)
Field: number_of_observations
-----------------------------
Data            : number_of_observations(air_temperature(5), specific_humidity(10)) 1
Cell methods    : latitude: longitude: point
Dimension coords: air_temperature(5) = [281.1054839143287, ..., 313.9741786365939] K
                : specific_humidity(10) = [0.01015, ..., 0.13885] 1
>>> print(h.array)
[[2  1  5  3  2 -- -- -- -- --]
 [1  1  2 --  1 --  1  1 -- --]
 [4  4  2  1  1  1 -- --  1  1]
 [1  1 -- --  1 -- -- --  1 --]
 [1 -- -- -- -- -- -- -- -- --]]
>>> h.sum()
<CF Data(): 40 1>
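
The two histograms in these examples can be checked against each other in plain Python, without cf. The sketch below copies the printed counts by hand and uses None for the masked bins. Summing the two-dimensional counts over the air_temperature axis, with missing bins treated as zero, should recover the one-dimensional specific humidity histogram.

```python
# Two-dimensional counts as printed above; None stands in for masked bins.
_ = None
h2d = [
    [2, 1, 5, 3, 2, _, _, _, _, _],
    [1, 1, 2, _, 1, _, 1, 1, _, _],
    [4, 4, 2, 1, 1, 1, _, _, 1, 1],
    [1, 1, _, _, 1, _, _, _, 1, _],
    [1, _, _, _, _, _, _, _, _, _],
]

# One-dimensional specific humidity counts from the first example.
h1d = [9, 7, 9, 4, 5, 1, 1, 1, 2, 1]

# Sum over the air_temperature axis, treating missing bins as zero counts.
marginal = [sum(v or 0 for v in column) for column in zip(*h2d)]
total = sum(marginal)
```

The marginal counts equal h1d, and the total of 40 matches the value returned by h.sum(), which is the number of cells in the 5 x 8 field.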