Analysis¶
Version 3.0.6 for version 1.7 of the CF conventions.
Note
This version of cf is for Python 3 only and there are incompatible differences between versions 2.x and 3.x of cf.
Scripts written for version 2.x but running under version 3.x should either work as expected, or provide informative error messages on the new API usage. However, it is advised that the outputs of older scripts be checked when running with Python 3 versions of the cf library.
For version 2.x documentation, see the older releases page.
Sample datasets¶
This tutorial uses a number of small sample datasets, all of which can
be found in the zip file cf_tutorial_files.zip
(download
,
164kB):
$ unzip q cf_tutorial_files.zip
$ ls 1
air_temperature.nc
cf_tutorial_files.zip
contiguous.nc
external.nc
file2.nc
file.nc
gathered.nc
parent.nc
precipitation_flux.nc
timeseries.nc
umfile.pp
vertical.nc
wind_components.nc
The tutorial examples assume that the Python session is being run from the directory that contains the zip file and its unpacked contents, and no other files.
The tutorial files may be also found in the downloads directory of the online code repository.
Statistical collapses¶
Collapsing one or more dimensions reduces their size and replaces the
data along those axes with representative statistical values. The
result is a new field construct with consistent metadata for the
collapsed values. Collapses are carried with the collapse
method of the field construct.
By default all axes with size greater than 1 are collapsed completely (i.e. to size 1) with a given collapse method.
>>> a = cf.read('timeseries.nc')[0]
>>> print(a)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(120), latitude(5), longitude(8)) K
Cell methods : area: mean
Dimension coords: time(120) = [19591216 12:00:00, ..., 19691116 00:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> b = a.collapse('minimum')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(1), latitude(1), longitude(1)) K
Cell methods : area: mean time(1): latitude(1): longitude(1): minimum
Dimension coords: time(1) = [19641130 12:00:00]
: latitude(1) = [0.0] degrees_north
: longitude(1) = [180.0] degrees_east
: air_pressure(1) = [850.0] hPa
>>> print(b.array)
[[[198.9]]]
In the above example, note that the operation has been recorded in a
new cell method construct (time(1): latitude(1): longitude(1):
minimum
) in the output field construct, and the dimension coordinate
constructs each now have a single cell. The air pressure time
dimension was not included in the collapse because it already had size
1 in the original field construct.
The collapse can also be applied to any subset of the field construct’s dimensions. In this case, the domain axis and coordinate constructs for the noncollapsed dimensions remain the same. This is implemented either with the axes keyword, or with a CFnetCDF cell methodslike syntax for describing both the collapse dimensions and the collapse method in a single string. The latter syntax uses construct identities instead of netCDF dimension names to identify the collapse axes.
Statistics may be created to represent variation over one dimension or a combination of dimensions.
>>> b = a.collapse('maximum', axes='T')
>>> b = a.collapse('T: maximum')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(1), latitude(5), longitude(8)) K
Cell methods : area: mean time(1): maximum
Dimension coords: time(1) = [19641130 12:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> print(b.array)
[[[310.6 309.1 309.9 311.2 310.4 310.1 310.7 309.6]
[310. 310.7 311.1 311.3 310.9 311.2 310.6 310. ]
[308.9 309.8 311.2 311.2 311.2 309.3 311.1 310.7]
[310.1 310.3 308.8 311.1 310. 311.3 311.2 309.7]
[310.9 307.9 310.3 310.4 310.8 310.9 311.3 309.3]]]
>>> b = a.collapse('maximum', axes=['X', 'Y'])
>>> b = a.collapse('X: Y: maximum')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(120), latitude(1), longitude(1)) K
Cell methods : area: mean latitude(1): longitude(1): maximum
Dimension coords: time(120) = [19591216 12:00:00, ..., 19691116 00:00:00]
: latitude(1) = [0.0] degrees_north
: longitude(1) = [180.0] degrees_east
: air_pressure(1) = [850.0] hPa
Variation over horizontal area may also be specified by the special
identity 'area'
. This may be used for any horizontal coordinate
reference system.
>>> b = a.collapse('area: maximum')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(120), latitude(1), longitude(1)) K
Cell methods : area: mean area: maximum
Dimension coords: time(120) = [19591216 12:00:00, ..., 19691116 00:00:00]
: latitude(1) = [0.0] degrees_north
: longitude(1) = [180.0] degrees_east
: air_pressure(1) = [850.0] hPa
Collapse methods¶
The following collapse methods are available, over any subset of the domain axes. The “Cell method” column in the table gives the method of the new cell method construct (if one is created).
Method  Description  Cell method 

'maximum' 
The maximum of the values.  maximum 
'minimum' 
The minimum of the values.  minimum 
'maximum_absolute_value' 
The maximum of the absolute values.  maximum_absolute_value 
'minimum_absolute_value' 
The minimum of the absolute values.  minimum_absolute_value 
'mid_range' 
The average of the maximum and the minimum of the values.  mid_range 
'range' 
The absolute difference between the maximum and the minimum of the values.  range 
'median' 
The median of the values.  median 
'sample_size' 
The sample size, \(N\), as would be used for other calculations, i.e. the number of nonmissing values.  point 
'sum_of_weights' 
The sum of \(N\) weights \(w_i\), as would be used for other calculations, is
\[V_{1}=\sum_{i=1}^{N} w_i\]

sum 
'sum_of_weights2' 
The sum of the squares of \(N\) weights \(w_i\), as would be used for other calculations, is
\[V_{2}=\sum_{i=1}^{N} w_i^{2}\]

sum 
'sum' 
The unweighted sum of \(N\) values \(x_i\) is
\[t=\sum_{i=1}^{N} x_i\]

sum 
'sum_of_squares' 
The unweighted sum of the squares of \(N\) values \(x_i\) is
\[t_2=\sum_{i=1}^{N} x_{i}^{2}\]

sum_of_squares 
'integral' 
The integral of \(N\) values \(x_i\) with corresponding cell measures \(m_i\) is
\[i=\sum_{i=1}^{N} m_i x_i\]
Note that the integral differs from a weighted sum in that the units of the cell measures are incorporated into the result. 
sum 
'mean' 
The unweighted mean of \(N\) values \(x_i\) is
\[\mu=\frac{1}{N}\sum_{i=1}^{N}
x_i\]
The weighted mean of \(N\) values \(x_i\) with corresponding weights \(w_i\) is
\[\hat{\mu}=\frac{1}{V_{1}}
\sum_{i=1}^{N}
w_i x_i\]

mean 
'mean_absolute_value' 
The unweighted mean of \(N\) values \(x_i\) absoluted is
\[\mu_{abs}=\frac{1}{N}
\sum_{i=1}^{N}x_i\]
The weighted mean of \(N\) values \(x_i\) absoluted with corresponding weights \(w_i\) is
\[\hat{\mu}_{abs}=
\frac{1}{V_{1}}
\sum_{i=1}^{N} w_i x_i\]

mean_absolute_value 
'mean_of_upper_decile' 
The weighted or unweighted mean of the upper group of data values defined by the upper tenth of their distribution  mean_of_upper_decile 
'variance' 
The unweighted variance of \(N\) values \(x_i\) and with \(Nddof\) degrees of freedom (\(ddof\ge0\)) is
\[s_{Nddof}^{2}=
\frac{1}{Nddof}
\sum_{i=1}^{N} (x_i  \mu)^2\]
The unweighted biased estimate of the variance (\(s_{N}^{2}\)) is given by \(ddof=0\) and the unweighted unbiased estimate of the variance using Bessel’s correction (\(s^{2}=s_{N1}^{2}\)) is given by \(ddof=1\). The weighted biased estimate of the variance of \(N\) values \(x_i\) with corresponding weights \(w_i\) is
\[\hat{s}_{N}^{2}=
\frac{1}{V_{1}}
\sum_{i=1}^{N}
w_i(x_i 
\hat{\mu})^{2}\]
The corresponding weighted unbiased estimate of the variance is
\[\hat{s}^{2}=\frac{1}{V_{1} 
(V_{1}/V_{2})}
\sum_{i=1}^{N}
w_i(x_i 
\hat{\mu})^{2}\]
In both cases, the weights are assumed to be nonrandom reliability weights, as opposed to frequency weights. 
variance 
'standard_deviation' 
The standard deviation is the square root of the unweighted or weighted variance, as defined in this table.  standard_deviation 
'root_mean_square' 
The unweighted root mean square of \(N\) values \(x_i\) is
\[RMS=\sqrt{\frac{1}{N}
\sum_{i=1}^{N}
x_{i}^2}\]
The weighted root mean square of \(N\) values \(x_i\) with corresponding weights \(w_i\) is
\[\hat{RMS}=\sqrt{
\frac{1}{V_{1}}
\sum_{i=1}^{N} w_i
x_{i}^2}\]

root_mean_square 
Data type and missing data¶
In all collapses, missing data array elements are accounted for in the calculation.
Any collapse method that involves a calculation (such as calculating a
mean), as opposed to just selecting a value (such as finding a
maximum), will return a field containing double precision floating
point numbers. If this is not desired then the data type can be reset
after the collapse with the dtype
attribute of the field
construct.
Collapse weights¶
For weights to be incorporated in the collapse, the axes to be weighted must be identified with the weights keyword. A collapse by a particular method is either never weighted, or may be weighted, or is always weighted, as described in the following table:
Method  Description  Weighted 

'maximum' 
The maximum of the values.  Never 
'minimum' 
The minimum of the values.  Never 
'maximum_absolute_value' 
The maximum of the absolute.  Never 
'minimum_absolute_value' 
The minimum of the absolute.  Never 
'mid_range' 
The average of the maximum and the minimum of the values.  Never 
'range' 
The absolute difference between the maximum and the minimum of the values.  Never 
'median' 
The median of the values.  Never 
'sum' 
The sum of the values.  Never 
'sum_of_squares' 
The sum of the squares of values.  Never 
'sample_size' 
The sample size, i.e. the number of nonmissing values.  Never 
'sum_of_weights' 
The sum of weights, as would be used for other calculations.  Never 
'sum_of_weights2' 
The sum of squares of weights, as would be used for other calculations.  Never 
'mean' 
The weighted or unweighted mean of the values.  May be 
'mean_absolute_value' 
The mean of the absolute values.  May be 
'mean_of_upper_decile' 
The mean of the upper group of data values defined by the upper tenth of their distribution.  May be 
'variance' 
The weighted or unweighted variance of the values, with a given number of degrees of freedom.  May be 
'standard_deviation' 
The square root of the weighted or unweighted variance.  May be 
'root_mean_square' 
The square root of the weighted or unweighted mean of the squares of the values.  May be 
'integral' 
The integral of values.  Always 
 Collapse methods that are “Never” weighted ignore the weights parameter, even if it is set.
 Collapse methods that “May be” weighted will only be weighted if the weights parameter is set.
 Collapse methods that are “Always” weighted require the weights parameter to be set.
Weights are either derived from the field construct’s metadata (such
as cell sizes), or may be provided explicitly in the form of other
field constructs containing data of weights values. In either case,
the weights actually used are those derived by the weights
method of the field construct called with the same weights keyword
value. Collapsed axes that are not identified by the weights keyword
are unweighted during the collapse operation.
>>> b = a.collapse('T: mean', weights='T')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(1), latitude(5), longitude(8)) K
Cell methods : area: mean time(1): mean
Dimension coords: time(1) = [19641130 12:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> print (b.array)
[[[254.03120723 255.89723515 253.06490556 254.17815494 255.18458801 253.3684369 253.26624692 253.63818779]
[248.92058582 253.99597591 259.67957843 257.08967972 252.57333698 252.5746236 258.90938954 253.86939502]
[255.94716671 254.77330961 254.35929373 257.91478237 251.87670408 252.72723789 257.26038872 258.19698878]
[258.08639474 254.8087873 254.9881741 250.98064604 255.3513003 256.66337257 257.86895702 259.49299206]
[263.80016425 253.35825349 257.8026006 254.3173556 252.2061867 251.74150014 255.60930742 255.06260608]]]
To inspect the weights, call the weights
method directly.
>>> w = a.weights(weights='T')
>>> print(w)
Field: long_name=weights (ncvar%air_potential_temperature)

Data : long_name=weights(time(120)) d
Dimension coords: time(120) = [19591216 12:00:00, ..., 19691116 00:00:00]
>>> print(w.array)
[31. 31. 29. 31. 30. 31. 30. 31. 31. 30. 31. 30. 31. 31. 28. 31. 30. 31.
30. 31. 31. 30. 31. 30. 31. 31. 28. 31. 30. 31. 30. 31. 31. 30. 31. 30.
31. 31. 28. 31. 30. 31. 30. 31. 31. 30. 31. 30. 31. 31. 29. 31. 30. 31.
30. 31. 31. 30. 31. 30. 31. 31. 28. 31. 30. 31. 30. 31. 31. 30. 31. 30.
31. 31. 28. 31. 30. 31. 30. 31. 31. 30. 31. 30. 31. 31. 28. 31. 30. 31.
30. 31. 31. 30. 31. 30. 31. 31. 29. 31. 30. 31. 30. 31. 31. 30. 31. 30.
31. 31. 28. 31. 30. 31. 30. 31. 31. 30. 31. 30.]
>>> b = a.collapse('T: Y: mean', weights='Y')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(1), latitude(1), longitude(8)) K
Cell methods : area: mean time(1): latitude(1): mean
Dimension coords: time(1) = [19641130 12:00:00]
: latitude(1) = [0.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> print (b.array)
[[[256.15819444 254.625 255.73666667 255.43041667 253.19444444 253.31277778 256.68236111 256.42055556]]]
Specifying weighting by horizontal cell area may also use the special
'area'
syntax.
>>> b = a.collapse('area: mean', weights='area')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(120), latitude(1), longitude(1)) K
Cell methods : area: mean area: mean
Dimension coords: time(120) = [19591216 12:00:00, ..., 19691116 00:00:00]
: latitude(1) = [0.0] degrees_north
: longitude(1) = [180.0] degrees_east
: air_pressure(1) = [850.0] hPa
See the weights
method for full details on how weights may be specified.
Multiple collapses¶
Multiple collapses normally require multiple calls to
collapse
: one on the original field construct and then one on
each interim field construct.
>>> b = a.collapse('area: mean', weights='area').collapse('T: maximum')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(1), latitude(1), longitude(1)) K
Cell methods : area: mean latitude(1): longitude(1): mean time(1): maximum
Dimension coords: time(1) = [19641130 12:00:00]
: latitude(1) = [0.0] degrees_north
: longitude(1) = [180.0] degrees_east
: air_pressure(1) = [850.0] hPa
>>> print(b.array)
[[[271.77199724]]]
If preferred, multiple collapses may be carried out in a single call
to collapse
by using the CFnetCDF cell methodslike
syntax (note that the colon (:
) is only used after the construct
identity that specifies each axis, and a space delimits the separate
collapses).
>>> b = a.collapse('area: mean T: maximum', weights='area')
>>> print(b.array)
[[[271.77199724]]]
Grouped collapses¶
A grouped collapse is one for which an axis is not collapsed completely to size 1. Instead the collapse axis is partitioned into groups and each group is collapsed to size 1. The resulting axis will generally have more than one element. For example, creating 12 annual means from a timeseries of 120 months would be a grouped collapse. The groups do not need to be created from adjacent cells, as would be the case when creating 12 multiannual monthly means from a timeseries of 120 months.
The group keyword of collapse
defines the size of the
groups. Groups can be defined in a variety of ways, including with
cf.Query
, cf.TimeDuration
(see the Time duration section)
and cf.Data
instances.
Not every element of the collapse axis needs to be in group. Elements that are not selected by the group keyword are excluded from the result.
>>> y = cf.Y(month=12)
>>> y
<CF TimeDuration: P1Y (Y1201 00:00:00)>
>>> b = a.collapse('T: maximum', group=y)
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(10), latitude(5), longitude(8)) K
Cell methods : area: mean time(10): maximum
Dimension coords: time(10) = [19600601 00:00:00, ..., 19690601 12:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> b = a.collapse('T: maximum', group=6)
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(20), latitude(5), longitude(8)) K
Cell methods : area: mean time(20): maximum
Dimension coords: time(20) = [19600301 12:00:00, ..., 19690831 12:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> b = a.collapse('T: maximum', group=cf.djf())
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(10), latitude(5), longitude(8)) K
Cell methods : area: mean time(10): maximum time(10): maximum
Dimension coords: time(10) = [19600115 12:00:00, ..., 19690115 00:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> c = cf.seasons()
>>> c
[<CF Query: month[(ge 12)  (le 2)]>
<CF Query: month(wi (3, 5))>,
<CF Query: month(wi (6, 8))>,
<CF Query: month(wi (9, 11))>]
>>> b = a.collapse('T: maximum', group=c)
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(40), latitude(5), longitude(8)) K
Cell methods : area: mean time(40): maximum time(40): maximum
Dimension coords: time(40) = [19600115 12:00:00, ..., 19691016 12:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> b = a.collapse('X: mean', group=cf.Data(180, 'degrees'))
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(120), latitude(5), longitude(2)) K
Cell methods : area: mean longitude(2): mean longitude(2): mean
Dimension coords: time(120) = [19591216 12:00:00, ..., 19691116 00:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(2) = [90.0, 270.0] degrees_east
: air_pressure(1) = [850.0] hPa
Groups can be further described with the group_span (to ignore
groups whose actual span is less than a given value) and
group_contiguous (to ignore noncontiguous groups, or any contiguous
group containing overlapping cells) keywords of collapse
.
Climatological statistics¶
Climatological statistics may be derived from corresponding portions of the annual cycle in a set of years (e.g. the average January temperatures in the climatology of 19611990, where the values are derived by averaging the 30 Januarys from the separate years); or from corresponding portions of the diurnal cycle in a set of days (e.g. the average temperatures for each hour in the day for May 1997). A diurnal climatology may also be combined with a multiannual climatology (e.g. the minimum temperature for each hour of the average day in May from a 19611990 climatology).
Calculation requires two or three collapses, depending on the quantity being created, all of which are grouped collapses. Each collapse method needs to indicate its climatological nature with one of the following qualifiers,
Method qualifier  Associated keyword 

within years 
within_years 
within days 
within_days 
over years 
over_years (optional) 
over days 
over_days (optional) 
and the associated keyword to collapse
specifies how the
method is to be applied.
>>> b = a.collapse('T: mean within years T: mean over years',
... within_years=cf.seasons(), weights='T')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(4), latitude(5), longitude(8)) K
Cell methods : area: mean time(4): mean within years time(4): mean over years
Dimension coords: time(4) = [19600115 12:00:00, ..., 19601016 12:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> print(b.coordinate('T').bounds.datetime_array)
[[cftime.DatetimeGregorian(19591201 00:00:00) cftime.DatetimeGregorian(19690301 00:00:00)]
[cftime.DatetimeGregorian(19600301 00:00:00) cftime.DatetimeGregorian(19690601 00:00:00)]
[cftime.DatetimeGregorian(19600601 00:00:00) cftime.DatetimeGregorian(19690901 00:00:00)]
[cftime.DatetimeGregorian(19600901 00:00:00) cftime.DatetimeGregorian(19691201 00:00:00)]]
>>> b = a.collapse('T: minimum within years T: variance over years',
... within_years=cf.seasons(), weights='T')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(4), latitude(5), longitude(8)) K2
Cell methods : area: mean time(4): minimum within years time(4): variance over years
Dimension coords: time(4) = [19600115 12:00:00, ..., 19601016 12:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> print(b.coordinate('T').bounds.datetime_array)
[[cftime.DatetimeGregorian(19591201 00:00:00) cftime.DatetimeGregorian(19690301 00:00:00)]
[cftime.DatetimeGregorian(19600301 00:00:00) cftime.DatetimeGregorian(19690601 00:00:00)]
[cftime.DatetimeGregorian(19600601 00:00:00) cftime.DatetimeGregorian(19690901 00:00:00)]
[cftime.DatetimeGregorian(19600901 00:00:00) cftime.DatetimeGregorian(19691201 00:00:00)]]
When collapsing over years, it is assumed by default that the each
portion of the annual cycle is collapsed over all years that are
present. This is the case in the above two examples. It is possible,
however, to restrict the years to be included, or group them into
chunks, with the over_years keyword to collapse
.
>>> b = a.collapse('T: mean within years T: mean over years', weights='T',
... within_years=cf.seasons(), over_years=cf.Y(5))
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(8), latitude(5), longitude(8)) K
Cell methods : area: mean time(8): mean within years time(8): mean over years
Dimension coords: time(8) = [19600115 12:00:00, ..., 19651016 12:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> print(b.coordinate('T').bounds.datetime_array)
[[cftime.DatetimeGregorian(19591201 00:00:00) cftime.DatetimeGregorian(19640301 00:00:00)]
[cftime.DatetimeGregorian(19600301 00:00:00) cftime.DatetimeGregorian(19640601 00:00:00)]
[cftime.DatetimeGregorian(19600601 00:00:00) cftime.DatetimeGregorian(19640901 00:00:00)]
[cftime.DatetimeGregorian(19600901 00:00:00) cftime.DatetimeGregorian(19641201 00:00:00)]
[cftime.DatetimeGregorian(19641201 00:00:00) cftime.DatetimeGregorian(19690301 00:00:00)]
[cftime.DatetimeGregorian(19650301 00:00:00) cftime.DatetimeGregorian(19690601 00:00:00)]
[cftime.DatetimeGregorian(19650601 00:00:00) cftime.DatetimeGregorian(19690901 00:00:00)]
[cftime.DatetimeGregorian(19650901 00:00:00) cftime.DatetimeGregorian(19691201 00:00:00)]]
>>> b = a.collapse('T: mean within years T: mean over years', weights='T',
... within_years=cf.seasons(), over_years=cf.year(cf.wi(1963, 1968)))
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(4), latitude(5), longitude(8)) K
Cell methods : area: mean time(4): mean within years time(4): mean over years
Dimension coords: time(4) = [19630115 00:00:00, ..., 19631016 12:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> print(b.coordinate('T').bounds.datetime_array)
[[cftime.DatetimeGregorian(19621201 00:00:00) cftime.DatetimeGregorian(19680301 00:00:00)]
[cftime.DatetimeGregorian(19630301 00:00:00) cftime.DatetimeGregorian(19680601 00:00:00)]
[cftime.DatetimeGregorian(19630601 00:00:00) cftime.DatetimeGregorian(19680901 00:00:00)]
[cftime.DatetimeGregorian(19630901 00:00:00) cftime.DatetimeGregorian(19681201 00:00:00)]]
Similarly for collapses over days, it is assumed by default that the
each portion of the diurnal cycle is collapsed over all days that are
present, But it is possible to restrict the days to be included, or
group them into chunks, with the over_days keyword to
collapse
.
The calculation can be done with multiple collapse calls, which can be useful if the interim stages are needed independently, but be aware that the interim field constructs will have nonCFcompliant cell method constructs.
>>> b = a.collapse('T: standard_deviation within years',
... within_years=cf.seasons(), weights='T')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(40), latitude(5), longitude(8)) K
Cell methods : area: mean time(40): standard_deviation within years
Dimension coords: time(40) = [19600115 12:00:00, ..., 19691016 12:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> c = b.collapse('T: maximum over years')
>>> print(c)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(4), latitude(5), longitude(8)) K
Cell methods : area: mean time(4): standard_deviation within years time(4): maximum over years
Dimension coords: time(4) = [19600115 12:00:00, ..., 19601016 12:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
Other statistical operations¶
Cumulative sums¶
The cumsum
method of the field construct calculates the
cumulative sum of elements along a given axis. The cell bounds of the
axis are updated to describe the ranges over which the sums apply, and
a new sum
cell method construct is added to the resulting field
construct.
>>> a = cf.read('timeseries.nc')[0]
>>> print(a)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(120), latitude(5), longitude(8)) K
Cell methods : area: mean
Dimension coords: time(120) = [19591216 12:00:00, ..., 19691116 00:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> b = a.cumsum('T')
>>> print(b)
Field: air_potential_temperature (ncvar%air_potential_temperature)

Data : air_potential_temperature(time(120), latitude(5), longitude(8)) K
Cell methods : area: mean time(120): sum
Dimension coords: time(120) = [19591216 12:00:00, ..., 19691116 00:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
>>> print(a.coordinate('T').bounds[1].dtarray)
[[cftime.DatetimeGregorian(19691101 00:00:00)
cftime.DatetimeGregorian(19691201 00:00:00))]]
>>> print(b.coordinate('T').bounds[1].dtarray)
[[cftime.DatetimeGregorian(19591101 00:00:00)
cftime.DatetimeGregorian(19691201 00:00:00))]]
The treatment of missing values can be specified, as well as the positioning of coordinate values in the summed axis of the returned field construct.
Histograms¶
The cf.histogram
function is used to record the distribution of a
set of variables in the form of an Ndimensional histogram.
Each dimension of the histogram is defined by a field construct
returned by the digitize
method of a field construct. This
“digitized” field construct defines a sequence of bins and provides
indices to the bins that each value of one of the variables belongs.
>>> q, t = cf.read('file.nc')
Field: specific_humidity (ncvar%q)

Data : specific_humidity(latitude(5), longitude(8)) 1
Cell methods : area: mean
Dimension coords: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: time(1) = [20190101 00:00:00]
>>> print(q.array)
[[0.007 0.034 0.003 0.014 0.018 0.037 0.024 0.029]
[0.023 0.036 0.045 0.062 0.046 0.073 0.006 0.066]
[0.11 0.131 0.124 0.146 0.087 0.103 0.057 0.011]
[0.029 0.059 0.039 0.07 0.058 0.072 0.009 0.017]
[0.006 0.036 0.019 0.035 0.018 0.037 0.034 0.013]]
>>> indices, bins = q.digitize(10, return_bins=True)
>>> print(indices)
Field: long_name=Bin index to which each 'specific_humidity' value belongs (ncvar%q)

Data : long_name=Bin index to which each 'specific_humidity' value belongs(latitude(5), longitude(8))
Cell methods : area: mean
Dimension coords: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_eastg
: time(1) = [20190101 00:00:00]
>>> print(indices.array)
[[0 2 0 0 1 2 1 1]
[1 2 2 4 3 4 0 4]
[7 8 8 9 5 6 3 0]
[1 3 2 4 3 4 0 0]
[0 2 1 2 1 2 2 0]]
>>> print(bins.array)
[[0.003 0.0173]
[0.0173 0.0316]
[0.0316 0.0459]
[0.0459 0.0602]
[0.0602 0.0745]
[0.0745 0.0888]
[0.0888 0.1031]
[0.1031 0.1174]
[0.1174 0.1317]
[0.1317 0.146 ]]
>>> h = cf.histogram(indices)
>>> print(h)
Field: number_of_observations

Data : number_of_observations(specific_humidity(10)) 1
Cell methods : latitude: longitude: point
Dimension coords: specific_humidity(10) = [10.15, ..., 138.85000000000002] 1
>>> print(h.array)
[9 7 9 4 5 1 1 1 2 1]
>>> print(h.coordinate('specific_humidity').bounds.array)
[[0.003 0.0173]
[0.0173 0.0316]
[0.0316 0.0459]
[0.0459 0.0602]
[0.0602 0.0745]
[0.0745 0.0888]
[0.0888 0.1031]
[0.1031 0.1174]
[0.1174 0.1317]
[0.1317 0.146 ]]
Binning operations¶
The bin
method of the field construct groups its data into
bins, where each group is defined by the elements that correspond to
an Ndimensional histogram bin of another set of variables, and collapses the elements in each group to a single
representative value. The same collapse methods and weighting options as
the collapse
method are available.
The result of the binning operation is a field construct whose domain axis and dimension coordinate constructs describe the sizes of the Ndimensional bins of the other set of variables.
>>> q, t = cf.read('file.nc')
Field: specific_humidity (ncvar%q)

Data : specific_humidity(latitude(5), longitude(8)) 0.001 1
Cell methods : area: mean
Dimension coords: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: time(1) = [20190101 00:00:00]
>>> print(q.array)
[[0.007 0.034 0.003 0.014 0.018 0.037 0.024 0.029]
[0.023 0.036 0.045 0.062 0.046 0.073 0.006 0.066]
[0.11 0.131 0.124 0.146 0.087 0.103 0.057 0.011]
[0.029 0.059 0.039 0.07 0.058 0.072 0.009 0.017]
[0.006 0.036 0.019 0.035 0.018 0.037 0.034 0.013]]
>>> indices = q.digitize(5)
>>> b = q.bin('range', digitized=indices)
>>> print(b)
Field: specific_humidity

Data : specific_humidity(specific_humidity(5)) 1
Cell methods : latitude: longitude: range
Dimension coords: specific_humidity(5) = [0.0173, ..., 0.1317] 1
>>> print(b.array)
[0.026 0.025 0.025 0.007 0.022]
>>> print(b.coordinate('specific_humidity').bounds.array)
[[0.003 0.0316]
[0.0316 0.0602]
[0.0602 0.0888]
[0.0888 0.1174]
[0.1174 0.146 ]]
>>> p, t = cf.read('file2.nc')
>>> print(t)
Field: air_temperature (ncvar%t)

Data : air_temperature(latitude(5), longitude(8)) degreesC
Cell methods : area: mean
Dimension coords: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: time(1) = [20190101 00:00:00]
>>> print(p)
Field: air_pressure (ncvar%p)

Data : air_pressure(latitude(5), longitude(8)) hPa
Cell methods : area: mean
Dimension coords: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: time(1) = [20190101 00:00:00]
>>> t_indices = t.digitize(4)
>>> p_indices = p.digitize(6)
>>> b = q.bin('mean', digitized=[t_indices, p_indices], weights='area')
>>> print(b)
Field: specific_humidity

Data : specific_humidity(air_pressure(6), air_temperature(4)) 1
Cell methods : latitude: longitude: mean
Dimension coords: air_pressure(6) = [966.6225003326126, ..., 1033.6456080043665] hPa
: air_temperature(4) = [12.735821567738295, ..., 9.9702610462581] degreesC
>>> print(b.array)
[[    0.011 ]
[0.131 0.0145 0.0345 0.05052]
[0.05742 0.01727 0.06392 0.0105 ]
[  0.04516 0.05272 0.10194]
[0.124 0.024 0.059 0.006 ]
[  0.08971  ]]
Percentiles¶
Percentiles of the data can be computed along any subset of the axes
with the percentile
method of the field construct.
>>> q, t = cf.read('file.nc')
>>> print(q)
Field: specific_humidity

Data : specific_humidity(latitude(5), longitude(8)) 1
Cell methods : area: mean
Dimension coords: time(1) = [20190101 00:00:00]
: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
>>> print(q.array)
[[0.007 0.034 0.003 0.014 0.018 0.037 0.024 0.029]
[0.023 0.036 0.045 0.062 0.046 0.073 0.006 0.066]
[0.11 0.131 0.124 0.146 0.087 0.103 0.057 0.011]
[0.029 0.059 0.039 0.07 0.058 0.072 0.009 0.017]
[0.006 0.036 0.019 0.035 0.018 0.037 0.034 0.013]]
>>> p = q.percentile([20, 40, 50, 60, 80])
>>> print(p)
Field: specific_humidity

Data : specific_humidity(long_name=Percentile ranks for latitude, longitude dimensions(5), latitude(1), longitude(1)) 1
Dimension coords: time(1) = [20190101 00:00:00]
: latitude(1) = [0.0] degrees_north
: longitude(1) = [180.0] degrees_east
: long_name=Percentile ranks for latitude, longitude dimensions(5) = [20, ..., 80]
>>> print(p.array)
[[[0.0164]]
[[0.032 ]]
[[0.036 ]]
[[0.0414]]
[[0.0704]]]
>>> p80 = q.percentile(80)
>>> print(p80)
Field: specific_humidity

Data : specific_humidity(latitude(1), longitude(1)) 1
Dimension coords: time(1) = [20190101 00:00:00]
: latitude(1) = [0.0] degrees_north
: longitude(1) = [180.0] degrees_east
: long_name=Percentile ranks for latitude, longitude dimensions(1) = [80]
>>> g = q.where(q<=p80, cf.masked)
>>> print(g.array)
[[        ]
[      0.073  ]
[0.11 0.131 0.124 0.146 0.087 0.103  ]
[      0.072  ]
[        ]]
>>> g.collapse('standard_deviation', weights='area').data
<CF Data(1, 1): [[0.024609938742357642]] 1>
>>> p45 = q.percentile(45, axes='X')
>>> print(p45.array)
[[0.0189 ]
[0.04515]
[0.10405]
[0.04185]
[0.02125]]
>>> g = q.where(q<=p45, cf.masked)
>>> print(g.array)
[[  0.034    0.037 0.024 0.029]
[    0.062 0.046 0.073  0.066]
[0.11 0.131 0.124 0.146    ]
[  0.059  0.07 0.058 0.072  ]
[  0.036  0.035  0.037 0.034 ]]
>>> print(g.collapse('X: mean', weights='X').array)
[[0.031 ]
[0.06175]
[0.12775]
[0.06475]
[0.0355 ]]
>>> bins = q.percentile([0, 10, 50, 90, 100], squeeze=True)
>>> print(bins.array)
[0.003 0.0088 0.036 0.1037 0.146 ]
>>> i = q.digitize(bins, closed_ends=True)
>>> print(i.array)
[[0 1 0 1 1 2 1 1]
[1 2 2 2 2 2 0 2]
[3 3 3 3 2 2 2 1]
[1 2 2 2 2 2 1 1]
[0 2 1 1 1 2 1 1]]
Regridding¶
Regridding, also called remapping or interpolation, is the process of changing the domain of a field construct whilst preserving the qualities of the original data.
The field construct has two regridding methods: regrids
for
regridding data between domains with spherical coordinate systems; and
regridc
for regridding data between domains with Cartesian
coordinate systems. The interpolation is carried by out using the
ESMF package, a
Python interface to the Earth System Modeling Framework regridding
utility.
As with statistical collapses, regridding may be applied over a subset of the domain axes, and the domain axis constructs and coordinate constructs for the nonregridded dimensions remain the same.
Domain ancillary constructs whose data spans the regridding dimensions are also regridded, but field ancillary constructs whose data spans the regridding dimensions are removed from the regridded field construct.
The following regridding methods are available (in this table, “source” and “destination” refer to the domain of the field construct being regridded, and the domain that it is being regridded to, respectively):
Method  Notes 

Linear  One dimensional linear interpolation (only available to Cartesian regridding). 
Bilinear  Two dimensional variant of linear interpolation. 
Trilinear  Three dimensional variant of linear interpolation (only available to Cartesianregridding). 
First order conservative  Preserve the integral of the data across the interpolation from source to destination. It uses the proportion of the area of the overlapping source and destination cells to determine appropriate weights. In particular, the weight of a source cell is the ratio of the area of intersection of the source and destination cells to the area of the whole destination cell. 
Patch  A second degree polynomial regridding method, which uses a least squares algorithm to calculate the polynomial. This method gives better derivatives in the resulting destination data than the bilinear method. 
Nearest neighbour  Nearest neighbour interpolation that is useful for extrapolation of categorical data. Either each destination point is mapped to the closest source; or each source point is mapped to the closest destination point. In the latter case, a given destination point may receive input from multiple source points, but no source point will map to more than one destination point. 
Spherical regridding¶
Regridding from and to spherical coordinate systems using the
regrids
method is only available for the ‘X’ and ‘Y’ axes
simultaneously. All other axes are unchanged. The calculation of the
regridding weights is based on areas and distances on the surface of
the sphere, rather in Euclidean space.
The following combinations of spherical source and destination domain
coordinate systems are available to the regrids
method:
The most convenient usage is for the destination domain to be exist in another field construct. In this case, the regridding command is very simple:
>>> a = cf.read('air_temperature.nc')[0]
>>> b = cf.read('precipitation_flux.nc')[0]
>>> print(a)
Field: air_temperature (ncvar%tas)

Data : air_temperature(time(2), latitude(73), longitude(96)) K
Cell methods : time(2): mean
Dimension coords: time(2) = [18600116 00:00:00, 18600216 00:00:00] 360_day
: latitude(73) = [90.0, ..., 90.0] degrees_north
: longitude(96) = [0.0, ..., 356.25] degrees_east
: height(1) = [2.0] m
>>> print(b)
Field: precipitation_flux (ncvar%tas)

Data : precipitation_flux(time(1), latitude(64), longitude(128)) kg m2 day1
Cell methods : time(1): mean (interval: 1.0 month)
Dimension coords: time(1) = [04501116 00:00:00] noleap
: latitude(64) = [87.86380004882812, ..., 87.86380004882812] degrees_north
: longitude(128) = [0.0, ..., 357.1875] degrees_east
: height(1) = [2.0] m
>>> c = a.regrids(b, 'conservative')
>>> print(c)
Field: air_temperature (ncvar%tas)

Data : air_temperature(time(2), latitude(64), longitude(128)) K
Cell methods : time(2): mean
Dimension coords: time(2) = [18600116 00:00:00, 18600216 00:00:00] 360_day
: latitude(64) = [87.86380004882812, ..., 87.86380004882812] degrees_north
: longitude(128) = [0.0, ..., 357.1875] degrees_east
: height(1) = [2.0] m
It is generally not necessary to specify which are the ‘X’ and ‘Y’
axes in the domains of both the source and destination field
constructs, since they will be automatically identified by their
metadata. However, in cases when this is not possible (such as for
tripolar domains) the src_axes or dst_axes keywords of the
regrids
method can be used.
It may be that the required destination domain does not exist in a field construct. In this case, the latitude and longitudes of the destination domain may be defined solely by dimension or auxiliary coordinate constructs.
>>> import numpy
>>> lat = cf.DimensionCoordinate(data=cf.Data(numpy.arange(90, 92.5, 2.5), 'degrees_north'))
>>> lon = cf.DimensionCoordinate(data=cf.Data(numpy.arange(0, 360, 5.0), 'degrees_east'))
>>> c = a.regrids({'latitude': lat, 'longitude': lon}, 'bilinear')
Field: air_temperature (ncvar%tas)

Data : air_temperature(time(2), latitude(73), longitude(72)) K
Cell methods : time(2): mean
Dimension coords: time(2) = [18600116 00:00:00, 18600216 00:00:00] 360_day
: latitude(73) = [90.0, ..., 90.0] degrees_north
: longitude(72) = [0.0, ..., 355.0] degrees_east
: height(1) = [2.0] m
A destination domain defined by two dimensional (curvilinear) latitude and longitude auxiliary coordinate constructs can also be specified in a similar manner.
An axis is cyclic if cells at both of its ends are actually geographically adjacent. In spherical regridding, only the ‘X’ axis has the potential for being cyclic. For example, a longitude cell spanning 359 to 360 degrees east is proximate to the cell spanning 0 to 1 degrees east.
When a cyclic dimension can not be automatically detected, such as
when its dimension coordinate construct does not have bounds,
cyclicity may be set with the src_cyclic or dst_cyclic keywords of
the regrids
method.
To find out whether a dimension is cyclic use the iscyclic
method of the field construct, or to manually set its cyclicity use
the cyclic
method. If the destination domain has been defined by a
dictionary of dimension coordinate constructs, then cyclicity can be
registered by setting a period of cyclicity with the
period
method of the dimension coordinate
construct.
Cartesian regridding¶
Cartesian regridding with the regridc
method is very
similar to spherical regridding, except
regridding dimensions are not restricted to the horizontal plane, the
source and destination domains are assumed to be Euclidian spaces for the purpose of
calculating regridding weights, and all dimensions are assumed to be
noncyclic by default.
Cartesian regridding can be done in up to three dimensions. It is often used for regridding along the time dimension. A plane projection coordinate system can be regridded with Cartesian regridding, which will produce similar results to using using spherical regridding.
>>> time = cf.DimensionCoordinate()
>>> time.standard_name='time'
>>> time.set_data(cf.Data(numpy.arange(0.5, 60, 1),
... units='days since 18600101', calendar='360_day'))
>>> time
<CF DimensionCoordinate: time(60) days since 18600101 360_day>
>>> c = a.regridc({'T': time}, axes='T', method='bilinear')
Field: air_temperature (ncvar%tas)

Data : air_temperature(time(60), latitude(73), longitude(96)) K
Cell methods : time(60): mean
Dimension coords: time(60) = [18600101 12:00:00, ..., 18600230 12:00:00] 360_day
: latitude(73) = [90.0, ..., 90.0] degrees_north
: longitude(96) = [0.0, ..., 356.25] degrees_east
: height(1) = [2.0] m
>>> c = a.regridc({'T': time}, axes='T', method='conservative') # Raises Exception
ValueError: Destination coordinates must have contiguous, nonoverlapping bounds for conservative regridding.
>>> bounds = time.create_bounds()
>>> time.set_bounds(bounds)
>>> c = a.regridc({'T': time}, axes='T', method='conservative')
>>> print(c)
Field: air_temperature (ncvar%tas)

Data : air_temperature(time(60), latitude(73), longitude(96)) K
Cell methods : time(60): mean
Dimension coords: time(60) = [18600101 12:00:00, ..., 18600230 12:00:00] 360_day
: latitude(73) = [90.0, ..., 90.0] degrees_north
: longitude(96) = [0.0, ..., 356.25] degrees_east
: height(1) = [2.0] m
Cartesian regridding to the dimension of another field construct is also possible, similarly to spherical regridding.
Regridding masked data¶
The data mask of the source field construct is taken into account,
such that the regridded data will be masked in regions where the
source data is masked. By default the mask of the destination field
construct is not used, but can be taken into account by setting
use_dst_mask keyword to the regrids
or regridc
methods. For example, this is useful when part of the destination
domain is not being used (such as the land portion of an ocean grid).
For conservative regridding, masking is done on cells. Masking a destination cell means that the cell won’t participate in the regridding. For all other regridding methods, masking is done on points. For these methods, masking a destination point means that the point will not participate in the regridding.
Vertical regridding¶
The only option for regridding along a vertical axis is to use Cartesian regridding. However, care must be taken to ensure that the vertical axis is transformed so that it’s coordinate values are vary linearly. For example, to regrid data on one set of vertical pressure coordinates to another set, the pressure coordinates may first be transformed into the logarithm of pressure, and then changed back to pressure coordinates after the regridding operation.
>>> v = cf.read('vertical.nc')[0]
>>> print(v)
Field: eastward_wind (ncvar%ua)

Data : eastward_wind(time(3), air_pressure(5), grid_latitude(11), grid_longitude(10)) m s1
Cell methods : time(3): mean
Dimension coords: time(3) = [19790501 12:00:00, 19790502 12:00:00, 19790503 12:00:00] gregorian
: air_pressure(5) = [850.0, ..., 50.0] hPa
: grid_latitude(11) = [23.32, ..., 18.92] degrees
: grid_longitude(10) = [20.54, ..., 16.58] degrees
Auxiliary coords: latitude(grid_latitude(11), grid_longitude(10)) = [[67.12, ..., 66.07]] degrees_north
: longitude(grid_latitude(11), grid_longitude(10)) = [[45.98, ..., 31.73]] degrees_east
Coord references: grid_mapping_name:rotated_latitude_longitude
>>> z_p = v.construct('Z')
>>> print(z_p.array)
[850. 700. 500. 250. 50.]
>>> z_ln_p = z_p.log()
>>> print(z_ln_p.array)
[6.74523635 6.55108034 6.2146081 5.52146092 3.91202301]
>>> _ = v.replace_construct('Z', z_ln_p)
>>> new_z_p = cf.DimensionCoordinate(data=cf.Data([800, 705, 632, 510, 320.], 'hPa'))
>>> new_z_ln_p = new_z_p.log()
>>> new_v = v.regridc({'Z': new_z_ln_p}, axes='Z', method='bilinear')
>>> new_v.replace_construct('Z', new_z_p)
>>> print(new_v)
Field: eastward_wind (ncvar%ua)

Data : eastward_wind(time(3), Z(5), grid_latitude(11), grid_longitude(10)) m s1
Cell methods : time(3): mean
Dimension coords: time(3) = [19790501 12:00:00, 19790502 12:00:00, 19790503 12:00:00] gregorian
: Z(5) = [800.0, ..., 320.0] hPa
: grid_latitude(11) = [23.32, ..., 18.92] degrees
: grid_longitude(10) = [20.54, ..., 16.58] degrees
Auxiliary coords: latitude(grid_latitude(11), grid_longitude(10)) = [[67.12, ..., 66.07]] degrees_north
: longitude(grid_latitude(11), grid_longitude(10)) = [[45.98, ..., 31.73]] degrees_east
Coord references: grid_mapping_name:rotated_latitude_longitude
Note that the replace_construct
method of the field construct
is used to easily replace the vertical dimension coordinate construct,
without having to manually match up the corresponding domain axis
construct and construct key.
Mathematical operations¶
Arithmetical operations¶
A field construct may be arithmetically combined with another field construct, or any other object that is broadcastable to its data. See the comprehensive list of available binary operations.
When combining with another field construct, its data is actually combined, but only after being transformed so that it is broadcastable to the first field construct’s data. This is done by using the metadata constructs of the two field constructs to create a mapping of physically compatible dimensions between the fields, and then manipulating the dimensions of the other field construct’s data to ensure that they are broadcastable.
In any case, a field construct may appear as the left or right operand, and augmented assignments are possible.
Automatic units conversions are also carried out between operands during operations, and if one operand has no units then the units of the other are assumed.
>>> q, t = cf.read('file.nc')
>>> t.data.stats()
{'min': <CF Data(): 260.0 K>,
'mean': <CF Data(): 269.9244444444445 K>,
'max': <CF Data(): 280.0 K>,
'range': <CF Data(): 20.0 K>,
'mid_range': <CF Data(): 270.0 K>,
'standard_deviation': <CF Data(): 5.942452002538104 K>,
'sample_size': 90}
>>> x = t + t
>>> x
<CF Field: air_temperature(atmosphere_hybrid_height_coordinate(1), grid_latitude(10), grid_longitude(9)) K>
>>> x.min()
<CF Data(): 520.0 K>
>>> (t  2).min()
<CF Data(): 258.0 K>
>>> (2 + t).min()
<CF Data(): 262.0 K>
>>> (t * list(range(9))).min()
<CF Data(): 0.0 K>
>>> (t + cf.Data(numpy.arange(20, 29), '0.1 K')).min()
<CF Data(): 262.6 K>
>>> u = t.copy()
>>> u.transpose(inplace=True)
>>> u.Units = 273.15
>>> u[0]
<CF Field: air_temperature(grid_longitude(1), grid_latitude(10), atmosphere_hybrid_height_coordinate(1)) K @ 273.15>
>>> t + u[0]
<CF Field: air_temperature(atmosphere_hybrid_height_coordinate(1), grid_latitude(10), grid_longitude(9)) K>
If the physical nature of the result differs from both operands, then the “standard_name” and “long_name” properties are removed. This is the case if the units of the result differ from bother operands, or if they have different standard names.
>>> t.identities()
['air_temperature',
'Conventions=CF1.7',
'project=research',
'units=K',
'standard_name=air_temperature',
'ncvar%ta']
>>> u = t * cf.Data(10, 'ms1')
>>> u.identities()
['Conventions=CF1.7',
'project=research',
'units=1000 s1.K',
'ncvar%ta']
Note
Care must be taken when combining a field construct with a
numpy
array or a Data
instance, due to the ways in which
both of these objects allow themselves to be combined with
other types:
 If the field construct is on the left hand side (LHS) of
the operation then, as expected, a field construct is
returned whose data is the combination of the original
field construct’s data and the
numpy
array orData
instance on the right hand side (RHS).  If, however, the field construct is on the RHS then a
numpy
array orData
instance (which ever type is on the LHS) is returned, containing the same data as in the first case.
>>> import numpy
>>> q, t = cf.read('fil.nc')
>>> a = numpy.array(1000)
>>> type(t * a)
cf.field.Field
>>> type(a + t)
numpy.ndarray
>>> b = numpy.random.randn(t.size).reshape(t.shape)
>>> type(t * b)
cf.field.Field
>>> type(b * t)
numpy.ndarray
>>> type(t  cf.Data(b))
cf.field.Field
>>> type(cf.Data(b) * t)
cf.data.data.Data
Unary operations¶
Python unary operators also work on the field construct’s data, returning a new field construct with modified data values. See the comprehensive list of available unary operations.
>>> q, t = cf.read('file.nc')
>>> print(q.array)
[[0.007 0.034 0.003 0.014 0.018 0.037 0.024 0.029]
[0.023 0.036 0.045 0.062 0.046 0.073 0.006 0.066]
[0.11 0.131 0.124 0.146 0.087 0.103 0.057 0.011]
[0.029 0.059 0.039 0.07 0.058 0.072 0.009 0.017]
[0.006 0.036 0.019 0.035 0.018 0.037 0.034 0.013]]
>>> print(q.array)
[[0.007 0.034 0.003 0.014 0.018 0.037 0.024 0.029]
[0.023 0.036 0.045 0.062 0.046 0.073 0.006 0.066]
[0.11 0.131 0.124 0.146 0.087 0.103 0.057 0.011]
[0.029 0.059 0.039 0.07 0.058 0.072 0.009 0.017]
[0.006 0.036 0.019 0.035 0.018 0.037 0.034 0.013]]
>>> print(abs(q).array)
[[0.007 0.034 0.003 0.014 0.018 0.037 0.024 0.029]
[0.023 0.036 0.045 0.062 0.046 0.073 0.006 0.066]
[0.11 0.131 0.124 0.146 0.087 0.103 0.057 0.011]
[0.029 0.059 0.039 0.07 0.058 0.072 0.009 0.017]
[0.006 0.036 0.019 0.035 0.018 0.037 0.034 0.013]]
Relational operations¶
A field construct may compared with another field construct, or any other object that is broadcastable to its data. See the comprehensive list of available relational operations. The result is a field construct with a boolean data values.
When comparing with another field construct, its data is actually combined, but only after being transformed so that it is broadcastable to the first field construct’s data. This is done by using the metadata constructs of the two field constructs to create a mapping of physically compatible dimensions between the fields, and then manipulating the dimensions of the other field construct’s data to ensure that they are broadcastable.
In any case, a field construct may appear as the left or right operand.
Automatic units conversions are also carried out between operands during operations, and if one operand has no units then the units of the other are assumed.
>>> q, t = cf.read('file.nc')
>>> print(q.array)
[[0.007 0.034 0.003 0.014 0.018 0.037 0.024 0.029]
[0.023 0.036 0.045 0.062 0.046 0.073 0.006 0.066]
[0.11 0.131 0.124 0.146 0.087 0.103 0.057 0.011]
[0.029 0.059 0.039 0.07 0.058 0.072 0.009 0.017]
[0.006 0.036 0.019 0.035 0.018 0.037 0.034 0.013]]
>>> print((q == q).array)
[[ True True True True True True True True]
[ True True True True True True True True]
[ True True True True True True True True]
[ True True True True True True True True]
[ True True True True True True True True]]
>>> print((q < 0.05).array)
[[ True True True True True True True True]
[ True True True False True False True False]
[False False False False False False False True]
[ True False True False False False True True]
[ True True True True True True True True]]
>>> print((q >= q[0]).array)
[[ True True True True True True True True]
[ True True True True True True False True]
[ True True True True True True True False]
[ True True True True True True False False]
[False True True True True True True False]]
The “standard_name” and “long_name” properties are removed from the result, which also has no units.
>>> q.identities()
['specific_humidity',
'Conventions=CF1.7',
'project=research',
'units=1',
'standard_name=specific_humidity',
'ncvar%q']
>>> r = q > q.mean()
>>> r.identities()
['Conventions=CF1.7',
'project=research',
'units=',
'ncvar%q']
Arithmetical and relational operations with insufficient metadata¶
If both operands of an arithmetical or relational operation are field constructs with insufficient metadata to create a mapping of physically compatible dimensions, there are various techniques that allow the operation to proceed.
 Option 1: The operation may applied to the field constructs’ data instead. See below for more details.
 Option 2: If the mapping is not possible due to the absence of
standard_name
properties (orid
attributes) on metadata constructs that are known to correspond, then setting “relaxed identities” with thecf.RELAXED_IDENTITIES
function may help.  Option 3: Add more metadata to the field and metadata constructs.
For Option 1 the resulting data may then be inserted into a copy
of one of the field constructs, either with the set_data
method of the field construct, or with indexed assignment. The former technique is faster and more memory
efficient, but the latter technique allows broadcasting.
Note that it is assumed, and not checked, that the dimensions of both
Data
instance operands are already in the correct order for
physically meaningful broadcasting to occur.
>>> t.min()
<CF Data(): 260.0 K>
>>> u = t.copy()
>>> new_data = t.data + t.data
>>> u.set_data(new_data)
>>> u
<CF Field: air_temperature(atmosphere_hybrid_height_coordinate(1), grid_latitude(10), grid_longitude(9)) K>
>>> u.min()
<CF Data(): 520.0 K>
>>> u[...] = new_data
>>> u.min()
<CF Data(): 520.0 K>
For augmented assignments, the field construct data may be changed inplace.
>>> t.data = t.data
>>> t.min()
<CF Data(): 0.0 K>
Trigonometrical functions¶
The field construct and metadata constructs have cos
,
sin
and tan
methods for applying trigonometrical
functions elementwise to the data, preserving the metadata but
changing the construct’s units.
>>> q, t = cf.read('file.nc')
>>> lat = q.dimension_coordinate('latitude')
>>> lat.data
<CF Data(5): [75.0, ..., 75.0] degrees_north>
>>> sin_lat = lat.sin()
>>> sin_lat.data
<CF Data(5): [0.9659258262890683, ..., 0.9659258262890683] 1>
The “standard_name” and “long_name” properties are removed from the result.
Exponential and logarithmic functions¶
The field construct and metadata constructs have exp
and
log
methods for applying exponential and logarithmic
functions respectively elementwise to the data, preserving the
metadata but changing the construct’s units where required.
>>> q
<CF Field: specific_humidity(latitude(5), longitude(8)) 1>
>>> q.log()
<CF Field: specific_humidity(latitude(5), longitude(8)) ln(re 1)>
>>> q.exp()
<CF Field: specific_humidity(latitude(5), longitude(8)) 1>
>>> t
<CF Field: air_temperature(atmosphere_hybrid_height_coordinate(1), grid_latitude(10), grid_longitude(9)) K>
>>> t.log(base=10)
<CF Field: air_temperature(atmosphere_hybrid_height_coordinate(1), grid_latitude(10), grid_longitude(9)) lg(re 1 K)>
>>> t.exp() # Raises Exception
ValueError: Can't take exponential of dimensional quantities: <Units: K>
The “standard_name” and “long_name” properties are removed from the result.
Rounding and truncation¶
The field construct and metadata constructs the following methods to round and truncate their data:
Method  Description 

ceil 
The ceiling of the data, elementwise. 
clip 
Limit the values in the data. 
floor 
Floor the data array, elementwise. 
rint 
Round the data to the nearest integer, elementwise. 
round 
Round the data to the given number of decimals. 
trunc 
Truncate the data, elementwise. 
Convolution filters¶
A convolution of the
field construct data with a filter along a single domain axis can be
calculated, which also updates the bounds of a relevant dimension
coordinate construct to account for the width of the
filter. Convolution filters are carried with the
convolution_filter
method of the field construct.
>>> print(q)
Field: specific_humidity (ncvar%q)

Data : specific_humidity(latitude(5), longitude(8)) 1
Cell methods : area: mean
Dimension coords: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: time(1) = [20190101 00:00:00]
>>> q.iscyclic('X')
True
>>> r = q.convolution_filter([0.1, 0.15, 0.5, 0.15, 0.1], axis='X')
>>> print(r)
Field: specific_humidity (ncvar%q)

Data : specific_humidity(latitude(5), longitude(8)) 1
Cell methods : area: mean
Dimension coords: latitude(5) = [75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: time(1) = [20190101 00:00:00]
>>> print(q.dimension_coordinate('X').bounds.array)
[[ 0. 45.]
[ 45. 90.]
[ 90. 135.]
[135. 180.]
[180. 225.]
[225. 270.]
[270. 315.]
[315. 360.]]
>>> print(r.dimension_coordinate('X').bounds.array)
[[90. 135.]
[45. 180.]
[ 0. 225.]
[ 45. 270.]
[ 90. 315.]
[135. 360.]
[180. 405.]
[225. 450.]]
The convolution_filter
method of the field construct also has
options to
 Specify how the input array is extended when the filter overlaps a border, and
 Control the placement position of the filter window.
Note that the scipy.signal.windows
package has suite of window
functions for creating weights for filtering:
>>> from scipy.signal import windows
>>> exponential_weights = windows.exponential(3)
>>> print(exponential_weights)
[0.36787944 1. 0.36787944]
>>> r = q.convolution_filter(exponential_weights, axis='Y')
>>> print(r.array)
[[        ]
[0.06604 0.0967 0.09172 0.12086 0.08463 0.1245 0.0358 0.08072]
[0.12913 0.16595 0.1549 0.19456 0.12526 0.15634 0.06252 0.04153]
[0.07167 0.12044 0.09161 0.13659 0.09663 0.1235 0.04248 0.02583]
[        ]]
The magnitude of the integral of the filter (i.e. the sum of the
weights defined by the weights parameter) affects the convolved
values. For example, weights of [0.2, 0.2 0.2, 0.2, 0.2]
will
produce a 5point (nonweighted) running mean; and weights of [1, 1,
1, 1, 1]
will produce a 5point running sum. Note that the weights
returned by functions of the scipy.signal.windows
package do not
necessarily sum to 1.
Derivatives¶
The derivative along a dimension of the field construct’s data can be
calculated as a centred finite difference with the derivative
method. If the axis is cyclic then the
derivative wraps around by default, otherwise it may be forced to wrap
around; a onesided difference is calculated at the edges; or missing
data is inserted.
>>> r = q.derivative('X')
>>> r = q.derivative('Y', one_sided_at_boundary=True)
Relative vorticity¶
The relative vorticity of the wind may be calculated on a global or limited area domain, and in Cartesian or spherical polar coordinate systems.
The relative vorticity of wind defined on a Cartesian domain (such as a plane projection) is defined as
where \(x\) and \(y\) are points on along the ‘X’ and ‘Y’ Cartesian dimensions respectively; and \(u\) and \(v\) denote the ‘X’ and ‘Y’ components of the horizontal winds.
If the wind field field is defined on a spherical latitudelongitude domain then a correction factor is included:
where \(u\) and \(v\) denote the longitudinal and latitudinal components of the horizontal wind field; \(a\) is the radius of the Earth; and \(\phi\) is the latitude at each point.
The cf.relative_vorticity
function creates a relative vorticity
field construct from field constructs containing the wind components
using finite differences to approximate the derivatives. Dimensions
other than ‘X’ and ‘Y’ remain unchanged by the operation.
>>> u, v = cf.read('wind_components.nc')
>>> zeta = cf.relative_vorticity(u, v)
>>> print(zeta)
Field: atmosphere_relative_vorticity (ncvar%va)

Data : atmosphere_relative_vorticity(time(1), atmosphere_hybrid_height_coordinate(1), latitude(9), longitude(8)) s1
Dimension coords: time(1) = [19780901 06:00:00] 360_day
: atmosphere_hybrid_height_coordinate(1) = [9.9982] m
: latitude(9) = [90, ..., 70] degrees_north
: longitude(8) = [0, ..., 315] degrees_east
Coord references: standard_name:atmosphere_hybrid_height_coordinate
Domain ancils : atmosphere_hybrid_height_coordinate(atmosphere_hybrid_height_coordinate(1)) = [9.9982] m
: long_name=vertical coordinate formula term: b(k)(atmosphere_hybrid_height_coordinate(1)) = [0.9989]
: surface_altitude(latitude(9), longitude(8)) = [[2816.25, ..., 2325.98]] m
>>> print(zeta.array.round(8))
[[[[        ]
[2.04e06 1.58e06 5.19e06 4.74e06 4.76e06 2.27e06 9.55e06 3.64e06]
[8.4e07 4.37e06 3.55e06 2.31e06 3.6e07 8.58e06 2.45e06 6.5e07 ]
[ 4.08e06 4.55e06 2.75e06 4.15e06 5.16e06 4.17e06 4.67e06 7e07 ]
[1.4e07 3.5e07 1.27e06 1.29e06 2.01e06 4.4e07 2.5e06 2.05e06]
[7.3e07 1.59e06 1.77e06 3.13e06 7.9e07 5.1e07 2.79e06 1.12e06]
[3.7e07 7.1e07 1.52e06 6.5e07 2.75e06 4.3e07 1.62e06 6.6e07 ]
[ 9.5e07 8e07 6.6e07 7.2e07 2.13e06 4.5e07 7.5e07 1.11e06]
[        ]]]]
For axes that are not cyclic, missing data is inserted at the edges by default; otherwise it may be forced to wrap around, or a onesided difference is calculated at the edges. If the longitudinal axis is cyclic then the derivative wraps around by default.