cf.Field.collapse

Field.collapse(method, axes=None, squeeze=False, mtol=1, weights=None, ddof=1, a=None, inplace=False, group=None, regroup=False, within_days=None, within_years=None, over_days=None, over_years=None, coordinate=None, group_by=None, group_span=None, group_contiguous=1, measure=False, scale=None, radius='earth', great_circle=False, verbose=None, remove_vertical_crs=True, _create_zero_size_cell_bounds=False, _update_cell_methods=True, i=False, _debug=False, **kwargs)

Collapse axes of the field.

Collapsing one or more dimensions reduces their size and replaces the data along those axes with representative statistical values. The result is a new field construct with consistent metadata for the collapsed values.

By default all axes with size greater than 1 are collapsed completely (i.e. to size 1) with a given collapse method.

Example:

Find the minimum of the entire data:

>>> b = a.collapse('minimum')

The collapse can also be applied to any subset of the field construct’s dimensions. In this case, the domain axis and coordinate constructs for the non-collapsed dimensions remain the same. This is implemented either with the axes keyword, or with a CF-netCDF cell methods-like syntax for describing both the collapse dimensions and the collapse method in a single string. The latter syntax uses construct identities instead of netCDF dimension names to identify the collapse axes.

Statistics may be created to represent variation over one dimension or a combination of dimensions.

Example:

Two equivalent techniques for creating a field construct of temporal maxima at each horizontal location:

>>> b = a.collapse('maximum', axes='T')
>>> b = a.collapse('T: maximum')

Example:

Find the horizontal maximum, with two equivalent techniques.

>>> b = a.collapse('maximum', axes=['X', 'Y'])
>>> b = a.collapse('X: Y: maximum')

Variation over horizontal area may also be specified by the special identity ‘area’. This may be used for any horizontal coordinate reference system.

Example:

Find the horizontal maximum using the special identity ‘area’:

>>> b = a.collapse('area: maximum')

Collapse methods

The following collapse methods are available (see https://ncas-cms.github.io/cf-python/analysis.html#collapse-methods for precise definitions):

Method	Description
`'maximum'`	The maximum of the values.
`'minimum'`	The minimum of the values.
`'maximum_absolute_value'`	The maximum of the absolute values.
`'minimum_absolute_value'`	The minimum of the absolute values.
`'mid_range'`	The average of the maximum and the minimum of the values.
`'median'`	The median of the values.
`'range'`	The absolute difference between the maximum and the minimum of the values.
`'sum'`	The sum of the values.
`'sum_of_squares'`	The sum of the squares of values.
`'sample_size'`	The sample size, i.e. the number of non-missing values.
`'sum_of_weights'`	The sum of weights, as would be used for other calculations.
`'sum_of_weights2'`	The sum of squares of weights, as would be used for other calculations.
`'mean'`	The weighted or unweighted mean of the values.
`'mean_absolute_value'`	The mean of the absolute values.
`'mean_of_upper_decile'`	The mean of the upper group of data values defined by the upper tenth of their distribution.
`'variance'`	The weighted or unweighted variance of the values, with a given number of degrees of freedom.
`'standard_deviation'`	The weighted or unweighted standard deviation of the values, with a given number of degrees of freedom.
`'root_mean_square'`	The square root of the weighted or unweighted mean of the squares of the values.
`'integral'`	The integral of values.
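
As a rough illustration of a few of the simpler definitions in the table, the following plain-Python sketch computes them for a short list of values. The helper names are made up for this example and are not part of cf; the linked page remains the reference for the precise definitions.

```python
import math


def mid_range(values):
    # Average of the maximum and the minimum of the values
    return (max(values) + min(values)) / 2


def value_range(values):
    # Absolute difference between the maximum and the minimum
    return abs(max(values) - min(values))


def root_mean_square(values):
    # Square root of the unweighted mean of the squares
    return math.sqrt(sum(v * v for v in values) / len(values))


def mean_absolute_value(values):
    # Unweighted mean of the absolute values
    return sum(abs(v) for v in values) / len(values)


values = [-3.0, 1.0, 4.0]
print(mid_range(values), value_range(values))
```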

Data type and missing data

In all collapses, missing data array elements are accounted for in the calculation.

Any collapse method that involves a calculation (such as calculating a mean), as opposed to just selecting a value (such as finding a maximum), will return a field containing double precision floating point numbers. If this is not desired then the data type can be reset after the collapse with the dtype attribute of the field construct.

Collapse weights

The calculations of means, standard deviations and variances are, by default, not weighted. For weights to be incorporated in the collapse, the axes to be weighted must be identified with the weights keyword.

Weights are either derived from the field construct’s metadata (such as cell sizes), or may be provided explicitly in the form of other field constructs containing data of weights values. In either case, the weights actually used are those derived by the weights method of the field construct with the same weights keyword value. Collapsed axes that are not identified by the weights keyword are unweighted during the collapse operation.

Example:

Create a weighted time average:

>>> b = a.collapse('T: mean', weights=True)

Example:

Calculate the mean over the time and latitude axes, with weights only applied to the latitude axis:

>>> b = a.collapse('T: Y: mean', weights='Y')

Example:

Alternative syntax for specifying area weights:

>>> b = a.collapse('area: mean', weights=True)

An alternative technique for specifying weights is to set the weights keyword to the output of a call to the weights method.

Example:

Alternative syntax for specifying weights:

>>> b = a.collapse('area: mean', weights=a.weights('area'))
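
To make the effect of weighting concrete, here is a minimal plain-Python sketch of a weighted mean. It is only an illustration of the arithmetic, assuming one weight per value; `weighted_mean` is a hypothetical helper, not a cf function, and cf derives its weights from the field construct's metadata rather than from a hand-written list.

```python
def weighted_mean(values, weights=None):
    """Mean of `values`; unweighted when `weights` is None."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


values = [1.0, 2.0, 6.0]
cell_sizes = [1.0, 1.0, 2.0]  # assumed: the last cell is twice as large

unweighted = weighted_mean(values)            # (1 + 2 + 6) / 3
weighted = weighted_mean(values, cell_sizes)  # (1 + 2 + 12) / 4
print(unweighted, weighted)
```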

Multiple collapses

Multiple collapses normally require multiple calls to collapse: one on the original field construct and then one on each interim field construct.

Example:

Calculate the temporal maximum of the weighted areal means using two independent calls:

>>> b = a.collapse('area: mean', weights=True).collapse('T: maximum')

If preferred, multiple collapses may be carried out in a single call by using the CF-netCDF cell methods-like syntax (note that the colon (:) is only used after the construct identity that specifies each axis, and a space delimits the separate collapses).

Example:

Calculate the temporal maximum of the weighted areal means in a single call, using the CF-netCDF cell methods-like syntax:

>>> b = a.collapse('area: mean T: maximum', weights=True)
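
The structure of such a string can be sketched with a small tokenizer. This is an illustrative reading of the syntax described here (names ending in a colon are axes, the words that follow are the method and its qualifiers, and parenthesised comments are dropped); `parse_collapse_string` is a hypothetical helper and is not how cf itself parses the string.

```python
import re


def parse_collapse_string(spec):
    """Split a cell methods-like string into (axes, method) pairs."""
    # Drop parenthesised comments such as "(interval 1 hr)"
    spec = re.sub(r"\([^)]*\)", " ", spec)
    collapses = []
    axes, words = [], []
    for token in spec.split():
        if token.endswith(":"):
            if words:
                # An axis name after method words starts the next collapse
                collapses.append((axes, " ".join(words)))
                axes, words = [], []
            axes.append(token[:-1])
        else:
            words.append(token)
    if axes or words:
        collapses.append((axes, " ".join(words)))
    return collapses


print(parse_collapse_string("area: mean T: maximum"))
```

Each returned pair corresponds to one collapse, applied in order from left to right.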

Grouped collapses

A grouped collapse is one for which an axis is not collapsed completely to size 1. Instead the collapse axis is partitioned into non-overlapping groups and each group is collapsed to size 1. The resulting axis will generally have more than one element. For example, creating 10 annual means from a timeseries of 120 months would be a grouped collapse.
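
The partitioning idea can be sketched in plain Python, assuming a monthly series whose groups are simply consecutive runs of 12 elements. `grouped_mean` is a hypothetical helper for illustration only; cf also keeps the coordinate and cell method metadata consistent, which this sketch ignores.

```python
def grouped_mean(values, group_size):
    """Collapse consecutive, non-overlapping groups to one mean each."""
    groups = [
        values[i:i + group_size]
        for i in range(0, len(values), group_size)
    ]
    return [sum(g) / len(g) for g in groups]


# 120 monthly values (a made-up repeating annual cycle)
monthly = [float(i % 12) for i in range(120)]
annual = grouped_mean(monthly, 12)
print(len(annual))
```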

Selected statistics for overlapping groups can be calculated with the moving_window method.

The group keyword defines the size of the groups. Groups can be defined in a variety of ways, including with Query, TimeDuration and Data instances.

An element of the collapse axis can not be a member of more than one group, and may be a member of no groups. Elements that are not selected by the group keyword are excluded from the result.

Example:

Create annual maxima from a time series, defining a year to start on 1st December.

>>> b = a.collapse('T: maximum', group=cf.Y(month=12))

Example:

Find the maximum of each group of 6 elements along an axis.

>>> b = a.collapse('T: maximum', group=6)

Example:

Create December, January, February maxima from a time series.

>>> b = a.collapse('T: maximum', group=cf.djf())

Example:

Create maxima for each 3-month season of a timeseries (DJF, MAM, JJA, SON).

>>> b = a.collapse('T: maximum', group=cf.seasons())

Example:

Calculate zonal means for the western and eastern hemispheres.

>>> b = a.collapse('X: mean', group=cf.Data(180, 'degrees'))

Groups can be further described with the group_span parameter (to include groups whose actual span is not equal to a given value) and the group_contiguous parameter (to include non-contiguous groups, or any contiguous group containing overlapping cells).

Climatological statistics

Climatological statistics may be derived from corresponding portions of the annual cycle in a set of years (e.g. the average January temperatures in the climatology of 1961-1990, where the values are derived by averaging the 30 Januarys from the separate years); or from corresponding portions of the diurnal cycle in a set of days (e.g. the average temperatures for each hour in the day for May 1997). A diurnal climatology may also be combined with a multiannual climatology (e.g. the minimum temperature for each hour of the average day in May from a 1961-1990 climatology).

Calculation requires two or three collapses, depending on the quantity being created, all of which are grouped collapses. Each collapse method needs to indicate its climatological nature with one of the following qualifiers,

Method qualifier	Associated keyword
`within years`	within_years
`within days`	within_days
`over years`	over_years (optional)
`over days`	over_days (optional)

and the associated keyword specifies how the method is to be applied.

Example:

Calculate the multiannual average of the seasonal means:

>>> b = a.collapse('T: mean within years T: mean over years',
...                within_years=cf.seasons(), weights=True)

Example:

Calculate the multiannual variance of the seasonal minima. Note that the units of the result have been changed from ‘K’ to ‘K2’:

>>> b = a.collapse('T: minimum within years T: variance over years',
...                within_years=cf.seasons(), weights=True)

When collapsing over years, it is assumed by default that each portion of the annual cycle is collapsed over all years that are present. This is the case in the above two examples. It is possible, however, to restrict the years to be included, or group them into chunks, with the over_years keyword.

Example:

Calculate the multiannual average of the seasonal means in 5 year chunks:

>>> b = a.collapse(
...     'T: mean within years T: mean over years', weights=True,
...     within_years=cf.seasons(), over_years=cf.Y(5)
... )

Example:

Calculate the multiannual average of the seasonal means, restricting the years from 1963 to 1968:

>>> b = a.collapse(
...     'T: mean within years T: mean over years', weights=True,
...     within_years=cf.seasons(),
...     over_years=cf.year(cf.wi(1963, 1968))
... )

Similarly for collapses over days, it is assumed by default that each portion of the diurnal cycle is collapsed over all days that are present, but it is possible to restrict the days to be included, or group them into chunks, with the over_days keyword.

The calculation can be done with multiple collapse calls, which can be useful if the interim stages are needed independently, but be aware that the interim field constructs will have non-CF-compliant cell method constructs.

Example:

Calculate the multiannual maximum of the seasonal standard deviations with two separate collapse calls:

>>> b = a.collapse('T: standard_deviation within years',
...                within_years=cf.seasons(), weights=True)
>>> c = b.collapse('T: maximum over years')

New in version 1.0.

Parameters

method: str

Define the collapse method. All of the axes specified by the axes parameter are collapsed simultaneously by this method. The method is given by one of the following strings (see https://ncas-cms.github.io/cf-python/analysis.html#collapse-methods for precise definitions):

method	Description	Weighted
`'maximum'`	The maximum of the values.	Never
`'minimum'`	The minimum of the values.	Never
`'maximum_absolute_value'`	The maximum of the absolute values.	Never
`'minimum_absolute_value'`	The minimum of the absolute values.	Never
`'mid_range'`	The average of the maximum and the minimum of the values.	Never
`'median'`	The median of the values.	Never
`'range'`	The absolute difference between the maximum and the minimum of the values.	Never
`'sum'`	The sum of the values.	Never
`'sum_of_squares'`	The sum of the squares of values.	Never
`'sample_size'`	The sample size, i.e. the number of non-missing values.	Never
`'sum_of_weights'`	The sum of weights, as would be used for other calculations.	Never
`'sum_of_weights2'`	The sum of squares of weights, as would be used for other calculations.	Never
`'mean'`	The weighted or unweighted mean of the values.	May be
`'mean_absolute_value'`	The mean of the absolute values.	May be
`'mean_of_upper_decile'`	The mean of the upper group of data values defined by the upper tenth of their distribution.	May be
`'variance'`	The weighted or unweighted variance of the values, with a given number of degrees of freedom.	May be
`'standard_deviation'`	The weighted or unweighted standard deviation of the values, with a given number of degrees of freedom.	May be
`'root_mean_square'`	The square root of the weighted or unweighted mean of the squares of the values.	May be
`'integral'`	The integral of values.	Always

Collapse methods that are “Never” weighted ignore the weights parameter, even if it is set.
Collapse methods that “May be” weighted will only be weighted if the weights parameter is set.
Collapse methods that are “Always” weighted require the weights parameter to be set.

An alternative form of providing the collapse method is to provide a CF cell methods-like string. In this case an ordered sequence of collapses may be defined and both the collapse methods and their axes are provided. The axes are interpreted as for the axes parameter, which must not also be set. For example:

>>> g = f.collapse(
...     'time: max (interval 1 hr) X: Y: mean dim3: sd')

is equivalent to:

>>> g = f.collapse('max', axes='time')
>>> g = g.collapse('mean', axes=['X', 'Y'])
>>> g = g.collapse('sd', axes='dim3')

Climatological collapses are carried out if a method string contains any of the modifiers 'within days', 'within years', 'over days' or 'over years'. For example, to collapse a time axis into multiannual means of calendar monthly minima:

>>> g = f.collapse(
...     'time: minimum within years T: mean over years',
...     within_years=cf.M()
... )

which is equivalent to:

>>> g = f.collapse(
...     'time: minimum within years', within_years=cf.M())
>>> g = g.collapse('mean over years', axes='T')
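
The two-step logic of this climatological collapse can be sketched without cf, assuming the data are available as `(year, month, value)` records. The records and variable names below are invented for illustration; the sketch only mirrors the order of operations (minimum within each calendar month of each year, then mean over years).

```python
from collections import defaultdict

# Made-up samples: two values per calendar month, two years
records = [
    (2000, 1, 3.0), (2000, 1, 1.0), (2000, 2, 5.0), (2000, 2, 4.0),
    (2001, 1, 2.0), (2001, 1, 5.0), (2001, 2, 8.0), (2001, 2, 6.0),
]

# Step 1, "minimum within years": one minimum per (year, month)
within = defaultdict(list)
for year, month, value in records:
    within[(year, month)].append(value)
monthly_minima = {key: min(vals) for key, vals in within.items()}

# Step 2, "mean over years": average the minima of each calendar month
over = defaultdict(list)
for (year, month), minimum in monthly_minima.items():
    over[month].append(minimum)
climatology = {month: sum(v) / len(v) for month, v in over.items()}

print(climatology)
```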

axes: (sequence of) str, optional

The axes to be collapsed, defined by those which would be selected by passing each given axis description to a call of the field construct’s domain_axis method. For example, for a value of 'X', the domain axis construct returned by f.domain_axis('X') is selected. If a selected axis has size 1 then it is ignored. By default all axes with size greater than 1 are collapsed.

Parameter example:: axes='X'
Parameter example:: axes=['X']
Parameter example:: axes=['X', 'Y']
Parameter example:: axes=['Z', 'time']

If the axes parameter has the special value 'area' then it is assumed that the X and Y axes are intended.

Parameter example:: axes='area' is equivalent to axes=['X', 'Y'].
Parameter example:: axes=['area', 'Z'] is equivalent to axes=['X', 'Y', 'Z'].

weights: optional

Specify the weights for the collapse axes. The weights are, in general, those that would be returned by this call of the field construct’s weights method: f.weights(weights, axes=axes, measure=measure, scale=scale, radius=radius, great_circle=great_circle, components=True). See the axes, measure, scale, radius and great_circle parameters and cf.Field.weights for details, and note that the value of scale may be modified depending on the value of measure.

Note

By default weights is None, resulting in unweighted calculations.

Note

Unless the method is 'integral', the units of the weights are not combined with the field’s units in the collapsed field.

If the alternative form of providing the collapse method and axes combined as a CF cell methods-like string via the method parameter has been used, then the axes parameter is ignored and the axes are derived from the method parameter. For example, if method is 'T: area: minimum' then this defines axes of ['T', 'area']. If method specifies multiple collapses, e.g. 'T: minimum area: mean' then this implies axes of 'T' for the first collapse, and axes of 'area' for the second collapse.

Note

Setting weights to True is generally a good way to ensure that all collapses are appropriately weighted according to the field construct’s metadata. In this case, if it is not possible to create weights for any axis then an exception will be raised.

However, care needs to be taken if weights is True when cell volume weights are desired. The volume weights will be taken from a “volume” cell measure construct if one exists, otherwise the cell volumes will be calculated as being proportional to the sizes of one-dimensional vertical coordinate cells. In the latter case if the vertical dimension coordinates do not define the actual height or depth thickness of every cell in the domain then the weights will be incorrect.

Parameter example:: To specify weights based on the field construct’s metadata for all collapse axes use weights=True.
Parameter example:: To specify weights based on cell areas use weights='area'.
Parameter example:: To specify weights based on cell areas and linearly in time you could set weights=('area', 'T').

measure: bool, optional

If True, and weights is not None, create weights which are cell measures, i.e. which describe actual cell sizes (e.g. cell area) with appropriate units (e.g. metres squared). By default the weights units are ignored.

Cell measures can be created for any combination of axes. For example, cell measures for a time axis are the time span for each cell with canonical units of seconds; cell measures for the combination of four axes representing time and three dimensional space could have canonical units of metres cubed seconds.

When collapsing with the 'integral' method, measure must be True, and the units of the weights are incorporated into the units of the returned field construct.

Note

Specifying cell volume weights via weights=['X', 'Y', 'Z'] or weights=['area', 'Z'] (or other equivalents) will produce an incorrect result if the vertical dimension coordinates do not define the actual height or depth thickness of every cell in the domain. In this case, weights='volume' should be used instead, which requires the field construct to have a “volume” cell measure construct.

If weights=True then care also needs to be taken, as a “volume” cell measure construct will be used if present, otherwise the cell volumes will be calculated using the size of the vertical coordinate cells.

New in version 3.0.2.

scale: number or None, optional

If set to a positive number then scale the weights so that they are less than or equal to that number. If set to None, the default, then the weights are not scaled.

Parameter example:: To scale all weights so that they lie between 0 and 1: scale=1.
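
One way to satisfy this description is a proportional rescaling in which the largest weight becomes equal to scale. The sketch below assumes exactly that; the text above only guarantees that the scaled weights do not exceed scale, and `scale_weights` is a hypothetical helper rather than a cf function.

```python
def scale_weights(weights, scale=None):
    """Rescale so that max(weights) == scale; no-op when scale is None."""
    if scale is None:
        return list(weights)
    if scale <= 0:
        raise ValueError("scale must be a positive number")
    largest = max(weights)
    return [w * scale / largest for w in weights]


scaled = scale_weights([2.0, 4.0, 8.0], scale=1)
print(scaled)
```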

New in version 3.0.2.

Changed in version 3.16.0: Default changed to None

radius: optional

Specify the radius used for calculating the areas of cells defined in spherical polar coordinates. The radius is that which would be returned by this call of the field construct’s radius method: f.radius(radius). See cf.Field.radius for details.

By default radius is 'earth', which means that if, and only if, the radius cannot be found from the datums of any coordinate reference constructs, then the default radius is taken as 6371229 metres.

New in version 3.0.2.

great_circle: bool, optional

If True then allow, if required, i) the derivation of area weights from polygon geometry cells by assuming that each cell part is a spherical polygon composed of great circle segments; and ii) the derivation of line-length weights from line geometry cells by assuming that each line part is composed of great circle segments.

New in version 3.2.0.

squeeze: bool, optional

If True then size 1 collapsed axes are removed from the output data array. By default the axes which are collapsed are retained in the result’s data array.

mtol: number, optional

Set the fraction of input data elements which is allowed to contain missing data when contributing to an individual output data element. Where this fraction exceeds mtol, missing data is returned. The default is 1, meaning that a missing datum in the output array occurs when its contributing input array elements are all missing data. A value of 0 means that a missing datum in the output array occurs whenever any of its contributing input array elements are missing data. Any intermediate value is permitted.

Parameter example:: To ensure that an output array element is a missing datum if more than 25% of its input array elements are missing data: mtol=0.25.
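
The rule can be sketched for a single output element as follows. This is a simplified illustration, assuming missing data are represented by `None` and the collapse is an unweighted mean; `mean_with_mtol` is a hypothetical helper, not part of cf.

```python
def mean_with_mtol(values, mtol=1.0):
    """Mean of the non-missing values, or None if too many are missing."""
    present = [v for v in values if v is not None]
    missing_fraction = (len(values) - len(present)) / len(values)
    # Missing if the tolerance is exceeded, or if nothing is left to average
    if not present or missing_fraction > mtol:
        return None
    return sum(present) / len(present)


one_missing = [1.0, 2.0, None, 3.0]   # 25% missing
two_missing = [1.0, None, None, 3.0]  # 50% missing

print(mean_with_mtol(one_missing, mtol=0.25))
print(mean_with_mtol(two_missing, mtol=0.25))
```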

ddof: number, optional

The delta degrees of freedom in the calculation of a standard deviation or variance. The number of degrees of freedom used in the calculation is (N-ddof) where N represents the number of non-missing elements. By default ddof is 1, meaning the standard deviation and variance of the population are estimated according to the usual formula with (N-1) in the denominator to avoid the bias caused by the use of the sample mean (Bessel’s correction).
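
The role of ddof in the denominator can be seen in this short unweighted sketch, which assumes there are no missing values. `variance` here is a hypothetical stand-alone function written for illustration.

```python
def variance(values, ddof=1):
    """Unweighted variance with (N - ddof) degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


values = [1.0, 2.0, 3.0, 4.0]
sample = variance(values)              # denominator N - 1 = 3
population = variance(values, ddof=0)  # denominator N = 4
print(sample, population)
```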

coordinate: optional

Specify how the cell coordinate values for collapsed axes are placed. This has no effect on the cell bounds for the collapsed axes, which always represent the extrema of the input coordinates.

The coordinate parameter may be one of:

coordinate	Description
`None`	This is the default. If the collapse is a climatological time collapse over years or over days then assume a value of `'minimum'`, otherwise assume a value of `'mid_range'`.
`'mid_range'`	An output coordinate is the mean of the first and last input coordinate bounds (or the first and last coordinates if there are no bounds).
`'minimum'`	An output coordinate is the minimum of the input coordinates.
`'maximum'`	An output coordinate is the maximum of the input coordinates.

Parameter example:: coordinate='minimum'

group: optional

A grouped collapse is one for which an axis is not collapsed completely to size 1. Instead, the collapse axis is partitioned into non-overlapping groups and each group is collapsed to size 1, independently of the other groups. The results of the collapses are concatenated so that the output axis has a size equal to the number of groups.

An element of the collapse axis can not be a member of more than one group, and may be a member of no groups. Elements that are not selected by the group parameter are excluded from the result.

The group parameter defines how the axis elements are partitioned into groups, and may be one of:

group	Description
`Data`	Define groups by coordinate values that span the given range. The first group starts at the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate values lie within the group limits (see the group_by parameter). By default each element will be in exactly one group (see the group_by, group_span and group_contiguous parameters). By default groups may contain different numbers of elements. If no units are specified then the units of the coordinates are assumed.
`TimeDuration`	Define groups by a time interval spanned by the coordinates. The first group starts at or before the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate values lie within the group limits (see the group_by parameter). By default each element will be in exactly one group (see the group_by, group_span and group_contiguous parameters). By default groups may contain different numbers of elements. The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if `group=cf.Y(month=12)` then the first group will start on the closest 1st December to the first axis element.
`Query`	Define groups from elements whose coordinates satisfy the query condition. Multiple groups are created: one for each maximally consecutive run within the selected elements. If a sequence of `Query` is provided then groups are defined for each query. If a coordinate does not satisfy any of the query conditions then its element will not be in a group. By default groups may contain different numbers of elements. If no units are specified then the units of the coordinates are assumed. If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.
`int`	Define groups that contain the given number of elements. The first group starts with the first axis element and spans the defined number of consecutive elements. Each subsequent group immediately follows the preceding one. By default each group has the defined number of elements, apart from the last group which may contain fewer elements (see the group_span parameter).
`numpy.ndarray`	Define groups by selecting elements that map to the same value in the `numpy` array. The array must contain integers and have the same length as the axis to be collapsed, and its sequence of values corresponds to the axis elements. Each group contains the elements which correspond to a common non-negative integer value in the numpy array. Upon output, the collapsed axis is arranged in order of increasing group number. See the regroup parameter, which allows the creation of such a `numpy.ndarray` for a given grouped collapse. The groups do not have to be in runs of consecutive elements; they may be scattered throughout the axis. An element which corresponds to a negative integer in the array will not be in any group.

Parameter example:: To define groups of 10 kilometres: group=cf.Data(10, 'km').
Parameter example:: To define groups of 5 days, starting and ending at midnight on each day: group=cf.D(5) (see cf.D).
Parameter example:: To define groups of 1 calendar month, starting and ending at day 16 of each month: group=cf.M(day=16) (see cf.M).
Parameter example:: To define groups of the season MAM in each year: group=cf.mam() (see cf.mam).
Parameter example:: To define groups of the seasons DJF and JJA in each year: group=[cf.jja(), cf.djf()]. To define groups for seasons DJF, MAM, JJA and SON in each year: group=cf.seasons() (see cf.djf, cf.jja and cf.seasons).
Parameter example:: To define groups for longitude elements less than or equal to 90 degrees and greater than 90 degrees: group=[cf.le(90, 'degrees'), cf.gt(90, 'degrees')] (see cf.le and cf.gt).
Parameter example:: To define groups of 5 elements: group=5.
Parameter example:: For an axis of size 8, create two groups, the first containing the first and last elements and the second containing the 3rd, 4th and 5th elements, whilst ignoring the 2nd, 6th and 7th elements: group=numpy.array([0, -1, 4, 4, 4, -1, -2, 0]).
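
The integer-label grouping in the last parameter example can be sketched in plain Python. A list of integers stands in for the numpy array, the data values are invented, and `collapse_by_labels` is a hypothetical helper that only mimics the grouping rule described above.

```python
def collapse_by_labels(values, labels, func=max):
    """Collapse the elements sharing each non-negative label with func."""
    groups = {}
    for value, label in zip(values, labels):
        if label >= 0:  # negative labels are not in any group
            groups.setdefault(label, []).append(value)
    # The output is arranged in order of increasing group number
    return [func(groups[label]) for label in sorted(groups)]


labels = [0, -1, 4, 4, 4, -1, -2, 0]
values = [10, 20, 30, 40, 50, 60, 70, 80]
result = collapse_by_labels(values, labels)
print(result)
```

Group 0 holds the first and last elements and group 4 holds the 3rd, 4th and 5th, so the result has two elements.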

regroup: bool, optional

If True then, for grouped collapses, do not collapse the field construct, but instead return a numpy.array of integers which identifies the groups defined by the group parameter. Each group contains the elements which correspond to a common non-negative integer value in the numpy array. Elements corresponding to negative integers are not in any group. The array may subsequently be used as the value of the group parameter in a separate collapse.

For example:

>>> groups = f.collapse('time: mean', group=10, regroup=True)
>>> g = f.collapse('time: mean', group=groups)

is equivalent to:

>>> g = f.collapse('time: mean', group=10)

group_by: optional

Specify how coordinates are assigned to the groups defined by the group, within_days or within_years parameters. Ignored unless one of these parameters is set to a Data or TimeDuration object.

The group_by parameter may be one of:

group_by	Description
`None`	This is the default. If the groups are defined by the group parameter (i.e. collapses other than climatological time collapses) then assume a value of `'coords'`. If the groups are defined by the within_days or within_years parameter (i.e. climatological time collapses) then assume a value of `'bounds'`.
`'coords'`	Each group contains the axis elements whose coordinate values lie within the group limits. Every element will be in a group.
`'bounds'`	Each group contains the axis elements whose upper and lower coordinate bounds both lie within the group limits. Some elements may not be inside any group, either because the group limits do not coincide with coordinate bounds or because the group size is sufficiently small.
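
The difference between the two assignment rules can be sketched for a single cell and a single group. This is an illustrative reading only: the cell is a `(coordinate, (lower bound, upper bound))` tuple, the group limits are assumed to be half-open for the 'coords' test, and `in_group` is a hypothetical helper rather than a cf function.

```python
def in_group(cell, lower, upper, group_by="coords"):
    """Test whether a cell belongs to the group with limits [lower, upper)."""
    coord, (lo, hi) = cell
    if group_by == "coords":
        # Only the coordinate value has to lie within the limits
        return lower <= coord < upper
    if group_by == "bounds":
        # Both cell bounds have to lie within the limits
        return lower <= lo and hi <= upper
    raise ValueError(f"unknown group_by value: {group_by!r}")


cells = [(5, (0, 10)), (15, (10, 20)), (25, (20, 30))]

by_coords = [c[0] for c in cells if in_group(c, 0, 18, "coords")]
by_bounds = [c[0] for c in cells if in_group(c, 0, 18, "bounds")]
print(by_coords, by_bounds)
```

With group limits of 0 and 18, the second cell is selected by its coordinate (15) but not by its bounds, because its upper bound (20) lies outside the group.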

group_span: optional

Specify how to treat groups that may not span the desired range. For example, when creating 3-month means, the group_span parameter can be used to allow groups which only contain 1 or 2 months of data.

By default, group_span is None. This means that only groups whose span equals the size specified by the definition of the groups are collapsed; unless the groups have been defined by one or more Query objects, in which case the default behaviour is to collapse all groups, regardless of their size.

In effect, the group_span parameter defaults to True unless the groups have been defined by one or more Query objects, in which case group_span defaults to False.

The different behaviour when the groups have been defined by one or more Query objects is necessary because a Query object can only define the composition of a group, and not its size (see the parameter examples below for how to specify a group span in this case).

Note

Prior to version 3.1.0, the default value of group_span was effectively False.

In general, the span of a group is the absolute difference between the lower bound of its first element and the upper bound of its last element. The only exception to this occurs if group_span is (by default or by explicit setting) an integer, in which case the span of a group is the number of elements in the group. See also the group_contiguous parameter for how to deal with groups that have gaps in their coverage.

The group_span parameter is only applied to groups defined by the group, within_days or within_years parameters, and is otherwise ignored.

The group_span parameter may be one of:

group_span	Description
`None`	This is the default. Apply a value of `True` or `False` depending on how the groups have been defined.
`True`	Ignore groups whose span is not equal to the size specified by the definition of the groups. Only applicable if the groups are defined by a `Data`, `TimeDuration` or `int` object, and this is the default in this case.
`False`	Collapse all groups, regardless of their size. This is the default if the groups are defined by one or more `Query` objects.
`Data`	Ignore groups whose span is not equal to the given size. If no units are specified then the units of the coordinates are assumed.
`TimeDuration`	Ignore groups whose span is not equal to the given time duration.
`int`	Ignore groups that contain fewer than the given number of elements.

Parameter example:: To collapse into groups of 10km, ignoring any groups that span less than that distance: group=cf.Data(10, 'km'), group_span=True.
Parameter example:: To collapse a daily timeseries into monthly groups, ignoring any groups that span less than 1 calendar month: group=cf.M(), group_span=True (see cf.M).
Parameter example:: To collapse a timeseries into seasonal groups, ignoring any groups that span less than three months: group=cf.seasons(), group_span=cf.M(3) (see cf.seasons and cf.M).
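
The span test can be sketched as below, assuming each group is given as a list of `(lower bound, upper bound)` pairs in axis order. A float stands in for a `Data` or `TimeDuration` span and an `int` counts elements, following the table above; `keep_group` is a hypothetical helper, and the `True`/`False`/`None` settings are deliberately not handled.

```python
def keep_group(cell_bounds, group_span):
    """Decide whether a group passes the group_span test."""
    if isinstance(group_span, bool):
        raise TypeError("this sketch only handles numeric spans")
    if isinstance(group_span, int):
        # An integer counts elements instead of measuring coordinates
        return len(cell_bounds) >= group_span
    # Otherwise compare the coordinate span: lower bound of the first
    # element to upper bound of the last element
    span = abs(cell_bounds[-1][1] - cell_bounds[0][0])
    return span == group_span


full_group = [(0.0, 5.0), (5.0, 10.0)]
short_group = [(10.0, 15.0)]

print(keep_group(full_group, 10.0), keep_group(short_group, 10.0))
```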

group_contiguous: int, optional

Specify how to treat groups whose elements are not contiguous or have overlapping cells. For example, when creating December to February means, the group_contiguous parameter can be used to allow groups which have no data for January.

A group is considered to be contiguous unless it has coordinates with bounds that do not coincide for adjacent cells. The definition may be expanded to include groups whose coordinate bounds overlap.

By default group_contiguous is 1, meaning that non-contiguous groups, and those whose coordinate bounds overlap, are not collapsed.

Note

Prior to version 3.1.0, the default value of group_contiguous was 0.

The group_contiguous parameter is only applied to groups defined by the group, within_days or within_years parameters, and is otherwise ignored.

The group_contiguous parameter may be one of:

group_contiguous	Description
`0`	Allow non-contiguous groups, and those containing overlapping cells.
`1`	This is the default. Ignore non-contiguous groups, as well as contiguous groups containing overlapping cells.
`2`	Ignore non-contiguous groups, allowing contiguous groups containing overlapping cells.

Parameter example:: To allow non-contiguous groups, and those containing overlapping cells: group_contiguous=0.
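
The three settings can be illustrated with a small standalone sketch. It assumes that cells are given as `(lower, upper)` pairs in increasing order; the function names are hypothetical and the logic is a reading of the table above rather than the library's own code.

```python
def classify_group(cell_bounds):
    """Return (contiguous, overlapping) for cells given as (lower, upper) pairs."""
    pairs = list(zip(cell_bounds, cell_bounds[1:]))
    # A gap exists where a cell starts after the previous cell ends
    has_gap = any(nxt[0] > prev[1] for prev, nxt in pairs)
    # An overlap exists where a cell starts before the previous cell ends
    overlapping = any(nxt[0] < prev[1] for prev, nxt in pairs)
    return (not has_gap, overlapping)


def is_collapsed(cell_bounds, group_contiguous=1):
    """Decide whether a group is kept under the three documented settings."""
    contiguous, overlapping = classify_group(cell_bounds)
    if group_contiguous == 0:
        return True
    if group_contiguous == 1:
        return contiguous and not overlapping
    if group_contiguous == 2:
        return contiguous
    raise ValueError("group_contiguous must be 0, 1 or 2")


# Day offsets for December, January and February cells (illustrative values)
djf_complete = [(0, 31), (31, 62), (62, 90)]
djf_missing_jan = [(0, 31), (62, 90)]
djf_overlapping = [(0, 31), (30, 62)]
```

With these inputs, `djf_missing_jan` is kept only when `group_contiguous=0`, and `djf_overlapping` is kept for `0` and `2` but not for the default of `1`.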

within_days: optional

Define the groups for creating CF “within days” climatological statistics.

Each group contains elements whose coordinates span a time interval of up to one day. The results of the collapses are concatenated so that the output axis has a size equal to the number of groups.

Note

For CF compliance, a “within days” collapse should be followed by an “over days” collapse.

The within_days parameter defines how the elements are partitioned into groups, and may be one of:

within_days

Description

TimeDuration

Define the group size in terms of a time interval of up to one day. The first group starts at or before the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate cells lie within the group limits (see the group_by parameter).

Groups may contain different numbers of elements.
The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if within_days=cf.D(hour=12) then the first group will start on the closest midday to the first axis element.

Query

Define groups from elements whose coordinates satisfy the query condition. Multiple groups are created: one for each maximally consecutive run within the selected elements.

If a sequence of Query is provided then groups are defined for each query.

Groups may contain different numbers of elements.
If no units are specified then the units of the coordinates are assumed.
If a coordinate does not satisfy any of the conditions then its element will not be in a group.
If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.

Parameter example:: To define groups of 6 hours, starting at 00:00, 06:00, 12:00 and 18:00: within_days=cf.h(6) (see cf.h).
Parameter example:: To define groups of 1 day, starting at 06:00: within_days=cf.D(1, hour=6) (see cf.D).
Parameter example:: To define groups of 00:00 to 06:00 within each day, ignoring the rest of each day: within_days=cf.hour(cf.le(6)) (see cf.hour and cf.le).
Parameter example:: To define groups of 00:00 to 06:00 and 18:00 to 24:00 within each day, ignoring the rest of each day: within_days=[cf.hour(cf.le(6)), cf.hour(cf.gt(18))] (see cf.gt, cf.hour and cf.le).
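
The Query rules above (one group per maximally consecutive run, unmatched elements left out, and the latest matching query taking precedence) can be sketched without the library. The function `query_groups` is hypothetical, and plain Python predicates stand in for `Query` objects.

```python
from datetime import datetime, timedelta


def query_groups(times, conditions):
    """Group consecutive elements by the last condition each one satisfies."""
    labels = []
    for t in times:
        label = None
        for i, condition in enumerate(conditions):
            if condition(t):
                label = i  # a later match overrides an earlier one
        labels.append(label)

    groups = []  # list of (condition index, [element indices])
    for index, label in enumerate(labels):
        if label is None:
            continue  # element satisfies no condition: not in any group
        last = groups[-1] if groups else None
        if last and last[0] == label and last[1][-1] == index - 1:
            last[1].append(index)  # extend the current consecutive run
        else:
            groups.append((label, [index]))  # start a new run
    return groups


# Hypothetical 3-hourly coordinates covering two days
start = datetime(2000, 1, 1)
times = [start + timedelta(hours=3 * i) for i in range(16)]

# Stand-ins for cf.hour(cf.le(6)) and cf.hour(cf.gt(18))
conditions = [lambda t: t.hour <= 6, lambda t: t.hour > 18]
groups = query_groups(times, conditions)
```

Each day contributes one run for the first condition (00:00, 03:00, 06:00) and one for the second (21:00), so the two days yield four groups in total.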

within_years: optional

Define the groups for creating CF “within years” climatological statistics.

Each group contains elements whose coordinates span a time interval of up to one calendar year. The results of the collapses are concatenated so that the output axis has a size equal to the number of groups.

Note

For CF compliance, a “within years” collapse should be followed by an “over years” collapse.

The within_years parameter defines how the elements are partitioned into groups, and may be one of:

within_years

Description

TimeDuration

Define the group size in terms of a time interval of up to one calendar year. The first group starts at or before the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate cells lie within the group limits (see the group_by parameter).

Groups may contain different numbers of elements.
The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if within_years=cf.Y(month=12) then the first group will start on the closest 1st December to the first axis element.

Query

Define groups from elements whose coordinates satisfy the query condition. Multiple groups are created: one for each maximally consecutive run within the selected elements.

If a sequence of Query is provided then groups are defined for each query.

The first group may start outside of the range of coordinates (the start of the first group is controlled by parameters of the TimeDuration).
If group boundaries do not coincide with coordinate bounds then some elements may not be inside any group.
If the group size is sufficiently small then some elements may not be inside any group.
Groups may contain different numbers of elements.

Parameter example:: To define groups of 90 days: within_years=cf.D(90) (see cf.D).
Parameter example:: To define groups of 3 calendar months, starting on the 15th of a month: within_years=cf.M(3, day=15) (see cf.M).
Parameter example:: To define groups for the season MAM within each year: within_years=cf.mam() (see cf.mam).
Parameter example:: To define groups for February and for November to December within each year: within_years=[cf.month(2), cf.month(cf.ge(11))] (see cf.month and cf.ge).

over_days: optional

Define the groups for creating CF “over days” climatological statistics.

By default (or if over_days is None) each group contains all elements for which the time coordinate cell lower bounds have a common time of day but different dates, and for which the time coordinate cell upper bounds also have a common time of day but different dates. The collapsed time axis will have a size equal to the number of groups that were found.

For example, elements corresponding to the two time coordinate cells

1999-12-31 06:00:00/1999-12-31 18:00:00

2000-01-01 06:00:00/2000-01-01 18:00:00

would be together in a group; and elements corresponding to the two time coordinate cells

1999-12-31 00:00:00/2000-01-01 00:00:00

2000-01-01 00:00:00/2000-01-02 00:00:00

would also be together in a different group.
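
The default grouping can be sketched with the standard library alone. The function below is a hypothetical stand-in, not the library's implementation: it keys each cell on the time of day of its lower and upper bounds, and for brevity does not check that the dates differ.

```python
from collections import defaultdict
from datetime import datetime


def over_days_groups(cells):
    """Group cell indices by the times of day of their lower and upper bounds."""
    groups = defaultdict(list)
    for index, (lower, upper) in enumerate(cells):
        groups[(lower.time(), upper.time())].append(index)
    return dict(groups)


# The four time coordinate cells from the example above, interleaved
cells = [
    (datetime(1999, 12, 31, 6), datetime(1999, 12, 31, 18)),
    (datetime(1999, 12, 31, 0), datetime(2000, 1, 1, 0)),
    (datetime(2000, 1, 1, 6), datetime(2000, 1, 1, 18)),
    (datetime(2000, 1, 1, 0), datetime(2000, 1, 2, 0)),
]
groups = over_days_groups(cells)
```

The two 06:00 to 18:00 cells land in one group and the two midnight to midnight cells in another, so the collapsed axis would have size 2.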

Note

For CF compliance, an “over days” collapse should be preceded by a “within days” collapse.

The default groups may be split into smaller groups if the over_days parameter is one of:

over_days

Description

TimeDuration

Split each default group into smaller groups which span the given time duration, which must be at least one day.

Groups may contain different numbers of elements.
The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if over_days=cf.M(day=15) then the first group will start on the closest 15th of a month to the first axis element.

Query

Split each default group into smaller groups whose coordinate cells satisfy the query condition.

If a sequence of Query is provided then groups are defined for each query.

Groups may contain different numbers of elements.
If a coordinate does not satisfy any of the conditions then its element will not be in a group.
If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.

Parameter example:: To define groups for January and for June to December, ignoring all other months: over_days=[cf.month(1), cf.month(cf.wi(6, 12))] (see cf.month and cf.wi).
Parameter example:: To define groups spanning 90 days: over_days=cf.D(90) or over_days=cf.h(2160) (see cf.D and cf.h).
Parameter example:: To define groups that each span 3 calendar months, starting and ending at 06:00 on the first day of each month: over_days=cf.M(3, hour=6) (see cf.M).
Parameter example:: To define groups that each span a calendar month: over_days=cf.M() (see cf.M).

over_years: optional

Define the groups for creating CF “over years” climatological statistics.

By default (or if over_years is None) each group contains all elements for which the time coordinate cell lower bounds have a common date of the year but different years, and for which the time coordinate cell upper bounds also have a common date of the year but different years. The collapsed time axis will have a size equal to the number of groups that were found.

For example, elements corresponding to the two time coordinate cells

1999-12-01 00:00:00/2000-01-01 00:00:00

2000-12-01 00:00:00/2001-01-01 00:00:00

would be together in a group.
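
As a rough illustration of this matching rule, the sketch below compares two cells by the date of the year of each bound. The helper names are hypothetical, and the comparison is a simplified reading of the description above, not the library's code.

```python
from datetime import datetime


def date_of_year(t):
    """Return the parts of a datetime that ignore the year."""
    return (t.month, t.day, t.time())


def same_over_years_group(cell_a, cell_b):
    """True if both bounds share a date of the year but fall in different years."""
    (lower_a, upper_a), (lower_b, upper_b) = cell_a, cell_b
    return (
        date_of_year(lower_a) == date_of_year(lower_b)
        and date_of_year(upper_a) == date_of_year(upper_b)
        and lower_a.year != lower_b.year
    )


# The two December cells from the example above
dec_1999 = (datetime(1999, 12, 1), datetime(2000, 1, 1))
dec_2000 = (datetime(2000, 12, 1), datetime(2001, 1, 1))
# A cell with different bounds, for contrast
jja_1999 = (datetime(1999, 6, 1, 6), datetime(1999, 9, 1, 6))

matched = same_over_years_group(dec_1999, dec_2000)
unmatched = same_over_years_group(dec_1999, jja_1999)
```

The two December cells match, while the June to September cell belongs to a different group.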

Note

For CF compliance, an “over years” collapse should be preceded by a “within years” or “over days” collapse.

The default groups may be split into smaller groups if the over_years parameter is one of:

over_years

Description

TimeDuration

Split each default group into smaller groups which span the given time duration, which must be at least one day.

Groups may contain different numbers of elements.
The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if over_years=cf.Y(month=12) then the first group will start on the closest 1st December to the first axis element.

Query

Split each default group into smaller groups whose coordinate cells satisfy the query condition.

If a sequence of Query is provided then groups are defined for each query.

Groups may contain different numbers of elements.
If a coordinate does not satisfy any of the conditions then its element will not be in a group.
If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.

Parameter example:: An element with coordinate bounds {1999-06-01 06:00:00, 1999-09-01 06:00:00} matches an element with coordinate bounds {2000-06-01 06:00:00, 2000-09-01 06:00:00}.
Parameter example:: An element with coordinate bounds {1999-12-01 00:00:00, 2000-12-01 00:00:00} matches an element with coordinate bounds {2000-12-01 00:00:00, 2001-12-01 00:00:00}.
Parameter example:: To define groups spanning 10 calendar years: over_years=cf.Y(10) or over_years=cf.M(120) (see cf.M and cf.Y).
Parameter example:: To define groups spanning 5 calendar years, starting and ending at 06:00 on 01 December of each year: over_years=cf.Y(5, month=12, hour=6) (see cf.Y).
Parameter example:: To define one group spanning 1981 to 1990 and another spanning 2001 to 2005: over_years=[cf.year(cf.wi(1981, 1990)), cf.year(cf.wi(2001, 2005))] (see cf.year and cf.wi).

remove_vertical_crs: bool, optional

If True, the default, then remove a vertical coordinate reference construct and all of its domain ancillary constructs if any of its coordinate constructs or domain ancillary constructs span any collapse axes.

If False then only the vertical coordinate reference construct’s domain ancillary constructs that span any collapse axes are removed, but the vertical coordinate reference construct remains. This could result in compute_vertical_coordinates returning incorrect non-parametric vertical coordinate values.

New in version 3.14.1.

inplace: bool, optional

If True then do the operation in-place and return None.

i: deprecated at version 3.0.0

Use the inplace parameter instead.

kwargs: deprecated at version 3.0.0

Returns

Field or numpy.ndarray: The collapsed field construct. Alternatively, if the regroup parameter is True then a numpy array is returned.

Examples

There are further worked examples in https://ncas-cms.github.io/cf-python/analysis.html#statistical-collapses

cf 3.16.2
