cf.Field.collapse

Field.collapse(method, axes=None, squeeze=False, mtol=1, weights=None, ddof=1, a=None, inplace=False, group=None, regroup=False, within_days=None, within_years=None, over_days=None, over_years=None, coordinate='mid_range', group_by='coords', group_span=None, group_contiguous=None, measure=False, scale=None, radius='earth', verbose=False, _create_zero_size_cell_bounds=False, _update_cell_methods=True, i=False, _debug=False, **kwargs)

Collapse axes of the field.

Collapsing one or more dimensions reduces their size and replaces the data along those axes with representative statistical values. The result is a new field construct with consistent metadata for the collapsed values.

Collapsing an axis involves reducing its size with a given (typically statistical) method.

By default all axes with size greater than 1 are collapsed completely (i.e. to size 1) with a given collapse method.

Example:

Find the minimum of the entire data:

>>> b = a.collapse('minimum')

The collapse can also be applied to any subset of the field construct’s dimensions. In this case, the domain axis and coordinate constructs for the non-collapsed dimensions remain the same. This is implemented either with the axes keyword, or with a CF-netCDF cell methods-like syntax for describing both the collapse dimensions and the collapse method in a single string. The latter syntax uses construct identities instead of netCDF dimension names to identify the collapse axes.

Statistics may be created to represent variation over one dimension or a combination of dimensions.

Example:

Two equivalent techniques for creating a field construct of temporal maxima at each horizontal location:

>>> b = a.collapse('maximum', axes='T')
>>> b = a.collapse('T: maximum')
Example:

Find the horizontal maximum, with two equivalent techniques.

>>> b = a.collapse('maximum', axes=['X', 'Y'])
>>> b = a.collapse('X: Y: maximum')

Variation over horizontal area may also be specified by the special identity ‘area’. This may be used for any horizontal coordinate reference system.

Example:

Find the horizontal maximum using the special identity ‘area’:

>>> b = a.collapse('area: maximum')

Collapse methods

See the method parameter for details.

Data type and missing data

In all collapses, missing data array elements are accounted for in the calculation.

Any collapse method that involves a calculation (such as calculating a mean), as opposed to just selecting a value (such as finding a maximum), will return a field containing double precision floating point numbers. If this is not desired then the data type can be reset after the collapse with the dtype attribute of the field construct.

Collapse weights

The calculations of means, standard deviations and variances are, by default, not weighted. For weights to be incorporated in the collapse, the axes to be weighted must be identified with the weights keyword.

Weights are either derived from the field construct’s metadata (such as cell sizes), or may be provided explicitly in the form of other field constructs containing data of weights values. In either case, the weights actually used are those derived by the weights method of the field construct with the same weights keyword value. Collapsed axes that are not identified by the weights keyword are un-weighted during the collapse operation.
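
As a rough illustration of what weighting changes in a mean, the following plain-Python sketch compares an unweighted and a weighted average. It is not cf-python's implementation: the function name weighted_mean and the sample values are hypothetical, and the weights stand in for whatever the field construct's weights method would derive.

```python
def weighted_mean(values, weights=None):
    """Mean of values; every value counts equally when weights is None."""
    if weights is None:
        weights = [1.0] * len(values)
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Three hypothetical time cells lasting 31, 28 and 31 days.
values = [10.0, 20.0, 60.0]
cell_lengths = [31.0, 28.0, 31.0]

unweighted = weighted_mean(values)               # weights=None
weighted = weighted_mean(values, cell_lengths)   # like weights='T'
print(unweighted, weighted)
```

The shorter middle cell contributes less to the weighted result, so the two means differ.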

Example:

Create a weighted time average:

>>> b = a.collapse('T: mean', weights='T')
Example:

Calculate the mean over the time and latitude axes, with weights only applied to the latitude axis:

>>> b = a.collapse('T: Y: mean', weights='Y')
Example:

Alternative syntax for specifying area weights:

>>> b = a.collapse('area: mean', weights='area')

Multiple collapses

Multiple collapses normally require multiple calls to collapse: one on the original field construct and then one on each interim field construct.

Example:

Calculate the temporal maximum of the weighted areal means using two independent calls:

>>> b = a.collapse('area: mean', weights='area').collapse('T: maximum')

If preferred, multiple collapses may be carried out in a single call by using the CF-netCDF cell methods-like syntax (note that the colon (:) is only used after the construct identity that specifies each axis, and a space delimits the separate collapses).

Example:

Calculate the temporal maximum of the weighted areal means in a single call, using the CF-netCDF cell methods-like syntax:

>>> b = a.collapse('area: mean T: maximum', weights='area')

Grouped collapses

A grouped collapse is one for which an axis is not collapsed completely to size 1. Instead the collapse axis is partitioned into groups and each group is collapsed to size 1. The resulting axis will generally have more than one element. For example, creating 10 annual means from a timeseries of 120 months would be a grouped collapse.

The group keyword defines the size of the groups. Groups can be defined in a variety of ways, including with Query, TimeDuration and Data instances.

Not every element of the collapse axis needs to be in a group. Elements that are not selected by the group keyword are excluded from the result.

Example:

Create annual maxima from a time series, defining a year to start on 1st December.

>>> b = a.collapse('T: maximum', group=cf.Y(month=12))
Example:

Find the maximum of each group of 6 elements along an axis.

>>> b = a.collapse('T: maximum', group=6)
Example:

Create December, January, February maxima from a time series.

>>> b = a.collapse('T: maximum', group=cf.djf())
Example:

Create maxima for each 3-month season of a timeseries (DJF, MAM, JJA, SON).

>>> b = a.collapse('T: maximum', group=cf.seasons())
Example:

Calculate zonal means for the western and eastern hemispheres.

>>> b = a.collapse('X: mean', group=cf.Data(180, 'degrees'))

Groups can be further described with the group_span parameter (to ignore groups whose actual span is less than a given value) and the group_contiguous parameter (to ignore non-contiguous groups, or any contiguous group containing overlapping cells).

Climatological statistics

Climatological statistics may be derived from corresponding portions of the annual cycle in a set of years (e.g. the average January temperatures in the climatology of 1961-1990, where the values are derived by averaging the 30 Januarys from the separate years); or from corresponding portions of the diurnal cycle in a set of days (e.g. the average temperatures for each hour in the day for May 1997). A diurnal climatology may also be combined with a multiannual climatology (e.g. the minimum temperature for each hour of the average day in May from a 1961-1990 climatology).

Calculation requires two or three collapses, depending on the quantity being created, all of which are grouped collapses. Each collapse method needs to indicate its climatological nature with one of the following qualifiers,

Method qualifier    Associated keyword
within years        within_years
within days         within_days
over years          over_years (optional)
over days           over_days (optional)

and the associated keyword specifies how the method is to be applied.
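
The two-stage structure of a climatological collapse can be sketched in plain Python. This is only an illustration under simplifying assumptions, not how cf-python performs the calculation: the (year, month, value) samples are hypothetical, groups are calendar months rather than seasons, and no weights are applied.

```python
from collections import defaultdict

# Hypothetical samples: (year, month, value), two values per month.
samples = [
    (2000, 1, 3.0), (2000, 1, 1.0), (2000, 2, 6.0), (2000, 2, 9.0),
    (2001, 1, 5.0), (2001, 1, 7.0), (2001, 2, 2.0), (2001, 2, 8.0),
]

# Stage 1, "minimum within years": one minimum per month of each year.
within = defaultdict(list)
for year, month, value in samples:
    within[(year, month)].append(value)
monthly_minima = {key: min(vals) for key, vals in within.items()}

# Stage 2, "mean over years": average matching months across the years.
over = defaultdict(list)
for (year, month), minimum in monthly_minima.items():
    over[month].append(minimum)
climatology = {month: sum(vals) / len(vals) for month, vals in over.items()}

print(climatology)
```

Each stage is itself a grouped collapse: the first groups by portion of the annual cycle within each year, the second groups matching portions across years.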

Example:

Calculate the multiannual average of the seasonal means:

>>> b = a.collapse('T: mean within years T: mean over years',
...                within_years=cf.seasons(), weights='T')
Example:

Calculate the multiannual variance of the seasonal minima. Note that the units of the result have been changed from ‘K’ to ‘K2’:

>>> b = a.collapse('T: minimum within years T: variance over years',
...                within_years=cf.seasons(), weights='T')

When collapsing over years, it is assumed by default that each portion of the annual cycle is collapsed over all years that are present. This is the case in the above two examples. It is possible, however, to restrict the years to be included, or group them into chunks, with the over_years keyword.

Example:

Calculate the multiannual average of the seasonal means in 5 year chunks:

>>> b = a.collapse('T: mean within years T: mean over years', weights='T',
...                within_years=cf.seasons(), over_years=cf.Y(5))
Example:

Calculate the multiannual average of the seasonal means, restricting the years from 1963 to 1968:

>>> b = a.collapse('T: mean within years T: mean over years', weights='T',
...                within_years=cf.seasons(),
...                over_years=cf.year(cf.wi(1963, 1968)))

Similarly for collapses over days, it is assumed by default that each portion of the diurnal cycle is collapsed over all days that are present, but it is possible to restrict the days to be included, or group them into chunks, with the over_days keyword.

The calculation can be done with multiple collapse calls, which can be useful if the interim stages are needed independently, but be aware that the interim field constructs will have non-CF-compliant cell method constructs.

Example:

Calculate the multiannual maximum of the seasonal standard deviations with two separate collapse calls:

>>> b = a.collapse('T: standard_deviation within years',
...                within_years=cf.seasons(), weights='T')
>>> c = b.collapse('T: maximum over years')

New in version 1.0.

Parameters:

method: str

Define the collapse method. All of the axes specified by the axes parameter are collapsed simultaneously by this method. The method is given by one of the following strings (see https://ncas-cms.github.io/cf-python/tutorial.html#collapse-methods for precise definitions):

method Description Weighted
'maximum' The maximum of the values. Never
'minimum' The minimum of the values. Never
'maximum_absolute_value' The maximum of the absolute values. Never
'minimum_absolute_value' The minimum of the absolute values. Never
'mid_range' The average of the maximum and the minimum of the values. Never
'median' The median of the values. Never
'range' The absolute difference between the maximum and the minimum of the values. Never
'sum' The sum of the values. Never
'sum_of_squares' The sum of the squares of values. Never
'sample_size' The sample size, i.e. the number of non-missing values. Never
'sum_of_weights' The sum of weights, as would be used for other calculations. Never
'sum_of_weights2' The sum of squares of weights, as would be used for other calculations. Never
'mean' The weighted or unweighted mean of the values. May be
'mean_absolute_value' The mean of the absolute values. May be
'mean_of_upper_decile' The mean of the upper group of data values defined by the upper tenth of their distribution. May be
'variance' The weighted or unweighted variance of the values, with a given number of degrees of freedom. May be
'standard_deviation' The square root of the weighted or unweighted variance. May be
'root_mean_square' The square root of the weighted or unweighted mean of the squares of the values. May be
'integral' The integral of values. Always
  • Collapse methods that are “Never” weighted ignore the weights parameter, even if it is set.
  • Collapse methods that “May be” weighted will only be weighted if the weights parameter is set.
  • Collapse methods that are “Always” weighted require the weights parameter to be set.
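
For orientation, a few of the never-weighted methods from the table can be written out in plain Python. This is a sketch of the descriptions above for a one-dimensional list with no missing data, not cf-python's own code, and the function names are hypothetical.

```python
import math


def mid_range(values):
    """Average of the maximum and the minimum of the values."""
    return (max(values) + min(values)) / 2


def value_range(values):
    """Absolute difference between the maximum and the minimum."""
    return abs(max(values) - min(values))


def maximum_absolute_value(values):
    """Maximum of the absolute values."""
    return max(abs(v) for v in values)


def root_mean_square(values):
    """Square root of the unweighted mean of the squares."""
    return math.sqrt(sum(v * v for v in values) / len(values))


sample = [1.0, -7.0, 5.0, 1.0]
print(mid_range(sample), value_range(sample),
      maximum_absolute_value(sample), root_mean_square(sample))
```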

An alternative form of providing the collapse method is to provide a CF cell methods-like string. In this case an ordered sequence of collapses may be defined and both the collapse methods and their axes are provided. The axes are interpreted as for the axes parameter, which must not also be set. For example:

>>> g = f.collapse('time: max (interval 1 hr) X: Y: mean dim3: sd')

is equivalent to:

>>> g = f.collapse('max', axes='time')
>>> g = g.collapse('mean', axes=['X', 'Y'])
>>> g = g.collapse('sd', axes='dim3')    

Climatological collapses are carried out if a method string contains any of the modifiers 'within days', 'within years', 'over days' or 'over years'. For example, to collapse a time axis into multiannual means of calendar monthly minima:

>>> g = f.collapse('time: minimum within years T: mean over years',
...                 within_years=cf.M())

which is equivalent to:

>>> g = f.collapse('time: minimum within years', within_years=cf.M())
>>> g = g.collapse('mean over years', axes='T')
axes: (sequence of) str, optional

The axes to be collapsed, defined by those which would be selected by passing each given axis description to a call of the field construct’s domain_axis method. For example, for a value of 'X', the domain axis construct returned by f.domain_axis('X') is selected. If a selected axis has size 1 then it is ignored. By default all axes with size greater than 1 are collapsed.

Parameter example:

axes='X'

Parameter example:

axes=['X']

Parameter example:

axes=['X', 'Y']

Parameter example:

axes=['Z', 'time']

If the axes parameter has the special value 'area' then it is assumed that the X and Y axes are intended.

Parameter example:

axes='area' is equivalent to axes=['X', 'Y'].

Parameter example:

axes=['area', 'Z'] is equivalent to axes=['X', 'Y', 'Z'].

weights: optional

Specify the weights for the collapse. The weights are those that would be returned by this call of the field construct’s weights method: f.weights(weights, measure=measure, scale=scale, components=True). See the measure and scale parameters and cf.Field.weights for details.

Note

By default weights is None, resulting in unweighted calculations.

Parameter example:

To specify weights based on cell areas use weights='area'.

Parameter example:

To specify weights based on cell areas and linearly in time you could set weights=('area', 'T').

measure: bool, optional

Create weights which are cell measures, i.e. which describe actual cell sizes (e.g. cell area) with appropriate units (e.g. metres squared). By default the weights are normalized and have arbitrary units.

Cell measures can be created for any combination of axes. For example, cell measures for a time axis are the time span for each cell with canonical units of seconds; cell measures for the combination of four axes representing time and three dimensional space could have canonical units of metres cubed seconds.

When collapsing with the 'integral' method, measure must be True, and the units of the weights are incorporated into the units of the returned field construct.

Note

Specifying cell volume weights via weights=['X', 'Y', 'Z'] or weights=['area', 'Z'] (or other equivalents) will produce an incorrect result if the vertical dimension coordinates do not define the actual height or depth thickness of every cell in the domain. In this case, weights='volume' should be used instead, which requires the field construct to have a “volume” cell measure construct.
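
The role of cell measures in the 'integral' method can be sketched as a sum of each value multiplied by its cell size. The snippet below is a plain-Python illustration with hypothetical values and bounds, not cf-python code; it assumes a single time axis whose cell sizes are in seconds.

```python
# Hypothetical power values (W) in three time cells, bounds in seconds.
values = [2.0, 4.0, 1.0]
bounds = [(0.0, 3600.0), (3600.0, 7200.0), (7200.0, 9000.0)]

# With measure=True the weights are actual cell sizes (here, seconds).
cell_sizes = [upper - lower for lower, upper in bounds]

# The integral is the sum of value * cell size, so the units of the
# weights enter the result: W s, i.e. joules.
integral = sum(v * size for v, size in zip(values, cell_sizes))
print(integral)
```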

scale: number, optional

If set to a positive number then scale the weights so that they are less than or equal to that number. By default the weights are scaled to lie between 0 and 1 (i.e. scale is 1), and have arbitrary units.

Parameter example:

To scale all weights so that they lie between 0 and 0.5: scale=0.5.
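
One way to satisfy the "less than or equal to scale" condition is to divide by the largest weight and multiply by scale. The sketch below assumes positive weights and uses a hypothetical helper name; the library may arrive at its scaled weights differently.

```python
def scale_weights(weights, scale=1.0):
    """Rescale positive weights so that the largest one equals scale."""
    if scale <= 0:
        raise ValueError("scale must be a positive number")
    largest = max(weights)
    return [w * scale / largest for w in weights]


cell_areas = [2.0, 4.0, 8.0]  # hypothetical, arbitrary units
scaled = scale_weights(cell_areas, scale=0.5)
print(scaled)
```

The relative sizes of the weights are unchanged, so weighted means are unaffected by the choice of scale.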

radius: optional

Specify the radius used for calculating the areas of cells defined in spherical polar coordinates. The radius is that which would be returned by this call of the field construct’s radius method: f.radius(radius). See cf.Field.radius for details.

By default radius is 'earth', which means that if and only if the radius cannot be found from the datums of any coordinate reference constructs, then the default radius is taken as 6371229 metres.

squeeze: bool, optional

If True then size 1 collapsed axes are removed from the output data array. By default the axes which are collapsed are retained in the result’s data array.

mtol: number, optional

Set the fraction of input data elements which is allowed to contain missing data when contributing to an individual output data element. Where this fraction exceeds mtol, missing data is returned. The default is 1, meaning that a missing datum in the output array occurs when its contributing input array elements are all missing data. A value of 0 means that a missing datum in the output array occurs whenever any of its contributing input array elements are missing data. Any intermediate value is permitted.

Parameter example:

To ensure that an output array element is a missing datum if more than 25% of its input array elements are missing data: mtol=0.25.
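
The mtol rule can be expressed as a small plain-Python sketch for a single output element. It is an illustration of the rule as described above rather than the library's code: missing data are represented by None, and the helper name is hypothetical.

```python
def collapse_with_mtol(values, mtol=1.0, method=max):
    """Collapse values (None marks missing data) subject to mtol."""
    present = [v for v in values if v is not None]
    missing_fraction = (len(values) - len(present)) / len(values)
    # Missing output if the missing fraction exceeds mtol, or if there
    # is nothing left to collapse.
    if not present or missing_fraction > mtol:
        return None
    return method(present)


print(collapse_with_mtol([1, None, 3, 2], mtol=0.25))     # 25% missing
print(collapse_with_mtol([1, None, None, 2], mtol=0.25))  # 50% missing
```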

ddof: number, optional

The delta degrees of freedom in the calculation of a standard deviation or variance. The number of degrees of freedom used in the calculation is (N-ddof) where N represents the number of non-missing elements. By default ddof is 1, meaning the standard deviation and variance of the population is estimated according to the usual formula with (N-1) in the denominator to avoid the bias caused by the use of the sample mean (Bessel’s correction).
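
The effect of ddof on an unweighted variance can be sketched as follows. This plain-Python example uses a hypothetical helper and sample data, treats None as missing data, and does not cover the weighted case.

```python
def variance(values, ddof=1):
    """Unweighted variance with (N - ddof) in the denominator."""
    present = [v for v in values if v is not None]
    n = len(present)  # N counts non-missing elements only
    mean = sum(present) / n
    return sum((v - mean) ** 2 for v in present) / (n - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data, ddof=0))  # denominator N
print(variance(data, ddof=1))  # denominator N - 1 (Bessel's correction)
```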

coordinate: str, optional

Set how the cell coordinate values for collapsed axes are defined. This has no effect on the cell bounds for the collapsed axes, which always represent the extrema of the input coordinates. Valid values are:

coordinate     Description
'mid_range'    An output coordinate is the average of the first and last input coordinate bounds (or the first and last coordinates if there are no bounds). This is the default.
'min'          An output coordinate is the minimum of the input coordinates.
'max'          An output coordinate is the maximum of the input coordinates.
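
The three options can be sketched in plain Python for a single collapsed axis. The helper below is hypothetical and assumes that "first and last input coordinate bounds" means the lower bound of the first cell and the upper bound of the last cell.

```python
def collapsed_coordinate(coords, bounds=None, coordinate="mid_range"):
    """Coordinate value of the single cell left after a full collapse."""
    if coordinate == "mid_range":
        if bounds is not None:
            first, last = bounds[0][0], bounds[-1][1]
        else:
            first, last = coords[0], coords[-1]
        return (first + last) / 2
    if coordinate == "min":
        return min(coords)
    if coordinate == "max":
        return max(coords)
    raise ValueError(f"unknown coordinate option: {coordinate!r}")


coords = [5.0, 15.0, 30.0]
bounds = [(0.0, 10.0), (10.0, 20.0), (20.0, 40.0)]
print(collapsed_coordinate(coords, bounds))         # uses the bounds
print(collapsed_coordinate(coords))                 # no bounds available
print(collapsed_coordinate(coords, bounds, "min"))
```
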
group: optional

A grouped collapse is one for which an axis is not collapsed completely to size 1. Instead the collapse axis is partitioned into groups and each group is collapsed to size 1. The resulting axis will generally have more than one element.

The group parameter defines how the elements are partitioned into groups, and may be one of:

  • A Data object defining the group size in terms of ranges of coordinate values. The first group starts at the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate values lie within the group limits (see the group_by parameter).

    Parameter example:

    To define groups of 10 kilometres: group=cf.Data(10, 'km').

    Note:
    • By default each element will be in exactly one group (see the group_by, group_span and group_contiguous parameters).
    • By default groups may contain different numbers of elements.
    • If no units are specified then the units of the coordinates are assumed.
  • A TimeDuration object defining the group size in terms of calendar months and years or other time intervals. The first group starts at or before the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate values lie within the group limits (see the group_by parameter).

    Parameter example:

    To define groups of 5 days, starting and ending at midnight on each day: group=cf.D(5) (see cf.D).

    Parameter example:

    To define groups of 1 calendar month, starting and ending at day 16 of each month: group=cf.M(day=16) (see cf.M).

    Note:
    • By default each element will be in exactly one group (see the group_by, group_span and group_contiguous parameters).
    • By default groups may contain different numbers of elements.
    • The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if group=cf.Y(month=12) then the first group will start on the closest 1st December to the first axis element.
  • A (sequence of) Query, each of which is a condition defining one or more groups. Each query selects elements whose coordinates satisfy its condition and from these elements multiple groups are created - one for each maximally consecutive run within these elements.

    Parameter example:

    To define groups of the season MAM in each year: group=cf.mam() (see cf.mam).

    Parameter example:

    To define groups of the seasons DJF and JJA in each year: group=[cf.jja(), cf.djf()]. To define groups for seasons DJF, MAM, JJA and SON in each year: group=cf.seasons() (see cf.djf, cf.jja and cf.seasons).

    Parameter example:

    To define groups for longitude elements less than or equal to 90 degrees and greater than 90 degrees: group=[cf.le(90, 'degrees'), cf.gt(90, 'degrees')] (see cf.le and cf.gt).

    Note:
    • If a coordinate does not satisfy any of the conditions then its element will not be in a group.
    • By default groups may contain different numbers of elements.
    • If no units are specified then the units of the coordinates are assumed.
    • If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.
  • An int defining the number of elements in each group. The first group starts with the first axis element and spans the defined number of consecutive elements. Each subsequent group immediately follows the preceding one.

    Parameter example:

    To define groups of 5 elements: group=5.

    Note:
    • By default each group has the defined number of elements, apart from the last group which may contain fewer elements (see the group_span parameter).
  • A numpy array of integers defining groups. The array must have the same length as the axis to be collapsed and its sequence of values corresponds to the axis elements. Each group contains the elements which correspond to a common non-negative integer value in the numpy array. Upon output, the collapsed axis is arranged in order of increasing group number. See the regroup parameter, which allows the creation of such a numpy array for a given grouped collapse.

    Parameter example:

    For an axis of size 8, create two groups, the first containing the first and last elements and the second containing the 3rd, 4th and 5th elements, whilst ignoring the 2nd, 6th and 7th elements: group=numpy.array([0, -1, 4, 4, 4, -1, -2, 0]).

    Note:
    • The groups do not have to be in runs of consecutive elements; they may be scattered throughout the axis.
    • An element which corresponds to a negative integer in the array will not be in any group.
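
The integer-array form of grouping described above can be sketched in plain Python. The helper name and data values are hypothetical and a list stands in for the numpy array; the labels are the ones from the parameter example for an axis of size 8.

```python
from collections import defaultdict


def collapse_by_labels(values, labels, method):
    """Collapse each set of elements sharing a non-negative label."""
    if len(values) != len(labels):
        raise ValueError("labels must have the same length as the axis")
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        if label >= 0:  # negative labels are not in any group
            groups[label].append(value)
    # The output axis is ordered by increasing group number.
    return [method(groups[label]) for label in sorted(groups)]


values = [1.0, 9.0, 2.0, 4.0, 6.0, 9.0, 9.0, 3.0]
labels = [0, -1, 4, 4, 4, -1, -2, 0]
print(collapse_by_labels(values, labels, max))
```

Group 0 holds the first and last elements and group 4 holds the 3rd, 4th and 5th, so the output axis has size 2.
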
group_by: str, optional

Specify how coordinates are assigned to the groups defined by the group, within_days or within_years parameter. Ignored unless one of these parameters is a Data or TimeDuration object. The group_by parameter may be one of:

  • 'coords'. This is the default. Each group contains the axis elements whose coordinate values lie within the group limits. Every element will be in a group.
  • 'bounds'. Each group contains the axis elements whose upper and lower coordinate bounds both lie within the group limits. Some elements may not be inside any group, either because the group limits do not coincide with coordinate bounds or because the group size is sufficiently small.
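
The difference between the two options can be sketched in plain Python. The helper is hypothetical and makes several assumptions not stated above: coordinates increase along the axis, each coordinate lies within its own bounds, groups have a fixed numeric size starting at the first lower bound, and a coordinate falling exactly on a group limit belongs to the later group.

```python
def assign_groups(coords, bounds, size, group_by="coords"):
    """Return a group index per element, or None if it is in no group."""
    start = bounds[0][0]
    assigned = []
    for coord, (lower, upper) in zip(coords, bounds):
        index = int((coord - start) // size)
        lo, hi = start + index * size, start + (index + 1) * size
        if group_by == "bounds" and not (lo <= lower and upper <= hi):
            assigned.append(None)  # cell straddles a group limit
        else:
            assigned.append(index)
    return assigned


coords = [2.0, 6.0, 10.0, 14.0]
bounds = [(0.0, 4.0), (4.0, 8.0), (8.0, 12.0), (12.0, 16.0)]
print(assign_groups(coords, bounds, 6.0, group_by="coords"))
print(assign_groups(coords, bounds, 6.0, group_by="bounds"))
```

With 'bounds', the second cell spans the limit at 6.0 and so is left out of every group.
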
group_span: optional

Ignore groups whose span is less than a given value. By default all groups are collapsed, regardless of their size. Groups are defined by the group, within_days or within_years parameter.

In general, the span of a group is the absolute difference between the lower bound of its first element and the upper bound of its last element. The only exception to this occurs if group_span is an integer, in which case the span of a group is the number of elements in the group.

Note:
  • To also ensure that elements within a group are contiguous, use the group_contiguous parameter.

The group_span parameter may be one of:

  • True. Ignore groups whose span is less than the size defined by the group parameter. Only applicable if the group parameter is set to a Data, TimeDuration or int object. If the group parameter is a (sequence of) Query then one of the other options is required.

    Parameter example:

    To collapse into groups of 10 km, ignoring any groups that span less than that distance: group=cf.Data(10, 'km'), group_span=True.

    Parameter example:

    To collapse a daily timeseries into monthly groups, ignoring any groups that span less than 1 calendar month: group=cf.M(), group_span=True (see cf.M).

  • Data. Ignore groups whose span is less than the given size. If no units are specified then the units of the coordinates are assumed.
  • TimeDuration. Ignore groups whose span is less than the given time duration.

    Parameter example:

    To collapse a timeseries into seasonal groups, ignoring any groups that span less than three months: group=cf.seasons(), group_span=cf.M(3) (see cf.seasons and cf.M).

  • int. Ignore groups that contain fewer than the given number of elements.
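
For the simplest case, an int group combined with group_span, the behaviour can be sketched in plain Python. The helper names are hypothetical, and the sketch assumes that group_span=True with an int group means "require the full number of elements".

```python
def chunk(values, n):
    """Split values into consecutive groups of at most n elements."""
    return [values[i:i + n] for i in range(0, len(values), n)]


def grouped_collapse(values, n, method, group_span=None):
    """group_span: None keeps all groups, True requires n elements,
    an int sets the minimum number of elements."""
    minimum = n if group_span is True else (group_span or 0)
    return [method(g) for g in chunk(values, n) if len(g) >= minimum]


values = list(range(12))  # groups of 5, 5 and 2 elements when n=5
print(grouped_collapse(values, 5, max))
print(grouped_collapse(values, 5, max, group_span=True))
print(grouped_collapse(values, 5, max, group_span=2))
```
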
group_contiguous: int, optional

Only applicable to grouped collapses (i.e. the group, within_days or within_years parameter is being used). If set to 1 or 2 then ignore groups whose cells are not contiguous along the collapse axis. By default, group_contiguous is 0, meaning that non-contiguous groups are allowed. The group_contiguous parameter may be one of:

group_contiguous    Description
0                   Allow non-contiguous groups.
1                   Ignore non-contiguous groups, as well as contiguous groups containing overlapping cells.
2                   Ignore non-contiguous groups, allowing contiguous groups containing overlapping cells.
Parameter example:

To ignore non-contiguous groups, as well as any contiguous group containing overlapping cells: group_contiguous=1.

regroup: bool, optional

For grouped collapses, return a numpy.array of integers which identifies the groups defined by the group parameter. The array is interpreted as for a numpy array value of the group parameter, and thus may subsequently be used by the group parameter in a separate collapse. For example:

>>> groups = f.collapse('time: mean', group=10, regroup=True)
>>> g = f.collapse('time: mean', group=groups)

is equivalent to:

>>> g = f.collapse('time: mean', group=10)
within_days: optional

Independently collapse groups of reference-time axis elements for CF “within days” climatological statistics. Each group contains elements whose coordinates span a time interval of up to one day. Upon output, the results of the collapses are concatenated so that the output axis has a size equal to the number of groups.

Note:

For CF compliance, a “within days” collapse should be followed by an “over days” collapse.

The within_days parameter defines how the elements are partitioned into groups, and may be one of:

  • A TimeDuration defining the group size in terms of a time interval of up to one day. The first group starts at or before the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate values lie within the group limits (see the group_by parameter).

    Parameter example:

    To define groups of 6 hours, starting at 00:00, 06:00, 12:00 and 18:00: within_days=cf.h(6) (see cf.h).

    Parameter example:

    To define groups of 1 day, starting at 06:00: within_days=cf.D(1, hour=6) (see cf.D).

    Note:
    • Groups may contain different numbers of elements.
    • The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if group=cf.D(hour=12) then the first group will start on the closest midday to the first axis element.
  • A (sequence of) Query, each of which is a condition defining one or more groups. Each query selects elements whose coordinates satisfy its condition and from these elements multiple groups are created - one for each maximally consecutive run within these elements.

    Parameter example:

    To define groups of 00:00 to 06:00 within each day, ignoring the rest of each day: within_days=cf.hour(cf.le(6)) (see cf.hour and cf.le).

    Parameter example:

    To define groups of 00:00 to 06:00 and 18:00 to 24:00 within each day, ignoring the rest of each day: within_days=[cf.hour(cf.le(6)), cf.hour(cf.gt(18))] (see cf.gt, cf.hour and cf.le).

    Note:
    • Groups may contain different numbers of elements.
    • If no units are specified then the units of the coordinates are assumed.
    • If a coordinate does not satisfy any of the conditions then its element will not be in a group.
    • If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.
within_years: optional

Independently collapse groups of reference-time axis elements for CF “within years” climatological statistics. Each group contains elements whose coordinates span a time interval of up to one calendar year. Upon output, the results of the collapses are concatenated so that the output axis has a size equal to the number of groups.

Note:

For CF compliance, a “within years” collapse should be followed by an “over years” collapse.

The within_years parameter defines how the elements are partitioned into groups, and may be one of:

  • A TimeDuration defining the group size in terms of a time interval of up to one calendar year. The first group starts at or before the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate values lie within the group limits (see the group_by parameter).

    Parameter example:

    To define groups of 90 days: within_years=cf.D(90) (see cf.D).

    Parameter example:

    To define groups of 3 calendar months, starting on the 15th of a month: within_years=cf.M(3, day=15) (see cf.M).

    Note:
    • Groups may contain different numbers of elements.
    • The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if group=cf.Y(month=12) then the first group will start on the closest 1st December to the first axis element.
  • A (sequence of) Query, each of which is a condition defining one or more groups. Each query selects elements whose coordinates satisfy its condition and from these elements multiple groups are created - one for each maximally consecutive run within these elements.

    Parameter example:

    To define groups for the season MAM within each year: within_years=cf.mam() (see cf.mam).

    Parameter example:

    To define groups for February and for November to December within each year: within_years=[cf.month(2), cf.month(cf.ge(11))] (see cf.month and cf.ge).

    Note:
    • The first group may start outside of the range of coordinates (the start of the first group is controlled by parameters of the TimeDuration).
    • If group boundaries do not coincide with coordinate bounds then some elements may not be inside any group.
    • If the group size is sufficiently small then some elements may not be inside any group.
    • Groups may contain different numbers of elements.
over_days: optional

Independently collapse groups of reference-time axis elements for CF “over days” climatological statistics. Each group contains elements whose coordinates are matching, in that their lower bounds have a common time of day but different dates of the year, and their upper bounds also have a common time of day but different dates of the year. Upon output, the results of the collapses are concatenated so that the output axis has a size equal to the number of groups.

Parameter example:

An element with coordinate bounds {1999-12-31 06:00:00, 1999-12-31 18:00:00} matches an element with coordinate bounds {2000-01-01 06:00:00, 2000-01-01 18:00:00}.

Parameter example:

An element with coordinate bounds {1999-12-31 00:00:00, 2000-01-01 00:00:00} matches an element with coordinate bounds {2000-01-01 00:00:00, 2000-01-02 00:00:00}.
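
The matching rule above can be sketched in plain Python. This is an illustration under the assumption that bounds are ordinary `datetime` objects; `over_days_key` is a hypothetical helper, not part of cf, and the library's own logic may differ in detail.

```python
from collections import defaultdict
from datetime import datetime


def over_days_key(lower, upper):
    """Elements match when both bounds share a time of day,
    whatever their dates."""
    return (lower.time(), upper.time())


# Four 12-hourly cells spanning two days
bounds = [
    (datetime(1999, 12, 31, 6), datetime(1999, 12, 31, 18)),
    (datetime(1999, 12, 31, 18), datetime(2000, 1, 1, 6)),
    (datetime(2000, 1, 1, 6), datetime(2000, 1, 1, 18)),
    (datetime(2000, 1, 1, 18), datetime(2000, 1, 2, 6)),
]

collections = defaultdict(list)
for index, (lower, upper) in enumerate(bounds):
    collections[over_days_key(lower, upper)].append(index)

matching = list(collections.values())
print(matching)  # [[0, 2], [1, 3]]
```

With over_days=None each of these collections of matching elements would form one group, so the output axis in this sketch would have size 2.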

Note:
  • A coordinate parameter value of 'min' is assumed, regardless of its given value.

  • A group_by parameter value of 'bounds' is assumed, regardless of its given value.

  • An “over days” collapse must be preceded by a “within days” collapse, as described by the CF conventions. If the field already contains sub-daily data, but does not have the “within days” cell methods flag then it may be added, for example, as follows (this example assumes that the appropriate cell method is the most recently applied, which need not be the case; see cf.CellMethods for details):

    >>> f.cell_methods[-1].within = 'days'
    

The over_days parameter defines how the elements are partitioned into groups, and may be one of:

  • None. This is the default. Each collection of matching elements forms a group.

  • A TimeDuration object defining the group size in terms of a time duration of at least one day. Multiple groups are created from each collection of matching elements - the first of which starts at or before the first coordinate bound of the first element and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the matching elements whose coordinate values lie within the group limits (see the group_by parameter).

    Parameter example:

    To define groups spanning 90 days: over_days=cf.D(90) or over_days=cf.h(2160) (see cf.D and cf.h).

    Parameter example:

    To define groups spanning 3 calendar months, starting and ending at 06:00 in the first day of each month: over_days=cf.M(3, hour=6) (see cf.M).

    Note:
    • Groups may contain different numbers of elements.
    • The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if group=cf.M(day=15) then the first group will start on the closest 15th of a month to the first axis element.
  • A (sequence of) Query, each of which is a condition defining one or more groups. Each query selects elements whose coordinates satisfy its condition and from these elements multiple groups are created - one for each subset of matching elements.

    Parameter example:

    To define groups for January and for June to December, ignoring all other months: over_days=[cf.month(1), cf.month(cf.wi(6, 12))] (see cf.month and cf.wi).

    Note:
    • If a coordinate does not satisfy any of the conditions then its element will not be in a group.
    • Groups may contain different numbers of elements.
    • If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.
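
The notes on a sequence of queries (unselected elements are dropped, and the latest query wins on overlap) can be sketched as follows. This is a simplified illustration in plain Python: `assign_groups` is a hypothetical helper that maps each element to a query index only, and the lambdas stand in for cf.Query conditions.

```python
from datetime import date


def assign_groups(coords, conditions):
    """Map element index -> index of the last condition selecting it.
    Elements selected by no condition are absent from the result."""
    assignment = {}
    for query_index, condition in enumerate(conditions):
        for element_index, coord in enumerate(coords):
            if condition(coord):
                # A later query overwrites an earlier one
                assignment[element_index] = query_index
    return assignment


coords = [date(2000, month, 1) for month in range(1, 13)]

# Stand-ins for [cf.month(1), cf.month(cf.wi(6, 12))]
assignment = assign_groups(
    coords,
    [lambda d: d.month == 1, lambda d: 6 <= d.month <= 12],
)

# Overlapping conditions: December satisfies both, the later one wins
overlap = assign_groups(
    coords,
    [lambda d: d.month >= 6, lambda d: d.month == 12],
)
print(overlap[11])  # 1
```

In the first call February to May (indices 1 to 4) satisfy neither condition and therefore appear in no group.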
over_years: optional

Independently collapse groups of reference-time axis elements for CF “over years” climatological statistics. Each group contains elements whose coordinates are matching, in that their lower bounds have a common sub-annual date but different years, and their upper bounds also have a common sub-annual date but different years. Upon output, the results of the collapses are concatenated so that the output axis has a size equal to the number of groups.

Parameter example:

An element with coordinate bounds {1999-06-01 06:00:00, 1999-09-01 06:00:00} matches an element with coordinate bounds {2000-06-01 06:00:00, 2000-09-01 06:00:00}.

Parameter example:

An element with coordinate bounds {1999-12-01 00:00:00, 2000-12-01 00:00:00} matches an element with coordinate bounds {2000-12-01 00:00:00, 2001-12-01 00:00:00}.
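
As a rough check of the two examples above, the "common sub-annual date" test can be written in plain Python. This is a sketch assuming `datetime` bounds in a standard calendar; `sub_annual` and `matches_over_years` are hypothetical names, not cf functions.

```python
from datetime import datetime


def sub_annual(d):
    """The part of a date-time that ignores the year."""
    return (d.month, d.day, d.time())


def matches_over_years(cell_a, cell_b):
    """True if the lower bounds share a sub-annual date and the
    upper bounds share a sub-annual date."""
    (lower_a, upper_a), (lower_b, upper_b) = cell_a, cell_b
    return (sub_annual(lower_a) == sub_annual(lower_b)
            and sub_annual(upper_a) == sub_annual(upper_b))


summer_1999 = (datetime(1999, 6, 1, 6), datetime(1999, 9, 1, 6))
summer_2000 = (datetime(2000, 6, 1, 6), datetime(2000, 9, 1, 6))
year_1999 = (datetime(1999, 12, 1), datetime(2000, 12, 1))
year_2000 = (datetime(2000, 12, 1), datetime(2001, 12, 1))

print(matches_over_years(summer_1999, summer_2000))  # True
print(matches_over_years(year_1999, year_2000))      # True
print(matches_over_years(summer_1999, year_1999))    # False
```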

Note:
  • A coordinate parameter value of 'min' is assumed, regardless of its given value.

  • A group_by parameter value of 'bounds' is assumed, regardless of its given value.

  • An “over years” collapse must be preceded by a “within years” or an “over days” collapse, as described by the CF conventions. If the field already contains sub-annual data, but does not have the “within years” or “over days” cell methods flag then it may be added, for example, as follows (this example assumes that the appropriate cell method is the most recently applied, which need not be the case; see cf.CellMethods for details):

    >>> f.cell_methods[-1].over = 'days'
    

The over_years parameter defines how the elements are partitioned into groups, and may be one of:

  • None. Each collection of matching elements forms a group. This is the default.
  • A TimeDuration object defining the group size in terms of a time interval of at least one calendar year. Multiple groups are created from each collection of matching elements - the first of which starts at or before the first coordinate bound of the first element and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the matching elements whose coordinate values lie within the group limits (see the group_by parameter).

    Parameter example:

    To define groups spanning 10 calendar years: over_years=cf.Y(10) or over_years=cf.M(120) (see cf.M and cf.Y).

    Parameter example:

    To define groups spanning 5 calendar years, starting and ending at 06:00 on 01 December of each year: over_years=cf.Y(5, month=12, hour=6) (see cf.Y).

    Note:
    • Groups may contain different numbers of elements.
    • The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if group=cf.Y(month=12) then the first group will start on the closest 1st December to the first axis element.
  • A (sequence of) Query, each of which is a condition defining one or more groups. Each query selects elements whose coordinates satisfy its condition and from these elements multiple groups are created - one for each subset of matching elements.

    Parameter example:

    To define one group spanning 1981 to 1990 and another spanning 2001 to 2005: over_years=[cf.year(cf.wi(1981, 1990)), cf.year(cf.wi(2001, 2005))] (see cf.year and cf.wi).

    Note:
    • If a coordinate does not satisfy any of the conditions then its element will not be in a group.
    • Groups may contain different numbers of elements.
    • If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.
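
The way a time duration with an offset, such as cf.Y(5, month=12, hour=6), lays out successive group limits can be sketched in plain Python. This is an illustration only: `group_limits` is a hypothetical helper, and it assumes that "closest" means the latest offset date at or before the first bound, which the library may resolve differently.

```python
from datetime import datetime


def group_limits(first_bound, last_bound, years, month=1, day=1, hour=0):
    """Return (start, end) limits of back-to-back groups, each
    spanning `years` calendar years from the offset date."""
    # Assumption: start at the latest offset date at or before first_bound
    start = datetime(first_bound.year, month, day, hour)
    if start > first_bound:
        start = start.replace(year=start.year - 1)

    limits = []
    while start < last_bound:
        end = start.replace(year=start.year + years)
        limits.append((start, end))
        start = end  # each group immediately follows the preceding one
    return limits


limits = group_limits(
    datetime(1980, 1, 1), datetime(1990, 1, 1), years=5, month=12, hour=6
)
for start, end in limits:
    print(start, "to", end)
# 1979-12-01 06:00:00 to 1984-12-01 06:00:00
# 1984-12-01 06:00:00 to 1989-12-01 06:00:00
# 1989-12-01 06:00:00 to 1994-12-01 06:00:00
```

The first group begins before the first bound, and the first and last groups are only partly covered by the data, which is why groups may contain different numbers of elements.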
inplace: bool, optional

If True then do the operation in-place and return None.

kwargs: deprecated at version 3.0.0

i: deprecated at version 3.0.0

Use the inplace parameter instead.

Returns:
Field or numpy.ndarray

The collapsed field. Alternatively, if the regroup parameter is True then a numpy array is returned.

Examples:

See the online documentation for further worked examples: https://ncas-cms.github.io/cf-python/tutorial.html#statistical-collapses