cf.aggregate

cf.aggregate(fields, verbose=None, relaxed_units=False, overlap=True, contiguous=False, relaxed_identities=False, ncvar_identities=False, respect_valid=False, equal_all=False, exist_all=False, equal=None, exist=None, ignore=None, exclude=False, dimension=(), concatenate=True, copy=True, axes=None, donotchecknonaggregatingaxes=False, allow_no_identity=False, atol=None, rtol=None, no_overlap=False, field_identity=None, field_ancillaries=None, cells=None, info=False)

Aggregate field constructs into as few field constructs as possible.

Aggregation is the combination of field constructs to create a new field construct that occupies a “larger” domain. Using the aggregation rules, field constructs are separated into aggregatable groups and each group is then aggregated to a single field construct.

Identifying field and metadata constructs

In order to ascertain whether or not field constructs are aggregatable, the aggregation rules rely on field constructs (and their metadata constructs where applicable) being identified by standard name properties. However, standard names are sometimes not available. In such cases the id attribute (which is not a CF property) may be set on any construct, and it will then be treated like a standard name if the construct has no standard name property.

Alternatively the relaxed_identities parameter allows long name properties or netCDF variable names to be used when standard names are missing; the field_identity parameter forces the field construct identities to be taken from a particular property; and the ncvar_identities parameter forces field and metadata constructs to be identified by their netCDF file variable names.

Parameters
fields: sequence of Field, or sequence of Domain

The field or domain constructs to aggregate.

verbose: int or str or None, optional

If an integer from -1 to 3, or an equivalent string equal ignoring case to one of:

  • 'DISABLE' (0)

  • 'WARNING' (1)

  • 'INFO' (2)

  • 'DETAIL' (3)

  • 'DEBUG' (-1)

set for the duration of the method call only as the minimum cut-off for the verboseness level of displayed output (log) messages, regardless of the globally-configured cf.log_level. Note that increasing numerical value corresponds to increasing verbosity, with the exception of -1 as a special case of maximal and extreme verbosity.

Otherwise, if None (the default value), output messages will be shown according to the value of the cf.log_level setting.

Overall, the higher the non-negative integer (or equivalent string) that is set, up to a maximum of 3/'DETAIL', the more information is printed about the aggregation process. Explicitly:

  • 0: No information is displayed.

  • 1: Display information on which fields are unaggregatable, and why.

  • 2: As well as the above, display the structural signatures of the fields and, when there is more than one field construct with the same structural signature, their canonical first and last coordinate values.

  • 3/-1: As well as the above, display the field construct’s complete aggregation metadata.
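The mapping from a verbose value to a verbosity level can be sketched in plain Python. The helpers normalise_verbose and shows_unaggregatable_reasons below are hypothetical illustrations of the rules listed above, not part of cf-python; in practice one would simply pass, for example, verbose='INFO' to cf.aggregate.

```python
# Hypothetical helpers (not part of cf-python) modelling the verbose rules.
_LEVELS = {"DISABLE": 0, "WARNING": 1, "INFO": 2, "DETAIL": 3, "DEBUG": -1}


def normalise_verbose(verbose):
    """Return the integer verbosity level, or None to defer to cf.log_level."""
    if verbose is None:
        return None  # defer to the globally configured level
    if isinstance(verbose, str):
        level = _LEVELS.get(verbose.upper())  # strings match ignoring case
        if level is None:
            raise ValueError(f"unknown verbosity string: {verbose!r}")
        return level
    if isinstance(verbose, int) and -1 <= verbose <= 3:
        return verbose
    raise ValueError(f"invalid verbose value: {verbose!r}")


def shows_unaggregatable_reasons(verbose):
    """True if unaggregatable fields would be reported (level 1, 2, 3 or -1).

    None defers to cf.log_level, which this model does not consult.
    """
    level = normalise_verbose(verbose)
    return level is not None and level != 0
```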

overlap: bool, optional

If False then require that aggregated field constructs have adjacent dimension coordinate construct cells which do not overlap (but they may share common boundary values). Ignored for a dimension coordinate construct that does not have bounds. See also the contiguous parameter.

contiguous: bool, optional

If True then require that the dimension coordinates of an aggregated field have no “gaps” (defined below) between neighbouring cells that came from different input fields.

By default, or if contiguous is False, gaps may occur between neighbouring cells that came from different input fields.

For aggregated dimension coordinates with bounds and non-zero cell sizes, a gap is when neighbouring cells originating from different input fields neither share common boundary values nor overlap each other.

For aggregated dimension coordinates without bounds, or with bounds specifying zero cell sizes, the concept of a gap is generally ill-defined. In this case there is no restriction on the neighbouring cells originating from different input fields (i.e. contiguous is effectively taken as False, regardless of its setting). However, if the contiguous parameter is True and a coordinate spacing condition defined by the cells parameter has also been passed, then the concept of a “gap” becomes well defined: a gap now occurs when the difference between neighbouring coordinates originating from different input fields does not meet the coordinate spacing condition. In this special case an aggregated field will also have the specified coordinate spacing between neighbouring cells that originated from different input fields.

Note

An aggregated field may still have gaps between neighbouring cells that came from the same input field, regardless of the value of contiguous. However, such gaps may be controlled with a cell coordinate spacing condition defined by the cells parameter.
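The following pure-Python sketch models how the overlap and contiguous parameters constrain the junction between the last cell of one input field and the first cell of the next. It assumes bounded, monotonically increasing coordinates given as plain numbers, and uses math.isclose as a rough stand-in for the atol and rtol tolerances. The function names are hypothetical and this is not cf-python's implementation.

```python
import math


def classify_junction(last_bounds, first_bounds):
    """Classify the junction between the last cell of one input field and
    the first cell of the next (coordinates assumed to be increasing)."""
    upper = max(last_bounds)   # upper bound of the earlier cell
    lower = min(first_bounds)  # lower bound of the later cell
    if math.isclose(lower, upper):
        return "adjacent"  # the cells share a common boundary value
    if lower < upper:
        return "overlap"
    return "gap"


def junction_allowed(last_bounds, first_bounds, overlap=True, contiguous=False):
    """Apply the overlap and contiguous rules to a single junction."""
    kind = classify_junction(last_bounds, first_bounds)
    if kind == "overlap" and not overlap:
        return False
    if kind == "gap" and contiguous:
        return False
    return True
```

With the defaults both overlaps and gaps are tolerated; overlap=False rejects the first and contiguous=True rejects the second, while shared boundaries are always acceptable.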

relaxed_units: bool, optional

If True then assume that field and metadata constructs with the same identity but missing units actually have equivalent (but unspecified) units, so that aggregation may occur. Also assumes that invalid but otherwise equal units are equal. By default such field constructs are not aggregatable.

allow_no_identity: bool, optional

If True then assume that field and metadata constructs with no identity (see the relaxed_identities parameter) actually have the same (but unspecified) identity, so that aggregation may occur. By default such field constructs are not aggregatable.

relaxed_identities: bool, optional

If True, and there is neither a standard name property nor an “id” attribute, then allow field and metadata constructs to be identifiable by long name properties or netCDF variable names. Also allows netCDF dimension names to be used when there are no spanning 1-d coordinates.

field_identity: str, optional

Specify a property with which to identify field constructs instead of any other technique. How metadata constructs are identified is not affected by this parameter. See the relaxed_identities and ncvar_identities parameters.

Parameter example:

Force field constructs to be identified by the values of their long_name properties: field_identity='long_name'

New in version 3.1.0.

ncvar_identities: bool, optional

If True then force field and metadata constructs to be identified by their netCDF file variable names. See also the relaxed_identities parameter.
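A minimal sketch of the identification precedence described above, assuming a construct's properties (plus its id attribute, stored here under the key "id") are held in a plain dict and its netCDF variable name is passed separately. The fallback order (long_name before the netCDF variable name) and the precedence of ncvar_identities over field_identity are assumptions made for illustration, and construct_identity is a hypothetical name. The sketch also ignores the fact that field_identity applies only to field constructs.

```python
def construct_identity(properties, ncvar=None, *, relaxed_identities=False,
                       ncvar_identities=False, field_identity=None):
    """Return an identity string, or None if the construct is unidentifiable.

    Hypothetical model: ``properties`` maps CF property names (and "id")
    to values; ``ncvar`` is the netCDF variable name, if any.
    """
    if ncvar_identities:
        return ncvar  # netCDF variable names are forced
    if field_identity is not None:
        return properties.get(field_identity)  # one named property only
    for key in ("standard_name", "id"):
        if key in properties:
            return properties[key]  # id acts like a missing standard name
    if relaxed_identities:
        if "long_name" in properties:
            return properties["long_name"]
        return ncvar
    return None
```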

equal_all: bool, optional

If True then require that aggregated fields have the same set of non-standard CF properties (including long_name), with the same values. See the concatenate parameter.

equal: (sequence of) str, optional

Specify CF properties for which it is required that aggregated fields all contain the properties, with the same values. See the concatenate parameter.

exist_all: bool, optional

If True then require that aggregated fields have the same set of non-standard CF properties (including, in this case, long_name), but not requiring the values to be the same. See the concatenate parameter.

exist: (sequence of) str, optional

Specify CF properties for which it is required that aggregated fields all contain the properties, but not requiring the values to be the same. See the concatenate parameter.

ignore: (sequence of) str, optional

Specify CF properties to omit from any properties specified by or implied by the equal_all, exist_all, equal and exist parameters.
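The equal, exist and ignore requirements can be modelled for a group of candidate fields, each represented here as a plain dict of CF properties. This is a hypothetical sketch rather than cf-python code; equal_all and exist_all are left out because they depend on which properties count as standard CF properties.

```python
def _as_tuple(names):
    """Normalise a '(sequence of) str' argument to a tuple of names."""
    if names is None:
        return ()
    return (names,) if isinstance(names, str) else tuple(names)


def properties_compatible(fields, equal=None, exist=None, ignore=None):
    """Check the equal/exist requirements for a group of fields, each
    modelled as a dict of CF properties."""
    ignored = set(_as_tuple(ignore))
    for name in _as_tuple(equal):
        if name in ignored:
            continue
        if any(name not in f for f in fields):
            return False  # every field must have the property...
        values = [f[name] for f in fields]
        if any(v != values[0] for v in values[1:]):
            return False  # ...with the same value
    for name in _as_tuple(exist):
        if name in ignored:
            continue
        if any(name not in f for f in fields):
            return False  # presence is enough; values may differ
    return True
```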

exclude: bool, optional

If True then do not return unaggregatable field constructs. By default, all input field constructs are represented in the output.

respect_valid: bool, optional

If True then the CF properties valid_min, valid_max and valid_range are taken into account during aggregation, i.e. a requirement for aggregation is that fields have identical values for each of these properties, if set. By default these CF properties are ignored and are not set in the output fields.

dimension: (sequence of) str, optional

Create new axes for each input field which has one or more of the given properties. For each CF property name specified, if an input field has the property then, prior to aggregation, a new axis is created with an auxiliary coordinate whose datum is the property’s value and the property itself is deleted from that field.
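The effect of the dimension parameter on a single input field can be sketched with a toy data model in which a field is a dict holding "properties", an ordered list of ("name", size) "axes" and a mapping of "auxiliary_coordinates". The data model and the function name are hypothetical; placing the new axis first mirrors the source(1) axis in the example at the end of this page but is an assumption here.

```python
import copy


def promote_properties_to_axes(field, dimension):
    """Return a copy of ``field`` in which each named property it has becomes
    a new size-1 axis with a one-element auxiliary coordinate."""
    names = (dimension,) if isinstance(dimension, str) else tuple(dimension)
    new = copy.deepcopy(field)  # leave the input unchanged
    for name in names:
        if name in new["properties"]:
            value = new["properties"].pop(name)  # the property is deleted
            new["axes"].insert(0, (name, 1))  # new leading size-1 axis
            new["auxiliary_coordinates"][name] = [value]  # its single datum
    return new
```

Fields that lack the property are returned unchanged, so they gain no new axis.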

concatenate: bool, optional

If False then a CF property is omitted from an aggregated field if the property has unequal values across constituent fields or is missing from at least one constituent field. By default a CF property in an aggregated field is the concatenated collection of the distinct values from the constituent fields, delimited with the string ' :AGGREGATED: '.
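A sketch of how one property's values from the constituent fields might be combined under the two settings of concatenate. It is a hypothetical model, not cf-python's code: None stands for a property missing from a field, distinct values are kept in first-seen order, and missing values are simply skipped when concatenating (the text above does not say how they are handled).

```python
DELIMITER = " :AGGREGATED: "


def combine_property(values, concatenate=True):
    """Combine one property's values (one entry per constituent field).

    Returns None when the property is omitted from the aggregated field.
    """
    present = [v for v in values if v is not None]
    distinct = list(dict.fromkeys(present))  # de-duplicate, keep order
    if not concatenate:
        # Omit the property if it is missing anywhere or its values differ.
        if len(present) < len(values) or len(distinct) > 1:
            return None
        return distinct[0] if distinct else None
    if not distinct:
        return None
    return DELIMITER.join(str(v) for v in distinct)
```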

copy: bool, optional

If False then do not copy fields prior to aggregation. Setting this option to False may change input fields in place, and the output fields may not be independent of the inputs. However, if it is known that the input fields are never to be accessed again (such as in the case of f = cf.aggregate(f)) then setting copy to False can reduce the time taken for aggregation.

axes: (sequence of) str, optional

Select axes to aggregate over. Aggregation will only occur over as large a subset as possible of these axes. Each axis is identified by the exact identity of a one dimensional coordinate object, as returned by its identity method. Aggregations over more than one axis will occur in the order given. By default, aggregation will be over as many axes as possible.

donotchecknonaggregatingaxes: bool, optional

If True, and axes is set, then checks for consistent data array values will only be made for one dimensional coordinate objects which span any of the given aggregating axes. This can reduce the time taken for aggregation, but if any of those checks would have failed then this clearly allows the possibility of an incorrect result. Therefore, this option should only be used in cases for which it is known that the non-aggregating axes are in fact already entirely consistent.

atol: number, optional

The tolerance on absolute differences between real numbers. The default value is set by the cf.atol function.

rtol: number, optional

The tolerance on relative differences between real numbers. The default value is set by the cf.rtol function.

field_ancillaries: (sequence of) str, optional

Create new field ancillary constructs for each input field which has one or more of the given properties. For each input field, each property is converted to a field ancillary construct that spans the entire domain, with the constant value of the property, and the property itself is deleted.

New in version 3.15.0.

cells: dict or None, optional

Provide conditions for dimension coordinate cells so that input field or domain constructs whose dimension coordinates match particular conditions will be aggregated separately from those which don’t. All other aggregation criteria apply as normal. This can be used, for instance, to ensure that monthly and daily averages of the same physical quantity are not aggregated together.

Field or domain constructs that match any of the given conditions are otherwise aggregated in the usual manner, as are those which don’t match any of the given conditions.

Conditions format

The conditions are specified in a dictionary for which each key is a dimension coordinate identity, with a corresponding value of one or more conditions on the dimension coordinate cell sizes and/or coordinate spacings. For instance, the cells dictionary {'T': {'cellsize': cf.D()}} will cause fields or domains with time coordinates ('T') whose cells all span 1 day (cf.D()) to be aggregated separately from all others.

A dictionary key selects a dimension coordinate construct from each input field or domain construct by passing the key to its dimension_coordinate method. For example, a key of 'T' will select the dimension coordinate construct returned by f.dimension_coordinate('T'). If no such dimension coordinate construct exists, or if a dimension coordinate construct exists but none of the corresponding conditions are passed, then no special aggregation consideration is given to that axis for that field or domain. The dictionary may have any number of keys, defining conditions for any number of dimension coordinates. If multiple keys match the identity of the same dimension coordinate construct then the conditions corresponding to the first such key encountered when iterating through the dictionary are used.

A dictionary value defines the dimension coordinate conditions as one, or an ordered sequence of, the following:

  • A condition for the cell size (i.e. the absolute difference between the cell bounds) given as {'cellsize': <condition1>}.

  • A condition for the cell coordinate spacing (i.e. the absolute difference between two neighbouring coordinate values) given as {'spacing': <condition2>}.

  • Simultaneous conditions for the cell size and the cell coordinate spacing are given as {'cellsize': <condition1>, 'spacing': <condition2>} (with arbitrary key order).

where <condition1> and <condition2> must each be one of a Query, TimeDuration, scalar Data, scalar data_like object, or None. A condition of None is equivalent to that condition not being defined (which may be a useful setting for conditions that are generated automatically).
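The meaning of the cell size and coordinate spacing tests can be illustrated with plain numbers. In this hypothetical sketch units are ignored (all values are in days), a callable stands in for a Query such as cf.wi, a plain number is compared with math.isclose, and a condition of None contributes no test. It is a model of the definitions above, not cf-python's implementation.

```python
import math


def cell_sizes(bounds):
    """Cell sizes: absolute difference between each cell's two bounds."""
    return [abs(upper - lower) for lower, upper in bounds]


def spacings(coords):
    """Coordinate spacings: absolute difference between neighbouring values."""
    return [abs(b - a) for a, b in zip(coords, coords[1:])]


def all_match(values, condition):
    """True if every value satisfies ``condition``."""
    if condition is None:
        return True  # an undefined condition imposes no test
    if callable(condition):
        return all(condition(v) for v in values)  # Query-like predicate
    return all(math.isclose(v, condition) for v in values)


def passes(bounds, coords, cellsize=None, spacing=None):
    """Model of a {'cellsize': ..., 'spacing': ...} condition (units: days)."""
    return all_match(cell_sizes(bounds), cellsize) and all_match(
        spacings(coords), spacing
    )
```

For example, daily means with bounds [(0, 1), (1, 2), (2, 3)] pass cellsize=1, and 6-hourly instantaneous values have zero cell size with a spacing of 0.25 days.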

Note

The TimeDuration conditions cf.M() (1 calendar month) and cf.Y() (1 calendar year) may be used, and are interpreted internally as the Query conditions cf.wi(28, 31, 'days') and cf.wi(360, 366, 'days') respectively.

Note

Using a cf.isclose query condition allows for control of the test’s sensitivity to floating point precision and rounding errors. See also the rtol and atol parameters.

Units

Units must be provided on the conditions where applicable, since conditions without defined units will not match dimension coordinate constructs with defined units.

Multiple conditions

Multiple conditions for the same dimension coordinate construct may be defined by providing an ordered sequence of conditions. In this case, the conditions are tested in order, with the first one to be passed (if any) defining the aggregation separation for each input field or domain.

Coordinate spacing conditions

If a coordinate spacing condition has been passed then, by default, it does not apply to the spacing between neighbouring coordinates from different input fields. However, if the contiguous parameter is also True then this will ensure that aggregated fields will have the specified cell coordinate spacing throughout. See the contiguous parameter for more details.

Note

Potentially unexpected results might occur in the particular circumstance of multiple coordinate spacing conditions being applied to aggregatable input fields for which some, but not all, have a size 1 aggregation axis. The concept of cell coordinate spacing is undefined for the size 1 dimension coordinates and so they will pass any coordinate spacing condition, which in practice means they pass the first in the sequence. If the dimension coordinates with size greater than 1 also pass the first condition then the aggregation will proceed as expected, but if they pass one of the other coordinate spacing conditions then the fields with size 1 dimension coordinates will be aggregated separately.
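The "first condition passed" rule, together with the size 1 behaviour described in the note above, can be modelled as follows. The function names are hypothetical, coordinates are plain numbers in days, and only spacing conditions are modelled. Because a size 1 coordinate has no spacings, the all() over an empty list is True, so it passes whichever condition comes first.

```python
import math


def spacing_ok(coords, condition):
    """True if all neighbouring spacings match ``condition``.

    A size 1 coordinate has no spacings, so it passes any condition.
    """
    steps = [abs(b - a) for a, b in zip(coords, coords[1:])]
    return all(math.isclose(s, condition) for s in steps)


def first_passing(coords, conditions):
    """Index of the first condition passed, or None if none is passed.

    Inputs with different indices are aggregated separately.
    """
    for index, condition in enumerate(conditions):
        if spacing_ok(coords, condition):
            return index
    return None
```

With conditions [1, 30], a size 1 input at day 90 is assigned index 0 while neighbouring inputs with 30-day spacing are assigned index 1, so they are not aggregated together.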

Climatological time cells

As a convenience, the configurable cf.climatology_cells function returns a cells dictionary that may be suitable for the time axis aggregation of typical climate model simulation outputs:

>>> x = cf.aggregate(fl, cells=cf.climatology_cells())

Storage of conditions

All returned field or domain constructs that have passed dimension coordinate cell conditions will have those conditions stored on the appropriate dimension coordinate constructs, retrievable via their DimensionCoordinate.get_cell_characteristics methods.

Performance

The testing of the conditions has a computational overhead, as well as an I/O overhead if the dimension coordinate data are on disk. Try to avoid setting redundant conditions. For instance, if the inputs comprise monthly mean air temperature and daily mean precipitation fields, then the different field identities alone will ensure a correct aggregation. In this case, adding cell conditions of {'T': [{'cellsize': cf.D()}, {'cellsize': cf.M()}]} will not change the result, but tests will still be carried out.

When setting a sequence of conditions, performance will be improved if the conditions towards the beginning of the sequence are those that are expected to be passed by the dimension coordinate constructs with the largest data arrays. This is because the conditions are tested in order until a condition passes, and subsequent conditions are not tested. Therefore, this strategy will minimise the number of the most expensive tests, i.e. those on the largest data.

Parameter example

Equivalent ways to separate time cells of 1 day from other time cell sizes: {'T': {'cellsize': cf.D()}}, {'T': {'cellsize': cf.eq(1, 'day')}}, {'T': {'cellsize': cf.isclose(1, 'day')}}, {'T': {'cellsize': cf.Data(1, 'day')}}, {'T': {'cellsize': cf.h(24)}}, etc.

Parameter example

Equivalent ways to separate time cells of 1 month, in any calendar, from other time cell sizes: {'T': {'cellsize': cf.M()}}, {'T': {'cellsize': cf.wi(28, 31, 'day')}}.

Parameter example

To separate horizontal cells with size (2.5 degrees north, 3.75 degrees east): {'Y': {'cellsize': cf.Data(2.5, 'degreeN')}, 'X': {'cellsize': cf.Data(3.75, 'degreeE')}}.

Parameter example

To aggregate time cells of 1 day separately and time cells of 30 days separately, a sequence of two cell size conditions is provided: {'T': [{'cellsize': cf.D(1)}, {'cellsize': cf.D(30)}]}.

Parameter example

To aggregate 6-hourly instantaneous time cells, specify a cellsize of zero: {'T': {'cellsize': cf.h(0), 'spacing': cf.h(6)}}.

Parameter example

To separate time cells of 5-day running means given for consecutive days: {'T': {'cellsize': cf.D(5), 'spacing': cf.D(1)}}.

New in version 3.15.2.

no_overlap: deprecated at version 3.0.0

Use the overlap parameter instead.

info: deprecated at version 3.5.0

Use the verbose parameter instead.

Returns
FieldList

The aggregated field constructs.

Examples

The following six fields comprise eastward wind at two different times and for three different atmospheric heights for each time:

>>> f
[<CF Field: eastward_wind(latitude(73), longitude(96))>,
 <CF Field: eastward_wind(latitude(73), longitude(96))>,
 <CF Field: eastward_wind(latitude(73), longitude(96))>,
 <CF Field: eastward_wind(latitude(73), longitude(96))>,
 <CF Field: eastward_wind(latitude(73), longitude(96))>,
 <CF Field: eastward_wind(latitude(73), longitude(96))>]
>>> g = cf.aggregate(f)
>>> g
[<CF Field: eastward_wind(height(3), time(2), latitude(73), longitude(96))>]
>>> g[0].source
'Model A'
>>> g = cf.aggregate(f, dimension=('source',))
>>> g
[<CF Field: eastward_wind(source(1), height(3), time(2), latitude(73), longitude(96))>]
>>> g[0].source
AttributeError: 'Field' object has no attribute 'source'