cf.read

cf.read(files, external=None, verbose=None, warnings=False, ignore_read_error=False, aggregate=True, nfields=None, squeeze=False, unsqueeze=False, fmt=None, cdl_string=False, select=None, extra=None, recursive=False, followlinks=False, um=None, chunk=True, field=None, height_at_top_of_model=None, select_options=None, follow_symlinks=False, mask=True, warn_valid=False, chunks='auto', domain=False, cfa=None)

Read field or domain constructs from files.
The following file formats are supported: CF-netCDF, CFA-netCDF, CDL, PP and UM fields datasets.
Input datasets are mapped to constructs in memory, which are returned as elements of a FieldList or, if the domain parameter is True, a DomainList. NetCDF files may be on disk or on an OPeNDAP server.
Any number of files of any combination of file types may be read.
NetCDF unlimited dimensions
Domain axis constructs that correspond to netCDF unlimited dimensions may be accessed with the nc_is_unlimited and nc_set_unlimited methods of a domain axis construct.
NetCDF hierarchical groups
Hierarchical groups in CF provide a mechanism to structure variables within netCDF4 datasets. Field constructs are created from grouped datasets by applying the well-defined rules in the CF conventions for resolving references to out-of-group netCDF variables and dimensions. The group structure is preserved in the field construct’s netCDF interface. Groups were incorporated into CF-1.8. For files with groups that declare compliance with earlier versions of the CF conventions, the groups will be interpreted as per the latest release of CF.
CF-compliance
If the dataset is partially CF-compliant to the extent that it is not possible to unambiguously map an element of the netCDF dataset to an element of the CF data model, then a field construct is still returned, but may be incomplete. This is so that datasets which are partially conformant may nonetheless be modified in memory and written to new datasets.
Such “structural” non-compliance would occur, for example, if the “coordinates” attribute of a CF-netCDF data variable refers to another variable that does not exist, or refers to a variable that spans a netCDF dimension that does not apply to the data variable. Other types of non-compliance are not checked, such as whether or not controlled vocabularies have been adhered to. The structural compliance of the dataset may be checked with the dataset_compliance method of the field construct, as well as optionally displayed when the dataset is read by setting the warnings parameter.
CDL files
A file is considered to be a CDL representation of a netCDF dataset if it is a text file whose first non-comment line starts with the seven characters “netcdf ” (six letters followed by a space). A comment line is identified as one which starts with any amount of white space (including none) followed by “//” (two slashes). The CDL file is converted to a temporary netCDF4 file using the external ncgen command, and the temporary file persists until the end of the Python session, at which time it is automatically deleted. The CDL file may omit data array values (as would be the case, for example, if the file was created with the -h or -c option to ncdump), in which case the relevant constructs in memory will be created with data with all missing values.
PP and UM fields files
32-bit and 64-bit PP and UM fields files of any endianness can be read. In nearly all cases the file format is auto-detected from the first 64 bits in the file, but for the few occasions when this is not possible, the um keyword allows the format to be specified, as well as the UM version (if the latter is not inferable from the PP or lookup header information).
2-d “slices” within a single file are always combined, where possible, into field constructs with 3-d, 4-d or 5-d data. This is done prior to any field construct aggregation (see the aggregate parameter).
When reading PP and UM fields files, the relaxed_units aggregate option is set to True by default, because units are not always available to field constructs derived from UM fields files or PP files.
Performance
Descriptive properties are always read into memory, but lazy loading is employed for all data arrays which means that, in general, data is not read into memory until the data is required for inspection or to modify the array contents. This maximises the number of field constructs that may be read within a session, and makes the read operation fast. The exceptions to the lazy reading of data arrays are:
Data that define purely structural elements of other data arrays that are compressed by convention (such as a count variable for a ragged contiguous array). These are always read from disk.
If field or domain aggregation is in use (as it is by default, see the aggregate parameter), then the data of metadata constructs may have to be read to determine how the contents of the input files may be aggregated. This won’t happen for a particular field or domain’s metadata, though, if it can be ascertained from descriptive properties alone that it can’t be aggregated with anything else (as would be the case, for instance, when a field has a unique standard name).
However, when two or more field or domain constructs are aggregated to form a single construct then the data arrays of some metadata constructs (coordinates, cell measures, etc.) must be compared non-lazily to ascertain if aggregation is possible.
See also
cf.aggregate, cf.write, cf.Field, cf.Domain, cf.load_stash2standard_name, cf.unique_constructs
- Parameters
- files: (arbitrarily nested sequence of) str
A string or arbitrarily nested sequence of strings giving the file names, directory names, or OPeNDAP URLs from which to read field constructs. Various types of expansion are applied to the names:
Expansion: Description
Tilde: An initial component of ~ or ~user is replaced by that user’s home directory.
Environment variable: Substrings of the form $name or ${name} are replaced by the value of environment variable name.
Pathname: A string containing UNIX file name metacharacters as understood by the Python glob module is replaced by the list of matching file names. This type of expansion is ignored for OPeNDAP URLs.
Where more than one type of expansion is used in the same string, they are applied in the order given in the above table.
- Parameter example: The file file.nc in the user’s home directory could be described by any of the following: '$HOME/file.nc', '${HOME}/file.nc', '~/file.nc', '~/tmp/../file.nc'.
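The three expansions map directly onto the Python standard library, and the documented order (tilde, then environment variables, then glob) can be sketched with a minimal helper. The helper name is illustrative, not part of cf, and real code would also exclude OPeNDAP URLs from glob expansion:

```python
import glob
import os


def expand_name(name):
    """Apply tilde, environment-variable, and glob expansion, in that
    order, returning a list of matching names (or the expanded name
    itself if nothing on disk matches)."""
    name = os.path.expanduser(name)   # ~ and ~user
    name = os.path.expandvars(name)   # $name and ${name}
    matches = glob.glob(name)         # UNIX file name metacharacters
    return matches if matches else [name]
```

For example, `expand_name('$HOME/*.nc')` lists the netCDF files in the home directory.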
When a directory is specified, all files in that directory are read. Sub-directories are not read unless the recursive parameter is True. If any directories contain files that are not valid datasets then an exception will be raised, unless the ignore_read_error parameter is True.
As a special case, if the cdl_string parameter is set to True, the interpretation of files changes so that each value is assumed to be a string of CDL input rather than the above.
- external: (sequence of) str, optional
Read external variables (i.e. variables which are named by attributes, but are not present, in the parent file given by the filename parameter) from the given external files. Ignored if the parent file does not contain a global “external_variables” attribute. Multiple external files may be provided, which are searched in random order for the required external variables.
If an external variable is not found in any external files, or is found in multiple external files, then the relevant metadata construct is still created, but without any metadata or data. In this case the construct’s is_external method will return True.
- Parameter example: external='cell_measure.nc'
- Parameter example: external=['cell_measure.nc']
- Parameter example: external=('cell_measure_A.nc', 'cell_measure_O.nc')
- extra: (sequence of) str, optional
Create extra, independent field constructs from netCDF variables that correspond to particular types of metadata constructs. The extra parameter may be one, or a sequence, of:
extra: Metadata constructs
'field_ancillary': Field ancillary constructs
'domain_ancillary': Domain ancillary constructs
'dimension_coordinate': Dimension coordinate constructs
'auxiliary_coordinate': Auxiliary coordinate constructs
'cell_measure': Cell measure constructs
This parameter replaces the deprecated field parameter.
- Parameter example: To create field constructs from auxiliary coordinate constructs: extra='auxiliary_coordinate' or extra=['auxiliary_coordinate'].
- Parameter example: To create field constructs from domain ancillary and cell measure constructs: extra=['domain_ancillary', 'cell_measure'].
An extra field construct created via the extra parameter will have a domain limited to that which can be inferred from the corresponding netCDF variable, but without the connections that are defined by the parent netCDF data variable. It is possible to create independent fields from metadata constructs that do incorporate as much of the parent field construct’s domain as possible by using the convert method of a returned field construct, instead of setting the extra parameter.
- verbose: int or str or None, optional
If an integer from -1 to 3, or an equivalent string equal, ignoring case, to one of:
'DISABLE' (0)
'WARNING' (1)
'INFO' (2)
'DETAIL' (3)
'DEBUG' (-1)
then it is set, for the duration of the method call only, as the minimum cut-off for the verboseness level of displayed output (log) messages, regardless of the globally-configured cf.log_level. Note that increasing numerical value corresponds to increasing verbosity, with the exception of -1 as a special case of maximal and extreme verbosity.
Otherwise, if None (the default value), output messages will be shown according to the value of the cf.log_level setting.
Overall, the higher a non-negative integer or equivalent string that is set (up to a maximum of 3/'DETAIL'), the more description is printed to convey how the contents of the netCDF file were parsed and mapped to CF data model constructs.
- warnings: bool, optional
If True then print warnings when an output field construct is incomplete due to structural non-compliance of the dataset. By default such warnings are not displayed.
- ignore_read_error: bool, optional
If True then ignore any file which raises an IOError whilst being read, as would be the case for an empty file, an unknown file format, etc. By default the IOError is raised.
- fmt: str, optional
Only read files of the given format, ignoring all other files. Valid formats are 'NETCDF' for CF-netCDF files, 'CFA' for CFA-netCDF files, 'UM' for PP or UM fields files, and 'CDL' for CDL text files. By default files of any of these formats are read.
- cdl_string: bool, optional
If True then each string input, or string element of the input sequence, is interpreted as a string of CDL rather than as the name of a location from which field constructs can be read.
By default, each string input or string element in the input sequence is taken to be a file or directory name or an OPeNDAP URL from which to read field constructs, rather than a string of CDL input, including when the fmt parameter is set to CDL.
Note that when cdl_string is True, the fmt parameter is ignored as the format is assumed to be CDL, so in that case it is not necessary to also specify fmt='CDL'.
- aggregate: bool or dict, optional
If True (the default) or a dictionary (possibly empty) then aggregate the field constructs read in from all input files into as few field constructs as possible, by passing all of the field constructs found in the input files to the cf.aggregate function and returning the output of this function call.
If aggregate is a dictionary then it is also used to configure the aggregation process, with its contents passed as keyword arguments to the cf.aggregate function.
If aggregate is False then the field constructs are not aggregated.
- squeeze: bool, optional
If True then remove size 1 axes from each field construct’s data array.
- unsqueeze: bool, optional
If True then insert size 1 axes from each field construct’s domain into its data array.
- select: (sequence of) str or Query or re.Pattern, optional
Only return field constructs whose identities match the given value(s), i.e. those fields f for which f.match_by_identity(*select) is True. See cf.Field.match_by_identity for details.
This is equivalent to, but faster than, not using the select parameter but applying its value to the returned field list with its cf.FieldList.select_by_identity method. For example, fl = cf.read(file, select='air_temperature') is equivalent to fl = cf.read(file).select_by_identity('air_temperature').
- recursive: bool, optional
If True then recursively read sub-directories of any directories specified with the files parameter.
- followlinks: bool, optional
If True, and recursive is True, then also search for files in sub-directories which resolve to symbolic links. By default directories which resolve to symbolic links are ignored. Ignored if recursive is False. Files which are symbolic links are always followed.
Note that setting recursive=True, followlinks=True can lead to infinite recursion if a symbolic link points to a parent directory of itself.
This parameter replaces the deprecated follow_symlinks parameter.
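The recursive and followlinks semantics correspond closely to the followlinks flag of the standard library's os.walk. A rough stdlib-only sketch (find_files is a hypothetical name, not part of cf):

```python
import os


def find_files(directory, recursive=False, followlinks=False):
    """Collect file paths from a directory, optionally recursing into
    sub-directories and, if requested, into symbolically linked
    ones. Files that are themselves symbolic links are always
    included, as os.path.isfile follows links."""
    if not recursive:
        return sorted(
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))
        )
    found = []
    # os.walk's followlinks flag controls whether linked directories
    # are descended into; it is False by default, as here.
    for dirpath, _dirnames, filenames in os.walk(
        directory, followlinks=followlinks
    ):
        found.extend(os.path.join(dirpath, name) for name in filenames)
    return sorted(found)
```

As with cf.read, the combination recursive=True, followlinks=True in this sketch can recurse forever if a link points back into a parent directory.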
- mask: bool, optional
If False then do not mask by convention when reading the data of field or metadata constructs from disk. By default data is masked by convention.
The masking by convention of a netCDF array depends on the values of any of the netCDF variable attributes _FillValue, missing_value, valid_min, valid_max and valid_range.
The masking by convention of a PP or UM array depends on the value of BMDI in the lookup header. A value other than -1.0e30 indicates the data value to be masked.
See https://ncas-cms.github.io/cf-python/tutorial.html#data-mask for details.
New in version 3.4.0.
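The attribute-driven masking rules listed above can be illustrated with a stdlib-only sketch. The helper is hypothetical; cf itself produces masked arrays rather than None values:

```python
def mask_by_convention(values, attrs):
    """Return values with None in place of elements masked by the
    _FillValue, missing_value, valid_min, valid_max and valid_range
    attributes, in the spirit of the netCDF masking conventions."""
    fill = {attrs.get('_FillValue'), attrs.get('missing_value')}
    lo, hi = float('-inf'), float('inf')
    if 'valid_range' in attrs:
        lo, hi = attrs['valid_range']
    lo = attrs.get('valid_min', lo)
    hi = attrs.get('valid_max', hi)
    return [
        None if (v in fill or not lo <= v <= hi) else v
        for v in values
    ]
```

For example, with attrs={'_FillValue': -999.0} the value -999.0 is masked, and with attrs={'valid_range': (0, 10)} any value outside that interval is masked.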
- warn_valid: bool, optional
If True then print a warning for the presence of valid_min, valid_max or valid_range properties on field constructs and metadata constructs that have data. By default no such warning is issued.
“Out-of-range” data values in the file, as defined by any of these properties, are automatically masked by default, which may not be as intended. See the mask parameter for turning off all automatic masking.
See https://ncas-cms.github.io/cf-python/tutorial.html#data-mask for details.
New in version 3.4.0.
- um: dict, optional
For Met Office (UK) PP files and Met Office (UK) fields files only, provide extra decoding instructions. This option is ignored for input files which are not PP or fields files. In most cases, how to decode a file is inferable from the file’s contents, but if not then each key/value pair in the dictionary sets a decoding option as follows:
Key: Value
'fmt': The file format ('PP' or 'FF').
'word_size': The word size in bytes (4 or 8).
'endian': The byte order ('big' or 'little').
'version': The UM version to be used when decoding the header. Valid versions are, for example, 4.2, '6.6.3' and '8.2'. In general, a given version is ignored if it can be inferred from the header (which is usually the case for files created by the UM at versions 5.3 and later). The exception to this is when the given version has a third element (such as the 3 in 6.6.3), in which case any version in the header is ignored. The default version is 4.5.
'height_at_top_of_model': The height (in metres) of the upper bound of the top model level. By default the height at the top of the model is taken from the top level’s upper bound defined by BRSVD1 in the lookup header. If the height can’t be determined from the header, or the given height is less than or equal to 0, then a coordinate reference system will still be created that contains the ‘a’ and ‘b’ formula term values, but without an atmosphere hybrid height dimension coordinate construct.
Note
A current limitation is that if pseudolevels and atmosphere hybrid height coordinates are defined by the same lookup headers then the height can’t be determined automatically. In this case the height may be found after reading as the maximum value of the bounds of the domain ancillary construct containing the ‘a’ formula term. The file can then be re-read with this height as a um parameter.
If the format is specified as 'PP' then the word size and byte order default to 4 and 'big' respectively.
This parameter replaces the deprecated umversion and height_at_top_of_model parameters.
- Parameter example: To specify that the input files are 32-bit, big-endian PP files: um={'fmt': 'PP'}
- Parameter example: To specify that the input files are 32-bit, little-endian PP files from version 5.1 of the UM: um={'fmt': 'PP', 'endian': 'little', 'version': 5.1}
New in version 1.5.
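The 'PP' defaults noted above (4-byte words and big-endian byte order unless overridden) can be sketched as a hypothetical helper, not cf's actual decoding code:

```python
def resolve_um_options(um):
    """Fill in the documented defaults for a um decoding dictionary:
    if 'fmt' is 'PP', word_size defaults to 4 and endian to 'big'.
    Explicitly supplied values are left untouched."""
    opts = dict(um)
    if opts.get('fmt') == 'PP':
        opts.setdefault('word_size', 4)
        opts.setdefault('endian', 'big')
    return opts
```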
- chunks: str, int, None, or dict, optional
Specify the dask chunking of dimensions for data in the input files.
By default, 'auto' is used to specify the array chunking, which uses a chunk size in bytes defined by the cf.chunksize function, preferring square-like chunk shapes across all data dimensions.
If chunks is a str then each data array uses this chunk size in bytes, preferring square-like chunk shapes across all data dimensions. Any string value accepted by the chunks parameter of the dask.array.from_array function is permitted.
- Parameter example: A chunk size of 2 MiB may be specified as '2097152' or '2 MiB'.
If chunks is -1 or None then there is no chunking, i.e. every data array has one chunk regardless of its size.
If chunks is a positive int then each data array dimension has chunks with this number of elements.
If chunks is a dict, then each of its keys identifies a dimension in the file, with a value that defines the chunking for that dimension whenever it is spanned by data.
Each dictionary key identifies a file dimension in one of three ways: 1. the netCDF dimension name, preceded by ncdim% (e.g. 'ncdim%lat'); 2. the “standard name” attribute of a CF-netCDF coordinate variable that spans the dimension (e.g. 'latitude'); or 3. the “axis” attribute of a CF-netCDF coordinate variable that spans the dimension (e.g. 'Y').
The dictionary values may be str, int or None, with the same meanings as those types for the chunks parameter but applying only to the specified dimension. A tuple or list of integers that sum to the dimension size may also be given.
Not specifying a file dimension in the dictionary is equivalent to it being defined with a value of 'auto'.
- Parameter example: {'T': '0.5 MiB', 'Y': [36, 37], 'X': None}
- Parameter example: If a netCDF file contains dimensions time, z, lat and lon, then {'ncdim%time': 12, 'ncdim%lat': None, 'ncdim%lon': None} will ensure that all time axes have a chunk size of 12; all lat and lon axes are not chunked; and all z axes are chunked to comply as closely as possible with the default chunk size.
If the netCDF file also contains a time coordinate variable with a standard_name attribute of 'time' and an axis attribute of 'T', then the same chunking could be specified with either {'time': 12, 'ncdim%lat': None, 'ncdim%lon': None} or {'T': 12, 'ncdim%lat': None, 'ncdim%lon': None}.
Note
The chunks parameter is ignored for PP and UM fields files, for which the chunking is pre-determined by the file format.
New in version 3.14.0.
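A stdlib-only sketch of how a per-dimension chunks dict could be resolved against known dimension sizes, following the rules above. The helper is hypothetical; in practice cf delegates this work to dask:

```python
def normalise_chunks(dim_sizes, chunks):
    """Resolve a chunks dict against dimension sizes: an int is a
    chunk length, None or -1 means one chunk spanning the whole
    dimension, a list of ints summing to the dimension size is used
    as-is, and an unspecified dimension falls back to 'auto'."""
    resolved = {}
    for dim, size in dim_sizes.items():
        spec = chunks.get(dim, 'auto')
        if spec is None or spec == -1:
            resolved[dim] = [size]  # a single chunk for the dimension
        elif isinstance(spec, int):
            # Chunks of the given length, with a short final chunk if
            # the size is not an exact multiple.
            resolved[dim] = [
                min(spec, size - start) for start in range(0, size, spec)
            ]
        elif isinstance(spec, (list, tuple)):
            if sum(spec) != size:
                raise ValueError("chunks must sum to the dimension size")
            resolved[dim] = list(spec)
        else:
            resolved[dim] = spec  # e.g. 'auto' or '2 MiB', left to dask
    return resolved
```

For instance, a dimension of size 30 with a chunk spec of 12 resolves to chunk lengths [12, 12, 6].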
- domain: bool, optional
If True then return only the domain constructs that are explicitly defined by CF-netCDF domain variables, ignoring all CF-netCDF data variables. By default only the field constructs defined by CF-netCDF data variables are returned.
CF-netCDF domain variables are only defined from CF-1.9, so older datasets automatically contain no CF-netCDF domain variables.
The unique domain constructs of the dataset are easily found with the cf.unique_constructs function. For example:
>>> d = cf.read('file.nc', domain=True)
>>> ud = cf.unique_constructs(d)
>>> f = cf.read('file.nc')
>>> ufd = cf.unique_constructs(x.domain for x in f)
Domain constructs cannot be read from UM or PP datasets.
New in version 3.11.0.
- cfa: dict, optional
Configure the reading of CFA-netCDF files. The dictionary may have any subset of the following key/value pairs to override the information read from the file:
'substitutions': dict
A dictionary whose key/value pairs define text substitutions to be applied to the fragment file names. Each key may be specified with or without the ${...} syntax. For instance, the following are equivalent: {'base': 'sub'} and {'${base}': 'sub'}. The substitutions are used in conjunction with, and take precedence over, any that are stored in the CFA-netCDF file by the substitutions attribute of the file CFA aggregation instruction variable.
- Example: {'base': 'file:///data/'}
New in version 3.15.0.
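The substitution behaviour can be sketched with plain string replacement. The helper is hypothetical; keys are normalised to the ${...} form as described above:

```python
def apply_substitutions(fragment_name, substitutions):
    """Apply ${key} -> value text substitutions to a fragment file
    name; keys may be given with or without the ${...} syntax."""
    for key, value in substitutions.items():
        if not (key.startswith('${') and key.endswith('}')):
            key = '${' + key + '}'
        fragment_name = fragment_name.replace(key, value)
    return fragment_name
```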
- umversion: deprecated at version 3.0.0
Use the um parameter instead.
- height_at_top_of_model: deprecated at version 3.0.0
Use the um parameter instead.
- field: deprecated at version 3.0.0
Use the extra parameter instead.
- follow_symlinks: deprecated at version 3.0.0
Use the followlinks parameter instead.
- select_options: deprecated at version 3.0.0
Use methods on the returned FieldList instead.
- chunk: deprecated at version 3.14.0
Use the chunks parameter instead.
- Returns
FieldList or DomainList
The field or domain constructs found in the input dataset(s). The list may be empty.
Examples
>>> x = cf.read('file.nc')
Read a file and create field constructs from CF-netCDF data variables as well as from the netCDF variables that correspond to particular types of metadata constructs:
>>> f = cf.read('file.nc', extra='domain_ancillary')
>>> g = cf.read('file.nc', extra=['dimension_coordinate',
...             'auxiliary_coordinate'])
Read a file that contains external variables:
>>> h = cf.read('parent.nc')
>>> i = cf.read('parent.nc', external='external.nc')
>>> j = cf.read('parent.nc', external=['external1.nc', 'external2.nc'])
>>> f = cf.read('file*.nc')
>>> f
[<CF Field: pmsl(30, 24)>,
 <CF Field: z-squared(17, 30, 24)>,
 <CF Field: temperature(17, 30, 24)>,
 <CF Field: temperature_wind(17, 29, 24)>]
>>> cf.read('file*.nc')[0:2]
[<CF Field: pmsl(30, 24)>,
 <CF Field: z-squared(17, 30, 24)>]
>>> cf.read('file*.nc')[-1]
<CF Field: temperature_wind(17, 29, 24)>
>>> cf.read('file*.nc', select='units=K')
[<CF Field: temperature(17, 30, 24)>,
 <CF Field: temperature_wind(17, 29, 24)>]
>>> cf.read('file*.nc', select='ncvar%ta')
<CF Field: temperature(17, 30, 24)>