pycdf.pycdf
Python interface to the Unidata netCDF library
(see: www.unidata.ucar.edu/packages/netcdf).
Version: 0.6-3
Date: Feb 10 2007
Table of contents
Introduction
Array packages : Numeric, numarray, numpy
Package components
Prerequisites
Documentation
Summary of differences between pycdf and C API
Error handling
Primer on reading and querying a netcdf file
High level attribute access
High level variable access : extended indexing, slicing and ellipsis
Reading/setting multivalued netCDF attributes and variables
Rules governing array assignment
Working with scalar variables
Working with the unlimited dimension
Quirks
Functions summary
Classes summary
Examples
Introduction
------------
pycdf is a python wrapper around the netCDF v3 C API (up to version 3.6.2).
Please note that netCDF v4 is NOT supported by pycdf.
pycdf augments the API with an OOP framework where a netcdf file is accessed
through 4 different types of objects:
CDF netCDF dataset (file)
CDFDim netCDF dimension
CDFVar netCDF variable
CDFAttr netCDF attribute (dataset or variable attribute)
Of secondary importance are the following classes, which let one produce
a virtual netCDF dataset from the concatenation of two or more datasets
based on the same record variables:
CDFMF see a sequence of datasets as one consolidated dataset
CDFMFDim access a CDFMF dimension
CDFMFVar access a CDFMF variable
The key features of pycdf are as follows.
-pycdf is a complete implementation of the functionality offered
by the netCDF v3 C API. For almost every function offered by the C API
there exists an equivalent method in one of the pycdf classes. pycdf does not
hide anything, and everything possible in the C implementation is also
achievable in python. It is quite straightforward to go from C to python
and vice-versa, and to learn pycdf usage by referring to the C API
documentation.
-pycdf method names bear a strong resemblance to their C counterparts,
but are generally much simpler.
-A few high-level python methods have been developed to ease the
programmer's task. Of greatest interest are those allowing netCDF
access through familiar python idioms:
-netCDF attributes can be read/written like ordinary python class
and object attributes
-netCDF variables can be read/written like ordinary python lists using
multidimensional indices and so-called "extended slice syntax", with
strides allowed
See "High level attribute access" and "High level variable access"
sections for details.
Other python specific helper methods are:
-pycdf can transparently put the dataset into define and data mode,
thus relieving the programmer from having to call redef() and
enddef(). See the CDF.automode() method for details.
-pycdf offers methods to retrieve a dictionary of the attributes,
dimensions and variables defined on a dataset, and of the attributes
set on a variable. Querying a dataset is thus greatly simplified.
See methods CDF.attributes(), CDF.dimensions(), CDF.variables(),
and CDFVar.attributes() for details.
Array packages : Numeric, numarray, numpy
-----------------------------------------
netCDF variables are read/written using high-level "array" objects,
since arrays offer the closest match to the nD matrices that netCDF
implements.
Before 2005, arrays used to be supported solely by the python 'Numeric'
package. Promotion of a new array package called 'numarray' was then
attempted, which pycdf supported starting with version 0.6.
With the advent in late 2006 of the much awaited 'numpy' package, which
reconciles Numeric and numarray while offering more functionality,
Numeric and numarray should be considered deprecated. Beginning with
release 0.6-3, pycdf fully supports numpy, while still offering support
for the older Numeric and numarray packages for backwards compatibility.
The choice of the array package on which to base pycdf is made at install time
(see the INSTALL file in the pycdf distribution). Only ONE style of install is
possible : a numpy-based and a numarray-based pycdf cannot coexist on the
same machine, unless one wants to play tricks with python search paths.
Although the underlying API is very similar, the packages define
the "array" somewhat differently. Since the contents of cdf variables are always
returned as "arrays" to the calling program, the user must be careful when
processing those arrays using a package different from the one used by pycdf
to generate them. A numpy-style array may not always be acceptable to a
numarray-based package, and vice-versa.
In the rest of this documentation, when one reads "from numpy import ...", it
should be assumed that "from Numeric import ..." (or "from numarray import ...")
could also be used, unless explicitly stated otherwise.
It may be helpful at times to obtain the name of the package on which the current
pycdf installation is based. For that, simply call the pycdfArrayPkg() function.
Package components
------------------
pycdf is a proper Python package, i.e. a collection of modules stored under
a directory name identical to the package name and holding an __init__.py
file. The pycdf package is composed of 3 main modules:
_pycdfext C extension module responsible for wrapping the
netcdf library
pycdfext python module implementing some utility functions
complementing the extension module
pycdf python module which wraps the extension module inside
an OOP framework
A fourth small utility module named "pycdfext_array" is used to isolate
declarations specific to the type of array package used. The pycdf module
imports the 'array' constructors indirectly through this utility module,
which is configured at install time.
Take note that, following the "import pycdf" or "from pycdf import *"
statement, the pycdf package exposes only the function and class declarations
documented herein. All references to the package modules are removed, and
they become very difficult to get at from a user program. The objects
exposed by pycdf are:
classes:
CDF, CDFAttr, CDFDim, CDFError, CDFMF, CDFVar, NC
functions:
inq_libvers, pycdfArrayPkg, pycdfVersion, strerror
_pycdfext and pycdfext were generated with the SWIG preprocessor.
SWIG is however *not* needed to run the package. Those two modules
are meant to do their work in the background, and should never be called
directly (as noted above, they are almost impossible to get at anyway).
Only 'pycdf' should be imported by the user program.
Prerequisites
-------------
The following software must be installed in order for pycdf to
work.
netCDF library
pycdf does *not* include the netCDF library, which must
be installed separately. netCDF is available at
"www.unidata.ucar.edu/packages/netcdf". All versions
up to 3.6.2 are supported.
numpy, Numeric or numarray python package
netCDF variables are read/written using the array data type provided
by the numpy, Numeric or numarray python packages. Those packages are
available at "numpy.sourceforge.net".
Documentation
-------------
pycdf has been written so as to stick as closely as possible to
the naming conventions and calling sequences documented inside the
"NetCDF Users Guide for C" manual. Even if pycdf gives an OOP twist
to the C API, the C manual can be easily used as a documentary source
for pycdf, once the class to which a method belongs has been
identified, and of course once requirements imposed by the Python
language have been taken into account. Consequently, this documentation
will not attempt to provide an exhaustive coverage of the netCDF
library. For this, the user is referred to the above mentioned manual.
This document (in both its text and html versions) has been completely
produced using "pydoc", the Python documentation generator (which
made its debut in the Python 2.1 release). pydoc can also be used
as an on-line help tool. For example, to know everything about
the CDFVar class, say:
>>> from pydoc import help
>>> from pycdf import *
>>> help(CDFVar)
To be more specific and get help only for the get() method of the
CDFVar class:
>>> help(CDFVar.get) # or...
>>> help(vinst.get) # if vinst is a CDFVar instance
Summary of differences between pycdf and C API
----------------------------------------------
Most of the differences between the pycdf and C API can
be summarized as follows.
Python method vs C function names:
-Prefix 'nc_' has been dropped everywhere.
-Suffixes that became redundant given the class to which the
method belongs have been dropped. For example, C function
'nc_inq_var()' belongs to python class CDFVar (the class
encapsulating methods having to do with a netCDF 'variable').
The '_var' suffix is now redundant, and the method name
simplifies to 'inq()'.
-The same reasoning has led to the dropping of redundant
infixes. For ex., C function 'nc_inq_dimname()' belongs
to the 'CDFDim' class (the class describing a netCDF
'dimension'), and has been renamed 'inq_name' since the
'dim' infix is now redundant.
Internal vs external data types
-The C API offers the programmer the possibility of automatically
converting between netCDF external types (eg: NC.BYTE, NC.SHORT,
NC.INT, etc) and a vast array of C "internal" types (unsigned char,
char, short, etc). Each basic function responsible for reading/writing
a value then comes in a variety of different flavors, one per internal
type (eg: nc_put_var_text(), nc_put_var_uchar(), nc_put_var_schar(),
etc).
-pycdf does not offer any such type conversion, mostly because it
would either be meaningless in the context of the Python language,
or is better left as an explicit task for the programmer outside
of the netCDF context.
Values returned by the "get" methods always match the netCDF type
(NC.BYTE, NC.SHORT and NC.INT are returned as integers,
NC.FLOAT and NC.DOUBLE as reals, and NC.CHAR as strings).
Conversely, values written by the "put" methods are taken verbatim
from the argument lists and written according to the underlying
netCDF type, using the function variant allowing the maximum value
range (eg: "long" variant for integers, "double" variant for reals).
Return values
-In the C API, every function returns an integer status code, and values
computed by the function are returned through one or more pointers
passed as arguments.
-In pycdf, error statuses are returned through the Python exception
mechanism, and values are returned as the method result. When the
C API specifies that multiple values are returned, pycdf returns a
tuple of values, in the same order as the corresponding pointers in the
C function argument list.
Error handling
--------------
All errors are reported by pycdf using the Python exception mechanism.
pycdf normally raises a CDFError exception (a subclass of Exception).
The message accompanying the error is a 3-element tuple composed
in order of: the name of the function/method which raised the exception,
an integer error code, and a string explaining the meaning of this
error code. A negative error code signals an error raised by the
netCDF C library, and the string is then identical to the one obtained
through the strerror() function call. An error code of 0 indicates
an error signaled by the python layer, not the netCDF C library.
However, some errors related to the inner workings of the pycdf package
are reported using the standard python exceptions (ValueError, TypeError,
etc) rather than a CDFError exception.
Ex.:
>>> from pycdf import *
>>> try:
...     d=CDF('foo.nc')
... except CDFError,err:
...     print "pycdf reported an error in function/method:",err[0]
...     print "      netCDF error ",err[1],":",err[2]
>>>
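The convention above can be captured in a small helper. The sketch below is
hypothetical (describe_cdf_error is not part of pycdf): it only assumes the
(function name, error code, message) layout of the 3-element tuple described
above, and works on a plain tuple so that it can be tried without pycdf.

```python
# Hypothetical helper, not part of pycdf: classify the 3-element tuple
# (function name, error code, message) carried by a CDFError.
def describe_cdf_error(args):
    """Describe a (function name, error code, message) tuple."""
    func_name, code, message = args
    if code < 0:
        origin = "the netCDF C library"
    elif code == 0:
        origin = "the pycdf python layer"
    else:
        # Positive codes are not described in this documentation.
        origin = "an undocumented source"
    return "%s: error %d (%s) raised by %s" % (func_name, code, message, origin)


# Illustrative tuples only; the codes and messages are made up.
print(describe_cdf_error(("CDF", -1, "example library message")))
print(describe_cdf_error(("CDFVar.put", 0, "example python-layer message")))
```

Inside the except clause shown above, it could be called as
describe_cdf_error(err.args), assuming CDFError stores the tuple as its
exception arguments (which the err[0], err[1], err[2] indexing suggests).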
Primer on reading and querying a netcdf file
--------------------------------------------
Here are useful hints for a quick start on how to read and query a netcdf
file.
Assume the file is named 'table.nc' (as created for example by the
'txttocdf.py' program inside the 'examples/txttocdf' directory accompanying
the pycdf distribution).
To open the file:
% python
>>> from pycdf import *
>>> from numpy import * # or "from Numeric import *"
>>> nc = CDF('table.nc') # file opened in readonly mode
NOTE: You need to import the array package (numpy, Numeric, numarray)
ONLY if you want to call methods or query attributes of the
array objects returned by pycdf. None of this is needed in the
example statements given below. Thus, the 'from <pkg> import *'
entered above could have been omitted without harm.
To get a dictionary of attributes defined at the file level :
>>> ncattr = nc.attributes() # key is attr name, value is attr value
To get a dictionary of the variables stored inside the file :
>>> vardict = nc.variables()
The keys are the variable names; the values store the variable properties,
eg: dimension names, shape, and type
To get a list of the variable names:
>>> varnames = nc.variables().keys()
To retrieve and print the full array of values stored inside variable
'varnames[0]' :
>>> v0 = nc.var(varnames[0])[:] # without the [:], you would get a CDF
# var instance; the slice gets you
# the array of values
>>> print v0
To print the values of the last column of array v0 :
>>> print v0[:,-1]
To retrieve just the first two rows of values of variable 'varnames[1]' :
>>> v1_01 = nc.var(varnames[1])[:2]
To get the dictionary of attributes attached to variable 'varnames[0]' :
>>> v0_dict = nc.var(varnames[0]).attributes()
Keys are the attribute names, and the dictionary values store the
attribute values.
See the 'Examples' section below for a list of more comprehensive examples.
High level attribute access
---------------------------
netCDF allows setting attributes either at the dataset or the variable
level. Attributes are names storing information (in the form of scalars,
strings, sequences) which help interpret the dataset or variable they
are attached to. The meaning of netCDF attributes relies on a set of
conventions (see the netCDF manual) which the library does not enforce
in any way. The only
exception (known to the author) is the '_FillValue' attribute which, when
attached to a variable, sets the value that is to be stored in
uninitialized entries of this variable. All other attributes must be
interpreted at the application level.
With pycdf, attributes can be assigned in two ways.
-By calling the get()/put() method of an attribute instance. In the
following example, dataset 'example.nc' is created, and string
attribute 'title' is attached to the dataset and given value
'this is an example'.
>>> from pycdf import *
>>> d = CDF('example.nc',NC.WRITE|NC.CREATE) # create dataset
>>> att = d.attr('title') # create attr. instance
>>> att.put(NC.CHAR, 'this is an example') # set attr. type and value
>>> att.get() # get attr. value
'this is an example'
>>>
-By handling the attribute like an ordinary Python class attribute.
Above example can then be rewritten as follows:
>>> from pycdf import *
>>> d = CDF('example.nc',NC.WRITE|NC.CREATE) # create dataset
>>> d.title = 'this is an example' # set attribute type and value
>>> d.title # get attribute value
'this is an example'
>>>
This applies as well to multi-valued attributes.
>>> att = d.attr('values') # With an attribute instance
>>> att.put(NC.INT, (1,2,3,4,5))
>>> att.get()
[1, 2, 3, 4, 5]
>>> d.values = (1,2,3,4,5) # As a Python class attribute
>>> d.values
[1, 2, 3, 4, 5]
When the attribute is known by its name through a string, standard
functions `setattr()' and `getattr()' can be used to replace the dot
notation. Above example becomes:
>>> setattr(d, 'values', (1,2,3,4,5))
>>> getattr(d, 'values')
[1, 2, 3, 4, 5]
Handling a netCDF attribute like a Python class attribute is admittedly
more natural, and also simpler. Some control is however lost in doing so.
-Attribute type cannot be specified. pycdf automatically selects one of
three types according to the value(s) assigned to the attribute:
NC.CHAR if value is a string, NC.INT if all values are integral,
NC.DOUBLE if at least one value is a float.
-Consequently, unsigned NC.BYTE values cannot be assigned.
-Attribute properties (length, type, index number) can only be queried
through methods of an attribute instance.
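The type selection rule above can be modelled in a few lines of Python. This
is a hypothetical illustration, not pycdf code: infer_attr_type is an invented
name, and it returns type names as strings rather than the NC constants. It
also shows why "v._FillValue = 0" produces an integer attribute (see "Quirks").

```python
# Hypothetical model of the documented rule pycdf uses to pick the type of
# an attribute assigned with the dot notation. Returns type names as strings.
def infer_attr_type(value):
    """Return the netCDF type name the documented rule would select."""
    if isinstance(value, str):
        return "NC.CHAR"
    if isinstance(value, (list, tuple)):
        values = list(value)
    else:
        values = [value]
    if any(isinstance(v, float) for v in values):
        return "NC.DOUBLE"          # at least one float
    if values and all(isinstance(v, int) for v in values):
        return "NC.INT"             # all values integral
    raise TypeError("no attribute type for %r" % (value,))


print(infer_attr_type("this is an example"))  # NC.CHAR
print(infer_attr_type((1, 2, 3, 4, 5)))       # NC.INT
print(infer_attr_type((1, 2.5)))              # NC.DOUBLE
print(infer_attr_type(0))                     # NC.INT, even for a real variable
```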
High level variable access : extended indexing, slicing and ellipsis
--------------------------------------------------------------------
With pycdf, netCDF variables can be read/written in two ways.
The first way is through the get()/put() methods of a variable instance.
Those methods accept parameters to specify the starting indices, the count
of values to read/write, and the strides along each dimension. For example,
if 'v' is a 4x4 array:
>>> v.get() # complete array
>>> v.get(start=(0,0),count=(1,4)) # first row
>>> v.get(start=(0,1),count=(2,2), # second and third columns of
... stride=(2,1)) # first and third row
The second way is by indexing and slicing the variable like a Python
sequence. pycdf here follows most of the rules used to index and slice
arrays. Thus a netCDF variable can be seen as an array,
except that data is read from/written to a file instead of memory.
Extended indexing lets you access variable elements with the familiar
[i,j,...] notation, with one index per dimension. For example, if 'm' is a
3x3x3 netCDF variable, one could write:
>>> m[0,1,2] = m[0,2,1]
When indexing is used to select a dimension in a `get' operation, this
dimension is removed from the output array, thus reducing its rank by 1. A
rank 0 array is converted to a scalar. Thus, for a 3x3x3 `m' variable
(rank 3) of type int :
>>> a = m[0] # a is a 3x3 array (rank 2)
>>> a = m[0,0] # a is a 3 element array (rank 1)
>>> a = m[0,0,0] # a is an integer (rank 0 array becomes a scalar)
Had this rule not been followed, m[0,0,0] would have resulted in a
single-element array, which could complicate computations.
Extended slice syntax allows slicing netCDF variables along each of their
dimensions, with the specification of optional strides to step through
dimensions at regular intervals. For each dimension, the slice syntax
is: "i:j[:stride]", the stride being optional. As with ordinary slices,
the starting and ending values of a slice can be omitted to refer to the
first and last element, respectively, and they can be negative to
indicate that the index is measured relative to the tail instead of the
beginning. Omitted dimensions are assumed to be sliced from beginning to
end. Thus:
>>> m[0] # treated as `m[0,:,:]'.
Example above with get()/put() methods can thus be rewritten as follows:
>>> v[:] # complete array
>>> v[:1] # first row
>>> v[::2,1:3] # second and third columns of first and third row
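To make the correspondence between the two notations concrete, here is a
hypothetical pure-Python helper (not part of pycdf) that turns a tuple of
slices into the start, count and stride tuples of the get() call. It handles
slices only (no integer indices or ellipsis) and assumes positive strides.

```python
# Hypothetical helper, not part of pycdf: translate slices into the
# (start, count, stride) tuples accepted by CDFVar.get()/put().
def key_to_start_count_stride(key, shape):
    """Translate slices into the start/count/stride tuples used by get()."""
    if not isinstance(key, tuple):
        key = (key,)
    if len(key) > len(shape):
        raise IndexError("too many indices")
    # Omitted trailing dimensions are sliced from beginning to end.
    key = key + (slice(None),) * (len(shape) - len(key))
    start, count, stride = [], [], []
    for s, length in zip(key, shape):
        if not isinstance(s, slice):
            raise TypeError("this sketch only handles slices")
        first, stop, step = s.indices(length)
        if step < 0:
            raise ValueError("this sketch assumes positive strides")
        start.append(first)
        count.append(len(range(first, stop, step)))
        stride.append(step)
    return tuple(start), tuple(count), tuple(stride)


shape = (4, 4)  # the 4x4 variable 'v' of the example
print(key_to_start_count_stride(slice(None, 1), shape))
print(key_to_start_count_stride((slice(None, None, 2), slice(1, 3)), shape))
```

For the 4x4 variable, "v[::2,1:3]" maps to ((0, 1), (2, 2), (2, 1)), the same
start, count and stride values passed to get() in the earlier example.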
Indexes and slices can be freely mixed, eg:
>>> m[:2,1,0:3:2]
An ellipsis (...) can be used to denote consecutive dimensions in a slicing
expression, avoiding the use of a series of ':' "wild-cards". Only one
ellipsis can appear, either at the start, the end, or the middle of the
slicing expression (more than one ellipsis would make the expression
ambiguous). Thus, if 'v' is a 5-dimensional variable :
v[...,-1] equivalent to v[:,:,:,:,-1]
v[0,...,-1] equivalent to v[0,:,:,:,-1]
v[2,...] equivalent to v[2]
An ellipsis can help write cleaner code. Referring to the above
example, when faced with "v[2]" it is not clear whether 'v' is a
1-dimensional variable or not. The ellipsis used in the equivalent "v[2,...]"
expression makes clear that trailing dimensions are to be accounted for.
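The expansion rule can be sketched as follows. expand_ellipsis is a
hypothetical helper, not part of pycdf. Note that the ellipsis is written out
as 'Ellipsis', since the '...' notation is only valid inside a subscript in
Python 2.

```python
# Hypothetical helper, not part of pycdf: replace a single Ellipsis in an
# index tuple by as many full slices (':') as needed to cover ndim dimensions.
def expand_ellipsis(key, ndim):
    """Rewrite an index tuple so it holds exactly one entry per dimension."""
    if not isinstance(key, tuple):
        key = (key,)
    positions = [i for i, k in enumerate(key) if k is Ellipsis]
    if len(positions) > 1:
        raise ValueError("only one ellipsis may appear")
    explicit = len(key) - len(positions)
    if explicit > ndim:
        raise IndexError("too many indices")
    fill = (slice(None),) * (ndim - explicit)
    if not positions:
        # No ellipsis: omitted trailing dimensions are taken whole.
        return key + fill
    pos = positions[0]
    return key[:pos] + fill + key[pos + 1:]


print(expand_ellipsis((Ellipsis, -1), 5))   # same as v[:,:,:,:,-1]
print(expand_ellipsis((0, Ellipsis, -1), 5))  # same as v[0,:,:,:,-1]
```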
Note that, contrary to indexing, a slice never reduces the rank of the
output array, even if its length is 1. For example, given a 3x3x3 `m'
variable:
>>> a = m[0] # indexing: a is a 3x3 array (rank 2)
>>> a = m[0:1] # slicing: a is a 1x3x3 array (rank 3)
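The two rules (an index drops its dimension, a slice keeps it) determine the
shape of the array returned by a read. The hypothetical helper below (not part
of pycdf) computes that shape from the variable shape and an index tuple; an
empty tuple stands for a scalar result. Ellipsis is not handled here.

```python
# Hypothetical helper, not part of pycdf: shape of the array obtained when
# reading var[key], following the indexing and slicing rules above.
def result_shape(shape, key):
    """Shape of the array obtained by reading var[key], given the var shape."""
    if not isinstance(key, tuple):
        key = (key,)
    if len(key) > len(shape):
        raise IndexError("too many indices")
    # Omitted trailing dimensions are sliced from beginning to end.
    key = key + (slice(None),) * (len(shape) - len(key))
    out = []
    for k, length in zip(key, shape):
        if isinstance(k, slice):
            # A slice keeps its dimension, even when it selects one element.
            out.append(len(range(*k.indices(length))))
        elif -length <= k < length:
            pass  # An integer index removes the dimension.
        else:
            raise IndexError("index %d out of range (length %d)" % (k, length))
    return tuple(out)


m = (3, 3, 3)
print(result_shape(m, 0))            # (3, 3)
print(result_shape(m, (0, 0)))       # (3,)
print(result_shape(m, (0, 0, 0)))    # ()  i.e. a scalar
print(result_shape(m, slice(0, 1)))  # (1, 3, 3)
```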
As can easily be seen, extended slice syntax is much more elegant and
compact, and offers a few possibilities not easy to achieve with the
get()/put() methods. Negative indices offer a nice example:
>>> v[-2:] # last two rows
>>> v[-3:-1] # second and third row
>>> v[:,-1] # last column
The only features exclusively available with the get()/put() methods are the
specification of a mapping vector (which could be used for ex. to
transpose an array), and the handling of NC.BYTE type values as unsigned.
Reading/setting multivalued netCDF attributes and variables
-----------------------------------------------------------
Multivalued netCDF attributes are set using a python sequence (tuple or
list). Reading such an attribute returns a python list. The easiest way to
read/set a netCDF attribute is by handling it like a Python class attribute
(see "High level attribute access"). For example:
>>> d=CDF('test.nc',NC.WRITE|NC.CREATE) # create dataset
>>> d.integers = (1,2,3,4) # define multivalued integer attr
>>> d.integers # get the attribute value
[1, 2, 3, 4]
The easiest way to set multivalued netCDF variables is to assign to an
indexed subset of the variable, using "[:]" (or [...]) to assign to the
whole variable (see "High level variable access"). The assigned value
can be a python sequence, which can be multi-leveled when assigning to a
multidimensional variable. For example:
>>> d=CDF('test.nc',NC.WRITE|NC.CREATE) # create dataset
>>> d3=d.def_dim('d1',3) # create dim. of length 3
>>> v1=d.def_var('v1',NC.INT,d3) # 3-elem vector
>>> v1[:]=[1,2,3] # assign 3-elem python list
>>> v2=d.def_var('v2',NC.INT,(d3,d3)) # create 3x3 variable
# The list assigned to v2 is composed
# of 3 lists, each representing a row of v2.
>>> v2[:]=[[1,2,3],[11,12,13],[21,22,23]]
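The assigned value can also be an array. Rewriting the example above:
>>> from numpy import array
>>> v1[:]=array([1,2,3])
>>> v2[:]=array([[1,2,3],[11,12,13],[21,22,23]])
Note that the indexing expressions 'v1[:]' and 'v2[:]' are needed whether
python sequences or arrays are assigned: a plain assignment such as
'v1=array([1,2,3])' would merely rebind the Python name 'v1' to a new
array, leaving the netCDF variable untouched.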
Reading a netCDF variable always returns an array, except if
indexing is used and produces a rank-0 array, in which case a scalar is
returned.
Rules governing array assignment
--------------------------------
pycdf releases before 0.6 were somewhat careless when dealing with
array assignments. For example, no validity check was performed when
attempting to assign the contents of an array to an array of a different
shape. This could result in garbage being assigned, fatal errors,
and rampant, hard-to-catch bugs.
Beginning with release 0.6, when an array (or a slice thereof) is
assigned to, pycdf makes sure that the type of the right-hand side is
acceptable, and that the values meet certain validity constraints. An
array can be assigned :
- a scalar (integer or float)
- a sequence (list or tuple) of integers or floats, or sequences
of integers or floats (arbitrarily nested)
- an array (possibly sliced)
The following paragraphs define the rules obeyed by pycdf.
Any unmet condition will be signaled by a TypeError or ValueError
exception.
Assigning a scalar to an array
When an integer or float scalar value is used on the right-hand side
(as in "x[4:6,:10:2] = 5"), the value is now replicated (broadcast)
over the whole left-hand side. Thus:
>>> x[:2] = 0   # zeroes the first two rows of array 'x'
>>> x[:] = 1    # set 'x' to all 1's; equivalent to, but much
                # simpler than :
                #   x[:] = numpy.ones((x.shape()))
                #   x[:] = numarray.ones((x.shape()))
Note in the above example that we do not have to care about the shape of
the left-hand side array.
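Broadcasting a scalar amounts to building a structure of the left-hand side
shape filled with that value. The nested-list sketch below only models the
rule: broadcast_scalar is a hypothetical name, and pycdf itself works on
arrays rather than lists.

```python
# Hypothetical model of scalar broadcasting: build nested lists of the
# left-hand side shape, every element equal to the scalar.
def broadcast_scalar(value, shape):
    """Return nested lists of the given shape, every element equal to value."""
    if not isinstance(value, (int, float)):
        raise TypeError("only integer or float scalars are broadcast")
    if not shape:
        return value
    return [broadcast_scalar(value, shape[1:]) for _ in range(shape[0])]


# "x[:2] = 0" on a variable with 3 columns touches a (2, 3) block:
print(broadcast_scalar(0, (2, 3)))  # [[0, 0, 0], [0, 0, 0]]
```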
Assigning a sequence (tuple or list) to an array
When a sequence appears on the right-hand side, it must hold only
integer or float scalars, or nested sequences thereof. The total number
of scalars in the sequence (ignoring nesting level) must match
exactly the number of elements expected on the left-hand side. The
sequence nesting levels are of no consequence, and the values are
assigned to the array in row-major order. Thus, if "x" is a 3x3 array
and "seq" is a sequence, then the statement "x[:] = seq" requires 9
values to be assigned to 'x' and is legal only if "seq" enumerates
exactly 9 values, eg:
>>> x[:] = (1,2,3,4,5,6,7,8,9) # ok, 9 values at same level
>>> x[:] = ((1,2,3),(4,5,6),(7,8,9)) # ok, 9 values in a 2-level tuple
>>> x[:] = ((1,2,3),[4,5,6],[7,8,9]) # ok, 9 values, mix of
# tuple and list
>>> x[:] = [1,2,3,4] # wrong, 4 values listed, 5 missing
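The sequence rule can be checked as sketched below: nesting is flattened in
row-major order, and the number of scalars must equal the product of the
left-hand side dimensions. flatten_for_assignment is a hypothetical helper,
not a pycdf function.

```python
# Hypothetical helper, not part of pycdf: flatten a nested sequence in
# row-major order and check it matches the left-hand side element count.
def flatten_for_assignment(seq, shape):
    """Return the flattened values, or raise if they cannot fill `shape`."""
    flat = []

    def walk(item):
        if isinstance(item, (list, tuple)):
            for sub in item:
                walk(sub)
        elif isinstance(item, (int, float)):
            flat.append(item)
        else:
            raise TypeError("sequence may only hold integers or floats: %r" % (item,))

    walk(seq)
    expected = 1
    for n in shape:
        expected *= n
    if len(flat) != expected:
        raise ValueError("%d values given, %d expected" % (len(flat), expected))
    return flat


print(flatten_for_assignment(((1, 2, 3), [4, 5, 6], [7, 8, 9]), (3, 3)))
```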
Assigning the contents of an array to an array
When an array (possibly sliced) is used as the right-hand side, its shape
must exactly match that of the array (possibly sliced) used on the left-
hand side. Thus, if "x" is a 4x4 array and "y" is a 6x4 array :
>>> x[...] = y # Fails, shape of x is (4,4) and does not match
# that of y which is (6,4)
>>> x[...] = y[:4] # Works since array 'y' is sliced to
# a (4,4) shape
Working with scalar variables
-----------------------------
A scalar (rank-0) variable is created inside a dataset by calling
dataset method def_var() with an empty (or omitted) dimension sequence, eg:
>>> cdf = CDF(...)
>>> cdf.automode()
>>> temp = cdf.def_var('temp', NC.FLOAT) # 'temp' is a scalar variable
Now, methods put() and get() of this variable can be called
to set and get the variable value, and attributes can be set on the
variable in the usual way, eg:
>>> temp.put(12)
>>> temp.units = "celsius"
>>> print temp.get(), temp.units # prints "12.0 celsius"
For uniformity purposes, the slicing expression "[:]" is also applicable
to scalar variables, even if they are not sequences at all.
Purists may disagree, but otherwise scalar variables could only be accessed
through get() and put() methods, preventing writing generic code to handle
variables using slicing constructs. We can thus write:
>>> temp[:] = 12 # equivalent to temp.put(12)
>>> print temp[:] # equivalent to "print temp.get()"
For a scalar variable 'v', the methods inquiring about the dimensions of 'v' will naturally
return empty sequences, eg: dimensions(), inq(), inq_dimid(), shape(). Method
inq_ndims() will return 0.
Working with the unlimited dimension
------------------------------------
Inside a dataset, one dimension can be designated as being 'unlimited',
allowing variables based on that dimension to dynamically grow
along that dimension. In physical applications, the unlimited dimension
is frequently used to manage 'time', as for example in a meteorological
model which could output forecasts composed of temperature(time,lat,lon),
pressure(time,lat,lon), etc, data grids.
An unlimited dimension is defined by calling the dataset def_dim() method
using NC.UNLIMITED as the dimension length, eg:
>>> d1 = cdf.def_dim('d1', NC.UNLIMITED)
A variable can be allowed to grow along that dimension if that dimension
comes first in the variable dimension list, eg:
>>> d2 = cdf.def_dim('d2', 5)
>>> v = cdf.def_var('v', NC.DOUBLE, (d1, d2)) # 'd1' must come first
Given an unlimited dimension 'd' and a variable 'v' whose first dimension
is 'd', it is common in netcdf parlance to designate 'v' as a "record
variable", and the data subsets v[0], v[1], etc as "records" inside 'v'.
For ex., if 'd' represents time, and 'v' is a temperature(time,lat,lon)
variable, one can picture v[0] as a "record" holding the grid of
temperatures at time 0, v[1] as the "record" of temperatures at time 1,
etc. Variable 'v' is extended by adding "records" v[0], v[1], etc along
dimension 'd', much as a traditional file is extended by writing data
records to it.
Here are some useful methods to work with the unlimited dimension.
-Given a CDF dataset instance 'nc':
nc.inq_unlimdim() returns the index of the dataset unlimited dimension
or -1 if no unlimited dimension exists
nc.inq_unlimlen() returns the current length of the unlimited
dimension (or -1 if no unlimited dimension exists)
-Given a CDFVar variable instance 'v':
v.isrecord() checks whether v is a record variable, i.e. whether the first
dimension of v refers to the unlimited dimension.
Only ONE unlimited dimension is allowed inside a dataset, and ALL variables
based on that dimension grow "in synch". So, if variables 'v1' and 'v2'
include an unlimited dimension, adding records to 'v1' will also create new
records in 'v2' as a side effect. Those records will be initialized with
the 'v2' fill value. They will of course need to be properly initialized
afterwards.
When assigning to a variable along an unlimited dimension, the variable
must be properly sliced so as to match the shape of the right-hand side.
The "wild-card" notation ([:], [...]) cannot be used if the shape of the
right-hand side exceeds the current shape of the variable : a shape
mismatch will then be declared and the assignment will be refused. Slicing
the variable beyond its current length will allocate new records and solve
the problem. For example, if 'd' is an unlimited dimension, 'v' has
dimensions (d,5) and 'v' is empty at start:
>>> v[:] = zeros((4,5)) # fails: shape mismatch : (0,5) vs (4,5)
>>> v[:4] = ones((4,5)) # works: allocate records 0 to 3
# and set them to 1's
>>> v[:] = zeros((4,5)) # now works: records 0 to 3 exist and are
# reset to 0's
>>> v[:2,:2] = ones((2,2)) # works: resets records 0 and 1 to 1's
An unlimited dimension can be made to grow by assigning to higher and
higher indices along that dimension. Thus:
>>> for i in range(4,7):
...     v[i] = i * ones(5)   # adds records 4 to 6, extending 'v' to 7 records
Making an unlimited dimension grow in a non-sequential way will allocate
intermediate records inside the variable, which will be initialized with
the variable fill value (default one, or the one set with attribute
_FillValue). So, if 'v' currently holds 7 records (v[0] to v[6]):
>>> v._FillValue = 999.0
>>> v[8] = ones(5) # will fill v[7] with '999.0' fill values
The same holds true for the other record variables defined in the dataset.
They will all grow in synch when the unlimited dimension length is
extended, and newly created records inside those variables will be set to
their variable fill value.
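The following in-memory model illustrates how record variables grow in synch
and how skipped records receive fill values. ToyRecordDataset and its methods
are hypothetical and only mimic the behaviour described above; with pycdf,
this bookkeeping is performed by the netCDF library on the file itself.

```python
# Hypothetical in-memory model (not pycdf): record variables sharing one
# unlimited dimension, padded with their own fill value when it grows.
class ToyRecordDataset(object):
    """In-memory model of record variables sharing one unlimited dimension."""

    def __init__(self):
        self.nrecs = 0   # current length of the unlimited dimension
        self.vars = {}   # name -> [record length, fill value, list of records]

    def def_var(self, name, reclen, fill_value):
        # A new record variable gets as many records as the others, all filled.
        records = [[fill_value] * reclen for _ in range(self.nrecs)]
        self.vars[name] = [reclen, fill_value, records]

    def put_record(self, name, index, values):
        reclen, _, records = self.vars[name]
        if len(values) != reclen:
            raise ValueError("shape mismatch: %d vs %d" % (len(values), reclen))
        if index >= self.nrecs:
            # Grow every record variable in synch, padding with its fill value.
            missing = index + 1 - self.nrecs
            for length, fill, recs in self.vars.values():
                recs.extend([fill] * length for _ in range(missing))
            self.nrecs = index + 1
        records[index] = list(values)

    def get_record(self, name, index):
        return self.vars[name][2][index]


ds = ToyRecordDataset()
ds.def_var("v", 5, 999.0)
ds.def_var("w", 2, -1)
for i in range(7):
    ds.put_record("v", i, [float(i)] * 5)   # records 0 to 6
ds.put_record("v", 8, [1.0] * 5)            # skips record 7
print(ds.nrecs)               # 9
print(ds.get_record("v", 7))  # [999.0, 999.0, 999.0, 999.0, 999.0]
print(ds.get_record("w", 8))  # [-1, -1]
```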
Quirks
------
_FillValue attribute
pycdf attaches no special significance to the _FillValue attribute, only
the netcdf library does. This can lead to the following nasty problem. If you
write "v._FillValue = 0", pycdf defines a new attribute and deduces its type
from that of the right-hand side value, here an integer type. However, if variable
'v' is defined as being of type real, then the netcdf library (not pycdf!)
will complain about a type mismatch when time comes to initialize
the variable with the fill value, and an exception will be thrown.
To solve the problem, initialize the fill value with a value whose type is explicitly
identical to that of the variable. For example, if variable 'v' is of type real, write
"v._FillValue = 0.0" instead of "v._FillValue = 0".
Also do not forget to set the fill value *before* assigning to the variable.
Functions summary
-----------------
pycdf defines the following functions.
inq_libvers() query netcdf library version
strerror() return the string associated with a netCDF error code
pycdfVersion() query pycdf version string
pycdfArrayPkg() query the array package used at install time
Classes summary
---------------
pycdf defines the following classes.
CDF The CDF class describes a netCDF dataset. It encapsulates a
netCDF file descriptor (referred to by 'ncid' in the C manual),
and all the netCDF top-level functions (those not dealing with
dimensions, variables or attributes). It contains constructors
to create instances of all those object types.
To create a CDF instance call the CDF() constructor.
methods:
constructors
CDF() open an existing netCDF file or create a new
one, returning a CDF instance
attr() get an existing or define a new dataset attribute,
returning a CDFAttr (attribute) instance
dim() get an existing dimension,
returning a CDFDim (dimension) instance
inq_dimid() equivalent to dim()
def_dim() define a new dimension,
returning a CDFDim (dimension) instance
var() get an existing variable,
returning a CDFVar (variable) instance
inq_varid() equivalent to var()
def_var() define a new variable
returning a CDFVar (variable) instance
dataset manipulation
abort() backout of recent definitions to the dataset
close() close the dataset; this is optional, since a dataset
is automatically closed when its instance variable
goes out of scope (or is reassigned)
automode() activate / deactivate the transparent setting
of the dataset define and data mode.
datamode() enter data mode, ignoring error if already in
this mode
enddef() switch the dataset to data mode
definemode() enter define mode, ignoring error if already
in this mode
redef() switch the dataset to definition mode
sync() synchronize the dataset to disk
dataset inquiry
attributes() get a dictionary describing the dataset's
global attributes
dimensions() get a dictionary describing the dataset's
dimensions
inq() query number of dimensions, variables, global
attributes and id of the unlimited dimension
inq_natts() query number of global attributes
inq_ndims() query number of dimensions
inq_nvars() query number of variables
inq_unlimdim() query id of the unlimited dimension
inq_unlimlen() query the current length of the unlimited dimension
variables() get a dictionary describing the dataset's
variables
misc
set_fill() set fill mode
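A netCDF dataset is always either in define mode or in data mode, and the CDF methods summarised above differ in how strictly they treat that state: redef() and enddef() fail if the dataset is already in the requested mode, while definemode() and datamode() ignore that error. The sketch below models those rules in plain Python. The class ModeModel, its require() method and ModeError are hypothetical; no file is opened and nothing from pycdf is used.

```python
class ModeError(Exception):
    """Raised when a mode transition is not allowed (stands in for CDFError)."""


class ModeModel:
    """Hypothetical model of a netCDF dataset's define/data mode state."""

    def __init__(self, in_define=True, auto=False):
        # A newly created netCDF dataset starts in define mode by default.
        self.in_define = in_define
        self.auto = auto

    def redef(self):
        # Strict: calling redef() while already in define mode is an error.
        if self.in_define:
            raise ModeError("already in define mode")
        self.in_define = True

    def enddef(self):
        # Strict: calling enddef() while already in data mode is an error.
        if not self.in_define:
            raise ModeError("not in define mode")
        self.in_define = False

    def definemode(self):
        # Lenient: enter define mode, ignoring the "already there" case.
        if not self.in_define:
            self.redef()

    def datamode(self):
        # Lenient: enter data mode, ignoring the "already there" case.
        if self.in_define:
            self.enddef()

    def automode(self, enabled):
        self.auto = enabled

    def require(self, want_define):
        """Check, or in auto mode establish, the mode an operation needs."""
        if self.in_define == want_define:
            return
        if not self.auto:
            raise ModeError("wrong mode for this operation")
        if want_define:
            self.definemode()
        else:
            self.datamode()


m = ModeModel()   # new dataset: define mode
m.enddef()        # switch to data mode
m.datamode()      # no error: already in data mode
```

The require() method is where automode matters: with it enabled, a mode mismatch is resolved by switching modes instead of raising. Modelling automode this way is an assumption about pycdf's behaviour, not a description of its source.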
CDFAttr The CDFAttr class describes a netCDF attribute, either
a variable attribute or a global (dataset) attribute.
It encapsulates the underlying CDF and CDFVar instances,
and the attribute name.
To create a CDFAttr instance, obtain a CDF or CDFVar
instance, and call its attr() method.
methods:
read/write value
get() get the attribute value
put() set the attribute value
inquiry
inq() get the attribute type and number of values
inq_id() get attribute index number
inq_len() get attribute number of values
inq_name() get attribute name
inq_type() get attribute type
misc
copy() copy attribute to another variable or dataset
delete() delete attribute
rename() rename attribute
CDFDim The CDFDim class describes a netCDF dimension. It encapsulates
the underlying CDF instance and the dimension index number.
To create a CDFDim instance, obtain a CDF instance
and call one of its dim(), def_dim() or inq_dimid()
methods.
methods:
inquiry
inq() get the dimension name and length
inq_id() get the dimension id
inq_len() get the dimension length
inq_name() get the dimension name
misc
rename() rename dimension
CDFVar The CDFVar class describes a netCDF variable. It encapsulates
the underlying CDF dataset instance, and the variable index
number.
To create a CDFVar instance, obtain a CDF dataset
instance, and call one of its def_var(), var() or
inq_varid() methods.
methods:
constructors
attr() get an existing or create a new variable
attribute, returning a CDFAttr instance
get/set variable value
get() get the netCDF variable contents, totally or
partially; returns an array of the package
selected at install time (Numeric, numarray or numpy)
get_1() get a single value from the netCDF variable
put() write a set of values to the variable;
the set can be an array of that package
put_1() put a single value in the variable
inquiry
attributes() get a dictionary holding the names and
values of all the variable attributes
dimensions() get the names of the variable dimensions
inq() get variable name, type, dimension index
numbers and number of attributes
inq_dimid() get the index numbers of the variable dimensions
inq_name() get the variable name
inq_id() get the variable index number
inq_natts() get the variable number of attributes
inq_ndims() get the variable number of dimensions
inq_type() get the variable type
isrecord() indicates whether the variable is a record
variable (i.e. if dimension 0 is the
unlimited dimension)
shape() get the lengths of the variable dimensions
misc
rename() rename the variable
CDFMF The CDFMF class describes a pseudo-CDF dataset built by
concatenating the record variables that are compatibly defined in
2 or more netCDF files. The resulting virtual dataset is opened
in read-only mode, and thus cannot be updated. Otherwise, it can
be manipulated like a classic CDF dataset.
For example, we could have a set of files holding similar
(time x lat x lon) variables, where time is
the unlimited dimension. File 1 could hold records at times t0, t1,
t2; file 2 could hold records at times t3, t4; file 3 could hold
records at times t5, t6, etc. The CDFMF class lets us see
those 3 files as a single "unified" variable for times t0, t1, t2, t3,
etc. This functionality lets one break what could otherwise become
a huge dataset into more manageable pieces. A multi-file variable
can also be easier to share with others, and can more closely match
the way data is produced (e.g. when data is acquired daily, each day's
worth of data can be stored in a separate dataset).
To create a multi-file dataset, call the CDFMF() constructor, with
the sequence of file names as argument. In order to be concatenated,
variables must be based on the same unlimited dimension and be of the same
shape and datatype.
The first file named in a multi-file dataset becomes the "master" file.
It defines the record variables which must be found compatible in all
the other files. The attributes defined on the master variables apply
to the multi-file variables as a whole.
methods: A CDFMF object inherits methods from the CDF class, specialising
the following.
constructors
CDFMF() create a multi-file dataset instance
dim() return a CDFMFDim() instance for the multi-file dataset
var() return a CDFMFVar() instance for the multi-file dataset
query
inq_unlimlen() returns the current length of the unlimited dimension,
summed over all the data files in the multi-file
dataset
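Conceptually, a multi-file record variable maps a global record index onto a (file, record-within-file) pair, and the length of its unlimited dimension is the sum of the per-file record counts. The sketch below shows that bookkeeping in plain Python. It illustrates the idea only and is not pycdf's implementation; the functions cumulative_starts, total_records and locate_record are hypothetical and work on a list of per-file record counts rather than on real datasets.

```python
from bisect import bisect_right


def cumulative_starts(record_counts):
    """Return the global index of the first record held by each file."""
    starts = []
    total = 0
    for n in record_counts:
        starts.append(total)
        total += n
    return starts


def total_records(record_counts):
    """Length of the concatenated unlimited dimension (cf. inq_unlimlen())."""
    return sum(record_counts)


def locate_record(record_counts, index):
    """Map a global record index to (file number, record index in that file)."""
    total = total_records(record_counts)
    if index < 0:
        index += total  # allow Python-style negative indices
    if not 0 <= index < total:
        raise IndexError("record index out of range")
    starts = cumulative_starts(record_counts)
    # The owning file is the last one whose first record is <= index;
    # bisect_right also skips over files holding no records.
    f = bisect_right(starts, index) - 1
    return f, index - starts[f]


counts = [3, 2, 2]               # records per file, as in the 3-file example above
print(total_records(counts))     # 7
print(locate_record(counts, 4))  # (1, 1): second record of the second file
```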
CDFMFDim The CDFMFDim class plays a role equivalent to the CDFDim class, this
time for a CDFMF object. The dimension instance applies to the
multi-file dataset, instead of just one dataset.
To create a CDFMFDim instance, call the dim() method of a CDFMF instance,
with the dimension name or id as argument.
methods: A CDFMFDim object inherits methods from the CDFDim class, specialising
the following.
inq() return a tuple holding the name and length of the dimension
(length summed over all datasets for the unlimited dimension)
inq_len() return the dimension length (summed over all datasets for the
unlimited dimension)
CDFMFVar The CDFMFVar class plays a role equivalent to the CDFVar class, this
time for a CDFMF instance. It lets one manipulate a concatenated variable
as if it were one ordinary variable, ignoring file boundaries. A CDFMFVar
instance can be accessed like a "classic" CDFVar instance, using
slicing, indexing, attributes, etc.
To create a CDFMFVar instance, call the var() method of the CDFMF instance.
methods: A CDFMFVar object inherits methods from the CDFVar class, specialising
the following.
query
shape() returns the variable shape, taking into account the total length
of the unlimited dimension
NOTE : there is no CDFMFAttr class. CDFMF and CDFMFVar objects inherit their
attribute handling methods from their superclasses, CDF and CDFVar respectively.
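The compatibility rule for multi-file variables and the behaviour of CDFMFVar.shape() fit together: every file must agree with the master file on the datatype and on the length of every non-record dimension, and the record dimension length is summed over all files. A minimal pure-Python sketch of that check follows; the helper multifile_shape is hypothetical and works on (datatype, shape) pairs rather than on real pycdf objects.

```python
def multifile_shape(per_file):
    """Compute the shape of a concatenated record variable.

    per_file is a list of (datatype, shape) pairs, one per file, master first.
    Each shape is a tuple whose first element is the record count along the
    unlimited dimension. Hypothetical helper, not a pycdf function.
    """
    if not per_file:
        raise ValueError("need at least one file")
    master_type, master_shape = per_file[0]
    total = 0
    for dtype, shape in per_file:
        # Every file must agree with the master on the datatype and on the
        # lengths of all non-record dimensions.
        if dtype != master_type:
            raise ValueError("datatype mismatch: %r != %r" % (dtype, master_type))
        if tuple(shape[1:]) != tuple(master_shape[1:]):
            raise ValueError("shape mismatch: %r != %r" % (shape, master_shape))
        total += shape[0]
    # Record dimension summed over files, other dimensions taken from the master.
    return (total,) + tuple(master_shape[1:])


print(multifile_shape([("FLOAT", (3, 10, 20)),
                       ("FLOAT", (2, 10, 20)),
                       ("FLOAT", (2, 10, 20))]))  # (7, 10, 20)
```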
NC The NC class defines constants for data types, file opening
modes, the dataset fill mode, and special attribute and dimension
values. Those constants are defined as class attributes.
Constants are named after their C API counterparts.
data types:
NC.BYTE
NC.CHAR
NC.SHORT
NC.INT
NC.FLOAT
NC.DOUBLE
file opening modes:
NC.CREATE (note: specific to pycdf, absent from the C API)
NC.TRUNC (note: specific to pycdf, absent from the C API)
NC.LOCK
NC.SHARE
NC.NOWRITE
NC.WRITE
NC.BIT64_OFFSET (corresponds to C NC_64BIT_OFFSET constant)
dataset fill mode:
NC.FILL
NC.NOFILL
attribute:
NC.GLOBAL
dimension:
NC.UNLIMITED
Examples
--------
The pycdf distribution comes with a few non-trivial example programs
located under directory 'examples'.
compr.py
Shows how to create, index, slice, and update arrays. The results
are compared with what they should be, so this example can
be used to validate the installation.
multi.py
Shows how to use the CDFMF class to handle a multi-file dataset.
cdfstruct
Utility program capable of analysing the structure of any netCDF file
you may want to throw at it. Handy when you want to quickly peek at
the contents of any netCDF file.
txttocdf
Example program showing how to convert data stored in flat-file
format to netCDF.
Data
----
__all__ = ['CDF', 'CDFAttr', 'CDFDim', 'CDFError', 'CDFMF', 'CDFVar', 'NC', 'inq_libvers', 'pycdfArrayPkg', 'pycdfVersion', 'strerror']