CTDS (Casacore Table Data System) is the data storage mechanism for Casacore.
See below for an overview of the classes in this module.
Public interface
"Table" is a formal term from relational database theory: "The organizing principle in a relational database is the TABLE, a rectangular, row/column arrangement of data values." Casacore tables are extensions to traditional tables, but are similar enough that we use the same name. There is also a strong resemblance between the uses of Casacore tables, and FITS binary tables, which provides another reason to use "Tables" to describe the Casacore data storage mechanism.
Tables are the fundamental storage mechanism for Casacore. This document explains why they had to be made, what their properties are, and how to use them. The last subject is discussed and illustrated in a sequence of sections:
A few applications exist to inspect and manipulate a table.
Several UML diagrams describe the class structure of the Tables module.
The Casacore tables are mainly based upon the ideas of Allen Farris, as laid out in the AIPS++ Database document, from which the following paragraph is taken:
Traditional relational database tables have two features that decisively limit their applicability to scientific data. First, an item of data in a column of a table must be atomic – it must have no internal structure. A consequence of this restriction is that relational databases are unable to deal with arrays of data items. Second, an item of data in a column of a table must not have any direct or implied linkages to other items of data or data aggregates. This restriction makes it difficult to model complex relationships between collections of data. While these restrictions may make it easy to define a mathematically complete set of data manipulation operations, they are simply intolerable in a scientific data-handling context. Multi-dimensional arrays are frequently the most natural modes in which to discuss and think about scientific data. In addition, scientific data often requires complex calibration operations that must draw on large bodies of data about equipment and its performance in various states. The restrictions imposed by the relational model make it very difficult to deal with complex problems of this nature.
In response to these limitations, and other needs, the Casacore tables were designed.
Casacore tables have the following properties:
- The data are stored in a canonical, machine-independent format. The endian format is controlled by the aipsrc variable table.endianformat, which defaults to Table::LocalEndian (the endian format of the machine being used when creating the table).
- Tables can be in one of four forms: a plain table stored on disk, a table held in memory, a reference table (the result of a selection, sort, or projection), or a virtual concatenation of tables.
- Concurrent access from different processes to the same plain table is fully supported by means of a locking/synchronization mechanism. Concurrent access over NFS is also supported.
- A (somewhat primitive) mechanism is available to do a table lookup based on the contents of a key.
To open an existing table you just create a Table object giving the name of the table, like:
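A minimal sketch, assuming an existing table directory some.tab:

    #include <casacore/tables/Tables/Table.h>
    using namespace casacore;

    Table tab("some.tab");                  // open readonly
    Table wtab("some.tab", Table::Update);  // open for read/write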
The constructor option determines whether the table will be opened as readonly or as read/write. A readonly table file must be opened as readonly, otherwise an exception is thrown. The function Table::isWritable() can be used to determine whether a table is writable.
When the table is opened, the data managers are reinstantiated according to their definition at table creation.
The static function TableUtil::openTable can be used to open a table, in particular a subtable, in a simple way by means of the :: notation like maintable::subtable. The :: notation is much better than specifying an explicit path (such as maintable/subtable), because it also works fine if the main table is a reference table (e.g. the result of a selection).
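A sketch of this, assuming a MeasurementSet my.ms with an ANTENNA subtable:

    #include <casacore/tables/Tables/TableUtil.h>
    using namespace casacore;

    // Works even if my.ms is a reference table (e.g. a selection result).
    Table ant = TableUtil::openTable("my.ms::ANTENNA");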
You can read data from a table column with the "get" functions in the classes ScalarColumn<T> and ArrayColumn<T>. For scalars of a standard data type (i.e. Bool, uChar, Int, Short, uShort, uInt, float, double, Complex, DComplex and String) you could instead use TableColumn::getScalar(...) or TableColumn::asXXX(...). These functions offer an extra feature: automatic data type promotion, so that you can, for example, get a double value from a float column.
These "get" functions are used in the same way as the simple "put" functions described in the previous section.
ScalarColumn<T> can be constructed for a non-writable column. However, an exception is thrown if the put function is used for it. The same is true for ArrayColumn<T> and TableColumn.
A typical program could look like:
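A minimal sketch, assuming a table with a scalar Int column AC and a float array column ARR:

    #include <casacore/tables/Tables/Table.h>
    #include <casacore/tables/Tables/ScalarColumn.h>
    #include <casacore/tables/Tables/ArrayColumn.h>
    using namespace casacore;

    Table tab("some.tab");
    ScalarColumn<Int> acCol(tab, "AC");
    ArrayColumn<float> arrCol(tab, "ARR");
    for (rownr_t row = 0; row < tab.nrow(); ++row) {
      Int ac = acCol(row);              // get a scalar cell
      Array<float> arr = arrCol(row);   // get an array cell
      // ... process ac and arr ...
    }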
The creation of a table is a multi-step process:
1. Create the table description (see below).
2. Create a SetupNewTable object with the name of the new table.
3. Create the necessary data managers.
4. Bind each column to the appropriate data manager; unbound columns are bound to the default data manager.
5. Define the shape of direct array columns, if not already done in the table description.
6. Construct the Table object; at that point the table is actually created. The initial number of rows can be given.
The recipe above is meant for the creation of a plain table, but the creation of a memory table is exactly the same. The only difference is that in the call to construct the Table object the Table::Memory type has to be given. Note that in the SetupNewTable object the columns can be bound to any data manager. MemoryTable will rebind stored columns to the MemoryStMan storage manager, but virtual column bindings are not changed.
The following example shows how you can create a table. An example specifically illustrating the creation of the table description is given in that section. Other sections discuss the access to the table.
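A sketch of the steps, with illustrative table and column names:

    #include <casacore/tables/Tables/TableDesc.h>
    #include <casacore/tables/Tables/ScaColDesc.h>
    #include <casacore/tables/Tables/ArrColDesc.h>
    #include <casacore/tables/Tables/SetupNewTab.h>
    #include <casacore/tables/Tables/Table.h>
    #include <casacore/tables/DataMan/StandardStMan.h>
    using namespace casacore;

    // Step 1: create the table description.
    TableDesc td("tTableDesc", "1", TableDesc::Scratch);
    td.addColumn(ScalarColumnDesc<Int>("ab", "Comment for column ab"));
    // The direct array gets its fixed shape here (which also covers step 5).
    td.addColumn(ArrayColumnDesc<float>("arr", IPosition(3,2,3,4),
                                        ColumnDesc::Direct));
    // Step 2: set up the new table.
    SetupNewTable newtab("newtab.data", td, Table::New);
    // Steps 3 and 4: create a storage manager and bind the columns.
    StandardStMan stman;
    newtab.bindAll(stman);
    // Step 6: create the plain table with 10 rows.
    Table tab(newtab, 10);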
To create a table in memory, only step 6 has to be modified slightly to:
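    // Construct the Table object with type Table::Memory (10 rows, as above).
    Table tab(newtab, Table::Memory, 10);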
Note that the function TableUtil::createTable can be used to create a table in a simpler way. It can also be used to create a subtable using the :: notation, similar to the TableUtil::openTable function described above.
Once a table has been created or has been opened for read/write, you want to write data into it. Before doing that you may have to add one or more rows to the table.
Tip: If a table was created with a given number of rows, you do not need to add rows; you may not even be able to do so.
When adding new rows to the table, either via the Table(...) constructor or via the Table::addRow(...) function, you can choose to have those rows initialized with the default values given in the description.
To actually write the data into the table you need the classes ScalarColumn<T> and ArrayColumn<T>. For each column you can construct one or more of these objects. Their put(...) functions let you write a value at a time or the entire column in one go. For arrays you can "put" subsections of the arrays.
As an alternative for scalars of a standard data type (i.e. Bool, uChar, Int, Short, uShort, uInt, float, double, Complex, DComplex and String) you could use the functions TableColumn::putScalar(...). These functions offer an extra feature: automatic data type promotion, so that you can, for example, put a float value into a double column.
A typical program could look like:
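A minimal sketch, writing into the table created in the example above:

    #include <casacore/tables/Tables/Table.h>
    #include <casacore/tables/Tables/ScalarColumn.h>
    #include <casacore/tables/Tables/ArrayColumn.h>
    #include <casacore/casa/Arrays/ArrayMath.h>
    using namespace casacore;

    Table tab("newtab.data", Table::Update);   // open for read/write
    ScalarColumn<Int> abCol(tab, "ab");
    ArrayColumn<float> arrCol(tab, "arr");
    Array<float> arr(IPosition(3,2,3,4));
    indgen(arr);                               // fill with 0,1,2,...
    for (uInt i = 0; i < 10; ++i) {
      tab.addRow();                 // add a row
      abCol.put(i, Int(i));         // put a scalar value
      arrCol.put(i, arr);           // put an array value
      arr += float(1);
    }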
In this example we added rows in the for loop, but we could also have created 10 rows straightaway by constructing the Table object as:
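    Table tab(newtab, 10);   // create the table with 10 rows straightaway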
in which case we would not include the call to Table::addRow() in the loop.
The classes TableColumn, ScalarColumn<T>, and ArrayColumn<T> contain several functions to put values into a single cell or into the whole column. This may look confusing, but is actually quite simple. The functions can be divided in two groups:
- functions that put a given value into the column cell(s);
- functions that copy values from another column into this column.
Apart from accessing a table column-wise as described in the previous two sections, it is also possible to access a table row-wise. The TableRow class makes it possible to access multiple fields in a table row as a whole. Note that, as with the ScalarColumn and ArrayColumn classes described above, there is also an ROTableRow class for access to readonly tables.
On construction of a TableRow object it has to be specified which fields (i.e. columns) are part of the row. For these fields a fixed structured TableRecord object is constructed as part of the TableRow object. The TableRow::get function will fill this record with the table data for the given row. The user has access to the record and can use RecordFieldPtr objects for speedier access to the record.
The class could be used as shown in the following example.
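A minimal sketch, assuming columns ADATA and BDATA:

    #include <casacore/tables/Tables/TableRow.h>
    #include <casacore/casa/Arrays/ArrayUtil.h>   // for stringToVector
    using namespace casacore;

    Table tab("some.tab");
    // The row object gives access to the given columns only.
    ROTableRow row(tab, stringToVector("ADATA,BDATA"));
    for (rownr_t i = 0; i < tab.nrow(); ++i) {
      const TableRecord& rec = row.get(i);   // read the row into the record
      Int a = rec.asInt("ADATA");
      // ... use the fields ...
    }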
The description of TableRow contains some more extensive examples.
The result of a select and sort of a table is another table, which references the original table. This means that an update of a sorted or selected table results in the update of the original table. The result is, however, a table in itself, so all table functions (including select and sort) can be used with it. Note that a true copy of such a reference table can be made with the Table::deepCopy function.
Rows or columns can be selected from a table. Columns can be selected by the Table::project(...) function, while rows can be selected by the various Table operator() functions. Usually a row is selected by giving a select expression with TableExprNode objects. These objects represent the various nodes in an expression, e.g. a constant, a column, or a subexpression. The Table function Table::col(...) creates a TableExprNode object for a column. The function Table::key(...) does the same for a keyword by reading the keyword value and storing it as a constant in an expression node. All column nodes in an expression must belong to the same table, otherwise an exception is thrown. In the following example we select all rows with RA>10:
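    #include <casacore/tables/TaQL/ExprNode.h>

    // Column name RA is an assumption.
    Table tab("some.tab");
    Table seltab = tab(tab.col("RA") > 10);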
while in the next one we select rows with RA and DEC in the given intervals:
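    Table seltab = tab(tab.col("RA") > 10  &&  tab.col("RA") < 14
                       && tab.col("DEC") >= -10  &&  tab.col("DEC") <= 10);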
The following operators can be used to form arbitrarily complex expressions: the comparison operators ==, !=, >, >=, <, and <=; the arithmetic operators +, -, *, /, and %; and the logical operators &&, ||, and !.
Many functions (like sin, max, conj) can be used in an expression. Class TableExprNode shows the available functions. For example:
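    // A sketch; column name RA is an assumption.
    Table seltab = tab(sin(tab.col("RA")) > 0.5);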
The function in can be used to select from a set of values. A value set can be constructed using class TableExprNodeSet. The following sketch selects rows with a NAME equal to abc, defg, or h:
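    #include <casacore/tables/TaQL/ExprNodeSet.h>

    TableExprNodeSet set;
    set.add(TableExprNodeSetElem("abc"));
    set.add(TableExprNodeSetElem("defg"));
    set.add(TableExprNodeSetElem("h"));
    Table seltab = tab(tab.col("NAME").in(set));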
You can sort a table on one or more columns containing scalars. In this example we simply sort on column RA (default is ascending):
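    Table sortab = tab.sort("RA");   // ascending sort on column RA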
Multiple Table::sort(...) functions exist which allow for more flexible control over the sort order. In the next example we sort first on RA in descending order and then on DEC in ascending order:
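    #include <casacore/casa/Utilities/Sort.h>

    Block<String> keys(2);
    keys[0] = "RA";
    keys[1] = "DEC";
    Block<Int> orders(2);
    orders[0] = Sort::Descending;
    orders[1] = Sort::Ascending;
    Table sortab = tab.sort(keys, orders);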
Tables stemming from the same root can be combined in several ways with the help of the various logical Table operators (operator|, etc.).
The selection and sorting mechanism described above can only be used in a hard-coded way in a C++ program. There is, however, another way. Strings containing selection and sorting commands can be used. The syntax of these commands is based on SQL and is described in the Table Query Language (TaQL) note 199. The language supports UDFs (User Defined Functions) in dynamically loadable libraries as explained in the note.
A TaQL command can be executed with the static function tableCommand
defined in class TableParse.
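A minimal sketch; the table name and selection are assumptions:

    #include <casacore/tables/TaQL/TableParse.h>
    using namespace casacore;

    Table seltab = tableCommand("SELECT FROM some.tab WHERE RA > 10").table();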
Tables with identical descriptions can be concatenated in a virtual way using the Table concatenation constructor. Such a Table object behaves as any other Table object, thus any operation can be performed on it. An identical description means that the number of columns, the column names, and the data types of the columns must be the same. The columns do not need to be ordered in the same way nor stored in the same way.
Note that if tables have different column names, it is possible to form a projection (as described in the previous section) first to make them appear identical.
Sometimes a MeasurementSet is partitioned, for instance in chunks of one hour. All those chunks can be virtually concatenated this way. Note that all tables in the concatenation will be opened, thus one might run out of file descriptors if there are many chunks.
Similar to reference tables, it is possible to make a concatenated Table persistent by using the rename
function. It will not copy the data; only the names of the tables used are written.
The keywords of a concatenated table are taken from the first table. It is possible to change or add keywords, but that is not persistent, not even if the concatenated table is made persistent.
The keywords holding subtables can be handled in a special way. Normally the subtables of the first table are used as the subtables of the concatenation, but it is possible to concatenate subtables as well by giving their names in the constructor. In this way the, say, SYSCAL subtable of a MeasurementSet can be concatenated as well.
You can iterate through a table in an arbitrary order by getting a subset of the table consisting of the rows in which the iteration columns have the same value. An iterator object is created by constructing a TableIterator object with the appropriate column names.
In the next example we define an iteration on the columns Time and Baseline. Each iteration step returns a table subset in which Time and Baseline have the same value.
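A sketch of such an iteration (table and column names are assumptions):

    #include <casacore/tables/Tables/TableIter.h>
    using namespace casacore;

    Table tab("UV_Table.data");
    Block<String> iterCols(2);
    iterCols[0] = "Time";
    iterCols[1] = "Baseline";
    TableIterator iter(tab, iterCols);
    while (!iter.pastEnd()) {
      Table subtab = iter.table();   // rows with equal Time and Baseline
      // ... process the subset ...
      iter.next();
    }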
You can define more than one iterator on the same table; they operate independently.
Note that the result of each iteration step is a table in itself which references the original table, just as in the case of a sort or select. This means that the resulting table can be used again in a sort, select, iteration, etc.
A table vector makes it possible to treat a column in a table as a vector. Almost all operators and functions defined for normal vectors, are also defined for table vectors. So it is, for instance, possible to add a constant to a table vector. This has the effect that the underlying column gets changed.
You can use the templated class TableVector to make a scalar column appear as a (table) vector. Columns containing arrays or tables are not supported. The data type of the TableVector object must match the data type of the column. A table vector can also hold a normal vector so that (temporary) results of table vector operations can be handled.
In the following example we double the data in column COL1 and store the result in a temporary table vector.
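    #include <casacore/tables/Tables/TableVector.h>
    #include <casacore/tables/Tables/TabVecMath.h>
    using namespace casacore;

    Table tab("some.tab");
    TableVector<Int> tabvec(tab, "COL1");   // table vector for column COL1
    TableVector<Int> temp = 2 * tabvec;     // temporary result vector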
In the next example we double the data in COL1 and put the result back in the column.
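    TableVector<Int> tabvec(tab, "COL1");
    tabvec = 2 * tabvec;     // the result is put back into the column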
Any number of keyword/value pairs may be attached to the table as a whole, or to any individual column. They may be freely added, retrieved, re-assigned, or deleted. They are, in essence, a self-resizing list of values (any of the primitive types) indexed by Strings (the keyword).
A table keyword/value pair might be, for example, the name of the observer; column keyword/value pairs might record, for example, the unit of the values in the column.
The class TableRecord represents the keywords in a table. It is (indirectly) derived from the standard record classes in the class Record.
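A minimal sketch of keyword handling (keyword and column names are assumptions):

    #include <casacore/tables/Tables/Table.h>
    #include <casacore/tables/Tables/TableColumn.h>
    #include <casacore/tables/Tables/TableRecord.h>
    using namespace casacore;

    Table tab("some.tab", Table::Update);
    tab.rwKeywordSet().define("OBSERVER", "A. Nonymous");  // table keyword
    TableColumn col(tab, "DATA");
    col.rwKeywordSet().define("UNIT", "Jy");               // column keyword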
A table contains a description of itself, which defines the layout of the columns and the keyword sets for the table and for the individual columns. It may also define initial keyword sets and default values for the columns. Such a default value is automatically stored in a cell in the table column, whenever a row is added to the table.
The creation of the table descriptor is the first step in the creation of a new table. The description is part of the table itself, but may also exist in a separate file. This is useful if you need to create a number of tables with the same structure; in other circumstances it probably should be avoided.
The public classes to set up a table description are TableDesc, ColumnDesc, ScalarColumnDesc, ScalarRecordColumnDesc, ArrayColumnDesc, and SubTableDesc.
Here follows a typical example of the construction of a table description. For more specialized things – like the definition of a default data manager – we refer to the descriptions of the above mentioned classes.
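A sketch of a typical description (column names are illustrative):

    #include <casacore/tables/Tables/TableDesc.h>
    #include <casacore/tables/Tables/ScaColDesc.h>
    #include <casacore/tables/Tables/ArrColDesc.h>
    using namespace casacore;

    TableDesc td("tExample", "1", TableDesc::Scratch);
    td.comment() = "A table description example";
    td.addColumn(ScalarColumnDesc<double>("Time", "The observation time"));
    td.addColumn(ScalarColumnDesc<Int>("Baseline"));
    td.addColumn(ArrayColumnDesc<Complex>("Data", "The UV data", 2));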
Data managers take care of the actual access to the data in a column. There are two kinds of data managers: storage managers, which physically store the data, and virtual column engines, which calculate data on the fly (both are described in more detail below).
In general the user of a table does not need to be aware which data managers are being used underneath. Only when the table is created do data managers have to be bound to the columns. Thereafter the binding is completely transparent.
Data managers need to be registered, so they can be found when a table is opened. All data managers mentioned below are part of the system and pre-registered. It is, however, also possible to load data managers on demand. If a data manager is not registered, an attempt is made to load a shared library named after the part of the data manager name (in lowercase) before a dot or left angle bracket. The dot makes it possible to have multiple data managers in a shared library, while the angle bracket is meant for templated data manager classes.
E.g. if BitFlagsEngine<uChar> was not registered, the shared library libbitflagsengine.so (or .dylib) will be loaded. If successful, its function register_bitflagsengine() will be executed, which should register the data manager(s). Thereafter the data manager is known and will be used. For example, in files Register.h and Register.cc:
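    // Register.h -- declare the registration function.
    void register_bitflagsengine();

    // Register.cc -- a sketch; it assumes BitFlagsEngine provides the usual
    // static registerClass() function of casacore data managers.
    #include "Register.h"
    #include <casacore/tables/DataMan/BitFlagsEngine.h>
    void register_bitflagsengine()
    {
      casacore::BitFlagsEngine<casacore::uChar>::registerClass();
      casacore::BitFlagsEngine<casacore::Short>::registerClass();
      casacore::BitFlagsEngine<casacore::Int>::registerClass();
    }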
There are several functions that give information about which data managers are used for which columns, and about their characteristics and properties. Class RODataManAccessor and classes derived from it can be used for this, as can the functions dataManagerInfo and showStructure in class Table.
Storage managers are used to store the data contained in the column cells. At table construction time the binding of columns to storage managers is done.
Each storage manager uses one or more files (usually called table.fi_xxx, where i is a sequence number and _xxx is some kind of extension). Typically several files are used to store the data of the columns of a table.
In order to reduce the number of files (and to support large block sizes), it is possible to have a single container file (a MultiFile) containing all data files used by the storage managers. Such a file is called table.mf. Note that the program lsmf can be used to see which files are contained in a MultiFile. The program tomf can convert the files in a MultiFile to regular files.
At table creation time it is decided if a MultiFile will be used. It can be done by means of the StorageOption object given to the SetupNewTable constructor and/or by the aipsrc variables:
- table.storage.option, which can have the value 'multifile', 'sepfile' (meaning separate files), or 'default'. Currently the default is to use separate files.
- table.storage.blocksize, which defines the block size to be used by a MultiFile. If 0 is given, the file system's block size will be used.
Almost all standard storage managers support the MultiFile. The exception is StManAipsIO, because it is hardly ever used.
Several storage managers exist, each with its own storage characteristics. The default and preferred storage manager is StandardStMan
. Other storage managers should only be used if they pay off in file space (like IncrementalStMan
for slowly varying data) or access speed (like the tiled storage managers for large data arrays).
The storage managers store the data in a big or little endian canonical format. The format can be specified when the table is created. By default it uses the endian format as specified in the aipsrc variable table.endianformat
which can have the value local, big, or little. The default is local.
StandardStMan stores all the values in so-called buckets (equally sized chunks in the file). It requires little memory. It replaces the old StManAipsIO.
IncrementalStMan uses a storage mechanism resembling "incremental backups". A value is only stored if it is different from the previous row. It is very well suited for slowly varying data.
The class ROIncrementalStManAccessor can be used to tune the behaviour of the IncrementalStMan
. It contains functions to deal with the cache size and to show the behaviour of the cache.
The Tiled Storage Managers store the data as a tiled hypercube allowing for more or less equally efficient data access along all main axes. It can be used for UV-data as well as for image data.
StManAipsIO uses AipsIO
to store the data in the columns. It supports all table functionality, but its I/O is probably not as efficient as other storage managers. It also requires that a large part of the table fits in memory.
It should not be used anymore, because it uses a lot of memory for larger tables and because it is not very robust in case an application or system crashes.
MemoryStMan holds the data in memory. It means that data 'stored' with this storage manager are NOT persistent.
This storage manager is primarily meant for tables held in memory, but it can also be useful for temporary columns in normal tables. Note, however, that if a table is accessed concurrently from multiple processes, MemoryStMan data cannot be synchronized.
dyscostman::DyscoStMan is a class that stores data with lossy compression. It combines non-linear least-squares quantization and different kinds of normalization. With the typical factor of 4 compression, the loss in accuracy from lossy compression is negligible. It should only be used for real (non-simulated) data in a MeasurementSet. The method is described in this article: https://arxiv.org/abs/1609.02019.
The storage manager framework makes it possible to support arbitrary files as tables. This has been used in a case where a file is filled by the data acquisition system of a telescope. The file is simultaneously used as a table using a dedicated storage manager. The table system and storage manager provide a sync function to synchronize the processes, i.e. to make CTDS aware of changes in the file size (thus in the table size) by the filling process.
Tip: Not all data managers support all the table functionality. So the choice of a data manager can greatly influence the type of operations you can do on the table as a whole. For example, if a column uses the tiled storage manager, it is not possible to delete rows from the table, because that storage manager does not support deletion of rows. However, it is always possible to delete all columns of a data manager in one single call.
The Tiled Storage Managers allow one to store the data of one or more columns in a tiled way. Tiling means that the data are stored without a preferred order to make access along the different main axes equally efficient. This is done by storing the data in so-called tiles (i.e. equally shaped subsets of an array) to increase data locality. The user can define the tile shape to optimize for the most frequently used access.
The Tiled Storage Manager has the following properties:
Each Tiled Storage Manager can store an N-dimensional so-called hypercolumn. Elaborate hypercolumns can be defined using TableDesc::defineHypercolumn. Note that defining a hypercolumn is only necessary if it contains multiple columns or if the TiledDataStMan is used, so in practice it is hardly ever needed.
A hypercolumn consists of up to three types of columns: data columns containing the actual data arrays, coordinate columns defining the world coordinates of the hypercube axes, and id columns whose values are used to distinguish the individual hypercubes.
The following Tiled Storage Managers are available:
- TiledColumnStMan stores the data of the entire column in one hypercube; hence all cells in the column must have the same shape.
- TiledCellStMan creates a new hypercube for each cell in the column, so each cell can have its own shape.
- TiledShapeStMan can be seen as a specialization of TiledDataStMan by using the array shape as the id value. Similarly to TiledDataStMan it can maintain multiple hypercubes and store multiple rows in a hypercube, but it is easier to use, because the special addHypercube and extendHypercube functions are not needed. A hypercube is automatically added when a new array shape is encountered.
- TiledDataStMan allows one to control the creation and extension of hypercubes. This is done by means of the class TiledDataStManAccessor, which makes it possible to store, say, rows 0-9 in hypercube A, rows 10-34 in hypercube B, rows 35-54 in hypercube A again, etc. The drawback of this storage manager is that its hypercubes are not automatically extended when adding new rows. The special functions addHypercube and extendHypercube have to be used, making it somewhat tedious to use. Therefore this storage manager may become obsolete in the near future.
The Tiled Storage Managers have three ways to access and cache the data: via the tile cache, via buffered IO, or via memory-mapped IO. Class TSMOption can be used to set up an access choice and use it in a Table constructor.
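For instance, a sketch of opening a table with memory-mapped tile access:

    #include <casacore/tables/Tables/Table.h>
    #include <casacore/tables/DataMan/TSMOption.h>
    using namespace casacore;

    Table tab("some.tab", Table::Old, TSMOption(TSMOption::MMap));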
Apart from reading, all access modes described above can also handle writing and extending tables, and they create identical files. Both little and big endian data can be read or written.
Virtual column engines are used to implement the virtual (i.e. calculated-on-the-fly) columns. CTDS provides an abstract base class (or "interface class") VirtualColumnEngine that specifies the protocol for these engines. The programmer must derive a concrete class to implement the application-specific virtual column.
For example: the programmer needs a column in a table which is the difference between two other columns. (Perhaps these two other columns are updated periodically during the execution of a program.) A good way to handle this would be to have a virtual column in the table, and write a virtual column engine which knows how to calculate the difference between corresponding cells of the two other columns. So the result is that accessing a particular cell of the virtual column invokes the virtual column engine, which then gets the values from the other two columns, and returns their difference. This particular example could be done using VirtualTaQLColumn.
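A minimal sketch of that difference column using VirtualTaQLColumn (column names are assumptions):

    #include <casacore/tables/DataMan/VirtTaQLColumn.h>
    using namespace casacore;

    // At table creation time, after adding COL1 and COL2 to the description:
    td.addColumn(ScalarColumnDesc<float>("DIFF"));
    SetupNewTable newtab("virt.tab", td, Table::New);
    VirtualTaQLColumn engine("COL1-COL2");
    newtab.bindColumn("DIFF", engine);
    Table tab(newtab);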
Several virtual column engines exist:
The class
ForwardColumnEngine forwards the gets and puts on a row in a column to the same row in a column with the same name in another table. This provides a virtual copy of the referenced column.
The class ForwardColumnIndexedRowEngine is similar to ForwardColumnEngine. However, instead of forwarding to the same row, it uses a column to map its row number to a row number in the referenced table. In this way multiple rows can share the same data. This data manager only allows get operations.
To handle arbitrary data types the templated abstract base class VSCEngine has been written. An example of how to use this class can be found in the demo program dVSCEngine.cc.
Multiple concurrent readers and writers (also via NFS) of a table are supported by means of a locking/synchronization mechanism. This mechanism is not very sophisticated in the sense that it is very coarsely grained. When locking, the entire table gets locked. A special lock file is used to lock the table. This lock file also contains some synchronization data.
Five ways of locking are supported (see class TableLock):
- PermanentLocking locks the table permanently (from open till close), so one writer or multiple readers are possible; PermanentLockingWait is the same, but waits until the lock can be acquired.
- AutoLocking acquires and releases locks automatically when needed.
- AutoNoReadLocking is like AutoLocking, but without read locking.
- UserLocking requires the user to do the locking explicitly: lock and unlock have to be used to acquire and release a (read or write) lock.
- UserNoReadLocking is like UserLocking, but without read locking.
Synchronization of the processes accessing the same table is done by means of the lock file. When a lock is released, the storage managers flush their data into the table files. Some synchronization data are written into the lock file, telling the new number of table rows and which storage managers have written data. This information is read when another process acquires the lock and is used to determine which storage managers have to refresh their internal caches.
Note that for the NoReadLocking modes (see above) explicit synchronization might be needed using Table::resync.
The function Table::hasDataChanged can be used to check if a table is (being) changed by another process, so that a program can react to it. E.g. the table browser can refresh its screen when the underlying table is changed.
In general the default locking option will do. From the above it should be clear that heavy concurrent access results in a lot of flushing, thus will have a negative impact on performance. If uninterrupted access to a table is needed, the PermanentLocking
option should be used. If transaction-like processing is done (e.g. updating a table containing an observation catalogue), the UserLocking
option is probably best.
Creation or deletion of a table is not possible if that table is still open in another process. The function Table::isMultiUsed()
can be used to check if a table is open in other processes.
The function TableUtil::deleteTable
should be used to delete a table. Before deleting the table it ensures that it is writable and that it is not open in the current or another process.
The following example wants to read the table uninterrupted, thus it uses the PermanentLocking
option. It also wants to wait until the lock is actually acquired. Note that the destructor closes the table and releases the lock.
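A sketch of this (the table name is an assumption):

    #include <casacore/tables/Tables/Table.h>
    #include <casacore/tables/Tables/TableLock.h>
    using namespace casacore;

    // Lock permanently and wait until the lock is actually acquired.
    Table tab("some.tab", TableLock(TableLock::PermanentLockingWait));
    // ... read the table uninterrupted ...
    // The Table destructor closes the table and releases the lock.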
The following example uses automatic locking. It tells the system to check about every 20 seconds if another process wants access to the table.
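    // Automatic locking; inspect every 20 seconds if another process
    // wants access to the table.
    Table tab("some.tab", TableLock(TableLock::AutoLocking, 20));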
The following example gets data (say, from a GUI) and writes it as a row into the table. To lock the table as little as possible, the lock is acquired just before writing and released immediately thereafter.
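A sketch of this pattern; getValueFromGui is a hypothetical data source:

    Table tab("some.tab", TableLock(TableLock::UserLocking), Table::Update);
    ScalarColumn<Int> col(tab, "COL1");
    while (Int value = getValueFromGui()) {   // hypothetical
      tab.lock();                   // acquire a write lock (waits if needed)
      tab.addRow();
      col.put(tab.nrow() - 1, value);
      tab.unlock();                 // release the lock immediately
    }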
The following example deletes a table if it is not used in another process.
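    Table tab("some.tab");
    if (! tab.isMultiUsed()) {
      tab.markForDelete();   // the table is deleted when tab goes out of scope
    }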
Class ColumnsIndex offers the user a means to find the rows matching a given key or key range. It is a somewhat primitive replacement of a B-tree index and in the future it may be replaced by a proper B+-tree implementation.
The ColumnsIndex class makes it possible to build an in-core index on one or more columns. Looking up a key or key range is done using a binary search on that index. It returns a vector containing the row numbers of the rows matching the key (range).
The class is not capable of tracing changes in the underlying column(s). It detects a change in the number of rows and updates the index accordingly. However, it has to be told explicitly when a value in the underlying column(s) changes.
The following example shows how the class can be used.
Suppose one has an antenna table with key ANTENNA.
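A sketch of a lookup in that table (names are assumptions):

    #include <casacore/tables/Tables/ColumnsIndex.h>
    #include <casacore/casa/Containers/RecordField.h>
    using namespace casacore;

    Table tab("antenna.tab");
    ColumnsIndex index(tab, "ANTENNA");     // build an in-core index
    // Fill in the key value to look up via a RecordFieldPtr.
    RecordFieldPtr<Int> antField(index.accessKey(), "ANTENNA");
    *antField = 7;                          // look up antenna 7
    Vector<rownr_t> rows = index.getRowNumbers();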
ColumnsIndex itself contains a more advanced example. It shows how to use a private compare function to adjust the lookup if the index does not contain single key values, but intervals instead. This is useful if a row in a (sub)table is valid for, say, a time range instead of a single timestamp.
CTDS resembles a database system, but it is not as robust. It lacks the transaction and logging facilities common to database systems, which means that in case of a crash data might be lost. To reduce the risk of data loss to a minimum, it is advisable to do a regular flush, optionally with an fsync to ensure that all data are really written. However, that can degrade performance, because it involves extra writes. So one should find the right balance between robustness and performance.
To get a good feeling for the performance issues, it is important to understand some of the internals of CTDS.
The storage managers drive the performance. All storage managers use buckets (called tiles for the TiledStMan) which contain the data. All IO is done by bucket. The bucket/tile size is defined when creating the storage manager objects. Sometimes the default will do, but usually it is better to set it explicitly.
It is best to do a flush when a tile is full. For example:
When creating a MeasurementSet containing N antennae (thus N*(N-1)/2 baselines, or N*(N+1)/2 if auto-correlations are stored as well) it makes sense to store, say, N/2 rows in a tile and do a flush each time all baselines are written. In that way tiles are fully filled when doing the flush, so no extra IO is involved.
Here is a sketch showing the idea when creating a MeasurementSet-like table; the antenna count, column names, and sizes are illustrative.
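    #include <casacore/tables/Tables.h>
    #include <casacore/tables/DataMan/TiledShapeStMan.h>
    #include <casacore/casa/Arrays/ArrayUtil.h>   // stringToVector
    #include <casacore/casa/Arrays/Matrix.h>
    using namespace casacore;

    const uInt nAnt = 8;
    const uInt nBaseline = nAnt * (nAnt + 1) / 2;   // with auto-correlations
    const uInt nPol = 4, nChan = 64;

    TableDesc td;
    td.addColumn(ArrayColumnDesc<Complex>("DATA", 2));
    td.defineHypercolumn("TiledData", 3, stringToVector("DATA"));

    SetupNewTable newtab("example.ms", td, Table::New);
    // Tile shape chosen such that a timeslot (nBaseline rows) fills
    // an integral number of tiles.
    TiledShapeStMan stman("TiledData", IPosition(3, nPol, nChan, nBaseline/2));
    newtab.bindColumn("DATA", stman);
    Table tab(newtab);

    ArrayColumn<Complex> data(tab, "DATA");
    Matrix<Complex> vis(nPol, nChan);
    for (uInt t = 0; t < 5; ++t) {            // 5 timeslots
      for (uInt b = 0; b < nBaseline; ++b) {
        tab.addRow();
        data.put(tab.nrow() - 1, vis);
      }
      tab.flush();   // tiles are exactly full, so no partial-tile IO
    }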
Which storage managers to use and how to use them depends heavily on the type of data and the access patterns to the data. Here follow some guidelines:
- Use the default StandardStMan for most scalar columns.
- Use IncrementalStMan for slowly varying data, where many consecutive rows hold the same value.
- Use a tiled storage manager (e.g. TiledShapeStMan) for large data arrays, and choose the tile shape to match the most common access patterns.
Several forms of tracing can be done to see how the Table I/O performs:
- The strace command can be used to collect trace information about the physical IO.
- The function showCacheStatistics in class TiledStManAccessor can be used to show the number of actual reads and writes and the percentage of cache hits.
- Tracing of the operations on a table can be controlled with the following aipsrc variables:
  - table.trace.filename specifies the file to write the trace output to. If not given or empty, no tracing will be done. The file name can contain environment variables or a tilde.
  - table.trace.operation specifies the operations to be traced. It is a string containing s, r, and/or w, where s means tracing RefTable construction (selection/sort), r means column reads, and w means column writes. If empty, only the high-level table operations (open, create, close) will be traced.
  - table.trace.columntype specifies the types of columns to be traced. It is a string containing the characters s, a, and/or r: s means all scalar columns, a all array columns, and r all record columns. If empty and if table.trace.column is empty, its default value is a.
  - table.trace.column specifies the names of the columns to be traced. Its value can be one or more glob-like patterns separated by commas without any whitespace. The default is empty.
In the trace output the operation performed is shown (oper); reftable means that the operation is on a RefTable (thus the result of a selection, sort, projection, or iteration).
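For example, a sketch of aipsrc settings that trace reads and writes of all array columns (the file name is an assumption):

    table.trace.filename: ~/table_io.trace
    table.trace.operation: rw
    table.trace.columntype: a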