casacore
MultiFile.h
//# MultiFile.h: Class to combine multiple files in a single one
//# Copyright (C) 2014
//# Associated Universities, Inc. Washington DC, USA.
//#
//# This library is free software; you can redistribute it and/or modify it
//# under the terms of the GNU Library General Public License as published by
//# the Free Software Foundation; either version 2 of the License, or (at your
//# option) any later version.
//#
//# This library is distributed in the hope that it will be useful, but WITHOUT
//# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
//# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Library General Public
//# License for more details.
//#
//# You should have received a copy of the GNU Library General Public License
//# along with this library; if not, write to the Free Software Foundation,
//# Inc., 675 Massachusetts Ave, Cambridge, MA 02139, USA.
//#
//# Correspondence concerning AIPS++ should be addressed as follows:
//#        Internet email: casa-feedback@nrao.edu.
//#        Postal address: AIPS++ Project Office
//#                        National Radio Astronomy Observatory
//#                        520 Edgemont Road
//#                        Charlottesville, VA 22903-2475 USA

#ifndef CASA_MULTIFILE_H
#define CASA_MULTIFILE_H

//# Includes
#include <casacore/casa/aips.h>
#include <casacore/casa/IO/MultiFileBase.h>
#include <memory>

namespace casacore { //# NAMESPACE CASACORE - BEGIN

  //# Forward declarations.
  class ByteIO;
  class CanonicalIO;
  class MemoryIO;


  // <summary>
  // Class to combine multiple files in a single one.
  // </summary>

  // <use visibility=export>

  // <reviewed reviewer="" date="" tests="tMultiFile" demos="">
  // </reviewed>

  // <synopsis>
  // This class (derived from MultiFileBase) is a container file holding
  // multiple virtual files in a regular file.
  // It is primarily meant as a container file for the storage manager files
  // of a table to reduce the number of files used (especially for Lustre) and
  // to reduce the number of open files (especially when concatenating tables).
  // <br> MultiFile has the following properties:
  // <ul>
  //  <li> It can choose an IO buffer size that matches the file system well
  //       (e.g., to support a large buffer size on ZFS or Lustre).
  //  <li> O_DIRECT (if supported by the OS) can be used to tell the OS kernel
  //       to bypass its file cache. It does not speed up the I/O, but it makes
  //       I/O behaviour more predictable, which a real-time system might need.
  //  <li> Often the data to be read from MultiFile will not exactly match the
  //       block size and offset. MultiFile will buffer the data and copy the
  //       part that is needed (similar to stdio). However, when the block size
  //       and offset do match, data will be read directly into the user's
  //       buffer to achieve zero-copy behaviour.
  //  <li> It is possible to nest MultiFiles. Thus a MultiFile can be a file
  //       in a parent MultiFile. In this way a main table and its subtables
  //       (such as an MS) can easily be stored in a single file.
  //  <li> Optionally a 32-bit CRC is stored for each block to check if the
  //       data in a block are correctly read. The CRC values are stored as
  //       part of the header, thus not in each individual block. This is done
  //       to make the zero-copy behaviour possible (as described above).
  //  <li> The header and the index are stored in the first block. If they are
  //       too large, continuation blocks are used. There are two sets of
  //       continuation blocks, which are used alternately. This is done for
  //       robustness; there is always a valid set in case of a crash in the
  //       middle of writing the continuation blocks. Note that the first
  //       header block is written after the continuation blocks, so it always
  //       points to a valid set of continuation blocks.
  // </ul>
  //
  // The SetupNewTable constructor has a StorageOption argument to define
  // if a MultiFile has to be used and if so, the buffer size to use.
  // It is also possible to specify that through aipsrc variables.
  //
  // A virtual file is spread over multiple (fixed size) data blocks in the
  // MultiFile. A data block is never shared by multiple files.
  // For each virtual file MultiFile keeps a MultiFileInfo object telling
  // the file size and the block numbers used for the file. When flushing
  // the MultiFile, this meta info is written into the header block. If it
  // does not fit in the header block, the rest is written in continuation
  // blocks. On open and resync, it is read back. The two sets of continuation
  // blocks are used alternately each time the header is written, so that
  // a valid header remains in case of a crash in the middle of writing it.
  //
  // A virtual file is represented by an MFFileIO object, which is derived
  // from ByteIO and as such part of the casacore IO framework. It makes it
  // possible for applications to access a virtual file in the same way as
  // a regular file.
  //
  // It is possible to delete a virtual file. Its blocks will be added to
  // the free block list (which is also stored in the meta info).
  // The MultiFile is truncated when blocks are deleted at the end of the file.
  // </synopsis>

  // <example>
  // In principle it is possible to use the MultiFile functions directly.
  // However, in general it is much easier to use an MFFileIO object
  // per virtual file as shown below.
  // <srcblock>
  //    // Create a new MultiFile using a block size of 1 MB.
  //    std::shared_ptr<MultiFileBase> mfile
  //      (new MultiFile("file.mf", ByteIO::New, 1048576));
  //    // Create a virtual file in it.
  //    MFFileIO mf1(mfile, "mf1", ByteIO::New);
  //    // Use it (for example) as the sink of AipsIO.
  //    AipsIO stream (&mf1);
  //    // Write values.
  //    stream << (Int)10;
  //    stream << True;
  //    // Seek to beginning of file and read data in.
  //    stream.setpos (0);
  //    Int vali;
  //    Bool valb;
  //    stream >> vali >> valb;
  // </srcblock>
  // </example>

  // <todo>
  //  <li> The way MultiFile uses continuation blocks can be optimized. In case
  //       of file truncation, it could check if only continuation blocks are
  //       present after the blocks to be removed. In such a case they can be
  //       moved backwards. Also the number of continuation blocks can shrink.
  //       In such a case the unused blocks are not added to the free list;
  //       only the number of actually used continuation blocks is decremented.
  //       They could be added to the free list later.
  //       The reason for the above is that the free list is written into the
  //       header blocks before the required number of continuation blocks
  //       is known.
  //  <li> Keep a journal file telling which files are created and which
  //       blocks are allocated for a virtual file.
  // </todo>

  class MultiFile: public MultiFileBase
  {
  public:
    // Open or create a MultiFile with the given name.
    // Upon creation the block size can be given. If 0, it uses the block size
    // of the file system the file is on.
    // <br>If useODirect=True, the O_DIRECT flag is used (if supported).
    // It tells the kernel to bypass its file cache to have more predictable
    // I/O behaviour.
    // <br>If useCRC=True, 32-bit CRC values are calculated and stored for
    // each data block. Note that useCRC is only used for new files.
    explicit MultiFile (const String& name, ByteIO::OpenOption, Int blockSize=0,
                        Bool useODirect=False, Bool useCRC=False);

    // Open or create a MultiFile with the given name which is nested in the
    // given parent. Thus data are read/written in the parent file.
    // Upon creation the block size can be given. If 0, it uses the block size
    // of the parent.
    explicit MultiFile (const String& name,
                        const std::shared_ptr<MultiFileBase>& parent,
                        ByteIO::OpenOption, Int blockSize=0);

    // The destructor flushes and closes the file.
    ~MultiFile() override;

    // Copy constructor and assignment not possible.
    MultiFile (const MultiFile&) = delete;
    MultiFile& operator= (const MultiFile&) = delete;

    // Make a nested MultiFile.
    std::shared_ptr<MultiFileBase> makeNested
    (const std::shared_ptr<MultiFileBase>& parent, const String& name,
     ByteIO::OpenOption, Int blockSize) const override;

    // Reopen the underlying file for read/write access.
    // Nothing will be done if the file is writable already.
    // Otherwise it will be reopened and an exception will be thrown
    // if it is not possible to reopen it for read/write access.
    void reopenRW() override;

    // Fsync the file (i.e., force the data to be physically written).
    void fsync() override;

    // Show some info.
    void show (std::ostream&) const;

    // Compress a block index by looking for subsequent block numbers.
    static std::vector<Int64> packIndex (const std::vector<Int64>& blockNrs);

    // Decompress a block index by inserting subsequent block numbers.
    static std::vector<Int64> unpackIndex (const std::vector<Int64>& blockNrs);

  private:
    // Initialize the MultiFile object.
    void init (ByteIO::OpenOption option);
    // Read the file info for the new version 2.
    void getInfoVersion2 (Int64 contBlockNr, CanonicalIO& aio);
    // Write a vector of Int64 or uInt.
    void writeVector (CanonicalIO& cio, const std::vector<Int64>& index);
    void writeVector (CanonicalIO& cio, const std::vector<uInt>& index);
    // Read a vector of Int64 or uInt.
    void readVector (CanonicalIO& cio, std::vector<Int64>& index);
    void readVector (CanonicalIO& cio, std::vector<uInt>& index);
    // Write the remainder of the header (in case exceeding 1 block).
    // <src>iobuf</src> should be large enough
    void writeRemainder (MemoryIO& mio, CanonicalIO&, MultiFileBuffer& mfbuf);
    // Read the remainder of the header into the buffer.
    void readRemainder (Int64 headerSize, Int64 blockNr, std::vector<char>& buf);
    // Truncate the file if blocks are freed at the end.
    void truncateIfNeeded();
    // Header writing hooks (meant for derived test classes).
    // <group>
    virtual void writeHeaderShow (Int64 ncont, Int64 todo) const;
    virtual void writeHeaderTest();
    // </group>

    // Do the class-specific actions on opening a file.
    void doOpenFile (MultiFileInfo&) override;
    // Do the class-specific actions on closing a file.
    void doCloseFile (MultiFileInfo&) override;
    // Do the class-specific actions on adding a file.
    void doAddFile (MultiFileInfo&) override;
    // Do the class-specific actions on deleting a file.
    void doDeleteFile (MultiFileInfo&) override;
    // Truncate the file to <src>nrblk</src> blocks.
    void doTruncateFile (MultiFileInfo& info, uInt64 nrblk) override;
    // Flush the file itself.
    void doFlushFile() override;
    // Flush and close the file.
    void close() override;
    // Write the header info.
    void writeHeader() override;
    // Read the header info. If always==False, the info is only read if the
    // header counter has changed.
    void readHeader (Bool always=True) override;
    // Extend the virtual file to fit lastblk.
    void extend (MultiFileInfo& info, Int64 lastblk) override;

  protected:
    // Store the CRC of a data block in the index.
    void storeCRC (const void* buffer, Int64 blknr);
    // Check the CRC of a data block read.
    void checkCRC (const void* buffer, Int64 blknr) const;
    // Calculate the CRC of a data block.
    uInt calcCRC (const void* buffer, Int64 size) const;
    // Extend the virtual file to fit lastblk.
    // Optionally the free blocks are not used.
    virtual void extendVF (MultiFileInfo& info, Int64 lastblk, Bool useFreeBlocks);
    // Write a data block.
    void writeBlock (MultiFileInfo& info, Int64 blknr,
                     const void* buffer) override;
    // Read a data block.
    void readBlock (MultiFileInfo& info, Int64 blknr,
                    void* buffer) override;
    // Read the version 1 header.
    void readHeaderVersion1 (Int64 headerSize, std::vector<char>& buf);
    // Read the version 2 and higher header.
    void readHeaderVersion2 (std::vector<char>& buf);

    //# Data members
    // Define two continuation sets where the header overflow can be stored
    MultiFileInfo itsHdrCont[2];
    uInt          itsNrContUsed[2];  // nr of cont.blocks actually used
    uInt          itsHdrContInx;     // Continuation set last used (0 or 1)
    Bool          itsUseCRC;
    std::vector<uInt> itsCRC;        // CRC value per block (empty if useCRC=False)
    std::unique_ptr<ByteIO> itsIO;   // A regular file or nested MFFileIO
  };



} //# NAMESPACE CASACORE - END

#endif
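
The header describes packIndex and unpackIndex only as compressing a block index "by looking for subsequent block numbers" and decompressing it again; it does not state the encoding. The sketch below illustrates the general idea of run-length packing under an assumed encoding: the first block number of a run is stored as is, and if n further consecutive blocks follow, the value -n is stored after it. The names packRuns and unpackRuns are hypothetical, std::int64_t stands in for casacore's Int64, and the real MultiFile format may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the casacore implementation).
// Assumed encoding: a run start is stored unchanged; when n > 0 consecutive
// block numbers follow it, -n is stored directly after the start.
// Block numbers are assumed to be non-negative.
std::vector<std::int64_t> packRuns(const std::vector<std::int64_t>& blockNrs) {
    std::vector<std::int64_t> packed;
    std::size_t i = 0;
    while (i < blockNrs.size()) {
        const std::int64_t start = blockNrs[i];
        std::int64_t extra = 0;
        // Count how many entries continue the run start, start+1, start+2, ...
        while (i + 1 < blockNrs.size() && blockNrs[i + 1] == blockNrs[i] + 1) {
            ++extra;
            ++i;
        }
        packed.push_back(start);
        if (extra > 0) {
            packed.push_back(-extra);
        }
        ++i;
    }
    return packed;
}

// Inverse of packRuns: expand every negative count into consecutive numbers.
std::vector<std::int64_t> unpackRuns(const std::vector<std::int64_t>& packed) {
    std::vector<std::int64_t> blockNrs;
    for (std::int64_t value : packed) {
        if (value >= 0) {
            blockNrs.push_back(value);
        } else {
            // A count must always follow a run start.
            assert(!blockNrs.empty());
            for (std::int64_t k = 0; k < -value; ++k) {
                blockNrs.push_back(blockNrs.back() + 1);
            }
        }
    }
    return blockNrs;
}
```

With this assumed encoding the block list 3, 4, 5, 9, 11, 12 packs to 3, -2, 9, 11, -1, and unpacking restores the original list. A virtual file whose blocks are mostly contiguous therefore needs only a few index entries in the header.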
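
The synopsis notes that the optional 32-bit CRC values are kept in the header rather than inside each data block, so that a block-aligned read can still go straight into the caller's buffer. The sketch below shows that layout in isolation: a side table with one CRC per block number, filled on write and compared on read. It is an illustration only. The header does not say which CRC variant calcCRC uses, so the common CRC-32 with reflected polynomial 0xEDB88320 is assumed here, and the names crc32 and BlockCrcIndex are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 with reflected polynomial 0xEDB88320 (the zlib/PNG variant).
// Assumption: the CRC variant used by MultiFile is not given in the header.
std::uint32_t crc32(const void* data, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < size; ++i) {
        crc ^= bytes[i];
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical side table: one CRC per block number, stored apart from the
// block data so that the blocks themselves contain payload only.
struct BlockCrcIndex {
    std::vector<std::uint32_t> crc;

    // Record the CRC of a block that is about to be written.
    void store(const void* block, std::size_t blockSize, std::size_t blknr) {
        if (blknr >= crc.size()) {
            crc.resize(blknr + 1, 0);
        }
        crc[blknr] = crc32(block, blockSize);
    }

    // Return true if a block that was read back matches its recorded CRC.
    bool check(const void* block, std::size_t blockSize, std::size_t blknr) const {
        return blknr < crc.size() && crc[blknr] == crc32(block, blockSize);
    }
};
```

Because the table lives outside the blocks, every block keeps its full size for payload and its alignment is unchanged. The cost is that the table has to be written together with the rest of the header information.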