Return to the table of contents
In this chapter we will develop an algorithm that uses the quantum file access method to handle a file containing a large number of records each of which can vary dynamically in size. In order to appreciate the power of this access method, we will start by considering the much simpler problem of access to fixed-length records.
The Quantum File Access Method
Let us suppose that we want to access a number of fixed-length customer records by their record numbers.1 Given the record number, we can locate that customer's record by multiplying the record number by the length of a record, which gives us the offset into the file where that record will be found. We can then read or write that record as needed.
Of course, a real application needs to reuse records of customers who have become inactive, to prevent the customer file from growing indefinitely. To handle this problem, we could set a limit on the file size and, when it is reached, start reusing records that haven't been referenced for a long time, making sure to correct or delete any records in other files that refer to the deleted customer.
This fixed-length record system is not very difficult to implement, but it has significant drawbacks; address fields, for example, tend to vary greatly in length, with some records needing 25 or 30 characters for a city name or street address and others needing only 10. If we allocate enough storage for the extreme case, the records become much longer than if we had to handle only the average case. However, allocating enough for only the average case will leave those customers whose names or addresses won't fit into the allotted space quite unhappy, as I know from personal experience as a software developer! The obvious solution is to allow the fields (and therefore the records) to vary in length as necessary.
Unfortunately, variable-length records are much more difficult to deal with than fixed-length ones. The most obvious reason, as discussed above, is that determining where fixed-length records are stored requires only a simple calculation; this is not true of variable-length records. However, we could remedy this fairly easily by maintaining a record index consisting of an array of structures containing the starting location and length of each record, as depicted in Figure recindex.2
+- Starting Record
| Index Address Length
| +---------------+
| 0 | 0 12 +----+
Record | 1 | 12 12 +----++
Index | 2 | 24 7 +----++-------+
Array | 3 | 31 2 +----++-------+------+
| 4 | 33 5 +----++-------+------+-+
| +---------------+ || | | |
+- +--------------+| | | |
| +---+ | | |
+- | | | | |
| ++-----------+-----------+------+-+----+
Record | |Steve HellerP.O.Box 0335BaldwinNY11510|
Data | +--------------------------------------+
+-
+-
Record | 0000000000111111111122222222223333333333
Offset | 0123456789012345678901234567890123456789
+-
We encounter a more serious difficulty when we want to delete a record and reuse the space it occupied.3 In some situations we can sidestep the problem by adding the new version of the record to the end of the file and changing the record pointer to refer to the new location of the record; however, in the case of an actively updated file such an approach would cause the file to grow rapidly.
But how can we reuse the space vacated by a deleted record of some arbitrary size? The chances of a new record being exactly the same size as any specific deleted one are relatively small, especially if the records average several hundred bytes each, as is not at all unusual in customer data files. A possible solution is to keep a separate free list for each record size and reuse a record of the correct size. However, there is a very serious problem with this approach: a new record may need 257 bytes, for example, and there may be no available record of exactly that size. Even though half of the records in the file might be deleted, none of them could be reused, and we would still be forced to extend the file. The attempt to solve this difficulty by using a record that is somewhat larger than necessary leads to many unusably small areas being left in the file (a situation known as fragmentation).
However, there is a relatively unknown way to make variable-length records more tractable: the quantum file access method.4 The key is to combine them into groups of fixed length, which can then be allocated and deallocated in an efficient manner.
Before the following discussion will make much sense to you, I will need to explain in general terms what we're trying to accomplish: building a virtual memory system that can accommodate records of varying lengths in an efficient manner. This means that even though at any given time, we are storing most of our data on the disk rather than maintaining it all in memory, we will provide access to all the data as though it were in memory. To do this, we have to arrange that any actively used data is actually in memory when it is needed. In the present application, our data is divided into fixed-size blocks called quanta (plural of quantum),5 so the task of our virtual memory system is to ensure that the correct blocks are in memory as needed by the user.6 The quanta in the file are generally divided into a number of addressable units called items.7 When adding a record to the file, we search a free space list, looking for a quantum with enough free space for the new record. When we find one, we add the record to that quantum and store the record's location in the item reference array, or IRA, which replaces the record index in Figure recindex; this array consists of entries of the form "quantum number, item number".8 The item number refers to an entry in the item index stored at the beginning of the quantum; the items are stored in the quantum in order of their item index entries, which allows the size of an item to be calculated rather than having to be stored.
For example, if we were to create an array of variable-length strings, some of its item references might look like those illustrated in Figure itemref1.
+-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 1 +-----------------+
Reference | 1 | 3 2 +-----------------+-+
Array | 2 | 3 3 +-----------------+-+--+
(IRA) | 3 | 3 4 +-----------------+-+--+-+
| 4 | 3 5 +-----------------+-+--+-+-+
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- Item # Offset Type Index | | | | |
| +--------------------------+ | | | | |
| 1 | 12 -+ VARSTRING 0 |±--+ | | | |
| 2 | 24 +| VARSTRING 1 |±----+ | | |
Item | 3 | +-31 || VARSTRING 2 |±-------+ | |
Index | 4 | ++-33 || VARSTRING 3 |±---------+ |
for | 5 |+++-38 || VARSTRING 4 |±-----------+
Quantum | 6 |||| 38 || UNUSED 0 |
3 | 7 |||| 38 || UNUSED 0 |
| 8 |||| 38 || UNUSED 0 |
| 9 |||| 38 || UNUSED 0 |
| 10 |||| 38 || UNUSED 0 |
| ++++----++-----------------+
+- ||| |+-----------------+
||| +------+ |
||+----+ | |
+- |+---+ | | |
| +--+----+-+------+-----------+-----------+
Quantum | | 11510NYBaldwinP.O.Box 0335Steve Heller|
Data | +----------------------------------------+
+-
+-
Quantum | 333333333222222222211111111110000000000
Offset | 876543210987654321098765432109876543210
+-
When we delete an item from a quantum, we have to update the free space list entry for that quantum to reflect the amount freed, so the space can be reused the next time an item is to be added to the file. We also have to slide the remaining items in the quantum together so that the free space is in one contiguous block, rather than in slivers scattered throughout the quantum. With a record index like the one in Figure recindex, we would have to change the record index entries for all the records that were moved. Since the records that were moved might be anywhere in the record index, this could impose unacceptable overhead on deletions; to avoid this problem, we will leave the item index entry for a deleted item empty rather than sliding the other entries down in the quantum, so that the IRA is unaffected by changes in the position of records within a quantum. If we delete element 1 from the array, the resulting quantum looks like Figure itemref2.
+-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 1 +-------------------+
Reference | 1 | NONE 0 | |
Array | 2 | 3 3 +-------------------+----+
(IRA) | 3 | 3 4 +-------------------+----+-+
| 4 | 3 5 +-------------------+----+-+-+
| +-------------+ | | | |
+- | | | |
| | | |
+- Item # Offset Type Index | | | |
| +----------------------------+ | | | |
| 1 | 12 -+ VARSTRING 0 |±--+ | | |
| 2 | 12 | UNUSED 0 | | | |
Item | 3 | 19-+| VARSTRING 2 |±-------+ | |
Index | 4 | +--21 || VARSTRING 3 |±---------+ |
for | 5 |++--26 || VARSTRING 4 |±-----------+
Quantum | 6 ||| 38 || UNUSED 0 |
3 | 7 ||| 38 || UNUSED 0 |
| 8 ||| 38 || UNUSED 0 |
| 9 ||| 38 || UNUSED 0 |
| 10 ||| 38 || UNUSED 0 |
| +++-----++-------------------+
+- || |+-----------------+
|| +-----------+ |
|+---------------+ | |
+- +-----------+ | | |
| +--------------+----+-+------+-----------+
Quantum | | 11510NYBaldwinSteve Heller|
Data | +----------------------------------------+
+-
+-
Quantum | 333333333222222222211111111110000000000
Offset | 876543210987654321098765432109876543210
+-
Note that we set the offset value of the deleted item equal to the offset of the item before it so that its length can be calculated as zero. Now let us add a new street address of "123 Main Street". After this change, our data structures look like those in Figure itemref3.
+-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 1 +------------------+
Reference | 1 | 3 2 +------------------+-+
Array | 2 | 3 3 +------------------+-+--+
(IRA) | 3 | 3 4 +------------------+-+--+-+
| 4 | 3 5 +------------------+-+--+-+-+
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- | | | | |
|Item # Offset Type Index | | | | |
| +---------------------------+ | | | | |
| 1 | 12 -+ VARSTRING 0 |±--+ | | | |
| 2 | 27 +| VARSTRING 1 |±----+ | | |
Item | 3 | +-34 || VARSTRING 2 |±-------+ | |
Index | 4 | ++-36 || VARSTRING 3 |±---------+ |
for | 5 +-+-++-41 || VARSTRING 4 |±-----------+
Quantum | 6 | | || 38 || UNUSED 0 |
3 | 7 | | || 38 || UNUSED 0 |
| 8 | | || 38 || UNUSED 0 |
| 9 | | || 38 || UNUSED 0 |
| 10 | | || 38 || UNUSED 0 |
| | +-++----++------------------+
+- | || |+-----------------+
| || +---+ |
| |+-+ | |
+- | ++ | | |
| ++----+-+------+--------------+-----------+
Quantum | |11510NYBaldwin123 Main StreetSteve Heller|
Data | +-----------------------------------------+
+-
+-
Quantum | 443333333333222222222211111111110000000000
Offset | 109876543210987654321098765432109876543210
+-
Of course, the deleted item still takes up a slot in the item index of the old quantum; in our implementation this requires 10 bytes of storage.9 We need a method of reclaiming that space, so that we do not gradually fill our quanta with indexes pointing to nothing. We have already seen part of the solution to this problem: whenever we are adding an item to a quantum, we search for an unused item index entry rather than simply adding a new entry at the end.
This mechanism alone would only slow the growth of the item indexes; actually reducing the size of an index requires us to check for empty entries at the end of the item index after deleting an item. If such excess entries exist, we reclaim the space they occupy by reducing the entry count in the item index header to the number actually in use. This reduces wasted space in the item index without affecting entries in the IRA. Since we remove unused item index entries only when they are at the end of the index, no IRA entries can refer to items later in the same quantum.
Our records are stored in the quanta pointed to by our IRA. The IRA will be stored in its own quanta, which allows us to make it as large as needed. However, this means that we can't allocate the IRA in one piece. Since a quantum has a fixed size, we will have to break up our IRA into segments that will fit in a quantum; these segments will be collectively referred to as the little pointer array. We'll also need some way to find the segment that we need for a particular element in the IRA, which will be the responsibility of the big pointer array.
Figure bigpointer shows the relationship between the big pointer array and the IRA segments.
+------------------------------------------+
| +- |
| Big | element_count = 8000 |
| Pointer | last_quantum_added_to = 3 |
| Array | |
| Header | |
| +- |
| +- |
| | Quantum | Quantum
| | Index Number | 5
| | +-------------+ |
| Big | 0 | 2 +-------+---+
| Pointer | 1 | 10 +-------+---+-+
| Array | 2 | 12 +-------+---+-+--+
| | 3 | 14 +-------+---+-+--++
| | 4 | 11 +-------+---+-+--+++
| | +-------------+ | | | |||
| +- | | | |||
| | | | |||
+------------------------------------------+ | | |||
+--------------------------------------+ | | |||
| |±------+ | |||
| +- | Quantum | |||
| | Quantum Item | 2 | |||
| | Index Number Number | | |||
| Item | +-------------+ | | |||
| Reference | 0 | 3 2 | | | |||
| Array | 1 | 3 4 | | | |||
| (IRA) | 2 | 3 5 | | | |||
| segment | 3 | 3 3 | | | |||
| 0 | 4 | 3 1 | | | |||
| | +-------------+ | | |||
| +- | | |||
+--------------------------------------+ | |||
+--------------------------------------+ | |||
| |±--------+ |||
| +- | Quantum |||
| | Quantum Item | 10 |||
| | Index Number Number | |||
| Item | +-------------+ | |||
| Reference | 0 | 3 12 | | |||
| Array | 1 | 4 6 | | |||
| (IRA) | 2 | 4 1 | | |||
| segment | 3 | 4 2 | | |||
| 1 | 4 | 4 3 | | ²²²
| | +-------------+ |
| +- |
+--------------------------------------+
Because our ArrayIndex
type is
defined as an unsigned long
, which is usually 4 bytes, we
can actually create arrays of that size.10
There is one more generalization of the quantum file access method that will make it more generally useful: the ability to create more than one array of variable-length items.
The mechanism by which we will accomplish this goal is called the "main object index", which contains the quantum number of the big pointer array for each object; the number of an object is set when the object is created, by a directory facility which we will examine later.
Figure mainindex shows the path from the main object index through the big pointer array, the IRA segment, and the item index, to finally arrive at the data.
+-
| Index Quantum Number
| +----------+
Main | 0 |NO_QUANTUM|
Object| 1 | 5 +-------------------------+
Index | 2 | 9 | |
| 3 | 1 | |
| 4 | 7 | |
+- +----------+ |
+-----------------------------------------------+ |
|Big +- |±----+
|Pointer | element_count = 8000 |
|Array | last_quantum_added_to = 3 |
|Header +- |
| +- |
| | Quantum | Quantum
| | Index Number | 5
| | +-------------+ |
|Big | 0 | 2 +--------------+---+
|Pointer | 1 | 10 | | |
|Array | 2 | 12 | | |
| | 3 | 14 | | |
| | 4 | 11 | | |
| +- +-------------+ | |
+-----------------------------------------------+ |
+-------------------------------------------+ |
| +- Quantum Item |±------+
| | Index Number Number | Quantum
|Item | +-------------+ | 2
|Reference | 0 | 3 2 +----------+----------+
|Array | 1 | 3 4 +----------+----------+-+
|(IRA) | 2 | 3 5 +----------+----------+-+-+
|segment | 3 | 3 3 +----------+-------+ | | |
|0 | 4 | 3 1 +----------+-----+ | | | |
| +- +-------------+ | | | | | |
+-------------------------------------------+ | | | | |
+-----------------------------------------------+ | | | | |
| +- | | | | | |
| | Item # Offset Type Index | | | | | |
| | +----------------------------+ | | | | | |
| | 1 | 5 -+ VARSTRING 4 |±+-+ | | | |
| | 2 | 17 +| VARSTRING 0 |±+---+--+ | |
|Item | 3 | +-19 || VARSTRING 3 |±+---+ | |
|Index | 4 | ++-31 || VARSTRING 1 |±+--------+ |
| | 5 |+++-38 || VARSTRING 2 |±+----------+
| | 6 |||| 38 || UNUSED 0 | | Quantum 3
| | ++++----++-------------------+ +---------+
| +- ||| |+------------------------+ |
| ||| +-------------+ | |
| ||+----------------+ | | |
| +- |+-----+ | | | |
|Quantum | +-+------+-----------+-+-----------+----+ |
|Data | | BaldwinP.O.Box 0335NYSteve Heller11510| |
| +- +---------------------------------------+ |
| 333333333222222222211111111110000000000 |
| 876543210987654321098765432109876543210 |
+---------------------------------------------------------+
Of course, we need a place to store the main object index. A normal application might have five or ten main objects; therefore, our allocation of 256 entries in the main object index is extremely generous in most cases.11
Let's start our examination of the implementation of the quantum file code by looking at a simple program that uses it to store a number of strings and then retrieve them (Figure flextest.00).
codelist/flextest.00
Possibly the most striking feature of this program is how simple it
appears, which is a direct consequence of the power of C++. Writing a
similar program in C would require much greater knowledge of the
implementation of the quantum file algorithm. However, in C++ all we
have to know is which header file to include, how to open the quantum
file, and how to create a FlexArray
object, which behaves
like a variable-sized array of variable-sized strings. Once we have
dealt with this small set of prerequisites, we can use this extremely
powerful and flexible data type in virtually the same way we would use
a built-in type.
Let's take a detailed look at exactly how we set up and use a
quantum file in this program. First, we have a couple of statements
that declare a temporary buffer variable which we'll use to construct
data to store in the quantum file and delete any previous version of
the quantum file itself. Next, we create a QuantumFile
object called QF
via the default constructor of the QuantumFile
class
. Then we call an Open
function on that
object, supplying the name of the quantum file to be created.
The next statement creates a FlexArray
object called TestArray
.
The arguments to this constructor are a pointer to the QuantumFile
object that we'll use to store the data for TestArray
,
and a name by which we could reuse this same FlexArray
object in another program or another execution of this program.
Now we're up to the loop where we will start to use the TestArray
object. The loop steps through the first 100 elements of TestArray
,
assigning each element a String
consisting of digits
corresponding to the position of that element in the TestArray
object. The strstream
named Temp
is used to
format the numeric values of the loop variable i
into String
values that can be stored in a FlexArray
object.
Finally, after we've put the values into TestArray
,
the second loop displays the values in the TestArray
object.
At this point, I wouldn't be at all surprised if you had come to the conclusion that I have something up my sleeve ... and you would be right. Underneath this apparent simplicity is a wonderland of complex programming, which I hope I will be able to explain in an understandable way. Let's begin by examining some of these data types we've just seen, in enough detail to demystify them.
The first of these data types is the String
. As it
happens, this is actually a typedef
for a more general
type called SVector
, so we'll start with the question of
exactly what an SVector
might be.
An SVector
is very much like an array, with the
following exceptions:
SVector
,
you will get an error message rather than a mysterious crash. SVector
has,
by calling its size
member function. SVector
after it has
been created, by calling its resize
member function. SVector
of a particular type to
another SVector
of the same type. SVector
by copying from another SVector
of the same type. While these are very useful characteristics for any type of SVector
,
they are even more important in one particular type of SVector
:
the String
, which is an SVector
of char
s.
As a specialization of the SVector
type, the String
has all of the characteristics of any other SVector
, but
has additional abilities as well. Primary among these is its ability to
be read or written using the standard C++ input/output stream
functions. This String
type, or one very much like it, is
one of the most obvious improvements in C++ over C, whose corresponding
char*
type is notoriously error-prone. In this program, we
will use the String
data type in much the same way as we
would use the char*
data type in a C program.
Unfortunately, we will not be able to go into detail on the
implementation of either the SVector
or String
class
es in this book; we will just use their facilities as
described immediately above.
typedef
Another "type" used in this simple test program is ArrayIndex
.
This is also a typedef
, but for a much simpler underlying
type: unsigned long
.
In this case, the reason I didn't just use the underlying type was
to enhance portability. The original version of this program was
implemented on a 16-bit compiler, where the definition of ArrayIndex
was unsigned short
, primarily because it was impossible
to create an array or SVector
having more than 64K
elements. When I ported this program to a 32-bit compiler, I changed
the definition of ArrayIndex
and several other similar typedef
s
and recompiled.
I would like to be able to tell you that the program worked without
further changes, but that would be a lie: it actually took me a couple
of days to get it working again. However, considering the size of the
source code for the class
es (over 100KB), I was quite
happy to take such a short time to port it.
If I had used the explicit type unsigned short
everywhere, it would have taken me much longer to figure out which
occurrences of that type should be changed to unsigned long
and which should remain as they were. As it was, I found a few places
where I did not use a typedef
or used the wrong one and
ran into trouble as a result. Overall, though, I did a pretty good job
in using appropriate types originally, which saved me a lot of trouble
later.
Now let's start our journey into the interior of the quantum file
implementation, starting with the QuantumFile class
itself, whose header file is shown in Figure blockh.00.
QuantumFile
class
(quantum\block.h) (Figure blockh.00)codelist/blockh.00
I can't say that I'm completely happy with the design of this class
.
Ideally, the user of any class
should not have to worry
about any member data or member variables that the user cannot or
should not use. Unfortunately, this is not easily accomplished in C++.
I believe it is worthwhile to digress slightly here to discuss the
reasons why this is so.
First, let's examine the difficulty of "hiding" private
and protected
member variables from the class
user. The technical problem here is that the compiler cannot allocate
space for a variable unless its size is known. This implies that you
cannot create a variable of a given class
without access
to its interface definition, which must in turn include the definitions
of all of its member variables.
This does not allow the class
user to use those
member variables unless they are public
, so we don't have
to worry about non-member functions accidentally altering member
variables. However, it does mean that the user of a class
must ensure that all of the data types needed in the declaration of a class
are available when the header for that class
is compiled,
and that the names of those data types do not conflict with any
previously defined names. This can be a source of considerable
difficulty when trying to use a fairly large library in a context other
than the one in which it was developed, which of course reduces the
payoff that such reuse would otherwise provide.
The other problem with the design of this class
is
that it defines a number of functions that application programmers have
no need for. These functions are intended for use by other class
es
that contribute to the functionality of the quantum file
implementation. You might think that I could simply make these
functions private
to prevent their being called by
outside class
es. Unfortunately, I wasn't able to do that
with Microsoft Visual C++ 5.0, one of the compilers with which this
code must work, as that compiler apparently cannot understand friend
declarations that specify class
es that are nested within
other class
es, as some of mine are. Even if this did
work, I'd still prefer to spare the application programmer the details
of these functions.
Is there a solution to either or both of these problems? Yes, in fact both of these can be remedied. Let's start with the data problem.
The standard solution to avoid exposing member variables to the
application programmer is to create a secondary data structure that
contains the actual data for the object, and include only a pointer to
such a structure in the main object. Because the compiler knows the
size of a pointer to a structure without having to see the definition
of the structure itself, the header file that defines this secondary
structure can be included only when needed in the implementation of the
class
. This frees the application programmer from any
concern about the representation of the data.
What about those functions that the application programmer doesn't
need to (and shouldn't) call? This can be solved in much the same way
as the data problem. In this case, however, the pointer in the visible class
points to an object of a class
that supplies the
additional functions needed by the other implementation class
es.
Because the client program would not include the header file that
defines this secondary class
, it could not access those
additional functions. Of course, the other implementation class
es
would include that header file so that they could access the additional
functions as needed.
Why didn't I implement either of these solutions (or better yet,
both)? Because doing so would have required wholesale changes to the
implementation, and I ran out of time. Of course, had I considered
these issues when I was first creating these class
es,
they would have been fairly simple to solve, but I didn't think of them
at that time.
In any event, the design is usable as it stands, even if it could
be improved, so let's move on to how we can use it in its present form.
We'll start with the functions that are needed by client programs such
as our test program flextest.cpp
. Figure block.00 shows the first of these
functions, QuantumFile::QuantumFile()
.
QuantumFile class
(from quantum\block.cpp) (Figure block.00)codelist/block.00
As you can see, this function's implementation consists entirely of
initialization expressions in the member initialization list. We'll
look at each of the member variables of this class
in
detail when we get to the functions that use them. For now, I'll just
mention that they are all needed to keep track of the status of the
quantum file and the buffers that maintain parts of its data in memory.
Next, let's take a look at Figure block.01,
which shows the first "regular member function" in this class
,
QuantumFile::Open
. This is a good place to start, since we
can't do anything to a quantum file until we open it, as with any other
file.
Open
function for the QuantumFile class
(from quantum\block.cpp) (Figure block.01)codelist/block.01
After declaring a number of variables, this function initializes
the variables that keep track of the number of physical read and write
operations for benchmarking purposes (m_ReadCount
and m_WriteCount
,
respectively). Then it resizes the SVectors
that keep
track of the block buffers (m_Block
), which quantum is in
each block buffer (m_BlockNumber
), whether each block
buffer has been modified and therefore needs to be written back to the
disk (m_BufferModified
), and the "time stamp" for each
block buffer that indicates how recently it has been accessed (m_TimeStamp
).
We'll see how each of these variables is used in the program as we go
through the code.
The next few lines check whether the caller has supplied a non-null file name for the quantum file. If not, we bail out, because we need a valid file name to use when we open the file to store our data.
Assuming that we do have some sort of file name, we try to open the
file for reading, to see whether it is already there. If fopen
returns NULL, indicating that the file doesn't exist yet, we create the
file, clear the QuantumFileHeaderStruct
structure called QFHS
,
set the main_object_index_count
and free_space_list_count
to their default values, and copy the current program_version_info
into the quantum_version_info
structure element.
The next operation is to initialize the member variables of QFHS
that keep track of the offsets in the file where the main_object_index
,
the free_space_list
, and the quantum file data area
begin, so that these areas of the file can be located when we open the
file again.
Since we haven't created any main objects yet, our next task is to
clear the main object index, setting it to zeros (which indicates
available entries, as quantum number zero is not used), except for the
entry for object number zero, which is set to NoQuantum
;
we will not allow a main object number zero, for better error control.
We also write the header block to the file at this point.
The last operation in the new file initialization is to set up the free space list to indicate that all quanta have the maximum available free space, except of course for quantum zero, which is not used.
Of course, the process of opening an already existing quantum file starts differently; we have to read in the quantum file header and check whether it was created by the same version of the program. The current program simply tests whether the version of the program that created the file is different from the one reading it and aborts if they are different. A production program would use this information to determine whether the versions were compatible, so that if we changed the program in a way that makes old files unusable, we could warn the user that the file is of an old type and should be converted. Alternatively, we might automatically convert it to a new format; in either case, it is essential that we know what version of our program created the file. After we have taken care of this detail, the rest of the code for opening either an old or a new file is the same.
We are almost done with the initialization, but there are a few
more data structures to handle. First, we have to calculate the block
numbers where the main object list, the free space list, and the data
area start. Then we initialize the member variables that keep track of
the current number of main objects and the number of elements in the
free space list. Then we initialize the SVector
s of block
buffers, block numbers, modified flags, and time stamps. Next, we
initialize variables that will allow us to access the free space list
and main object list; we'll get into exactly how that works later.
Finally, if we have created a new quantum file (as opposed to opening one that already existed), we create a main object called "Directory", which we will use to keep track of the names of the main objects that the user will create.
Now let's take a look at Figure block.02,
which shows the next user function, QuantumFile::Flush
.
This function is used to ensure that all data written to the quantum
file has actually been stored on the disk. It's a good idea to call Flush
whenever you have finished with a logical portion of your data
updating, to make sure that a power failure or other crash doesn't lose
data.
Flush
function for the QuantumFile class
(from quantum\block.cpp) (Figure block.02)codelist/block.02
This isn't a very complicated function, but it does show how the m_BufferModified
SVector
is used: Each buffer whose flag in that SVector
is TRUE
is written to the disk via the Write
member function. Any buffer whose flag is not TRUE
has
not been modified since it was last read from the disk, so there is no
reason to write it back to the disk. Avoiding unnecessary writes is
very helpful in reducing disk I/O and improving the speed of the
program. The Write
function resets the flag for any
buffer it writes, so if we called Flush
twice in a row,
no buffers would be written out on the second execution.12
The final function that we're going to cover in this class
at this point is Close
(Figure block.03). As its name suggests, this
function closes the quantum file and takes care of all the housekeeping
required at that time. However, you generally won't call this function
explicitly, because it is called automatically when the QuantumFile
object is destroyed at the end of its scope, as you can see by looking
at the QuantumFile
destructor (Figure block.04).
Close
function for the QuantumFile class
(from quantum\block.cpp) (Figure block.03)codelist/block.03
QuantumFile class
(from
quantum\block.cpp) (Figure block.04)codelist/block.04
Allowing the destructor to close the file automatically is easier and (more important) safer than closing the file explicitly, because it's not always easy to determine when all of the objects in the quantum file are inactive. Hence, I recommend allowing the compiler to take care of closing the file.
Now that we've taken a look at the member functions of the QuantumFile
class
that user programs are concerned with, let's do the
same with the FlexArray
class
. In the
process, we'll start to delve into the underlying layers of the quantum
file implementation in more detail. Let's start with the interface for
the FlexArray
, which is shown in
Figure newquant.00.
class
FlexArray class
(from
quantum\newquant.h) (Figure newquant.00)codelist/newquant.00
As before, we'll postpone discussion of the member variables of
this class
until we see how they are used in the member
functions. We'll start with the normal constructor for the FlexArray
, which is shown in Figure newquant.01.
class
FlexArray class
(from
quantum\newquant.cpp) (Figure newquant.01)codelist/newquant.01
As you can see, this function simply calls an Open
function in the same class
to do all of the actual work.
The purpose of this handoff is to allow the user to create a FlexArray
via the default FlexArray
constructor before opening the
quantum file and fill in the details later via an explicit Open
call. Now that I've cleared that up, let's take a look at the function
that actually does the work, which is shown in Figure newquant.02.
Open
function for the FlexArray class
(from quantum\newquant.cpp) (Figure newquant.02)codelist/newquant.02
The header for this function shouldn't be too alarming. The first
argument is simply the address of the quantum file object in which the FlexArray
will be created. The second argument is the name under which the FlexArray
will be stored in the quantum file's directory, which will make it
accessible to another program or a later run of this program. The ModifiableElement
type is just another name for a String
, left over from a
previous incarnation of that type. The third argument is the initial
number of elements that the FlexArray
will contain, and
the last argument is the maximum number of elements to which the FlexArray
will expand if necessary. If you look at the interface for this class
,
you'll notice that these last two arguments have default values of 1
and UINT_MAX-1
elements, respectively. Therefore, if you
don't have any idea how large a FlexArray
might need to
be, you can omit these arguments and it will expand as needed.
Now we're up to the first statement inside the function, which
appears innocent enough. It assigns a value to a member variable called
m_MOA
, which stands for "Main Object Array". This member
variable can then be used to look up the name of the FlexArray
in the quantum file directory, as well as to access a number of other
attributes of the quantum file. But what is the type of m_MOA
,
and how does it give us access to the quantum file attributes?
To answer these questions, we're going to have to get into some
fairly complex implementation and language issues. We'll start this
excursion with a look at the interface for the MainObjectArrayPtr
class
, shown in Figure newquant.03.
MainObjectArrayPtr class
(from
quantum\newquant.h) (Figure newquant.03)codelist/newquant.03
You will notice that this class
has an embedded class
called MainObjectArray
defined within it, and that the
embedded class
has quite a few member functions. You may
also notice that the MainObjectArrayPtr
class
itself has only a few member functions, most of which are the standard
"structural" member functions -- the normal and default constructors,
the destructor, and the assignment operator. One of the constructors is
the conversion function that constructs a MainObjectArrayPtr
object from a QuantumFile
pointer (used in the FlexArray
Open
function), and the others are the default constructor
and copy constructor. However, there is one member function that is
anything but standard: operator->
. Why would we want
to redefine that operator, and what can a custom version of operator->
do?
Before we get through with this question, we will have entered a realm of C++ where relatively few have trespassed. I don't believe in adding complexity for the sake of complexity, so why would I use a feature of C++ that most C++ programmers have never dealt with?
Because we need this feature to improve the ease of use and reliability of programs using this quantum file access method implementation. A little history lesson is appropriate here so you can see that I'm not speaking from a solely theoretical perspective.
The first edition of this book included an implementation of the quantum file access method written in C.13 It had many of the features that the current implementation has, but it was much harder to use. The main problem was that the user program had to deal with memory allocation issues, including remembering to free memory for variables whose storage was assigned inside the quantum file code. In addition, the syntax of a quantum file "array" was much less convenient than dealing with a built-in array. It was obvious to me that resolving all of these issues so that the user could ignore the inner workings of the quantum file would greatly improve the usability of this algorithm.
Unfortunately, this isn't possible in C; however, a major impetus behind the creation of C++ was to make it possible for class library designers to provide such facilities. Since the quantum file access method would benefit greatly from such an approach, I decided to rewrite it in C++.
Why am I using the word "rewrite", rather than "convert", "translate", or other euphemisms? Judging from the number of ads for "C/C++ programmers" I see in the newspapers, some employers have the notion that switching to C++ can be accomplished by changing source file extensions to ".cpp" and recompiling. After removing some syntax errors revealed by the stricter type checking of the C++ compiler, the program compiles and runs as it did before. What's so difficult about object-oriented programming? If you are one of these employers, or you have tried the above experiment yourself and now believe that you are a "C++ programmer", let me break the news to you gently: all you have accomplished is to switch to the C subset of C++, which has nothing to do with object-oriented programming. Virtually none of the code or design from the original quantum file implementation has survived the transition to object orientation unscathed, despite the fact that, according to an expert on object-oriented design, that original C implementation was "object-oriented"!
However, my loss (if that's what it was) can be your gain: I am going to break a long-standing tradition of object-oriented design literature by disclosing not only the final design but also many of the missteps, errors, and difficulties that ensued from my original determination to tackle this rather complex project. So, with no further ado, let's begin with an often neglected concept which is at the core of the implementation of this project: operator overloading.
One of the most powerful mechanisms for hiding the details of
implementation from the class
user in C++ is operator
overloading, which means defining (or in some cases redefining) the
semantics of one or more of the standard operators +, -, =
,
and so forth, as they apply to class
objects. For better
or worse, the semantics of these operators cannot be changed as they
apply to "intrinsic" data types, so the calculation 2+2
is always going to result in 4
; this restriction is
probably necessary, as the potential confusion could be horrendous.14 A good example of the most common type of
use for this facility is the implementation of +, -
,
etc., for manipulating Complex
class
objects, following the rules of complex arithmetic. The ability to
provide this sort of intuitive operation to the number-crunching user
is starting to make C++ (with the proper class
libraries,
of course) a viable alternative to FORTRAN, long the dominant language
for such users.
As usual, it's probably best to start with a relatively simple
example: in this case, we'll look at an example of overloading operator-
.15 Consider the program in Figure overload1.
operator-
(Figure
overload1)codelist/minus.00
As you can see, I've created a very simplified version of the
dreaded Point
class
, used in innumerable
textbooks to represent a point on the Euclidean plane. The result of
running our sample program is 5
, which is the distance
between the Point
(1,1)
and the Point
(4,5)
, calculated by the normal Euclidean calculation:
Distance = sqrt(xdiff*xdiff + ydiff*ydiff)
where xdiff
is the difference between the x
coordinates of the two Point
s, and ydiff
is
the difference between their y coordinates; sqrt
,
of course, represents the square root function.
The "magic" is in the definition of operator-
, which
is the syntax for redefining an operator; in this case it's the
subtraction operator. When the compiler sees the expression x-y
,
it looks for a definition of operator-
that is specified
for class
Point
, taking an argument which
is also of class
Point
. Since there is such
a definition, the compiler generates a call to the code specified in
that definition. Had there not been such a definition, a syntax error
would have resulted, since the compiler doesn't have any built-in
knowledge of how to subtract two Point
s.
Operator overloading doesn't have to be terribly complicated, as
that example illustrates. However, there's a trick to the overloading
of operator->
. When we overload other operators, such
as our earlier example of operator-
, our code takes over
the implementation of the operator completely. That is, the result of
calling our operator is whatever we say it is; in the operator-
example, it's the double
result of the Euclidean distance
formula. However, this is not true with operator->
; in
that case alone, a different scheme is followed by the compiler. Figure
overload2 shows some code derived
from an early version of the project. How does the compiler interpret
it?16
operator->
overloading
(Figure overload2)codelist/danger.00
The line X->Get(1);
will be compiled as follows:
1. Since X
is not a pointer to any type, its class
definition is examined for a definition of operator->
.
2. Since this definition is found, a call to that code is inserted in the function being compiled.
3. Then the return value of the operator->
code is
examined. Since it is a pointer type (BlockPtr *
, to be
exact), the compiler continues by generating code to call BlockPtr->Get(int)
with this
equal to the result returned by operator->
,
which is the same BlockPtr
that we started with.
So far, this is doing exactly what we wanted it to do. However,
there is one serious problem with this method of implementing operator->
,
which is illustrated by what happens when the line Y->Get(1);
is compiled. Unlike our previous example, Y
is a
pointer (to a BlockPtr
); therefore, the compiler doesn't
look for a definition of operator->
, but merely
generates code to call BlockPtr->Get(int)
with this
equal to Y
. As a result, our overloaded operator is never
called.
Much the same problem will occur when the line X.Get(1);
is compiled. The compiler is happy to generate a call to BlockPtr.Get(int)
with this
being equal to the address of X
;
again, though, our custom operator->
code won't be
executed.
The reason for this problem is that the compiler stops looking for operator->
overloading whenever an actual pointer is found. After all, if you
already have a pointer in a place where a pointer is needed, you must
have what you want! Obviously, we're going to have to come up with a
type-safe solution, so we can't accidentally use the wrong syntax and
end up with a nasty bug.17 The solution is
to have not one, but two class
es involved in the virtual
memory mechanism, an "outer" (or handle) class
and an "inner" (or body) class
.18
The handle class
has two primary functions: the
initializing of a body class
object and the overloading
of operator->
to return a pointer to that body class
object, which actually does the work. Figure overload3 presents this solution.
operator->
overloading
(Figure overload3)codelist/safe.00
One interesting thing about this program is that the main program
refers to the Get
function in exactly the same way as it
did before: in fact, the only difference visible to the
user is that the compiler is now able to prevent him (and us) from
compiling the incorrect references as we did before.
class
However, there are some changes to the internals that deserve
comment. First of all, in the class
definition, the
previous BlockPtr
class
has been renamed to
Block
and made a nested class
of a new BlockPtr
class
. The reason for the renaming is to prevent
unnecessary changes to the user's code; that was successful, as noted
above. Another design decision whose justification is not quite as
obvious is the use of a nested class
rather than a
freestanding class
. Why is this appropriate?
Since the purpose of the nested Block
class
is solely to add type safety, rather than to provide any visible
functionality, nesting it inside the BlockPtr
class
reduces "name space pollution". That is, if the user needs a Block
class
for some other purpose, its name won't conflict with
this one. Such potential conflicts are going to become more serious as
the use of class
libraries increases; we should do our
part to "preserve the environment" when possible.19
The fact that the Block
class
is visible
only to members of its enclosing class
, BlockPtr
,
is responsible for the somewhat odd appearance of the member function
declarations in Figure overload3.
For example, what are we to make of the declaration "char
BlockPtr::Block::Get(int p_Index)
"?
Of course, we C++ programmers are used to "qualified" function
names, such as Block::Get
. But why do we need two
qualifiers? Because the compiler knows the meaning of Block
only in the context of BlockPtr
; that's what prevents the
global name pollution that would ensue if Block
were
declared outside of BlockPtr
. This means that we could,
for example, have another class
called Block
nested inside another enclosing class
called NewBlockPtr
.
If that Block
class
had a Get
member function, we could declare it as "char
NewBlockPtr::Block::Get(int p_Index);
", and the compiler would
have no problem telling which Block::Get
we meant.
Similarly, an unadorned "Block::Get(int p_Index);
" would
be yet another completely distinct function.20
One more safety feature in this new arrangement is worthy of note: the
constructor for Block
is protected
, so that
the user can't accidentally create one and use it instead of
referencing it through a BlockPtr
object the way we
intend. This isn't very likely anyway, since the user would have to
specify BlockPtr::Block
as the type, which isn't easy to
do by accident; however, according to the principles of object-oriented
programming, it is best to hide the internals of our classes whenever
possible, so that class
user code can't depend on those
internals. That restriction makes it possible for us to improve the
implementation of our classes without breaking user code.
Now that we've investigated the notion of redefining operator->
,
let's return to the MainObjectArrayPtr
class
that we were discussing a little while ago. We'll start with Figure newquant.04, which shows the code
for the normal constructor for that class
.
MainObjectArrayPtr
(from
quantum\newquant.cpp) (Figure newquant.04)codelist/newquant.04
This function is fairly simple. It creates a new MainObjectArray
object using the normal constructor for that class
. This
is the "body" object that will actually do the work of both of these class
es.
Then it sets the reference count of that object to 1,
indicating that there is one MainObjectArray
"handle"
object using that "body" object.
The general idea of reference counting is fairly simple, as most great ideas are (after you understand them, at least). It's inefficient to copy a lot of data whenever we set one variable to the same value as another; on the other hand, copying a pointer to the data is much easier. However, we have to consider how we will know when we can delete the data being pointed to, which will be when no one needs the data anymore.
One way to share data safely in such a situation is to write the
constructor(s), destructor, and assignment operator to keep track of
the number of objects that are using a particular body object, and when
there are no more users of that object, to delete it.21
We'll see how this works in the case of the MainObjectArray
body class
and its MainObjectArrayPtr
handle class
; the implementation is much the same in the
other handle/body class
es in this program.
However, right now we're more interested in how we can use operator->
in the FlexArray::Open
function (Figure newquant.02). After creating a
member variable called m_MOA
via the normal constructor
of the MainObjectArrayPtr
class
, the next
line in the Open
function invokes MainObjectArrayPtr
::operator->
(Figure newquant.06).
MainObjectArrayPtr::operator->
(from
quantum\newquant.cpp) (Figure newquant.06)codelist/newquant.06
Given the definition of operator->
in Figure newquant.06, how will the compiler
interpret the expression m_MOA->FindObjectByName
,
which is used in the function FlexArray::Open
?
1. Since m_MOA
is not a pointer to any type, the
interface of its class
(MainObjectArrayPtr
)
is examined for a definition of operator->
.
2. Since there is such a definition, a call to that code is inserted in the function being compiled.
3. Then the return value of the operator->
code is
examined. Since it is a pointer type (MainObjectArray *
,
to be exact), the compiler continues by generating code to call MainObjectArray->FindObjectByName
with this
equal to the result returned by operator->
,
which is the address of the body object of the MainObjectArrayPtr
that we started with. Of course, the same analysis applies to the other
uses of the m_MOA
object in the Open
function as well as the other functions that use handle objects.
Now let's continue with the analysis of the Open
function in the FlexArray
class
. The
statement we've just examined calls a function called FindObjectByName
,
which returns an index into the main object array. If the name supplied
in the call was in fact the name of an existing object in the main
object array, then the returned value will be the element number of
that object in the main object array. On the other hand, if the
specified name was not found in the main object array, the returned
value will be the special value NoObject
. In that case, Open
will use another member function of MainObjectArray
,
namely FindAvailableObject
, to locate a free element in
the main object array. Then it will call the CreateMainObject
function to create a new main object which will use that free element.
At this point in the function, we have a valid index into the main
object array for our FlexArray
, whether it existed
previously or was just created. Finally, we retrieve the current and
maximum element counts for the FlexArray
object.
Now that we have covered the Open
function in the FlexArray
class
, there's only one other function in that class
that could use a significant amount of explanation: the implementation
of operator []
. However, I'm going to postpone coverage
of that function until the next chapter, where we'll be examining the
somewhat complex task of making one of our data types look as much like
an array as possible.
In the meantime, we should go over the rest of the member functions
of the MainObjectArrayPtr
class
, so you can
see how this typical handle class
works.
MainObjectArrayPtr
class
We'll start with the default constructor (Figure newquant.07), which is pretty
simple. In fact, it's exactly like the normal constructor (Figure newquant.04), except that it uses
the default constructor of its body class
to create its
body object rather than the normal constructor of the body class
.
MainObjectArrayPtr
(from
quantum\newquant.cpp) (Figure newquant.07)codelist/newquant.07
MainObjectArrayPtr
Copy ConstructorThe copy constructor (Figure newquant.08) isn't much more complicated. It copies the pointer to the main object array (body) object from the existing object to the same member variable in the newly created object. Then it increments the reference count because there is now one more user of that body object.
MainObjectArrayPtr
(from
quantum\newquant.cpp) (Figure newquant.08)codelist/newquant.08
MainObjectArrayPtr::operator =
FunctionThe next function, MainObjectArrayPtr::operator =
(Figure newquant.09), is a bit
more complicated than the previous ones, because it has to account for
the reference counts in both the current object and the right-hand
object being copied from.
MainObjectArrayPtr::operator=
(from
quantum\newquant.cpp) (Figure newquant.09)codelist/newquant.09
We start by decrementing the reference count of the current body object, because we won't be pointing to it after this operation (unless the right-hand object happens to point to the same body object as ours does). Then we check whether the reference count is less than or equal to 0, in which case we can (and will) delete the current body object.22 After taking care of that issue, we copy the right-hand object's main object array pointer to the corresponding member variable in our object and increment the reference count for that main object array. Finally, we return our object to the caller.
MainObjectArrayPtr
DestructorThe final function in the MainObjectArrayPtr
class
is the destructor (Figure newquant.10).
MainObjectArrayPtr class
(from
quantum\newquant.cpp) (Figure newquant.10)codelist/newquant.10
This shouldn't seem at all mysterious after our analysis of the operator
=
function, as its operation is basically a subset of what that
function does. That is, it decrements the reference count of its body
object; if the resulting reference count is less than or equal to 0, it
deletes the body object.
Now that we're through with the MainObjectArrayPtr
class
,
we'll examine its body class
, MainObjectArray
,
so we can see how this fundamental data type of the quantum file
algorithm works.23
MainObjectArrayPtr::MainObjectArray
class
We'll skip over the default constructor and destructor, as these
functions don't do anything particularly interesting. However, the
normal constructor for this class
does some
initialization that we should pay attention to, as shown in Figure newquant.11.
MainObjectArray class
(from quantum\newquant.cpp) (Figure newquant.11)codelist/newquant.11
As you can see, this constructor begins by checking whether its
argument is non-null; if so, it continues by initializing a member
variable to the quantum file pointer provided by that argument. Then it
uses that quantum file pointer to retrieve the number of main objects
in the quantum file as well as the number of blocks that the main
object list occupies. Then it resizes its block pointer list to that
count and initializes each of the elements in the list to the
appropriate block pointer. We'll see exactly how those block pointers
work when we get back to the QuantumFile
class
,
but for now we'll just assume that they allow access to the individual
blocks that make up the main object list.
MainObjectArray::Get
FunctionNow let's take a look at the Get
function in this class
(Figure newquant.12).
MainObjectArray::Get
function (from
quantum\newquant.cpp) (Figure newquant.12)codelist/newquant.12
This function is reasonably straightforward. It begins by checking
the index of the element in the main object array that the caller wants
to retrieve. If it is out of range, the return value is NoObject
,
indicating an error. However, assuming that the index is valid, the
function calculates which block and element within that block
corresponds to the requested entry. It then retrieves that element via
the MainObjectBlock::Get
function, which is called for
the m_BlockPtr
variable for that block.24
Finally, the result of the Get
function, which is the
quantum number of the big pointer quantum of the object in question, is
returned to the calling function.
MainObjectArray::Set
FunctionThe next function we'll cover, MainObjectArray::Set
(Figure newquant.13), is the
counterpart to Get
.
MainObjectArray::Set
function (from
quantum\newquant.cpp) (Figure newquant.13)codelist/newquant.13
This function takes two arguments, the element number in the main
object array and the quantum number of that object's big pointer
quantum. After checking that the element number is valid, it calculates
the block number and element number in which the desired entry will be
found, then calls the MainObjectBlock::Set
function for
the m_BlockPtr
variable of the appropriate block in the
main object array. Finally, if the entry being stored is equal to NoObject
and the element number of this element is less than the current "lowest
free object", then we update that variable to indicate that we should
start looking at this place in the main object array the next time we
are searching for a free object.
MainObjectArray::FindAvailableObject
FunctionThe next function we'll cover, MainObjectArray::FindAvailableObject
(Figure newquant.14), is used to
find an available entry in the main object array when we want to create
a new object.
MainObjectArray::FindAvailableObject
function
(from quantum\newquant.cpp) (Figure newquant.14)codelist/newquant.14
This isn't terribly complicated either. It searches the main object
list block by block, starting with the entry corresponding to the last
known value of the "lowest free object" variable and continuing until
it finds an empty entry (i.e., one whose value is NoObject
).
Once it finds such an entry, it calculates the corresponding element
number in the main object array and returns it as the result after
storing it in the "lowest free object" variable.25
MainObjectArray::GetModifiableElement
FunctionThe next function we'll cover, MainObjectArray::GetModifiableElement
(Figure newquant.15), is used to
retrieve a ModifiableElement
value from a main object.
MainObjectArray::GetModifiableElement
function
(from quantum\newquant.cpp) (Figure newquant.15)codelist/newquant.15
Among its many uses, GetModifiableElement
is the
function that a FlexArray
uses to retrieve values, so it
is quite important to the functioning of our test program. Although it
is somewhat longer than most of the other functions we've seen so far,
it's really not too complicated, and it will serve as a good
introduction to the more complex functions in this class
.
We start by using the Get
member function to look up
the big pointer array quantum number for the object in which the
element is contained. Once we have that quantum number, we call the
function QuantumFile::MakeBigPointerBlockPtr
to create a
handle object that will allow us to access that quantum as a big
pointer array. Next, we calculate which big pointer array entry
contains the quantum number of the little pointer block that contains
the element reference that we'll use to access the actual data. Once we
know that, we can call the BigPointerBlock::GetBigArrayElement
function to retrieve the quantum number for that little pointer block.
Then we call the function QuantumFile::MakeLittlePointerBlockPtr
to create a handle object that will allow us to access that quantum as
a little pointer array. Once we have done that, we calculate which
element in the little pointer array is the element reference to the
element we are interested in, and retrieve that element reference from
the little pointer array.
Now we're ready to access the element itself, assuming that it
actually exists. First, though, we have to account for the possibility
that it is a null item, that is, one that contains no data. If this is
the case, we return an empty (i.e., default-constructed) ModifiableElement
.
Why have I handled this case separately, rather than storing a
zero-length ModifiableElement
in the leaf block?
A null item is an optimization: we could treat items containing no
data like all other items. However, special handling for this case is
worthwhile, since it is very common. For example, a word processor
might store each line of a file as a string, removing the trailing null
from each string before storing it; with that approach, strings
consisting only of a trailing null would be stored as null items. The
advantage of using null items is that, rather than occupying an item
index slot indicating a length of zero, they are represented by item
references that contain a special quantum number called NO_QUANTUM
and a special item number called NULL_ITEM_NUMBER
,
neither of which can appear in a real item reference. This means that a
null item does not occupy any space in a item index, and that we can
avoid loading a quantum to retrieve its value.
However, let us suppose that the element reference is not null. In
that case, we call QuantumFile::MakeLeafPointerBlockPtr
to create a handle object that will allow us to access items in the
quantum containing the desired item. Then we use QuantumBlock::GetModifiableItem
to retrieve the actual item and return it to the caller.
MainObjectArray::PutElement
FunctionThe next function we'll cover, MainObjectArray::PutElement
,
shown in Figure newquant.16, is
used to store a ModifiableElement
value into a main
object.
MainObjectArray::PutElement
function (from
quantum\newquant.cpp) (Figure newquant.16)codelist/newquant.16
This starts out by using a preprocessor macro called qfassert
to check whether the size of the item to be stored is legal, i.e., that
it is less than the maximum allowable item size.26
After that, the code is very similar to its counterpart, GetModifiableElement
,
until we retrieve the item reference from the little pointer array.
Then, if the item reference is not null, we call QuantumBlock::DeleteItem
to delete the current contents of that element. Then we check whether
the size of the element to be stored is zero; if so, we store a null
reference in the little pointer array (as noted previously).
On the other hand, if we have some data to be stored, we have to
find a place to store it. The most logical place to look is in the
quantum that we have most recently used to store something for this
object, and this information is therefore maintained by the big pointer
block access code in the BigPointerBlock class
.27 Therefore, we check whether there may be
enough free space in that quantum by calling QuantumFile::QGetFreeSpace
.28 Because a free space code is only one
byte (to reduce the size of the free space list), it cannot exactly
represent the amount of free space in a quantum. Therefore, if the free
space list entry indicates that this quantum may have enough space for
the item, we have to retrieve the quantum via QuantumFile::MakeLeafBlockPtr
and then call QuantumBlock::CalculateFreeSpace
to
calculate its free space more precisely. If the actual free space is
larger than the size of the item, then we have found the quantum we
will use to store the item.
However, if that quantum does not have enough space to store the
item, then we will have to search for one that does. That is handled by
QuantumFile::FindSpaceForItem
, which returns the number of
the first quantum that fits the bill.
At this point, we have a quantum in which to store the item.
Therefore, after setting the "last quantum added to" member variable,
we set the quantum number in the element reference to that quantum
number. Then we call QuantumBlock::AddItem
to add the
item to that quantum, and set the item number in the element reference
to the value returned from that function. Finally, we update the little
array pointer that points to the newly added element.
MainObjectArray::GetMainObjectElementCount
FunctionThe next function, MainObjectArray::GetMainObjectElementCount
(Figure newquant.17), is used to
retrieve the number of elements that a main object currently contains.
MainObjectArray::GetMainObjectElementCount
function (from quantum\newquant.cpp) (Figure
newquant.17)codelist/newquant.17
You might think that this would be a trivial "get" function that merely returns the value of a member variable, but that is not the case. The reason is that this information is stored in the quantum file itself, as part of the big pointer array, because it needs to be maintained between executions of this program (or any other program that might want to use the file).
However, it's not too complicated. After using the Get
function to retrieve the quantum number of the big pointer array for
the object, it uses QuantumFile::MakeBigPointerBlockPtr
to provide access to the big pointer array. Then it calls BigPointerBlock::GetBigArrayElementCount
to retrieve the value to be returned.
MainObjectArray::GetMainObjectMaxElementCount
FunctionThe next function, MainObjectArray::GetMainObjectMaxElementCount
(Figure newquant.18), is used to
retrieve the maximum number of elements that a main object is permitted
to contain. Because it is exactly like the previous function except for
the function it calls in the BigPointerBlock
class
,
I'll just list it without further comment.
MainObjectArray::GetMainObjectMaxElementCount
function (from quantum\newquant.cpp) (Figure
newquant.18)codelist/newquant.18
MainObjectArray::CreateMainObject
FunctionThe next function we'll cover, MainObjectArray::CreateMainObject
,
shown in Figure newquant.19, is
used to create a new main object.
MainObjectArray::CreateMainObject
function (from
quantum\newquant.cpp) (Figure newquant.19)codelist/newquant.19
This function is somewhat complex, but that should be
understandable because it takes quite a bit of work to create a new
main object. We start off by checking whether the user has asked to
create a zero-length object. Such an object would cause complications
in the way that the size of the little pointer array is calculated;
namely, the calculation of the number of blocks needed to hold the
little pointer arrays (LittlePointerBlockCount
) produces
correct results for all values of the element count that are greater
than 0 but produces a nonsensical result when the element count is 0.
Because arrays can grow in size, a zero-element array may be a
reasonable thing to create, so I didn't want to disallow them; the
simplest solution, and the one I adopted, was to treat a request to
create a zero-element array as a request for a one-element array.
Because a zero-length object isn't very useful until it has been
resized, this isn't likely to surprise the user.
Once that detail has been handled, we compute a free space entry
indicating that this block is no longer available for use. Then we
continue by calling QuantumFile::FindEmptyBlock
to find a
block for the big pointer array, then calling QuantumFile::MakeBigPointerBlockPtr
to allow access to that block as a big pointer block.
Now we are ready to begin setting up the big pointer block for use.
We start by calling BigPointerBlock::Clear
to erase any
previous data in the block.29 Next, we
call BigPointerBlock::SetQuantumType
to set the type of
the quantum to indicate that it contains a big pointer array; this
information is used for consistency checking when accessing a quantum
that is supposed to be a big pointer array quantum.30
Then we call BigPointerBlock::SetMainObjectNumber
to set
the entry in the big pointer block that indicates which main object it
belongs to. This is needed when we are updating the free space list,
because each free space entry includes the number of the main object
for that quantum. This is relevant because we do not share a quantum
among several main objects, to simplify the task of programs that
recover as much data as possible from a quantum file that has become
corrupted.
The next step is to set up the header for the big pointer array.
This contains the current element count, the maximum element count, and
the last quantum that we've added an element to; of course, the latter
variable is initialized to NoQuantum
, as we have not yet
added any data to this new object. Once we have set the header up, we
call QuantumBlock::AddItem
to add it to the big pointer
block, so that it will be accessible the next time we use the big
pointer array.
Now it's time to initialize the little pointer array for this object and the big pointer array that allows us to access the little pointer array. To start this process, we calculate the number of little pointer blocks that we will need, which is dependent on the number of elements in the object and the number of item references that fit in one block. Once we have determined that number, we can create a big pointer array containing that number of quantum numbers, and a temporary little pointer array that we will use to initialize each block in the little pointer array.
We're up to the loop that initializes all of the blocks in the
little pointer array. It begins by calling QuantumFile::FindEmptyBlock
to find an empty block that we can use for the little pointer array.
Then we call QuantumFile::SetFreeSpace
to reserve this
quantum for the current main object and mark it as full.31
Next, we set the current element of the big pointer array to the
quantum number of the quantum where we are storing the current little
pointer array block.
Now we're ready to initialize the little pointer block. We start by
calling QuantumFile::MakeLittlePointerBlockPtr
to allow
us to treat the current quantum as a little pointer block. Once we have
done that, we call LittlePointerBlock::Clear
to
initialize the quantum to an unused state, LittlePointerBlock::SetQuantumType
to set the quantum type to "little pointer array" (for consistency
checking while retrieving data), LittlePointerBlock::SetMainObjectNumber
to allow us to update the free space list when adding data to this
quantum, QuantumBlock::AddItem
to add a section of the
little pointer array to this quantum, and LittlePointerBlock::ClearLittleArray
to initialize that section of the little pointer array to null item
references.
One detail that might not be immediately obvious is why we have to
call QuantumFile::SetFreeSpace
again at the end of the
loop. The reason is that the QuantumBlock::AddItem
function recalculates the free space for the block to which it adds the
item. Since we want to prevent any further data from being added to
this block, we have to reset the free space manually (to "none
available") to indicate that this block is full.
Once we have initialized all of the little pointer array sections,
we're ready to finish setting up the big pointer array for use. The
first step in that process is to add the big pointer array data we have
just filled in to the big pointer quantum, via QuantumBlock::AddItem
.
Then we use QuantumFile::SetFreeSpace
to reserve the big
pointer quantum for our new main object and mark it full. Then we use
the member function Set
to set the big pointer array
quantum number for our new main object. Finally, we call the member
function PutElement
to set up the directory entry for the
new main object.
MainObjectArray::FindObjectByName
FunctionThe next function we'll cover, MainObjectArray::FindObjectByName
(Figure newquant.20), is used to
look up a main object in the "directory".
MainObjectArray::FindObjectByName
function (from
quantum\newquant.cpp) (Figure newquant.20)codelist/newquant.20
This one is fairly simple. It steps through the elements of the
main object directory, looking for an element whose contents are equal
to the argument to the function. If it finds such an element, it
returns the index of that element, which is the object number of the
object with that name. On the other hand, if it gets to the end of the
main object directory without finding a match, it returns the value NoObject
.
MainObjectArray::GrowMainObject
FunctionThe next function, MainObjectArray::GrowMainObject
(Figure newquant.21), is used to
increase the size of a main object when needed to accommodate more
elements.
MainObjectArray::GrowMainObject
function (from
quantum\newquant.cpp) (Figure newquant.21)codelist/newquant.21
This is about as complicated as CreateMainObject
.
However, because it is quite similar to that function, we'll be able to
shorten the explanation considerably. We start by retrieving the
existing big pointer block for the object. Then we copy the big pointer
array from that block to a new big pointer array. Next, we calculate
the size of the new little pointer array needed to hold references to
all the elements in the new, enlarged, object.
The next few lines of code might be a little mysterious if I don't
explain a few details of the size calculation. To simplify the
accounting needed to keep track of the sizes of all of the little
pointer arrays, I've decided always to allocate a little pointer array
to the largest size that will fit into a quantum. However, setting the
accessible size of an array to the actual number of elements allocated
for its little pointer arrays has the drawback that the user wouldn't
be able to find out when the program accessed an element that is within
the allocated size but outside the number of elements specified when
creating the array. Therefore, I've crafted a compromise: when an array
is created, I set the number of accessible elements to the number the
user specifies32; however, if the array is
increased in size, the new size is always equal to the total allocation
for all little pointer arrays. This means that if an array is expanded
by a reference to an element that was already allocated but was outside
the specified array size, GrowMainObject
simply resets
the size to the number of elements contained in all little pointer
arrays already extant and returns that value. Otherwise, a new little
pointer array is allocated, and the big pointer array is updated to
reflect the change.
Assuming that we actually do need to allocate some new little pointer blocks, we continue by increasing the size of the big pointer array to accommodate the new number of little pointer blocks. Then we allocate and initialize each of the new little pointer blocks in exactly the same way as we allocated and initialized the original little pointer blocks when we created the object. Then we delete the old big pointer array from the big pointer block, add the new big pointer array to the big pointer block, set the free space entry for the big pointer block to indicate that it is full, and return the new element count to the caller.
Now that we have reached the end of the discussion of the MainObjectArray
class
, we'll return to the QuantumFile
class
.
This time, we're going to get into the member functions that are used
in the implementation, since we've already dealt with those that are
used by outside programs.
QuantumFile
class
We'll skip over the QuantumFile
member functions that
fall in three categories:
1. Those that merely compute the number of blocks occupied by a
segment of the quantum file (i.e., GetMainObjectBlockCount
and QGetFreeSpaceListBlockCount
).
2. Those that merely return a block of the appropriate type to be
accessed as a little pointer block, big pointer block, and the like
(i.e., MakeFreeSpaceBlockPtr
, MakeMainObjectBlockPtr
,
MakeLittlePointerBlockPtr
, MakeBigPointerBlockPtr
,
MakeLeafBlockPtr
).
3. Those that merely call member functions of the FreeSpaceArray
class
(i.e., SetFreeSpace
, QGetFreeSpace
,
FindEmptyBlock
, FindSpaceForItem
).
We'll see how the functions in the third category work when we examine the underlying functions that they call.
QuantumFile::SetModified
FunctionHowever, that still leaves us with several functions to discuss in
the QuantumFile
class
. We'll start with QuantumFile::SetModified
,
which is shown in Figure block.06.
QuantumFile::SetModified
function (from
quantum\block.cpp) (Figure block.06)codelist/block.06
This is actually a fairly simple function. It calls a global
function called FindBuffer
to look up the specified
quantum number in the list of quantum buffers.33
If the quantum with that quantum number is in fact in one of the
quantum buffers, then the "buffer modified" flag for that buffer is
set, and the function returns. The only complication is a consistency
check in the form of an "assert" that the quantum in question is
actually in one of the buffers. If that is not the case, then we have a
serious problem, because the program should not be trying to set the
modified flag for a quantum that isn't in memory: that indicates that
the buffer that contained the quantum has been reused too soon.
QuantumFile::Position
FunctionThe next function we will examine is QuantumFile::Position
,
which is shown in Figure block.07.
QuantumFile::Position
function (from
quantum\block.cpp) (Figure block.07)codelist/block.07
This function is used by the Read
and Write
functions to position the file pointer to the correct place for reading
or writing a quantum. This is quite simple because every block is the
same size. However, it is important to remember that the quantum file
contains data other than the quanta themselves, so the block number of
a quantum is not the same as its quantum number. To convert between
these two numbers is the responsibility of the function that computes
the block number supplied to Position
.
QuantumFile::Read
FunctionThe next function we will examine is QuantumFile::Read
,
which is shown in Figure block.08.
QuantumFile::Read
function (from
quantum\block.cpp) (Figure block.08)codelist/block.08
After positioning the file pointer to the location in the file where this block is stored, we read it into the buffer. Then we set the modified flag for the buffer to FALSE, indicating that the disk and memory versions of the block are the same. Finally, we increment the read count, which is used for performance analysis.
QuantumFile::Write
FunctionThe next function we will examine is QuantumFile::Write
,
which is shown in Figure block.09.
This is identical to the Read
function, except of course
that it writes the data to the file rather than reading it.
QuantumFile::Write
function (from
quantum\block.cpp) (Figure block.09)codelist/block.09
QuantumFile::MakeBlockResident
FunctionThe last function in this class
is QuantumFile::MakeBlockResident
.
This is where we rejoin the thread of discussion about operator->
.
You see, MakeBlockResident
is called during the execution
of operator->
for any of the BlockPtr
class
es.34 Figure mbr1
shows an early implementation of this routine.
codelist/mbr1.00
This is a fairly straightforward implementation of a least recently
used (LRU) priority algorithm, but there are a few wrinkles to reduce
overhead. First, we use the reference parameter p_OldBufferNumber
to retrieve the buffer number (if any) in which the searched-for block
was located on our last attempt. If the number of the block held in
that buffer matches the one we're looking for, we will be able to avoid
the overhead of searching the entire buffer list. The reason that p_OldBufferNumber
is a reference parameter is so that we can update the caller's copy
when we locate the block in a different buffer; that way, the next time
that MakeBlockResident
is called to retrieve the same
block's address, we can check that buffer number first.
In order to make this work, we can't implement the LRU priority
list by moving the most recently used block to the front of the block
list; the saved buffer number would be useless if the blocks moved
around in the list every time a block other than the most recently used
one was referenced. Instead, each slot has an attached timestamp,
updated by calling the time
function every time the
corresponding block is referenced. When we need to free up a buffer, we
select the one with the lowest (i.e., oldest) timestamp. If that buffer
has the "modified" attribute set, then it has been updated in memory
and needs to be written back to the disk before being reused, so we do
that. Then we read the new block into the buffer, update the caller's p_OldBufferNumber
for next time, and return the address of the buffer.
This seems simple enough, without much scope for vast improvements
in efficiency; at least, it seemed that way to me, until I ran Turbo
ProfilerTM on it and discovered that it was horrendously
inefficient. The profiler indicated that 73% of the CPU time of my test
program was accounted for by calls to the time
routine!
To add insult to injury, the timing resolution of that routine is quite
coarse (approximately 18 msec), with the result that most of the blocks
would often get the same timestamp when running the program normally.35 Upon consideration, I realized that the
best possible "timestamp" would be a simple count of the number of
times that MakeBlockResident
was called. This would
entail almost no overhead and would be of exactly the correct
resolution to decide which block was least recently used; this is the
mechanism used in the current version.
One interesting consideration in the original design of this
mechanism was what size of counter to use for the timestamp. At first,
it seemed necessary (and sufficient) to use a 32-bit type such as an unsigned
long
, which would allow 4 billion accesses before the counter
would "turn over" to 0. However, because the original implementation
was compiled with a 16-bit compiler, the natural size to use would be
an unsigned
(meaning unsigned short
)
variable. Therefore, I had to decide whether a longer type was
necessary.
After some thought, I decided that it wasn't. The question is: what happens when the counter turns over? To figure this out, let's do a thought experiment, using two-byte counters. Suppose that the priority list looks like Figure timestamp1, with the latest timestamp being 65535.
Block number Timestamp
Let's suppose that the next reference is to block 9, with the counter turning over to 0. The list will now look like Figure timestamp2.
Block number Timestamp
Although the preceding analysis was good enough to convince me not to worry about the counter turning over, unfortunately it wasn't good enough to convince the machine. What actually happened was that after a large number of block references, the program started to run very slowly. I was right that the data wasn't in danger, but performance suffered greatly. Why? Let's go back to our example. Which buffer would be replaced when the next block needed to be read in? The one currently holding block 9, since it has the "lowest" priority. If that block number happened to be 32, for example, that would leave us with the arrangement in Figure timestamp3.
Block number Timestamp
The acid test of this virtual memory mechanism
is to run the program with only one block buffer; unsurprisingly, this
test revealed a nasty bug. It seems that I had neglected to initialize
the value of the EarliestStamp
variable, used to keep
track of the buffer with the earliest timestamp. When running with only
one buffer, it was possible under some circumstances for a block to be
replaced before it was ever used; when this happened, the timestamp on
the buffer it had occupied was left set to its initial value of ULONG_MAX
.
This initial value was significant because the search for the earliest
timestamp also starts out by setting the TimeStamp
variable to ULONG_MAX
, which should be greater than any
timestamp found in the search. If there were no blocks in the list with
a "real" timestamp, the conditional statement that set the EarliestStamp
value in the search loop was never executed. As a result, the EarliestStamp
variable was left in an uninitialized state, which caused a wild access
to a nonexistent block buffer. The fix was to initialize EarliestStamp
to 0, so the first buffer will be selected under these circumstances;
you can see this implemented in the current version of MakeBlockResident
(Figure block.05).
QuantumFile::MakeBlockResident
function
MakeBlockResident
function (from
quantum\block.cpp) (Figure block.05)codelist/block.05
We've already covered most of the tricky parts of this function, but
let's go over it one more time just to be on the safe side. We begin by
incrementing the counter that serves as a timestamp; if it turns over
to 0, we set all of the timestamps to 0 to prevent the newest block
from being replaced continually.36 Then we
look up the buffer number for the quantum in question, via the FindBuffer
global function. If the quantum is found in a buffer, then we merely
reset the timestamp on that buffer to indicate that it is the most
recently used, and return a pointer to that buffer for use by the operator->
function that called MakeBlockResident
.
However, in the event that the quantum we want isn't yet in a buffer, then we have to figure out which buffer it should be read into. We do this by examining the timestamp for each buffer, and selecting the buffer with the earliest timestamp.
Once we have found the buffer that we are going to reuse, we check
whether it has been modified; if so, we write it back to the disk. Then
we set its block number to the block number of the new quantum that is
being read in, set its modified flag to FALSE, set the timestamp on the
buffer to the current timestamp, and read the quantum into the buffer.
Finally, we return a pointer to the buffer for use by the calling operator->
function.
Now that we've seen how MakeBlockResident
works, let's
see how it is used in one of the "BlockPtr" class
es.
Because all of these class
es work in much the same way,
we will examine only one of them in any detail. I've chosen LittlePointerBlockPtr
because it made for a better section title, but any of them would have
done just as well. First, Figure blocki.00
shows the interface for this class
.
LittlePointerBlockPtr
class
(from quantum\blocki.h) (Figure blocki.00)codelist/blocki.00
As you can see, this interface isn't very large. In fact, with only one member variable and three functions, it's the smallest one we'll see.37 However, that doesn't mean that it is completely without interest, as we'll discover. Let's start with the normal constructor, shown in Figure block.10.
LittlePointerBlockPtr
class
(from quantum\block.cpp) (Figure block.10)codelist/block.10
This constructor is simple but unrevealing: to see the effects of
the variables it is initializing, we will have to wait until we examine
the LittlePointerBlock
class
. For now, let
me just say that the block number is the number of the block where the
little pointer block is stored, and the block manager is a pointer to
the QuantumFile
object that controls the quantum file
where the little pointer block is stored. The block variable is a
pointer to the buffer where the block is stored in memory; it is
assigned a value in the next function we will examine, operator->
(Figure block.11).
LittlePointerBlockPtr::operator->
Function
LittlePointerBlockPtr::operator->
function
(from quantum\block.cpp) (Figure block.11)codelist/block.11
As we've seen in the prior discussion of operator->
,
the return value of this function will be used to point to the object
for which the final function will be executed. In this case, that
object will be the little pointer block object embedded in this LittlePointerBlockPtr
object. The main purpose of this operator->
function
is to make sure that the block that the little pointer block object
refers to is in memory before we try to do anything with it. That's why
QuantumFile::MakeBlockResident
exists, and that's why we
call it here. Once that function returns, we know that the little
pointer block is resident in memory. At that point, we set the item
index member variable to the address of the item index in the block,
call LittlePointerBlock::SetLittlePointerArray
to allow
access to the little pointer array in the block, and return the address
of our little pointer block object.
When the function that called our operator->
resumes execution, it will reapply the same function call to the
pointer that we return. This time, because the return value is in fact
a pointer rather than an object, the function call will go through to
its final destination.38 This would seem
to be a logical time to start our examination of how the body class
,
LittlePointerBlock
, works. However, before we get to that class
,
we have quite a bit of ground to cover. You see, the block class
es
are arranged in an inheritance tree that begins with a very simple
structure called Block
, and builds in complexity from
there. The best place to start is at the beginning, so that's what
we'll do.
Figure blocki.01 shows the
definition of the Block union
.
Block union
(from
quantum\blocki.h) (Figure blocki.01)codelist/blocki.01
I don't use the union
type very much. In fact, this Block
type is the only example of a union
in this book. That's
because they can be the source of subtle errors if you don't keep track
of the actual type of the data in a union
very carefully.
However, in this case it came in very handy because the different types
of blocks have to be identified at run-time anyway; putting the logic
that discriminates among the various types of blocks into the various
block class
es should make it reasonably safe, and in fact
I haven't seen any problems from this implementation decision.
So what are all the types of data that can be stored in a Block
?
The simplest is an array of char
taking up the entire
block; this low-level type is used when we are copying a block or
otherwise treating it as simply a bunch of undifferentiated data. The
next possibility, a 64-element array of char
, is used in
the header block of a quantum file, and consists of a 64 character
"version information string". This is used to determine whether the
version of the quantum file implementation that wrote the file is the
same as the current implementation, as described earlier. The next type
is an array of QFreeSpaceEntry
elements, which is used in
the free space list class
es. The next possible type is an
array of MainObjectEntry
elements, which is used in the
main object class
es. Finally, a Block
can
contain a QuantumBlockStruct
, which is the data type used
to access blocks that contain normal quanta.
QuantumBlockStruct struct
The last of these types is actually a bit more complicated than it
looks, as we'll see when we examine the definition of QuantumBlockStruct
,
shown in Figure blocki.02.
QuantumBlockStruct struct
(from quantum\blocki.h) (Figure blocki.02)codelist/blocki.02
What makes this apparently innocent QuantumBlockStruct
data type more mysterious than it looks? The clue is in the comment on
the m_ItemIndex
variable: this is actually a place holder
for a variable-length array of item index elements. Although you cannot
directly declare a variable-length array in C or C++, it is possible to
use such a place holder to simplify the calculation of the beginning
address of a variable-length array that is "mapped" onto a buffer, as
we do when accessing data in a quantum block. We'll see how this works
when we get to the first of the block class
es that is
used for accessing data in the normal quantum format rather than as one
of the other types (e.g., free space entry or main object entries).
QuantumBlockHeader struct
We should also take a look at the definition of QuantumBlockHeader
,
shown in Figure blocki.03.
QuantumBlockHeader struct
(from quantum\blocki.h) (Figure blocki.03)codelist/blocki.03
This struct
represents the data that is present at
the beginning of every block that holds a normal quantum. It consists
of three fields: the quantum type field, the object number field, and
the item count field.
The quantum type field contains a value indicating the type of the quantum, such as "little pointer array", "big pointer array", or "leaf node". This information could be used to allow a deeper hierarchy of pointer types, but currently it is used only for consistency checking. That is, the top-level quantum pointed to by a main object entry must be a big pointer array quantum, the quantum pointed to by a big pointer array entry must be a little pointer array quantum, and the quantum pointed to by a little pointer entry must be a leaf node (i.e., one that contains user data).
The object number field contains the number of the main object to which this quantum belongs. This is used when we update the free space list, because we do not share a quantum among more than one main object.
The item count field represents the number of data items in this
quantum, which is also the number of entries in the variable-length m_ItemIndex
array.
BaseBlockPtr class
Now that we've discussed those data types, we're ready to look at
the base class
of the block types, BaseBlockPtr
,
whose interface is shown in Figure blocki.04.
We've already discussed the purpose of the Block
data
type, so I don't have to explain why we have such a variable in this class
.
The block number variable keeps track of which block in the quantum
file this object represents. The block manager variable represents the
quantum file object itself, which is used when we are reading or
writing the block. As for the implementation of the only member
function (Figure block.11a), it
merely sets the block number and block manager variable to 0, so I
won't bother to discuss it further.
BaseBlockPtr class
(from
quantum\blocki.h) (Figure blocki.04)codelist/blocki.04
BaseBlockPtr class
(from quantum\block.cpp) (Figure block.11a)codelist/block.11a
Now we're ready to start looking at the derived block class
es.
I won't bother going over the Ptr
handle classes, because
they are all essentially identical to LittlePointerBlockPtr
,
which we've already covered. However, each of the body class
es
has its own set of member functions appropriate to its specific task,
so we'll look at each one in turn, starting with the FreeSpaceBlock
class
.39
FreeSpaceBlockPtr::FreeSpaceBlock class
The interface for FreeSpaceBlock
is shown in Figure blocki.05.
FreeSpaceBlock class
(from
quantum\blocki.h) (Figure blocki.05)codelist/blocki.05
The default constructor (Figure block.12)
and normal constructor (Figure block.13)
for this class
merely initialize the block pointer, block
number, and block manager variables for the free space block, exactly
as the analogous functions do in the LittlePointerBlockPtr
class
that we've already examined; we'll see how those
variables are used when we get to the other member functions.
FreeSpaceBlock class
(from quantum\block.cpp) (Figure block.12)codelist/block.12
FreeSpaceBlock class
(from quantum\block.cpp) (Figure block.13)codelist/block.13
The Get
function (Figure block.14) returns an element of the
free space data array in the block, using the m_FreeSpaceData
member of the Block
union
.
FreeSpaceBlock::Get
function (from
quantum\block.cpp) (Figure block.14)codelist/block.14
The Set
function (Figure block.15) sets the value of an element
of the free space data array in the block, also using the m_FreeSpaceData
union
member.
FreeSpaceBlock::Set
function (from
quantum\block.cpp) (Figure block.15)codelist/block.15
The MakeLeafBlockPtr
function (Figure block.16) is used in the FreeSpaceArrayPtr::FreeSpaceArray::FindSpaceForItem
function to allow that function to access a block as a leaf block so
that it can be used to store data. As you can see, this function calls
the corresponding function in the QuantumFile
class
to do the actual work.
FreeSpaceBlock::MakeLeafBlockPtr
function (from
quantum\block.cpp) (Figure block.16)codelist/block.16
MainObjectBlockPtr::MainObjectBlock class
The interface for MainObjectBlock
is shown in Figure blocki.06.
MainObjectBlock class
(from
quantum\blocki.h) (Figure blocki.06)codelist/blocki.06
As you can see, this interface is exactly the same as the one for FreeSpaceArrayPtr::FreeSpaceArray
,
except that the MakeLeafBlockPtr
function is not included
and the types used by the Get
and Set
functions are different, as is necessary for their proper functioning.
Because the default and normal constructors are exactly the same as
those in the previous class
, I'm not going to waste space
reproducing them here, so let's move right along to the Get
and Set
functions.
The Get
function (Figure block.18) returns an element of the
main object array in the block, using the m_MainObjectData
union
member.
MainObjectBlock::Get
function (from
quantum\block.cpp) (Figure block.18)codelist/block.18
The Set
function (Figure block.19) sets the value of an element
of the main object array in the block, also using the m_MainObjectData
union
member.
MainObjectBlock::Set
function (from
quantum\block.cpp) (Figure block.19)codelist/block.19
QuantumBlock class
The interface for QuantumBlock
is shown in Figure blocki.07. This is a more interesting
class
than the previous few we've discussed; it has a
number of fairly complicated functions that we will need to go over in
detail. Probably the best place to start is with the AddItem
function, because that is the function that adds each item to a
quantum.
QuantumBlock class
(from
quantum\blocki.h) (Figure blocki.07)codelist/blocki.07
QuantumBlock::AddItem
FunctionSince there are actually two versions of this function, let's start
with the simpler one, which takes a ModifiableElement
(i.e., String
) first argument. The code for this function
is shown in Figure block.20.
QuantumBlock::AddItem
function (from quantum\block.cpp) (Figure block.20)codelist/block.20
As you can see, this function retrieves the size and data address
from its ModifiableElement
argument and then calls the
other overloaded AddItem
function, returning the result
of that function to its caller. So let's continue by looking at the
other AddItem
function, whose code is shown in Figure block.21.
QuantumBlock::AddItem
function (from quantum\block.cpp) (Figure block.21)codelist/block.21
This starts by setting the modified flag for its block; because we
are going to modify the block, we want to make sure that the block is
written back to disk before the buffer is reused. Then we call a global
function called FindUnusedItem
to locate an item index
entry that we can use for our new item.40
Then we call CalculateFreeSpace
to figure out how much
space is available in this block. This is actually a safety measure, as
the calling function is responsible for making sure that this block has
enough free space before attempting to add an item to the block. But
I'd rather be safe than sorry, so I check whether the free space is
insufficient; if not, this is an error, which is trapped by the assert
that immediately follows.
However, let's assume that we do have sufficient room to store the
new item. In that case, we continue by checking whether the item number
that was returned by FindUnusedItem
is greater than or
equal to 0. If so, we are going to reuse an item that has been
previously used, so we set its type and index value to the new values
supplied as arguments to the AddItem
function.
We've referred to the ItemIndex
data type a few times
in the past, but haven't really looked at what it consists of. I think
this would be a good time to do so, so let's take a look at its
definition, which is shown in Figure blocki.08.
ItemIndex struct
(from
quantum\blocki.h) (Figure blocki.08)codelist/blocki.08
This is a fairly simple data type, consisting of an index entry (which represents the element number of this item in the array of which it is an element), the offset of the beginning of the data for the element from the end of the block, and a type indicator that specifies the type of the element. While the purpose of the offset member variable should be fairly obvious, the purpose of the other two member variables may not be. The purpose of the type indicator is consistency checking; if we're trying to access a little pointer array, we can use this type indicator to make sure that we are getting what we expect. In a similar vein, the purpose of the index entry is to provide information needed by a program that will recover data from a quantum file that has become corrupted.
Now let's continue with our analysis of the AddItem
function (Figure block.21), where
we have just updated the type and index of the item index entry that we
are reusing to the new values supplied as arguments to this function.
Next, we extract the offset field from that item index entry and call GetItemCount
to find out how many items are in the index. Now we have to calculate
the parameters needed to shift the data already in the quantum so that
we have room for our new item. The offset of the last item is equal to
the distance in bytes from the beginning of that item to the first byte
after the end of the quantum. Therefore, the address of the beginning
of the last item in the quantum, DataFrom
, can be
calculated as the address of the beginning of the buffer in which the
quantum resides, plus the size of a quantum, minus the offset of the
last item in the quantum. The byte at that address, and all those after
it up to the starting address for the new item's data, must be moved
down in memory by enough to leave room for the new data.41
Therefore, the new address of that first byte of the last item's data, DataTo
,
will be its old address minus p_ElementSize
. We want to
move all of the items that precede our new item in memory, which means
those that have higher item numbers; therefore, the number of bytes we
want to move, DataLength
, is equal to the distance from
the beginning of the last item to the beginning of the area where we
will store our new item. Now that we have calculated these parameters,
we can call the global BlockMove
function to perform the
transfer.
Now we have moved the existing data, but the index still refers to
the old locations of the data, so we have to call the global AdjustOffset
function to adjust the elements of the item index that refer to data
that has been moved. Once we have adjusted the item index, we calculate
the address where the new data will be stored and call BlockMove
again, this time to copy the new data to the correct place in the
block. Finally, we calculate the return value from this function, which
is the item number of the item we have entered, as one more than the
index of the item entry we reused; this adjustment is needed because
item numbers start at 1 rather than 0.
I think an example might be very helpful here. Suppose that we want to insert a new field with index number 6 to store the company name ("Chrysalis") in the following quantum, reusing the currently unused fourth item (see Figure insert1).
+-
| Item # Offset Type Index
| +---------------------------------+
| 1 | 5 -+ VARSTRING 4 |
Item | 2 | 17 +| VARSTRING 0 |
Index | 3 | +-19 || VARSTRING 3 |
| 4 | | 19 || UNUSED 0 |
| 5 | ++-31 || VARSTRING 1 |
| 6 |+++-38 || VARSTRING 2 |
| ++++----++------------------------+
+- ||| |+------------------------+
||| +-------------+ |
||+----------------+ | |
+- |+-----+ | | |
| +--+------+-----------+-+-----------+----+
Quantum | | BaldwinP.O.Box 0335NYSteve Heller11510|
Data | +----------------------------------------+
+-
+-
Quantum | 333333333222222222211111111110000000000
Offset | 876543210987654321098765432109876543210
+-
+-
| 666666666666666666666666666666666666666
| 111111111111111111111111111111111111111
Address | 000011111111112222222222333333333344444
| 678901234567890123456789012345678901234
+-
After the move, the quantum looks like Figure insert2.
+-
| Item # Offset Type Index
| +--------------------------------------+
| 1 | 5 -+ VARSTRING 4 |
Item | 2 | 17 +| VARSTRING 0 |
Index | 3 | +-19 || VARSTRING 3 |
| 4 | | 19 || UNUSED 0 |
| 5 | ++-31 || VARSTRING 1 |
| 6 |+++-38 || VARSTRING 2 |
| ++++----++-----------------------------+
+- ||| |+---------------------------------+
||| +----------------------+ |
||+-------------------------+ | |
+- |+--------------+ | | |
| +--------+ | | | |
| +-----------+------+-----------+-+-----------+----+
Quantum | | BaldwinP.O.Box 0335?????????NYSteve Heller11510|
Data | +-------------------------------------------------+
+-
+-
Quantum | 444444443333333333222222222211111111110000000000
Offset | 765432109876543210987654321098765432109876543210
+-
+-
| 666666666666666666666666666666666666666666666666
| 999111111111111111111111111111111111111111111111
Address | 999000000000011111111112222222222333333333344444
| 789012345678901234567890123456789012345678901234
+-
We're almost done with this insertion. All we
have left is to calculate the actual position of the new item in the
quantum as the quantum buffer's address + the size of a quantum - the
offset of this item, (4096+2048-28, or 6116) and store the result in DataTo
.
Then we can call BlockMove
to copy the new item's data to
that position, and finish up by setting the relative item number (which
is the return value from this function) to ItemFound
+ 1
and returning it. The final quantum contents look like Figure insert3.
+-
| Item # Offset Type Index
| +---------------------------------------+
| 1 | 5 --+ VARSTRING 4 |
Item | 2 | 17 -+| VARSTRING 0 |
Index | 3 | +-19 || VARSTRING 3 |
| 4 | | 28-+|| VARSTRING 6 |
| 5 | ++-40 ||| VARSTRING 1 |
| 6 |+++-47 ||| VARSTRING 2 |
| ++++----+++-----------------------------+
+- ||| ||+--------------------------------+
||| |+---------------------+ |
||+----+--------------------+ | |
+- || +-----------+ | | |
| || | | | |
| |+-----+ | | | |
| +--+------+-----------+--------+-+-----------+----+
Quantum | | BaldwinP.O.Box 0335ChrysalisNYSteve Heller11510|
Data | +-------------------------------------------------+
+-
+-
Quantum | 444444443333333333222222222211111111110000000000
Offset | 765432109876543210987654321098765432109876543210
+-
+-
| 666666666666666666666666666666666666666666666666
| 999111111111111111111111111111111111111111111111
Address | 999000000000011111111112222222222333333333344444
| 789012345678901234567890123456789012345678901234
+-
QuantumBlock::GetModifiableItem
FunctionNow that we have seen how to add an item to a
quantum, perhaps we should take a look at the function that allows us
to get it back: GetModifiableItem
(shown in Figure block.22).
QuantumBlock::GetModifiableItem
Function (from
quantum\block.cpp) (Figure block.22)codelist/block.22
As you might imagine, retrieving an item is simpler than storing
the item in the first place, because we don't have to move anything
around. We start with an assert that checks two conditions: that the
item number is greater than 0 and that it is less than or equal to the
item count. Once we have passed that sanity check, we decrement the
item number to convert it to an index into the item index array, then
retrieve the offset of the item from its item index entry. Then we
calculate its actual starting address and length, which gives us the
information necessary to create a ModifiableElement
from
the data. We do so, and return the result to the caller.
QuantumBlock::DeleteItem
functionThe third fundamental operation that we can perform on an item is to
delete it. This operation is used both when simply deleting an item
from the main object and when changing the value of an item; to change
an item, we actually delete the previous item and add a new one. So
let's take a look at DeleteItem
(shown in Figure block.23).
QuantumBlock::DeleteItem
function (from
quantum\block.cpp) (Figure block.23)codelist/block.23
After checking that the item number is legal and converting it to
an index into the item index array, we determine whether the item to be
deleted is the last one in the quantum. If so, we set its type to
UNUSED_ITEM. While this would be sufficient to allow the reuse of this
item index entry, if we stopped there, we would never be able to reduce
the size of the item index; eventually, the quantum would fill up with
unused item index entries, leaving no room for actual data. To prevent
this, we clean up any trailing unused entries in the item index by
searching backwards from the end of the item index, clearing any
trailing unused entries to zeroes.42 When
we get to an item index entry that is in use, or to the beginning of
the item index, we stop searching. At that point, we reset the item
count to 1 more than the last value of the index in the loop. If that
last loop index value was -1, then we have emptied the quantum, so we
call Clear
to reset it to an unused state. Finally, we
call UpdateFreeSpace
to recalculate the amount of free
space in the quantum and return the result to the caller.
That takes care of deleting the last item in a quantum, but what if
we need to delete some other item? In that case, we extract the
starting offset of the item to be deleted and calculate its length. For
the first item, the item length is equal to the m_Offset
field in the item, because the data for the first item in the quantum
is stored at the end of the quantum, as illustrated in Figure itemref1. For other items, the item
length is equal to the offset of the previous item subtracted from the
offset of the current item. This may seem backward, as the figure shows
that the second item is stored before the first, the third before the
second, and so on. However, the offset of an item is defined as the
distance in bytes from the start of the item to the first byte after
the end of the quantum, which is always nonnegative.
We also have to set the offset of this item to the offset of the previous item (if there is a previous item), which will indicate that this item's length is zero. For the first item, we can just set the offset to zero, since there is no previous item to worry about.
Now we're ready to compute the offset of the last item, the source
and destination addresses for the data to be moved into the deleted
item's space, and the length of that data. We calculate the source
address (DataFrom
) as the address of the last item (m_Block->m_Data
+BlockSize
-LastItemOffset
),
the destination address (DataTo
) as the source address
added to the length of the item we are deleting (DataFrom
+item
length), and the number of bytes to move (DataLength
) as
the total length of all the items from the one we are deleting to the
last item (LastItemOffset
- StartingOffset
)
Then we call BlockMove
to transfer the data and AdjustOffset
to update the offsets of the other items. Next, we set the item's type
to UNUSED_ITEM and its offset to the same value as the previous entry
in the item index (or 0, if it is the first item) so that the length of
the next item can be properly calculated. Finally, we call UpdateFreeSpace
and return to the calling function.
To make this clearer, let's look at what happens if we delete the
item holding the state name from the example in Figure itemref4. Let's suppose the address of
the quantum is 4096. The offset of the last item, marked by "^", is 41;
substituting values in the expression for DataFrom
, m_Block->m_Data+
BlockSize
-
LastItemOffset
equals 4096+2048-41, or 6103. The offset of the item we're deleting,
marked by the "*", is 19, and the offset of the previous item (marked
by the "&") is 17, which makes the item length 2; therefore, the DataTo
address is 6103+2, or 6105. This is where the last item will start
after the move, since we are removing the 2 bytes of the item being
deleted. Finally, DataLength
is the amount of data to be
moved, which is the length of all the items with higher index numbers
than the one being deleted, in this case 41-19, or 22.
+-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 2 +----------------------+
Reference | 1 | 3 4 +----------------------+-+
Array | 2 | 3 5 +----------------------+-+-+
(IRA) | 3 | 3 3 +-------------------+ | | |
| 4 | 3 1 +-----------------+ | | | |
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- | | | | |
| Item # Offset Type Index | | | | |
| +----------------------------+ | | | | |
Item | 1 | 5 -+ VARSTRING 4 |±--+ | | | |
Index | 2 | 17 +| VARSTRING 0 |±----+--+ | |
for | 3 | +-19 || VARSTRING 3 |±----+ | |
Quantum | 4 | ++-34 || VARSTRING 1 |±---------+ |
3 | 5 |+++-41 || VARSTRING 2 |±-----------+
| 6 |||| 41 || UNUSED 0 |
| ++++----++-------------------+
+- ||| |+---------------------------+
||| +----------------+ |
||+-------------------+ | |
+- |+-----+ | | |
| +--+------+--------------+-+-----------+----+
Quantum | | Baldwin123 Main StreetNYSteve Heller11510|
Data | +-------------------------------------------+
+- ^ * &
+-
Quantum | 443333333333222222222211111111110000000000
Offset | 109876543210987654321098765432109876543210
+-
After we do the move, the result looks like
Figure itemref5. The data is in
the right place now, but the pointers are wrong. Specifically, the m_Offset
fields of the items that have higher item numbers than the one we have
deleted are off by the length of the deleted field. That's why we have
to call AdjustOffset
to reduce the m_Offset
field of every item after the deleted one by ItemLength
,
which is the length of the deleted item. Then we set the data type of
the deleted item in the item index to UNUSED_ITEM
.
+-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 2 +----------------------+
Reference | 1 | 3 4 +----------------------+-+
Array | 2 | 3 5 +----------------------+-+-+
(IRA) | 3 | 3 3 +-------------------+ | | |
| 4 | 3 1 +-----------------+ | | | |
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- | | | | |
| Item # Offset Type Index | | | | |
| +----------------------------+ | | | | |
Item | 1 | 5 -+ VARSTRING 4 |±--+ | | | |
Index | 2 | 17 +| VARSTRING 0 |±----+--+ | |
for | 3 | +-19 || VARSTRING 3 |±----+ | |
Quantum | 4 | ++-34 || VARSTRING 1 |±---------+ |
3 | 5 |+++-41 || VARSTRING 2 |±-----------+
| 6 |||| 41 || UNUSED 0 |
| ++++----++-------------------+
+- ||| |+---------------------------+
||| +----------------+ |
||+-------------------+ | |
+- |+-----+ | | |
| +--+------+--------------+-+-----------+----+
Quantum | | Baldwin123 Main StreetSteve Heller11510|
Data | +-------------------------------------------+
+-
+-
Quantum | 443333333333222222222211111111110000000000
Offset | 109876543210987654321098765432109876543210
+-
After correcting the index, our quantum looks like Figure itemref6. We can now access the data in the quantum correctly, as long as we use the "official" route through the IRA and the item index.43
+-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 2 +--------------------+
Reference | 1 | 3 4 +--------------------+-+
Array | 2 | 3 5 +--------------------+-+-+
(IRA) | 3 | 3 3 +-----------------+ | | |
| 4 | 3 1 +---------------+ | | | |
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- | | | | |
| Item # Offset Type Index | | | | |
| +--------------------------+ | | | | |
Item | 1 | 5 -+ VARSTRING 4 |±--+ | | | |
Index | 2 | 17 +| VARSTRING 0 |±----+--+ | |
for | 3 | 17 || UNUSED 0 |±----+ | |
Quantum | 4 | +--32 || VARSTRING 1 |±---------+ |
3 | 5 |++--39 || VARSTRING 2 |±-----------+
| 6 ||| 39 || UNUSED 0 |
| +++-----++-----------------+
+- || |+---------------------------+
|| +----------------+ |
|+-------+ | |
+- +-+ | | |
| +----+------+--------------+-----------+----+
Quantum | | Baldwin123 Main StreetSteve Heller11510|
Data | +-------------------------------------------+
+-
+-
Quantum | 443333333333222222222211111111110000000000
Offset | 109876543210987654321098765432109876543210
+-
QuantumBlock::CalculateFreeSpace
functionAnother interesting function in this class
is CalculateFreeSpace
. You might think that this would be
a very simple function, but it has a few twists, as you will see by
looking at its code, shown in Figure block.24.
QuantumBlock::CalculateFreeSpace
function (from
quantum\block.cpp) (Figure block.24)codelist/block.24
As we frequently do, we begin with an assert that tests the
validity of the key data used by the function; in this case, that takes
the form of checking that the number of items in this quantum is within
the legal limit, i.e., that it is less than the NoItem
constant used to indicate an invalid item number. Assuming that it
passes that test, we then check whether we have already stored the
maximum possible number of items in this quantum. If so, we "lie" to
the caller by returning the value 0, indicating that there is no space
available in this quantum. Of course, this is a white lie, because
without an available entry in the item index, we cannot in fact add any
new items to this quantum, so the caller needs to look elsewhere.
Assuming that we pass that test, we can calculate the free space in the quantum by a series of steps. First, we subtract the size of the quantum block header (plus a margin of safety) from the size of a block. If the number of items in the block is equal to 0, we have the answer: the quantum should be marked as having the maximum possible free space, so we compute the free space code that indicates this situation and return it to the caller.44 On the other hand, if there are some items in the quantum then we calculate the offset of the last item in the quantum and subtract it from the free space. Then we calculate the size of the item index as the number of entries in the item index multiplied by the size of one entry, and subtract it from the free space. If the result comes out to be less than 0, we return 0; otherwise, we return the calculated free space.
QuantumBlock::UpdateFreeSpace
FunctionAs long as we're discussing the intriguing topic of free space, we
might as well take a look at UpdateFreeSpace
, whose code
is shown in Figure block.25.
QuantumBlock::UpdateFreeSpace
function (from
quantum\block.cpp) (Figure block.25)codelist/block.25
Why do we need this function, when we already have a function to
calculate the amount of free space in a quantum? This is not just
another version of the CalculateFreeSpace
function,
because it actually updates the information in the free space list. As
you may recall, the free space list stores more than just the amount of
free space in a quantum; it also stores the object number of the main
object that quantum belongs to. This is so that we can avoid scanning a
lot of quanta to find one that has enough free space in it and also
belongs to the main object that we need the additional space for. Now
that we know what this function does, it shouldn't be too hard to
determine how it does it. First, it sets the object number of a newly
created free space object to the object number of the block for which
the free space is being updated. Then it converts the free space
calculated by the CalculateFreeSpace
function to a free
space code, which occupies only one byte. Finally, it uses the SetFreeSpace
function of the quantum file class to update the free space list with
the new information for this quantum.
The rest of the functions in the quantum block class aren't particularly interesting or instructive, so we won't bother to go over them. You shouldn't have much trouble understanding how they work by simply looking at the code, which is all on the CD ROM in the back of the book, along with the DJGPP compiler. The next class we will examine is the big pointer block class, which is derived from the quantum block class.
BigPointerBlock class
The interface for BigPointerBlock
is shown in Figure blocki.09.
BigPointerBlock class
(from
quantum\blocki.h) (Figure blocki.09)codelist/blocki.09
Most of the functions in this class aren't very interesting, so we
won't bother to go over them. However, there are a couple that we
should take a quick look at, because they contain a few tricks that you
should understand if you're going to have to modify the behavior of any
of these classes. We'll start with SetBigArrayHeader
,
whose code is shown in Figure block.26.
BigPointerBlock::SetBigArrayHeader
Function
BigPointerBlock::SetBigArrayHeader
function (from
quantum\block.cpp) (Figure block.26)codelist/block.26
The purpose of this function is to allow us to access data that is
stored in the big pointer block quantum. To be precise, the data that
we are going to access is the "big array header", which includes
information about the number of elements in the array, the maximum
number of elements in the array, and the last quantum where we added
data for this array.45 As you can see by
looking at the code, the member variable m_BigArrayHeader
is a pointer to the data in the quantum. This is not hazardous, because
the header consists of several fields, each of which is fixed length;
therefore, we don't have to worry about running off the end of the data
in the header.
Once we have executed the SetBigArrayHeader
function,
we can call the various access functions that get and set these values
in the array header, where it resides in the big pointer array quantum
buffer in memory.46
BigPointerBlock::SetBigPointerArray
FunctionThe next function we'll look at is SetBigPointerArray
,
whose code is shown in Figure block.27.
BigPointerBlock::SetBigPointerArray
function
(from quantum\block.cpp) (Figure block.27)codelist/block.27
This function is superficially similar to the one that we just
discussed; its purpose is to allow us to access data that is in the big
pointer array quantum. However, there is a major difference between
these two functions: while the previous function could safely provide
us with a simple pointer to the big array header data structure in the
big pointer array quantum, that would be extremely unsafe in the case
of the big pointer array itself. The problem is that a big pointer
array can have a variable number of elements in it, so it would be all
too easy for us to accidentally step off the end of the big pointer
array, with potentially disastrous results. To prevent such a
possibility, I have created the data type called AccessVector
.
The purpose of this data type is to combine the safety features of a
normal SVector
with the ability to specify the address
where the data for the SVector
should start, rather than
relying on the run-time memory allocation library to assign the
address. Because this data type is designed to refer to an existing
area of memory, the copy constructor is defaulted, as is the assignment
operator and the destructor, and there is no SetSize
function such as exists for "regular" SVectors
. This data
type allows us to "map" a predefined structure onto an existing set of
data, which is exactly what we need to access a big pointer array
safely.
As this suggests, the "big array header" is an AccessVector
variable, which we can use just as though it were a normal SVector
.47
LittlePointerBlock class
The interface for LittlePointerBlock
is shown in
Figure blocki.10.
LittlePointerBlock class
(from
quantum\blocki.h) (Figure blocki.10)codelist/blocki.10
There's nothing in this class that isn't exactly analogous to the corresponding functions in the big pointer array class. Therefore, I won't waste either your time or mine by repeating the analysis of the big pointer array class here. Instead, let's move along to another class that is somewhat more interesting, if only because it seems to have no purpose for existing.
LeafBlock class
The interface for LeafBlock
is shown in Figure blocki.11.
LeafBlock class
(from
quantum\blocki.h) (Figure blocki.11)codelist/blocki.11
This is a real oddity: a class
that defines no new
member functions or member variables. Of what value could such a class
possibly be?
The answer is that it provides a "hook" for attaching a handle class
,
namely LeafBlockPtr
. This allows us to use a leaf block
just as we would any of the other quantum class
es. If we
did not have this class
, we could always create another class
called QuantumBlockPtr
, which would have much the same
effect as creating this class
. So why did I create this class
in the first place?
The answer is that originally it did have some member functions,
but they eventually turned out to be superfluous. At this point in the
development of this project, it would probably be unwise for me to go
through the code to root out all of the references to this class
.
And after all, a class
that defines no new member
functions or member variables certainly can't take up too much extra
space; in fact, using this class
should have absolutely
no effect on the size of the program or its execution time, so I think
I'll leave it just as it is, at least for now.
FreeSpaceArray class
Finally, we're finished with the block class
es. Our
next target of opportunity will be the class
es that
maintain and provide access to the free space list. We'll start with FreeSpaceArray
,
whose interface is shown in Figure newquant.22.
FreeSpaceArray class
(from
quantum\newquant.h) (Figure newquant.22)codelist/newquant.22
FreeSpaceArray
The first function we'll look at is the normal constructor for FreeSpaceArray
,
whose code is shown in Figure newquant.23.
FreeSpaceArray
(from
quantum\newquant.cpp) (Figure newquant.23)codelist/newquant.23
As you can see, all this function does (as is common in the case of
constructors) is to initialize a number of member variables. Most of
these initializations are fairly straightforward, but we should go over
them briefly. First, we set the current lowest free block number to 0,
because we have no idea what blocks might be free in the free space
list, as we have not looked through it yet. Then we get the free space
list count, the number of blocks in the free space list, and the
quantum number adjustment from the quantum file object; this last value
is used when we need to convert between block numbers and quantum
numbers. Next, we resize the block pointer SVector
so it
can hold block pointers for all of the free space list blocks in the
quantum file. Finally, we assign free space block pointers to all the
elements of that SVector
. Now we are ready to access the
free space list.
FreeSpaceArray::Get
FunctionThe next function we'll look at is FreeSpaceArray::Get
,
whose code is shown in Figure newquant.24.
FreeSpaceArray::Get
function (from
quantum\newquant.cpp) (Figure newquant.24)codelist/newquant.24
The operation of this function is fairly straightforward. First, we
check whether we're trying to access something that is off the end of
the free space list. If so, we return a value that indicates that there
is no free space in the quantum for which information was requested.
However, if the input argument is valid, we calculate which block and
which element in that block contains the information we need. We then
call the Get
function of the block pointer to retrieve
that element. Finally, we return the result to the caller.
FreeSpaceArray::Set
FunctionThe next function we'll look at is FreeSpaceArray::Set
,
whose code is shown in Figure newquant.25.
FreeSpaceArray::Set
function (from
quantum\newquant.cpp) (Figure newquant.25)codelist/newquant.25
This function is very similar to its counterpart, the Get
function. However, there is one difference that we should look at: if
the entry that we have just found in the array indicates that its
quantum is completely empty (i.e., has the maximum available space) and
this entry has a lower index than the current value of the "lowest free
block" variable, then we reset the "lowest free block" variable to
indicate that this is the lowest free block.
This is an optimization whose purpose is to avoid searching the entire free list every time we want to find a block that isn't committed to any particular main object. In both the previous C implementation and the current C++ one, we first check the last quantum to which we added an item; if that has enough space to add the new item, we use it.48 In the old implementation, the free space list contained only a "free space code", indicating how much space was available in the quantum but not which object it belonged to. Therefore, when we wanted to find a quantum belonging to the current object that had enough space to store a new item, we couldn't use the free space list directly. As a substitute, the C code went through the current little pointer array, looking up each quantum referenced in that array in the free space list; if one of them had enough space, we used it. However, this was quite inefficient; since each quantum can hold dozens or hundreds of items, this algorithm might require us to look at the same quantum that many times!49 Although this wasn't too important in the old implementation, where the free space list was held in memory, it could cause serious delays in the current one if we used the standard virtual memory services to access the free space list. The free space list in the old program took up 16K, one byte for each quantum in the maximum quantum file size allowed. In the new implementation, using 16K blocks of virtual memory, that same free space list would occupy only one block, so searching such a list would not require any extra disk accesses. However, the current implementation can handle much larger quantum files that might contain tens or hundreds of thousands of blocks, with correspondingly larger free space lists. Using the old method, searching the free space list from beginning to end could take quite a while, because the search routine would not access the list in a linear manner and therefore might require extra disk accesses to access the same free space list entries several times. At the very least, the free space blocks would be artificially promoted to higher levels of activity and would therefore tend to crowd other quanta out of the buffers.
Even if the free space blocks were already resident, virtual memory accesses are considerably slower than "regular" accesses; it would be much faster to scan the free space list sequentially by quantum number than randomly according to the entries in the little pointer array. Of course, we could make a list of which quanta we had already examined and skip the check in those cases, but I decided to simplify matters by another method.
FreeSpaceArray::FindSpaceForItem
FunctionIn the current implementation, the free space list contains not just
the free space for each quantum but also which object it belongs to (if
any).50 This lets us write a FreeSpaceArray::FindSpaceForItem
routine that finds a place to store a new item by scanning each block
of the free list sequentially in memory, rather than using a virtual
memory access to retrieve each free space entry; we stop when we find a
quantum that belongs to the current object and has enough free space
left to store the item (Figure newquant.26).51
FreeSpaceArray::FindSpaceForItem
function (from
quantum\newquant.cpp) (Figure newquant.26)codelist/newquant.26
However, if there isn't a quantum in the free space list that belongs to our desired main object and also has enough space left to add the new item, then we have to start a new quantum; how do we decide which one to use?
One way is to keep track of the first free space block we find in our search and use it if we can't find a suitable block already belonging to our object. However, I want to bias the storage mechanism to use blocks as close as possible to the beginning of the file, which should reduce head motion, as well as making it possible to shrink the file's allocated size if the amount of data stored in it decreases. My solution is to take a free block whenever it appears; if that happens to be before a suitable block belonging to the current object, so be it. This appears to be a self-limiting problem, since the next time we want to add to the same object, the newly assigned block will be employed if it has enough free space and is the first suitable block in the list.
This approach solves another problem as well, which is how we determine when to stop scanning the free space list in the first place. Of course, we could also maintain the block number of the last occupied block in the file and stop there. However, I felt this was unnecessary, since stopping at the first free block provides a natural shortcut, without contributing any obvious problems of its own. However, as with many design decisions, my analysis could be flawed: there's a possibility that using this algorithm with many additions and deletions could reduce the space efficiency of the file, although I haven't seen such an effect in my testing.
This mechanism did not mature without some growing pains. For
example, the one-byte FreeSpaceCode
code, used to
indicate the approximate space available in a quantum, is calculated by
dividing the size by a constant (32 in the case of 16K blocks) and
discarding the remainder. As a result, the size code calculated for
items of less than 32 bytes is 0. Since the original check for
sufficient space in a quantum tested whether the space code for the
quantum was greater than or equal to the space code for the new item,
even blocks with no free space (i.e., those having a free space code of
0) were considered possible storage places for these very small items.
This resulted in a large number of unnecessary disk accesses to read
those quanta so that their actual free space could be calculated.
Changing from ">=" to ">" in the test fixed this; of course, I
could also have made the size calculation round up rather than down.
Another problem I encountered with the free space handling in the new
algorithm is that blocks and quanta are no longer synonymous, as they
were in the old program. That is, the virtual memory system deals not
only with quanta (i.e., blocks containing user data) but also with free
space blocks and the block(s) occupied by the main object list. At one
point, some routines dealing with the free space list were using
physical block numbers and others were using logical quantum numbers;
however, since only blocks containing user data are controlled by the
free space list, the correct solution was to use only logical quantum
numbers in all such routines.
FreeSpaceArray::FindEmptyBlock
FunctionThe last function that we will cover in this class
is FindEmptyBlock
,
which is shown in Figure newquant.27.
FreeSpaceArray::FindEmptyBlock
function (from
quantum\newquant.cpp) (Figure newquant.27)codelist/newquant.27
As you can see, this code uses the member variable called m_CurrentLowestFreeBlock
to reduce the amount of time needed to find a free block as the file
fills up. Each time that a free block with a block number less than the
current value is created or located, the m_CurrentLowestFreeBlock
variable is updated to point to that block, and FindEmptyBlock
starts at that location when looking for an empty block. FindEmptyBlock
has an understandable resemblance to FindSpaceForItem
,
since both of them search the free space list looking for a quantum
with a certain amount of space available; however, this resemblance
misled me into introducing a bug in FindEmptyBlock
. The
problem came from the fact that I was using m_CurrentLowestFreeBlock
to determine the starting block and element for searching the free
space list before entering the outer loop, rather than initializing the
loop indices in the for
statements. This worked properly
on the first time through the outer loop that is executed once for each
block in the free space list, but when all the elements of the first
block had been read and the outer loop index was incremented to refer
to the next block, the inner index was left unchanged rather than being
reset to 0 to address the first element of the new block. This would
have showed itself in some unpleasant way after the blocks controlled
by the first block of the free space list were allocated to objects,
but luckily I found it by examination before that occurred.
FreeSpaceArrayPtr class
The last class involved in the maintenance of the free space list is
FreeSpaceArrayPtr
, whose interface is shown in Figure newquant.28.
FreeSpaceArrayPtr class
(from
quantum\newquant.h) (Figure newquant.28)codelist/newquant.28
Because this is a perfectly typical handle class, similar in every way to the ones that we've discussed already, you should be able to figure it out without further explanation.
There are four functions that we haven't covered yet: BlockMove
,
AdjustOffset
, FindUnusedItem
, and FindBuffer
.
What they have in common is that they're all used quite a few times
during the execution of the program and therefore would be good
candidates for rewriting in assembly language. In the previous edition
of this book I did just that, so why am I not doing it here?
There are actually two reasons. First, in the several years that have elapsed since the publication of the second edition, the use of processors other than Intel's has become much more common in high-end applications. Obviously, any assembly language enhancements would be useless to readers who are using, for example, DEC's Alpha processor.
However, an equally good reason is the lack of a standard for creating assembly language functions that interface with C++ programs or using assembly language in a C++ function, which would limit the utility of any such performance enhancements to those who were using the same compiler as the one for which I developed them.
Happily, compatibility among C++ compilers themselves has improved considerably of late. After several decades of development of the C++ language, which led to a number of similar but not identical dialects, ISO and ANSI (the international and American standards bodies, respectively) have finally approved a C++ standard. The compiler manufacturers now have no excuse for not producing compilers that comply with this standard, and in fact most of the features of the standard have already been incorporated into a number of commercial compilers.
The similarity among C++ compilers is already sufficient that I was able to compile all the programs in this book using both Microsoft's Visual C++ 5.0 compiler and the DJGPP 2.8.0 compiler on the CD-ROM in the back of the book. They should also compile successfully on any compiler that complies with the new draft standard for C++.52 Unfortunately, there is no such standard for assembly language enhancements to C++ programs. Therefore, any such enhancements that I would write would be useful only with one compiler, and I elected to eliminate such enhancements rather than providing them only for one compiler.
However, this does not mean that such enhancements would be useless for you. If you know what machine and compiler you're going to be using, and if the performance of this program is unacceptable on the machine which it must run, then you may very well want to write assembly language functions to replace C++ functions that are heavily used at runtime. The four functions whose declarations appear in Figure asmfunc.00 are excellent candidates for replacement with assembly language equivalents, according to the tests that I ran for the second edition of this book.
codelist/asmfunc.00
BlockMove
FunctionLet's start with the simplest of these functions, BlockMove
,
shown in Figure asmfunc.01.
BlockMove
global function (from
quantum\asmfunc.cpp) (Figure asmfunc.01)codelist/asmfunc.01
As you can see, this function consists of nothing more than a call
to the standard C library function memmove
. It might be
hard to imagine that such a basic function could be improved upon, but
that's what I found in the previous edition of this book: by re-coding
that routine in assembly language I was able to speed it up by a factor
of approximately two to one over the version supplied with Borland C++
3.1. Of course, without doing the same thing for the compilers I'm
using in this book, I can't guarantee that such a speedup is possible;
however, it is a good place to look for performance improvement because
it is used very frequently.
AdjustOffset
FunctionThe next of these functions, AdjustOffset
, is shown in
Figure asmfunc.02.
AdjustOffset
global function (from
quantum\asmfunc.cpp) (Figure asmfunc.02)codelist/asmfunc.02
This one also isn't very complicated. It merely steps through each of the items in an item index, starting from the one whose address is passed as its first argument, for the count provided as the second argument. To each of these items, it adds the adjustment provided as the third argument, which can be either positive or negative, depending on the reason that we're calling this function. If we have deleted something, then the adjustment will be negative because the items referred to by the item index will be moving closer to the end of the quantum. On the other hand, if we've inserted new data into the quantum, then the adjustment will be positive because those items will be moving farther from the end of the quantum.
FindUnusedItem
FunctionThe next of these functions, FindUnusedItem
, is shown
in Figure asmfunc.03.
FindUnusedItem
global function (from
quantum\asmfunc.cpp) (Figure asmfunc.03)codelist/asmfunc.03
This function is slightly more complicated than the previous two, but it's still pretty simple. It steps through the item index of a quantum, checking for an entry whose type is UNUSED_ITEM. If it finds such an entry, it returns the index of that entry; otherwise, it returns the value -1 to indicate that there are no unused entries. We use this function when we want to insert a new item in a quantum and need to find an available item index entry for that item.
FindBuffer
FunctionThe last of these functions, FindBuffer
, is shown in
Figure asmfunc.04.
FindBuffer
global function (from
quantum\asmfunc.cpp) (Figure asmfunc.04)codelist/asmfunc.04
This function steps through the block number list that keeps track of which block is in each buffer, looking for a particular block number. If it finds that the desired block is in one of the buffers, it returns the index of that buffer in the block number list. If the block isn't in any of the buffers, it returns the special value -1 to indicate that the search failed.
In this chapter, we have seen how quantum files allow us to gain efficient random access to a large volume of variable-length textual data. In the next chapter, we will use this algorithm as a building block to provide random access by key to large quantities of variable-length data.
(You can find suggested approaches to problems in Chapter artopt.htm).
BlockSize
and MaxFileQuantumCount
,
which together determine the maximum size of a quantum file. The
beginning of that file also contains a number of other constants and
structures related to this issue; you should be able to modify the
capacity and space efficiency of a quantum file fairly easily after
examining that header file.
class
user from concern about
internals of a given class
, as we've discussed briefly in
the sections titled "Data Hiding" and "Function Hiding".
class
, I'm going to omit the MainObjectArrayPtr
qualifier at the beginning of those names.
MainObjectBlock
class
,
but for now it's sufficient to note that the main object array is
potentially divided into blocks which are accessed via the standard
virtual memory system.
m_StartLookingHere
, but
I doubt it will cause you too much confusion after you see how it is
used.
qfassert.h
and qfassert.cpp
.
GetFreeSpace
,
and its return type was called FreeSpaceEntry
, but I had
to change the name of the return type to avoid a conflict with a name
that Microsoft had used once upon a time in their MFC class
es
and still had some claim on; I changed the name of the function to
match. This is a good illustration of the need to avoid polluting the
global name space; using the namespace
construct in the
new C++ standard would be a good solution to such a problem.
NewBigPointerBlock
variable is BigPointerBlockPtr
,
not BigPointerBlock
. However, the functions that we call
through that variable via the operator->
are from the BigPointerBlock
class
, because that is the type of the pointer that operator->
returns, as explained in the section on overloading operator->
.
class
to contain this function as well as a few others that are global in
this implementation.
FreeSpaceBlockPtr
, MainObjectBlockPtr
,
BigPointerBlockPtr
, LittlePointerBlockPtr
,
and LeafBlockPtr
.
class
that actually does anything. As we'll see, this program contains one class
that doesn't contribute either data or functions. I'll explain why that
is when we get to it.
class
with the name of its enclosing
class
, to make the explanations shorter; I'll just
include it in the title of the main section discussing the class
.
AvailableQuantum
.
If we did not return a value that corresponded to that free space code,
a quantum that was ever used for anything would never be considered
empty again.
AccessVector
type,
it is defined in the header file vector.h
; if you are
familiar with C++ templates, I recommend that you read that header file
to understand how this type actually works.