Optimizing C++ - the WWW version

ISBN: 0-13-977430-0

Copyright 1999 by Prentice-Hall PTR

Copyright 2000 by Chrysalis Software Corporation


Return to the table of contents

Return to my main page


Free at Last: An Efficient Method of Handling Variable-Length Records

Introduction

In this chapter we will develop an algorithm that uses the quantum file access method to handle a file containing a large number of records each of which can vary dynamically in size. In order to appreciate the power of this access method, we will start by considering the much simpler problem of access to fixed-length records.

Algorithm Discussed

The Quantum File Access Method

A Harmless Fixation

Let us suppose that we want to access a number of fixed-length customer records by their record numbers.1 Given the record number, we can locate that customer's record by multiplying the record number by the length of a record, which gives us the offset into the file where that record will be found. We can then read or write that record as needed.

Of course, a real application needs to reuse records of customers who have become inactive, to prevent the customer file from growing indefinitely. To handle this problem, we could set a limit on the file size and, when it is reached, start reusing records that haven't been referenced for a long time, making sure to correct or delete any records in other files that refer to the deleted customer.

This fixed-length record system is not very difficult to implement, but it has significant drawbacks; address fields, for example, tend to vary greatly in length, with some records needing 25 or 30 characters for a city name or street address and others needing only 10. If we allocate enough storage for the extreme case, the records become much longer than if we had to handle only the average case. However, allocating enough for only the average case will leave those customers whose names or addresses won't fit into the allotted space quite unhappy, as I know from personal experience as a software developer! The obvious solution is to allow the fields (and therefore the records) to vary in length as necessary.

Checking Out of the Hotel Procrustes

Unfortunately, variable-length records are much more difficult to deal with than fixed-length ones. The most obvious reason, as discussed above, is that determining where fixed-length records are stored requires only a simple calculation; this is not true of variable-length records. However, we could remedy this fairly easily by maintaining a record index consisting of an array of structures containing the starting location and length of each record, as depicted in Figure recindex.2

A sample record index array and data for variable-length records (Figure recindex)

          +-      Starting  Record
| Index Address Length
| +---------------+
| 0 | 0 12 +----+
Record | 1 | 12 12 +----++
Index | 2 | 24 7 +----++-------+
Array | 3 | 31 2 +----++-------+------+
| 4 | 33 5 +----++-------+------+-+
| +---------------+ || | | |
+- +--------------+| | | |
| +---+ | | |
+- | | | | |
| ++-----------+-----------+------+-+----+
Record | |Steve HellerP.O.Box 0335BaldwinNY11510|
Data | +--------------------------------------+
+-
+-
Record | 0000000000111111111122222222223333333333
Offset | 0123456789012345678901234567890123456789
+-

We encounter a more serious difficulty when we want to delete a record and reuse the space it occupied.3 In some situations we can sidestep the problem by adding the new version of the record to the end of the file and changing the record pointer to refer to the new location of the record; however, in the case of an actively updated file such an approach would cause the file to grow rapidly.

But how can we reuse the space vacated by a deleted record of some arbitrary size? The chances of a new record being exactly the same size as any specific deleted one are relatively small, especially if the records average several hundred bytes each, as is not at all unusual in customer data files. A possible solution is to keep a separate free list for each record size and reuse a record of the correct size. However, there is a very serious problem with this approach: a new record may need 257 bytes, for example, and there may be no available record of exactly that size. Even though half of the records in the file might be deleted, none of them could be reused, and we would still be forced to extend the file. The attempt to solve this difficulty by using a record that is somewhat larger than necessary leads to many unusably small areas being left in the file (a situation known as fragmentation).

However, there is a relatively unknown way to make variable-length records more tractable: the quantum file access method.4 The key is to combine them into groups of fixed length, which can then be allocated and deallocated in an efficient manner.

The Quantum File Access Method

Before the following discussion will make much sense to you, I will need to explain in general terms what we're trying to accomplish: building a virtual memory system that can accommodate records of varying lengths in an efficient manner. This means that even though at any given time, we are storing most of our data on the disk rather than maintaining it all in memory, we will provide access to all the data as though it were in memory. To do this, we have to arrange that any actively used data is actually in memory when it is needed. In the present application, our data is divided into fixed-size blocks called quanta (plural of quantum),5 so the task of our virtual memory system is to ensure that the correct blocks are in memory as needed by the user.6 The quanta in the file are generally divided into a number of addressable units called items.7 When adding a record to the file, we search a free space list, looking for a quantum with enough free space for the new record. When we find one, we add the record to that quantum and store the record's location in the item reference array, or IRA, which replaces the record index in Figure recindex; this array consists of entries of the form "quantum number, item number".8 The item number refers to an entry in the item index stored at the beginning of the quantum; the items are stored in the quantum in order of their item index entries, which allows the size of an item to be calculated rather than having to be stored.

For example, if we were to create an array of variable-length strings, some of its item references might look like those illustrated in Figure itemref1.

Sample IRA, item index, and data, before deletions (Figure itemref1)

          +-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 1 +-----------------+
Reference | 1 | 3 2 +-----------------+-+
Array | 2 | 3 3 +-----------------+-+--+
(IRA) | 3 | 3 4 +-----------------+-+--+-+
| 4 | 3 5 +-----------------+-+--+-+-+
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- Item # Offset Type Index | | | | |
| +--------------------------+ | | | | |
| 1 | 12 -+ VARSTRING 0 |±--+ | | | |
| 2 | 24 +| VARSTRING 1 |±----+ | | |
Item | 3 | +-31 || VARSTRING 2 |±-------+ | |
Index | 4 | ++-33 || VARSTRING 3 |±---------+ |
for | 5 |+++-38 || VARSTRING 4 |±-----------+
Quantum | 6 |||| 38 || UNUSED 0 |
3 | 7 |||| 38 || UNUSED 0 |
| 8 |||| 38 || UNUSED 0 |
| 9 |||| 38 || UNUSED 0 |
| 10 |||| 38 || UNUSED 0 |
| ++++----++-----------------+
+- ||| |+-----------------+
||| +------+ |
||+----+ | |
+- |+---+ | | |
| +--+----+-+------+-----------+-----------+
Quantum | | 11510NYBaldwinP.O.Box 0335Steve Heller|
Data | +----------------------------------------+
+-
+-
Quantum | 333333333222222222211111111110000000000
Offset | 876543210987654321098765432109876543210
+-

When we delete an item from a quantum, we have to update the free space list entry for that quantum to reflect the amount freed, so the space can be reused the next time an item is to be added to the file. We also have to slide the remaining items in the quantum together so that the free space is in one contiguous block, rather than in slivers scattered throughout the quantum. With a record index like the one in Figure recindex, we would have to change the record index entries for all the records that were moved. Since the records that were moved might be anywhere in the record index, this could impose unacceptable overhead on deletions; to avoid this problem, we will leave the item index entry for a deleted item empty rather than sliding the other entries down in the quantum, so that the IRA is unaffected by changes in the position of records within a quantum. If we delete element 1 from the array, the resulting quantum looks like Figure itemref2.

Sample IRA, item index, and data (Figure itemref2)

          +-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 1 +-------------------+
Reference | 1 | NONE 0 | |
Array | 2 | 3 3 +-------------------+----+
(IRA) | 3 | 3 4 +-------------------+----+-+
| 4 | 3 5 +-------------------+----+-+-+
| +-------------+ | | | |
+- | | | |
| | | |
+- Item # Offset Type Index | | | |
| +----------------------------+ | | | |
| 1 | 12 -+ VARSTRING 0 |±--+ | | |
| 2 | 12 | UNUSED 0 | | | |
Item | 3 | 19-+| VARSTRING 2 |±-------+ | |
Index | 4 | +--21 || VARSTRING 3 |±---------+ |
for | 5 |++--26 || VARSTRING 4 |±-----------+
Quantum | 6 ||| 38 || UNUSED 0 |
3 | 7 ||| 38 || UNUSED 0 |
| 8 ||| 38 || UNUSED 0 |
| 9 ||| 38 || UNUSED 0 |
| 10 ||| 38 || UNUSED 0 |
| +++-----++-------------------+
+- || |+-----------------+
|| +-----------+ |
|+---------------+ | |
+- +-----------+ | | |
| +--------------+----+-+------+-----------+
Quantum | | 11510NYBaldwinSteve Heller|
Data | +----------------------------------------+
+-
+-
Quantum | 333333333222222222211111111110000000000
Offset | 876543210987654321098765432109876543210
+-

Note that we set the offset value of the deleted item equal to the offset of the item before it so that its length can be calculated as zero. Now let us add a new street address of "123 Main Street". After this change, our data structures look like those in Figure itemref3.

Sample IRA, item index, and data, after field change (Figure itemref3)

          +-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 1 +------------------+
Reference | 1 | 3 2 +------------------+-+
Array | 2 | 3 3 +------------------+-+--+
(IRA) | 3 | 3 4 +------------------+-+--+-+
| 4 | 3 5 +------------------+-+--+-+-+
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- | | | | |
|Item # Offset Type Index | | | | |
| +---------------------------+ | | | | |
| 1 | 12 -+ VARSTRING 0 |±--+ | | | |
| 2 | 27 +| VARSTRING 1 |±----+ | | |
Item | 3 | +-34 || VARSTRING 2 |±-------+ | |
Index | 4 | ++-36 || VARSTRING 3 |±---------+ |
for | 5 +-+-++-41 || VARSTRING 4 |±-----------+
Quantum | 6 | | || 38 || UNUSED 0 |
3 | 7 | | || 38 || UNUSED 0 |
| 8 | | || 38 || UNUSED 0 |
| 9 | | || 38 || UNUSED 0 |
| 10 | | || 38 || UNUSED 0 |
| | +-++----++------------------+
+- | || |+-----------------+
| || +---+ |
| |+-+ | |
+- | ++ | | |
| ++----+-+------+--------------+-----------+
Quantum | |11510NYBaldwin123 Main StreetSteve Heller|
Data | +-----------------------------------------+
+-
+-
Quantum | 443333333333222222222211111111110000000000
Offset | 109876543210987654321098765432109876543210
+-

This is the actual mechanism used to change the value of an item: we delete the old item and add the new one. Of course, an item that increases in size may no longer fit in the same quantum, in which case we have to move it to a different quantum. Keeping track of such changes is not difficult as long as we always use the standard method of finding the address of an item: look it up in the IRA to get its quantum number and item number, then use its quantum number and item number to look it up in the item index. This allows us to move the item to a new quantum and change only its entries in the IRA and affected item indexes.

For example, if an item was originally in quantum 3 but now no longer fits there, we might move it to quantum 6. In order to do this, we first delete it from quantum 3 and then add it to quantum 6. As before, all the other IRA entries for items in quantum 3 still refer to the same data items as before, after we have adjusted the item index to account for the removed data.

Of course, the deleted item still takes up a slot in the item index of the old quantum; in our implementation this requires 10 bytes of storage.9 We need a method of reclaiming that space, so that we do not gradually fill our quanta with indexes pointing to nothing. We have already seen part of the solution to this problem: whenever we are adding an item to a quantum, we search for an unused item index entry rather than simply adding a new entry at the end.

This mechanism alone would only slow the growth of the item indexes; actually reducing the size of an index requires us to check for empty entries at the end of the item index after deleting an item. If such excess entries exist, we reclaim the space they occupy by reducing the entry count in the item index header to the number actually in use. This reduces wasted space in the item index without affecting entries in the IRA. Since we remove unused item index entries only when they are at the end of the index, no IRA entries can refer to items later in the same quantum.

A Large Array

Our records are stored in the quanta pointed to by our IRA. The IRA will be stored in its own quanta, which allows us to make it as large as needed. However, this means that we can't allocate the IRA in one piece. Since a quantum has a fixed size, we will have to break up our IRA into segments that will fit in a quantum; these segments will be collectively referred to as the little pointer array. We'll also need some way to find the segment that we need for a particular element in the IRA, which will be the responsibility of the big pointer array.

Figure bigpointer shows the relationship between the big pointer array and the IRA segments.

From the big pointer array to the IRA (Figure bigpointer)

+------------------------------------------+
| +- |
| Big | element_count = 8000 |
| Pointer | last_quantum_added_to = 3 |
| Array | |
| Header | |
| +- |
| +- |
| | Quantum | Quantum
| | Index Number | 5
| | +-------------+ |
| Big | 0 | 2 +-------+---+
| Pointer | 1 | 10 +-------+---+-+
| Array | 2 | 12 +-------+---+-+--+
| | 3 | 14 +-------+---+-+--++
| | 4 | 11 +-------+---+-+--+++
| | +-------------+ | | | |||
| +- | | | |||
| | | | |||
+------------------------------------------+ | | |||
+--------------------------------------+ | | |||
| |±------+ | |||
| +- | Quantum | |||
| | Quantum Item | 2 | |||
| | Index Number Number | | |||
| Item | +-------------+ | | |||
| Reference | 0 | 3 2 | | | |||
| Array | 1 | 3 4 | | | |||
| (IRA) | 2 | 3 5 | | | |||
| segment | 3 | 3 3 | | | |||
| 0 | 4 | 3 1 | | | |||
| | +-------------+ | | |||
| +- | | |||
+--------------------------------------+ | |||
+--------------------------------------+ | |||
| |±--------+ |||
| +- | Quantum |||
| | Quantum Item | 10 |||
| | Index Number Number | |||
| Item | +-------------+ | |||
| Reference | 0 | 3 12 | | |||
| Array | 1 | 4 6 | | |||
| (IRA) | 2 | 4 1 | | |||
| segment | 3 | 4 2 | | |||
| 1 | 4 | 4 3 | | ²²²
| | +-------------+ |
| +- |
+--------------------------------------+

Each IRA segment in the real program, of course, holds more than five entries: with a 16 KB quantum, about 4000 entries will fit in one quantum. However, the segmentation of the IRA works in the same way as this example. The segment number can be determined by dividing the element number by the number of elements in each IRA segment; the index into the IRA segment is the remainder of that division.

The big pointer array sets a limit on the number of elements referenced by an IRA, since that array has to fit in one quantum. However, this is not a very restrictive limit, since one big pointer array can contain about 4000 elements, using a 16 KB quantum. Each of these elements points to a little pointer array, which contains pointers to about 4000 entries, as mentioned above. Therefore, we could theoretically create an array with about 16,000,000 items in it.

Because our ArrayIndex type is defined as an unsigned long, which is usually 4 bytes, we can actually create arrays of that size.10

Many Large Arrays

There is one more generalization of the quantum file access method that will make it more generally useful: the ability to create more than one array of variable-length items.

The mechanism by which we will accomplish this goal is called the "main object index", which contains the quantum number of the big pointer array for each object; the number of an object is set when the object is created, by a directory facility which we will examine later.

Figure mainindex shows the path from the main object index through the big pointer array, the IRA segment, and the item index, to finally arrive at the data.

From the main object index to the data (Figure mainindex)

      +-
| Index Quantum Number
| +----------+
Main | 0 |NO_QUANTUM|
Object| 1 | 5 +-------------------------+
Index | 2 | 9 | |
| 3 | 1 | |
| 4 | 7 | |
+- +----------+ |
+-----------------------------------------------+ |
|Big +- |±----+
|Pointer | element_count = 8000 |
|Array | last_quantum_added_to = 3 |
|Header +- |
| +- |
| | Quantum | Quantum
| | Index Number | 5
| | +-------------+ |
|Big | 0 | 2 +--------------+---+
|Pointer | 1 | 10 | | |
|Array | 2 | 12 | | |
| | 3 | 14 | | |
| | 4 | 11 | | |
| +- +-------------+ | |
+-----------------------------------------------+ |
+-------------------------------------------+ |
| +- Quantum Item |±------+
| | Index Number Number | Quantum
|Item | +-------------+ | 2
|Reference | 0 | 3 2 +----------+----------+
|Array | 1 | 3 4 +----------+----------+-+
|(IRA) | 2 | 3 5 +----------+----------+-+-+
|segment | 3 | 3 3 +----------+-------+ | | |
|0 | 4 | 3 1 +----------+-----+ | | | |
| +- +-------------+ | | | | | |
+-------------------------------------------+ | | | | |
+-----------------------------------------------+ | | | | |
| +- | | | | | |
| | Item # Offset Type Index | | | | | |
| | +----------------------------+ | | | | | |
| | 1 | 5 -+ VARSTRING 4 |±+-+ | | | |
| | 2 | 17 +| VARSTRING 0 |±+---+--+ | |
|Item | 3 | +-19 || VARSTRING 3 |±+---+ | |
|Index | 4 | ++-31 || VARSTRING 1 |±+--------+ |
| | 5 |+++-38 || VARSTRING 2 |±+----------+
| | 6 |||| 38 || UNUSED 0 | | Quantum 3
| | ++++----++-------------------+ +---------+
| +- ||| |+------------------------+ |
| ||| +-------------+ | |
| ||+----------------+ | | |
| +- |+-----+ | | | |
|Quantum | +-+------+-----------+-+-----------+----+ |
|Data | | BaldwinP.O.Box 0335NYSteve Heller11510| |
| +- +---------------------------------------+ |
| 333333333222222222211111111110000000000 |
| 876543210987654321098765432109876543210 |
+---------------------------------------------------------+

We might use this facility to keep track of both customer records and inventory records, each of which would be logically separate although they may be stored in the same quantum file. In order to facilitate reconstruction of the file if part of the IRA is lost, we won't use the same quantum for items from two different arrays; as a result, the space freed by deleting an item will only be reused when an item is to be added to the same array.

To find an element in a main object, we index into the main object index by the object's number, retrieving the quantum number of the big pointer array for that object. Then we calculate the IRA segment and the index into that segment and retrieve the quantum number and item number for the element we want. The last step is to use the item index to find the exact location of the element.

Of course, we need a place to store the main object index. A normal application might have five or ten main objects; therefore, our allocation of 256 entries in the main object index is extremely generous in most cases.11

The Light at the End of the Tunnel

Let's start our examination of the implementation of the quantum file code by looking at a simple program that uses it to store a number of strings and then retrieve them (Figure flextest.00).

Initial test program for quantum file implementation (quantum\flextest.cpp) (Figure flextest.00)

codelist/flextest.00

Possibly the most striking feature of this program is how simple it appears, which is a direct consequence of the power of C++. Writing a similar program in C would require much greater knowledge of the implementation of the quantum file algorithm. However, in C++ all we have to know is which header file to include, how to open the quantum file, and how to create a FlexArray object, which behaves like a variable-sized array of variable-sized strings. Once we have dealt with this small set of prerequisites, we can use this extremely powerful and flexible data type in virtually the same way we would use a built-in type.

Let's take a detailed look at exactly how we set up and use a quantum file in this program. First, we have a couple of statements that declare a temporary buffer variable which we'll use to construct data to store in the quantum file and delete any previous version of the quantum file itself. Next, we create a QuantumFile object called QF via the default constructor of the QuantumFile class. Then we call an Open function on that object, supplying the name of the quantum file to be created.

The next statement creates a FlexArray object called TestArray. The arguments to this constructor are a pointer to the QuantumFile object that we'll use to store the data for TestArray, and a name by which we could reuse this same FlexArray object in another program or another execution of this program.

Now we're up to the loop where we will start to use the TestArray object. The loop steps through the first 100 elements of TestArray, assigning each element a String consisting of digits corresponding to the position of that element in the TestArray object. The strstream named Temp is used to format the numeric values of the loop variable i into String values that can be stored in a FlexArray object.

Finally, after we've put the values into TestArray, the second loop displays the values in the TestArray object.

Pay No Attention to the Man Behind the Curtain

At this point, I wouldn't be at all surprised if you had come to the conclusion that I have something up my sleeve ... and you would be right. Underneath this apparent simplicity is a wonderland of complex programming, which I hope I will be able to explain in an understandable way. Let's begin by examining some of these data types we've just seen, in enough detail to demystify them.

The first of these data types is the String. As it happens, this is actually a typedef for a more general type called SVector, so we'll start with the question of exactly what an SVector might be.

An SVector is very much like an array, with the following exceptions:

  1. If you try to store something beyond the end of an SVector, you will get an error message rather than a mysterious crash.
  2. You can find out how many elements an SVector has, by calling its size member function.
  3. You can change the size of an SVector after it has been created, by calling its resize member function.
  4. You can assign one SVector of a particular type to another SVector of the same type.
  5. You can create an SVector by copying from another SVector of the same type.

While these are very useful characteristics for any type of SVector, they are even more important in one particular type of SVector: the String, which is an SVector of chars. As a specialization of the SVector type, the String has all of the characteristics of any other SVector, but has additional abilities as well. Primary among these is its ability to be read or written using the standard C++ input/output stream functions. This String type, or one very much like it, is one of the most obvious improvements in C++ over C, whose corresponding char* type is notoriously error-prone. In this program, we will use the String data type in much the same way as we would use the char* data type in a C program. Unfortunately, we will not be able to go into detail on the implementation of either the SVector or String classes in this book; we will just use their facilities as described immediately above.

A Very Useful typedef

Another "type" used in this simple test program is ArrayIndex. This is also a typedef, but for a much simpler underlying type: unsigned long.

In this case, the reason I didn't just use the underlying type was to enhance portability. The original version of this program was implemented on a 16-bit compiler, where the definition of ArrayIndex was unsigned short, primarily because it was impossible to create an array or SVector having more than 64K elements. When I ported this program to a 32-bit compiler, I changed the definition of ArrayIndex and several other similar typedefs and recompiled.

I would like to be able to tell you that the program worked without further changes, but that would be a lie: it actually took me a couple of days to get it working again. However, considering the size of the source code for the classes (over 100KB), I was quite happy to take such a short time to port it.

If I had used the explicit type unsigned short everywhere, it would have taken me much longer to figure out which occurrences of that type should be changed to unsigned long and which should remain as they were. As it was, I found a few places where I did not use a typedef or used the wrong one and ran into trouble as a result. Overall, though, I did a pretty good job in using appropriate types originally, which saved me a lot of trouble later.

A Quantum Leap

Now let's start our journey into the interior of the quantum file implementation, starting with the QuantumFile class itself, whose header file is shown in Figure blockh.00.

Header file for QuantumFile class (quantum\block.h) (Figure blockh.00)

codelist/blockh.00

I can't say that I'm completely happy with the design of this class. Ideally, the user of any class should not have to worry about any member data or member variables that the user cannot or should not use. Unfortunately, this is not easily accomplished in C++. I believe it is worthwhile to digress slightly here to discuss the reasons why this is so.

First, let's examine the difficulty of "hiding" private and protected member variables from the class user. The technical problem here is that the compiler cannot allocate space for a variable unless its size is known. This implies that you cannot create a variable of a given class without access to its interface definition, which must in turn include the definitions of all of its member variables.

This does not allow the class user to use those member variables unless they are public, so we don't have to worry about non-member functions accidentally altering member variables. However, it does mean that the user of a class must ensure that all of the data types needed in the declaration of a class are available when the header for that class is compiled, and that the names of those data types do not conflict with any previously defined names. This can be a source of considerable difficulty when trying to use a fairly large library in a context other than the one in which it was developed, which of course reduces the payoff that such reuse would otherwise provide.

The other problem with the design of this class is that it defines a number of functions that application programmers have no need for. These functions are intended for use by other classes that contribute to the functionality of the quantum file implementation. You might think that I could simply make these functions private to prevent their being called by outside classes. Unfortunately, I wasn't able to do that with Microsoft Visual C++ 5.0, one of the compilers with which this code must work, as that compiler apparently cannot understand friend declarations that specify classes that are nested within other classes, as some of mine are. Even if this did work, I'd still prefer to spare the application programmer the details of these functions.

Is there a solution to either or both of these problems? Yes, in fact both of these can be remedied. Let's start with the data problem.

The standard solution to avoid exposing member variables to the application programmer is to create a secondary data structure that contains the actual data for the object, and include only a pointer to such a structure in the main object. Because the compiler knows the size of a pointer to a structure without having to see the definition of the structure itself, the header file that defines this secondary structure can be included only when needed in the implementation of the class. This frees the application programmer from any concern about the representation of the data.

What about those functions that the application programmer doesn't need to (and shouldn't) call? This can be solved in much the same way as the data problem. In this case, however, the pointer in the visible class points to an object of a class that supplies the additional functions needed by the other implementation classes. Because the client program would not include the header file that defines this secondary class, it could not access those additional functions. Of course, the other implementation classes would include that header file so that they could access the additional functions as needed.

Why didn't I implement either of these solutions (or better yet, both)? Because doing so would have required wholesale changes to the implementation, and I ran out of time. Of course, had I considered these issues when I was first creating these classes, they would have been fairly simple to solve, but I didn't think of them at that time.

In any event, the design is usable as it stands, even if it could be improved, so let's move on to how we can use it in its present form. We'll start with the functions that are needed by client programs such as our test program flextest.cpp. Figure block.00 shows the first of these functions, QuantumFile::QuantumFile().

The default constructor for the QuantumFile class (from quantum\block.cpp) (Figure block.00)

codelist/block.00

As you can see, this function's implementation consists entirely of initialization expressions in the member initialization list. We'll look at each of the member variables of this class in detail when we get to the functions that use them. For now, I'll just mention that they are all needed to keep track of the status of the quantum file and the buffers that maintain parts of its data in memory.

Next, let's take a look at Figure block.01, which shows the first "regular member function" in this class, QuantumFile::Open. This is a good place to start, since we can't do anything to a quantum file until we open it, as with any other file.

The Open function for the QuantumFile class (from quantum\block.cpp) (Figure block.01)

codelist/block.01

After declaring a number of variables, this function initializes the variables that keep track of the number of physical read and write operations for benchmarking purposes (m_ReadCount and m_WriteCount, respectively). Then it resizes the SVectors that keep track of the block buffers (m_Block), which quantum is in each block buffer (m_BlockNumber), whether each block buffer has been modified and therefore needs to be written back to the disk (m_BufferModified), and the "time stamp" for each block buffer that indicates how recently it has been accessed (m_TimeStamp). We'll see how each of these variables is used in the program as we go through the code.

The next few lines check whether the caller has supplied a non-null file name for the quantum file. If not, we bail out, because we need a valid file name to use when we open the file to store our data.

Assuming that we do have some sort of file name, we try to open the file for reading, to see whether it is already there. If fopen returns NULL, indicating that the file doesn't exist yet, we create the file, clear the QuantumFileHeaderStruct structure called QFHS, set the main_object_index_count and free_space_list_count to their default values, and copy the current program_version_info into the quantum_version_info structure element.

The next operation is to initialize the member variables of QFHS that keep track of the offsets in the file where the main_object_index, the free_space_list, and the quantum file data area begin, so that these areas of the file can be located when we open the file again.

Since we haven't created any main objects yet, our next task is to clear the main object index, setting it to zeros (which indicates available entries, as quantum number zero is not used), except for the entry for object number zero, which is set to NoQuantum; we will not allow a main object number zero, for better error control. We also write the header block to the file at this point.

The last operation in the new file initialization is to set up the free space list to indicate that all quanta have the maximum available free space, except of course for quantum zero, which is not used.

Of course, the process of opening an already existing quantum file starts differently; we have to read in the quantum file header and check whether it was created by the same version of the program. The current program simply tests whether the version of the program that created the file is different from the one reading it and aborts if they are different. A production program would use this information to determine whether the versions were compatible, so that if we changed the program in a way that makes old files unusable, we could warn the user that the file is of an old type and should be converted. Alternatively, we might automatically convert it to a new format; in either case, it is essential that we know what version of our program created the file. After we have taken care of this detail, the rest of the code for opening either an old or a new file is the same.

We are almost done with the initialization, but there are a few more data structures to handle. First, we have to calculate the block numbers where the main object list, the free space list, and the data area start. Then we initialize the member variables that keep track of the current number of main objects and the number of elements in the free space list. Then we initialize the SVectors of block buffers, block numbers, modified flags, and time stamps. Next, we initialize variables that will allow us to access the free space list and main object list; we'll get into exactly how that works later.

Finally, if we have created a new quantum file (as opposed to opening one that already existed), we create a main object called "Directory", which we will use to keep track of the names of the main objects that the user will create.

Now let's take a look at Figure block.02, which shows the next user function, QuantumFile::Flush. This function is used to ensure that all data written to the quantum file has actually been stored on the disk. It's a good idea to call Flush whenever you have finished with a logical portion of your data updating, to make sure that a power failure or other crash doesn't lose data.

The Flush function for the QuantumFile class (from quantum\block.cpp) (Figure block.02)

codelist/block.02

This isn't a very complicated function, but it does show how the m_BufferModified SVector is used: Each buffer whose flag in that SVector is TRUE is written to the disk via the Write member function. Any buffer whose flag is not TRUE has not been modified since it was last read from the disk, so there is no reason to write it back to the disk. Avoiding unnecessary writes is very helpful in reducing disk I/O and improving the speed of the program. The Write function resets the flag for any buffer it writes, so if we called Flush twice in a row, no buffers would be written out on the second execution.12 The final function that we're going to cover in this class at this point is Close (Figure block.03). As its name suggests, this function closes the quantum file and takes care of all the housekeeping required at that time. However, you generally won't call this function explicitly, because it is called automatically when the QuantumFile object is destroyed at the end of its scope, as you can see by looking at the QuantumFile destructor (Figure block.04).

The Close function for the QuantumFile class (from quantum\block.cpp) (Figure block.03)

codelist/block.03

The destructor for the QuantumFile class (from quantum\block.cpp) (Figure block.04)

codelist/block.04

Allowing the destructor to close the file automatically is easier and (more important) safer than closing the file explicitly, because it's not always easy to determine when all of the objects in the quantum file are inactive. Hence, I recommend allowing the compiler to take care of closing the file.

Flexible Flying

Now that we've taken a look at the member functions of the QuantumFile class that user programs are concerned with, let's do the same with the FlexArray class. In the process, we'll start to delve into the underlying layers of the quantum file implementation in more detail. Let's start with the interface for the FlexArray class, which is shown in Figure newquant.00.

The interface for the FlexArray class (from quantum\newquant.h) (Figure newquant.00)

codelist/newquant.00

As before, we'll postpone discussion of the member variables of this class until we see how they are used in the member functions. We'll start with the normal constructor for the FlexArray class, which is shown in Figure newquant.01.

The normal constructor for the FlexArray class (from quantum\newquant.cpp) (Figure newquant.01)

codelist/newquant.01

As you can see, this function simply calls an Open function in the same class to do all of the actual work. The purpose of this handoff is to allow the user to create a FlexArray via the default FlexArray constructor before opening the quantum file and fill in the details later via an explicit Open call. Now that I've cleared that up, let's take a look at the function that actually does the work, which is shown in Figure newquant.02.

The Open function for the FlexArray class (from quantum\newquant.cpp) (Figure newquant.02)

codelist/newquant.02

The header for this function shouldn't be too alarming. The first argument is simply the address of the quantum file object in which the FlexArray will be created. The second argument is the name under which the FlexArray will be stored in the quantum file's directory, which will make it accessible to another program or a later run of this program. The ModifiableElement type is just another name for a String, left over from a previous incarnation of that type. The third argument is the initial number of elements that the FlexArray will contain, and the last argument is the maximum number of elements to which the FlexArray will expand if necessary. If you look at the interface for this class, you'll notice that these last two arguments have default values of 1 and UINT_MAX-1 elements, respectively. Therefore, if you don't have any idea how large a FlexArray might need to be, you can omit these arguments and it will expand as needed.

Now we're up to the first statement inside the function, which appears innocent enough. It assigns a value to a member variable called m_MOA, which stands for "Main Object Array". This member variable can then be used to look up the name of the FlexArray in the quantum file directory, as well as to access a number of other attributes of the quantum file. But what is the type of m_MOA, and how does it give us access to the quantum file attributes?

To answer these questions, we're going to have to get into some fairly complex implementation and language issues. We'll start this excursion with a look at the interface for the MainObjectArrayPtr class, shown in Figure newquant.03.

The interface for the MainObjectArrayPtr class (from quantum\newquant.h) (Figure newquant.03)

codelist/newquant.03

You will notice that this class has an embedded class called MainObjectArray defined within it, and that the embedded class has quite a few member functions. You may also notice that the MainObjectArrayPtr class itself has only a few member functions, most of which are the standard "structural" member functions -- the normal and default constructors, the destructor, and the assignment operator. One of the constructors is the conversion function that constructs a MainObjectArrayPtr object from a QuantumFile pointer (used in the FlexArray Open function), and the others are the default constructor and copy constructor. However, there is one member function that is anything but standard: operator->. Why would we want to redefine that operator, and what can a custom version of operator-> do?

All Hope Abandon, Ye Who Enter Here?

Before we get through with this question, we will have entered a realm of C++ where relatively few have trespassed. I don't believe in adding complexity for the sake of complexity, so why would I use a feature of C++ that most C++ programmers have never dealt with?

Because we need this feature to improve the ease of use and reliability of programs using this quantum file access method implementation. A little history lesson is appropriate here so you can see that I'm not speaking from a solely theoretical perspective.

The first edition of this book included an implementation of the quantum file access method written in C.13 It had many of the features that the current implementation has, but it was much harder to use. The main problem was that the user program had to deal with memory allocation issues, including remembering to free memory for variables whose storage was assigned inside the quantum file code. In addition, the syntax of a quantum file "array" was much less convenient than dealing with a built-in array. It was obvious to me that resolving all of these issues so that the user could ignore the inner workings of the quantum file would greatly improve the usability of this algorithm.

Unfortunately, this isn't possible in C; however, a major impetus behind the creation of C++ was to make it possible for class library designers to provide such facilities. Since the quantum file access method would benefit greatly from such an approach, I decided to rewrite it in C++.

Rewriting History

Why am I using the word "rewrite", rather than "convert", "translate", or other euphemisms? Judging from the number of ads for "C/C++ programmers" I see in the newspapers, some employers have the notion that switching to C++ can be accomplished by changing source file extensions to ".cpp" and recompiling. After removing some syntax errors revealed by the stricter type checking of the C++ compiler, the program compiles and runs as it did before. What's so difficult about object-oriented programming? If you are one of these employers, or you have tried the above experiment yourself and now believe that you are a "C++ programmer", let me break the news to you gently: all you have accomplished is to switch to the C subset of C++, which has nothing to do with object-oriented programming. Virtually none of the code or design from the original quantum file implementation has survived the transition to object orientation unscathed, despite the fact that, according to an expert on object-oriented design, that original C implementation was "object-oriented"!

However, my loss (if that's what it was) can be your gain: I am going to break a long-standing tradition of object-oriented design literature by disclosing not only the final design but also many of the missteps, errors, and difficulties that ensued from my original determination to tackle this rather complex project. So, with no further ado, let's begin with an often neglected concept which is at the core of the implementation of this project: operator overloading.

Warning: Overload!

One of the most powerful mechanisms for hiding the details of implementation from the class user in C++ is operator overloading, which means defining (or in some cases redefining) the semantics of one or more of the standard operators +, -, =, and so forth, as they apply to class objects. For better or worse, the semantics of these operators cannot be changed as they apply to "intrinsic" data types, so the calculation 2+2 is always going to result in 4; this restriction is probably necessary, as the potential confusion could be horrendous.14 A good example of the most common type of use for this facility is the implementation of +, -, etc., for manipulating Complex class objects, following the rules of complex arithmetic. The ability to provide this sort of intuitive operation to the number-crunching user is starting to make C++ (with the proper class libraries, of course) a viable alternative to FORTRAN, long the dominant language for such users.

Hello, Operator?

As usual, it's probably best to start with a relatively simple example: in this case, we'll look at an example of overloading operator-.15 Consider the program in Figure overload1.

Overloading operator- (Figure overload1)

codelist/minus.00

As you can see, I've created a very simplified version of the dreaded Point class, used in innumerable textbooks to represent a point on the Euclidean plane. The result of running our sample program is 5, which is the distance between the Point (1,1) and the Point (4,5), calculated by the normal Euclidean calculation:

Distance = sqrt(xdiff*xdiff + ydiff*ydiff)

where xdiff is the difference between the x coordinates of the two Points, and ydiff is the difference between their y coordinates; sqrt, of course, represents the square root function.

The "magic" is in the definition of operator-, which is the syntax for redefining an operator; in this case it's the subtraction operator. When the compiler sees the expression x-y, it looks for a definition of operator- that is specified for class Point, taking an argument which is also of class Point. Since there is such a definition, the compiler generates a call to the code specified in that definition. Had there not been such a definition, a syntax error would have resulted, since the compiler doesn't have any built-in knowledge of how to subtract two Points.

The Mask of Arrow

Operator overloading doesn't have to be terribly complicated, as that example illustrates. However, there's a trick to the overloading of operator->. When we overload other operators, such as our earlier example of operator-, our code takes over the implementation of the operator completely. That is, the result of calling our operator is whatever we say it is; in the operator- example, it's the double result of the Euclidean distance formula. However, this is not true with operator->; in that case alone, a different scheme is followed by the compiler. Figure overload2 shows some code derived from an early version of the project. How does the compiler interpret it?16

Dangerous operator-> overloading (Figure overload2)

codelist/danger.00

The line X->Get(1); will be compiled as follows:

1. Since X is not a pointer to any type, its class definition is examined for a definition of operator->.

2. Since this definition is found, a call to that code is inserted in the function being compiled.

3. Then the return value of the operator-> code is examined. Since it is a pointer type (BlockPtr *, to be exact), the compiler continues by generating code to call BlockPtr->Get(int) with this equal to the result returned by operator->, which is the same BlockPtr that we started with.

Type-Safety First

So far, this is doing exactly what we wanted it to do. However, there is one serious problem with this method of implementing operator->, which is illustrated by what happens when the line Y->Get(1); is compiled. Unlike our previous example, Y is a pointer (to a BlockPtr); therefore, the compiler doesn't look for a definition of operator->, but merely generates code to call BlockPtr->Get(int) with this equal to Y. As a result, our overloaded operator is never called.

Much the same problem will occur when the line X.Get(1); is compiled. The compiler is happy to generate a call to BlockPtr.Get(int) with this being equal to the address of X; again, though, our custom operator-> code won't be executed.

The reason for this problem is that the compiler stops looking for operator-> overloading whenever an actual pointer is found. After all, if you already have a pointer in a place where a pointer is needed, you must have what you want! Obviously, we're going to have to come up with a type-safe solution, so we can't accidentally use the wrong syntax and end up with a nasty bug.17 The solution is to have not one, but two classes involved in the virtual memory mechanism, an "outer" (or handle) class and an "inner" (or body) class.18 The handle class has two primary functions: the initializing of a body class object and the overloading of operator-> to return a pointer to that body class object, which actually does the work. Figure overload3 presents this solution.

Type-safe operator-> overloading (Figure overload3)

codelist/safe.00

One interesting thing about this program is that the main program refers to the Get function in exactly the same way as it did before: in fact, the only difference visible to the class user is that the compiler is now able to prevent him (and us) from compiling the incorrect references as we did before.

However, there are some changes to the internals that deserve comment. First of all, in the class definition, the previous BlockPtr class has been renamed to Block and made a nested class of a new BlockPtr class. The reason for the renaming is to prevent unnecessary changes to the user's code; that was successful, as noted above. Another design decision whose justification is not quite as obvious is the use of a nested class rather than a freestanding class. Why is this appropriate?

Hidden Virtues

Since the purpose of the nested Block class is solely to add type safety, rather than to provide any visible functionality, nesting it inside the BlockPtr class reduces "name space pollution". That is, if the user needs a Block class for some other purpose, its name won't conflict with this one. Such potential conflicts are going to become more serious as the use of class libraries increases; we should do our part to "preserve the environment" when possible.19 The fact that the Block class is visible only to members of its enclosing class, BlockPtr, is responsible for the somewhat odd appearance of the member function declarations in Figure overload3. For example, what are we to make of the declaration "char BlockPtr::Block::Get(int p_Index)"?

Of course, we C++ programmers are used to "qualified" function names, such as Block::Get. But why do we need two qualifiers? Because the compiler knows the meaning of Block only in the context of BlockPtr; that's what prevents the global name pollution that would ensue if Block were declared outside of BlockPtr. This means that we could, for example, have another class called Block nested inside another enclosing class called NewBlockPtr. If that Block class had a Get member function, we could declare it as "char NewBlockPtr::Block::Get(int p_Index);", and the compiler would have no problem telling which Block::Get we meant. Similarly, an unadorned "Block::Get(int p_Index);" would be yet another completely distinct function.20 One more safety feature in this new arrangement is worthy of note: the constructor for Block is protected, so that the user can't accidentally create one and use it instead of referencing it through a BlockPtr object the way we intend. This isn't very likely anyway, since the user would have to specify BlockPtr::Block as the type, which isn't easy to do by accident; however, according to the principles of object-oriented programming, it is best to hide the internals of our classes whenever possible, so that class user code can't depend on those internals. That restriction makes it possible for us to improve the implementation of our classes without breaking user code.

Polite Pointing

Now that we've investigated the notion of redefining operator->, let's return to the MainObjectArrayPtr class that we were discussing a little while ago. We'll start with Figure newquant.04, which shows the code for the normal constructor for that class.

The normal constructor for MainObjectArrayPtr (from quantum\newquant.cpp) (Figure newquant.04)

codelist/newquant.04

This function is fairly simple. It creates a new MainObjectArray object using the normal constructor for that class. This is the "body" object that will actually do the work of both of these classes. Then it sets the reference count of that object to 1, indicating that there is one MainObjectArray "handle" object using that "body" object.

The general idea of reference counting is fairly simple, as most great ideas are (after you understand them, at least). It's inefficient to copy a lot of data whenever we set one variable to the same value as another; on the other hand, copying a pointer to the data is much easier. However, we have to consider how we will know when we can delete the data being pointed to, which will be when no one needs the data anymore.

One way to share data safely in such a situation is to write the constructor(s), destructor, and assignment operator to keep track of the number of objects that are using a particular body object, and when there are no more users of that object, to delete it.21 We'll see how this works in the case of the MainObjectArray body class and its MainObjectArrayPtr handle class; the implementation is much the same in the other handle/body classes in this program.

However, right now we're more interested in how we can use operator-> in the FlexArray::Open function (Figure newquant.02). After creating a member variable called m_MOA via the normal constructor of the MainObjectArrayPtr class, the next line in the Open function invokes MainObjectArrayPtr::operator-> (Figure newquant.06).

MainObjectArrayPtr::operator-> (from quantum\newquant.cpp) (Figure newquant.06)

codelist/newquant.06

Given the definition of operator-> in Figure newquant.06, how will the compiler interpret the expression m_MOA->FindObjectByName, which is used in the function FlexArray::Open?

1. Since m_MOA is not a pointer to any type, the interface of its class (MainObjectArrayPtr) is examined for a definition of operator->.

2. Since there is such a definition, a call to that code is inserted in the function being compiled.

3. Then the return value of the operator-> code is examined. Since it is a pointer type (MainObjectArray *, to be exact), the compiler continues by generating code to call MainObjectArray->FindObjectByName with this equal to the result returned by operator->, which is the address of the body object of the MainObjectArrayPtr that we started with. Of course, the same analysis applies to the other uses of the m_MOA object in the Open function as well as the other functions that use handle objects.

Now let's continue with the analysis of the Open function in the FlexArray class. The statement we've just examined calls a function called FindObjectByName, which returns an index into the main object array. If the name supplied in the call was in fact the name of an existing object in the main object array, then the returned value will be the element number of that object in the main object array. On the other hand, if the specified name was not found in the main object array, the returned value will be the special value NoObject. In that case, Open will use another member function of MainObjectArray, namely FindAvailableObject, to locate a free element in the main object array. Then it will call the CreateMainObject function to create a new main object which will use that free element.

At this point in the function, we have a valid index into the main object array for our FlexArray, whether it existed previously or was just created. Finally, we retrieve the current and maximum element counts for the FlexArray object.

Now that we have covered the Open function in the FlexArray class, there's only one other function in that class that could use a significant amount of explanation: the implementation of operator []. However, I'm going to postpone coverage of that function until the next chapter, where we'll be examining the somewhat complex task of making one of our data types look as much like an array as possible.

In the meantime, we should go over the rest of the member functions of the MainObjectArrayPtr class, so you can see how this typical handle class works.

The MainObjectArrayPtr class

We'll start with the default constructor (Figure newquant.07), which is pretty simple. In fact, it's exactly like the normal constructor (Figure newquant.04), except that it uses the default constructor of its body class to create its body object rather than the normal constructor of the body class.

The default constructor for MainObjectArrayPtr (from quantum\newquant.cpp) (Figure newquant.07)

codelist/newquant.07

The MainObjectArrayPtr Copy Constructor

The copy constructor (Figure newquant.08) isn't much more complicated. It copies the pointer to the main object array (body) object from the existing object to the same member variable in the newly created object. Then it increments the reference count because there is now one more user of that body object.

The copy constructor for MainObjectArrayPtr (from quantum\newquant.cpp) (Figure newquant.08)

codelist/newquant.08

The MainObjectArrayPtr::operator = Function

The next function, MainObjectArrayPtr::operator = (Figure newquant.09), is a bit more complicated than the previous ones, because it has to account for the reference counts in both the current object and the right-hand object being copied from.

MainObjectArrayPtr::operator= (from quantum\newquant.cpp) (Figure newquant.09)

codelist/newquant.09

We start by decrementing the reference count of the current body object, because we won't be pointing to it after this operation (unless the right-hand object happens to point to the same body object as ours does). Then we check whether the reference count is less than or equal to 0, in which case we can (and will) delete the current body object.22 After taking care of that issue, we copy the right-hand object's main object array pointer to the corresponding member variable in our object and increment the reference count for that main object array. Finally, we return our object to the caller.

The MainObjectArrayPtr Destructor

The final function in the MainObjectArrayPtr class is the destructor (Figure newquant.10).

The destructor for the MainObjectArrayPtr class (from quantum\newquant.cpp) (Figure newquant.10)

codelist/newquant.10

This shouldn't seem at all mysterious after our analysis of the operator = function, as its operation is basically a subset of what that function does. That is, it decrements the reference count of its body object; if the resulting reference count is less than or equal to 0, it deletes the body object.

Now that we're through with the MainObjectArrayPtr class, we'll examine its body class, MainObjectArray, so we can see how this fundamental data type of the quantum file algorithm works.23

The MainObjectArrayPtr::MainObjectArray class

We'll skip over the default constructor and destructor, as these functions don't do anything particularly interesting. However, the normal constructor for this class does some initialization that we should pay attention to, as shown in Figure newquant.11.

The normal constructor for the MainObjectArray class (from quantum\newquant.cpp) (Figure newquant.11)

codelist/newquant.11

As you can see, this constructor begins by checking whether its argument is non-null; if so, it continues by initializing a member variable to the quantum file pointer provided by that argument. Then it uses that quantum file pointer to retrieve the number of main objects in the quantum file as well as the number of blocks that the main object list occupies. Then it resizes its block pointer list to that count and initializes each of the elements in the list to the appropriate block pointer. We'll see exactly how those block pointers work when we get back to the QuantumFile class, but for now we'll just assume that they allow access to the individual blocks that make up the main object list.

The MainObjectArray::Get Function

Now let's take a look at the Get function in this class (Figure newquant.12).

The MainObjectArray::Get function (from quantum\newquant.cpp) (Figure newquant.12)

codelist/newquant.12

This function is reasonably straightforward. It begins by checking the index of the element in the main object array that the caller wants to retrieve. If it is out of range, the return value is NoObject, indicating an error. However, assuming that the index is valid, the function calculates which block and element within that block corresponds to the requested entry. It then retrieves that element via the MainObjectBlock::Get function, which is called for the m_BlockPtr variable for that block.24 Finally, the result of the Get function, which is the quantum number of the big pointer quantum of the object in question, is returned to the calling function.

The MainObjectArray::Set Function

The next function we'll cover, MainObjectArray::Set (Figure newquant.13), is the counterpart to Get.

The MainObjectArray::Set function (from quantum\newquant.cpp) (Figure newquant.13)

codelist/newquant.13

This function takes two arguments, the element number in the main object array and the quantum number of that object's big pointer quantum. After checking that the element number is valid, it calculates the block number and element number in which the desired entry will be found, then calls the MainObjectBlock::Set function for the m_BlockPtr variable of the appropriate block in the main object array. Finally, if the entry being stored is equal to NoObject and the element number of this element is less than the current "lowest free object", then we update that variable to indicate that we should start looking at this place in the main object array the next time we are searching for a free object.

The MainObjectArray::FindAvailableObject Function

The next function we'll cover, MainObjectArray::FindAvailableObject (Figure newquant.14), is used to find an available entry in the main object array when we want to create a new object.

The MainObjectArray::FindAvailableObject function (from quantum\newquant.cpp) (Figure newquant.14)

codelist/newquant.14

This isn't terribly complicated either. It searches the main object list block by block, starting with the entry corresponding to the last known value of the "lowest free object" variable and continuing until it finds an empty entry (i.e., one whose value is NoObject). Once it finds such an entry, it calculates the corresponding element number in the main object array and returns it as the result after storing it in the "lowest free object" variable.25

The MainObjectArray::GetModifiableElement Function

The next function we'll cover, MainObjectArray::GetModifiableElement (Figure newquant.15), is used to retrieve a ModifiableElement value from a main object.

The MainObjectArray::GetModifiableElement function (from quantum\newquant.cpp) (Figure newquant.15)

codelist/newquant.15

Among its many uses, GetModifiableElement is the function that a FlexArray uses to retrieve values, so it is quite important to the functioning of our test program. Although it is somewhat longer than most of the other functions we've seen so far, it's really not too complicated, and it will serve as a good introduction to the more complex functions in this class.

We start by using the Get member function to look up the big pointer array quantum number for the object in which the element is contained. Once we have that quantum number, we call the function QuantumFile::MakeBigPointerBlockPtr to create a handle object that will allow us to access that quantum as a big pointer array. Next, we calculate which big pointer array entry contains the quantum number of the little pointer block that contains the element reference that we'll use to access the actual data. Once we know that, we can call the BigPointerBlock::GetBigArrayElement function to retrieve the quantum number for that little pointer block. Then we call the function QuantumFile::MakeLittlePointerBlockPtr to create a handle object that will allow us to access that quantum as a little pointer array. Once we have done that, we calculate which element in the little pointer array is the element reference to the element we are interested in, and retrieve that element reference from the little pointer array.

Now we're ready to access the element itself, assuming that it actually exists. First, though, we have to account for the possibility that it is a null item, that is, one that contains no data. If this is the case, we return an empty (i.e., default-constructed) ModifiableElement. Why have I handled this case separately, rather than storing a zero-length ModifiableElement in the leaf block?

A null item is an optimization: we could treat items containing no data like all other items. However, special handling for this case is worthwhile, since it is very common. For example, a word processor might store each line of a file as a string, removing the trailing null from each string before storing it; with that approach, strings consisting only of a trailing null would be stored as null items. The advantage of using null items is that, rather than occupying an item index slot indicating a length of zero, they are represented by item references that contain a special quantum number called NO_QUANTUM and a special item number called NULL_ITEM_NUMBER, neither of which can appear in a real item reference. This means that a null item does not occupy any space in a item index, and that we can avoid loading a quantum to retrieve its value.

However, let us suppose that the element reference is not null. In that case, we call QuantumFile::MakeLeafPointerBlockPtr to create a handle object that will allow us to access items in the quantum containing the desired item. Then we use QuantumBlock::GetModifiableItem to retrieve the actual item and return it to the caller.

The MainObjectArray::PutElement Function

The next function we'll cover, MainObjectArray::PutElement, shown in Figure newquant.16, is used to store a ModifiableElement value into a main object.

The MainObjectArray::PutElement function (from quantum\newquant.cpp) (Figure newquant.16)

codelist/newquant.16

This starts out by using a preprocessor macro called qfassert to check whether the size of the item to be stored is legal, i.e., that it is less than the maximum allowable item size.26 After that, the code is very similar to its counterpart, GetModifiableElement, until we retrieve the item reference from the little pointer array. Then, if the item reference is not null, we call QuantumBlock::DeleteItem to delete the current contents of that element. Then we check whether the size of the element to be stored is zero; if so, we store a null reference in the little pointer array (as noted previously).

On the other hand, if we have some data to be stored, we have to find a place to store it. The most logical place to look is in the quantum that we have most recently used to store something for this object, and this information is therefore maintained by the big pointer block access code in the BigPointerBlock class.27 Therefore, we check whether there may be enough free space in that quantum by calling QuantumFile::QGetFreeSpace.28 Because a free space code is only one byte (to reduce the size of the free space list), it cannot exactly represent the amount of free space in a quantum. Therefore, if the free space list entry indicates that this quantum may have enough space for the item, we have to retrieve the quantum via QuantumFile::MakeLeafBlockPtr and then call QuantumBlock::CalculateFreeSpace to calculate its free space more precisely. If the actual free space is larger than the size of the item, then we have found the quantum we will use to store the item.

However, if that quantum does not have enough space to store the item, then we will have to search for one that does. That is handled by QuantumFile::FindSpaceForItem, which returns the number of the first quantum that fits the bill.

At this point, we have a quantum in which to store the item. Therefore, after setting the "last quantum added to" member variable, we set the quantum number in the element reference to that quantum number. Then we call QuantumBlock::AddItem to add the item to that quantum, and set the item number in the element reference to the value returned from that function. Finally, we update the little array pointer that points to the newly added element.

The MainObjectArray::GetMainObjectElementCount Function

The next function, MainObjectArray::GetMainObjectElementCount (Figure newquant.17), is used to retrieve the number of elements that a main object currently contains.

The MainObjectArray::GetMainObjectElementCount function (from quantum\newquant.cpp) (Figure newquant.17)

codelist/newquant.17

You might think that this would be a trivial "get" function that merely returns the value of a member variable, but that is not the case. The reason is that this information is stored in the quantum file itself, as part of the big pointer array, because it needs to be maintained between executions of this program (or any other program that might want to use the file).

However, it's not too complicated. After using the Get function to retrieve the quantum number of the big pointer array for the object, it uses QuantumFile::MakeBigPointerBlockPtr to provide access to the big pointer array. Then it calls BigPointerBlock::GetBigArrayElementCount to retrieve the value to be returned.

The MainObjectArray::GetMainObjectMaxElementCount Function

The next function, MainObjectArray::GetMainObjectMaxElementCount (Figure newquant.18), is used to retrieve the maximum number of elements that a main object is permitted to contain. Because it is exactly like the previous function except for the function it calls in the BigPointerBlock class, I'll just list it without further comment.

The MainObjectArray::GetMainObjectMaxElementCount function (from quantum\newquant.cpp) (Figure newquant.18)

codelist/newquant.18

The MainObjectArray::CreateMainObject Function

The next function we'll cover, MainObjectArray::CreateMainObject, shown in Figure newquant.19, is used to create a new main object.

The MainObjectArray::CreateMainObject function (from quantum\newquant.cpp) (Figure newquant.19)

codelist/newquant.19

This function is somewhat complex, but that should be understandable because it takes quite a bit of work to create a new main object. We start off by checking whether the user has asked to create a zero-length object. Such an object would cause complications in the way that the size of the little pointer array is calculated; namely, the calculation of the number of blocks needed to hold the little pointer arrays (LittlePointerBlockCount) produces correct results for all values of the element count that are greater than 0 but produces a nonsensical result when the element count is 0. Because arrays can grow in size, a zero-element array may be a reasonable thing to create, so I didn't want to disallow them; the simplest solution, and the one I adopted, was to treat a request to create a zero-element array as a request for a one-element array. Because a zero-length object isn't very useful until it has been resized, this isn't likely to surprise the user.

Once that detail has been handled, we compute a free space entry indicating that this block is no longer available for use. Then we continue by calling QuantumFile::FindEmptyBlock to find a block for the big pointer array, then calling QuantumFile::MakeBigPointerBlockPtr to allow access to that block as a big pointer block.

Now we are ready to begin setting up the big pointer block for use. We start by calling BigPointerBlock::Clear to erase any previous data in the block.29 Next, we call BigPointerBlock::SetQuantumType to set the type of the quantum to indicate that it contains a big pointer array; this information is used for consistency checking when accessing a quantum that is supposed to be a big pointer array quantum.30 Then we call BigPointerBlock::SetMainObjectNumber to set the entry in the big pointer block that indicates which main object it belongs to. This is needed when we are updating the free space list, because each free space entry includes the number of the main object for that quantum. This is relevant because we do not share a quantum among several main objects, to simplify the task of programs that recover as much data as possible from a quantum file that has become corrupted.

The next step is to set up the header for the big pointer array. This contains the current element count, the maximum element count, and the last quantum that we've added an element to; of course, the latter variable is initialized to NoQuantum, as we have not yet added any data to this new object. Once we have set the header up, we call QuantumBlock::AddItem to add it to the big pointer block, so that it will be accessible the next time we use the big pointer array.

Now it's time to initialize the little pointer array for this object and the big pointer array that allows us to access the little pointer array. To start this process, we calculate the number of little pointer blocks that we will need, which is dependent on the number of elements in the object and the number of item references that fit in one block. Once we have determined that number, we can create a big pointer array containing that number of quantum numbers, and a temporary little pointer array that we will use to initialize each block in the little pointer array.

We're up to the loop that initializes all of the blocks in the little pointer array. It begins by calling QuantumFile::FindEmptyBlock to find an empty block that we can use for the little pointer array. Then we call QuantumFile::SetFreeSpace to reserve this quantum for the current main object and mark it as full.31 Next, we set the current element of the big pointer array to the quantum number of the quantum where we are storing the current little pointer array block.

Now we're ready to initialize the little pointer block. We start by calling QuantumFile::MakeLittlePointerBlockPtr to allow us to treat the current quantum as a little pointer block. Once we have done that, we call LittlePointerBlock::Clear to initialize the quantum to an unused state, LittlePointerBlock::SetQuantumType to set the quantum type to "little pointer array" (for consistency checking while retrieving data), LittlePointerBlock::SetMainObjectNumber to allow us to update the free space list when adding data to this quantum, QuantumBlock::AddItem to add a section of the little pointer array to this quantum, and LittlePointerBlock::ClearLittleArray to initialize that section of the little pointer array to null item references.

One detail that might not be immediately obvious is why we have to call QuantumFile::SetFreeSpace again at the end of the loop. The reason is that the QuantumBlock::AddItem function recalculates the free space for the block to which it adds the item. Since we want to prevent any further data from being added to this block, we have to reset the free space manually (to "none available") to indicate that this block is full.

Once we have initialized all of the little pointer array sections, we're ready to finish setting up the big pointer array for use. The first step in that process is to add the big pointer array data we have just filled in to the big pointer quantum, via QuantumBlock::AddItem. Then we use QuantumFile::SetFreeSpace to reserve the big pointer quantum for our new main object and mark it full. Then we use the member function Set to set the big pointer array quantum number for our new main object. Finally, we call the member function PutElement to set up the directory entry for the new main object.

The MainObjectArray::FindObjectByName Function

The next function we'll cover, MainObjectArray::FindObjectByName (Figure newquant.20), is used to look up a main object in the "directory".

The MainObjectArray::FindObjectByName function (from quantum\newquant.cpp) (Figure newquant.20)

codelist/newquant.20

This one is fairly simple. It steps through the elements of the main object directory, looking for an element whose contents are equal to the argument to the function. If it finds such an element, it returns the index of that element, which is the object number of the object with that name. On the other hand, if it gets to the end of the main object directory without finding a match, it returns the value NoObject.

The MainObjectArray::GrowMainObject Function

The next function, MainObjectArray::GrowMainObject (Figure newquant.21), is used to increase the size of a main object when needed to accommodate more elements.

The MainObjectArray::GrowMainObject function (from quantum\newquant.cpp) (Figure newquant.21)

codelist/newquant.21

This is about as complicated as CreateMainObject. However, because it is quite similar to that function, we'll be able to shorten the explanation considerably. We start by retrieving the existing big pointer block for the object. Then we copy the big pointer array from that block to a new big pointer array. Next, we calculate the size of the new little pointer array needed to hold references to all the elements in the new, enlarged, object.

The next few lines of code might be a little mysterious if I don't explain a few details of the size calculation. To simplify the accounting needed to keep track of the sizes of all of the little pointer arrays, I've decided always to allocate a little pointer array to the largest size that will fit into a quantum. However, setting the accessible size of an array to the actual number of elements allocated for its little pointer arrays has the drawback that the user wouldn't be able to find out when the program accessed an element that is within the allocated size but outside the number of elements specified when creating the array. Therefore, I've crafted a compromise: when an array is created, I set the number of accessible elements to the number the user specifies32; however, if the array is increased in size, the new size is always equal to the total allocation for all little pointer arrays. This means that if an array is expanded by a reference to an element that was already allocated but was outside the specified array size, GrowMainObject simply resets the size to the number of elements contained in all little pointer arrays already extant and returns that value. Otherwise, a new little pointer array is allocated, and the big pointer array is updated to reflect the change.

Assuming that we actually do need to allocate some new little pointer blocks, we continue by increasing the size of the big pointer array to accommodate the new number of little pointer blocks. Then we allocate and initialize each of the new little pointer blocks in exactly the same way as we allocated and initialized the original little pointer blocks when we created the object. Then we delete the old big pointer array from the big pointer block, add the new big pointer array to the big pointer block, set the free space entry for the big pointer block to indicate that it is full, and return the new element count to the caller.

Now that we have reached the end of the discussion of the MainObjectArray class, we'll return to the QuantumFile class. This time, we're going to get into the member functions that are used in the implementation, since we've already dealt with those that are used by outside programs.

The QuantumFile class

We'll skip over the QuantumFile member functions that fall in three categories:

1. Those that merely compute the number of blocks occupied by a segment of the quantum file (i.e., GetMainObjectBlockCount and QGetFreeSpaceListBlockCount).

2. Those that merely return a block of the appropriate type to be accessed as a little pointer block, big pointer block, and the like (i.e., MakeFreeSpaceBlockPtr, MakeMainObjectBlockPtr, MakeLittlePointerBlockPtr, MakeBigPointerBlockPtr, MakeLeafBlockPtr).

3. Those that merely call member functions of the FreeSpaceArray class (i.e., SetFreeSpace, QGetFreeSpace, FindEmptyBlock, FindSpaceForItem).

We'll see how the functions in the third category work when we examine the underlying functions that they call.

The QuantumFile::SetModified Function

However, that still leaves us with several functions to discuss in the QuantumFile class. We'll start with QuantumFile::SetModified, which is shown in Figure block.06.

The QuantumFile::SetModified function (from quantum\block.cpp) (Figure block.06)

codelist/block.06

This is actually a fairly simple function. It calls a global function called FindBuffer to look up the specified quantum number in the list of quantum buffers.33 If the quantum with that quantum number is in fact in one of the quantum buffers, then the "buffer modified" flag for that buffer is set, and the function returns. The only complication is a consistency check in the form of an "assert" that the quantum in question is actually in one of the buffers. If that is not the case, then we have a serious problem, because the program should not be trying to set the modified flag for a quantum that isn't in memory: that indicates that the buffer that contained the quantum has been reused too soon.

The QuantumFile::Position Function

The next function we will examine is QuantumFile::Position, which is shown in Figure block.07.

The QuantumFile::Position function (from quantum\block.cpp) (Figure block.07)

codelist/block.07

This function is used by the Read and Write functions to position the file pointer to the correct place for reading or writing a quantum. This is quite simple because every block is the same size. However, it is important to remember that the quantum file contains data other than the quanta themselves, so the block number of a quantum is not the same as its quantum number. To convert between these two numbers is the responsibility of the function that computes the block number supplied to Position.

The QuantumFile::Read Function

The next function we will examine is QuantumFile::Read, which is shown in Figure block.08.

The QuantumFile::Read function (from quantum\block.cpp) (Figure block.08)

codelist/block.08

After positioning the file pointer to the location in the file where this block is stored, we read it into the buffer. Then we set the modified flag for the buffer to FALSE, indicating that the disk and memory versions of the block are the same. Finally, we increment the read count, which is used for performance analysis.

The QuantumFile::Write Function

The next function we will examine is QuantumFile::Write, which is shown in Figure block.09. This is identical to the Read function, except of course that it writes the data to the file rather than reading it.

The QuantumFile::Write function (from quantum\block.cpp) (Figure block.09)

codelist/block.09

The QuantumFile::MakeBlockResident Function

The last function in this class is QuantumFile::MakeBlockResident. This is where we rejoin the thread of discussion about operator->. You see, MakeBlockResident is called during the execution of operator-> for any of the BlockPtr classes.34 Figure mbr1 shows an early implementation of this routine.

Early MakeBlockResident code (Figure mbr1)

codelist/mbr1.00

This is a fairly straightforward implementation of a least recently used (LRU) priority algorithm, but there are a few wrinkles to reduce overhead. First, we use the reference parameter p_OldBufferNumber to retrieve the buffer number (if any) in which the searched-for block was located on our last attempt. If the number of the block held in that buffer matches the one we're looking for, we will be able to avoid the overhead of searching the entire buffer list. The reason that p_OldBufferNumber is a reference parameter is so that we can update the caller's copy when we locate the block in a different buffer; that way, the next time that MakeBlockResident is called to retrieve the same block's address, we can check that buffer number first.

In order to make this work, we can't implement the LRU priority list by moving the most recently used block to the front of the block list; the saved buffer number would be useless if the blocks moved around in the list every time a block other than the most recently used one was referenced. Instead, each slot has an attached timestamp, updated by calling the time function every time the corresponding block is referenced. When we need to free up a buffer, we select the one with the lowest (i.e., oldest) timestamp. If that buffer has the "modified" attribute set, then it has been updated in memory and needs to be written back to the disk before being reused, so we do that. Then we read the new block into the buffer, update the caller's p_OldBufferNumber for next time, and return the address of the buffer.

This seems simple enough, without much scope for vast improvements in efficiency; at least, it seemed that way to me, until I ran Turbo ProfilerTM on it and discovered that it was horrendously inefficient. The profiler indicated that 73% of the CPU time of my test program was accounted for by calls to the time routine! To add insult to injury, the timing resolution of that routine is quite coarse (approximately 18 msec), with the result that most of the blocks would often get the same timestamp when running the program normally.35 Upon consideration, I realized that the best possible "timestamp" would be a simple count of the number of times that MakeBlockResident was called. This would entail almost no overhead and would be of exactly the correct resolution to decide which block was least recently used; this is the mechanism used in the current version.

Punching the Time Clock

One interesting consideration in the original design of this mechanism was what size of counter to use for the timestamp. At first, it seemed necessary (and sufficient) to use a 32-bit type such as an unsigned long, which would allow 4 billion accesses before the counter would "turn over" to 0. However, because the original implementation was compiled with a 16-bit compiler, the natural size to use would be an unsigned (meaning unsigned short) variable. Therefore, I had to decide whether a longer type was necessary.

After some thought, I decided that it wasn't. The question is: what happens when the counter turns over? To figure this out, let's do a thought experiment, using two-byte counters. Suppose that the priority list looks like Figure timestamp1, with the latest timestamp being 65535.

Timestamps before turnover (Figure timestamp1)

Block number    Timestamp

14 65533

22 65535

23 65000

9 65100

Let's suppose that the next reference is to block 9, with the counter turning over to 0. The list will now look like Figure timestamp2.

Timestamps immediately after turnover (Figure timestamp2)

Block number    Timestamp

14 65533

22 65535

23 65000

9 0

The next time that a new block has to be loaded, block 9 will be replaced, instead of block 23, which is actually the least recently used block. What effect will this have on performance? At first glance, it doesn't appear that the maximum possible effect could be very large; after all, each turnover would only cause each buffer to be replaced incorrectly once. If we have 100 buffers (a typical number), the worst case would be that the "wrong" buffer is replaced 100 times out of 64K, which is approximately 1.5% of the time; with fewer buffers, the effect is even smaller. There is no danger to the data, since the buffers will be written out if they have changed. I suspected that the cost (under a 16-bit compiler) of handling a unsigned long counter instead of an unsigned short on every call to MakeBlockResident would probably be larger than the cost of this inefficient buffer use, but it didn't appear important either way.

Getting Our Clocks Cleaned

Although the preceding analysis was good enough to convince me not to worry about the counter turning over, unfortunately it wasn't good enough to convince the machine. What actually happened was that after a large number of block references, the program started to run very slowly. I was right that the data wasn't in danger, but performance suffered greatly. Why? Let's go back to our example. Which buffer would be replaced when the next block needed to be read in? The one currently holding block 9, since it has the "lowest" priority. If that block number happened to be 32, for example, that would leave us with the arrangement in Figure timestamp3.

Timestamps shortly after turnover (Figure timestamp3)

Block number    Timestamp

14 65533

22 65535

23 65000

32 1

The problem should now be obvious: the newest block still has the "lowest" priority! The reason that the program started to run very slowly after turnover was that the "fossil" timestamps on the old blocks were preventing them from being reused for more active blocks, so every block that had to be read in had to share buffers with the ones that had been read in after turnover. The solution was fairly simple; on turnover, I set all of the timestamps to 0 to give every buffer the same priority. This isn't really optimal, since it doesn't preserve the relative priority of the blocks already in memory; however, it has the virtue of simplicity, and does reduce the problem to the fairly insignificant level indicated by my first analysis of the turnover problem.

Speed Demon

Is this the end of our concern for the MakeBlockResident routine? Not at all; as befits its central role in the virtual memory mechanism, this routine has undergone quite a few transformations during the development process. One attempt to speed it up took the form of creating a FastResidenceCheck routine that would have the sole purpose of checking whether the old buffer number saved from the previous call to load the same block number was still good; if so, it would return that buffer number after resetting the timestamp. The theoretical advantage of splitting this function off from the more general case was that such a routine might be simple enough to be inlined effectively, which would remove the overhead of one function call from the time needed to make sure that the block in question was memory resident. Unfortunately, this measure turned out to be ineffective; one reason was that the routines that called MakeBlockResident typically didn't reuse the object where the former buffer number was saved, but had to create another one every time they were called by their client routines. Therefore, the attempt to "remember" the previous buffer number wasn't successful in most cases.

While FastResidenceCheck was in use, it suffered from a bug caused by improperly initializing the old buffer number to 0, a valid buffer number. The result of this error was that when a block happened to be loaded into buffer number 0, operator-> didn't initialize the pointer to the ItemIndex array, since the new buffer number "matched" the old buffer number. This problem would have been solved anyway by the new versions of operator->, which always initialize any pointers that might need to be updated; after the attempt to avoid apparently redundant initializations of these pointers caused a couple of bugs, I decided that discretion was the better part of optimization. As a result of this change, we no longer have to inform the caller of the buffer number that this block contains, so the reference argument to MakeBlockResident for that purpose has been removed.

Virtual Perfection

The acid test of this virtual memory mechanism is to run the program with only one block buffer; unsurprisingly, this test revealed a nasty bug. It seems that I had neglected to initialize the value of the EarliestStamp variable, used to keep track of the buffer with the earliest timestamp. When running with only one buffer, it was possible under some circumstances for a block to be replaced before it was ever used; when this happened, the timestamp on the buffer it had occupied was left set to its initial value of ULONG_MAX. This initial value was significant because the search for the earliest timestamp also starts out by setting the TimeStamp variable to ULONG_MAX, which should be greater than any timestamp found in the search. If there were no blocks in the list with a "real" timestamp, the conditional statement that set the EarliestStamp value in the search loop was never executed. As a result, the EarliestStamp variable was left in an uninitialized state, which caused a wild access to a nonexistent block buffer. The fix was to initialize EarliestStamp to 0, so the first buffer will be selected under these circumstances; you can see this implemented in the current version of MakeBlockResident (Figure block.05).

The QuantumFile::MakeBlockResident function

The MakeBlockResident function (from quantum\block.cpp) (Figure block.05)

codelist/block.05

We've already covered most of the tricky parts of this function, but let's go over it one more time just to be on the safe side. We begin by incrementing the counter that serves as a timestamp; if it turns over to 0, we set all of the timestamps to 0 to prevent the newest block from being replaced continually.36 Then we look up the buffer number for the quantum in question, via the FindBuffer global function. If the quantum is found in a buffer, then we merely reset the timestamp on that buffer to indicate that it is the most recently used, and return a pointer to that buffer for use by the operator-> function that called MakeBlockResident.

However, in the event that the quantum we want isn't yet in a buffer, then we have to figure out which buffer it should be read into. We do this by examining the timestamp for each buffer, and selecting the buffer with the earliest timestamp.

Once we have found the buffer that we are going to reuse, we check whether it has been modified; if so, we write it back to the disk. Then we set its block number to the block number of the new quantum that is being read in, set its modified flag to FALSE, set the timestamp on the buffer to the current timestamp, and read the quantum into the buffer. Finally, we return a pointer to the buffer for use by the calling operator-> function.

The Ptr-Patter of Little Blocks

Now that we've seen how MakeBlockResident works, let's see how it is used in one of the "BlockPtr" classes. Because all of these classes work in much the same way, we will examine only one of them in any detail. I've chosen LittlePointerBlockPtr because it made for a better section title, but any of them would have done just as well. First, Figure blocki.00 shows the interface for this class.

The interface for the LittlePointerBlockPtr class (from quantum\blocki.h) (Figure blocki.00)

codelist/blocki.00

As you can see, this interface isn't very large. In fact, with only one member variable and three functions, it's the smallest one we'll see.37 However, that doesn't mean that it is completely without interest, as we'll discover. Let's start with the normal constructor, shown in Figure block.10.

The normal constructor for the LittlePointerBlockPtr class (from quantum\block.cpp) (Figure block.10)

codelist/block.10

This constructor is simple but unrevealing: to see the effects of the variables it is initializing, we will have to wait until we examine the LittlePointerBlock class. For now, let me just say that the block number is the number of the block where the little pointer block is stored, and the block manager is a pointer to the QuantumFile object that controls the quantum file where the little pointer block is stored. The block variable is a pointer to the buffer where the block is stored in memory; it is assigned a value in the next function we will examine, operator-> (Figure block.11).

The LittlePointerBlockPtr::operator-> Function

The LittlePointerBlockPtr::operator-> function (from quantum\block.cpp) (Figure block.11)

codelist/block.11

As we've seen in the prior discussion of operator->, the return value of this function will be used to point to the object for which the final function will be executed. In this case, that object will be the little pointer block object embedded in this LittlePointerBlockPtr object. The main purpose of this operator-> function is to make sure that the block that the little pointer block object refers to is in memory before we try to do anything with it. That's why QuantumFile::MakeBlockResident exists, and that's why we call it here. Once that function returns, we know that the little pointer block is resident in memory. At that point, we set the item index member variable to the address of the item index in the block, call LittlePointerBlock::SetLittlePointerArray to allow access to the little pointer array in the block, and return the address of our little pointer block object.

When the function that called our operator-> resumes execution, it will reapply the same function call to the pointer that we return. This time, because the return value is in fact a pointer rather than an object, the function call will go through to its final destination.38 This would seem to be a logical time to start our examination of how the body class, LittlePointerBlock, works. However, before we get to that class, we have quite a bit of ground to cover. You see, the block classes are arranged in an inheritance tree that begins with a very simple structure called Block, and builds in complexity from there. The best place to start is at the beginning, so that's what we'll do.

Just Up the Block

Figure blocki.01 shows the definition of the Block union.

The definition of the Block union (from quantum\blocki.h) (Figure blocki.01)

codelist/blocki.01

I don't use the union type very much. In fact, this Block type is the only example of a union in this book. That's because they can be the source of subtle errors if you don't keep track of the actual type of the data in a union very carefully. However, in this case it came in very handy because the different types of blocks have to be identified at run-time anyway; putting the logic that discriminates among the various types of blocks into the various block classes should make it reasonably safe, and in fact I haven't seen any problems from this implementation decision.

So what are all the types of data that can be stored in a Block? The simplest is an array of char taking up the entire block; this low-level type is used when we are copying a block or otherwise treating it as simply a bunch of undifferentiated data. The next possibility, a 64-element array of char, is used in the header block of a quantum file, and consists of a 64 character "version information string". This is used to determine whether the version of the quantum file implementation that wrote the file is the same as the current implementation, as described earlier. The next type is an array of QFreeSpaceEntry elements, which is used in the free space list classes. The next possible type is an array of MainObjectEntry elements, which is used in the main object classes. Finally, a Block can contain a QuantumBlockStruct, which is the data type used to access blocks that contain normal quanta.

The QuantumBlockStruct struct

The last of these types is actually a bit more complicated than it looks, as we'll see when we examine the definition of QuantumBlockStruct, shown in Figure blocki.02.

The QuantumBlockStruct struct (from quantum\blocki.h) (Figure blocki.02)

codelist/blocki.02

What makes this apparently innocent QuantumBlockStruct data type more mysterious than it looks? The clue is in the comment on the m_ItemIndex variable: this is actually a place holder for a variable-length array of item index elements. Although you cannot directly declare a variable-length array in C or C++, it is possible to use such a place holder to simplify the calculation of the beginning address of a variable-length array that is "mapped" onto a buffer, as we do when accessing data in a quantum block. We'll see how this works when we get to the first of the block classes that is used for accessing data in the normal quantum format rather than as one of the other types (e.g., free space entry or main object entries).

The QuantumBlockHeader struct

We should also take a look at the definition of QuantumBlockHeader, shown in Figure blocki.03.

The QuantumBlockHeader struct (from quantum\blocki.h) (Figure blocki.03)

codelist/blocki.03

This struct represents the data that is present at the beginning of every block that holds a normal quantum. It consists of three fields: the quantum type field, the object number field, and the item count field.

The quantum type field contains a value indicating the type of the quantum, such as "little pointer array", "big pointer array", or "leaf node". This information could be used to allow a deeper hierarchy of pointer types, but currently it is used only for consistency checking. That is, the top-level quantum pointed to by a main object entry must be a big pointer array quantum, the quantum pointed to by a big pointer array entry must be a little pointer array quantum, and the quantum pointed to by a little pointer entry must be a leaf node (i.e., one that contains user data).

The object number field contains the number of the main object to which this quantum belongs. This is used when we update the free space list, because we do not share a quantum among more than one main object.

The item count field represents the number of data items in this quantum, which is also the number of entries in the variable-length m_ItemIndex array.

The BaseBlockPtr class

Now that we've discussed those data types, we're ready to look at the base class of the block types, BaseBlockPtr, whose interface is shown in Figure blocki.04.

We've already discussed the purpose of the Block data type, so I don't have to explain why we have such a variable in this class. The block number variable keeps track of which block in the quantum file this object represents. The block manager variable represents the quantum file object itself, which is used when we are reading or writing the block. As for the implementation of the only member function (Figure block.11a), it merely sets the block number and block manager variable to 0, so I won't bother to discuss it further.

The interface for the BaseBlockPtr class (from quantum\blocki.h) (Figure blocki.04)

codelist/blocki.04

The default constructor for the BaseBlockPtr class (from quantum\block.cpp) (Figure block.11a)

codelist/block.11a

Now we're ready to start looking at the derived block classes. I won't bother going over the Ptr handle classes, because they are all essentially identical to LittlePointerBlockPtr, which we've already covered. However, each of the body classes has its own set of member functions appropriate to its specific task, so we'll look at each one in turn, starting with the FreeSpaceBlock class.39

The FreeSpaceBlockPtr::FreeSpaceBlock class

The interface for FreeSpaceBlock is shown in Figure blocki.05.

The interface for the FreeSpaceBlock class (from quantum\blocki.h) (Figure blocki.05)

codelist/blocki.05

The default constructor (Figure block.12) and normal constructor (Figure block.13) for this class merely initialize the block pointer, block number, and block manager variables for the free space block, exactly as the analogous functions do in the LittlePointerBlockPtr class that we've already examined; we'll see how those variables are used when we get to the other member functions.

The default constructor for the FreeSpaceBlock class (from quantum\block.cpp) (Figure block.12)

codelist/block.12

The normal constructor for the FreeSpaceBlock class (from quantum\block.cpp) (Figure block.13)

codelist/block.13

The Get function (Figure block.14) returns an element of the free space data array in the block, using the m_FreeSpaceData member of the Block union.

The FreeSpaceBlock::Get function (from quantum\block.cpp) (Figure block.14)

codelist/block.14

The Set function (Figure block.15) sets the value of an element of the free space data array in the block, also using the m_FreeSpaceData union member.

The FreeSpaceBlock::Set function (from quantum\block.cpp) (Figure block.15)

codelist/block.15

The MakeLeafBlockPtr function (Figure block.16) is used in the FreeSpaceArrayPtr::FreeSpaceArray::FindSpaceForItem function to allow that function to access a block as a leaf block so that it can be used to store data. As you can see, this function calls the corresponding function in the QuantumFile class to do the actual work.

The FreeSpaceBlock::MakeLeafBlockPtr function (from quantum\block.cpp) (Figure block.16)

codelist/block.16

The MainObjectBlockPtr::MainObjectBlock class

The interface for MainObjectBlock is shown in Figure blocki.06.

The interface for the MainObjectBlock class (from quantum\blocki.h) (Figure blocki.06)

codelist/blocki.06

As you can see, this interface is exactly the same as the one for FreeSpaceArrayPtr::FreeSpaceArray, except that the MakeLeafBlockPtr function is not included and the types used by the Get and Set functions are different, as is necessary for their proper functioning. Because the default and normal constructors are exactly the same as those in the previous class, I'm not going to waste space reproducing them here, so let's move right along to the Get and Set functions.

The Get function (Figure block.18) returns an element of the main object array in the block, using the m_MainObjectData union member.

The MainObjectBlock::Get function (from quantum\block.cpp) (Figure block.18)

codelist/block.18

The Set function (Figure block.19) sets the value of an element of the main object array in the block, also using the m_MainObjectData union member.

The MainObjectBlock::Set function (from quantum\block.cpp) (Figure block.19)

codelist/block.19

The QuantumBlock class

The interface for QuantumBlock is shown in Figure blocki.07. This is a more interesting class than the previous few we've discussed; it has a number of fairly complicated functions that we will need to go over in detail. Probably the best place to start is with the AddItem function, because that is the function that adds each item to a quantum.

The interface for the QuantumBlock class (from quantum\blocki.h) (Figure blocki.07)

codelist/blocki.07

The QuantumBlock::AddItem Function

Since there are actually two versions of this function, let's start with the simpler one, which takes a ModifiableElement (i.e., String) first argument. The code for this function is shown in Figure block.20.

One overloaded version of the QuantumBlock::AddItem function (from quantum\block.cpp) (Figure block.20)

codelist/block.20

As you can see, this function retrieves the size and data address from its ModifiableElement argument and then calls the other overloaded AddItem function, returning the result of that function to its caller. So let's continue by looking at the other AddItem function, whose code is shown in Figure block.21.

The other overloaded version of the QuantumBlock::AddItem function (from quantum\block.cpp) (Figure block.21)

codelist/block.21

This starts by setting the modified flag for its block; because we are going to modify the block, we want to make sure that the block is written back to disk before the buffer is reused. Then we call a global function called FindUnusedItem to locate an item index entry that we can use for our new item.40 Then we call CalculateFreeSpace to figure out how much space is available in this block. This is actually a safety measure, as the calling function is responsible for making sure that this block has enough free space before attempting to add an item to the block. But I'd rather be safe than sorry, so I check whether the free space is insufficient; if not, this is an error, which is trapped by the assert that immediately follows.

However, let's assume that we do have sufficient room to store the new item. In that case, we continue by checking whether the item number that was returned by FindUnusedItem is greater than or equal to 0. If so, we are going to reuse an item that has been previously used, so we set its type and index value to the new values supplied as arguments to the AddItem function.

We've referred to the ItemIndex data type a few times in the past, but haven't really looked at what it consists of. I think this would be a good time to do so, so let's take a look at its definition, which is shown in Figure blocki.08.

The definition of the ItemIndex struct (from quantum\blocki.h) (Figure blocki.08)

codelist/blocki.08

This is a fairly simple data type, consisting of an index entry (which represents the element number of this item in the array of which it is an element), the offset of the beginning of the data for the element from the end of the block, and a type indicator that specifies the type of the element. While the purpose of the offset member variable should be fairly obvious, the purpose of the other two member variables may not be. The purpose of the type indicator is consistency checking; if we're trying to access a little pointer array, we can use this type indicator to make sure that we are getting what we expect. In a similar vein, the purpose of the index entry is to provide information needed by a program that will recover data from a quantum file that has become corrupted.

Now let's continue with our analysis of the AddItem function (Figure block.21), where we have just updated the type and index of the item index entry that we are reusing to the new values supplied as arguments to this function. Next, we extract the offset field from that item index entry and call GetItemCount to find out how many items are in the index. Now we have to calculate the parameters needed to shift the data already in the quantum so that we have room for our new item. The offset of the last item is equal to the distance in bytes from the beginning of that item to the first byte after the end of the quantum. Therefore, the address of the beginning of the last item in the quantum, DataFrom, can be calculated as the address of the beginning of the buffer in which the quantum resides, plus the size of a quantum, minus the offset of the last item in the quantum. The byte at that address, and all those after it up to the starting address for the new item's data, must be moved down in memory by enough to leave room for the new data.41 Therefore, the new address of that first byte of the last item's data, DataTo, will be its old address minus p_ElementSize. We want to move all of the items that precede our new item in memory, which means those that have higher item numbers; therefore, the number of bytes we want to move, DataLength, is equal to the distance from the beginning of the last item to the beginning of the area where we will store our new item. Now that we have calculated these parameters, we can call the global BlockMove function to perform the transfer.

Now we have moved the existing data, but the index still refers to the old locations of the data, so we have to call the global AdjustOffset function to adjust the elements of the item index that refer to data that has been moved. Once we have adjusted the item index, we calculate the address where the new data will be stored and call BlockMove again, this time to copy the new data to the correct place in the block. Finally, we calculate the return value from this function, which is the item number of the item we have entered, as one more than the index of the item entry we reused; this adjustment is needed because item numbers start at 1 rather than 0.

I think an example might be very helpful here. Suppose that we want to insert a new field with index number 6 to store the company name ("Chrysalis") in the following quantum, reusing the currently unused fourth item (see Figure insert1).

Item index and data before item insertion (Figure insert1)

          +-
| Item # Offset Type Index
| +---------------------------------+
| 1 | 5 -+ VARSTRING 4 |
Item | 2 | 17 +| VARSTRING 0 |
Index | 3 | +-19 || VARSTRING 3 |
| 4 | | 19 || UNUSED 0 |
| 5 | ++-31 || VARSTRING 1 |
| 6 |+++-38 || VARSTRING 2 |
| ++++----++------------------------+
+- ||| |+------------------------+
||| +-------------+ |
||+----------------+ | |
+- |+-----+ | | |
| +--+------+-----------+-+-----------+----+
Quantum | | BaldwinP.O.Box 0335NYSteve Heller11510|
Data | +----------------------------------------+
+-
+-
Quantum | 333333333222222222211111111110000000000
Offset | 876543210987654321098765432109876543210
+-
+-
| 666666666666666666666666666666666666666
| 111111111111111111111111111111111111111
Address | 000011111111112222222222333333333344444
| 678901234567890123456789012345678901234
+-

To do this, we have to move items 5 and 6 toward the beginning of the quantum by 9 characters, which is the length of the new item. Assuming that the address of the beginning of the buffer holding the quantum is 4096 and BlockSize is 2048, the address of the beginning of the last item is 4096+2048-38, or 6106, which is the value of DataFrom. The value of DataTo is calculated from DataFrom by subtracting the length of the new item, or 9, resulting in 6097. The value of DataLength is the amount of data to move; we have to move everything from the last item up to but not including the place where we are going to put the new data. We could add up the lengths of all of the items from the last item to the one before which we will insert our new data, but it is easier simply to subtract the offset of that item from the offset of the last item, which produces the same result: 38 - 19, or 19 bytes to be moved.

After the move, the quantum looks like Figure insert2.

Item index and data after move (Figure insert2)

          +-
| Item # Offset Type Index
| +--------------------------------------+
| 1 | 5 -+ VARSTRING 4 |
Item | 2 | 17 +| VARSTRING 0 |
Index | 3 | +-19 || VARSTRING 3 |
| 4 | | 19 || UNUSED 0 |
| 5 | ++-31 || VARSTRING 1 |
| 6 |+++-38 || VARSTRING 2 |
| ++++----++-----------------------------+
+- ||| |+---------------------------------+
||| +----------------------+ |
||+-------------------------+ | |
+- |+--------------+ | | |
| +--------+ | | | |
| +-----------+------+-----------+-+-----------+----+
Quantum | | BaldwinP.O.Box 0335?????????NYSteve Heller11510|
Data | +-------------------------------------------------+
+-
+-
Quantum | 444444443333333333222222222211111111110000000000
Offset | 765432109876543210987654321098765432109876543210
+-
+-
| 666666666666666666666666666666666666666666666666
| 999111111111111111111111111111111111111111111111
Address | 999000000000011111111112222222222333333333344444
| 789012345678901234567890123456789012345678901234
+-

Of course, all of the item index entries from the one we are inserting to the last one now have the wrong offset; we have to add the size of the new item to these offsets, to account for the move we just did, which we do by calling AdjustOffset.

We're almost done with this insertion. All we have left is to calculate the actual position of the new item in the quantum as the quantum buffer's address + the size of a quantum - the offset of this item, (4096+2048-28, or 6116) and store the result in DataTo. Then we can call BlockMove to copy the new item's data to that position, and finish up by setting the relative item number (which is the return value from this function) to ItemFound + 1 and returning it. The final quantum contents look like Figure insert3.

Item index and data after item insertion (Figure insert3)

          +-
| Item # Offset Type Index
| +---------------------------------------+
| 1 | 5 --+ VARSTRING 4 |
Item | 2 | 17 -+| VARSTRING 0 |
Index | 3 | +-19 || VARSTRING 3 |
| 4 | | 28-+|| VARSTRING 6 |
| 5 | ++-40 ||| VARSTRING 1 |
| 6 |+++-47 ||| VARSTRING 2 |
| ++++----+++-----------------------------+
+- ||| ||+--------------------------------+
||| |+---------------------+ |
||+----+--------------------+ | |
+- || +-----------+ | | |
| || | | | |
| |+-----+ | | | |
| +--+------+-----------+--------+-+-----------+----+
Quantum | | BaldwinP.O.Box 0335ChrysalisNYSteve Heller11510|
Data | +-------------------------------------------------+
+-
+-
Quantum | 444444443333333333222222222211111111110000000000
Offset | 765432109876543210987654321098765432109876543210
+-
+-
| 666666666666666666666666666666666666666666666666
| 999111111111111111111111111111111111111111111111
Address | 999000000000011111111112222222222333333333344444
| 789012345678901234567890123456789012345678901234
+-

The above scenario handles the case of reusing an item index slot. The code following the comment that says "must make a new item" deals with the case where we have to add a new item index element at the end of the index, which is somewhat simpler, as we don't have to move any preexisting data: our new item is the last in the quantum.

When adding a new item, there are two subsidiary possibilities: that this is the first element in the item index, or that there are existing elements in the item index and we're adding a new one at the end.

Of course, the simplest case of all is the one where there aren't any items in the quantum as yet. In this case, the offset of the new item is equal to its size. However, if there are some items in the quantum already, then the offset of the new item is equal to the offset of the last item in the quantum added to the size of the new item. Once we have calculated the offset, we can move the new item's data to its position in the quantum. Then we set the type and index of the new item and store it in the last position of the m_ItemIndex array, set the ItemNumber to one more than the number of items that were in the array before, increment the item count, and return the relative item number to the calling routine.

Finally, whether we have reused an existing item or added a new one, we call UpdateFreeSpace to recompute the amount of free space in the quantum, then return the item number of the new item.

The QuantumBlock::GetModifiableItem Function

Now that we have seen how to add an item to a quantum, perhaps we should take a look at the function that allows us to get it back: GetModifiableItem (shown in Figure block.22).

The QuantumBlock::GetModifiableItem Function (from quantum\block.cpp) (Figure block.22)

codelist/block.22

As you might imagine, retrieving an item is simpler than storing the item in the first place, because we don't have to move anything around. We start with an assert that checks two conditions: that the item number is greater than 0 and that it is less than or equal to the item count. Once we have passed that sanity check, we decrement the item number to convert it to an index into the item index array, then retrieve the offset of the item from its item index entry. Then we calculate its actual starting address and length, which gives us the information necessary to create a ModifiableElement from the data. We do so, and return the result to the caller.

The QuantumBlock::DeleteItem function

The third fundamental operation that we can perform on an item is to delete it. This operation is used both when simply deleting an item from the main object and when changing the value of an item; to change an item, we actually delete the previous item and add a new one. So let's take a look at DeleteItem (shown in Figure block.23).

The QuantumBlock::DeleteItem function (from quantum\block.cpp) (Figure block.23)

codelist/block.23

After checking that the item number is legal and converting it to an index into the item index array, we determine whether the item to be deleted is the last one in the quantum. If so, we set its type to UNUSED_ITEM. While this would be sufficient to allow the reuse of this item index entry, if we stopped there, we would never be able to reduce the size of the item index; eventually, the quantum would fill up with unused item index entries, leaving no room for actual data. To prevent this, we clean up any trailing unused entries in the item index by searching backwards from the end of the item index, clearing any trailing unused entries to zeroes.42 When we get to an item index entry that is in use, or to the beginning of the item index, we stop searching. At that point, we reset the item count to 1 more than the last value of the index in the loop. If that last loop index value was -1, then we have emptied the quantum, so we call Clear to reset it to an unused state. Finally, we call UpdateFreeSpace to recalculate the amount of free space in the quantum and return the result to the caller.

That takes care of deleting the last item in a quantum, but what if we need to delete some other item? In that case, we extract the starting offset of the item to be deleted and calculate its length. For the first item, the item length is equal to the m_Offset field in the item, because the data for the first item in the quantum is stored at the end of the quantum, as illustrated in Figure itemref1. For other items, the item length is equal to the offset of the previous item subtracted from the offset of the current item. This may seem backward, as the figure shows that the second item is stored before the first, the third before the second, and so on. However, the offset of an item is defined as the distance in bytes from the start of the item to the first byte after the end of the quantum, which is always nonnegative.

We also have to set the offset of this item to the offset of the previous item (if there is a previous item), which will indicate that this item's length is zero. For the first item, we can just set the offset to zero, since there is no previous item to worry about.

Now we're ready to compute the offset of the last item, the source and destination addresses for the data to be moved into the deleted item's space, and the length of that data. We calculate the source address (DataFrom) as the address of the last item (m_Block->m_Data+BlockSize-LastItemOffset), the destination address (DataTo) as the source address added to the length of the item we are deleting (DataFrom+item length), and the number of bytes to move (DataLength) as the total length of all the items from the one we are deleting to the last item (LastItemOffset - StartingOffset) Then we call BlockMove to transfer the data and AdjustOffset to update the offsets of the other items. Next, we set the item's type to UNUSED_ITEM and its offset to the same value as the previous entry in the item index (or 0, if it is the first item) so that the length of the next item can be properly calculated. Finally, we call UpdateFreeSpace and return to the calling function.

To make this clearer, let's look at what happens if we delete the item holding the state name from the example in Figure itemref4. Let's suppose the address of the quantum is 4096. The offset of the last item, marked by "^", is 41; substituting values in the expression for DataFrom, m_Block->m_Data+BlockSize-LastItemOffset equals 4096+2048-41, or 6103. The offset of the item we're deleting, marked by the "*", is 19, and the offset of the previous item (marked by the "&") is 17, which makes the item length 2; therefore, the DataTo address is 6103+2, or 6105. This is where the last item will start after the move, since we are removing the 2 bytes of the item being deleted. Finally, DataLength is the amount of data to be moved, which is the length of all the items with higher index numbers than the one being deleted, in this case 41-19, or 22.

Sample IRA, etc., before deletion (Figure itemref4)

          +-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 2 +----------------------+
Reference | 1 | 3 4 +----------------------+-+
Array | 2 | 3 5 +----------------------+-+-+
(IRA) | 3 | 3 3 +-------------------+ | | |
| 4 | 3 1 +-----------------+ | | | |
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- | | | | |
| Item # Offset Type Index | | | | |
| +----------------------------+ | | | | |
Item | 1 | 5 -+ VARSTRING 4 |±--+ | | | |
Index | 2 | 17 +| VARSTRING 0 |±----+--+ | |
for | 3 | +-19 || VARSTRING 3 |±----+ | |
Quantum | 4 | ++-34 || VARSTRING 1 |±---------+ |
3 | 5 |+++-41 || VARSTRING 2 |±-----------+
| 6 |||| 41 || UNUSED 0 |
| ++++----++-------------------+
+- ||| |+---------------------------+
||| +----------------+ |
||+-------------------+ | |
+- |+-----+ | | |
| +--+------+--------------+-+-----------+----+
Quantum | | Baldwin123 Main StreetNYSteve Heller11510|
Data | +-------------------------------------------+
+- ^ * &
+-
Quantum | 443333333333222222222211111111110000000000
Offset | 109876543210987654321098765432109876543210
+-

After we do the move, the result looks like Figure itemref5. The data is in the right place now, but the pointers are wrong. Specifically, the m_Offset fields of the items that have higher item numbers than the one we have deleted are off by the length of the deleted field. That's why we have to call AdjustOffset to reduce the m_Offset field of every item after the deleted one by ItemLength, which is the length of the deleted item. Then we set the data type of the deleted item in the item index to UNUSED_ITEM.

Sample IRA, etc., during deletion (Figure itemref5)

          +-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 2 +----------------------+
Reference | 1 | 3 4 +----------------------+-+
Array | 2 | 3 5 +----------------------+-+-+
(IRA) | 3 | 3 3 +-------------------+ | | |
| 4 | 3 1 +-----------------+ | | | |
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- | | | | |
| Item # Offset Type Index | | | | |
| +----------------------------+ | | | | |
Item | 1 | 5 -+ VARSTRING 4 |±--+ | | | |
Index | 2 | 17 +| VARSTRING 0 |±----+--+ | |
for | 3 | +-19 || VARSTRING 3 |±----+ | |
Quantum | 4 | ++-34 || VARSTRING 1 |±---------+ |
3 | 5 |+++-41 || VARSTRING 2 |±-----------+
| 6 |||| 41 || UNUSED 0 |
| ++++----++-------------------+
+- ||| |+---------------------------+
||| +----------------+ |
||+-------------------+ | |
+- |+-----+ | | |
| +--+------+--------------+-+-----------+----+
Quantum | | Baldwin123 Main StreetSteve Heller11510|
Data | +-------------------------------------------+
+-
+-
Quantum | 443333333333222222222211111111110000000000
Offset | 109876543210987654321098765432109876543210
+-

After correcting the index, our quantum looks like Figure itemref6. We can now access the data in the quantum correctly, as long as we use the "official" route through the IRA and the item index.43

Sample IRA, etc., after deletion (Figure itemref6)

          +-
| Quantum Item
| Index Number Number
| +-------------+
Item | 0 | 3 2 +--------------------+
Reference | 1 | 3 4 +--------------------+-+
Array | 2 | 3 5 +--------------------+-+-+
(IRA) | 3 | 3 3 +-----------------+ | | |
| 4 | 3 1 +---------------+ | | | |
| +-------------+ | | | | |
+- | | | | |
| | | | |
+- | | | | |
| Item # Offset Type Index | | | | |
| +--------------------------+ | | | | |
Item | 1 | 5 -+ VARSTRING 4 |±--+ | | | |
Index | 2 | 17 +| VARSTRING 0 |±----+--+ | |
for | 3 | 17 || UNUSED 0 |±----+ | |
Quantum | 4 | +--32 || VARSTRING 1 |±---------+ |
3 | 5 |++--39 || VARSTRING 2 |±-----------+
| 6 ||| 39 || UNUSED 0 |
| +++-----++-----------------+
+- || |+---------------------------+
|| +----------------+ |
|+-------+ | |
+- +-+ | | |
| +----+------+--------------+-----------+----+
Quantum | | Baldwin123 Main StreetSteve Heller11510|
Data | +-------------------------------------------+
+-
+-
Quantum | 443333333333222222222211111111110000000000
Offset | 109876543210987654321098765432109876543210
+-

The QuantumBlock::CalculateFreeSpace function

Another interesting function in this class is CalculateFreeSpace. You might think that this would be a very simple function, but it has a few twists, as you will see by looking at its code, shown in Figure block.24.

The QuantumBlock::CalculateFreeSpace function (from quantum\block.cpp) (Figure block.24)

codelist/block.24

As we frequently do, we begin with an assert that tests the validity of the key data used by the function; in this case, that takes the form of checking that the number of items in this quantum is within the legal limit, i.e., that it is less than the NoItem constant used to indicate an invalid item number. Assuming that it passes that test, we then check whether we have already stored the maximum possible number of items in this quantum. If so, we "lie" to the caller by returning the value 0, indicating that there is no space available in this quantum. Of course, this is a white lie, because without an available entry in the item index, we cannot in fact add any new items to this quantum, so the caller needs to look elsewhere.

Assuming that we pass that test, we can calculate the free space in the quantum by a series of steps. First, we subtract the size of the quantum block header (plus a margin of safety) from the size of a block. If the number of items in the block is equal to 0, we have the answer: the quantum should be marked as having the maximum possible free space, so we compute the free space code that indicates this situation and return it to the caller.44 On the other hand, if there are some items in the quantum then we calculate the offset of the last item in the quantum and subtract it from the free space. Then we calculate the size of the item index as the number of entries in the item index multiplied by the size of one entry, and subtract it from the free space. If the result comes out to be less than 0, we return 0; otherwise, we return the calculated free space.

The QuantumBlock::UpdateFreeSpace Function

As long as we're discussing the intriguing topic of free space, we might as well take a look at UpdateFreeSpace, whose code is shown in Figure block.25.

The QuantumBlock::UpdateFreeSpace function (from quantum\block.cpp) (Figure block.25)

codelist/block.25

Why do we need this function, when we already have a function to calculate the amount of free space in a quantum? This is not just another version of the CalculateFreeSpace function, because it actually updates the information in the free space list. As you may recall, the free space list stores more than just the amount of free space in a quantum; it also stores the object number of the main object that quantum belongs to. This is so that we can avoid scanning a lot of quanta to find one that has enough free space in it and also belongs to the main object that we need the additional space for. Now that we know what this function does, it shouldn't be too hard to determine how it does it. First, it sets the object number of a newly created free space object to the object number of the block for which the free space is being updated. Then it converts the free space calculated by the CalculateFreeSpace function to a free space code, which occupies only one byte. Finally, it uses the SetFreeSpace function of the quantum file class to update the free space list with the new information for this quantum.

The rest of the functions in the quantum block class aren't particularly interesting or instructive, so we won't bother to go over them. You shouldn't have much trouble understanding how they work by simply looking at the code, which is all on the CD ROM in the back of the book, along with the DJGPP compiler. The next class we will examine is the big pointer block class, which is derived from the quantum block class.

The BigPointerBlock class

The interface for BigPointerBlock is shown in Figure blocki.09.

The interface for the BigPointerBlock class (from quantum\blocki.h) (Figure blocki.09)

codelist/blocki.09

Most of the functions in this class aren't very interesting, so we won't bother to go over them. However, there are a couple that we should take a quick look at, because they contain a few tricks that you should understand if you're going to have to modify the behavior of any of these classes. We'll start with SetBigArrayHeader, whose code is shown in Figure block.26.

The BigPointerBlock::SetBigArrayHeader Function

The BigPointerBlock::SetBigArrayHeader function (from quantum\block.cpp) (Figure block.26)

codelist/block.26

The purpose of this function is to allow us to access data that is stored in the big pointer block quantum. To be precise, the data that we are going to access is the "big array header", which includes information about the number of elements in the array, the maximum number of elements in the array, and the last quantum where we added data for this array.45 As you can see by looking at the code, the member variable m_BigArrayHeader is a pointer to the data in the quantum. This is not hazardous, because the header consists of several fields, each of which is fixed length; therefore, we don't have to worry about running off the end of the data in the header.

Once we have executed the SetBigArrayHeader function, we can call the various access functions that get and set these values in the array header, where it resides in the big pointer array quantum buffer in memory.46

The BigPointerBlock::SetBigPointerArray Function

The next function we'll look at is SetBigPointerArray, whose code is shown in Figure block.27.

The BigPointerBlock::SetBigPointerArray function (from quantum\block.cpp) (Figure block.27)

codelist/block.27

This function is superficially similar to the one that we just discussed; its purpose is to allow us to access data that is in the big pointer array quantum. However, there is a major difference between these two functions: while the previous function could safely provide us with a simple pointer to the big array header data structure in the big pointer array quantum, that would be extremely unsafe in the case of the big pointer array itself. The problem is that a big pointer array can have a variable number of elements in it, so it would be all too easy for us to accidentally step off the end of the big pointer array, with potentially disastrous results. To prevent such a possibility, I have created the data type called AccessVector. The purpose of this data type is to combine the safety features of a normal SVector with the ability to specify the address where the data for the SVector should start, rather than relying on the run-time memory allocation library to assign the address. Because this data type is designed to refer to an existing area of memory, the copy constructor is defaulted, as is the assignment operator and the destructor, and there is no SetSize function such as exists for "regular" SVectors. This data type allows us to "map" a predefined structure onto an existing set of data, which is exactly what we need to access a big pointer array safely.

As this suggests, the "big array header" is an AccessVector variable, which we can use just as though it were a normal SVector.47

The LittlePointerBlock class

The interface for LittlePointerBlock is shown in Figure blocki.10.

The interface for the LittlePointerBlock class (from quantum\blocki.h) (Figure blocki.10)

codelist/blocki.10

There's nothing in this class that isn't exactly analogous to the corresponding functions in the big pointer array class. Therefore, I won't waste either your time or mine by repeating the analysis of the big pointer array class here. Instead, let's move along to another class that is somewhat more interesting, if only because it seems to have no purpose for existing.

The LeafBlock class

The interface for LeafBlock is shown in Figure blocki.11.

The interface for the LeafBlock class (from quantum\blocki.h) (Figure blocki.11)

codelist/blocki.11

This is a real oddity: a class that defines no new member functions or member variables. Of what value could such a class possibly be?

The answer is that it provides a "hook" for attaching a handle class, namely LeafBlockPtr. This allows us to use a leaf block just as we would any of the other quantum classes. If we did not have this class, we could always create another class called QuantumBlockPtr, which would have much the same effect as creating this class. So why did I create this class in the first place?

The answer is that originally it did have some member functions, but they eventually turned out to be superfluous. At this point in the development of this project, it would probably be unwise for me to go through the code to root out all of the references to this class. And after all, a class that defines no new member functions or member variables certainly can't take up too much extra space; in fact, using this class should have absolutely no effect on the size of the program or its execution time, so I think I'll leave it just as it is, at least for now.

The FreeSpaceArray class

Finally, we're finished with the block classes. Our next target of opportunity will be the classes that maintain and provide access to the free space list. We'll start with FreeSpaceArray, whose interface is shown in Figure newquant.22.

The interface for the FreeSpaceArray class (from quantum\newquant.h) (Figure newquant.22)

codelist/newquant.22

The Normal Constructor for FreeSpaceArray

The first function we'll look at is the normal constructor for FreeSpaceArray, whose code is shown in Figure newquant.23.

The normal constructor for FreeSpaceArray (from quantum\newquant.cpp) (Figure newquant.23)

codelist/newquant.23

As you can see, all this function does (as is common in the case of constructors) is to initialize a number of member variables. Most of these initializations are fairly straightforward, but we should go over them briefly. First, we set the current lowest free block number to 0, because we have no idea what blocks might be free in the free space list, as we have not looked through it yet. Then we get the free space list count, the number of blocks in the free space list, and the quantum number adjustment from the quantum file object; this last value is used when we need to convert between block numbers and quantum numbers. Next, we resize the block pointer SVector so it can hold block pointers for all of the free space list blocks in the quantum file. Finally, we assign free space block pointers to all the elements of that SVector. Now we are ready to access the free space list.

The FreeSpaceArray::Get Function

The next function we'll look at is FreeSpaceArray::Get, whose code is shown in Figure newquant.24.

The FreeSpaceArray::Get function (from quantum\newquant.cpp) (Figure newquant.24)

codelist/newquant.24

The operation of this function is fairly straightforward. First, we check whether we're trying to access something that is off the end of the free space list. If so, we return a value that indicates that there is no free space in the quantum for which information was requested. However, if the input argument is valid, we calculate which block and which element in that block contains the information we need. We then call the Get function of the block pointer to retrieve that element. Finally, we return the result to the caller.

The FreeSpaceArray::Set Function

The next function we'll look at is FreeSpaceArray::Set, whose code is shown in Figure newquant.25.

The FreeSpaceArray::Set function (from quantum\newquant.cpp) (Figure newquant.25)

codelist/newquant.25

This function is very similar to its counterpart, the Get function. However, there is one difference that we should look at: if the entry that we have just found in the array indicates that its quantum is completely empty (i.e., has the maximum available space) and this entry has a lower index than the current value of the "lowest free block" variable, then we reset the "lowest free block" variable to indicate that this is the lowest free block.

Free the Quantum 16K!

This is an optimization whose purpose is to avoid searching the entire free list every time we want to find a block that isn't committed to any particular main object. In both the previous C implementation and the current C++ one, we first check the last quantum to which we added an item; if that has enough space to add the new item, we use it.48 In the old implementation, the free space list contained only a "free space code", indicating how much space was available in the quantum but not which object it belonged to. Therefore, when we wanted to find a quantum belonging to the current object that had enough space to store a new item, we couldn't use the free space list directly. As a substitute, the C code went through the current little pointer array, looking up each quantum referenced in that array in the free space list; if one of them had enough space, we used it. However, this was quite inefficient; since each quantum can hold dozens or hundreds of items, this algorithm might require us to look at the same quantum that many times!49 Although this wasn't too important in the old implementation, where the free space list was held in memory, it could cause serious delays in the current one if we used the standard virtual memory services to access the free space list. The free space list in the old program took up 16K, one byte for each quantum in the maximum quantum file size allowed. In the new implementation, using 16K blocks of virtual memory, that same free space list would occupy only one block, so searching such a list would not require any extra disk accesses. However, the current implementation can handle much larger quantum files that might contain tens or hundreds of thousands of blocks, with correspondingly larger free space lists. Using the old method, searching the free space list from beginning to end could take quite a while, because the search routine would not access the list in a linear manner and therefore might require extra disk accesses to access the same free space list entries several times. At the very least, the free space blocks would be artificially promoted to higher levels of activity and would therefore tend to crowd other quanta out of the buffers.

Even if the free space blocks were already resident, virtual memory accesses are considerably slower than "regular" accesses; it would be much faster to scan the free space list sequentially by quantum number than randomly according to the entries in the little pointer array. Of course, we could make a list of which quanta we had already examined and skip the check in those cases, but I decided to simplify matters by another method.

The FreeSpaceArray::FindSpaceForItem Function

In the current implementation, the free space list contains not just the free space for each quantum but also which object it belongs to (if any).50 This lets us write a FreeSpaceArray::FindSpaceForItem routine that finds a place to store a new item by scanning each block of the free list sequentially in memory, rather than using a virtual memory access to retrieve each free space entry; we stop when we find a quantum that belongs to the current object and has enough free space left to store the item (Figure newquant.26).51

The FreeSpaceArray::FindSpaceForItem function (from quantum\newquant.cpp) (Figure newquant.26)

codelist/newquant.26

However, if there isn't a quantum in the free space list that belongs to our desired main object and also has enough space left to add the new item, then we have to start a new quantum; how do we decide which one to use?

One way is to keep track of the first free space block we find in our search and use it if we can't find a suitable block already belonging to our object. However, I want to bias the storage mechanism to use blocks as close as possible to the beginning of the file, which should reduce head motion, as well as making it possible to shrink the file's allocated size if the amount of data stored in it decreases. My solution is to take a free block whenever it appears; if that happens to be before a suitable block belonging to the current object, so be it. This appears to be a self-limiting problem, since the next time we want to add to the same object, the newly assigned block will be employed if it has enough free space and is the first suitable block in the list.

This approach solves another problem as well, which is how we determine when to stop scanning the free space list in the first place. Of course, we could also maintain the block number of the last occupied block in the file and stop there. However, I felt this was unnecessary, since stopping at the first free block provides a natural shortcut, without contributing any obvious problems of its own. However, as with many design decisions, my analysis could be flawed: there's a possibility that using this algorithm with many additions and deletions could reduce the space efficiency of the file, although I haven't seen such an effect in my testing.

This mechanism did not mature without some growing pains. For example, the one-byte FreeSpaceCode code, used to indicate the approximate space available in a quantum, is calculated by dividing the size by a constant (32 in the case of 16K blocks) and discarding the remainder. As a result, the size code calculated for items of less than 32 bytes is 0. Since the original check for sufficient space in a quantum tested whether the space code for the quantum was greater than or equal to the space code for the new item, even blocks with no free space (i.e., those having a free space code of 0) were considered possible storage places for these very small items. This resulted in a large number of unnecessary disk accesses to read those quanta so that their actual free space could be calculated. Changing from ">=" to ">" in the test fixed this; of course, I could also have made the size calculation round up rather than down. Another problem I encountered with the free space handling in the new algorithm is that blocks and quanta are no longer synonymous, as they were in the old program. That is, the virtual memory system deals not only with quanta (i.e., blocks containing user data) but also with free space blocks and the block(s) occupied by the main object list. At one point, some routines dealing with the free space list were using physical block numbers and others were using logical quantum numbers; however, since only blocks containing user data are controlled by the free space list, the correct solution was to use only logical quantum numbers in all such routines.

The FreeSpaceArray::FindEmptyBlock Function

The last function that we will cover in this class is FindEmptyBlock, which is shown in Figure newquant.27.

The FreeSpaceArray::FindEmptyBlock function (from quantum\newquant.cpp) (Figure newquant.27)

codelist/newquant.27

As you can see, this code uses the member variable called m_CurrentLowestFreeBlock to reduce the amount of time needed to find a free block as the file fills up. Each time that a free block with a block number less than the current value is created or located, the m_CurrentLowestFreeBlock variable is updated to point to that block, and FindEmptyBlock starts at that location when looking for an empty block. FindEmptyBlock has an understandable resemblance to FindSpaceForItem, since both of them search the free space list looking for a quantum with a certain amount of space available; however, this resemblance misled me into introducing a bug in FindEmptyBlock. The problem came from the fact that I was using m_CurrentLowestFreeBlock to determine the starting block and element for searching the free space list before entering the outer loop, rather than initializing the loop indices in the for statements. This worked properly on the first time through the outer loop that is executed once for each block in the free space list, but when all the elements of the first block had been read and the outer loop index was incremented to refer to the next block, the inner index was left unchanged rather than being reset to 0 to address the first element of the new block. This would have showed itself in some unpleasant way after the blocks controlled by the first block of the free space list were allocated to objects, but luckily I found it by examination before that occurred.

The FreeSpaceArrayPtr class

The last class involved in the maintenance of the free space list is FreeSpaceArrayPtr, whose interface is shown in Figure newquant.28.

The interface for the FreeSpaceArrayPtr class (from quantum\newquant.h) (Figure newquant.28)

codelist/newquant.28

Because this is a perfectly typical handle class, similar in every way to the ones that we've discussed already, you should be able to figure it out without further explanation.

The Global Functions

There are four functions that we haven't covered yet: BlockMove, AdjustOffset, FindUnusedItem, and FindBuffer. What they have in common is that they're all used quite a few times during the execution of the program and therefore would be good candidates for rewriting in assembly language. In the previous edition of this book I did just that, so why am I not doing it here?

There are actually two reasons. First, in the several years that have elapsed since the publication of the second edition, the use of processors other than Intel's has become much more common in high-end applications. Obviously, any assembly language enhancements would be useless to readers who are using, for example, DEC's Alpha processor.

However, an equally good reason is the lack of a standard for creating assembly language functions that interface with C++ programs or using assembly language in a C++ function, which would limit the utility of any such performance enhancements to those who were using the same compiler as the one for which I developed them.

Raising the Standard

Happily, compatibility among C++ compilers themselves has improved considerably of late. After several decades of development of the C++ language, which led to a number of similar but not identical dialects, ISO and ANSI (the international and American standards bodies, respectively) have finally approved a C++ standard. The compiler manufacturers now have no excuse for not producing compilers that comply with this standard, and in fact most of the features of the standard have already been incorporated into a number of commercial compilers.

The similarity among C++ compilers is already sufficient that I was able to compile all the programs in this book using both Microsoft's Visual C++ 5.0 compiler and the DJGPP 2.8.0 compiler on the CD-ROM in the back of the book. They should also compile successfully on any compiler that complies with the new draft standard for C++.52 Unfortunately, there is no such standard for assembly language enhancements to C++ programs. Therefore, any such enhancements that I would write would be useful only with one compiler, and I elected to eliminate such enhancements rather than providing them only for one compiler.

However, this does not mean that such enhancements would be useless for you. If you know what machine and compiler you're going to be using, and if the performance of this program is unacceptable on the machine which it must run, then you may very well want to write assembly language functions to replace C++ functions that are heavily used at runtime. The four functions whose declarations appear in Figure asmfunc.00 are excellent candidates for replacement with assembly language equivalents, according to the tests that I ran for the second edition of this book.

The declarations of the global functions (quantum\asmfunc.h) (Figure asmfunc.00)

codelist/asmfunc.00

The BlockMove Function

Let's start with the simplest of these functions, BlockMove, shown in Figure asmfunc.01.

The BlockMove global function (from quantum\asmfunc.cpp) (Figure asmfunc.01)

codelist/asmfunc.01

As you can see, this function consists of nothing more than a call to the standard C library function memmove. It might be hard to imagine that such a basic function could be improved upon, but that's what I found in the previous edition of this book: by re-coding that routine in assembly language I was able to speed it up by a factor of approximately two to one over the version supplied with Borland C++ 3.1. Of course, without doing the same thing for the compilers I'm using in this book, I can't guarantee that such a speedup is possible; however, it is a good place to look for performance improvement because it is used very frequently.

The AdjustOffset Function

The next of these functions, AdjustOffset, is shown in Figure asmfunc.02.

The AdjustOffset global function (from quantum\asmfunc.cpp) (Figure asmfunc.02)

codelist/asmfunc.02

This one also isn't very complicated. It merely steps through each of the items in an item index, starting from the one whose address is passed as its first argument, for the count provided as the second argument. To each of these items, it adds the adjustment provided as the third argument, which can be either positive or negative, depending on the reason that we're calling this function. If we have deleted something, then the adjustment will be negative because the items referred to by the item index will be moving closer to the end of the quantum. On the other hand, if we've inserted new data into the quantum, then the adjustment will be positive because those items will be moving farther from the end of the quantum.

The FindUnusedItem Function

The next of these functions, FindUnusedItem, is shown in Figure asmfunc.03.

The FindUnusedItem global function (from quantum\asmfunc.cpp) (Figure asmfunc.03)

codelist/asmfunc.03

This function is slightly more complicated than the previous two, but it's still pretty simple. It steps through the item index of a quantum, checking for an entry whose type is UNUSED_ITEM. If it finds such an entry, it returns the index of that entry; otherwise, it returns the value -1 to indicate that there are no unused entries. We use this function when we want to insert a new item in a quantum and need to find an available item index entry for that item.

The FindBuffer Function

The last of these functions, FindBuffer, is shown in Figure asmfunc.04.

The FindBuffer global function (from quantum\asmfunc.cpp) (Figure asmfunc.04)

codelist/asmfunc.04

This function steps through the block number list that keeps track of which block is in each buffer, looking for a particular block number. If it finds that the desired block is in one of the buffers, it returns the index of that buffer in the block number list. If the block isn't in any of the buffers, it returns the special value -1 to indicate that the search failed.

Summary

In this chapter, we have seen how quantum files allow us to gain efficient random access to a large volume of variable-length textual data. In the next chapter, we will use this algorithm as a building block to provide random access by key to large quantities of variable-length data.

Problems

  1. What modifications to the quantum file implementation would be needed to add a "mass load mode" to facilitate the addition of large numbers of records by reducing the free space list search time?
  2. What changes to the "Directory" facility would make it more generally useful?
  3. Write a QFIX program that employs the redundant information in the block headers and item index to reconstruct as much as possible of the logical structure of a quantum file that has become corrupted: i.e., in which at least one of the blocks is unreadable or logically inconsistent.

(You can find suggested approaches to problems in Chapter artopt.htm).

Footnotes

  1. Obviously, this is a simplification: normally, we would want to be able to find a customer's record by his name or other salient characteristic. However, that part of the problem can be handled by, for example, a hash coded lookup from the name into a record number, as we will see in the next chapter. Here we are concerned with what happens after we know the record number.
  2. This figure could just as easily be considered a layout for a single record with variable-length fields; however, the explanation is valid either way.
  3. The problem of changing the length of an existing record can be handled by deleting the old version of the record and adding a new version having a different length.
  4. I am indebted for this extremely valuable algorithm to its inventor, Henry Beitz, who generously shared it with me in the mid-1970's.
  5. In general, I use the terms "quantum" and "block" interchangeably; in the few cases where a distinction is needed, I will note it explicitly.
  6. In the current implementation, the default block size is 16K. However, it is easy to change that size in order to be able to handle larger individual items or to increase storage efficiency.
  7. Some blocks used to store internal tables such as the free space list are not divided into items, but consist of an array of fixed-size elements, sometimes preceded by a header structure describing the array.
  8. For simplicity, in our sample application each user record is stored in one item; however, since any one item must fit within a quantum, applications dealing with records that might exceed the size of a quantum should store each potentially lengthy field as a separate item.
  9. Four of these bytes are used to hold the index in the IRA to which this item corresponds; this information would be very helpful in reconstructing as much as possible of the file if it should become corrupted. Another two bytes are used to keep track of the type of the item. These entries are for error trapping and file reconstruction if the file should somehow become corrupted.
  10. Of course, this assumes that we have set the parameters of the quantum file to values that allow us to expand the file to a size large enough to hold that much data. The header file "blocki.h" contains the constants BlockSize and MaxFileQuantumCount, which together determine the maximum size of a quantum file. The beginning of that file also contains a number of other constants and structures related to this issue; you should be able to modify the capacity and space efficiency of a quantum file fairly easily after examining that header file.
  11. The limit is 256 so that an object number can fit into one byte; this reduces the size of the free space list, as we will see later.
  12. By the way, there is a possible optimization that could be employed here: sorting the buffers to be rewritten to the disk in order of their quantum numbers (i.e., their positions in the file). This could improve performance in systems where the hard disk controller doesn't already provide this service; however, most (if not all) modern disk systems take care of this for us, so sorting by quantum number would not provide any benefit.
  13. The second edition also included this C implementation along with an earlier, less capable, version of the C++ implementation we're examining here.
  14. Another restriction in C++ operator overloading is that it's impossible to make up your own operators. According to Bjarne Stroustrup, this facility has been carefully considered by the standards committee and has failed of adoption due to difficulties with operator precedence and binding strength. Apparently, it was the exponentiation operator that was the deciding factor; it's the first operator that users from the numerical community usually want to define, but its mathematical properties don't match the precedence and binding rules of any of the "normal" C++ operators.
  15. I'm not claiming this is a good use for overloading; it's only for tutorial purposes.
  16. Warning: do not compile and execute the program in Figure
  17. overload2. Although it will compile, it reads from random locations, which may cause a core dump on some systems.
  18. By the way, this isn't just a theoretical problem: it happened to me during the development of this program.
  19. This is an example of the "handle/body" class paradigm, described in Advanced C++: Programming Styles and Idioms, by James O. Coplien (Addison-Wesley Publishing Company, Reading, Massachusetts, 1992). Warning: as its title indicates, this is not an easy book; however, it does reward careful study by those who already have a solid grasp of C++ fundamentals. For a kinder, gentler introduction to several advanced C++ idioms of wide applicability, see my Who's Afraid of More C++? (AP Professional, San Diego, California, 1998).
  20. In order to solve this problem in a more general way, the ANSI standards committee for C++ has approved the addition of "namespaces", which allow the programmer to specify the library from which one or more functions are to be taken in order to prevent name conflicts.
  21. Of course, there are other ways to accomplish the goal of protecting the class user from concern about internals of a given class, as we've discussed briefly in the sections titled "Data Hiding" and "Function Hiding".
  22. If we had any functions that could change the contents of a shared object, they would also have to be modified to prevent undesirable interactions between "separate" handle objects that share data. However, we don't have any such functions in this case.
  23. Actually, the reference count should never be less than 0, but I'm engaging in some defensive programming here.
  24. To reduce the length of the function names in this class, I'm going to omit the MainObjectArrayPtr qualifier at the beginning of those names.
  25. We'll see exactly how this block access works when we cover the MainObjectBlock class, but for now it's sufficient to note that the main object array is potentially divided into blocks which are accessed via the standard virtual memory system.
  26. Actually, the name of the "lowest free object" variable should be something like m_StartLookingHere, but I doubt it will cause you too much confusion after you see how it is used.
  27. If a preprocessor variable called DEBUG is defined, then the action of this macro will be to terminate the program if the condition is not met; otherwise, it will do nothing. You can find the implementation of this macro and its underlying function in qfassert.h and qfassert.cpp.
  28. We often will step through an array assigning values to each element in turn, for example when importing data from an ASCII file; since we are not modifying previously stored values, the quantum we used last is the most likely to have room to add another item. In such a case, the most recently written-to quantum is half full on the average; this makes it a good place to look for some free space. In addition, it is very likely to be already in memory, so we won't have to do any disk accesses to get at it.
  29. By the way, this function was originally named GetFreeSpace, and its return type was called FreeSpaceEntry, but I had to change the name of the return type to avoid a conflict with a name that Microsoft had used once upon a time in their MFC classes and still had some claim on; I changed the name of the function to match. This is a good illustration of the need to avoid polluting the global name space; using the namespace construct in the new C++ standard would be a good solution to such a problem.
  30. The alert reader will notice that the type of the NewBigPointerBlock variable is BigPointerBlockPtr, not BigPointerBlock. However, the functions that we call through that variable via the operator-> are from the BigPointerBlock class, because that is the type of the pointer that operator-> returns, as explained in the section on overloading operator->.
  31. Another possible use is to implement variant arrays, in which the structure of the array is variable. In that case, we might use the type to determine whether we have the item we want or some kind of intermediate structure requiring further processing to extract the actual data.
  32. The reason we mark this quantum as being full is twofold: first, there won't be very much (if any) space left in this quantum after we have added the little pointer array; and second, we don't want to store any actual data for our new main object in the same quantum as we are using for a section of the little pointer array, to make the reconstruction of a partially corrupted file easier.
  33. There's one exception to this rule, for reasons described above: if the user specifies 0 elements, I change it to 1.
  34. It would probably have been better to create a class to contain this function as well as a few others that are global in this implementation.
  35. These are FreeSpaceBlockPtr, MainObjectBlockPtr, BigPointerBlockPtr, LittlePointerBlockPtr, and LeafBlockPtr.
  36. Of course, while stepping through the program at human speeds in Turbo Debugger, the timestamps were nicely distributed; this is a demonstration of Heisenberg's Uncertainty Principle as it applies to debugging.
  37. In a 32-bit implementation, it's entirely possible that the counter will never turn over, as its maximum value is more than four billion. However, if you let the program run long enough, eventually that will occur; at full tilt, such an event might take a few months.
  38. At least, it's the smallest interface for a class that actually does anything. As we'll see, this program contains one class that doesn't contribute either data or functions. I'll explain why that is when we get to it.
  39. For a detailed example of how this works, see the section entitled "Polite Pointing".
  40. I'm not going to prefix the name of each embedded function or class with the name of its enclosing class, to make the explanations shorter; I'll just include it in the title of the main section discussing the class.
  41. I'll cover this and the other global functions in the next chapter.
  42. It is important to remember that the last item in a quantum is actually at the lowest address of any item in that quantum, because items are stored in the quantum starting from the end and working back toward the beginning of the quantum.
  43. We can't delete unused item index entries that aren't at the end of the index because that would change the item numbers of the following items, rendering them inaccessible.
  44. Of course, in the real program, we will find the IRA by looking it up in the main object index, but that detail is irrelevant to the current discussion of deleting an element.
  45. The statement that calculates the value we should return for an empty quantum may be a bit puzzling. The reason it works as it does is that the routine that looks for an empty block compares the available free space code to a specific value, namely AvailableQuantum. If we did not return a value that corresponded to that free space code, a quantum that was ever used for anything would never be considered empty again.
  46. It is important to note that the number of elements that I'm referring to here is not the number of elements in the big pointer array (i.e., the number of quanta that the small pointer array occupy) but the actual number of elements in the array itself (i.e., the number of data items that the user has stored in the array).
  47. Note that we have to remember to set the modified flag for the big pointer array quantum whenever we change a value in the big pointer array header or the big pointer array itself, so that these changes will be reflected on the disk rather than being lost.
  48. Although we will be unable to go into the details of the implementation of the AccessVector type, it is defined in the header file vector.h; if you are familiar with C++ templates, I recommend that you read that header file to understand how this type actually works.
  49. The "last quantum added to" variable is stored in the big pointer quantum. When first writing the code to update that variable, I had forgotten to update the "modified" flag for the big pointer quantum when the variable actually changed. As a result, every time the buffer used for the big pointer quantum was reused for a new block, that buffer was overwritten without being written out to the disk. When the big pointer quantum was reloaded, the "last quantum added to" variable was reset to an earlier, obsolete value, with the result that a new quantum was started unnecessarily. This error caused the file to grow very rapidly.
  50. For a similar reason, if we were adding an item to a large object with many little pointer arrays, each of which contained only a few distinct quantum number references, we wouldn't be gathering information about very much of the total storage taken up by the object; we might very well start a new quantum when there was plenty of space in another quantum owned by this object.
  51. In order to reduce the size of the free space list entries, I've also reduced the number of objects in the main object list to 256, so that an object number will fit in one byte.
  52. Another possible optimization would be of use when we are loading a large number of items into a quantum file; in that case, we could just start a new quantum whenever we run out of space in the "last added to" quantum. This would eliminate the search of the free space list entirely and might be quite effective in decreasing the time needed to do such a mass load. Of course, this solution would require some way for the class user to inform the quantum file system that a mass load is in progress, as well as when it is over, so that the default behavior could be reestablished.
  53. And they may even work properly after you compile them. Unfortunately, I discovered shortly before sending this book to the publisher that the mailing list program in Chapter
  54. mail.htm doesn't work when compiled under DJGPP 2.8.0, although it runs perfectly when compiled under VC++ 5.0. Apparently the similarity among compilers isn't quite as great as I thought it was.

Comment on this book!


Return to the table of contents

Return to my main page