Optimizing C++ - the WWW version
ISBN: 0-13-977430-0
Copyright 1999 by Prentice-Hall PTR
Copyright 2000 by Chrysalis Software Corporation
Mozart, No. Would You Believe Gershwin?
Introduction
In this final chapter we will summarize the characteristics of the
algorithms we have encountered in previous chapters (Figures ioopt-processoropt),
discuss the future of the art of optimization, and examine approaches
to the problems posed in previous chapters.
Summary of Characteristics
Characteristics of file access time reduction techniques (Figure ioopt)
- Standard disk-based hashing [1]
(Chapter superm.htm)
- Excellent time efficiency
- Extra storage needed to handle collisions (usually 25% extra
is enough)
- Appropriate for tables of a few hundred or more entries
- Data can be accessed by a unique key which can be assigned
arbitrarily
- Dynamic hashing (Chapter dynhash.htm)
- Excellent time efficiency
- Table expands as needed
- Appropriate for tables of almost unlimited size
- Data can be accessed by a unique key which can be assigned
arbitrarily
- Caching [2] (Chapter superm.htm)
- Excellent time efficiency
- Memory utilization can be adjusted according to availability
- Useful when the same data are read repeatedly
Characteristics of the quantum file access method
(Figure quantumfile)
Characteristics of data compression techniques
(Figure datacomp)
- Radix40 (Chapters prologue.htm
and superm.htm)
- Predictable output size
- Good time efficiency
- Moderate compression ratio: output size = .67 * input size
- Character set limited to 40 distinct characters
- BCD (Chapter superm.htm)
- Predictable output size
- Good time efficiency
- Good compression ratio: output size = .5 * input size
- Character set limited to 16 distinct characters
- Bitmaps (Chapter mail.htm)
- Predictable output size
- Good time efficiency
- Excellent compression ratio: output size = .125 * input size
- Character set limited to two distinct characters
- Arithmetic coding (Chapter compress.htm)
- Unpredictable output size
- Fair time efficiency
- Very good compression ratio: output size typically ranges
from .3 to .5 * input size
- Character set not artificially restricted
Characteristics of processor time reduction techniques (Figure processoropt)
- Hash coding (Chapter superm.htm)
- See entry in Figure ioopt
- Lookup tables (Chapters prologue.htm
and compress.htm)
- Excellent time efficiency
- May use somewhat more memory than searching a list
- Appropriate for tables of a few hundred or thousand items
- Data can be accessed only by a unique index number, not an
arbitrary key
- The distribution counting sort (Chapters mail.htm
and zensort.htm)
- Excellent time efficiency, proportional to number of keys
- Predictable timing
- Extra memory needed for copy of pointer or data, depending on
implementation
- Appropriate for sorting a few hundred or more keys
- Each type of key to be sorted must be handled separately
- Caching (Chapter superm.htm)
- See entry in Figure ioopt
Some Thoughts on the Future
Although many managers might wish it otherwise, the writing of
efficient computer programs will always be an art, not a science. This
may appear to be a rash statement, but I think time will prove me
correct. Of course, if I am wrong, I am following a rich tradition:
according to (Arthur C.) Clarke's Law, if a distinguished elderly
scientist [defined as over 30 years old] says that something is
possible, he is almost always right; if he says that something is
impossible, he is almost always wrong.[3]
However, I feel I am on reasonably firm ground. The main reason is
that, as demonstrated by Goedel's Proof, no automatic procedure can
discover all of the true statements in a consistent axiomatic system
such as the behavior of a programmable computer.[4]
This implies that the development of new algorithms will always be an
adventure. A fine example of this is the optimization in Chapter compress.htm, in which I discovered that I
could replace 16-bit occurrence counts with 4-bit indexes and actually
speed up the operation of the program at the same time; I'd like to see
the optimizing compiler that could figure that out!
Obviously, such unpredictability can be quite expensive. While
other engineering professionals, such as structural engineers,
routinely estimate the time and money required for their projects to
within a few percent, software engineers are not nearly so accurate. In
fact, overruns of hundreds of percent are not uncommon. Why is this so?
Why Johnny Can't Estimate
The answer is quite simple: once a structural engineer has built a
dozen bridges (for example), the thirteenth is mostly an application of
the knowledge he has already acquired. Of course, this is not to say
that the expertise of the engineer is unimportant; far from it.
However, experience is a great teacher, and experience gained in such a
specialty is extremely valuable.
We software engineers, however, face a fundamentally different
situation: once we have solved a given problem, we should never need to
solve it again, as we can reuse the solution already developed.
Therefore, in our development projects, we are always venturing into
unexplored territory, and we all know how to identify pioneers.[5]
Actually, I find it enjoyable to have a new
set of challenges with each new project. We should take pride in our
work; it is fairly clear to me, at least, that large software systems
are the most complex of human creations. They combine the
characteristics of art and engineering: first, we conceive of and build
something out of sheer thought, the most evanescent material any artist
has ever used, and then it actually performs a useful function![6]
Goodbye for Now
We have reached the end of this book. I hope you have learned as
much from reading this volume as I have while writing it.
Suggested Approaches to Problems
Chapter superm.htm
- Modifications to add capabilities to the program:
- To delete records, we need to invent another flag value (for
example, 14 or 0xe for the first digit) indicating that the record has
been used in the past, but currently is unused. This allows us to
continue looking for a record that might be after the current position
in the file; however, we should remember the position of this
"previously owned" record, so that if the record we are seeking turns
out not to be in the file, we can reuse this record rather than taking
up an unused one. Of course, this solution will not prevent the file
from gradually being filled up with previously used records, which will
result in slower and slower access times, particularly when looking for
records which are not in the file; clearly, if every record has been
used at least once, we would have to search the entire file to
determine that a particular record is not in the file. This can be
taken care of by periodic maintenance such as that described in the
answer to the next question.
- In order to handle the case of a full file, or a file which
has become nearly filled with deleted records, we can copy the records
in the file to a larger one. This requires opening both the old file
and a new one, with the initialization of the new file set to create a
larger file. Then we read every valid record in the old file and write
it to the new one; since we don't copy any deleted records, the new
file starts out with a clean slate. This situation illustrates one
reason for using a PriceFile structure to keep track of the data for
each file; that structure is what allows us to have both the new and
old files open at once.
- Keeping track of the inventory of each item requires adding an
inventory field to the ItemRecord structure and updating that field
whenever items are sold or added to inventory.
- Hash coding could be applied to tables in memory in much the same
way as we have applied it to disk files: the main difference is that
since memory is almost a true random-access device, we could use a
different method of selecting which slot to use for overflow records,
rather than having to use the next sequential slot.[7]
- If we wish to reduce the time needed to look up an entry in a
table in memory by a key, we can maintain a cache of the most recently
seen entries, employing a hash code based on the key to be looked up.
Each time we find an entry, we store its key and record number in the
cache entry corresponding to its hash code. Then, when we want to find
a record, the first place we look is in the cache entry corresponding
to its key's hash code. If the key in that cache entry matches the key
of the record we are looking for, we use the record number in the cache
entry to access the record; if the key doesn't match, we have to search
the table as we otherwise would have to do anyway. Since the amount of
time needed to check the cache entry is small compared to the time
needed to search the table, we should come out ahead in most cases.
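Here is a minimal sketch of the caching approach described in the last
item above, applied to an in-memory table. The class and function names,
the direct-mapped layout (exactly one cache entry per hash code), and the
linear search used as the fallback are assumptions made for illustration,
not code from the book:

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <vector>

    // A record type assumed for illustration; the real table could hold
    // anything that is looked up by a string key.
    struct Record {
        std::string key;
        // ... other fields ...
    };

    // Direct-mapped cache of the most recently seen (key, index) pairs.
    // Each key hashes to exactly one cache slot, which is overwritten
    // whenever a lookup for a different key lands in that slot.
    class TableCache {
    public:
        explicit TableCache(std::size_t slots = 256) : entries_(slots) {}

        // Look up 'key' in 'table', consulting the cache first. Returns
        // the index of the matching record, or -1 if it is not present.
        long find_record(const std::vector<Record>& table,
                         const std::string& key) {
            std::size_t slot = std::hash<std::string>{}(key) % entries_.size();
            Entry& e = entries_[slot];

            // Cache hit: the stored key matches, so go straight to the record.
            if (e.valid && e.key == key)
                return e.index;

            // Cache miss: search the table as we would have to do anyway.
            for (std::size_t i = 0; i < table.size(); ++i) {
                if (table[i].key == key) {
                    e.key = key;                      // remember it for next time
                    e.index = static_cast<long>(i);
                    e.valid = true;
                    return e.index;
                }
            }
            return -1;  // not found; leave the cache entry alone
        }

    private:
        struct Entry {
            std::string key;
            long index = -1;
            bool valid = false;
        };
        std::vector<Entry> entries_;
    };

Because checking a single cache entry is cheap compared to searching the
whole table, this pays for itself whenever the same keys are looked up
repeatedly.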
Chapter mail.htm
- One way to produce output in descending rather than ascending
order is to subtract each character from 255 before using it as an
index.
- In order to make only one pass through the data for each
character position, rather than two as at present, we have to add the
pointer for each key to the end of a linked list for characters having
the ASCII code of the character at the current position in the key. For
example, if the current character in the key we are processing is 'A',
then the pointer to that key would be added to the 'A' list; similarly
for 'B', 'C', and any other character that might occur in the key. When
the pointers for all the keys have been added to the appropriate lists,
the lists are concatenated into one big list, with the 'B' list
attached to the end of the 'A' list, the 'C' list attached to the end
of the 'B' list, and so on. Then the process starts again for the next
character position. (A sketch of this single-pass bucketing appears
after this list.)
- In order to sort integer or floating-point data rather than
character data, we have to know the internal structure of the data. For
example, on the Intel 80x86 machines, the low byte of a two-byte
unsigned value has a lower memory address than the high byte. We can
view this as a two-byte key, with the less significant part lower in
memory. Since we have to sort by the less significant portion first,
our outer loop index must count up from 0 to 1, rather than the other
way around as it would with a two-byte string. Of course, on a Motorola
or other "high end first" machine, such unsigned values would be sorted
in the same order as strings. Signed values pose additional difficulty,
as the high-order byte of a negative value has a greater character
value than the high-order byte of a positive value: one way around this
is to flip the sign (high-order) bit of each key before the sort and
flip it back afterward, so that the two's complement values sort
correctly as unsigned quantities. Applying the
distribution sort or any other noncomparison sort to floating-point
values is even more complex, as the internal representation of a
floating-point value is composed of several parts with differing
functions; for this reason, the algorithm must be tailored to the
individual circumstances. However, the performance of the resulting
algorithm makes this effort worthwhile if the number of values to be
sorted is reasonably large.
- Probably the easiest way to handle variable-length strings as
keys is to make one pass through the key array, calling strlen to
identify the length of each key, and then saving the results in a
temporary array. Then, whenever we are ready to extract a character
from the key, we first determine whether we are past the end of the
key, in which case we substitute a null byte.
- Another possible way to sort variable-length strings is to
ignore the fact that they differ in length. Since a C string is
normally terminated by a null byte, two strings that are identical up
to the end of the shorter one will sort correctly if both are treated
as having the length of the longer one; the null byte will make the
shorter one sort before the longer one. However, this approach will not
work if you are using a protected-mode operating system and you try to
access bytes following the end of a short string when these bytes have
not been allocated to your program; you will get a protection
violation. Another drawback of this approach is that the garbage bytes
after the end of a string that is shorter than the key being sorted can
cause identical keys to be rearranged, even though they should remain
in the same relative position in the file. This may cause problems in
applications that rely on identical keys keeping their relative order
(that is, on the sort being stable).
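As a sketch of the single-pass bucketing described in the second item
above, the function below performs one pass for a single character
position, using linked lists that are spliced together in character
order. The function name, the use of std::list, and the index-based
representation of the keys are illustrative assumptions:

    #include <array>
    #include <cstddef>
    #include <list>
    #include <string>
    #include <vector>

    // One pass: append each key's index to the list for the character
    // code found at 'position', then concatenate the 256 lists.
    std::list<std::size_t> bucket_pass(const std::vector<std::string>& keys,
                                       const std::list<std::size_t>& order,
                                       std::size_t position) {
        std::array<std::list<std::size_t>, 256> lists;

        // Distribute: add each key to the list for its character's code.
        for (std::size_t idx : order) {
            const std::string& key = keys[idx];
            unsigned char c = position < key.size()
                                  ? static_cast<unsigned char>(key[position])
                                  : 0;  // treat short keys as null-padded
            lists[c].push_back(idx);
        }

        // Concatenate: attach the 'B' list to the end of the 'A' list,
        // the 'C' list to the end of that, and so on.
        std::list<std::size_t> result;
        for (auto& l : lists)
            result.splice(result.end(), l);  // constant time per list
        return result;
    }

Calling this pass once for each character position, from the last
position down to the first, leaves the order sorted on the full key,
just as in the two-pass version.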
Chapter compress.htm
- In order to use arithmetic coding where each of a number of
blocks of text must be decompressed without reference to the other
blocks, we must generate a table of probabilities by analyzing a file
or files which have characteristics similar to those to be compressed.
Then, rather than compute the probabilities on the fly when compressing
or decompressing one of our text blocks, we use the precomputed
probability table. This also increases the speed of compressing and
decompressing, as the table does not need to be updated. (A sketch of
building such a table appears after this list.)
- If the data to be compressed consists entirely of characters with
ASCII codes 0 through 127 (or any other subset of the ASCII range), we
can eliminate a significant amount of the storage for the probability
tables by allocating only those tables needed to encode or decode the
characters that can occur in the data. In the example given, we need
only 128 tables of 128 elements, rather than 256 tables of 256
elements; the resulting memory requirement is approximately 8 Kbytes
rather than approximately 32 Kbytes, saving about 24 Kbytes.
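The table-building step described in the first item above might look
something like the sketch below, which counts digrams (pairs of adjacent
characters) in a representative "training" file; those counts can then
be converted once into the fixed probability tables used by both the
compressor and the decompressor. The flat 256-by-256 layout and the
function name are assumptions of this sketch, not the book's code:

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // One row of counts per preceding character, one column per
    // current character.
    std::vector<unsigned long> build_frequency_table(const char* training_path) {
        std::vector<unsigned long> counts(256u * 256u, 0);

        std::FILE* f = std::fopen(training_path, "rb");
        if (!f)
            return counts;  // caller should treat an all-zero table as "no model"

        int prev = 0;  // previous character; 0 serves as the initial context
        int ch;
        while ((ch = std::getc(f)) != EOF) {
            ++counts[static_cast<std::size_t>(prev) * 256u
                     + static_cast<std::size_t>(ch)];
            prev = ch;
        }
        std::fclose(f);
        return counts;
    }

Because both sides load the same precomputed table, no model updates are
needed while a block is being processed, which is what allows each block
to be decompressed independently and speeds up both operations.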
Chapter quantum.htm
- To provide a "mass load mode", you'll have to add a function to
change the behavior of PutElement, so that it will start a new block
when the current "last added to" block doesn't have enough space. Of
course, you'll also have to add a complementary function to reset its
behavior when you're done with the mass load.
- The directory utility will have to be able to add and delete
array names randomly. A hash-coded lookup would probably be the best
solution for large numbers of arrays, although this is not critical in
the current implementation, which is limited to 256 arrays per file. (A
sketch of such a lookup appears after this list.)
- The QFIX program has the following general description. The first
phase starts from the main object index, checking the validity of each
main object. The big pointer array quantum is checked to see that it
belongs to the object in question. Then each of the small pointer
arrays is similarly checked, and then each small pointer array entry is
checked as well, along with the element to which it points. If all is
well so far, the next main object is checked in the same manner. Any
object that checks out completely is added to a list of valid objects;
all other objects are noted as being incomplete or erroneous.
- When all entries in the main object table have been checked, we
report the results to the user. Assuming that the user wants us to
rebuild the file, we start phase two. After opening a new quantum file
where we will store the rebuilt file, we examine each quantum
containing actual user data, skipping any quanta containing the main
object table, big pointer arrays, and little pointer arrays. For each
such user-data quantum, we extract the object number to which that
quantum belongs, and add the elements in that quantum to the
appropriate element of the corresponding object in the new quantum
file, using the index number of each element as stored with the
element.
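For the directory utility mentioned in the second item above, a
hash-coded lookup can be as simple as a standard hash table keyed by
array name. The class and member names below, and the choice of
std::unordered_map, are assumptions for illustration only:

    #include <string>
    #include <unordered_map>

    // A minimal hash-coded directory mapping array names to main object
    // indexes. The 256-array limit mirrors the current implementation.
    class ArrayDirectory {
    public:
        // Returns false if the name already exists or the directory is full.
        bool add(const std::string& name, int object_index) {
            if (by_name_.size() >= 256)
                return false;
            return by_name_.emplace(name, object_index).second;
        }

        bool remove(const std::string& name) {
            return by_name_.erase(name) != 0;
        }

        // Returns the object index for 'name', or -1 if it is not present.
        int find(const std::string& name) const {
            auto it = by_name_.find(name);
            return it == by_name_.end() ? -1 : it->second;
        }

    private:
        std::unordered_map<std::string, int> by_name_;
    };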
Chapter dynhash.htm
- Modifications to add capabilities to the dynamic hashing
implementation:
- Deleting records is almost exactly the inverse of adding
records. Every time the number of records decreases by the average
number of records per slot, we take the storage element from the
highest-numbered slot and combine it with the one from its buddy slot,
placing the result back in the buddy slot. Then we decrease the active
slot count by one. If the active slot count becomes equal to one-half
of the allocated slot count, we halve the allocated count.
- The main problem with duplicate keys is that since records
with the same key will always wind up in the same storage element, they
make it imperative to handle the overflow problem. Once that is solved,
handling duplicate keys is easy: once you have located all storage
elements that might contain records with that key, search each of them
to the end, rather than stopping when you get a match (as sketched after
this list). Of course, you then have to decide which record you want to
use, but that's an application-specific problem.
- This is a wonderful application for templates. Even the name of
the class is intended to suggest that it could very easily be changed
to PersistentArray<type>.
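The exhaustive search for duplicate keys mentioned above can be sketched
as follows; the record and storage-element types are placeholders, since
the real dynamic hashing implementation stores its records differently:

    #include <string>
    #include <vector>

    struct HashRecord {
        std::string key;
        std::string data;
    };
    using StorageElement = std::vector<HashRecord>;

    // Collect every record whose key matches, rather than stopping at the
    // first match. The caller passes all storage elements that could
    // contain records with this key (the element for the key's slot plus
    // any overflow elements it has located).
    std::vector<HashRecord> find_all(
        const std::vector<const StorageElement*>& elements,
        const std::string& key) {
        std::vector<HashRecord> matches;
        for (const StorageElement* element : elements)
            for (const HashRecord& rec : *element)  // search each one to the end
                if (rec.key == key)
                    matches.push_back(rec);
        return matches;
    }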
Chapter zensort.htm
- One fairly obvious way to handle the problem of improper key
distributions for the "divide-and-conquer" versions of the program is
to use the information gained in the first pass to determine which
algorithm to use. If the keys are reasonably uniformly distributed,
then the "divide-and-conquer" version will have the best performance
and should be used; however, if the keys are not uniformly distributed,
the program should fall back to an earlier version of the algorithm
that will work properly and efficiently with a skewed distribution of
keys.
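A minimal sketch of the distribution test described above: the counts
gathered in the first pass are examined for skew, and the result decides
which version of the sort to run. The skew measure (largest bucket versus
the average nonempty bucket) and the threshold are arbitrary choices made
for illustration:

    #include <array>
    #include <cstddef>
    #include <numeric>

    // Returns true if the keys look uniform enough for the
    // divide-and-conquer version of the sort.
    bool keys_are_roughly_uniform(const std::array<std::size_t, 256>& counts,
                                  double max_skew = 4.0) {
        std::size_t total =
            std::accumulate(counts.begin(), counts.end(), std::size_t{0});
        if (total == 0)
            return true;

        std::size_t largest = 0;
        std::size_t nonempty = 0;
        for (std::size_t c : counts) {
            if (c > largest)
                largest = c;
            if (c > 0)
                ++nonempty;
        }

        double average = static_cast<double>(total)
                         / static_cast<double>(nonempty);
        return static_cast<double>(largest) <= max_skew * average;
    }

The sort driver can use the divide-and-conquer version when this returns
true and fall back to the earlier, distribution-insensitive version
otherwise, without needing an extra scan of the keys.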
Footnotes
1. Hash coding can also be used to eliminate binary or
linear searches through tables in memory.
2. Caching can also be used to reduce processor time
during searches of tables in memory, as the most recently or most
frequently referenced items from the table can be kept in a cache.
3. A remarkable example of such hyperconservatism is
the following statement, from an otherwise fine book by Edward Yourdon:
"As a programmer, you will always be working for an employer.
Unless you are very rich and very eccentric, you will not enjoy the
luxury of having a computer in your own home...". (emphasis in the
original) Yourdon, E. Techniques of Program Structure and Design.
Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1975, pp. 2-3. This
clearly ranks with the worst predictions of all time.
4. An excellent explanation of this important
mathematical discovery can be found in Douglas Hofstadter's book Goedel,
Escher, Bach. New York: Vintage Books, 1979.
5. They're the ones with the arrows in their backs.
6. Interestingly enough, the previously cited
passage in Yourdon's book goes on to say: "In short, you cannot be an
artist, separate and aloof; it is highly unlikely that the programming
community will ever generate a Michaelangelo or a Rembrandt". This is
quite a difficult standard to meet: how many artists of this stature
have come from any field, including sculpture and painting? While the
jury is still out on this prediction, I think we will see a great
artist of programming one of these days, although his or her work might
be inaccessible to the lay public.
7. Why is memory only "almost" a true random access
device? Because the processor is much faster than the main memory of
the computer; therefore, frequently used memory locations are copied
into much faster memory either on or very close to the processor
itself. This means that accessing locations in widely scattered parts
of memory is likely to be noticeably slower than sequential access to
memory. However, the difference in speed may be a factor of three or
four rather than the relative speed difference of hundreds or thousands
of times that we see between sequential and random disk accesses, so
optimizing memory accesses is not as great a concern to us as the
latter problem.