Optimizing C++ - the WWW version
ISBN: 0-13-977430-0
Copyright 1999 by Prentice-Hall PTR
Copyright 2000 by Chrysalis Software Corporation
Mozart, No. Would You Believe Gershwin?
Introduction
In this final chapter we will summarize the characteristics of the
algorithms we have encountered in previous chapters (Figures ioopt-processoropt),
discuss the future of the art of optimization, and examine approaches
to the problems posed in previous chapters.
Summary of Characteristics
Characteristics of file access time reduction techniques (Figure ioopt)
- Standard disk-based hashing [1]
(Chapter superm.htm)
- Excellent time efficiency
- Extra storage needed to handle collisions (usually 25% extra
is enough)
- Appropriate for tables of a few hundred or more entries
- Data can be accessed by a unique key which can be assigned
arbitrarily
- Dynamic hashing (Chapter dynhash.htm)
- Excellent time efficiency
- Table expands as needed
- Appropriate for tables of almost unlimited size
- Data can be accessed by a unique key which can be assigned
arbitrarily
- Caching [2] (Chapter superm.htm)
- Excellent time efficiency
- Memory utilization can be adjusted according to availability
- Useful when the same data are read repeatedly
Characteristics of the quantum file access method
(Figure quantumfile)
Characteristics of data compression techniques
(Figure datacomp)
- Radix40 (Chapters prologue.htm
and superm.htm)
- Predictable output size
- Good time efficiency
- Moderate compression ratio: output size = .67 * input size
- Character set limited to 40 distinct characters
- BCD (Chapter superm.htm)
- Predictable output size
- Good time efficiency
- Good compression ratio: output size = .5 * input size
- Character set limited to 16 distinct characters
- Bitmaps (Chapter mail.htm)
- Predictable output size
- Good time efficiency
- Excellent compression ratio: output size = .125 * input size
- Character set limited to two distinct characters
- Arithmetic coding (Chapter compress.htm)
- Unpredictable output size
- Fair time efficiency
- Very good compression ratio: output size typically ranges
from .3 to .5 * input size
- Character set not artificially restricted
Characteristics of processor time reduction techniques (Figure processoropt)
- Hash coding (Chapter superm.htm)
- See entry in Figure ioopt
- Lookup tables (Chapters prologue.htm
and compress.htm)
- Excellent time efficiency
- May use somewhat more memory than searching a list
- Appropriate for tables of a few hundred or thousand items
- Data can be accessed only by a unique index number, not an
arbitrary key
- The distribution counting sort (Chapters mail.htm
and zensort.htm)
- Excellent time efficiency, proportional to number of keys
- Predictable timing
- Extra memory needed for copy of pointer or data, depending on
implementation
- Appropriate for sorting a few hundred or more keys
- Each type of key to be sorted must be handled separately
- Caching (Chapter superm.htm)
- See entry in Figure ioopt
Some Thoughts on the Future
Although many managers might wish it otherwise, the writing of
efficient computer programs will always be an art, not a science. This
may appear to be a rash statement, but I think time will prove me
correct. Of course, if I am wrong, I am following a rich tradition:
according to (Arthur C.) Clarke's Law, if a distinguished elderly
scientist [defined as over 30 years old] says that something is
possible, he is almost always right; if he says that something is
impossible, he is almost always wrong.[3]
However, I feel I am on reasonably firm ground. The main reason is
that, as demonstrated by Goedel's Proof, no automatic procedure can
discover all of the true statements in a consistent axiomatic system
such as the behavior of a programmable computer.[4]
This implies that the development of new algorithms will always be an
adventure. A fine example of this is the optimization in Chapter compress.htm, in which I discovered that I
could replace 16-bit occurrence counts with 4-bit indexes and actually
speed up the operation of the program at the same time; I'd like to see
the optimizing compiler that could figure that out!
Obviously, such unpredictability can be quite expensive. While
other engineering professionals, such as structural engineers,
routinely estimate the time and money required for their projects to
within a few percent, software engineers are not nearly so accurate. In
fact, overruns of hundreds of percent are not uncommon. Why is this so?
Why Johnny Can't Estimate
The answer is quite simple: once a structural engineer has built a
dozen bridges (for example), the thirteenth is mostly an application of
the knowledge he has already acquired. Of course, this is not to say
that the expertise of the engineer is unimportant; far from it.
However, experience is a great teacher, and experience gained in such a
specialty is extremely valuable.
We software engineers, however, face a fundamentally different
situation: once we have solved a given problem, we should never need to
solve it again, as we can reuse the solution already developed.
Therefore, in our development projects, we are always venturing into
unexplored territory, and we all know how to identify pioneers.[5]
Actually, I find it enjoyable to have a new
set of challenges with each new project. We should take pride in our
work; it is fairly clear to me, at least, that large software systems
are the most complex of human creations. They combine the
characteristics of art and engineering: first, we conceive of and build
something out of sheer thought, the most evanescent material any artist
has ever used, and then it actually performs a useful function![6]
Goodbye for Now
We have reached the end of this book. I hope you have learned as
much from reading this volume as I have while writing it.
Suggested Approaches to Problems
Chapter superm.htm
- Modifications to add capabilities to the program:
- To delete records, we need to invent another flag value (for
example, 14 or 0xe for the first digit) indicating that the record has
been used in the past, but currently is unused. This allows us to
continue looking for a record that might be after the current position
in the file; however, we should remember the position of this
"previously owned" record, so that if the record we are seeking turns
out not to be in the file, we can reuse this record rather than taking
up an unused one. Of course, this solution will not prevent the file
from gradually being filled up with previously used records, which will
result in slower and slower access times, particularly when looking for
records which are not in the file; clearly, if every record has been
used at least once, we would have to search the entire file to
determine that a particular record is not in the file. This can be
taken care of by periodic maintenance such as that described in the
answer to the next question.
- In order to handle the case of a full file, or a file which
has become nearly filled with deleted records, we can copy the records
in the file to a larger one. This requires opening both the old file
and a new one, with the initialization of the new file set to create a
larger file. Then we read every valid record in the old file and write
it to the new one; since we don't copy any deleted records, the new
file starts out with a clean slate. This situation illustrates one
reason for using a PriceFile structure to keep track of the data for
each file; that structure is what allows us to have both the new and
old files open at once.
- Keeping track of the inventory of each item requires adding an
inventory field to the ItemRecord structure and updating that field
whenever items are sold or added to inventory.
- Hash coding could be applied to tables in memory in much the same
way as we have applied it to disk files: the main difference is that
since memory is almost a true random-access device, we could use a
different method of selecting which slot to use for overflow records,
rather than having to use the next sequential slot.[7]
- If we wish to reduce the time needed to look up an entry in a
table in memory by a key, we can maintain a cache of the most recently
seen entries, employing a hash code based on the key to be looked up.
Each time we find an entry, we store its key and record number in the
cache entry corresponding to its hash code. Then, when we want to find
a record, the first place we look is in the cache entry corresponding
to its key's hash code. If the key in that cache entry matches the key
of the record we are looking for, we use the record number in the cache
entry to access the record; if the key doesn't match, we have to search
the table as we otherwise would have to do anyway. Since the amount of
time needed to check the cache entry is small compared to the time
needed to search the table, we should come out ahead in most cases.
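Here is a minimal sketch of the caching approach described in the last
item above, applied to an in-memory table. The class and function names,
the direct-mapped layout (exactly one cache entry per hash code), and the
linear search used as the fallback are assumptions made for illustration,
not code from the book:

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <vector>

    // A record type assumed for illustration; the real table could hold
    // anything that is looked up by a string key.
    struct Record {
        std::string key;
        // ... other fields ...
    };

    // Direct-mapped cache of the most recently seen (key, index) pairs.
    // Each key hashes to exactly one cache slot, which is overwritten
    // whenever a lookup for a different key lands in that slot.
    class TableCache {
    public:
        explicit TableCache(std::size_t slots = 256) : entries_(slots) {}

        // Look up 'key' in 'table', consulting the cache first. Returns
        // the index of the matching record, or -1 if it is not present.
        long find_record(const std::vector<Record>& table,
                         const std::string& key) {
            std::size_t slot = std::hash<std::string>{}(key) % entries_.size();
            Entry& e = entries_[slot];

            // Cache hit: the stored key matches, so go straight to the record.
            if (e.valid && e.key == key)
                return e.index;

            // Cache miss: search the table as we would have to do anyway.
            for (std::size_t i = 0; i < table.size(); ++i) {
                if (table[i].key == key) {
                    e.key = key;                      // remember it for next time
                    e.index = static_cast<long>(i);
                    e.valid = true;
                    return e.index;
                }
            }
            return -1;  // not found; leave the cache entry alone
        }

    private:
        struct Entry {
            std::string key;
            long index = -1;
            bool valid = false;
        };
        std::vector<Entry> entries_;
    };

Because checking a single cache entry is cheap compared to searching the
whole table, this pays for itself whenever the same keys are looked up
repeatedly.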
Chapter mail.htm
- One way to produce output in descending rather than ascending
order is to subtract each character from 255 before using it as an
index.
- In order to make only one pass through the data for each
character position, rather than two as at present, we have to add the
pointer for each key to the end of a linked list for characters having
the ASCII code of the character at the current position in the key. For
example, if the current character in the key we are processing is 'A',
then the pointer to that key would be added to the 'A' list; similarly
for 'B', 'C', and any other character that might occur in the key. When
the pointers for all the keys have been added to the appropriate lists,
the lists are concatenated into one big list, with the 'B' list
attached to the end of the 'A' list, the 'C' list attached to the end
of the 'B' list, and so on. Then the process starts again for the next
character position. (A sketch of this single-pass bucketing appears
after this list.)
- In order to sort integer or floating-point data rather than
character data, we have to know the internal structure of the data. For
example, on the Intel 80x86 machines, the low byte of a two-byte
unsigned value has a lower memory address than the high byte. We can
view this as a two-byte key, with the less significant part lower in
memory. Since we have to sort by the less significant portion first,
our outer loop index must count up from 0 to 1, rather than the other
way around as it would with a two-byte string. Of course, on a Motorola
or other "high end first" machine, such unsigned values would be sorted
in the same order as strings. Signed values pose additional difficulty,
as the high-order byte of a negative value has a greater character
value than the high-order byte of a positive value: one way around this
is to flip the sign (high-order) bit of each key before the sort and
flip it back afterward, so that the two's complement values sort
correctly as unsigned quantities. Applying the
distribution sort or any other noncomparison sort to floating-point
values is even more complex, as the internal representation of a
floating-point value is composed of several parts with differing
functions; for this reason, the algorithm must be tailored to the
individual circumstances. However, the performance of the resulting
algorithm makes this effort worthwhile if the number of values to be
sorted is reasonably large.
- Probably the easiest way to handle variable-length strings as
keys is to make one pass through the key array, calling strlen to
identify the length of each key, and then saving the results in a
temporary array. Then, whenever we are ready to extract a character
from the key, we first determine whether we are past the end of the
key, in which case we substitute a null byte.
- Another possible way to sort variable-length strings is to
ignore the fact that they differ in length. Since a C string is
normally terminated by a null byte, two strings that are identical up
to the end of the shorter one will sort correctly if both are treated
as having the length of the longer one; the null byte will make the
shorter one sort before the longer one. However, this approach will not
work if you are using a protected-mode operating system and you try to
access bytes following the end of a short string when these bytes have
not been allocated to your program; you will get a protection
violation. Another drawback of this approach is that the garbage bytes
after the end of a string that is shorter than the key being sorted can
cause identical keys to be rearranged, even though they should remain
in the same relative position in the file. This may cause problems in
applications that rely on identical keys keeping their relative order
(that is, on the sort being stable).
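As a sketch of the single-pass bucketing described in the second item
above, the function below performs one pass for a single character
position, using linked lists that are spliced together in character
order. The function name, the use of std::list, and the index-based
representation of the keys are illustrative assumptions:

    #include <array>
    #include <cstddef>
    #include <list>
    #include <string>
    #include <vector>

    // One pass: append each key's index to the list for the character
    // code found at 'position', then concatenate the 256 lists.
    std::list<std::size_t> bucket_pass(const std::vector<std::string>& keys,
                                       const std::list<std::size_t>& order,
                                       std::size_t position) {
        std::array<std::list<std::size_t>, 256> lists;

        // Distribute: add each key to the list for its character's code.
        for (std::size_t idx : order) {
            const std::string& key = keys[idx];
            unsigned char c = position < key.size()
                                  ? static_cast<unsigned char>(key[position])
                                  : 0;  // treat short keys as null-padded
            lists[c].push_back(idx);
        }

        // Concatenate: attach the 'B' list to the end of the 'A' list,
        // the 'C' list to the end of that, and so on.
        std::list<std::size_t> result;
        for (auto& l : lists)
            result.splice(result.end(), l);  // constant time per list
        return result;
    }

Calling this pass once for each character position, from the last
position down to the first, leaves the order sorted on the full key,
just as in the two-pass version.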
Chapter compress.htm
- In order to use arithmetic coding where each of a number of
blocks of text must be decompressed without reference to the other
blocks, we must generate a table of probabilities by analyzing a file
or files which have characteristics similar to those to be compressed.
Then, rather than compute the probabilities on the fly when compressing
or decompressing one of our text blocks, we use the precomputed
probability table. This also increases the speed of compressing and
decompressing, as the table does not need to be updated. (A sketch of
building such a table appears after this list.)
- If the data to be compressed consists entirely of characters with
ASCII codes 0 through 127 (or any other subset of the ASCII range), we
can eliminate a significant amount of the storage for the probability
tables by allocating only those tables needed to encode or decode the
characters that can occur in the data. In the example given, we need
only 128 tables of 128 elements, rather than 256 tables of 256
elements; the resulting memory requirement is approximately 8 Kbytes
rather than approximately 32 Kbytes, saving about 24 Kbytes.
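The table-building step described in the first item above might look
something like the sketch below, which counts digrams (pairs of adjacent
characters) in a representative "training" file; those counts can then
be converted once into the fixed probability tables used by both the
compressor and the decompressor. The flat 256-by-256 layout and the
function name are assumptions of this sketch, not the book's code:

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // One row of counts per preceding character, one column per
    // current character.
    std::vector<unsigned long> build_frequency_table(const char* training_path) {
        std::vector<unsigned long> counts(256u * 256u, 0);

        std::FILE* f = std::fopen(training_path, "rb");
        if (!f)
            return counts;  // caller should treat an all-zero table as "no model"

        int prev = 0;  // previous character; 0 serves as the initial context
        int ch;
        while ((ch = std::getc(f)) != EOF) {
            ++counts[static_cast<std::size_t>(prev) * 256u
                     + static_cast<std::size_t>(ch)];
            prev = ch;
        }
        std::fclose(f);
        return counts;
    }

Because both sides load the same precomputed table, no model updates are
needed while a block is being processed, which is what allows each block
to be decompressed independently and speeds up both operations.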
Chapter quantum.htm
- To provide a "mass load mode", you'll have to add a function to
change the behavior of PutElement, so that it will start a new block
when the current "last added to" block doesn't have enough space. Of
course, you'll also have to add a complementary function to reset its
behavior when you're done with the mass load.
- The directory utility will have to be able to add and delete
array names randomly. A hash-coded lookup would probably be the best
solution for large numbers of arrays, although this is not critical in
the current implementation, which is limited to 256 arrays per file. (A
sketch of such a lookup appears after this list.)
- The QFIX program has the following general description. The first
phase starts from the main object index, checking the validity of each
main object. The big pointer array quantum is checked to see that it
belongs to the object in question. Then each of the small pointer
arrays is similarly checked, and then each small pointer array entry is
checked as well, along with the element to which it points. If all is
well so far, the next main object is checked in the same manner. Any
object that checks out completely is added to a list of valid objects;
all other objects are noted as being incomplete or erroneous.
- When all entries in the main object table have been checked, we
report the results to the user. Assuming that the user wants us to
rebuild the file, we start phase two. After opening a new quantum file
where we will store the rebuilt file, we examine each quantum
containing actual user data, skipping any quanta containing the main
object table, big pointer arrays, and little pointer arrays. For each
such user-data quantum, we extract the object number to which that
quantum belongs, and add the elements in that quantum to the
appropriate element of the corresponding object in the new quantum
file, using the index number of each element as stored with the
element.
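For the directory utility mentioned in the second item above, a
hash-coded lookup can be as simple as a standard hash table keyed by
array name. The class and member names below, and the choice of
std::unordered_map, are assumptions for illustration only:

    #include <string>
    #include <unordered_map>

    // A minimal hash-coded directory mapping array names to main object
    // indexes. The 256-array limit mirrors the current implementation.
    class ArrayDirectory {
    public:
        // Returns false if the name already exists or the directory is full.
        bool add(const std::string& name, int object_index) {
            if (by_name_.size() >= 256)
                return false;
            return by_name_.emplace(name, object_index).second;
        }

        bool remove(const std::string& name) {
            return by_name_.erase(name) != 0;
        }

        // Returns the object index for 'name', or -1 if it is not present.
        int find(const std::string& name) const {
            auto it = by_name_.find(name);
            return it == by_name_.end() ? -1 : it->second;
        }

    private:
        std::unordered_map<std::string, int> by_name_;
    };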
Chapter dynhash.htm
- Modifications to add capabilities to the dynamic hashing
implementation:
- Deleting records is almost exactly the inverse of adding
records. Every time the number of records decreases by the average
number of records per slot, we take the storage element from the
highest-numbered slot and combine it with the one from its buddy slot,
placing the result back in the buddy slot. Then we decrease the active
slot count by one. If the active slot count becomes equal to one-half
of the allocated slot count, we halve the allocated count.
- The main problem with duplicate keys is that since records
with the same key will always wind up in the same storage element, they
make it imperative to handle the overflow problem. Once that is solved,
handling duplicate keys is easy: once you have located all storage
elements that might contain records with that key, search each of them
to the end, rather than stopping when you get a match (as sketched after
this list). Of course, you then have to decide which record you want to
use, but that's an application-specific problem.
- This is a wonderful application for templates. Even the name of
the class is intended to suggest that it could very easily be changed
to PersistentArray<type>.
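The exhaustive search for duplicate keys mentioned above can be sketched
as follows; the record and storage-element types are placeholders, since
the real dynamic hashing implementation stores its records differently:

    #include <string>
    #include <vector>

    struct HashRecord {
        std::string key;
        std::string data;
    };
    using StorageElement = std::vector<HashRecord>;

    // Collect every record whose key matches, rather than stopping at the
    // first match. The caller passes all storage elements that could
    // contain records with this key (the element for the key's slot plus
    // any overflow elements it has located).
    std::vector<HashRecord> find_all(
        const std::vector<const StorageElement*>& elements,
        const std::string& key) {
        std::vector<HashRecord> matches;
        for (const StorageElement* element : elements)
            for (const HashRecord& rec : *element)  // search each one to the end
                if (rec.key == key)
                    matches.push_back(rec);
        return matches;
    }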
Chapter zensort.htm
- One fairly obvious way to handle the problem of improper key
distributions for the "divide-and-conquer" versions of the program is
to use the information gained in the first pass to determine which
algorithm to use. If the keys are reasonably uniformly distributed,
then the "divide-and-conquer" version will have the best performance
and should be used; however, if the keys are not uniformly distributed,
the program should fall back to an earlier version of the algorithm
that will work properly and efficiently with a skewed distribution of
keys.
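A minimal sketch of the distribution test described above: the counts
gathered in the first pass are examined for skew, and the result decides
which version of the sort to run. The skew measure (largest bucket versus
the average nonempty bucket) and the threshold are arbitrary choices made
for illustration:

    #include <array>
    #include <cstddef>
    #include <numeric>

    // Returns true if the keys look uniform enough for the
    // divide-and-conquer version of the sort.
    bool keys_are_roughly_uniform(const std::array<std::size_t, 256>& counts,
                                  double max_skew = 4.0) {
        std::size_t total =
            std::accumulate(counts.begin(), counts.end(), std::size_t{0});
        if (total == 0)
            return true;

        std::size_t largest = 0;
        std::size_t nonempty = 0;
        for (std::size_t c : counts) {
            if (c > largest)
                largest = c;
            if (c > 0)
                ++nonempty;
        }

        double average = static_cast<double>(total)
                         / static_cast<double>(nonempty);
        return static_cast<double>(largest) <= max_skew * average;
    }

The sort driver can use the divide-and-conquer version when this returns
true and fall back to the earlier, distribution-insensitive version
otherwise, without needing an extra scan of the keys.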
Footnotes
1. Hash coding can also be used to eliminate binary or
linear searches through tables in memory.
2. Caching can also be used to reduce processor time
during searches of tables in memory, as the most recently or most
frequently referenced items from the table can be kept in a cache.
3. A remarkable example of such hyperconservatism is
the following statement, from an otherwise fine book by Edward Yourdon:
"As a programmer, you will always be working for an employer.
Unless you are very rich and very eccentric, you will not enjoy the
luxury of having a computer in your own home...". (emphasis in the
original) Yourdon, E. Techniques of Program Structure and Design.
Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1975, pp. 2-3. This
clearly ranks with the worst predictions of all time.
4. An excellent explanation of this important
mathematical discovery can be found in Douglas Hofstadter's book Goedel,
Escher, Bach. New York: Vintage Books, 1979.
5. They're the ones with the arrows in their backs.
6. Interestingly enough, the previously cited
passage in Yourdon's book goes on to say: "In short, you cannot be an
artist, separate and aloof; it is highly unlikely that the programming
community will ever generate a Michaelangelo or a Rembrandt". This is
quite a difficult standard to meet: how many artists of this stature
have come from any field, including sculpture and painting? While the
jury is still out on this prediction, I think we will see a great
artist of programming one of these days, although his or her work might
be inaccessible to the lay public.
7. Why is memory only "almost" a true random access
device? Because the processor is much faster than the main memory of
the computer; therefore, frequently used memory locations are copied
into much faster memory either on or very close to the processor
itself. This means that accessing locations in widely scattered parts
of memory is likely to be noticeably slower than sequential access to
memory. However, the difference in speed may be a factor of three or
four rather than the relative speed difference of hundreds or thousands
of times that we see between sequential and random disk accesses, so
optimizing memory accesses is not as great a concern to us as the
latter problem.