In this chapter we will use a selective mailing list system to illustrate rapid access to and rearrangement of information selected by criteria specified at run time. Our example will allow us to select certain customers of a mail-order business whose total purchases this year have been within a particular range and whose last order was within a certain time period. This would be very useful for a retailer who wants to send coupons to lure back the (possibly lost) customers who have spent more than $100 this year but who haven't been in for 30 days. The labels for the letters should be produced in ZIP code order, to take advantage of the discount for presorted mail.
The Distribution Counting Sort, Bitmaps
To begin, let us assume that the information we have about each customer can be described by the structure definition in Figure custinfo.
A straightforward approach would be to store all
of this information in a disk file, with one
record for each customer. In order to construct a selective mailing
list, we read through the file, testing each record to see whether it
meets our criteria. We extract the sort key (ZIP code) for each record
that passes the tests of amount spent and last transaction date,
keeping these keys and their associated record numbers in memory. After
reaching the end of the file, we sort the keys (possibly with a
heapsort or Quicksort) and rearrange the record numbers according to
the sort order, then reread the selected records in the order of their
ZIP codes and print the address labels.
It may seem simpler just to collect records in memory as they meet our criteria. However, the memory requirements of this approach might be excessive if a large percentage of the records are selected. The customer file of a mail-order business is often fairly large; files of 250000 or more 100-byte records are not unheard of. In such a situation, a one-pass approach might require as much as 25 megabytes of memory just to store the selected records.
However, if we keep only the keys of the selected records in memory and print the label for each customer as his record is reread on the second pass, we never have to have more than one record in memory at a time. In our example, the length of the key (ZIP code) is nine bytes and the record number is two bytes long, so that 250000 selected records would require only 2.75 megabytes. This is a much more reasonable memory budget, considering that our program might not be the only one running, especially under an operating system like Windows™.
Even so, there is no reason to allocate all the storage we might ever need in advance, and every reason not to. We'd like the program to run in the minimum memory possible, so that it would be useful even on a machine with limited memory or one that is loaded up with network drivers and memory-resident utilities.
A linked list is a fairly simple way to allocate memory as needed, but we must be careful to use this method efficiently. Allocating a new block of storage for each record that matches our criteria would be very wasteful, as extra control storage is needed to keep track of each allocation. In MS-DOS, each allocation requires at least 16 bytes of control storage, so 250000 allocations would use about 4 megabytes just for the control storage! This problem is easy to solve; we will allocate storage for a number of records at once, 1000 in our example, which reduces that 4 megabyte overhead to 4 kilobytes.1
Let's start our optimization by trying to reduce the memory we need for each selected record during this first pass through the file. One possibility is to convert the ZIP codes to a more compact representation to save memory; unfortunately, such data is not suitable for our purposes here.
However, there is another way for us to save memory during the first pass: using a bitmap to keep track of the records that match our criteria. A bitmap is an array of bits that can be used to record one characteristic of a number of items, as long as that characteristic can be expressed as yes/no, true/false, present/absent or some similar pair of opposites. In this case, all we want to remember is whether each record was selected. So if we allocate a bitmap with one bit for each possible record number, we can clear the whole bitmap at the start (indicating we haven't selected any records yet) and then set the bit for each record number we select. When we get done with the selection, we will know how many records were selected, so we can allocate storage for the record numbers we need to sort. Then we can go through the bitmap, and every time we find a bit that is set, we will add the corresponding record number to the end of the list to be passed to the sort.
Of course, we could use an array of bytes, one per record, instead of a bitmap, which would not require us to write bit access functions. However, the bitmap requires only one-eighth as much storage as an array of bytes, which can result in considerable savings with a big array. In our example, with a 250000-record file, a bitmap with one bit per record would occupy only about 31 KB, whereas an equivalent byte array would occupy 250 KB. While this difference may not be significant in the present case, larger files produce proportionally larger savings.
As with almost everything, there is a drawback to the use of bitmaps; they usually require more time to access than byte arrays. This is especially true for those machines (such as the 80x86) that do not have an instruction set designed for easy or efficient access to individual bits. Even on machines like the 68020 and its successors, which do have good bit manipulation instructions, compiler support for these functions may be poor. However, since CPU time is unlikely to be the limiting factor while we are reading records from our input file, this extra processing time is probably unimportant in our case.
Now that we have reduced the memory requirements for the first pass, is there any way to speed up the program? In my tests with large numbers of selected records, it turns out that the disk reading and writing time are a small part of the entire elapsed time required to run the program, at least when we are using a standard sorting function. In such a case, by far the greatest proportion of the time is spent in sorting the keys, so that's where we'll focus our attention next.
Which sort should we use? Surely we can't improve on the standard sort supplied with the C++ compiler we're using (typically Quicksort) or one of its relatives such as heapsort. After all, the conventional wisdom is that even the fastest sorting algorithm takes time proportional to n*log(n). That is, if sorting 1000 items takes 1 second, sorting 1,000,000 items would take at least 2000 seconds, using an optimal sorting algorithm. Since these commonly used algorithms all have average times proportional to this best case, it seems that all we have to do is select the one best suited for our particular problem.
This is a common misconception: in fact, only sorts that require comparisons among keys have a lower bound proportional to n*log(n). The distribution counting sort we are going to use takes time proportional to n, not n*log(n), so that if it takes 1 second to sort 1000 items, it would take 1000 seconds to sort 1,000,000 items, not 2000 seconds.2 Moreover, this sort is quite competitive with the commonly used sorts even for a few hundred items, and it is easy to code as well. For some applications, however, its most valuable attribute is that its timing is data independent. That is, the sort takes the same amount of time to execute whether the data items are all the same, all different, already sorted, in reverse order, or randomly shuffled. This is particularly valuable in real-time applications, where knowing the average time is not sufficient.3

Actually, this sort takes time proportional to the number of keys multiplied by the length of each key. The reason that the length of the keys is important is that a distribution counting sort actually treats each character position of the sort keys as a separate "key"; these keys are used in order from the least to the most significant. Therefore, this method actually takes time proportional to n*m, where n is the number of keys and m is the length of each key. However, in most real-world applications of sorting, the number of items to be sorted far exceeds the length of each item, and additions to the list take the form of more items, not lengthening of each one. If we had a few very long items to sort, this sort would not be as appropriate.
You're probably wondering how fast this distribution sort really
is, compared to Quicksort. According to the results of my tests, which
sorted several different numbers of records between 23480 and 234801 on
5-digit ZIP codes using both Microsoft's implementation of the
Quicksort algorithm (
qsort) in version 5.0 of their Visual
C++ compiler and my distribution counting sort, which I call
"Megasort", there's no contest.4 The
difference in performance ranged from approximately 43 to 1 at the low
end up to an astonishing 316 to 1 at the high end! Figures promailx.00 and promail.00, near the end of this
chapter, show the times in seconds when processing these varying sets of records.
Now let's see how such increases in performance can be achieved with a simple algorithm.
The basic method used is to make one pass through the keys for each character position in the key, in order to discover how many keys have each possible ASCII character in the character position that we are currently considering, and another pass to actually rearrange the keys. As a simplified example, suppose that we have ten keys to be sorted and we want to sort only on the first letter of each key.
The first pass consists of counting the number of keys that begin with each letter. In the example in Figure unsorted, we have three keys that begin with the letter 'A', five that begin with the letter 'B', and two that begin with the letter 'C'. Since 'A' is the lowest character we have seen, the first key we encounter that starts with an 'A' should be the first key in the result array, the second key that begins with an 'A' should be the second key in the result array, and the third 'A' key should be the third key in the result array, since all of the 'A' keys should precede all of the 'B' keys.
The next keys in sorted order will be the ones that start with 'B'; therefore, the first key that we encounter that starts with a 'B' should be the fourth key in the result array, the second through fifth 'B' keys should be the fifth through eighth keys in the result array, and the 'C' keys should be numbers nine and ten in the result array, since all of the 'B' keys should precede the 'C' keys. Figure counts.and.pointers illustrates these relationships among the keys.
Counts          Starting indexes
A  B  C         A  B  C
3  5  2         1  4  9

Key          Old Index   New Index   Explanation
Bicycle           1           4      The first B goes to position 4.
Airplane          2           1      The first A goes to position 1.
Anonymous         3           2      The second A goes after the first.
Cashier           4           9      The first C goes to position 9.
Bottle            5           5      The second B goes after the first.
Bongos            6           6      The third B goes after the second.
Antacid           7           3      The third A goes after the second.
Competent         8          10      The second C goes after the first.
Bingo             9           7      The fourth B goes after the third.
Bombardier       10           8      The fifth B goes after the fourth.
If we rearrange the keys in the order indicated by their new indexes, we arrive at the situation shown in Figure afterfirst:
Index   Unsorted Keys   Sorted Keys
1       Bicycle         Airplane
2       Airplane        Anonymous
3       Anonymous       Antacid
4       Cashier         Bicycle
5       Bottle          Bottle
6       Bongos          Bongos
7       Antacid         Bingo
8       Competent       Bombardier
9       Bingo           Cashier
10      Bombardier      Competent
Of course, we usually want to sort on more than the first character. As we noted earlier, each character position is actually treated as a separate key; that is, the pointers to the keys are rearranged to be in order by whatever character position we are currently sorting on. With this in mind, let's take the case of a two-character string; once we can handle that situation, the algorithm doesn't change significantly when we add more character positions.
We know that the final result we want is for all the A's to come before all the B's, which must precede all the C's, etc., but within the A's, the AA's must come before the AB's, which have to precede the AC's, etc. Of course, the same applies to keys starting with B and C: the BA's must come before the BB's, etc. How can we manage this?
We already know that we are going to have to sort the data separately on each character position, so let's work backward from what the second sort needs as input. When we are getting ready to do the second (and final) sort, we need to know that all the AA's precede all the AB's, which precede all the AC's, etc., and that all the BA's precede the BB's, which precede the BC's. The same must be true for the keys starting with C, D, and any other letter. The reason that the second sort will preserve this organization of its input data is that it moves keys with the same character at the current character position from input to output in order of their previous position in the input data. Therefore, any two keys that have the same character in the current position (both A's, B's, C's, etc.) will maintain their relative order in the output. For example, if all of the AA's precede all of the AB's in the input, they will also do so in the output, and similarly for the BA's and BB's. This is exactly what we want. Notice that we don't care at this point whether the BA's are behind or ahead of the AB's, as arranging the data according to the first character is the job of the second sort (which we haven't done yet). But how can we ensure that all the AA's precede the AB's, which precede the AC's, etc. in the input? By sorting on the second character position first!
For example, suppose we are sorting the following keys: AB, CB, BA, BC, CA, BA, BB, CC. We start the sort by counting the number of occurrences of each character in the second position of each key (the less significant position). There are three A's, three B's, and two C's. Since A is the character closest to the beginning of the alphabet, the first key that has an A in the second position goes in the first slot of the output. The second and third keys that have an A in the second position follow the first one. Those that have a B in the second position are next, in output positions 4, 5, and 6. The C's bring up the rear, producing the situation in Figure lesser.char.
After this first sort (on the second character position), all of the keys that have an A in the second position are ahead of all of the keys that have a B in the second position, which precede all those that have a C in the second position. Therefore, all AA keys precede all AB keys, which precede all AC keys, and the same is true for BA, BB, BC and CA, CB, and CC as well. This is the exact arrangement of input needed for the second sort.
Index   Unsorted Keys   Sorted Keys
1       AB              BA *
2       CB              CA
3       BA              BA *
4       BC              AB
5       CA              CB
6       BA              BB *
7       BB              BC *
8       CC              CC
Now we are ready for the second, and final, sorting operation. We start by counting the number of occurrences of each character in the first, or more significant, position. There is one A in this position, along with four B's, and three C's. Starting with the lowest character value, the key that has an A in this position ends up in the first slot of the output. The B's follow, starting with the second slot. Remember that the keys are already in order by their second, or less significant, character, because of our previous sort. This is important when we consider the order of keys that have the same first character. For example, in Figure lesser.char, the asterisks mark the keys that start with B.5 They are in order by their second character, since we have just sorted on that character. Therefore, when we rearrange the keys by their first character position, those that have the same first character will be in order by their second character as well, since we always move keys from the input to the output in order of their position in the input. That means that the B records have the same order in the output that they had in the input, as you can see in Figure greater.char.
This may seem like a lot of work to sort a few strings. However, the advantage of this method when we have many keys to be sorted is that the processing for each pass is extremely simple, requiring only a few machine instructions per byte handled (less than 30 on the 80x86 family).
Index   Keys After First Pass   Sorted Keys
1       BA                      AB
2       CA                      BA
3       BA                      BA
4       AB                      BB
5       CB                      BC
6       BB                      CA
7       BC                      CB
8       CC                      CC
Having finished sorting the keys, we have only to retrieve the records we want from the input file in ZIP code order and write them to the output file. But how do we know which records we need to read, and in what order?
The sort function requires more than just the keys to be sorted. We also have to give it a list of the record numbers to which those keys correspond, so that it can rearrange that list in the same way that it rearranges the keys. Then we can use the rearranged list of record numbers to retrieve the records in the correct order, which completes the task of our program.
Let's start our examination of the mailing list program by looking at the header file that defines its main constants and structures, which is shown in Figure mail.00a.
Now we're ready to look at the implementation of these algorithms,
starting with function
main (Figure mail.00).
main function definition (from mail\mail.cpp) (Figure mail.00)
This function begins by checking the number of arguments with which
it was called, and exits with an informative message if there aren't
enough. Otherwise, it constructs the output file name and opens the
file for binary input. Then it calls the initialize function (Figure mail.01), which sets
up the selection criteria according to input arguments 3 through 6
(minimum spent, maximum spent, earliest last-here date, and latest
last-here date). Now we are ready to call the process function (Figure mail.02) to select the records that meet those criteria.
initialize function definition (from mail\mail.cpp) (Figure mail.01)
process function definition (from mail\mail.cpp) (Figure mail.02)
The first order of business in
process is to set up
the buffering for the list (output) and data files. It is important to
note that we are using a large buffer for the list file and for the
first pass through the data file, but are changing the buffer size to
the size of 1 record for the second pass through the data file. What is
the reason for this change?
On the first pass through the data file, we are going to read every record in physical order, so a large buffer is useful in reducing the number of physical disk accesses needed.
This analysis, however, does not apply to the second pass through the data file. In this case, using a bigger buffer for the data file would actually reduce performance, since reading a large amount of data at once is helpful only if you are going to use the data that you are reading.6 On the second pass, we will read the data records in order of their ZIP codes, forcing us to move to a different position in the data file for each record rather than reading them consecutively. Using a big buffer in this situation would mean that most of the data in the buffer would be irrelevant.
Continuing in process, we calculate the number of records in the data file, which determines how large our record selection bitmap should be. Then we call the allocation macro defined in bitfunc.h (Figure bitfunc.00a) to allocate storage for the bitmap.
Of course, each byte of a bitmap can store eight bits, so the macro divides the number of bits we need by eight and adds one byte to the result. The extra byte is to accommodate any remainder after the division by eight.
Now that we have allocated our bitmap, we can read through the data
file and select the records that meet our criteria. After initializing
our counts of "items read" and "found" to zero, we are ready to start
reading records. Of course, we could calculate the number of times through the loop rather than continue until we run off the end of the input file, since we know how many records there are in the file. However, since we are processing records in batches, the last of which is likely to be smaller than the rest, we might as well take advantage of the fact that when we get a short count of items read, the operating system is telling us that we have reached the end of the file.
The first thing we do in the "infinite" loop is to read a set of
records (to avoid the overhead of calling the operating system to read
each record). Now we are ready to process one record at a time in the
inner loop. Of course, we want to know whether the record we are examining meets our selection criteria: whether the customer has spent at least min_spent but no more than the maximum specified, and has last been here between the earliest and latest dates allowed (inclusive). If the record fails to meet any of these criteria, we skip the remainder of the processing for this record via "continue".
However, let's suppose that a record passes these four tests.
In that case, we increment
items_found. Then we want
to set the bit in the
found bitmap that corresponds to
this record. To do this, we need to calculate the current record
number, by adding the number of records read before the current
processing batch (
total_items_read) and the entry number
in the current batch (
i). Now we are ready to call setbit.
setbit function definition (from mail\bitfunc.cpp) (Figure bitfunc.00)
The setbit function is quite simple. Since there are
eight bits in a byte, we have to calculate which byte we need to access
and which bit within that byte. Once we have calculated these two
values, we can retrieve the appropriate byte from the bitmap.
In order to set the bit we are interested in, we need to create a
"mask" to isolate that bit from the others in the same byte. The
statement that does this,
mask = 1 << bitnumber;,
may seem mysterious, but all we are doing is generating a value that
has a 1 in the same position as the bit we are interested in and 0 in
all other positions. Therefore, after we perform a "logical or"
operation of the mask and the byte from the bitmap, the resulting
value, stored back into the bitmap, will have the desired bit set.
The setbit function also returns a value indicating the value of the bit before we set it. Thus, if we want to know whether we have actually changed the bit from off to on, we don't have to make a call to testbit before the call to setbit; we can use the return value from setbit to determine whether the bit was set before we called it. This would be useful, for example, in an application where the bitmap was being used to allocate some resource, such as a printer, which cannot be used by more than one process at a time. The function would call setbit and, if that bit had already been set, would return an error indicating that the resource was not available.
Now we have a means of keeping track of which records have been selected. However, we also need to save the ZIP code for each selected record for our sort. Unfortunately, we don't know how many records are going to be selected until we select them. This is easily dealt with in the case of the bitmap, which is so economical of storage that we can comfortably allocate a bit for every record in the file; ZIP codes, which take ten bytes apiece, pose a more difficult problem. We need a method of allocation which can provide storage for an unknown number of ZIP codes.
Of course, we could use a simple linked list. In that approach, every time we found a record that matches our criteria, we would allocate storage for a ZIP code and a pointer to the next ZIP code. However, this consumes storage very rapidly, as additional memory is required to keep track of every allocation of storage. When very small blocks of ten bytes or so are involved, the overhead can easily exceed the amount of storage actually used for our purposes, so that allocating 250000 14-byte blocks can easily take 7.5 megabytes or more, rather than the 3.5 megabytes that we might expect.
To avoid this inefficiency, we can allocate larger blocks that can
accommodate a number of ZIP codes each, and keep track of the addresses
of each of these larger blocks so that we can retrieve individual ZIP
codes later. That is the responsibility of the code inside the "if" statement that compares current_zip_entry to ZIP_BLOCK_ENTRIES.7 To understand how this works, let's look
back at the lines in
process that set the block counter to -1 and current_zip_entry to ZIP_BLOCK_ENTRIES. This initialization ensures that the code in the "if" will be executed for the first selected record. We start by incrementing the block counter (to zero, in this case) and setting current_zip_entry back to zero, to start a new block. Then we allocate storage for a new block of ZIP codes (zip_block) and set up pointers to each one, so that we can copy the ZIP codes from the keys as we select each record. Once that bookkeeping is out of the way, we can copy the ZIP code for the current record into the ZIP block via the pointer array and increment the index of the pointer that will be used next.
Now we're at the end of the processing for one record in a
processing batch. Once all the records in the batch have been
processed, we fall off the end of the inner loop. At this point, we can
add the number of items we have read in this batch to total_items_read.
If we have read less than a full batch, we must have reached the end of
the file, so we are finished with this phase of processing.
Now we have all the information needed to sort the keys so that we will be able to read the records in ZIP code order on the second pass. However, the sort function, Megasort (Figure megac.00), requires that same information in a different form, so we have some preparation to do before we can call it.
Looking at the parameters for this function, we see that the data
to be sorted are described by an array of string pointers (
char **PtrArray), whereas what we have is a number of blocks of
ZIP codes. The record numbers, which the sort function will rearrange
according to the order of the ZIP codes, must be passed as an array of
unsigned values (unsigned *RecNums), rather than as the bitmap in
which they are stored by our search function.
To set up the first argument to
Megasort, we need to
produce an array of pointers to each of the ZIP codes we have stored in
the blocks, which we do in the portion of Figure mail.02 between the function call that
reports the "Pass 1 time" and the call to the Megasort function.
We start by allocating a block of ZIP code pointers large enough to hold all the pointers we will need, one for each selected record. Then, for each block of ZIP codes we have created, we calculate the address of each ZIP code in the block and store it in the next entry of our pointer array.
That takes care of the pointers to the ZIP codes. Now we have to turn the bitmap into an array of record numbers to be rearranged. First we allocate the array of unsigned values, with items_found entries. Then we call testbit (Figure bitfunc.01).
testbit function definition (from mail\bitfunc.cpp) (Figure bitfunc.01)
We call testbit once for every record number in the
file, and every time we find a bit set, we store that record number in
the next available space in the
record_number array. The testbit function is very similar to
setbit, described above. The
difference is that we don't want to set the bit, but to find out
whether it is already set; therefore, we don't modify the bitmap array.
Now we are finally ready to call
Megasort (Figure megac.00).
The main variables involved are the BucketCount array, which keeps track of the number of strings that have each character in the current sorting position (the number of A's, B's, etc.), and the BucketPosition array, which maintains the current position in the output array for each possible character (where the next A string should go, the next B string, etc.). We also have to allocate storage for a temporary copy of the pointers to the strings (TempPtrArray) and for a temporary copy of the record numbers to be rearranged. All of this setup code having been executed, the main sorting loop is ready to go.
You may be surprised to see how few lines of code are involved in the main loop. If we disregard blank lines and those that contain only a starting or ending brace, only 17 lines contain any executable code. Yet in many common applications this simple algorithm will outperform all of the complex methods favored by the computer science textbooks.
The outer loop is executed once for each character position in the
sort. That is, if we are sorting a number of ten-character strings, it
will be executed ten times. Its role is to select which character of
the strings to be sorted will be used in the current pass through the
strings. The reason that the loop index
i decreases for
each pass through the loop was discussed above: each character position
is treated as a separate "key", so we have to sort on the least
significant "key" first and work our way up to the most significant
one. Each pass of this sort leaves strings in the same order as they
were in before the current pass if they have the same character in the current sorting position.
The first operation in the main loop is to clear the BucketCount array, which maintains
the counts for each possible ASCII character code. In the first pass through the input data, we step through the pointer array, extracting the ith character of each element as the current character to be sorted. The ASCII value of that character, m, is used to select the element of BucketCount to be incremented, so that when we reach the end of the pointer array, we will know the number of strings that have each character in the current position being sorted.
Once all the counts have been accumulated, we can use that
information to fill in the
BucketPosition array, which
indicates the next available slot for each possible character position.
This serves the function of directing traffic from the input pointer array, PtrArray, to the output one, TempPtrArray.
Each element in the
BucketPosition array corresponds to
the next output position to be used in
TempPtrArray for a
given character. For example, suppose that the lowest character in the
current position of the array to be sorted is 'A', and there are five
strings that have an 'A' in the current position. In that case, the
initial value of
BucketPosition['A'] would be 0, and the
initial value of
BucketPosition['B'] would be 5, since
the first 'A' key should go into output slot 0 and the first 'B' key
should go into output slot 5.
After filling in the
BucketPosition array, we are
ready to rearrange the pointers to the keys by copying these pointers
into the temporary array
TempPtrArray. That is done in
the second inner loop through all the keys. First, we calculate the
appropriate index into the
BucketPosition array by
retrieving the character from the input string, as in the first inner
loop above. Then we use that index to retrieve the appropriate element of the BucketPosition array, which tells us where to copy both the key pointer and the record number of the current record in the output arrays.
Then we increment the element of BucketPosition that we have used, because the next key that has that particular character
in the current position should go in the next position of the output.
In our example, after the first 'A' key has been copied into the output array, BucketPosition['A'] would be incremented to 1,
meaning that the next 'A' key should go into output slot 1. Similarly,
after the first 'B' key has been copied into the output array, BucketPosition['B'] would be incremented to 6, meaning that the next 'B' key should go into
output slot 6.
After copying all of the pointers and record numbers into the temporary arrays in sorted order, we copy them back into the original arrays for use in the next sorting pass, if there is one, or as the final result if we are finished.
When we are done with the sort, we can return the sorted keys and
record numbers to
process (Figure mail.02), so that it can retrieve and
print the first and last name and the ZIP code for each customer in
order by their ZIP codes. We then free the dynamic storage used in the
function and return to the main program, which calls
to close the files, then exits.
You may be wondering how much performance improvement we have
obtained by using distribution sorting in our example program.
In order to answer this question, I have created another program called
mailx.cpp that is identical to the original except that it uses the
standard library function qsort rather
than the distribution counting sort. I've included calls to timing
functions in both versions and have run each of them four times with
different numbers of keys to see how they perform; Figures promailx.00 and promail.00 summarize the results of these tests.
While the improvement in performance increases with the number of
elements, even at the low end, we've multiplied the performance of the
sorting part of the task by about 43 times. At the high end, the
difference in the sorting speed is a phenomenal 316 to 1, and the
overall performance in that case is more than 50 times what it is using
qsort. Is optimization always that simple?
This would be a good time for me to mention an attempted optimization that actually did not help performance in the final analysis: the idea of "strip mining", which can most simply be described as dividing a file into two parts, one of which is the "key" portion and the other the "data" portion. Theoretically, it should be faster to read just the keys on the first pass of our mailing list program, then read the data on the second pass only for those records that have been selected. Unfortunately, when I tried this "optimization", it actually slowed the program down. I found this quite surprising, because in earlier tests on other machines, it did help somewhat. However, it doesn't help on my current machine, so I've omitted it.
Is there a moral to this real-life optimization story that summarizes both its successes and failures?
I think there are several. First, we've seen that a very simple algorithm can grossly out-perform a standard library function from a well-known compiler vendor. Second, and more important, optimization is a tricky business. The common thread of these lessons is that no matter how carefully you plan a programming task, the unexpected is very likely to occur. You have to be willing to examine all your preconceptions and discard them if necessary in the face of new evidence. Perhaps this willingness, along with the ability to handle the unexpected when it occurs, is the difference between a novice and an expert.
Before we leave this chapter, I should mention that the programs described here do not work properly when compiled with the DJGPP compiler, but produce general protection faults at apparently random times during their execution. This is quite surprising, because they work perfectly with the Microsoft C++ version 5 compiler. Unfortunately, I haven't been able to track down and eliminate the source of this problem; however, at least I can tell you about the problem before you discover it for yourself. As to the cause of the problem: I can only guess that it is caused by bugs in the most recent release of DJGPP.
In this chapter, we have covered the use of a bitmap to keep track of one attribute of a number of records, sorting via the distribution counting sort, and how to gain rapid access to data according to criteria specified at run time. In the next chapter we will examine the arithmetic coding method of data compression.
What modifications to the distribution counting sort would:
(You can find suggested approaches to problems in Chapter artopt.htm).
qsort requires the user to supply the address of a function that will compare the items to be sorted. While this additional overhead biases the comparison slightly against
qsort, this small disadvantage is not of the same order of magnitude as the difference in inherent efficiency of the two algorithms.
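For reference, a comparison function of the kind qsort requires might look like the following sketch. The name CompareKeys is hypothetical, and the actual function in mailx.cpp may differ; this just shows the calling convention that imposes the per-comparison overhead:

```cpp
#include <cstring>
#include <cstdlib>

// A qsort-style comparison function. qsort passes pointers to the
// elements being compared; here each element is assumed to be a
// char* pointing to a null-terminated key string.
int CompareKeys(const void *a, const void *b)
{
    const char *KeyA = *(const char *const *)a;
    const char *KeyB = *(const char *const *)b;
    return std::strcmp(KeyA, KeyB);
}
```

Every comparison the sort performs goes through this indirect function call, which is the small extra overhead mentioned above; the distribution counting sort makes no comparisons at all.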