Previous Up Next

Chapter 2  Analysis of algorithms

Analysis of algorithms is the branch of computer science that studies the performance of algorithms, especially their run time and space requirements.

The practical goal of algorithm analysis is to predict the performance of different algorithms in order to guide program design decisions.

During the 2008 United States Presidential Campaign, candidate Barack Obama was asked to perform an impromptu analysis when he visited Google. Chief executive Eric Schmidt jokingly asked him for “the most efficient way to sort a million 32-bit integers.” Obama had apparently been tipped off, because he quickly replied, “I think the bubble sort would be the wrong way to go.”

This is true: bubble sort is conceptually simple but slow for large datasets. The answer Schmidt was probably looking for is “radix sort” (see wikipedia.org/wiki/Radix_sort)1.

So the goal of algorithmic analysis is to make meaningful comparisons between algorithms, but there are some problems:

The good thing about this kind of comparison that it lends itself to simple classification of algorithms. For example, if I know that the run time of Algorithm A tends to be proportional to the size of the input, n, and Algorithm B tends to be proportional to n2, then it would be reasonable to expect A to be faster than B for large values of n.

This kind of analysis comes with some caveats, but we’ll get to that later.

2.1  Order of growth

Suppose you have analyzed two algorithms and expressed their run times in terms of the size of the input: Algorithm A takes 100 n + 1 steps to solve a problem with size n; Algorithm B takes n2 + n + 1 steps.

The following table shows the run time of these algorithms for different problem sizes:

InputRun time ofRun time of
sizeAlgorithm AAlgorithm B
101 001111
10010 00110 101
1 000100 0011 001 001
10 0001 000 001> 1010

At n=10, Algorithm A looks pretty bad; it takes almost 10 times longer than Algorithm B. But for n=100 they are about the same, and for larger values A is much better.

The fundamental reason is that for large values of n, any function that contains an n2 term will grow faster than a function whose leading term is n. The leading term is the term with the highest exponent.

For Algorithm A, the leading term has a large coefficient, 100, which is why B does better than A for small n. But regardless of the coefficients, there will always be some value of n where a n2 > b n.

The same argument applies to the non-leading terms. Even if the run time of Algorithm A were n + 1000000, it would still be better than Algorithm B for sufficiently large n.

In general, we expect an algorithm with a smaller leading term to be a better algorithm for large problems, but for smaller problems, there may be a crossover point where another algorithm is better. The location of the crossover point depends on the details of the algorithms, the inputs and the hardware, so it is usually ignored for purposes of algorithmic analysis, but that doesn’t mean you can forget about it.

If two algorithms have the same leading order term, it is hard to say which is better; again, the answer will depend on the details. So for purposes of algorithmic analysis, functions with the same leading term are considered equivalent, even if they have different coefficients.

An order of growth is a set of functions whose asymptotic growth behavior is considered equivalent. For example, 2n, 100n and n + 1 belong to the same order of growth, which is written O(n) in “Big-Oh notation” and often called “linear” because every function in the set grows linearly with n.

All functions with the leading term n2 belong to O(n2); they are called “quadratic,” which is a fancy word for functions with the leading term n2.

The following table shows some of the orders of growth that appear most commonly in algorithmic analysis, in increasing order of badness.

Order ofName
growth 
O(1)constant
O(logb n)logarithmic (for any b)
O(n)linear
O(n logb n)“en log en”
O(n2)quadratic
O(n3)cubic
O(cn)exponential (for any c)

For the logarithmic terms, the base of the logarithm doesn’t matter; changing bases is the equivalent of multiplying by a constant, which doesn’t change the order of growth. Similarly, all exponential functions belong to the same order of growth regardless of the base of the exponent. Exponential functions grow very quickly, so exponential algorithms are only useful for small problems.

Exercise 1  

Read the Wikipedia page on Big-Oh notation at wikipedia.org/wiki/Big_O_notation and answer the following questions:

  1. What is the order of growth of n3 + n2? What about 1000000 n3 + n2. What about n3 + 1000000 n2?
  2. What is the order of growth of (n2 + n) * (n + 1)? Before you start multiplying, remember that you only need the leading term.
  3. If f is in O(g), for some unspecified function g, what can we say about a f + b?
  4. If f1 and f2 are in O(g), what can we say about f1 + f2?
  5. If f1 is in O(g) and f2 is in O(h), what can we say about f1 + f2?
  6. If f1 is in O(g) and f2 is O(h), what can we say about f1 * f2?

Programmers who care about performance often find this kind of analysis hard to swallow. They have a point: sometimes the coefficients and the non-leading terms make a real difference. And sometimes the details of the hardware, the programming language, and the characteristics of the input make a big difference. And for small problems asymptotic behavior is irrelevant.

But if you keep those caveats in mind, algorithmic analysis is a useful tool. At least for large problems, the “better” algorithms is usually better, and sometimes it is much better. The difference between two algorithms with the same order of growth is usually a constant factor, but the difference between a good algorithm and a bad algorithm is infinite2!

2.2  Analysis of basic operations

Most arithmetic operations are constant time; multiplication usually takes longer that addition and subtraction, and division takes even longer, but these run times don’t depend on the magnitude of the operands. Very large integers are an exception; in that case the run time increases linearly with the number of digits.

Indexing operations—reading or writing elements in a sequence or dictionary—are also constant time, regardless of the size of the data structure.

A for loop that traverses a sequence or dictionary is usually linear, as long as all of the operations in the body of the loop are constant time. For example, adding up the elements of a list is linear:

    total = 0
    for x in t:
        total += x

The built-in function sum is also linear because it does the same thing, but it tends to be faster because it is a more efficient implementation; in the language of algorithmic analysis, it has a smaller leading coefficient.

If you use the same loop to “add” a list of strings, the run time is quadratic because string addition is linear in the length of the strings. The string method join is usually faster because it is linear in the total length of the strings.

As a rule of thumb, if the body of a loop is in O(na) then the whole loop is in O(na+1). The exception is if you can show that the loop exits after a constant number of iterations. If a loop runs k times regardless of n, then the loop is in O(na), even for large k. But if it runs n / k times, the loop is in O(na+1), even for large k.

Most string and tuple operations are linear, except indexing and len, which are constant time. The built-in functions min and max are linear. The run-time of a slice operation is proportional to the length of the output, but independent of the size of the input.

All string methods are linear, but if the lengths of the strings are bounded by a constant—for example, operations on single characters—they are considered constant time.

Most list methods are linear, but there are some exceptions:

Most dictionary operations and methods are constant time, but there are some exceptions:

The performance of dictionaries is one of the minor miracles of computer science. We will see how they work in Section 2.4.

Exercise 2  

Read the Wikipedia page on sorting algorithms at wikipedia.org/wiki/Sorting_algorithm and answer the following questions:

  1. What is a “comparison sort?” What is the best worst-case order of growth for a comparison sort? What is the best worst-case order of growth for any sort algorithm?
  2. What is the order of growth of bubble sort, and why does Barack Obama think it is “the wrong way to go.”
  3. What is the order of growth of radix sort?
  4. What is a stable sort and why might it matter in practice?
  5. What is the worst sorting algorithm (that has a name)?
  6. What sort algorithm does the C library use? What sort algorithm does Python use? Are these algorithms stable? You might have to Google around to find these answers.
  7. Many of the non-comparison sorts are linear, so why does does Python use an O(n logn) comparison sort?

2.3  Analysis of search algorithms

A search is an algorithm that takes a collection and a target item and determines whether the target is in the collection, often returning the index of the target.

The simplest search algorithm is a “linear search,” which traverses the items in the collection in order, stopping if it finds the target. In the worst case it has to traverse the entire collection, so the run time is linear, hence the name.

The in operator for sequences uses a linear search; so do string methods like find and count.

If the elements of the sequence are in order, you can use a bisection search, which is O(logn). Bisection search is similar to the algorithm you probably use to look a word up in a dictionary. Instead of starting at the beginning and checking each item in order, you start with the item in the middle and check whether the word you are looking for comes before or after. If it comes before, then you search the first half of the sequence the same way. Otherwise you search the second half. Either way, you cut the number of remaining items in half.

If the sequence has 1,000,000 items, it will take about 20 steps to find the word or conclude that it’s not there. So that’s about 50,000 times faster than a linear search.

Exercise 3  

Write a function called bisect that takes a sorted list and a target value and returns the index of the value in the list, if it’s there, or None if it’s not.

Or you could read the documentation of the bisect module and use that!

Bisection search can be much faster than linear search, but it requires the sequence to be in order, which might require extra work.

There is another data structure, called a hashtable that is even faster—it can do a search in constant time—and it doesn’t require the items to be sorted. Python dictionaries are implemented using hashtables, which is why most dictionary operations, including the in operator, are constant time.

2.4  Hashtables

To explain how hashtables work and why their performance is so good, I will start with a simple implementation of a map and gradually improve it until it’s a hashtable.

I will use Python to demonstrate these implementations, but in real life you wouldn’t write code like this in Python; you would just use a dictionary! So for the rest of this chapter, you have to imagine that dictionaries don’t exist and you want to implement a data structure that maps from keys to values. The basic operations you have to implement are:

add(k, v):
Add a new item that maps from key k to value v. With a Python dictionary, d, this operation is written d[k] = v.
get(target):
Look up and return the value that corresponds to key target. With a Python dictionary, d, this operation is written d[target] or d.get(target).

For now, I will assume that each key only appears once. The simplest implementation of this interface uses a list of tuples, where each tuple is a key-value pair.

class LinearMap(list):

    def add(self, k, v):
        self.append((k, v))

    def get(self, target):
        for key, val in self:
            if key == target:
                return val
        raise KeyError

add uses list append, which is constant time. get uses a for loop to search the list: if it finds the target key it returns the corresponding value; otherwise it raises a KeyError. So get is linear.

An alternative is to keep the list sorted by key. Then get could use a bisection search, which is O(logn). But inserting a new item in the middle of a list is linear, so this might not be the best option. There are other data structures3 that can implement add and get in log time, but that’s still not as good as a hashtable, so let’s move on.

One way to improve LinearMap is to break the list of key-value pairs into smaller lists. Here’s an implementation called BetterMap, which is a list of 100 LinearMaps. As we’ll see in a second, the order of growth for get is still linear, but BetterMap is a step on the path toward hashtables:

class BetterMap(list):

    def __init__(self, n=100):
        self.add_maps(n)
        
    def add_maps(self, n):
        for i in range(n):
            self.append(LinearMap())

    def find_map(self, k):
        index = hash(k) % len(self)
        return self[index]

    def add(self, k, v):
        m = self.find_map(k)
        m.add(k, v)

    def get(self, k):
        m = self.find_map(k)
        return m.get(k)

The init method uses add_maps to initialize itself with 100 LinearMaps.

When add and get are invoked, they use find_map to figure out which map to put the new item into, or which map to search.

find_map uses the built-in function hash, which takes almost any Python object and returns an integer. A limitation of this implementation is that it only works with hashable keys. Mutable types like lists and dictionaries are unhashable.

Hashable objects that are considered equal will return the same hash value, but the converse is not necessarily true: two different objects can return the same hash value.

find_map uses the modulus operator to wrap the hash values into the range from 0 to len(self), so the result is a legal index into the BetterMap. Of course, this means that many different hash values will wrap into the same index. But if the hash function spreads things out pretty evenly (which is what hash functions are designed to do), then we expect n/100 items per LinearMap on average.

Since the run time of LinearMap.get is proportional to the number of items, we expect BetterMap to be about 100 times faster than LinearMap. The order of growth is still linear, but the leading coefficient is smaller. That’s nice, but still not as good as a hashtable.

Here (finally) is the crucial idea that makes hashtables fast: if you can keep the maximum length of the LinearMaps bounded by a constant, then LinearMap.get is constant time. All you have to do is keep track of the number items and when the maximum (or average) number of items per LinearMap exceeds a threshold, resize the hashtable by adding more LinearMaps.

Here is an implementation of a hashtable:

class HashMap(BetterMap):

    def __init__(self):
        BetterMap.__init__(self, 2)
        self.num = 0

    def add(self, k, v):
        if self.num == len(self):
            self.resize()

        BetterMap.add(self, k, v)
        self.num += 1

    def resize(self):
        pairs = []
        for t in self:
            pairs.extend(t)

        self.add_maps(len(self))
        for k, v in pairs:
            BetterMap.add(self, k, v)

HashMap inherits from BetterMap; it overrides __init__ and add and defines a new method named resize.

__init__ starts with just 2 LinearMaps and initializes num, which keeps track of the number of items.

add checks the number of items and the number of LinearMaps; if they are equal, then the average number of items per LinearMap is 1, so it resizes the HashMap.

resize uses list.extend to make a list of the items in the HashMap. Then it uses BetterMap.add_map to double the number of LinearMaps. Finally, it “rehashes” the items that were already in the HashMap.

Rehashing is necessary because changing the number of LinearMaps changes the denominator of the modulus operator in find_map. That means that some objects that used to wrap into the same LinearMap will get split up (which is what we wanted, right?).

Making the list of items, adding more LinearMaps, and rehashing are all linear, so resize is linear, which might seem bad, since I promised that add would be constant time. But remember that we don’t have to resize every time, so add is usually constant time and only occasionally linear. The total amount of work to run add n times is proportional to n, so the average time of each add is constant time!

To see how this works, think about starting with an empty HashTable and adding a sequence of items. We start with 2 LinearMaps, so the first 2 adds are fast (no resizing required). Let’s say that they take one unit of work each. The next add requires a resize, so we have to rehash the first two items (let’s call that 2 more units of work) and then add the third item (one more unit). Adding the next item costs 1 unit, so the total so far is 6 units of work for 4 items.

The next add costs 5 units, but the next three are only one unit each, so the total is 14 units for the first 8 adds.

The next add costs 9 units, but then we can add 7 more before the next resize, so the total is 30 units for the first 16 adds.

After 32 adds, the total cost is 62 units, and I hope you are starting to see a pattern. After n adds, where n is a power of two, the total cost is 2n − 2 units, so the average work per add is a little less than 2 units. When n is a power of two, that’s the best case; for other values of n the average work is a little higher, but that’s not important. The important thing is that it is constant time!

The following figure shows how this works graphically. Each block represents a unit of work. The columns show the total work for each add in order from left to right: the first two adds cost 1 units, the third costs 3 units, etc.

The extra work of rehashing appears as a sequence of increasingly tall towers with increasing space between them. Now if you knock over the towers, amortizing the cost of resizing over all adds, you can see graphically that the total cost after n adds is 2n − 2.

An important feature of this algorithm is that when we resize the HashTable it grows geometrically; that is, we multiply the size by a constant. It turns out that if you increase the size arithmetically—adding a fixed number each time—the average time per add is linear.

You can download my implementation of HashMap from greenteapress.com/compmod/Map.py, but remember that there is no reason to use it; if you want a map, just use a Python dictionary.

Exercise 4  

Write an implementation of the dictionary interface called TreeMap that uses a red-back tree to perform add and get in log time.

2.5  Summing lists

In my first draft of HashMap.resize I used the built-in function sum to make the list of items:

    def resize(self):
        pairs = sum(self, [])
        self.add_maps(len(self))
        for k, v in pairs:
            BetterMap.add(self, k, v)

This implementation is concise, and it works, but it is what Barack Obama would call “the wrong way to go.”

The implementation of sum looks something like this4:

def badsum(t, init):
    total = init
    for x in t:
        total += x
    return total

The nice thing about this implementation is that it is polymorphic; it works with any type that supports the + operation.

The problem is that when t is a list of lists, this function is quadratic. The reason is that the + operator creates a new list each time through the loop. If you imagine adding one element per iteration, then the first iteration takes 1 unit of work, the second takes 2, the third takes 3, and so on. The total after n iterations is n (n+1) / 2, which is in O(n2).

If resize is quadratic then HashMap.add is linear, so this implementation has a performance error, which means that I implemented an algorithm with an order of growth worse than the algorithm I meant to implement.

Exercise 5  

Measure the run time of sum for a list of lists with a range of sizes, and confirm that it is quadratic. Then run the same experiment using extend and confirm that it is linear.

A simple way to measure the run time of a program is to use the function times in the os module, which returns a tuple of floats indicating the time your process has used (see the documentation for details). I use a function etime, which returns the sum of “user time” and “system time” which is usually what we care about for performance measurement:

import os

def etime():
    """See how much user and system time this process has used
    so far and return the sum."""

    user, sys, chuser, chsys, real = os.times()
    return user+sys

To measure the elapsed time of a function you can call etime twice and compute the difference:

    start = etime()

    # put the code you want to measure here

    end = etime()
    elapsed = end - start

Alternatively, if you use IPython, you can use the timeit command (see ipython.scipy.org).

If an algorithm is quadratic, then you expect the run time, t as a function of input size, n, to look like this:

t = a n2 + b n + c 

Where a, b and c are unknown coefficients. If you take the log of both sides you get:

logt ∼ loga + 2 logn 

For large values of n, the non-leading terms are insignificant and this approximation is pretty good. So if you plot t versus n on a log-log scale, you should see a straight line with slope 2.

Plot your results on a log-log scale and estimate the slope of the line. You might have to experiment to find values of n that are big enough to yield good measurements but not so big that the program runs forever.

Write an implementation of sum using extend. Repeat your measurements and confirm that it is linear.

You can download my solution from greenteapress.com/compmod/listsum.py.

2.6  List comprehensions

One of the most common programming patterns is called a map: it involves traversing a list and performing a computation using each element of the list. The result is a new list that contains the results of the computations.

A dictionary is also sometimes called a map or a “mapping type,” but that is a different sense of the word. I’m sorry if the vocabulary is confusing, but we are stuck with it.

Here is an example that maps the square operator, **2, onto a list of numbers and accumulates a list that contains the square of each item:

    res = []
    for x in t:
        res.append(x**2)

This pattern is so common that Python provides a more concise syntax for it, called a list comprehension 5

    res = [x**2 for x in t]

This expression yields the same result as the previous loop. List comprehensions tend to be fast because the loop is executed in C rather than Python. The drawback is that they can be harder to debug.

List comprehensions can include an if clause that selects a subset of, or “filters”, the items. The following example makes a list of only the positive elements of t:

    res = [x for x in t if x > 0]

The following is an idiom for making a list of tuples, where each tuple is a value and a key from a dictionary:

    res = [(v,k) for k,v in d.iteritems()]

In this case it is safe to use iteritems, rather than items, because the loop does not modify the dictionary; and it is likely to be faster because it doesn’t have to make a list, just an iterator.

It is also possible to nest for loops inside a list comprehension. The following example builds a list of tuples, where each tuple is a pair of different vertices from the list vs:

    res = [(v, w) for v in vs for w in vs if v is not w]

Now that’s pretty concise!

Exercise 6  

Can you think of a way to use a list comprehension to concatenate a list of lists into a single list, as in Exercise 2.5?


1
But if you get a question like this in an interview, I think a better answer is, “The fastest way to sort a million integers is to use whatever sort function is provided by the language I’m using. Its performance is good enough for the vast majority of applications, but if it turned out that my application was too slow, I would use a profiler to see where the time was being spent. If it looked like a faster sort algorithm would have a significant effect on performance, then I would look around for a good implementation of radix sort.”
2
A n goes to infinity.
3
See wikipedia.org/wiki/Red-black_tree.
4
I assume it is actually written in C, but for algorithmic analysis, language doesn’t matter.
5
In this context, the sense of “comprehend” is something like “contain” rather than “understand.” See wikipedia.org/wiki/List_comprehension.

Previous Up Next