To most people a graph is a visual representation of a data set, like a bar chart or an EKG. That’s not what this chapter is about.
In this chapter, a graph is an abstraction used to model a system that contains discrete, interconnected elements. The elements are represented by nodes (also called vertices) and the interconnections are represented by edges.
For example, you could represent a road map with one node for each city and one edge for each road between cities. Or you could represent a social network using one node for each person, with an edge between two people if they are “friends” and no edge otherwise.
In some graphs, edges have different lengths (sometimes called “weights” or “costs”). For example, in a road map, the length of an edge might represent the distance between two cities, or the travel time, or bus fare, and so on. In a social network there might be different kinds of edges to represent different kinds of relationships: friends, business associates, etc.
Edges may be undirected, if they representation a relationship that is symmetric, or directed. In a social network, friendship is usually symmetric: if A is friends with B then B is friends with A. So you would probably represent friendship with an undirected edge. In a road map, you would probably represent a one-way street with a directed edge.
Graphs have many interesting mathematical properties, and there is a branch of mathematics called graph theory that studies them.
Graphs are also useful, because there are many real world problems that can be solved using graph algorithms. For example, Dijkstra’s shortest path algorithm is an efficient way to find the shortest path from a node to all other nodes in a graph. A path is a sequence of nodes with an edge between each consecutive pair.
Sometimes the connection between a real world problem and a graph algorithm is obvious. In the road map example, it is not hard to imagine using a shortest path algorithm to find the route between two cities that minimizes distance (or time, or cost).
In other cases it takes more effort to represent a problem in a form that can be solved with a graph algorithm, and then interpret the solution.
For example, a complex system of radioactive decay1 can be represented by a graph with one node for each nuclide (type of atom) and an edge between two nuclides if one can decay into the other. A path in this graph represents a decay chain.
The rate of decay between two nuclides is characterized by a decay constant, λ, measured in becquerels (Bq) or decay events per second2.
In our best current model of physics, nuclear decay is a fundamentally random process, so it is impossible to predict when an atom will decay. However, given λ, the probability that an atom will decay during a short time interval dt is λ dt.
In a graph with multiple decay chains, the probability of a given path is the product of the probabilities of each decay process in the path.
Now suppose you want to find the decay chain with the highest probability. You could do it by assigning each edge a “length” of −logλ and using a shorest path algorithm. Why? Because the shortest path algorithm adds up the lengths of the edges, and adding up log-probabilities is the same as multiplying probabilities. Also, because the logarithms are negated, the smallest sum corresponds to the largest probability. So the shortest path corresponds to the most likely decay chain.
This is an example of a common and useful process in applying graph algorithms:
We will see other examples of this process soon.
Read the Wikipedia page about graphs at http://en.wikipedia.org/wiki/Graph_(mathematics) and answer the following questions:
Graphs are usually drawn with squares or circles for nodes and lines for edges. In the example below, the graph on the left represents a social network with three people.

In the graph on the right, the weights of the edges are the approximate travel times, in hours, between cities in the northeast United States. In this case the placement of the nodes corresponds roughly to the geography of the cities, but in general the layout of a graph is arbitrary.
To implement graph algorithms, you have to figure out how to represent a graph in the form of a data structure. But to choose the best data structure, you have to know which operations the graph should support.
To get out of this chicken-and-egg problem, I am going to present a data structure that is a good choice for many graph algorithms. Later we will come back and evaluate its pros and cons.
Here is an implementation of a graph as a dictionary of dictionaries:
class Graph(dict):
def __init__(self, vs=[], es=[]):
"""create a new graph. (vs) is a list of vertices;
(es) is a list of edges."""
for v in vs:
self.add_vertex(v)
for e in es:
self.add_edge(e)
def add_vertex(self, v):
"""add (v) to the graph"""
self[v] = {}
def add_edge(self, e):
"""add (e) to the graph by adding an entry in both directions.
If there is already an edge connecting these Vertices, the
new edge replaces it.
"""
v, w = e
self[v][w] = e
self[w][v] = e
The first line declares that Graph inherits from the built-in type dict, so a Graph object has all the methods and operators of a dictionary.
More specifically, a Graph is a dictionary that maps from a Vertex v to an inner dictionary that maps from a Vertex w to an Edge that connects v and w. So if g is a graph, g[v][w] maps to an Edge if there is one and raises a KeyError otherwise.
The __init__ method takes a list of vectices and a list of
edges as optional parameters. If they are provided, it calls
add_vertex and add_edge to add the vertices and edges to
the graph.
Adding a vertex to a graph means making an entry for it in the outer dictionary. Adding an edge makes two entries, both pointing to the same Edge. So this implementation represents an undirected graph (see Exercise 1.5).
Here is the definitions for Vertex:
class Vertex(object):
def __init__(self, label=''):
self.label = label
def __repr__(self):
return 'Vertex(%s)' % repr(self.label)
__str__ = __repr__
A Vertex is just an object that has a label attribute. We will add more attributes later, as needed.
__repr__ is a special function that returns a string
representation of an object. It is similar to __str__ except
that the return value from __str__ is intended to be readable
for people, and the return value from __repr__ is supposed to
be a legal Python expression.
The built-in function str invokes the __str__ method on
an object; similarly the built-in function repr invokes
__repr__.
In this case Vertex.__str__ and Vertex.__repr__ refer to
the same function, so we get the same string either way.
Here is the definition for Edge:
class Edge(tuple):
def __new__(cls, *vs):
return tuple.__new__(cls, vs)
def __repr__(self):
return 'Edge(%s, %s)' % (repr(self[0]), repr(self[1]))
__str__ = __repr__
Edge inherits from the built-in type tuple
and overrides the __new__ method. When you invoke
an object constructor, Python invokes __new__ to create
the object and then __init__ to initialize the attributes.
For mutable objects it is most common to override
__init__ and use the default implementation of
__new__, but because Edges inherit from tuple, they
are immutable, which means that you can’t modify the elements
of the tuple in __init__.
By overriding __new__, we can use the * operator to
gather the parameters and use them to initialize the elements of
the tuple. A precondition of this method is that there should
be exactly two arguments. A more careful implementation would
check.
Here is an example that creates two vertices and an edge:
v = Vertex('v')
w = Vertex('w')
e = Edge(v, w)
print e
Inside Edge.__str__ the term self[0] refers
to v and self[1] refers to w. So the output
when you print e is:
Edge(Vertex('v'), Vertex('w'))
Now we can assemble the edge and vertices into a graph:
g = Graph([v,w], [e])
print g
The output looks like this (with a little formatting):
{
Vertex('w'): {Vertex('v'): Edge(Vertex('v'), Vertex('w'))},
Vertex('v'): {Vertex('w'): Edge(Vertex('v'), Vertex('w'))}
}
If you evaluate this expression you get a dictionary rather than a Graph, so this might not be the best representation of a Graph. But it turns out to be non-trivial to generate a string representation that can recreate the graph.
In this exercise you will write Graph methods that will be useful for many of the Graph algorithms that are coming up.
Download greenteapress.com/compmod/GraphCode.py, which contains the code in this chapter. Run it as a script and make sure the test code in main does what you expect.
Make a copy of GraphCode.py called Graph.py. Add the following methods to Graph, adding test code as you go:
get_edge that takes two
vertices and returns the edge between them if it exists and
None otherwise. Hint: use a try statement.remove_edge that takes an
edge and removes all references to it from the graph.out_vertices that takes a
Vertex and returns a list of the adjacent vertices (the ones
connected to the given node by an edge).out_edges that takes a Vertex
and returns a list of edges connected to the given Vertex.add_all_edges that starts with
an edgeless Graph and makes a complete graph by adding edges
between all pairs of vertices.Test your methods by writing test code and checking the output. Then download greenteapress.com/compmod/GraphWorld.py. GraphWorld is a simple tool for generating visual representations of graphs. It is based on the World class in Swampy, so you might have to install Swampy first: you can download it from thinkpython.com/swampy.
Read through GraphWorld.py to get a sense of how it works. Then run it. It should import your Graph.py and then display a complete graph with 10 vertices.
Write a method named add_regular_edges that starts with an
edgeless graph and adds edges so that every vertex has the same degree
(which the method takes as a parameter). The degree of a node
is the number of edges it is connected to.
To create a regular graph with degree 2 you would do something like this:
vertices = [ ... list of Vertices ... ] g = Graph(vertices, []) g.add_regular_edges(2)
Note that it is not always possible to create a regular graph with a given degree. You should figure out and document the preconditions for this method.
To test your code, you might want to create a file named TestGraph.py that imports Graph.py and GraphWorld.py, then generates and displays the graphs you want to test.
Write a __repr__ method for Graphs that
returns a string representation of the graph that can be evaluated
to construct a new graph with the same structure and the same vertex
labels.
Note: to make this work, you will probably want to override
Vertex.__new__ so that if you try to create a Vertex with a
label that already exists, it returns a reference to the existing
Vertex instead of creating a new one. This is a variation of the
Singleton pattern which you can read about at
http://en.wikipedia.org/wiki/Singleton_pattern.
Create a file named Digraph.py and write a definition for a class called Digraph that inherits from Graph.
in_edges that takes a Vertex v
and returns a list of the directed edges coming into v. Which
is more efficient, in_edges or out_edges?in_vertices that takes a Vertex v and returns a list of the Vertices w connected by an edge
from w to v.You can download my solution to these exercises from greenteapress.com/compmod/Graph.py and greenteapress.com/compmod/Digraph.py
A random graph is just what it sounds like: a graph with edges generated at random. Of course, there are many random processes that can generate graphs, so there are many kinds of random graph. One of the more popular kinds is the Erdős-Rényimodel, denoted G(n,p), which generates graphs with n nodes, where the probability is p that there is an edge between any two nodes.
wikipedia.org/wiki/Random_graph
Create a file named RandomGraph.py and define a class named
RandomGraph that inherits from Graph and provides a method
named add_random_edges that takes a probability p as a
parameter and, starting with an edgeless graph, adds edges at random
so that the probability is p that there is an edge between any two
nodes.
A graph is connected if there is a path from every node to every other node (see wikipedia.org/wiki/Connectivity_(graph_theory).).
There is a simple algorithm to check whether a graph is connected. Start at any vertex and conduct a search (usually a breadth-first-search or BFS), marking all the vertices you can reach. Then check to see whether all vertices are marked.
You can read about breadth-first-search at wikipedia.org/wiki/Breadth-first_search. Searches are based on an operation called visiting: to “visit” a node, you mark it so you can tell later that it has been visited, then visit any unmarked vertices it is connected to.
A breadth-first-search visits nodes in the order they are discovered. It uses a queue or a “worklist” to keep them in order. Here’s how it works:
Write a Graph method named is_connected that returns
True if the Graph is connected and False otherwise.
Paul Erdős was a Hungarian mathematician who spent most of his career (from 1934 until his death in 1992) living out of a suitcase, visiting colleagues at universities all over the world, and authoring papers with more than 500 collaborators.
He was a notorious caffeine addict and, for the last 20 years of his life, an enthusiastic user of amphetamines. He attributed at least some of his productivity to the use of these drugs; after giving them up for a month to win a bet, he complained that mathematics had been set back by a month3.
In the 1960s he and Afréd Rényi wrote a series of papers introducing the Erdős-Rényi model of random graphs and studying their properties.
One of their most surprising results is the existence of abrupt changes in the characteristics of random graphs as random edges are added. They showed that for a number of graph properties there is a threshold value of the probability p below which the property is rare and above which is is almost certain. This transition is sometimes called a “phase change” by analogy with physical systems that change state at some critical value of temperature (see wikipedia.org/wiki/Phase_transition).
One of the properties that displays this kind of transition is connectedness. For a given size n, there is a critical value of p* such that a random graph G(n, p) is unlikely to be connected if p < p* and very likely to be connected if p > p*.
Write a program that tests this result by generating a large number of random graphs for given values of n and p. and computes the fraction of them that are connected.
How does the abruptness of the transition depend on n?
You can download my solution from greenteapress.com/compmod/RandomGraph.py.
If you have read the documentation of Python dictionaries, you might have noticed the methods iterkeys, itervalues and iteritems. These methods are similar to keys, values and items, except that instead of building a new list, they return iterators.
An iterator is an object that provides a method named next that returns the next element in a sequence. Here is an example that creates a dictionary and uses iterkeys to traverse the keys.
>>> d = dict(a=1, b=2) >>> iter = d.iterkeys() >>> print iter.next() a >>> print iter.next() b >>> print iter.next() Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration
The first time next is invoked, it returns the first key from the dictionary (the order of the keys is arbitrary). The second time it is invoked, it returns the second element. The third time, and every time thereafter, it raises a StopIteration exception.
An iterator can be used in a for loop; for example, the following is a common idiom for traversing the key-value pairs in a dictionary:
for k, v in d.iteritems():
print k, v
In this context, iteritems is likely to be faster than items because it doesn’t have to build the entire list of tuples; it reads them from the dictionary as it goes along.
But it is only safe to use the iterator methods if you do not add or remove dictionary keys inside the loop. Otherwise you get an exception:
>>> d = dict(a=1) >>> for k in d.iterkeys(): ... d['b'] = 2 ... RuntimeError: dictionary changed size during iteration
Another limitation of iterators is that they do not support index operations.
>>> iter = d.iterkeys() >>> print iter[1] TypeError: 'dictionary-keyiterator' object is unsubscriptable
If you need indexed access, you should use keys. Alternatively, the Python module itertools provides many useful iterator functions.
A user-defined object can be used as an iterator if it
provides methods named next and __iter__.
The following example is an iterator that always returns True:
class AllTrue(object):
def next(self):
return True
def __iter__(self):
return self
The __iter__ method for iterators returns the iterator
itself. This protocol makes it possible to use iterators
and sequences interchangeably in many contexts.
Iterators like AllTrue can represent an infinite sequence. They are useful as an argument to zip:
>>> print zip('abc', AllTrue())
[('a', True), ('b', True), ('c', True)]
For many purposes the easiest way to make an iterator is to write a generator, which is a function that contains a yield statement. yield is similar to return, except that the state of the running function is stored and can be resumed.
For example, here is a generator that yields successive letters of the alphabet:
def generate_letters():
for c in 'abc':
yield c
If you call this function, the return value is an iterator:
>>> iter = generate_letters() >>> print iter <generator object at 0xb7d4ce4c> >>> print iter.next() a >>> print iter.next() b
And you can use an iterator in a for loop:
>>> for letter in generate_letters(): ... print letter ... a b c
A generator with an infinite loop returns an iterator that never terminates. For example, here’s a generator that cycles through the letters of the alphabet:
def alphabet_cycle():
while 1:
for c in string.lowercase:
yield c