Previous Up Next

Chapter 13  Case study: Directed graphs and knots

Rachel Bobbins and Noam Rubin

13.1  Directed Graphs

Imagine that it’s 2 a.m. during finals week, and you’re scrambling to finish a research paper on topica obscura. Your adrenaline jumps when you find a relevant Wikipedia article, with links to more Wikipedia articles! You start clicking away, jumping from page to page in search of facts. An hour later, you realize you’re still clicking, but these are pages you’ve already visited. No matter which link you click, you can’t seem to discover any new pages!

If this has happened to you, you’ve unknowingly (and unfortunately) stumbled upon a knot in Wikipedia. Knots are a unique property of directed graphs. To understand them, it is necessary to have a basic understanding of directed graphs. The graphs in the rest of this book are undirected. In an undirected graph, an edge represents a symmetric relationship between vertices. This abstraction is appropriate for some applications, such as acquaintance networks and transportation networks; but other applications involve asymmetric relationships, for example, links between pages in the World Wide Web.

Wikipedia links are also asymmetric. Page A might link to page B, but page B doesn’t have to include any links to page A. To fully describe the relationship between pages A and B, we need two bits of information: whether A links to B, and whether B links to A . This is the essence of a directed graph. In a directed graph, a directed edge, e = (A,B), is an edge from A to B.

Knots are a unique (and sometimes cruel) property of directed graphs. A knot is a collection of vertices and edges with the property that every vertex in the knot has outgoing edges, and all outgoing edges from vertices in the knot terminate at other vertices in the knot. Thus it is impossible to leave the knot while following any path along its directed edges.

13.2  Implementation

In the scenario above, we imagine finding a knot in Wikipedia; that is, a set of articles with links to each other, but no links to other articles.

We wondered whether such a thing was even possible. Given that there are 3,811,000+ articles on Wikipedia and they all link to different pages, it seemed unlikely that there would be a knot, but we decided to investigate.

We started by extending Graph.py, from Section 2.2, to support directed graphs. The result is DirectedGraph.py, which you can download from thinkcomplex.com/DirectedGraph.py.

When we were designing DirectedGraph, we wanted to preserve the order of growth of Vertex and Arc lookups as they were in Graph. However, we needed to keep track of edges into a vertex and edges from a vertex. It made sense to keep track of these in separate dictionaries. Having two dictionaries takes twice as much memory, but this is a reasonable price to pay for maintaining the speed of lookups.

13.3  Detecting knots

To detect knots in a directed graph, we developed an algorithm based on a breadth-first-search. _bfsknots searches for all the vertices that can be reached from a given starting vertex, and returns these vertices as a set.

  def _bfsknots(self, s):
      # initialize the queue with the start vertex
      queue = [s]
      visited = set()
      on_first_vertex = True
      while queue:

          # get the next vertex
          v = queue.pop(0)

          # skip it if it's already marked
          if v in visited: continue

          # if we're on the first vertex, we're not actually visting
          if v != s or not on_first_vertex: visited.add(v)
          on_first_vertex = False
            
          for x in self.out_vertices(v):
              #if its out vertices have been cached, update visited
              if x in self._knot_cache.keys():
                  visited.update(self._knot_cache[x])
                  visited.add(x)
                    
              #otherwise add it to the queue
              elif x not in self._knot_cache.keys():
                  queue.append(x)

      return visited

We run _bfsknots for every vertex in the graph, and build a dictionary that maps from a vertex to the set of vertices it can reach. That is, for each vertex, V, we know the set of reachable vertices, SV.

If there is a knot in the graph, then for every vertex, V, in the knot the reachable set SV is exactly the set of vertices in the knot.

The function has_knot iterates through each vertex in a graph, and returns True if this condition holds. If it checks the whole graph and does not find a knot, it returns False.

    def has_knot(self):
        """
        Returns true if directed graph has a knot.
        """
        self._knot_cache = {}
        #build the cache of which vertices are accessible from which
        for v in self:
            self._knot_cache[v] = self._bfsknots(v)

        #searches for knot
        for v in self:
            if self._knot_at_v(v):
                return True
        return False

Finally, _knot_at_v checks whether all vertices reachable from V have the reachable set SV.

Exercise 1  

Download thinkcomplex.com/DirectedGraph.py. Read through the file to make yourself familiar with the terminology. Note that edges in DirectedGraph are represented by Arcs.

Determine the order of growth of has_knots experimentally. You can follow the example in Section 3.5. Below are some hints to help you out.

  1. Use DirectedGraph.add_random_edges to generate graphs of different sizes.
  2. For each graph, time how long it takes to check if it has a knot.
  3. On a log-log scale, plot run time versus the number of vertices and run time versus the number of edges.
Exercise 2   Find all the knots in a directed graph.
  1. Write a function, all_knots, that returns a list of all knots in a graph, where each knot is a set of vertices.
  2. Write a function named entry_points that returns a list of all the vertices that serve as entry points into knots.

13.4  Knots in Wikipedia

To find knots in Wikipedia, we selected 558 disjoint subsets of Wikipedia articles, organized by index. For example, the index of articles about neurobiology was used to define one subset, and the index of articles about Zimbabwe was used to define another subset. The 558 subsets contain about 321,000 articles, or 10% of Wikipedia.

Of these subsets we examined, 38% contained at least one knot. So if you are restricted to articles listed on a single index page, there is a substantial chance you will eventually find a knot.

For the complete Wikipedia, the probability is lower because articles can link to articles in other indices. Based on these results, we cannot say, yet, whether the complete Wikipedia has a knot.

When you started your hypothetical research on topica obscura, you might have used journal articles instead of Wikipedia articles. Instead of links, you would have used citations to find other papers.

In that case, you would never find a knot. Why not? Because science articles are published sequentially, and a paper cannot cite a paper that will be created in the future. So the directed graph that represents papers and their citations cannot have a knot.

Like this book?

Are you using one of our books in a class?

We'd like to know about it. Please consider filling out this short survey.


Think Bayes

Think Python

Think Stats

Think Complexity


Previous Up Next