CIS 9615. Analysis of Algorithms

Basic Graph Algorithms

 

1. Graphs and their representations

A graph consists of edges (links) among vertices (nodes), and represents relations among data items.

Tree structures and linear structures are special cases of graph structures.

Formally, a graph G = (V, E), where V is a set of vertices, and E is a set of edges (from one vertex to another vertex), so |E| ≤ |V|2. The "size" of G is usually represented by both |V| and |E|, with respect to which the computational complexity of graph algorithms is represented.

A graph may either be directed or undirected, depending on whether the relation is symmetric or not.

A graph is connected if there is a path from every vertex to every other vertex. Specially, a directed graph is strongly connected if there is a (directed) path from every vertex to every other vertex; otherwise, it is weakly connected if the graph is connected after all directed edges are turned into undirected edges.

A graph is weighted if every edge has a number associated.

The two common ways to represent a graph is by an adjacency matrix and by an adjacency list. The former is better for dense graphs, while the latter is better for sparse graphs.

09-01 (72K)

For graphs with additional properties, special-purpose representation may be more efficient. Example: assignment 3.

 

2. Search

In graphs, various search (traversal) algorithms systematically visit every vertex by following the edges.

Breadth-First Search (BFS): starting from a source vertex, visits its neighbors layer-by-layer with increasing distance. The algorithm represents a graph using adjacency list, with a queue used to hold the nodes under processing. A "color" is attached to every node to indicate its status: to be processed (white), being processed (gray), have been processed (black). For each node, its predecessor and distance are recorded (on the path from the source node).

09-02 (33K)

Example:

09-03 (67K)

Under the assumption that all nodes can be reached from the source node, the running time of BFS is Θ(V + E), since it spends constant amortized cost on each node (enqueue/dequeue) and each link (follow the adjacency list).

As a by-product, this algorithm also generates a search tree with s as root, and find the shortest path between s and every reachable vertex in G. In line 13, if color[v] is not WHITE, then a cycle (loop, circle) is detected.

If in search, the algorithm tries to go as deep as possible in each step, then it is a Depth-First Search (DFS) algorithm.

The above BFS algorithm can be changed into a DFS algorithm by simply changing the queue into a stack, each time only processing one successor of the node at the top of the stack, and backtracking to a previous node if the top one has been fully processed.

The following is a recursive DFS, which uses two time-stamps, d and f, to record the time of color changing for each vertex. This information will be used in the other algorithms to be introduced later.

09-04 (41K)

Example: Figure 22.4

The running time of DFS is also Θ(V + E), though this algorithm does not assume the connectivity of the graph, so may produce a forest consisting of multiple trees.

The ascendant-descendant relation between nodes in the forest corresponds to the inclusion of their time intervals.

 

3. Topological sort

A topological sort of a directed acyclic graph ("dag") G = (V, E) is a linear ordering of all its vertices such that if G contains an edge (u, v), then u appears before v in the ordering. If the graph is cyclic, then no linear ordering is possible.

The following algorithm uses DFS to do topological sorting:
09-05 (16K)
Please note that it is the finishing order, not starting order, of the nodes that matters. Example.

The algorithm takes Θ(V + E) time.

The algorithm can be revised to detect cycle: see Lemma 22.11.

A more natural solution for topological sorting is to repeatedly remove a node that has no predecessor (or successor).

 

4. Minimum spanning trees

For an undirected and connected graph G = (V, E), its subgraph G' = (V, T) is a spanning tree of G if G' is connected, and |T| = |V| - 1. If G is also weighted, then T is a minimum spanning tree if it has the minimum value of ∑w(u, v) [for (u, v) ∈ T].

Kruskal's algorithm each time adds an edge that have the least w, and connect two previously unconnected subgraphs. The algorithm uses a set to represent a subtree, and Find-Set(u) identifies the set in which u belongs.

09-06 (26K)

Kruskal's algorithm is a greedy algorithm, and since it processes each edge in order, it takes O(E lg E) time, which can also be written as O(E lg V), given |E| < |V|2.

Prim's algorithm grows a MST by adding vertices one by one, starting from a root r.

09-07 (22K)

For each vertex v, key[v] is the minimum weight of any edge connecting v to a vertex in the tree. In line 7, Extract-Min(Q) takes the vertex with the lowest key value out of Q. Example:
09-08 (97K)

Prim's algorithm also takes O(E lg V) time. See the textbook for details.