Algorithm-BFS-DFS
BFS & DFS
Terminology
Graph
We first give the definitions of graphs! $G(V,E)$ is a set of vertices $V$ and a set of Binary Relations: $E \subseteq V \times V$.
We don’t consider graphs outside the simple graph, like edges of $(u,u)$.
We define $\left | E \right |$ as the numbers of edges in the set $E$. Obviously, $\left | E\right | \le C_{\left | V \right |}^{2} = O(\left | V\right |^2 )$ for undirected, and $\left | E\right | \le 2\times C_{\left | V \right |}^{2} = O(\left | V\right |^2 )$ for directed.
Neighbor Sets/Adjacencies
- The outgoing neighbor set of u ∈ V is $Adj^+(u) = {v ∈ V | (u, v) ∈ E}$
- The incoming neighbor set of u ∈ V is $Adj^−(u) = {v ∈ V | (v, u) ∈ E}$
- The out-degree of a vertex u ∈ V is $deg^+(u) = |Adj^+(u)|$
- The in-degree of a vertex u ∈ V is $deg^−(u) = |Adj^−(u)|$
- For undirected graphs, $Adj^−(u) = Adj^+(u)$ and $deg^−(u) = deg^+(u)$
- Dropping superscript defaults to outgoing, i.e., $Adj(u) = Adj^+(u)$ and $deg(u) = deg^+(u)$
Adjacency list
A Set data structure to map $u$ to $Adj(u)$. Like every node in the tree needs to store the parent and child node, every elements need to store all its neighbors!
- Common to use direct access array or hash table for $Adj$, since want lookup fast by vertex.
- Common to use array or linked list for each $Adj(u)$ since usually only iteration is needed.
Paths
- A path is a sequence of vertices $p = (v_1, v_2, . . . , v_k)$ where $(v_i, v_{i+1}) ∈ E, \ \forall 1 ≤ i < k$.
- A path is simple if it does not repeat vertices.
- The length ($p$) of a path p is the number of edges in the path
- The distance $δ(u, v)$ from $u ∈ V$ to $v ∈ V$ is the minimum length of any path from $u$ to $v$, i.e., the length of a shortest path from $u$ to $v$. (by convention, $δ(u, v) = ∞ $ if $u$ is not connected to $v$)
Model Graph Problems
We can construct a shortest path tree based on the graph with $O(\left |V \right |)$ size. To explain it more clearly, given $a \in V$, then the shortest path tree of $a$ is a tree whose root node is $a$ and contains all elements that are reachable to $a$ in the graph. Moreover, the way from the root node $a$ to the node $b$ is just the shortest path from $a$ to $b$ in the graph.
Breadth First Search
How to return a shortest path from source vertex $s$ for every vertex in graph? We can define $P(x)$ as the second to last vertex on a shortest path from $s$, or you can just imagine it as the parent of $x$ in the constructed shortest path tree of source vertex $s$.
- Define $P(s) == null$.
- Set of parents comprise a shortest paths tree with $O(|V|)$ size!
- Math principles: Assume vertex $u$ appears in the shortest path from the source vertex $s$ to the vertex $x$, which is more remote. Then the shortest of vertex $u$ is just a subset of this path!
- In the BFS algorithm, we need to compute $δ(s, v)$ and $P(v)$ for all $v \in V$.
Compute level sets $L_i = {v | v ∈ V \ \text{and} \ d(s, v) = i}$ (i.e., all vertices at distance $i$)
The key to BFS is to categorize vertices by levels. For vertices at the same level, continuously traverse until all are visited, then proceed to the next level. Obviously, we know the shortest distance from $s$ to $s$ is 0. Then we can grow the tree based on it and find the shortest path for all vertex in the graph.
In real world simulations, we can use queue to store different levels of vertices, which means no vertices in $L_i$ is touched until all vertices in $L_{i-1}$ is traversed.
Time complexity analysis: $O(|V| + |E|)$. (Linear Time!)
Depth First Search
Single source reachability
- Graph Path Problems
- Single Pair Reachability(G,s,t)
- Single Source Reachability(G,s)
- Single Pair Shortest Path(G,s,t)
- Single Source Shortest Paths(G,s) (SSSP)
If we don’t need to solve the SSSP problem, we can use another algorithm! Intuition: Visit outgoing adjacencies recursively, but never revisit a vertex.
Of course, we can use the shortest path tree to solve a harder problem (SSSP), but it takes linear time! We can actually take less time by using the BFS algorithms.
Algorithms
For DFS algorithms, we use recursion to reach to the deepest depth and then traceback to the previous height.
Before finishing the algorithms, we define $P(u)=v$, there exists a path from $u$ to $v$.
There maybe some problems! For example, if there exists $(u_1, v)$ and $(u_2, v)$ at the same time, definitely we can say $P(v) = u_1$ and $P(v) = u_2$. But what $P(v)$ actually is the last vertex that is assigned to $P(v)$. Then can this algorithm correctly finds all the reachable vertexes?
Let’s do the proof! Claim: DFS visits $v$ and correctly sets $P(v)$ for every vertex $v$ reachable from $s$.
We use induction, induct on $k$, where $k$ means vertices that can be reachable from source $s$ with the $k$ steps.
- $k = 0$ : is just the source vertex $s$!
- Inductive step: Consider vertex v with $δ(s, v) = k’ + 1$. (The minimum length of the two vertexes)
- Consider vertex $u$, the second to last vertex on some shortest path from $s$ to $v$.
- By induction, since $δ(s, u) = k’$, DFS visits $u$ and sets $P (u)$ correctly.
- While visiting u, DFS considers $v ∈ Adj(u)$.
- Either $v$ is in $P$ , so has already been visited, or $v$ will be visited while visiting $u$.
- In either case, v will be visited by DFS and will be added correctly to $P$.
Depth First Search: $O(|E|)$. (The number of edges)
Connectivity of undirected graphs: Full DFS
Obviously, use DFS in one single vertex $s$ cannot reach all vertex in the node. Thus, if we want to reach all vertex in the graph (Graph Traverse), we can actually use Full-DFS algorithms.
Iterate for all vertexes in the loop:
- Choose an arbitrary unvisited vertex $s$, use BFS/DFS to explore all vertices reachable from $s$.
- If the vertex $s$ is reached, then we don’t need to use DFS($s$) (let s to be the source node) again!
- Time complexity: $O(|V|+|E|)$.
Full DFS can be used to detect how many connected components (for undirected graphs) or strongly connected components (for directed graphs) exist in a graph. Each time a new DFS is initiated, it indicates the discovery of a new connected component.
Topological Sort
Several Definitions
- DAGs (Directed Acyclic Graph): a directed graph that contains cycles.
- Of course, a Binary Tree is a DAG😂.
- A Topological Order of a graph $G = (V, E)$ is an ordering $f$ on the vertices such that: every edge $(u, v) ∈ E$ satisfies $f(u) < f(v)$.
- A topological order only appears in the DAGs, or it will cause problems! (iff)
- Assume $(u,v)\in E$ and $(v,u) \in E$, which means $f(u) < f(v)$ and $f(v) < f(u)$. Contradiction!
- A Finishing Order is the order in which a Full-DFS finishes visiting each vertex in G.
Claim
The reverse of finishing order is just the topological order of the graph.
Basic principle: We need to prove the ending of function $visit(u)$ is after the ending of the function $visit(v)$. In Full DFS, every vertex will only be visited once, and the existence of $(u,v)$ enables that:
In the first exploration of $u$, for the recursive function, the ending of function $visit(v)$ is before $visit(u)$. ($u$ is in the lower stack frame.) (Unless that $v$ has been explored previously, which is obviously right).
Cycle detection
How can we find a graph is Acyclic?
Consider the full-DFS algorithms, if reverse finishing order for Full-DFS is not a topological order, then G must contain a cycle. Check if G is acyclic: for each edge (u, v), check if v is before u in reverse finishing order.
Claim: If $G$ contains a cycle, Full-DFS will traverse an edge from $v$ to an ancestor of $v$.
To return such a cycle, maintain the set of ancestors along the path back to s in Full-DFS. (Or just the stack frame for the recursion function!)
Lecture Notes for CS1952 SJTU
DFS的实现和应用
- DFS需要搜索到所有的节点需要保证图是联通的。(实际上这是充分必要条件)
- DFS可以使用递归实现,也可以手动使用栈来实现。(本质上是一样的)
- BFS使用队列实现。
Applications
无向图的连通性
使用BFS或者DFS搜索如果只会生成一棵树,说明这棵树是联通的。当然,也可以判断这个图的联通分量。
欧拉回路
- 奇点个数为$n$, if $n > 2$, then fail
- $n =2$: pass
- $n = 0$: pass
不存在$n = 1$的情况,因为根据握手定理:在任何无向图中,所有顶点的度数之和等于图中边数的两倍。
即,我们需要对一张图找到一条路径,使其:
- 起点和终点重合
- 遍历了图中的每一个点一次
- 不存在重复路径
如何检查欧拉回路的存在性?可以使用深度优先搜索。
换句话说,使用DFS找欧拉回路的过程相当于单次DFS搜索的一部分。在一般的DFS搜索中,递归到达最大深度会向上回溯,而欧拉回路不需要向上回溯的过程(因为他需要保证搜索相邻节点一定是有边直接相连的)
而是开启下一个DFS过程。(找出路径上另外一个尚有未被访问的边的节点)
接着,如果成功了,即找到了一个子环路,把这个子环路插入到上一级的环路中即可。
Code
1 |
|
哈密尔顿回路和旅行商问题
哈密尔顿回路:在无向图中找到一个回路,该回路通过每个节点一次且仅一次。
详见未来算法:旅行商问题。
有向图的强联通 (Kosaraju Algorithm)
- 从任意结点开始对有向图$G$进行深度优先搜索,得到一个深度优先森林。
- 将节点遍历:按照生成树的次序 + 对每一棵树上的节点进行后序遍历
- 将图$G$逆向得到$G_r$。
- 从编号最大的节点开始深度优先搜索$G_r$,得到的森林就是原图的强联通分量。
Thm
- $x$ is the root for the tree which contains $v$ in $G_r$, then $\exists x \to v \in G_r \text{ and } v \to x \in G$.
- $x$ is the root for the tree which contains $w$ in $G_r$, then $\exists x \to w \in G_r \text{ and } w \to x \in G$.
Then it suffices to prove:
- If $v$ and $w$ are in the same BFS tree for $G_r$, then $\exists v \to w \text{ and } w \to v \in G$.
Hence, we only need to prove:
$x$ is the root for the tree which contains $v$ in $G_r$, then $\exists x \to v \in G_r \text{ and } v \to x \in G$.
- For $x$ is the root node, there exists a path from $x$ to $v$ in $G_r$.
- For $x$ is the root node, the index for $x$ is greater than $v$. Hence, all work of $v$ will be finished before $x$.
- Hence, there exists a path from $v$ to $x$. ($v$ finishes before $x$)