数据结构与算法笔记
PREFACE
Reference Book:
- Data Structures and Algorithm Analysis in C++ 4th Edition
- Data Structures and Algorithm Analysis in Java 3rd Edition
📓 This note would present code in multiple languages
(C++
, Java
and Python
) where they
are tested well to ensure their running. Make sure you have commanded at
least one of them first, or it will be hard for you to read.
Copyright: WRITTEN BY RYKER ZHU in Shanghai under CC BY-NC-SA
Unless otherwise noted, all graphics and images in this post were drawn by the author and are protected by copyright under the Creative Commons agreement. Any commercial use of the works is strictly prohibited. By copying or adapting this article, you are agreeing to and accepting the restrictions of the terms and conditions.
If you find any errors, please contact me via email and I will make corrections as soon as possible.
Tips: You can click on the table of contents on the right sidebar of the web page to navigate.Overview
C++ Details
Lvalues, Rvalues and References
As the textbook defines,
Lvalue: expression that identifies a non-temporary object;
Rvalue: expression that identifies a temporary object or or is a value not associated with any object.
📓 If a variable has a name, then regardless of whether it is modifiable, it is a lvalue.
Click Here to know more about lvalues and rvalues
Example
1 |
|
Neither the constant 666
nor the expression
variable + 1
are lvalues since they are temporary results
of expressions, and they reside in some temporary register for the
duration of computation.
1 |
|
Not all lvalues can be assigned to but except for this special case, initially, lvalues meant "values suitable for left-hand-side of assignment".
1 |
|
All lvalues that aren't arrays, functions or of incomplete types can be converted to rvalues.
1 |
|
Rvalues cannot be converted into a lvalue reference but can be const lvalue reference. (That is why pass-by-constant is so common in C++)
1 |
|
C++11 standard supports rvalue reference and it is useful for detecting whether an object is temporary or persistent and especially with move semantics.
1 |
|
Generic Classes and Templates
Both Java and C++ provides "template" but their mechanisms differ
from each other. Java employs Type Erasure (since all
classes written in Java extends from java.lang.Object
)
while C++ template member functions would only be created when they are
called.
1 |
|
Same class written in Java:
1 |
|
Algorithm Analysis
Mathematical Analysis
Some contents are just directly copied from another post: Note of Introduction to Algorithms
Running Time Calculation
General Rules
for
loopsThe running time of a
for
loop is at most the running time of the statements inside the for loop (including tests) times the number of iterations.Nested loops
The total running time of a statement inside a group of nested loops is the running time of the statement multiplied by the product of the sizes of all the loops.
Consecutive statements
These just add (which means that the maximum is the one that counts).
if
/else
The running time of an if/else statement is never more than the running time of the test plus the larger one of the running times of two braches.
Logarithms in the Running Time
General rules to distinguish between \(O(n)\) and \(O(\log n)\)
- An algorithm is \(O\left(\log{n}\right)\) if it takes constant \(O\left(1\right)\) time to cut the problem size by a fraction (which is usually \(\dfrac12\)).
- An algorithm is \(O\left(n\right)\) if constant time is required to merely reduce the problem by a constant amount (such as to make the problem smaller by 1).
Algorithms | Running Time |
---|---|
Binary Search | Logarithm |
Euclidean Algorithm |
Lists, Stacks and Queues
flowchart LR;
logical(Logical Structure) --- linear(Linear Structure) & nonlinear(Non-linear Structure)
linear --- List & Stack & Queue & String & Array
nonlinear --- Tree & Graph
click Tree href "https://devexzh.github.io/2023/Note_Of_Data_Structures_And_Algorithms/#trees" _blank
click Graph href "https://devexzh.github.io/2023/Note_Of_Data_Structures_And_Algorithms/#graph-algorithms" _blank
click Stack href "https://devexzh.github.io/2023/Note_Of_Data_Structures_And_Algorithms/#stacks" _blank
click Queue href "https://devexzh.github.io/2023/Note_Of_Data_Structures_And_Algorithms/#queues" _blank
Stacks
A stack is a list with the restriction that insertions and deletions can be performed in only one position, namely, the end of the list, called the top.
Implementations
Array or vector based
Features: easy, convenient but easy to cause errors.
- Stack Underflow — An error called when an item is popped from a stack, but the stack is empty.
- Stack Overflow — An error called when an item is pushed onto a stack, but the stack is full.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41#define STACK_MAX_SIZE 128
template<typename ElementType>
class Stack {
private:
ElementType *base, *top;
int stacksize;
public:
Stack() : base(new ElementType[STACK_MAX_SIZE]),
top(base),
stacksize(STACK_MAX_SIZE) {
if(!base) throw "Stack Overflow";
}
~Stack() noexcept {
delete[] base;
}
void push(const ElementType& value) {
if(top - base == stacksize)
throw "Full Stack";
*top = value;
++top;
}
ElementType& pop() {
if(top == base)
throw "Empty Stack";
ElementType* e = top;
--top;
return *e;
}
bool isEmpty() const noexcept {
return top == base;
}
int size() const noexcept {
return top - base;
}
};Linked-list based
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51template<typename ElementType>
class Stack {
private:
class StackNode {
friend Stack;
private:
ElementType data;
StackNode* next;
} *top;
public:
Stack() noexcept : next(nullptr) {}
~Stack() noexcept {
StackNode *p = top;
while(p->next) {
delete p;
p = p->next;
}
}
void push(const ElementType& value) noexcept {
StackNode* p = new StackNode;
p->data = value;
p->next = top;
top = p;
}
ElementType& pop() {
if(!top) throw "Stack Underflow";
ElementType&& e = top->data;
StackNode* p = top;
top = top->next;
delete p;
return e;
}
bool isEmpty() const noexcept {
return !top;
}
int size() const noexcept {
if(!top) return 0;
StackNode* p = top;
int count = 1;
while(p->next) {
p = p->next;
++count;
}
return count;
}
};
Standard Libraries Support
C++ | Java | |
---|---|---|
Standard Library | #include <stack> std::stack<T> |
package java.util; Stack<E> extends Vector<E> |
Check if is empty | bool empty() const |
boolean empty() |
Number of elements | size_type size() const |
int size() (inherited from Vector<E> ) |
Top element access | T& top() |
E peek() |
Push | void push(const T&) |
E push(E) |
Pop | void pop() |
E pop() |
Queues
Like stacks, queues are lists. With a queue, however, insertion is done at one end whereas deletion is performed at the other end.
Implementations
Array or vector based
When array is employed to serve as the base container of queue, it would sometimes encounter the problem of overflow (the pointer
rear
points out of range, as is depicted as the fourth figure above). A better way to solve it is to consider the array as a circular one (i.e. circular array), as is shown below.There still exists a problem: how to deal with
rear == front
since both empty \(Q_e\) and full queue \(Q_f\) satisfy the condition, as is shown above. There are three methods to wrestle with it:- Add an extra boolean field to distinguish whether it is full or empty;
- Add an extra integer field to keep the record of number of elements;
- Reserve an index intentionally.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35#define MAX_QUEUE_SIZE 128
template<typename ElementType>
class Queue {
private:
ElementType *base;
int front, rear;
public:
Queue() : font(0), rear(0),
base(new ElementType[MAX_QUEUE_SIZE]) {}
void enqueue(const ElementType& value) {
if((rear + 1) % MAX_QUEUE_SIZE == front)
throw "Full Stack, cannot enqueue";
base[rear] = value;
// Simply increment rear might cause overflow
rear = (rear + 1 == MAX_QUEUE_SIZE ? 0 : rear + 1);
}
ElementType& dequeue() {
if(front == rear)
throw "Empty Queue, Nothing to dequeue";
ElementType&& e = base[front];
front = (front + 1 == MAX_QUEUE_SIZE ? 0 : front + 1);
return e;
}
int size() const noexcept {
return (rear - front + MAX_QUEUE_SIZE) % MAX_QUEUE_SIZE;
}
bool isEmpty() const noexcept {
return front == rear;
}
};Linked-list based
Trees
A tree is a collection of \(N\) nodes.
\(N=0\), it is empty;
\(N\gt 0\), it is consists of
- a distinguished node \(r\), called the root;
- zero or more nonempty (sub)trees \(T_1,\,T_2,\,\cdots,\,T_k\), each of whose roots are connected by a directed edge from r.
Properties
Depth: The length of the unique path from the root to a node \(n_i\). (\(\text{Level}=\text{depth}+1\))
Height: The length of the longest path from a node \(n_i\) to a leaf.
- The height of a tree is equal to the height of the root.
Binary Trees
Properties
- A binary tree has \(\textcolor{orange}{\text{at most}}\) \(2^d\) nodes with depth \(d\);
- A binary tree has \(\textcolor{orange}{\text{at most}}\) \(2^{h-1}-1\) nodes with height \(h\);
- For all binary tree \(T\), if it has \(n_0\) leaves and \(n_2\) nodes with degree of \(2\), then \(n_0=n_2+1\).
Full Binary Trees 🆚 Complete Binary Trees
- A full binary tree (sometimes proper binary tree) is a tree in which every node other than the leaves has two children.
- A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.
Implementation (C++)
Sequential storage
1
2template<typename ElementType>
using BinaryTree<ElementType> = std::vector<ElementType>;Linked-list storage
1
2
3
4
5
6
7
8
9
10
11
12
13
14template<typename ElementType>
class BinaryTree {
public:
BinaryTree() : left_child(nullptr), right_child(nullptr) {}
~BinaryTree() {
delete left_child;
delete right_child;
left_child = right_child = nullptr;
}
private:
BinaryTree* left_child, right_child;
ElementType data;
};or add an extra field
parent
, a pointer to its parent1
2
3
4
5
6
7
8
9
10
11
12
13
14template<typename ElementType>
class BinaryTree {
public:
BinaryTree() : left_child(nullptr), right_child(nullptr), parent(nullptr) {}
~BinaryTree() {
delete left_child;
delete right_child;
left_child = right_child = nullptr;
}
protected:
BinaryTree* left_child, right_child, parent;
ElementType data;
};
Tree Traversals
Preorder Traversal | Inorder Traversal | Postorder Traversal |
---|---|---|
If the tree is empty, then simply return. | ||
|
|
|
Implementation (C++)
1 |
|
Applications
Mathematical notations
Preorder Traversal:
Also called Polish notation (PN) |
|
Inorder Traversal: | |
Postorder Traversal:
Also called Reverse Polish notation (RPN) |
Construct a binary tree
1 |
|
Copy a binary tree
1 |
|
Calculate the depth
1 |
|
Calculate the number of nodes
1 |
|
Calculate the number of leaves
1 |
|
Binary Search Trees
The property that makes a binary tree into a binary search tree is that for every node \(X\),
- the values of all the items in its left subtree are smaller than the item in \(X\);
- and the values of all the items in its right subtree are larger than the item in \(X\).
Implementation (C++)
1 |
|
✴ Treaps
AVL Trees
An AVL (Adelson-Velskii and Landis) tree is a binary search tree with a balance condition.
An AVL tree is identical to a binary search tree, except that for every node in the tree, the height of the left and right subtrees \(\mathrm{H}_L,\,\mathrm{H}_R\) can differ by at most 1, i.e. \(\left|\mathrm{H}_L - \mathrm{H}_R\right|\leq1\).
Adjustments for unbalanced AVL trees
✴ Red-Black Trees
Red–black trees are isomorphic to 2-3 trees, meaning that they are equivalent data structures.
To understand red-black trees, hence, we should first know what 2-3 trees are:
By artificially making the right key in the 3-nodes of the 2-3 tree point to the left key and setting the color of the left key to red, we get a transformed 2-3 tree; then we just need to make the red node the left child of the black node, and we get a red-black tree.
Splay Trees
The basic idea of the splay tree is that after a node is accessed, it is pushed to the root by a series of AVL tree rotations.
Splay tree is another kind of binary search tree, with the property that the most recently searched or added content will be moved to the root of the tree, so that when the same content is searched for next time, the speed can be improved, because the tree search starts from the root, and the closer to the root, the faster it will be found.
The act of moving the most recently searched item to the root is called splay. It is important to note that after splaying, the tree may not necessarily be a balanced tree, but may become a skewed tree at worst.
✴ Top-Down Splay Trees
Suffix Trees
✴ k-d Trees
✴ Pairing Heaps
Heaps
Sorting
Hashing
Hash Table is a data structure that maps keys into their location of storage where the mapping is called a hash function \(\mathrm{Location}=\mathrm{Hash}\left(key\right)\).
Terminologies:
- Collision: two keys hash to the same value, i.e. \({key}_1\neq{key}_2\) but \(\mathrm{Hash}\left({key}_1\right)=\mathrm{Hash}\left({key}_2\right)\);
- Those colliding keys are called synonyms.
Hash Function
Principles of designing a "good" hash function:
- As simple as possible to increase the speed of processing of mapping;
- Addresses should be distributed as evenly as possible to minimize wasted space.
Several hash functions:
Division Modulo Method
\[\mathrm{Hash}\left(key\right)=key\;\mathrm{mod}\;p\]
The hash function divides the value \(key\) by \(p\) and then uses the remainder obtained. It is best suited that \(p\) is less than or equal to the size of the hash table and is a prime number as that can make sure the keys are more uniformly distributed. The hash function is dependent upon the remainder of a division.
Mid Square Method
Folding Method
Methods to solve collisions
Open Addressing
Basic Idea: If there exists collision, try the next alternative cells until an empty cell is found.
Linear Probing
Collision Function: \(f\left(i\right)=i\)
Quadratic Probing
Collision Function: \(f\left(i\right)=\left(-1\right)^{i-1}i^2\)
Separate Chaining
The Disjoint Sets Class
The equivalence class of an element \(a\in{S}\) is the subset of \(S\) that contains all the elements that are related to \(a\).
- An equivalence relation is a relation \(R\) that satisfies three properties: reflexive, symmetric, transitive, denoted by \(\sim\).
- To decide if \(a\sim b\), we need only to check whether \(a\) and \(b\) are in the same equivalence class.
Implementation (C++)
1 |
|
Graph Algorithms
A graph \(G=\left(V,\,E\right)\) consists of a non-empty set of vertices \(V\), and a set of edges \(E\). Each edge is a pair \(\left(v,\,w\right)\), where \(v,\,w\in{V}\).
- If the pair is ordered, then the graph is directed (also referred to as digraphs).
- Sometimes an edge has a third component, known as either a weight or a cost.
Path: a sequence of vertices \(v_1,\,v_2,\,\cdots,\,v_N\) such that \(\left(v_i,\,v_{i+1}\right)\in E\) for \(1\leq i\lt N\)
- Length: the number of edges on the path, which is equal to \(N−1\).
- Loop : an edge \(\left(v,\,v\right)\) from a vertex to itself.
Representation
Adjacency Matrix
Undirected Graph | Adjacency Matrix |
---|---|
\(\begin{array}{c}\begin{array}{c}\\v_1\\v_2\\v_3\\v_4\\v_5\\\end{array}\begin{array}{c}\begin{matrix}v_1&v_2&v_3&v_4&v_5\end{matrix}\\\begin{pmatrix}\,0\,&\,1\,&\,0\,&\,1\,&\,0\,\\1&0&1&0&1\\0&1&0&1&1\\1&0&1&0&0\\0&1&1&0&0\end{pmatrix}\end{array}\end{array}\) |
Edges Without Weight
- For each edge \((v_i,\,v_j)\), set
\(A[v_i][v_j]\) to
true
(i.e.1
); - Otherwise the entry in the array is
false
(i.e.0
).
Edges With Weight
- For edges with weight associated, set \(A[v_i][v_j]\) equal to the weight;
- Use either a very large (i.e. \(+\infty\)) or a very small weight (or simply \(0\)) as a sentinel to indicate nonexistent edges.
Features
- The adjacency matrix of undirected graph is symmetric;
- The degree of vertex \(v_i\) is the summation of the i-th row (column).
Directed Graph | Adjacency Matrix |
---|---|
\(\begin{array}{c}\begin{array}{c}\\v_1\\v_2\\v_3\\v_4\\\end{array}\begin{array}{c}\begin{matrix}v_1&v_2&v_3&v_4\end{matrix}\\\begin{pmatrix}\,0\,&\,1\,&\,1\,&\,0\,\\0&0&0&0\\0&0&0&1\\1&0&0&0\end{pmatrix}\end{array}\end{array}\) |
- The i-th row: arc ended with \(v_i\) (outdegree edge)
- The j-th column: arc started with \(v_j\) (indegree edge)
Pros versus Cons
Pros
- Easy to check whether there exists an edge between two vertices;
- Easy to calculate degrees of any vertex.
Cons
- High Space Requirement \(\Theta\left(\left|V\right|^2\right)\) (where \(\left|V\right|\) stands for the cardinality of the set of vertices);
- Only suitable when the graph is dense: \(\left|E\right|=\Theta\left(\left|V\right|^2\right)\) (otherwise it is called sparse and most of these entries would contain zero.);
- Hard to insert and remove vertices;
- Time-costly to calculate how many edges in the graph \(O\left(\left|V\right|^2\right)\).
Implementation (C++)
1 |
|
Adjacency List
Graph | Corresponding Adjacency List |
---|---|
Features
- Adjacency lists are not unique;
- Undirected graphs double the space usage (if it has \(e\) edges, then it has to store \(2e\) nodes) and are thus suitable for storing sparse graphs;
- Less space is needed: \(O\left(\left|E\right|+\left|V\right|\right)\)
Implementation (C++)
1 |
|
Improvements
flowchart LR;
adjlist(Adjacency List)---udg(Undirected Graph) & dg(Directed Graph)
udg -->|Hard to calculate the degrees of vertices| ol(Orthogonal List)
dg -->|Each edge doubles the storage| am(Adjacency Multilist)
Iteration
Depth First Search
1 |
|
Time complexity: \(O(n^2)\) since rows of every vertex would be iterated.
Breadth First Search
Applications
Minimum Spanning Tree (MST)
A spanning tree is a sub-graph of an undirected connected graph, which includes all the vertices (the reason why it is called spanning) of the graph with a minimum possible number of edges. If a vertex is missed or the graph is not acyclic, then it is not a spanning tree.
Minimum Spanning Tree (MST) of an undirected graph is a spanning tree with the lowest total cost among all the spanning trees.
Properties:
Let \(N=\left(V,\,E\right)\) denote a connected network (i.e. graph with weights), and \(U\) denote a non-empty subset of the set of vertices \(V\). There must exist a MST that includes the edge \(\left(u,\,v\right)\) where \(u\in U,\;v\in V-U\).
In other words, during the process of constructing spanning trees, the vertices:
- Either in the vertex set \(U\) that has already been in the spanning tree (i.e. visited);
- Or in the vertex set \(V-U\) that has not been in the spanning tree yet (i.e. not visited).
Shortest-Path Algorithms
flowchart LR;
spa(Shortest-Path) --- apsp(All-pairs shortest-path) --> f(Floyd's)
spa --- spsp(Single pair shortest-path) --> d(Dijkstra's)
Dijkstra's Algorithm
Dijkstra (/ˈdikstrɑ/ or /ˈdɛikstrɑ/)
- Version 1: Brutal Force Implemented
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66const int Infinity = 0x3f3f3f3f; // Represent ∞
typedef unsigned int index_t; // Index type
template<typename VertexType, typename ArcType>
void Dijkstra(const AdjacencyMatrix<VertexType, ArcType> &graph
const index_t& source) {
/**
Preparation for searching
*/
unsigned vertices = graph.vertexCount();
// The array that holds the shortest distance from source to i
int* distances = new int[vertices];
// Indicate whether the vertex i has been visited
bool* visited = new bool[vertices];
// Initialize all the elements in the array with default values
memset(distances, 0x3f, sizeof(distances));
memset(visited, false, sizeof(visited));
// Distance of source vertex from itself is always 0
distances[source] = 0;
/**
Utility function
*/
// Utility function to find the minimum distance value
auto mimimumDistance = [&distances, &visited, vertices] -> index_t () {
unsigned min = Infinity;
index_t min_index = 0;
for(unsigned i = 0; i < vertices; ++i) {
if(!visited[i] && distances[i] < min) {
min = distances[i];
min_index = i;
}
}
return min_index;
};
/**
Core body of Dijkstra's Algorithm
*/
for(index_t i = 0; i < vertices - 1; ++i) {
// Select the minimum distance vertex
index_t vertex_index = mimimumDistance();
// Mark the vertex at given index as visited
visited[vertex_index] = true;
// Update the distance value of the adjacent vertices
// of the selected vertex
for(index_t j = 0; j < vertices; ++j)
if(
// The vertex has not been visited yet
!distances[j] &&
// There exists an arc from the selected to j
// since 0 represents a non-existent edge
graph.at(vertex_index, j) &&
// Infinity stands for a unreachable vertex
distances[vertex_index] != Infinity &&
// The total cost from source to j through the
// selected vertex is smaller than the current
// cost of vertex at j
distances[vertex_index] +
graph[vertex_index][j] < distances[j]
)
distances[j] = distances[vertex_index] +
graph[vertex_index][j];
}
}
Floyd's Algorithm
Directed Acycline Graph
A directed acyclic graph (DAG) is often used to describe the process of a project or system; a project can be divided into a number of sub-projects, and once these sub-projects are completed, the entire project can be completed.
According to whether vertices or arcs are used to represent a DAG, it can be categorized into:
- AOV nets (Activity On Vertex)
- AOE Nets (Activity On Edge)
If there exists a directed path from \(i\) to \(j\), then \(i\) is the predecessor of \(j\) and \(j\) is the successor of \(i\). If \((i,\,j)\) is a directed edge of the net, then \(i\) is the immediate predecessor of \(j\) and \(j\) is the immediate successor of \(i\).
Topological Sorting
Under the premise that there are no loops in the AOV net, we arrange all the activities into a linear sequence such that if an arc \((i,j)\) exists in the AOV network, then \(i\) must come before \(j\) in the sequence, and linear sequences with this property are known as topologically ordered sequences, and the corresponding algorithm for topologically ordered sequences is known as topological sorting.
Algorithm
- Select a vertex without any predecessors and then print it out;
- Remove the selected vertex and arcs that end with it;
- Unless all vertices are printed or there is no vertex that do not have a predecessor, repeat the process 1 and 2.
⚠ Note that the topological sorted sequences are not unique!
Application of topological sorting
Construct a topologically ordered sequence of its vertices for a directed graph, and if all the vertices in the net are in its topologically ordered sequence, then the AOV network must be free of loops.