Book Notes
Contiguous vs. Linked Data Structures
- Contiguously-allocated structuresare composed of single slabs of memory, and include arrays, matrices, heaps, and hash tables.
- Linked data structuresare composed of distinct chunks of memory bound together bypointers, and include lists, trees, and graph adjacency lists.
Arrays
Advantages of contiguously-allocated arrays include:
- Constant-time access given the index– Because the index of each element maps directly to a particular memory address, we can access arbitrary data items instantly provided we know the index.
- Space efficiency– Arrays consist purely of data, so no space is wasted with links or other formatting information. Further, end-of-record information is not needed because arrays are built from fixed-size records.
- Memory locality– A common programming idiom involves iterating through all the elements of a data structure. Arrays are good for this because they exhibit excellent memory locality. Physical continuity between successive data accesses helps exploit the high-speedcache memory on modern computer architectures.
The downside of arrays is that we cannot adjust their size in the middle of a program’s execution.
Actually, we can efficiently enlarge arrays as we need them, through the miracle of dynamic arrays. The apparent waste in this procedure involves the recopying of the old contents on each expansion. Thus, each of thenelements move only two times on average, and the total work of managing the dynamic array is the sameO(n) as it would have been if a single array of sufficient size had been allocated in advance! The primary thing lost using dynamic arrays is the guarantee that each array access takes constant time in the worst case.
Pointers and Linked Structures
The relative advantages of linked lists over static arrays include:
- Overflow on linked structures can never occur unless the memory is actually full.
- Insertions and deletions aresimplerthan for contiguous (array) lists.
- With large records, moving pointers is easier and faster than moving the items themselves.
while the relative advantages of arrays include:
- Linked structures require extra space for storing pointer fields.
- Linked lists do not allow efficient random access to items.
- Arrays allow better memory locality and cache performance than random pointer jumping.
Exercises
1
A common problem for compilers and text editors is determining whether the parentheses in a string are balanced and properly nested. For example, the string ((())())() contains properly nested pairs of parentheses, which the strings )()( and ()) do not. Give an algorithm that returns true if a string contains properly nested and balanced parentheses, and false if otherwise. For full credit, identify the position of the first offending parenthesis if the string is not properly nested and balanced.
#include <string> using std::string; #include <stack> using std::stack; bool BalancedParentheses(string parentheses, int *pos) { stack<int> stk; const int kLeftPar = 1; int i; for (i = 0; i < parentheses.size(); ++i) { if (parentheses[i] == '(') { stk.push(kLeftPar); } else { if (stk.empty()) { *pos = i; return false; } stk.pop(); } } if (!stk.empty()) { *pos = --i; return false; } return true; }
2
Write a program to reverse the direction of a given singly-linked list. In other words, after the reversal all pointers should now point backwards. Your algorithm should take linear time.
struct Node { int value; struct Node *next; Node(int in_value, struct Node* in_next) : value(in_value), next(in_next) { } }; void ReverseLinkedList(Node **head) { if (!head || *head == NULL) { return; } Node *prev, *p, *next; prev = *head; p = prev->next; prev->next = NULL; while (p != NULL) { next = p->next; p->next = prev; prev = p; p = next; } *head = prev; }
3
We have seen how dynamic arrays enable arrays to grow while still achieving constant-time amortized performance. This problem concerns extending dynamic arrays to let them both grow and shrink on demand.
(a) Consider an underflow strategy that cuts the array size in half whenever the array falls below half full. Give an example sequence of insertions and deletions where this strategy gives a bad amortized cost.
(b) Then, give a better underflow strategy than that suggested above, one that achieves constant amortized cost per deletion.
- 容量是 6 的数组,当有 3 个元素是,insertion,然后 delete。它不断收缩和扩展容量。
- 当元素个数是总个数的 1/4 时,把容量收缩成 1/2。
4
Design a dictionary data structure in which search, insertion, and deletion can all be processed inO(1) time in the worst case. You may assume the set elements are integers drawn from a finite set 1,2, .., n, and initialization can take O(n)time.
因为元素个数是有限集合中的数,用 bit array 表示每个数。
5
Find the overhead fraction (the ratio of data space over total space) for each of the following binary tree implementations on n nodes:
(a) All nodes store data, two child pointers, and a parent pointer. The data field requires four bytes and each pointer requires four bytes.
(b) Only leaf nodes store data; internal nodes store two child pointers. The data field requires four bytes and each pointer requires two bytes.
- 所有点都一样: 4/(4+4*3) = 1/4
- 满树中,若页节点个数是 n,那么内部节点个数是 n-1, 4*n/(4*n + 4*(n-1)) = n/(2n-1)
6
Describe how to modify any balanced tree data structure such that search, insert, delete, minimum, and maximum still take O(logn) time each, but successor and predecessor now take O(1) time each. Which operations have to be modified to support this?
在树节点中添加指向 successor 和 predecessor 的指针。不影响操作 search, minimum, 和 maximum。只需在 insert 和 delete 操作相应更新指向 successor 和 predecessor 的指针。
7
Suppose you have access to a balanced dictionary data structure, which supports each of the operations search, insert, delete, minimum, maximum, successor, and predecessor in O(logn) time. Explain how to modify the insert and delete operations so they still take O(logn) but now minimum and maximum take O(1) time. (Hint: think in terms of using the abstract dictionary operations, instead of mucking about with pointers and the like.)
存储 max 和 min 这两个数。
- insert 时,新元素与这个两数对比并相应更新。
- delete 时,若是 min 元素被 delete,用它的 successor 更新;若是 max 元素被 delete,用它的 predecessor 更新。
8
Design a data structure to support the following operations:
- insert(x,T) – Insert item x into the set T.
- delete(k,T) – Delete the kth smallest element from T.
- member(x,T) – Return true iff x∈T.
All operations must take O(logn) time on an n-element set.
Balanced binary tree.
9
A concatenate operation takes two sets S1 and S2, where every key in S1 is smaller than any key in S2, and merges them together. Give an algorithm to concatenate two binary search trees into one binary search tree. The worst-case running time should be O(h), where h is the maximal height of the two trees.
S1 中的所有元素小于 S2,用 O(logn)的时间找出 S2 的最小元素,然后 S1 成为它的左子树,S2 成为它的右子树,组成新的搜索树。
10
In the bin-packing problem, we are given n metal objects, each weighing between zero and one kilogram. Our goal is to find the smallest number of bins that will hold the n objects, with each bin holding one kilogram at most.
- The best-fit heuristicfor bin packing is as follows. Consider the objects in the order in which they are given. For each object, place it into the partially filled bin with the smallest amount of extra room after the object is inserted.. If no such bin exists, start a new bin. Design an algorithm that implements the best-fit heuristic (taking as input the n weights w1,w2, …, wn and outputting the number of bins used) in O(nlogn)time.
- Repeat the above using the worst-fit heuristic, where we put the next object in the partially filled bin with the largest amount of extra room after the object is inserted.
使用 BST。主要找到能容纳这个元素的最小 bin,若所有 bin 都小于这个元素大小,就插入一个新的。
min_node = NULL; while node != NULL: if (node->weight >= w && node->left < w) { min_node = node; break; } else if (node->left >= w) { node = node->left; } else { node = node->right; } if (min_node == NULL) { bst->insert(new node(w)); } else { bst->delete(min_node); min_node->weight -= w; bst->insert(min_node); }
最大堆使用。每次选最大容量的 bin。若最大 bin 小于这个元素大小,就插入一个新的。
11
Suppose that we are given a sequence of n values x1,x2, …, xn and seek to quickly answer repeated queries of the form: given i and j, find the smallest value in xi,…,xj.
(a) Design a data structure that uses O(n2) space and answers queries in O(1) time.
(b) Design a data structure that uses O(n) space and answers queries in O(logn) time. For partial credit, your data structure can use O(nlogn) space and have O(logn) query time.
- n*n 的矩阵,i,j 中存的就是 i-j 的最小元素。
- 使用Cartesian tree或Treap。
12
Suppose you are given an input set S of n numbers, and a black box that if given any sequence of real numbers and an integer k instantly and correctly answers whether there is a subset of input sequence whose sum is exactly k. Show how to use the black box O(n) times to find a subset of S that adds up to k.
R = S for i = 1 to n: if bb(R/{si}) is True: R = R / {si}
13
Let A[1..n] be an array of real numbers. Design an algorithm to perform any sequence of the following operations:
• Add(i,y)– Add the value y to the ith number.
• Partial-sum(i)– Return the sum of the first i numbers
There are no insertions or deletions; the only change is to the values of the numbers. Each operation should take O(logn) steps. You may use one additional array of size n as a work space.
建立叶节点数n的 balanced binary tree,n个叶节点依次存储 A[1..n],书的内节点存储子树的和。
- Add(i,y),比较i与 n/2,决定左子树还是右子树,依次遍历到叶节点,并增加相应子树和,最后找到第i个元素相加。
- Partial-sum(i),比较i与 n/2,决定左子树还是右子树,每当遍历右子树,加上左子树的和,最后到叶节点,得到总的和。
14
Extend the data structure of the previous problem to support insertions and deletions. Each element now has both a key and a value. An element is accessed by its key. The addition operation is applied to the values, but the elements are accessed by its key. The Partial sum operation is different.
- Add(k,y)– Add the value y to the item with key k.
- Insert(k,y)– Insert a new item with key k and value y.
- Delete(k)– Delete the item with key k.
- Partial-sum(k)– Return the sum of all the elements currently in the set whose key is less than y,
The worst case running time should still be O(nlogn) for any sequence of O(n) operations.
建立以 key 排序的平衡搜索二叉树,并每个节点中添加一个左子树和的值。
- Add(k,y):随着搜索 key k,依次加左子树和,最后 key k 加上 y。
- Insert(k,y):随着搜索 key k 插入位置,依次加左子树和,最后插入 key k 的元素。
- Delete(k)::随着搜索 key k,依次减少左子树和,最后删除 key k 元素。
- Partial-sum(k):随着搜索 key k,依次加上左子树的和(因为左边的元素是小于的元素)。
15
Design a data structure that allows one to search, insert, and delete an integer X in O(1) time (i.e. , constant time, independent of the total number of integers stored). Assume that 1≤X≤n and that there are m+n units of space available, where m is the maximum number of integers that can be in the table at any one time. (Hint: use two arrays A[1..n] and B[1..m].) You are not allowed to initialize either A or B, as that would take O(m) or O(n) operations. This means the arrays are full of random garbage to begin with, so you must be very careful.
与Programming Pearls的 Column 课后题一样。
建立两个数组 A[1..n],B[1..m]和一个表示元素个数的变量 k。
- insert X: k = k + 1,A[X] = k, B[k] = X;
- search X: return (A[X] <= k) && B[A[X]] == X;
- delete X: 把 A[X]与末端 A[B[k]]交换,A[B[k]] = A[X], B[A[X]] = B[k]; k = k - 1;
18
What method would you use to look up a word in a dictionary?
Hash Table.
19
Imagine you have a closet full of shirts. What can you do to organize your shirts for easy retrieval?
以颜色排序,并二分搜索查找。
20
Write a function to find the middle node of a singly-linked list.
struct Node { int value; Node *next; }; Node* FindMidNode(Node *head) { Node *p, *q; p = head; q = head; i = 0; while (p != NULL) { i++; p = p->next; if (i == 2) { q = q->next; i = 0; } } return q; }
21
Write a function to compare whether two binary trees are identical. Identical trees have the same key value at each position and the same structure.
struct Node { int value; Node *left; Node *right; }; bool CompareBinaryTree(Node *head_m, Node *head_n) { if (head_m == NULL && head_n == NULL) { return true; } if (head_m == NULL || head_n == NULL) { return false; } return (head_m->value == head_n->value) && CompareBinaryTree(head_m->left, head_n->left) && CompareBinaryTree(head_m->right, head_n->right); }
22
Write a program to convert a binary search tree into a linked list
struct Node { int value; Node *next; }; struct TNode { int value; TNode *left; TNode *right; TNode(int value_in) { value = value_in; left = NULL; right = NULL; } }; void InsertToList(Node **head, int value) { Node *new_node = new Node; new_node->value = value; new_node->next = *head; *head = new_node; } void ConvertTreeToList(const TNode *root, Node **head) { if (root == NULL) { return; } ConvertTreeToList(root->right, head); InsertToList(head, root->value); ConvertTreeToList(root->left, head); }
23
Implement an algorithm to reverse a linked list. Now do it without recursion.
void ReverseLinkedList(Node **head) { if (!head || *head == NULL) { return; } Node *prev, *p, *next; prev = *head; p = prev->next; prev->next = NULL; while (p != NULL) { next = p->next; p->next = prev; prev = p; p = next; } *head = prev; }
24
What is the best data structure for maintaining URLs that have been visited by a Web crawler? Give an algorithm to test whether a given URL has already been visited, optimizing both space and time.
Hash Table.
26
Reverse the words in a sentence—i.e., “My name is Chris” becomes “Chris is name My.” Optimize for time and space.
- reverse 每个单词;
- reverse 整句。
void Reverse(char *begin, char *end) { char temp; while (begin < end) { temp = *begin; *begin = *end; *end = temp; begin++; end--; } } void ReverseWords(char *str) { char *word_begin; word_begin = NULL; char *p; p = str; while (*p != '\0') { if (word_begin == NULL && *p != ' ') { word_begin = p; } if (word_begin != NULL && (*(p+1) == ' ' || *(p+1) == '\0')) { Reverse(word_begin, p); word_begin = NULL; } ++p; } Reverse(str, p - 1); }
27
Determine whether a linked list contains a loop as quickly as possible without using any extra storage. Also, identify the location of the loop.
利用两个指针,一个快指针和一个慢指针,快的每次都比慢的多前进一个节点,如果存在 loop,快的总会与慢的相重叠。
loop 的起始点:
- 当快的与慢的指针相重叠时,验证有 loop,之后慢的指针不动,通过快的指针计算出 loop 的长度
- 重新从链表头开始,快指针比慢指针先前进 loop 的长度距离,提增慢和快指针,直到第一次相遇,相遇点就是 loop 的起始点。
28
You have an unordered array X of n integers. Find the array M containing n elements where Mi is the product of all integers in X except for Xi. You may not use division. You can use extra memory. (Hint: There are solutions faster than O(n2).)
对数组 X 扫描 2 次计算出如下 2 组数组:
$$ \begin{align} P_{0} = 1; P_{k}=X_{k}P_{k-1}=\prod_{i=1}^{k}X_{i} \newline Q_{n+1} = 1; Q_{k}=X_{k}Q_{k+1}=\prod_{i=k}^{n}X_{i} \end{align} $$所以得到 M:
$$ \begin{align} M_{i} = P_{i-1} Q_{i+1}, i\in[1,n] \end{align} $$29
Give an algorithm for finding an ordered word pair (e.g., “New York”) occurring with the greatest frequency in a given webpage. Which data structures would you use? Optimize both time and space.
Hash Table.