Algorithm Design Manual Chapter 4

| Comments

Book Notes

4.3 Heapsort: Fast Sorting via Data Structures

Where in the Heap?

Problem: Given an array-based heap on n elements and a real number x, efficiently determine whether the kth smallest element in the heap is greater than or equal to x. Your algorithm should be O(k) in the worst-case, independent of the size of the heap. Hint: you do not have to find the kth smallest element; you need only determine its relationship to x.

Solution: There are at least two different ideas that lead to correct but inefficient algorithms for this problem:

  1. Call extract-minktimes, and test whether all of these are less thanx. This explicitly sorts the firstkelements and so gives us more information than the desired answer, but it takes O(klogn) time to do so.
  2. The kth smallest element cannot be deeper than the kth level of the heap, since the path from it to the root must go through elements of decreasing value. Thus we can look at all the elements on the first k levels of the heap, and count how many of them are less thanx, stopping when we either find k of them or run out of elements. This is correct, but takes O(min(n,2k-1)) time, since the top k elements have 2k-1 elements.

An O(k) solution can look at only k elements smaller than x, plus at most O(k) elements greater than x. Consider the following recursive procedure, called at the root with i= 1 with count=k:

int heap_compare(priority_queue *q, int i, int count, int x)
  if ((count <= 0) || (i > q->n)) return(count);
  if (q->q[i] < x) {
    count = heap_compare(q, pq_young_child(i), count-1, x);
    count = heap_compare(q, pq_young_child(i)+1, count, x);

If the root of the min-heap is ≥ x, then no elements in the heap can be less than x, as by definition the root must be the smallest element. This procedure searches the children of all nodes of weight smaller than x until either (a) we have found k of them, when it returns 0, or (b) they are exhausted, when it returns a value greater than zero. Thus it will find enough small elements if they exist.

But how long does it take? The only nodes whose children we look at are those < x, and at most k of these in total. Each have at most visited two children, so we visit at most 3k nodes, for a total time of O(k).

4.5 Mergesort: Sorting by Divide-and-Conquer

Mergesort is a great algorithm for sorting linked lists, because it does not rely on random access to elements as does heapsort or quicksort. Its primary disadvantage is the need for an auxilliary buffer when sorting arrays. It is easy to merge two sorted linked lists without using any extra space, by just rearranging the pointers. However, to merge two sorted arrays (or portions of an array), we need use a third array to store the result of the merge to avoid stepping on the component arrays

4.9 Binary Search and Related Algorithms

int binary_search(item_type s[], item_type key, int low, int high)
  int middle; /* index of middle element */
  if (low > high) return (-1); /* key not found */
  middle = (low+high)/2;
  if (s[middle] == key) return(middle);
  if (s[middle] > key)
    return (binary_search(s,key,low,middle-1));
    return (binary_search(s,key,middle+1,high));

Counting Occurrences

This algorithm runs inO(lgn+s), wheresis the number of occurrences of the key. This can be as bad as linear if the entire array consists of identical keys. A faster algorithm results by modifying binary search to search for the boundary of the block containing k, instead of kitself. Suppose we delete the equality test

if (s[middle] == key) return(middle);

from the implementation above and return the index low instead of −1 on each unsuccessful search. All searches will now be unsuccessful, since there is no equality test. The search will proceed to the right half whenever the key is compared to an identical array element, eventually terminating at the right boundary. Repeating the search after reversing the direction of the binary comparison will lead us to the left boundary. Each search takes O(lgn) time, so we can count the occurrences in logarithmic time regardless of the size of the block.

One-Sided Binary Search

Now suppose we have an array A consisting of a run of 0’s, followed by an unbounded run of 1’s, and would like to identify the exact point of transition between them. Binary search on the array would provide the transition point in lgn tests, if we had a bound non the number of elements in the array. In the absence of such a bound, we can test repeatedly at larger intervals (A[1], A[2], A[4], A[8], A[16],...) until we find a first nonzero value. Now we have a window containing the target and can proceed with binary search. This one-sided binary search finds the transition pointpusing at most 2lgp comparisons, regardless of how large the array actually is.

Square and Other Roots

First, observe that the square root ofn≥1 must be at least 1 and at most n. Let l = 1 and r = n. Consider the midpoint of this interval, m=(l+r)/2. How does m2 compare to n? If n≥m2 , then the square root must be greater than m, so the algorithm repeats with l=m. If n<m2 , then the square root must be less than m, so the algorithm repeats with r=m.

Suppose that we start with values l and r such that f(l)>0 and f(r)<0. If f is a continuous function, there must exist a root between l and r. Depending upon the sign of f(m), where m=(l+r)/2, we can cut this window containing the root in half with each test and stop soon as our estimate becomes sufficiently accurate.

4.10 Divide-and-Conquer

divide-and-conquer recurrences of the form T(n)=aT(n/b)+f(n)

1. If $f(n) = O(n^{log_{b}^{a-\epsilon}})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{log_{b}^a})$.
2. If $f(n) = O(n^{log_{b}^{a}})$, then $T(n) = \Theta(n^{log_{b}^a}lgn)$.
3. If $f(n) = O(n^{log_{b}^{a+\epsilon}})$ for some constant $\epsilon > 0$ and if $af(n/b) \leq cf(n)$ for some $c<1$, then $T(n) = \Theta(f(n))$.



The Grinch is given the job of partitioning 2n players into two teams of n players each. Each player has a numerical rating that measures how good he/she is at the game. He seeks to divide the players as unfairly as possible, so as to create the biggest possible talent imbalance between team A and team B. Show how the Grinch can do the job in O(nlogn) time.

用个 O(nlogn)的排序算法对 2n 个队根据实力排序,前 n 个作为一队,后 n 个作为一队。


For each of the following problems, give an algorithm that finds the desired numbers within the given amount of time. To keep your answers brief, feel free to use algorithms from the book as subroutines. For the example,S={6,13,19,3,8}, 19−3 maximizes the difference, while 8−6 minimizes the difference.

(a) Let S be an unsorted array of n integers. Give an algorithm that finds the pair x, y∈S that maximizes|x−y|. Your algorithm must run in O(n) worst-case time.

(b) Let S be a sorted array of n integers. Give an algorithm that finds the pair x, y∈S that maximizes |x−y|. Your algorithm must run in O(1) worst-case time.

(c) Let S be an unsorted array of n integers. Give an algorithm that finds the pair x, y∈S that minimizes |x−y|, for x ≠ y. Your algorithm must run in O(nlogn) worst-case time.

(d) Let S be a sorted array of n integers. Give an algorithm that finds the pair x, y∈S that minimizes |x−y|, for x ≠ y. Your algorithm must run in O(n) worst-case time.

  • (a) 扫描 S 一次获得最小和最大值.
  • (b) 取首尾数。
  • (c) O(nlogn)的算法排序,扫描排序好的 S,获得最小差的相邻元素对。
  • (d) 扫描排序好的 S,获得最小差的相邻元素对。


Take a sequence of 2n real numbers as input. Design an O(nlogn) algorithm that partitions the numbers intonpairs, with the property that the partition minimizes the maximum sum of a pair. For example, say we are given the numbers (1,3,5,9). The possible partitions are ((1,3),(5,9)), ((1,5),(3,9)), and ((1,9),(3,5)). The pair sums for these partitions are (4,14), (6,12), and (10,8). Thus the third partition has 10 as its maximum sum, which is the minimum over the three partitions.

  1. O(nlogn)的算法排序
  2. start = 0;
    end = 2n - 1;
    while (start < end) {
      pair(S[star], S[end]);


Assume that we are given n pairs of items as input, where the first item is a and the second item is one of three colors (red, blue, or yellow). Further assume that the items are sorted by number. Give an O(n) algorithm to sort the items by color (all reds before all blues before all yellows) such that the numbers for identical colors stay sorted. For example: (1,blue), (3,red), (4,blue), (6,yellow), (9,red) should become (3,red), (9,red), (1,blue), (4,blue), (6,yellow).

  1. 创建 3 个分别存储 red,blue,yellow 的数组;
  2. 扫描 input,依次按颜色装入不同的数组;
  3. 分别从 red,blue,yellow 的数组中输出结果。


The mode of a set of numbers is the number that occurs most frequently in the set. The set (4,6,2,4,3,1) has a mode of 4. Give an efficient and correct algorithm to compute the mode of a set of n numbers.

  • O(nlogn): O(nlogn)排序,扫描 Set 一遍得到频率最大的数。
  • O(n): 使用 hash map 扫描一遍存储数字频率,扫描 hash map 得到频率最大数。


Given two sets S1 and S2 (each of size n), and a number x, describe an O(nlogn) algorithm for finding whether there exists a pair of elements, one from S1 and one from S2, that add up to x. (For partial credit, give a Θ(n2) algorithm for this problem.)

  1. 从 S1 中减去 n,O(nlogn)排序 S1 和 S2,然后能否找出相同的元素(binary search 或扫描比较)。
  2. Sort and Scan
    sort S1 in O(nlogn)
    sort S2 in O(nlogn)
    begin = 0;
    end = n - 1;
    while (begin < n && end >=0) {
              if ((S1[begin] + S2[end]) < X) {
              else if ((S1[begin] + S2[end]) > X) {
              } else {
                  return true;
    return false;
  3. Binary Search
    • O(nlogn)排序 S1
    • X-S2[i]去 binary search 排序好的 S1,是否找到元素。


Outline a reasonable method of solving each of the following problems. Give the order of the worst-case complexity of your methods.

  1. You are given a pile of thousands of telephone bills and thousands of checks sent in to pay the bills. Find out who did not pay.
  2. You are given a list containing the title, author, call number and publisher of all the books in a school library and another list of 30 publishers. Find out how many of the books in the library were published by each company.
  3. You are given all the book checkout cards used in the campus library during the past year, each of which contains the name of the person who took out the book. Determine how many distinct people checked out at least one book.

都使用 Hash Table,O(n)


Given a set of S containing n real numbers, and a real number x. We seek an algorithm to determine whether two elements of S exist whose sum is exactly x.

  • Assume that S is unsorted. Give an O(nlogn) algorithm for the problem.
  • Assume that S is sorted. Give an O(n) algorithm for the problem.

(1): Binary search

sort S in O(nlogn);
for (int i = 0; i < n; ++i) {
        binarysearch S[i] in S[i+1,n]


sort S in O(nlogn);
i = 0;
j = n - 1;
while (i < j) {
  if (s[i] + s[j] < X) {
  } else if (s[i] + s[j] > X) {
  } else {


i = 0;
j = n - 1;
while (i < j) {
  if (s[i] + s[j] < X) {
  } else if (s[i] + s[j] > X) {
  } else {


Give an efficient algorithm to compute the union of sets A and B, where n = max( | A | , | B | ). The output should be an array of distinct elements that form the union of the sets, such that they appear more than once in the union.

  • Assume that A and B are unsorted. Give an O(nlogn) algorithm for the problem.
  • Assume that A and B are sorted. Give an O(n) algorithm for the problem.
  • O(nlogn)对A和B排序,然后用 2 的 O(n)的方法。
  • 若A和B以升序排序
    set U to empty;
    int i = 0;
    int j = 0;
    while (i < na && j < na) {
      if (A[i] < B[j]) {
        add A[i] into U;
      } else (A[i] > B[j]) {
          add B[j] into U;
      else {
         add A[i] into U;
    if (i < na) {
      while (i < na) {
        add A[i] into U;
    if (j < nb) {
      while (j < nb) {
       add B[j] into U;


Given a set S of n integers and an integer T, give an O(nk − 1logn) algorithm to test whether k of the integers in S add up to T.

  1. O(nlogn)对数组排序
  2. (k-1)个数的组合有 nk-1,并计算 k-1 个数的和 sum
  3. 用 binary search 在数组中搜索 T-sum


Design an O(n) algorithm that, given a list of n elements, finds all the elements that appear more than n / 2 times in the list. Then, design an O(n) algorithm that, given a list of n elements, finds all the elements that appear more than n / 4 times.

Hash Table 可以解决。或

Find the elements that appear more than n / 2 times

数组中最多有一个数超过重复 n/2 次,并且排序后的第 ceiling(n/2)个数必定是这个数。

  1. method1
    • 利用BFPRT 以 O(n)的复杂度找到第 ceiling(n/2)个小数;
    • 扫描数组,计数这个数的重复数是否大于 n/2.
  2. method2
    #include <stack>
    using std::stack;
    bool FindMoreThanHalf(int *array, int n, int *res) {
      stack<int> stk;
      int i;
      for (i = 0; i < n; ++i) {
        if (stk.empty()) {
        } else {
          if (stk.top() == array[i]) {
          } else {
      if (stk.empty()) {
        return false;
      int candidate = stk.top();
      int times = 0;
      for (i = 0; i < n; ++i) {
        if (array[i] == candidate) {
      if (times > n / 2) {
        *res = candidate;
        return true;
      return false;

Find the elements that appear more than n / 4 times

  • method1
    1. 利用BFPRT 以 O(n)的复杂度找到中间数,验证中中间数是否重复 n/4(O(n));
    2. 以中间元素划分数组为两部分(O(n));
    3. 在上下半部分 n/2 中重复 n/4 次数的元素,同第一个问题一样找(O(n));
  • method2
    1. 初始 3 个空的槽,想对应的槽的 3 个计数为 0;
    2. 对于数组中每个元素:
      • 若等于其中任何一个槽中数,增加计数;
      • 若有槽空,放入这个槽,并计数为 1;
      • 否则,对所有槽内数的计数减 1
    3. 对槽内剩下的数,扫描一遍数组,计算它们重复次数是否符合要求。


Devise an algorithm for finding the k smallest elements of an unsorted set of n integers in O(n + klogn).

  1. O(n)的复杂度建立一个最小堆;
  2. 连续 k 次取出最小值,最后得到第 k 个最小值。


You wish to store a set of n numbers in either a max-heap or a sorted array. For each application below, state which data structure is better, or if it does not matter. Explain your answers.

  1. Want to find the maximum element quickly.
  2. Want to be able to delete an element quickly.
  3. Want to be able to form the structure quickly.
  4. Want to find the minimum element quickly.
  5. 都开销 O(1)。
  6. 若知道删除的地方,max-heap 花费 O(logn),sorted array 花费 O(n)。若不知道删除的地方,max-heap 花费 O(n)查找,删除花费 O(logn); sorted array binary search 花费 O(logn),删除花费 O(n)。
  7. max-heap 花费 O(n);sorted array 花费 O(logn)。
  8. max-heap 花费 O(n);sorted array 花费 O(1)。


Give an O(nlogk)-time algorithm that merges k sorted lists with a total of n elements into one sorted list. (Hint: use a heap to speed up the elementary O(kn)-time algorithm).

  1. 扫描 k 组 sorted lists 组成一个大小 k 的 min-heap;
  2. 从 min-heap 中取出最小值放入结果 list。


(a) Give an efficient algorithm to find the second-largest key among n keys. You can do better than 2n − 3 comparisons. (b) Then, give an efficient algorithm to find the third-largest key among n keys. How many key comparisons does your algorithm do in the worst case? Must your algorithm determine which key is largest and second-largest in the process?

  • 找第二大元素:大小为 2 个的数组初始化为第一二个元素,之后每个元素与这数组对比,剔除最小的,最后数组内 2 个元组对比得到最大和第二大元素,一共比较 2(n-2)+1=2n-3,找出第二大元素。
  • 找第三大元素:同样已大小为 3 的数组,最后比较数 3(n-3)+2=3n-7。

Random Selection可以找出任意的第几大值,平均时间复杂度:O(n),比较次数将是 n 的倍数,最坏时间复杂度可以达到:O(nlogn)。

Tournament Algorithm找第二大元素比较次数 O(n+logn);找第 k 个最大元素,比较次数为 O(n+klogn)。


Use the partitioning idea of quicksort to give an algorithm that finds the median element of an array of n integers in expected O(n) time. (Hint: must you look at both sides of the partition?)

unsigned int seed = time(NULL);
int randint(int m, int n) {
  return m + rand_r(&seed) / (RAND_MAX / (n + 1 - m) + 1);

void RandomSelectionK(int *array, int l, int u, int k) {
  if (l >= u) {
  swap(array[l], array[randint(l, u)]);
  int pivot = array[l];
  int i = l;
  int j = u + 1;
  while (true) {
    do {
    } while (i <= u && array[i] < pivot);
    do {
    } while (array[j] > pivot);
    if (i > j) {
    swap(array[i], array[j]);
  swap(array[l], array[j]);
  if (j < k) {
    RandomSelectionK(array, j + 1, u, k);
  } else if (j > k) {
    RandomSelectionK(array, l, j - 1, k);


The median of a set of n values is the $\lceil n/2 \rceil$ th smallest value.
1. Suppose quicksort always pivoted on the median of the current sub-array. How many comparisons would Quicksort make then in the worst case?
2. Suppose quicksort were always to pivot on the $\lceil n/3 \rceil$ th smallest value of the current sub-array. How many comparisons would be made then in the worst case?

f(n) = 2*f(n/2) + n ==> f(n) = 2k * f(n/2k) + kn = (n+2)logn f(n) = f(n/3) + f(2n/3) + n ==> f(n) = O(nlogn)


Suppose an array A consists of n elements, each of which is red, white, or blue. We seek to sort the elements so that all the reds come before all the whites, which come before all the blues The only operation permitted on the keys are

  • Examine(A,i) – report the color of the ith element of A.
  • Swap(A,i,j) – swap the ith element of A with the jth element.

Find a correct and efficient algorithm for red-white-blue sorting. There is a linear-time solution.

2 次扫描。

  • 第一次:把 red 和 white 当成一样,用 quick 的 partition 分开与 blue。
  • 第二次:只区分 red 和 white 的子区间。


Stable sorting algorithms leave equal-key items in the same relative order as in the original permutation. Explain what must be done to ensure that mergesort is a stable sorting algorithm.

在合并时元素相等时选 index 小的元素在前。


Show that n positive integers in the range 1 to k can be sorted in O(nlogk) time. The interesting case is when k < < n.

We seek to sort a sequence S of n integers with many duplications, such that the number of distinct integers in S is O(logn). Give an O(nloglogn) worst-case time algorithm to sort such sequences.

balanced binary search tree.


Let A[1..n] be an array such that the first \(n-\sqrt n\) elements are already sorted (though we know nothing about the remaining elements). Give an algorithm that sorts A in substantially better than nlogn steps.

+ $O(\sqrt{n}log(\sqrt{n})$ 排序后面的 $\sqrt{n}$ 个元素。
+ O(n)去 mergesort 前半部分和后半部分。


Assume that the array A[1..n] only has numbers from \(\{1,\ldots, n^2\}\) but that at most loglogn of these numbers ever appear. Devise an algorithm that sorts A in substantially less than O(nlogn).

和 23 一样,用 balanced binary search tree,树的高度不超过 loglogn,最后的复杂度 O(n*logloglogn)。


Let P be a simple, but not necessarily convex, polygon and q an arbitrary point not necessarily in P. Design an efficient algorithm to find a line segment originating from q that intersects the maximum number of edges of P. In other words, if standing at point q, in what direction should you aim a gun so the bullet will go through the largest number of walls. A bullet through a vertex of P gets credit for only one wall. An O(nlogn) algorithm is possible.

  1. 以q为中心点,顺时针旋转,P中所有边随着顺时针旋转都有一个起始点 (head)和结束点(end),计算它们的极角(polar angle); O(n)
  2. 对所有 head 和 end 按照 angle 大小排序,若相等,head 在前; O(nlogn)
  3. 扫描这个排序好的队列,遇到 head 加 1,遇到 end 减 1,最后算出这个区间的最大值。O(n)


A company database consists of 10,000 sorted names, 40% of whom are known as good customers and who together account for 60% of the accesses to the database. There are two data structure options to consider for representing the database:

  1. Put all the names in a single array and use binary search.
  2. Put the good customers in one array and the rest of them in a second array.

Only if we do not find the query name on a binary search of the first array do we do a binary search of the second array. Demonstrate which option gives better expected performance. Does this change if linear search on an unsorted array is used instead of binary search for both options?

  • single array: log10000=4
  • two array: 0.6*log4000+0.4*(log4000+log6000) = 5.11

single array is better.

  • single array: 10000
  • two array: 0.6*4000+0.4*6000 = 6400

two array is better.


Suppose you are given an array A of n sorted numbers that has been circularly shifted k positions to the right. For example, {35,42,5,15,27,29} is a sorted array that has been circularly shifted k = 2 positions, while {27,29,35,42,5,15} has been shifted k = 4 positions.

  1. Suppose you know what k is. Give an O(1) algorithm to find the largest number in A.
  2. Suppose you do not know what k is. Give an O(lgn) algorithm to find the largest number in A. For partial credit, you may give an O(n) algorithm.
if (k == 0) {
  return A[n-1];
} else {
  return A[k-1];
int FindLargestNumber(int *array, int l, int h) {
  if (array[l] < array[h]) {
    return array[h];
  if (l == h) {
    return array[h];
  int mid;
  mid = (l + h) / 2;
  if ((mid + 1 <= h) && array[mid] > array[mid + 1]) {
    return array[mid];
  if ((mid - 1 >= l) && array[mid - 1] > array[mid]) {
    return array[mid - 1];
  if (array[mid] < array[h]) {
    return FindLargestNumber(array, l, mid - 1);
  } else {
    return FindLargestNumber(array, mid + 1, h);


Consider the numerical 20 Questions game. In this game, Player 1 thinks of a number in the range 1 to n. Player 2 has to figure out this number by asking the fewest number of true/false questions. Assume that nobody cheats.

  1. What is an optimal strategy if n is known?
  2. What is a good strategy if n is not known?
  1. binary search.
  2. 2i随机选一个 i,若数小,增加到 2i+1,若大就二分搜索。


Suppose that you are given a sorted sequence of distinct integers . Give an O(lgn) algorithm to determine whether there exists an i index such as ai = i. For example, in { − 10, − 3,3,5,7}, a3 = 3. In {2,3,4,5,6,7}, there is no such i.

bool CheckEqualIndex(int *array, int l, int h) {
  while (l <= h) {
    int mid = (l + h) / 2;
    if (array[mid] > (mid + 1)) {
      h = mid - 1;
    } else if (array[mid] < (mid + 1)) {
      l = mid + 1;
    } else {
      return true;
  return false;


Suppose that you are given a sorted sequence of distinct integers , drawn from 1 to m where n < m. Give an O(lgn) algorithm to find an integer that is not present in a. For full credit, find the smallest such integer.

int FindMissingElement(int *array, int l, int h) {
  while (l <= h) {
    int mid = (l + h) / 2;
    if (array[mid] > (mid + 1)) {
      h = mid - 1;
    } else if (array[mid] <= (mid + 1)) {
      l = mid + 1;
  return l + 1;


Let M be an n*m integer matrix in which the entries of each row are sorted in increasing order (from left to right) and the entries in each column are in increasing order (from top to bottom). Give an efficient algorithm to find the position of an integer x in M, or to determine that x is not there. How many comparisons of x with matrix entries does your algorithm use in worst case?


bool FindElement(int **array, int x, int n, int m, int *pos_x, int *pos_y) {
  int row = 0, col = m - 1;
  while (row < n && col >= 0) {
    if (array[row][col] == x) {
      *pos_x = row;
      *pos_y = col;
      return true;
    } else if (array[row][col] > x) {
    } else {
  return true;


Consider an n*n array A containing integer elements (positive, negative, and zero). Assume that the elements in each row of A are in strictly increasing order, and the elements of each column of A are in strictly decreasing order. (Hence there cannot be two zeroes in the same row or the same column.) Describe an efficient algorithm that counts the number of occurrences of the element 0 in A. Analyze its running time.

int CountZero(int **array, int n) {
  int row = n - 1, col = n - 1;
  int count = 0;
  while (row >=0 && col >= 0) {
    if (array[row][col] == 0) {
    } else if(array[row][col] > 0) {
    } else {
  return count;


If you are given a million integers to sort, what algorithm would you use to sort them? How much time and memory would that consume?

  1. 一个整数4字节,109*4=4G,需要 4G 的内存,可以用快排等 O(nlogn)的排序算法.
  2. 用 bitmap,需要 109/8=128M 的内存.
  3. 若内存有限,就 external merge sort,利用外部存储多进行几次.


Describe advantages and disadvantages of the most popular sorting algorithms.

Merge sort:

  • 优点:适合链表,适合外排.
  • 缺点:需要多余的内存来保存合并的数据.

Insertion/Selection sort:

  • 优点:简单实现.
  • 缺点:太慢,当数据很大时,运行不实际.

Heap sort:

  • 优点:不需要递归,适合大数据.
  • 缺点:时常慢于 merge sort 和 quick sort.

Quick sort:

  • 优点:很快.
  • 缺点:递归,最坏情况比较慢.


Implement an algorithm that takes an input array and returns only the unique elements in it.



You have a computer with only 2Mb of main memory. How do you use it to sort a large file of 500 Mb that is on disk?

利用external merge sort.


Design a stack that supports push, pop, and retrieving the minimum element in constant time. Can you do this?

只有一个 stack 办不到.如果两个 stack,可以利用另外一个 stack 存储最小值.


Given a search string of three words, find the smallest snippet of the document that contains all three of the search words—i.e., the snippet with smallest number of words in it. You are given the index positions where these words occur in the document, such as word1: (1, 4, 5), word2: (3, 9, 10), and word3: (2, 6, 15). Each of the lists are in sorted order, as above.

  1. 选取每个字母 index 的首个元素作为起始方案。
  2. 如何改进它的长度:a.增加最小的位置,b.减小最大的位置,这里只能增加最小的位置。
  3. 用 heap 来保存位置,每次取出最小的位置为 O(logk).

复杂度:O(nlogk),n 是所有字母的位置个数,k 是字母个数。这里 k=3,所以 O(n)

#include <queue>
using std::priority_queue;
#include <utility>
using std::make_pair;
using std::pair;
#include <vector>
using std::vector;
#include <algorithm>
using std::max;
using std::min;
#include <limits>
using std::numeric_limits;

int FindSmallestSnippet(vector<vector<int> > &index_positions) {
  // max-priority, select smallest position, use -index_positions[i][j], (i,j)
  priority_queue<pair<int, pair<int ,int> > > queue;
  int max_pos = 0; // the max pos of  the snippet
  int i;
  for (i = 0; i < index_positions.size(); ++i) {
    int pos = index_positions[i][0];
    max_pos = max(max_pos, pos);
    queue.push(make_pair(-pos, make_pair(i, 0)));
  int smallest_len = numeric_limits<int>::max();
  while (queue.size() == index_positions.size()) {
    int min_pos = -queue.top().first;
    smallest_len = min(smallest_len, max_pos - min_pos + 1);
    int word_pos = queue.top().second.first;
    int index = queue.top().second.second;
    if (index < index_positions[word_pos].size()) {
      int next_pos = index_positions[word_pos][index];
      max_pos = max(max_pos, next_pos);
      queue.push(make_pair(-next_pos, make_pair(word_pos, index)));
  return smallest_len;


You are given 12 coins. One of them is heavier or lighter than the rest. Identify this coin in just three weighings.

  1. 分 3 组,每组 4 个,其中两组称重,若相等,重的在第三组.若不等,重的在重的那一组.
  2. 重的那组分 4 组,每组 1 个,第一组和第二组称重,谁重就重的那个.
  3. 若 step2 相等,剩下第三组和第四组称重,谁重就重的那个.