Empirical analysis algorithm example


Empirical analysis. If the running time of our program (approximately) obeys a power law T(n) ~ a n^b, we can use a doubling hypothesis to estimate the coefficients a and b.

Tilde notation. We say that f(n) ~ g(n) if f(n)/g(n) converges to 1 as n gets large. This is a general statement about mathematical functions and is not restricted to running time, memory, or any other specific domain.

Cost model. For theoretical analyses of running time in COS 226, we will assume a cost model, namely that some particular operation (or operations) dominates the running time of a program. We then express the running time as the total number of that operation as a function of the input size. To simplify things, we usually give this frequency count in tilde notation.

Order of growth. If we have two functions f(n) and g(n), and f(n) ~ c g(n) for some constant c > 0, we say the order of growth of f(n) is g(n). Typically g(n) is one of the following functions: 1, log n, n, n log n, n^2, n^3, or 2^n.

Worst-case order of growth isn't everything. Just because one algorithm has a better order of growth than another does not mean that it is faster in practice. We will encounter some notable counterexamples, including quicksort vs. mergesort.

Memory analysis. Know how to calculate the memory utilization of a class with the 64-bit memory model from the textbook.

Theoretical and empirical analysis. Hypotheses generated through theoretical analysis (or guesswork like our power-law assumption) should be validated with data before being fully trusted.
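
To make the doubling hypothesis concrete, here is a minimal sketch in Java that times a workload at successive doublings of n and estimates b and a from the ratio of consecutive running times. The class name DoublingEstimate and the quadratic dummy workload inside timeTrial are assumptions for illustration only; in practice you would time whatever program you actually care about.

    public class DoublingEstimate {
        private static double sink;   // keeps the compiler from discarding the workload

        // Hypothetical workload: a simple quadratic loop standing in for the real program.
        private static double timeTrial(int n) {
            long start = System.nanoTime();
            double sum = 0.0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    sum += 1.0 / (i + j + 1);
            sink = sum;
            return (System.nanoTime() - start) / 1.0e9;   // elapsed time in seconds
        }

        public static void main(String[] args) {
            double prev = timeTrial(500);
            for (int n = 1000; n <= 16000; n += n) {               // double n each iteration
                double time = timeTrial(n);
                double b = Math.log(time / prev) / Math.log(2);    // T(2n)/T(n) = 2^b
                double a = time / Math.pow(n, b);                  // back out a from T(n) ~ a n^b
                System.out.printf("n = %6d   time = %8.3f   b ~ %4.2f   a ~ %.2e%n", n, time, b, a);
                prev = time;
            }
        }
    }

If the power law holds, the printed estimates of b settle down (near 2 for this quadratic workload, since the time roughly quadruples each time n doubles), and a stabilizes to the leading constant.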

Recommended Problems

C level

  1. Textbook 1.4.4
  2. Fall 2011 Midterm, #2

B level

  1. Textbook 1.4.5
  2. Spring 2012 Midterm, #1
  3. For each of the functions shown, give the best order of growth of the running time.

     (b) For each recurrence relation, what is the running time T(N) (use tilde notation)?

Answers

Algorithm 1: c N
Algorithm 2: c log N
Algorithm 3: c N log N
d. N^3 log N

A level

  1. The code below operates on bacterial genomes, each approximately 1 megabyte in size.
    int N = Integer.parseInt(args[0]);
    String[] genomes = new String[N];
    for (int i = 0; i < N; i++) {
        In gfile = new In("genomeFile" + i + ".txt");
        genomes[i] = gfile.readString();
    }
    for (int i = 1; i < N; i++) {
        for (int j = i; j > 0; j--) {
            if (genomes[j-1].length() > genomes[j].length())
                exch(genomes, j-1, j);
            else break;
        }
    }
  1. What is the theoretical order of growth of the worst-case running time as a function of N?
  2. A table of runtimes for the program above is given below. Approximate the empirical running time in tilde notation as a function of N. Do not leave your answer in terms of logarithms.
  3. Why might your answers to the previous two questions differ?
     N    Time (s)
     1    0.15
     2    0.14
     4    0.19
     8    0.41
    16    0.85
    32    1.66
    64    3.38
Answers

  1. N^2. Reading in the genomes takes linear time. The for loops are just insertion sort, which is N^2 in the worst case (where the break statement never executes).
  2. ~0.05N (OK if left in terms of a fraction). Once N is large enough, the time roughly doubles each time N doubles, so b ≈ 1; then a ≈ 3.38/64 ≈ 0.05.
  3. Multiple acceptable answers:
    1. The input may not be a worst-case input (e.g., the genomes may already be sorted by length).
    2. The time to read in 64 megabytes of genomes is much larger than the time to perform 64^2 string-length compares and 64^2 swaps of 8-byte references.
    3. Partial credit: N is too small / time is too short
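
For those who want to check the answer to question 2 numerically, below is a minimal sketch (not part of the original solution; the class name GenomeTimingFit is made up) that applies the doubling estimate to the measured times from the table.

    public class GenomeTimingFit {
        public static void main(String[] args) {
            int[]    n    = {    1,    2,    4,    8,   16,   32,   64 };
            double[] time = { 0.15, 0.14, 0.19, 0.41, 0.85, 1.66, 3.38 };

            // Estimate b from each consecutive pair of measurements: T(2N)/T(N) = 2^b.
            for (int i = 1; i < n.length; i++) {
                double b = Math.log(time[i] / time[i-1]) / Math.log(2);
                System.out.printf("N = %2d -> %2d   b ~ %5.2f%n", n[i-1], n[i], b);
            }

            // With b ~ 1, back out a from the largest measurement: T(N) ~ a N.
            double a = time[time.length - 1] / n[n.length - 1];
            System.out.printf("a ~ %.3f, so T(N) ~ %.2fN%n", a, a);
        }
    }

Only the larger values of N give b near 1; the first few rows are dominated by timing noise and fixed overhead, which is the same caution as the partial-credit answer above ("N is too small / time is too short").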