A randomized algorithm flips coins during its execution to determine what to do next. When considering a randomized algorithm, we usually care about its expected worst-case performance, which is the average amount of time it takes on the worst input of a given size. This average is computed over all the possible outcomes of the coin flips during the execution of the algorithm. See ExpectedValue for more information about computing expectations.

1. Example

Suppose we are promised that a particular value x appears somewhere in an array A, and we want to find it. Let's suppose also that the most expensive part of the computation is examining elements of A (because they are stored on Jupiter, say), and our goal is to minimize the number of elements we look at. Here is a deterministic algorithm for finding x:

Find(A, x):
  for i = 1 to length(A):
    if A[i] = x:
      return i

As in real life, this algorithm finds x in the last place it looks. In the worst case, this is at index n = length(A), requiring it to examine all n elements of the array.

Now here's a randomized version:

Find(A, x):
  let unprobed be an array of n = length(A) elements, initialized to true
  let done = false
  while not done:
    choose index uniformly at random from 1..n
    if unprobed[index] = true:
      let y = A[index]
      unprobed[index] = false
      if y = x:
        done = true
  return index

If the random number generator is badly broken, this algorithm will not even terminate---consider what happens if index is always 1. But under the assumption that each possible value of index is chosen at each step with probability 1/n, we can calculate both how many passes through the loop the algorithm does on average and how many probes it does.

First let's do number of passes through the loop. This is an example of the CouponCollector problem. Let Xi be the number of passes until we get an unprobed index after having previously probed i locations. It's clear that X0 is always 1, but what happens for larger i? Each time we pick a new index, we have probability i/n of getting one we've already used, in which case we have to start again, having made no progress. But the rest of the time we are done (with this step at least). Since it costs 1 probe to do the trial, we have

and we can solve for

The expected total number of iterations is

This is worse than the deterministic algorithm, but we might not mind if it reduces the expected number of (very expensive) probes. So what is that?

By symmetry, every time we do a probe, our chances of picking the unused index that points to x is just one over the number of remaining indices. Let Yi be the expected number of probes when there are i unprobed positions. Then

a recurrence that bottoms out at Y1 = 1. It's not at all obvious what the solution to this recurrence is, so we try forward induction:

i

Yi

1

1

2

1 + (1/2)*1 = 3/2

3

1 + (2/3)*(3/2) = 2

4

1 + (3/4)*(2) = 5/2

5

1 + (4/5)*(5/2) = 3

and at this point we might guess Yi = (i+1)/2. This clearly works for Y1; plugging it into the recurrence gives

as claimed.

Setting i to n we get Yn = (n+1)/2. This is only a constant factor faster than the deterministic algorithm, but if we really have to send five astronauts and a deranged supercomputer to Jupiter each time we fetch an element of A, that constant factor might be worth it.

There's a shorter way to get this result, but it requires slightly more intuition about the process: we are effectively sampling all n locations in a random order. Given this order, x is equally likely to be in the 1st, 2nd, 3rd, ... nth place we look. So the expected number of places we look at is Sigmai=1 to n (1/n) i = (1/n)*(n(n+1)/2) = (n+1)/2, as before.

2. Randomization vs average-case analysis

There is a subtle but important difference between randomization (as done above) and average-case analysis, which is when we consider the expected performance of a deterministic algorithm running on inputs drawn randomly from some distribution (e.g. all random permutations of the input array A). The difference is where the randomness comes from and who controls it. In a randomized algorithm, the randomness is internal, and the input is still assumed to be the worst possible. While it may be that our random-number generator is defective in some way, we can guard against this by choosing a good enough generator (e.g., an appropriate function of the output of a Geiger counter next to a sample of uranium inside a thick lead box). But most inputs are likely to deviate from randomness or at least uniformity in subtle ways---so when we make assumptions about the randomness of the input, we may be very strongly constraining how our algorithm can be used.

3. More examples

See MotwaniAndRaghavan for many more examples.


CategoryAlgorithmNotes

RandomizedAlgorithms (last edited 2007-12-25 23:42:08 by localhost)