1. Synchronous leader election in rings

See AttiyaWelch chapter 3 or LynchBook Chapter 3 for details.

Basic ideas:

1.1. Details of LCR

Formally, we'll let the state space for each process i be { non-leader, leader } × ID × { 0, 1 } where the first component says if i is a leader, the second is the highest id that i has seen so far, and the third is 1 if a new id has been received by not sent. We assume that i denotes i's position rather than its id, which we'll write as idi.

Using the notation of LynchBook, the transition rules are:

We also specify the start state starti as (non-leader, i, 1).

1.1.1. Proof of correctness

We now claim that when the protocol terminates (after exactly N rounds), no messages are sent, every process has the leader's id in its second field, and the leader knows it's the leader. Since this algorithm is pretty simple, we can do so formally by writing down an induction hypothesis that describes the state stateki of process i after k steps exactly.

Claim: For k < N, stateki = (non-leader, max { idj | j = i-k .. i }, b } where b = 1 if and only if max idj = idi-k.

Proof: By induction on k. The base case is the start state; here for each i we have idi = max { idj | j = i } trivially and everything else works. For higher k, we have

Now consider two cases depending on whether i-1 sends a message at step k or not:

Case 1

i-1 sends a message, which occurs when max { idj | j = i-k .. i-1 } = id(i-1)-(k-1) = idi-k. Then sendi-1(...) = idi-k. By assumption of unique ids, idi-k <> idi so either idi-k > max { idj | j = i-k+1 .. i } the second case of the transition rule holds, and we have stateki = (non-leader, idi-k = max { idj | j = i-k .. i} , 1); or idi-k < max(...), the third case holds, and we have state^ki = (non-leader, max { idj | j = i-k .. i }, 0). In either case the induction hypothesis holds for k and i.

Case 2

i-1 doesn't send a message, which occurs when max { idj | j = i-k .. i-1 } <> idi-k. Then i doesn't update its state, and we have stateki[2] = statek-1i[2] = max(i-k+1..i) = max(i-k..i) and the rest goes through.

[Comment: the rest of the proof would be a lot cleaner if we adopted the convention max(i-k..i) = max { idj | j = i-k .. i } to begin with.]

From the Claim, we know that the state after k = N-1 rounds is one in which statek-1i = (non-leader, max(i-N+1..i), idi+1 = max(i-N+1..i)) for all i. It is not hard to see that this puts max(1..N) on all nodes, and the only node holding an outgoing message is node i-1 where i = max(1..N). It follows that on the next step, i-1 sends i to i, i declares itself leader, and the protocol terminates.

1.1.2. Performance

It's immediate from the correctness proof that the protocols terminates after exactly N+1 rounds.

To count message traffic, observe that each process sends at most 1 message per round, for a total of O(N2) messages. This is a tight bound since if the ids are in decreasing order N N-1 N-2 ... 1, then no messages get eaten until they hit N.

1.2. Details of Hirschbirg and Sinclair

Basically the same as for LCR but both the protocol and the invariant get much messier. To specify the protocol, it may help to think of messages as mobile agents and the state of each process as being of the form (local-state, { agents I'm carrying }). Then the sending rule for a process becomes ship any agents in whatever direction they want to go and the transition rule is accept any incoming agents and update their state in terms of their own internal transition rules. An agent state for LCR will be something like (original-sender, direction, hop-count, max-seen) where direction is R or L depending on which way the agent is going, hop-count is initially 2k when the agent is sent and drops by 1 each time the agent moves, and max-seen is the biggest id of any node the agent has visited. An agent turns around (switches direction) when hop-count reaches 0.

To prove this works, we can mostly ignore the early phases (though we have to show that the max-id node doesn't drop out early, which is not too hard). The last phase involves any surviving node probing all the way around the ring, so it will declare itself leader only when it receives its own agent from the left. That exactly one node does so is immediate from the same argument for LCR.

To prove the message bound, compute the sum of all active agents per phase times messages sent as above.

1.3. Lower bounds

We'll do the easiest case: an Omega(N log N) lower bound on messages for synchronous-start comparison-based algorithms in bidirectional synchronous rings. For full details see LynchBook Section 3.6, AttiyaWelch Section 3.4.2, or the original paper in JACM by Frederickson and Lynch.

Basic ideas:

Now we just need to build a ring with a lot of order-equivalent neighborhoods. For N a power of 2 we can use the bit-reversal ring e.g. 000 100 010 110 001 101 011 111. For N not a power of 2 we look up Frederickson and Lynch or Attiya et al. In either case we get Omega(N/k) order-equivalent members of each equivalence class after k active rounds, giving Omega(N/k) messages per active round, which sums to Omega(N log N).

For non-comparison-based algorithms we can still prove Omega(N log N) messages for time-bounded algorithms, but it requires Ramsey_theory. See AttiyaWelch Section 3.4.2 or LynchBook Section 3.7. Here "time-bounded" means that the running time can't depend on the size of the ID space. The intuition is that for any fixed protocol, if the ID space is larger enough, then (using Ramsey theory) there exists a subset of the ID space where the protocol acts like a comparison-based protocol. So the existence of a O(f(N))-message time-bounded protocol implies the existence of an O(f(N))-message comparison-based protocol, and from the previous lower bound we know f(N) is Omega(N lg N). Note that time-boundedness is necessary: we can't prove the lower bound for non-time-bounded algorithms because of the i*N trick.

2. Synchronous leader election in general networks

Basic assumptions:

Mechanism is the same as in the ring: biggest ID wins. Assuming the network is strongly-connected is essential: otherwise biggest-ID node might not be able to talk to anybody. Diameter bound can be dropped with effort but allows for a simple flooding algorithm as in the ring.

What if we don't know the diameter bound? Then we need much more machinery. Simplest algorithm is to have each node initiate a DistributedBreadthFirstSearch algorithm but give up if it sees a larger ID than its own.

3. Asynchronous leader election in rings

See also LynchBook §15.1. We'll use AsynchronousMessagePassing model based on IOAutomata.

Assumptions:

Requirements: exactly one process eventually executes a leader output action. (With some tinkering we can get non-leaders to announce non-leader, but we'll keep things simple.)

Actions for the leader-election-in-a-unidirectional-ring system:

3.1. A simple algorithm for a unidirectional ring

Here's the code for a simplified version of LCR for an asynchronous system. Note that this is even further stripped down than the AsynchLCR algorithm of LynchBook §15.1.1, since we don't bother building a queue of outgoing IDs, but instead send the largest ID seen so far.

Here is the code for pi:

states
(myid, maxid, unsent, status) initially (myid, myid, true, waiting)
transitions
send(i,i+1,m)
precondition
m = maxid and unsent = true
effect
unsent := false
recv(i,i+1,m)
effect
  • if m > maxid: maxid := m, unsent = true

  • if m = myid and status = waiting: status := winning
  • if m < maxid: no effect

leader(i)
precondition
status = winning effect: status = won
tasks
  • all output actions in one task

The system is the composition of N of these pi together with N universal reliable FIFO channels to connect them; we will refer to the channel from i to i+1 as channel[i]. We now wish to show that it solves leader election assuming all initial IDs are distinct. The intuition is the same as for the original synchronous version of LCR: the max ID happily cruises around the ring until it returns to its source—which then declares itself leader—while all other ID's are eaten.

3.1.1. Correctness proof

The simplest way to express this intuition is by writing some invariants. Below, we let max be the maximum ID and imax be such that myidimax = max.

Max-spread invariant

There is a process i such that maxidj = max for all j in [imax,i], maxidj < max for all j not in [imax,i], and maxid does not appear in channel[j].queue for j not in [imax,i]. Furthermore, at least one of the following holds:

  1. unsenti = true.

  2. channel[i].queue contains max.
  3. statusimax <> waiting.

Eaten-message invariant

If j is in the range [imax,i) then maxidj <> i and channel[j].queue does not contain i.

It is not hard to see that both invariants hold in the initial state (for the max-spread invariant we are looking at case (1) with i=imax). To show that both invariants continue to hold requires a painful and exhaustive case analysis that we will omit here.

That nobody but imax can declare itself leader follows from the eaten-message invariant together with the further observation that statusi = waiting until recv(i-1,i,idi) occurs, which is forbidden by the invariant when i <> imax.

3.1.2. Termination

To show termination, observe that if case (1) of the max-spread invariant holds for some i, eventually i must issue send(i,i+1,max). At this point case (2) holds, and eventually channel[i] must issue recv(i,i+1,max) (possibly after delivering some finite number of messages that were clogging the queue when send(i,i+1,max) occurred). This leads either back to case (1) with i increased by 1, or to case (3), depending on whether max was delivered to some i+1 <> imax or to imax. If case (3) occurs, then we must start with statusimax = winning and it's simply a matter of time before leader(imax) is issued.

3.1.3. Time analysis

For a time analysis, we can repeat the termination argument with explicit delay bounds. The time between case (1) for i and case (2) for i is at most l, where l is the bound on the time between enabling a send action and executing it. The time between case (2) for i and case (1) for i+1 is at most nd, where d is the maximum channel delay, since there are at most N messages clogging up the channel[i] queue (proving this rigorously requires yet another invariant that each id appears either in a single message or as maxid of a single process with unsent = true). Altogether we must wait at most nl + N2d time in the worst case.

However, this analysis gives up too much in assuming that every channel will be full of junk. In fact, we can argue that send(i+r,i+r+1,idi) occurs no later than time r(l+d)+l (if it occurs at all) and recv(i+r,i+r+1,idi) occurs no later than time (r+1)(l+d). The proof is by induction on r: for r = 0, send(i,i,idi) occurs no later than time l, since l is the upper bound on i's task, and if send(i,i,idi) occurs it must be the first message sent by i, so recv(i,i+1,idi) occurs no later than time l+d. For larger r, we have that recv(i+r-1,i+r,idi) occurs no later than time r(l+d), so the next send from i+r either discards idi or sends idi at time no later than r(l+d)+l. Similarly once send(i+r,i+r+1,idi) occurs, we have that any earlier value is delivered by time r(l+d), so idi if still present at this time must be the first in the queue, and so is delivered no later than r(l+d)+l+d = (r+1)(l+d).

3.1.4. Message complexity

Message complexity is similar to LCR, since we can construct an execution that is effectively synchronous and thus works identically to LCR. This gives a worst-case message complexity of O(N2). We can do better.

3.2. Peterson's algorithm for the unidirectional ring

See LynchBook §15.1.3. This gets O(N lg N) message complexity in all executions.

The basic idea (2-way communication version): Start with N candidate leaders. In each of at most lg N asynchronous phases, each candidate probes its nearest neighbors to the left and right; if its ID is larger than the IDs of both neighbors, it survives to the next phase. Non-candidates act as relays passing messages between candidates. As in Hirschberg and Sinclair, the probing operations in each phase take O(N) messages, and at least half of the candidates drop out in each phase. The last surviving candidate wins when it finds that it's its own neighbor.

To make this work in a 1-way ring, we have to simulate 2-way communication by moving the candidates clockwise around the ring to catch up with their unsendable counterclockwise messages. Peterson's algorithm does this with a two-hop approach that is inspired by the 2-way case above; in each phase k, a candidate effectively moves two positions to the right, allowing it to look at the ids of three phase-k candidates before deciding to continue in phase k+1 or not. Here is a very high-level description; it assumes that we can buffer and ignore incoming messages from the later phases until we get to the right phase, and that we can execute sends immediately upon receiving messages. Doing this formally in terms of I/O automata means that we have to build explicit internal buffers into our processes, which we can easily do but won't do here (see LynchBook pp 483-484 to see how to do this the right way.)

Candidate algorithm:

    phase := 0
    current := myid

    while true do
        send probe(phase, current)
        wait for probe(phase, someid)
        uid2 := someid
        send probe(phase, someid)
        wait for probe(phase, someid)
        uid3 := someid

        if uid2 = current:
            I am the leader!
        else if uid2 > current and uid2 > uid3:
            current := uid2
            phase := phase + 1
        else
            switch to relay algorithm

Relay algorithm:

    upon receiving probe(somephase, someid):
        send probe(somephase, someid)

Note: the phase arguments in the probe messages are useless if one has FIFO channels, which is why LynchBook doesn't use them. Note also that the algorithm does not elect the process with the highest ID, but the process that is incubating the sole surviving candidate in the last phase.

Proof of correctness is essentially the same as for the 2-way algorithm. For any pair of adjacent candidates, at most one of their current IDs survives to the next phase. So we get a sole survivor after lg N phases. Each process sends or relays at most 2 messages per phases, so we get at most 2 N lg N total messages.

3.3. A simple randomized O(N lg N)-message algorithm

Run LCR where each id is constructed by prepending a long random bit-string to the real id. This gives uniqueness (since the real id's act as tie-breakers) and something very close to a random permutation on the constructed id's. When we have unique random id's, a simple argument shows that the i-th largest id only propagates an expected N/i hops, giving a total of O(N HN) = O(N log N) hops. Unique random id's occur with high probability provided the range of the random sequence is >> N2.

The downside of this algorithm compared to e.g. Peterson's is that knowledge of N is required to pick random id's from a large enough range. It also has higher bit complexity since Peterson's algorithm is sending only IDs (in the official version) without any random padding.

3.4. Lower bound on message complexity

Short version: reduce to the synchronous case and get Omega(N log N). Longer version: avoid Ramsey theory by exploiting asynchrony, which gives the adversary substantially more power. See LynchBook §15.1.4 if you are interested—I don't think we will do this one in class.


CategoryDistributedComputingNotes

LeaderElection (last edited 2008-01-25 16:03:42 by JamesAspnes)