CIS307 - Homework 5: A Replicated, Consistent Store

Assignment given: December 2, 2008
Due Date: December 14, by 10pm

You can implement this program in C, or Java, or Python, or whatever you prefer.

We want to replicate a "store". This is normally done so that

  1. we distribute the load on more than one site
  2. the distance from client to local store is reduced,
  3. we have increased reliability.
For simplicity we consider only writes to the stores, never reads.

We need to execute the write requests in the same order at all the stores otherwise the results could be different.

We will assume that the store is a single file store.txt, say, containing initially the characters 0123456789 repeated 100 times. This file is replicated at 3 sites. Clients perform at random write operations consisting of choosing at random a value x equal to 0,1,..,9 and then writing starting at position x*100 the character x repeated 100 times.

Each copy of the file is managed by a process called Agent. One of these agents is designated as a Primary [and we assume clients and other agents know which it is].

A write operation goes as follows:

  1. The client sends the write requests to all agents
  2. The agents cache the request and inform the client that they are ready to do the operation
  3. The client send the write request to the primary
  4. The primary
    1. assigns a sequential number to the operation
    2. carries it out locally in the right order
    3. sends the authorization to write and in which order to the other agents
  5. The agents receive the authorization and carry out operation in right order
  6. Agents inform primary of success/failure of the operation
  7. The primary informs the client of success or failure listing the agents that succeeded.

A client process executes a loop

    loop
	generate a write request at random
	print value of x [0,1,..,9]
	execute the write
	display the result (success/failure)
	delay
    forever
A few client processes should run at the same time.

An agent with very small probability (say ~1%) will print a termination message and terminate without completing a write request. As you can imagine, agents start as being 3, then as they terminate, they become fewer, but the program should still work. The clients should contact only the agents that are active. The program terminates when all agents terminate, at that time all stores should have the same content.

In a more realistic program we would worry about the primary failing and what to do when agents fail.

Network communications are in TCP. Choose ports and timeouts as you think best.

Submit your homework to the TA including a good README file explaining your program..