A typical way in which people organize distributed applications is
represented by the Client-Server Model. In this model
a program at some node acts as a Server which provides
some Service, things like a Name Service, a File Service,
a Database Service, etc. The Client sends requests to the Server and receives
replies from it. The Server can be Stateful, i.e. it remembers
information across different requests from the same client, or
Stateless, i.e. it does not. The Server can be local, i.e.
on the same system as its clients, or remote, i.e.
accessed over a network.
The Server can be iterative
or concurrent. In the former case it processes one request
at a time, in the latter it can service a number of requests at the same time,
for example by forking a separate process (or creating a separate
thread) to handle each request. In a concurrent server, the processes
or threads that handle the individual requests are called
slaves (or workers) of the server.
These slaves may be created
"on demand" when a request is received, and deleted when the request
has been handled. Or they may be "preallocated" into a pool of
slaves when the server is started, available to handle future requests.
And the size of this pool can be made adaptive to the load on the
server. [Other servers are called multiplexed;
they are concurrent
without using forks or threads, using instead the "select" function
to respond to multiple connected sockets.]
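As an illustration of the multiplexed style, here is a single-process echo server sketched in Python's selectors module (which wraps select/poll); this is an illustrative analogue, not the course's own code:

```python
import selectors
import socket

# One process, no forks, no threads: the selector tells us which socket
# is ready, we handle exactly that one, then go back to waiting.
sel = selectors.DefaultSelector()

def accept(server):
    conn, _ = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(1024)
    if data:
        conn.sendall(data)        # echo the data back
    else:                         # empty read: the client closed
        sel.unregister(conn)
        conn.close()

def make_server(host="127.0.0.1", port=0):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))     # port 0: let the OS pick a free port
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)
    return server

def serve_forever():
    while True:
        for key, _ in sel.select():
            key.data(key.fileobj)   # dispatch to accept() or handle()
```

The server is "concurrent" in the sense that many clients can stay connected at once, yet only one event is processed at a time.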
[Pai, in the FLASH web
server paper, uses a different terminology for concurrent servers. He talks of
a Multi-Process (MP) architecture, a Multi-Threaded (MT) architecture,
and a Single-Process-Event-Driven (SPED) architecture. He then introduces
a variation of the SPED architecture. This reference is very worthwhile
for its use of all sorts of clever OS mechanisms for improving
the performance of a server.]
Also the Client could be
iterative or concurrent, depending on whether
it makes requests to different servers sequentially or concurrently.
Often the interaction from the client to the server is using
Remote Procedure Calls
(RPC), or using Remote
Method Invocation (RMI),
or using CORBA,
SOAP,
web services,...
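As a minimal illustration of the RPC idea (a sketch using Python's standard xmlrpc modules, not any one of the technologies named above): the client calls what looks like a local function, and the RPC layer marshals the call and its reply over the network.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Server side: register an ordinary function under a public name.
def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]          # port 0 above: the OS picked one
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy object turns method calls into remote requests.
proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}/")
print(proxy.add(2, 3))                   # the call actually runs on the server
```

The point is the transparency: `proxy.add(2, 3)` reads like a local call, but the work is done by the server process.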
Note that the way we interact with an object, by calling one of its methods,
is a typical client-server interaction, where the object is the server.
Thus distributed objects fit within the client-server
architecture. An object may, at different times, play the role of server
or the role of client. In fact an object can act at one time as a server
(responding to a request) and as a client (requesting service from a
third object as part of implementing its own service). [Care
must be taken to avoid loops in server requests: P asks a service from
Q that asks a service from R that asks a service from P. Such loops result
in deadlocks.]
In the Client-Server Architecture the initiative lies with the client.
The server is always there at a known endpoint [usually an IP address + port]
waiting for requests; it
only responds to requests [content is pulled
from the server by the client, as when a web server responds to
a browser's request]; it never initiates an
interaction with a client
[though in some situations, say for a wire service, the client may
remain connected to the server and this server will
push content
to the client at regular intervals; also, in email, the mail server pushes
the available mail messages to the mail clients, and email clients push
new messages to mail servers].
A Peer-to-Peer Architecture instead emphasizes that the
interaction initiative may lie with any object and all objects may be fairly
equivalent in terms of resources and responsibilities. In the client-server architecture
usually most of the resources and responsibilities lie with the servers.
The clients (if Thin Clients) may just be responsible for
the Presentation Layer to the user and for network communications.
A few more words about Stateful vs Stateless servers.
A stateless server will tend to be more reliable since, if it crashes,
the next interaction will be accepted directly rather than rejected, since it is
not part of an interrupted session. Also, the stateless server will require
fewer resources on the server, since no state needs to be preserved across
multiple requests. Hence
the stateless server is both more reliable and more scalable
than a stateful server. A stateful
server, in turn, usually offers higher performance, since the state
information can be used to speed up the processing of requests.
A further distinction when talking of the "state" of a server, is between
Hard-state and Soft-state. Hard-state
is true state, in the sense that if it is lost because of a crash, we
are lost: the server interaction has to be fully re-initialized. Soft-state,
instead, can be reconstructed if lost. In other words, it is a Hint
that improves the performance of a server when it is available. But, if
lost, the state can be recomputed. For example, the NFS server could maintain
for each recent request information on user, cursor position, etc. Then
if another request arrives from that user for that file, this state information
can be used to speed up access. If this "state" information is lost or becomes stale,
then it can be reconstructed, because an NFS request carries all the
information required to satisfy it. Note that a server with soft-state is
really just a stateless server.
The stateful/stateless distinction is used also when talking of
protocols used in the interaction client/server. So we
talk of Stateful or Stateless Protocols.
HTTP, the protocol
used for www interactions, is stateless, and so is the web server.
Also the Network File System (NFS) protocol is stateless.
It deals with accessing files made available by a server on a network.
All requests are made as if no Open file command were available; thus each
request must explicitly state
the name of the file and the position of the cursor.
Instead the File Transfer Protocol (FTP) is stateful, since a session is
established and requests are made in that context.
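The stateless style can be sketched in code. An NFS-style read carries everything needed in the request itself (a hypothetical handler, using the POSIX pread primitive via Python's os module):

```python
import os

# Stateless, NFS-style read: the request names the file and the cursor
# position explicitly, so the server keeps no per-client open-file state.
# If the server crashes and restarts, the next request still makes sense.
def read_request(path, offset, length):
    fd = os.open(path, os.O_RDONLY)          # open per request...
    try:
        return os.pread(fd, length, offset)  # ...read at the stated offset...
    finally:
        os.close(fd)                         # ...and keep nothing afterwards
```

An FTP-style stateful server would instead keep the file open and the cursor advanced between requests of the same session.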
Though HTTP is stateless, it can be made stateful by having the clients
preserve the state information and send it to the server at the time
requests are made. This is the role played by cookies.
When the client sends a request, it can provide
the state information (the cookie) with the Cookie header.
When the server responds, it places in the header, with the Set-Cookie
header, the state, appropriately updated. [For a dark view of the role
of cookies, see CookieCentral.
Part of the danger is that while normally a cookie is specific to the
interaction between a specific client and a specific server, a first-party
cookie, it is also possible to have third-party cookies,
that is, cookies that collect information across multiple servers and report
it to a hidden server.]
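The two headers can be sketched with Python's http.cookies module (the visits counter is a made-up example):

```python
from http.cookies import SimpleCookie

# Server side: place the (updated) state into the response headers.
def set_cookie_header(visits):
    c = SimpleCookie()
    c["visits"] = str(visits)
    return c.output(header="Set-Cookie:")      # e.g. "Set-Cookie: visits=3"

# Client side: store what the server sent, and return it with later requests.
def cookie_header(set_cookie_line):
    c = SimpleCookie()
    c.load(set_cookie_line.split(":", 1)[1])   # parse the server's header
    return "Cookie: " + "; ".join(f"{k}={m.value}" for k, m in c.items())
```

Note that the state travels in every request and response; the server itself stores nothing between requests.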
The fact that the state is kept in the client makes the server more scalable
since no space has to be devoted to the clients' state.
Also, if the server crashes,
then the client's state is not affected. For a technical discussion
of cookies and their use in state management
see rfc2109 (in particular,
examine the two examples of client-server interaction).
Remember, cookies are manipulated at the server but stored at the client.
When servers require the power of many computer systems
two architectures are common.
Distributed Systems
Distributed systems [i.e. applications
and systems that involve intercommunicating software that runs
at a number of computer systems]
are intended to meet a number of
requirements such as:
Middleware
Distributed systems are developed on top of existing
networking and operating systems software.
They are not
easy to build and maintain. To simplify their development and maintenance
a new layer of software, called Middleware, is being developed,
following agreed standards and protocols,
to provide standard services such as naming (the association of entities
and identifiers), directory services (maintaining
the association of attributes to entities),
persistence,
concurrency control, event distribution,
authorization,
and security. Examples of such services
that you may be familiar with are the Domain Name System (DNS) with its
associated protocol,
the Internet Network Time service with the Network Time Protocol (NTP),
transaction servers, etc.
The following diagram indicates the architecture of distributed systems.
Notice that the box we have called networking is itself decomposed
into layers as we will see when we discuss networking issues.
Client-Server Architecture
We receive snail mail thanks to the post office service.
But to us the service is provided by a specific mailman,
a server. As users, our interest is in services:
to identify what services are available and where the servers
that support each service are. So usually we need an "Identification/Discovery
or Naming Service"
that will provide us with the identities of the available services, their properties, and
their servers [and we need to know at least one server of the
"identification/discovery service"]. The problem of characterizing services, in terms of
who they are, what they do, and where they are accessible is of great
current interest. A number of efforts address this problem
[Here is
a list of advanced activities in this area. Two more
traditional approaches to the problem are UDDI,
WSDL.
Directory/naming services, such as
LDAP,
often are at the basis of identification/discovery services.]
In this section we discuss various aspects of
servers and their implementation in Unix.
One architecture uses vertical distribution, arranging
computers on a number of tiers, usually three. This is the
3-tier architecture: the tier where user interactions take place,
the presentation tier; the tier where user requests are received
and analysed, the application tier; and the tier where data
is preserved and manipulated, the data tier. Upon receiving a user
request appropriate interactions take place between the application tier
and the data tier. They result in the development within the
application tier of the response to be sent to the presentation tier.
[As an example the user at the presentation tier may request the best fare
for a trip from a specified location to another within a range of dates.
The application tier will do the leg work of analyzing database information
and give the answer to the presentation tier. This functionality of the
application tier has been tapped for direct user use. In general it
would be nice to make that same functionality available as a web
service to generic programs. That requires all sorts of standards and
protocols to describe the service made available and how to request
such service (and who has the right to do so).]
The other architecture uses horizontal distribution between a number
of equivalent, fully functional servers. User requests are routed,
usually in round-robin fashion, to one of the available servers. This
architecture is often used by heavily loaded web servers.
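The round-robin routing itself can be sketched in a few lines (the server names are placeholders):

```python
import itertools

# Horizontal distribution: each incoming request is handed to the next
# server in a fixed cycle; all servers are equivalent, so any can answer.
def round_robin_router(servers):
    turn = itertools.cycle(servers)
    def route(request):
        return next(turn), request     # (chosen server, untouched request)
    return route
```

For example, a router built over ["web1", "web2", "web3"] sends successive requests to web1, web2, web3, web1, and so on. Real load balancers refine this with health checks and load-aware policies.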
Daemons
If I check which programs are running on my Unix system
(joda.cis.temple.edu) I see a
number of programs [see their definitions using the man pages]:
kloadsrv  | Kernel module load server
xntpd     | Network Time Protocol daemon
cron      | The system clock daemon
nfsiod    | Local NFS compatible asynchronous I/O daemon
proplistd | Daemon that services remote NFS compatible property list requests
nfsd      | Remote NFS compatible server
mountd    | Daemon that services remote NFS compatible mount requests
rpc.lockd | Network lock daemon
rpc.statd | Network status monitor daemon
auditd    | Audit daemon
httpd     | HTTP server
inetd     | Internet super-server
telnetd   | The DARPA telnet protocol server
binlogd   | Binary event-log daemon
portmap   | DARPA port to RPC program number mapper
advfsd    | Daemon that obtains system information for the AdvFS GUI
lpd       | Line printer daemon
syslogd   | Daemon that logs system messages
sendmail  | Mail SMTP server
xdm       | X Display Manager
imapd     | Daemon for the Internet Mail Access Protocol (IMAP)
sshd      | Secure shell daemon
All these programs are special kinds of servers called daemons. A Daemon is a server that operates as a background job [with a number of restrictions on the way it uses resources]; it is just a server that has been programmed more carefully, so as to be more reliable and to run in the background. As the list above indicates, a substantial portion of the system's functionality is provided by daemons.
Comer and D. Stevens discuss at length how to build a server/daemon in a Unix environment. Here we paraphrase thoughts (and full paragraphs, and code) from their Chapter 30: "Practical Hints and Techniques for Unix Servers". Some thoughts are from R. Stevens. The discussion will be illustrated with a server that implements an Echo service in the Unix Domain. We also provide a simple client.
Here are the Comer-D.Stevens recommendations [for each we have a pointer to its expression in the HTML version of the server's code]:
kill -9 `cat /tmp/unixechoserver.pid`
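The startup shape of such a daemon can be sketched as follows (a paraphrase of the classic double-fork technique described by R. Stevens; the pid-file path matches the kill command above, but the code is an illustrative Python sketch, not the course's C server):

```python
import os
import sys

def write_pidfile(path="/tmp/unixechoserver.pid"):
    # Record our pid so scripts can later do: kill -9 `cat <path>`
    with open(path, "w") as f:
        f.write(str(os.getpid()))
    return path

def daemonize(pidfile="/tmp/unixechoserver.pid"):
    if os.fork() > 0:
        sys.exit(0)        # parent returns to the shell
    os.setsid()            # new session: detach from the controlling terminal
    if os.fork() > 0:
        sys.exit(0)        # session leader exits; the grandchild can never
                           # reacquire a controlling terminal
    os.chdir("/")          # don't keep any filesystem busy
    os.umask(0)            # don't inherit a restrictive file creation mask
    write_pidfile(pidfile)
```

A real daemon would also close or redirect stdin/stdout/stderr and install signal handlers, as the Comer-Stevens recommendations discuss.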
Now we have a general idea of what peer-to-peer systems are. They seem
to use two basic ideas: a service to determine which user agents
[meaning any entities able to exchange messages] are
currently
online, a presence service; and a service
for delivering a message immediately to a currently active user agent,
an instant message service.
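The presence idea can be sketched as a table of heartbeats (a toy, in-memory sketch; the names and the timeout are made up). Note that this state is soft: if the table is lost, agents simply re-register on their next heartbeat.

```python
import time

TIMEOUT = 60.0   # seconds without a heartbeat before an agent is "offline"

class PresenceService:
    def __init__(self):
        self._last_seen = {}          # agent id -> time of last heartbeat

    def heartbeat(self, agent, now=None):
        # Agents call this when they come online, then periodically.
        self._last_seen[agent] = time.time() if now is None else now

    def online(self, now=None):
        # An instant-message service would consult this list before delivery.
        now = time.time() if now is None else now
        return sorted(a for a, t in self._last_seen.items()
                      if now - t < TIMEOUT)
```

An instant-message service then delivers directly to agents reported online, and queues (or drops) messages for the rest.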
Interactions
We have seen two important distinctions on the interactions taking place
between two interacting systems.
Here are some other important distinctions.
Finally for interaction there is a famous Robustness
Principle enunciated by Jon
Postel: "Be liberal in what you accept, and
conservative in what you send", that is, follow all the rules
when sending out information, and when you receive it, try to understand
it even if the sender did not follow all the rules.