CIS307: Data Transmission

Computing takes place mainly in offices, factories, and homes. Local Area Networks (LANs) connect workstations in a building, campus, or factory. Soon enough LANs will also connect the equipment in private homes. These islands of communication need to be interconnected, using telephone lines (provided by the common carriers), cable, and satellite communications. In time mobile computing will become commonplace, and wearable computers will then also be integrated into the communication network. The information exchanged is data, voice, and video. Our interest is mainly in data (digital).

Transmission Media: Copper Wire (Coaxial Cable, Twisted Pair), Glass Fiber, Microwave, Radio, Infrared. These are the media usually used in communication channels. In considering different kinds of media we are concerned, among other things, with the amount of data that can be transmitted through the medium per unit of time, the power required for the transmission, its attenuation rate, the distortion, the error rate, the cost, and the security.

Simplex/Duplex: If in a channel transmission can take place in only one direction, we say that the transmission is simplex. If the transmission can take place in both directions, but not at the same time, we say that the transmission is half-duplex, and if it can take place at the same time in both directions it is full-duplex.

Multiplexing: We often want to share a communication channel among two or more pairs of communicating entities. This can be done by Time Division Multiplexing (the channel is used in different time slots by different pairs of entities) or by Frequency Division Multiplexing (communicating pairs use different carriers that can propagate simultaneously through the channel). [There is also, among others, Statistical Multiplexing.] The device that inserts two or more signals into the communication channel is called a multiplexor, and the device that separates the signals out is called a demultiplexor.
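
As a rough illustration of Time Division Multiplexing, the following Python sketch interleaves several equal-length streams into one channel, one unit per time slot in round-robin order, and then separates them again. The function names and the fixed round-robin slot assignment are assumptions made for this example only.

```python
def tdm_multiplex(streams):
    """Interleave equal-length streams: slot 0 of each stream, then slot 1, ..."""
    if len({len(s) for s in streams}) > 1:
        raise ValueError("this sketch assumes all streams have the same length")
    return [unit for time_slot in zip(*streams) for unit in time_slot]


def tdm_demultiplex(channel, stream_count):
    """Recover each stream by taking every stream_count-th unit."""
    return [channel[i::stream_count] for i in range(stream_count)]


channel = tdm_multiplex([[1, 2], [10, 20], [100, 200]])
print(channel)                      # [1, 10, 100, 2, 20, 200]
print(tdm_demultiplex(channel, 3))  # [[1, 2], [10, 20], [100, 200]]
```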

Digital/Analog Signals: The signals we consider are usually electromagnetic waves. They propagate at the speed of light (300,000Km/sec) in the vacuum or slightly slower (200,000Km/sec) in materials. Signals are analog, i.e. they are represented as a continuous line, usually with a variety of levels. Digital signals are analog signals that can be approximated as having two levels, i.e. they are square waves. Baseband Transmission takes place on a communication medium, usually a local area network, using only one communication channel. The signal is digital and directly inserted in the channel as pulses. For example the Ethernet that we use in our LANs uses baseband technology: a single communication channel is shared by all communicating parties. Also our regular telephone service uses baseband communication on the line. Broadband Transmission instead uses analog signals, and multiple communication channels share the communication medium, usually by modulating various carriers. For example the cable that you may have at home for both television and your computer uses broadband technology: television and computer use different channels in the medium. Similarly if your telephone line supports DSL it uses broadband technology, with a channel for normal phone service and channels for the computer communication. A codec (coder/decoder) is a device that transforms analog signals to/from binary signals (for example, in the phone system, the analog signal representation of voice is transformed into a digital signal for transmission on the high speed digital network). For instance the conversion from analog to binary signal can be done using PCM (Pulse Code Modulation). See the Nyquist Sampling Theorem below. The term codec is also used for a component that compresses/decompresses data, for example MPEG data.

Fourier Analysis: Any signal can be expressed as the sum of sinusoidal signals. When desirable, we can focus on the individual sinusoidal signal instead of the original signal. We talk of the period of the sinusoidal signal (the time it takes to complete 360 degrees of the signal), of its wavelength (the space covered by the signal during one period), and of its frequency (the number of periods that fit in a second). The period is measured in seconds, the wavelength in meters, and the frequency in Hertz. We have

SignalVelocity = wavelength * frequency
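
The relation above can be turned around to obtain the wavelength or the period of a sinusoidal signal. The following is a minimal Python sketch; the function names and the 2.4GHz example value are chosen here for illustration.

```python
def wavelength_m(velocity_m_per_s, frequency_hz):
    """Wavelength in meters, from SignalVelocity = wavelength * frequency."""
    return velocity_m_per_s / frequency_hz


def period_s(frequency_hz):
    """Period in seconds: the inverse of the frequency in Hertz."""
    return 1.0 / frequency_hz


# A 2.4GHz radio signal in the vacuum (300,000Km/sec = 3e8 m/s).
print(wavelength_m(3e8, 2.4e9))  # 0.125 meters
print(period_s(2.4e9))           # a fraction of a nanosecond
```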

Bandwidth: If we consider the signals that propagate through a physical medium, there will be one with highest frequency and one with lowest frequency. The difference between these two frequencies is the bandwidth of the medium. [Signals with frequencies outside of this range carry negligible power, where power is voltage times current.] For example for phone communication we use frequencies between 300Hz and 3300Hz, for a bandwidth of 3000Hz. Normally a telephone channel is allocated 4KHz.

Digital/Analog Data: Analog data is real (continuous) data, digital data is binary data. The conversion between binary data and analog signals is done by modems (modulator/demodulator). Some modems use 4 wires (2 to transmit/modulate and 2 to receive/demodulate) to connect two computers, each computer with its own modem. Most modems are dialup modems. A computer or terminal can use a dialup modem to connect to the phone network. This modem has only two wires and it has equipment for making and receiving phone calls. The two wires are used both to transmit and to receive (using different frequencies). Cable modems are used to connect a computer to a TV cable network. Data is usually transmitted at different speeds from the cable to the computer and vice versa. Data rates in the Mbps range are possible. Essentially one TV channel (6 MHz) per direction is shared by all the users connected to the cable.

Modulation: Modulation/Demodulation is the conversion between binary data and analog signals. Usually the binary data is used to modify some characteristic of a sinusoidal signal, the carrier. We can represent the binary data by modifying the amplitude of the carrier (Amplitude Modulation), or by modifying its frequency (Frequency Modulation), or its phase (Phase Shift Modulation).
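
To make the three kinds of modulation concrete, here is a small Python sketch that produces samples of a modulated carrier for a sequence of bits. All parameters (one time unit per bit, a carrier of 4 cycles per bit, the two amplitude levels, doubling the frequency for a 1 bit, a 180 degree phase shift for a 1 bit) are arbitrary choices for illustration, not values taken from any modem standard.

```python
import math


def modulate(bits, scheme, carrier_hz=4.0, samples_per_bit=32):
    """Return samples of a sinusoidal carrier modified by the given bits.

    Each bit lasts one time unit. scheme is "amplitude", "frequency" or "phase".
    """
    samples = []
    for i, bit in enumerate(bits):
        for k in range(samples_per_bit):
            t = i + k / samples_per_bit
            amplitude, frequency, phase = 1.0, carrier_hz, 0.0
            if scheme == "amplitude":
                amplitude = 1.0 if bit else 0.25
            elif scheme == "frequency":
                frequency = 2 * carrier_hz if bit else carrier_hz
            elif scheme == "phase":
                phase = math.pi if bit else 0.0
            else:
                raise ValueError("unknown scheme: " + scheme)
            samples.append(amplitude * math.sin(2 * math.pi * frequency * t + phase))
    return samples


wave = modulate([1, 0, 1], "amplitude")
print(len(wave))  # 96 samples: 3 bits times 32 samples per bit
```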

Data Rate: The number of bits transmitted per unit of time through the physical medium (also called throughput). Some examples of data rates:

    Transmission System            Data Rate            Comment
    ===========================================================================
    Telephone Twisted Pair         33.6Kbps             4KHz telephone channel
    Cable Modem                    500Kbps              CATV cable - shared
                                   up to 4Mbps
    ADSL - Twisted Pair            64-640Kbps out       coexists with phone
                                   1.536-6.144Mbps in
    Radio LAN in 2.4GHz band       2Mbps                IEEE802.11 wireless LAN
    Ethernet - Twisted Pair        10Mbps               Few hundred feet
    Fast Ethernet - Twisted Pair   100Mbps              same
    Optical Fiber                  2.4Gbps-9.6Gbps      Using single wavelength
                                   < 10Tbps             Using multiple wavelengths
                                                        (DWDM = Dense Wavelength
                                                        Division Multiplexing)

Propagation Delay: The time it takes a signal to propagate from the sender to the receiver. It is the distance divided by the propagation speed of the signal in the medium (the speed of light in the vacuum, somewhat less in materials).

Transmission Delay: The time it takes to transmit a message. It is the size of the message in bits divided by the data rate (measured in bps) of the channel over which the transmission takes place.
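
The two definitions above can be written as a short Python sketch. The default propagation speed of 2e8 m/s is the figure used in these notes for signals in materials; the 1500-byte message and the 200 meter distance are hypothetical example values.

```python
def propagation_delay_s(distance_m, speed_m_per_s=2e8):
    """Time for the signal to travel from sender to receiver."""
    return distance_m / speed_m_per_s


def transmission_delay_s(message_bits, data_rate_bps):
    """Time to push all the bits of the message onto the channel."""
    return message_bits / data_rate_bps


# Hypothetical example: a 1500-byte message over 200 meters at 10Mbps.
print(propagation_delay_s(200))              # about 1 microsecond
print(transmission_delay_s(1500 * 8, 10e6))  # about 1.2 milliseconds
```

In this example the transmission delay dominates; on long, fast links the propagation delay dominates instead.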

Queueing Delay: At a node a message is received and, if the node is an intermediate node (router), it is scheduled for transmission (Store-and-Forward). The packet may have to wait if there are packets ahead of it. This is the queueing delay. Note that if there are 3 packets of the same size ahead in the queue, the delay will be 3 * transmission delay.

Notice that we can use the M|M|1 response time formula to estimate the sum of the Transmission plus Queueing delay on a line. We consider the line as a Single Server queueing system with service time equal to the Transmission Delay Tt. We assume that the interarrival time to the line is Ta. Then the Transmission+Queueing delay is:

          Tt              Tt
    --------------- = ---------
    1 - utilization   1 - Tt/Ta
So if the utilization is 80%, then the response time is 5 Tt, i.e. the Queueing delay is 4 Tt.
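
The formula can be checked numerically with a small Python sketch. The function names are chosen here for illustration; the queueing part is obtained by subtracting the transmission delay Tt from the response time.

```python
def mm1_response_time(tt, ta):
    """Transmission + Queueing delay on a line: Tt / (1 - Tt/Ta)."""
    utilization = tt / ta
    if not 0 <= utilization < 1:
        raise ValueError("the formula only holds for utilization below 1")
    return tt / (1 - utilization)


def mm1_queueing_delay(tt, ta):
    """Response time minus the transmission delay itself."""
    return mm1_response_time(tt, ta) - tt


# Utilization of 80%: Tt = 1 time unit, Ta = 1.25 time units.
print(mm1_response_time(1.0, 1.25))   # response time in units of Tt
print(mm1_queueing_delay(1.0, 1.25))  # queueing part in units of Tt
```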

Round-Trip Delay (Round-Trip Time = RTT): The time between the instant we start transmitting a message and the instant the acknowledgement for the message is received. It is at least equal to 2*PropagationDelay + TransmissionDelay + QueueingDelay. If there are n intermediate nodes there will be an additional n*(TransmissionDelay+QueueingDelay) delay. Of course there is also a processing delay. We will assume that it is negligible relative to the other delays.
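
The lower bound on the RTT stated above can be sketched in Python as follows. It assumes, as the text does, that processing delay is negligible and that every hop has the same transmission and queueing delay; the function name and the example values are hypothetical.

```python
def rtt_lower_bound_s(propagation_s, transmission_s, queueing_s, intermediate_nodes=0):
    """2*PropagationDelay + (n + 1) * (TransmissionDelay + QueueingDelay)."""
    per_hop = transmission_s + queueing_s
    return 2 * propagation_s + per_hop + intermediate_nodes * per_hop


# Hypothetical values: 20ms propagation, 1ms transmission, 4ms queueing, 2 routers.
print(rtt_lower_bound_s(0.020, 0.001, 0.004, intermediate_nodes=2))  # about 55ms
```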

Delay-Throughput Product: It represents the number of bits in transit between the sender and the receiver. It is the product of the propagation delay times the data rate. So in Ethernet if the sender is 200 meters away from the receiver, the propagation delay is 1 microsecond; thus, since the data rate is 10Mbps, there are 10 bits in transit: no longer at the sender, not yet at the receiver. It tells us how much data must be transmitted before the receiver starts getting it. If we are communicating coast to coast on a one gigabit per second channel, then the Delay-Throughput product is, since the propagation time is about 20ms, 2.5MB. The Delay-Throughput product has an impact on the size of the buffers used at the transmitter and receiver. The buffer size should be at least as big as the delay-throughput product, otherwise some data may get lost (if the receiver's buffer is smaller) or the communication medium (the pipe) may become unfilled (if the transmitter's buffer is smaller). Another aspect of this product: suppose that data at the receiver arrives too fast and we want to indicate to the sender that it should slow down; then we need a buffer at the receiving end that is at least 2*Delay-Throughput Product (what was in the pipe when the receiver decides, plus what is transmitted while the signal propagates from receiver to sender).
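
The two examples in the paragraph above can be reproduced with a few lines of Python. The function names are chosen for this sketch.

```python
def delay_throughput_bits(propagation_delay_s, data_rate_bps):
    """Number of bits in transit between sender and receiver."""
    return propagation_delay_s * data_rate_bps


def slow_down_buffer_bits(propagation_delay_s, data_rate_bps):
    """Receiver buffer needed to absorb data while a slow-down signal travels back."""
    return 2 * delay_throughput_bits(propagation_delay_s, data_rate_bps)


# Ethernet, 200 meters: 1 microsecond at 10Mbps.
print(delay_throughput_bits(1e-6, 10e6))      # about 10 bits

# Coast to coast: 20ms at 1Gbps, expressed in bytes.
print(delay_throughput_bits(0.020, 1e9) / 8)  # about 2.5 million bytes
```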

Bit-Length: The length of a one-bit signal. It can be easily understood with an example. Consider a communication channel where the data rate is 10Mbps. That means that one bit is transmitted in 1/10^7 seconds (this is the time-to-transmit-one-bit). Since signals propagate in a medium at about 200,000km/s, i.e. 2*10^8 m/s, the bit-length will be 10^-7 * 2 * 10^8 meters, that is, 20 meters. In general,

    Bit-Length = PropagationSpeed/DataRate
The larger the bit-length of a channel, the slower that channel is. The relationship between the Delay-Throughput product and the Bit-Length is
    DelayThroughputProduct = L/BitLength
where L is the length of the channel.
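
As a quick numeric check of the two relations, here is a minimal Python sketch. It uses the propagation speed in the medium assumed in the example above (about 2e8 m/s); the function names are hypothetical.

```python
def bit_length_m(data_rate_bps, speed_m_per_s=2e8):
    """Length in meters occupied by one bit on the channel."""
    return speed_m_per_s / data_rate_bps


def bits_in_transit(channel_length_m, data_rate_bps, speed_m_per_s=2e8):
    """Delay-Throughput product computed as L / BitLength."""
    return channel_length_m / bit_length_m(data_rate_bps, speed_m_per_s)


print(bit_length_m(10e6))          # about 20 meters per bit at 10Mbps
print(bits_in_transit(200, 10e6))  # about 10 bits on a 200 meter channel
```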

Nyquist Sampling Theorem: The maximum data rate D (in bps) of a communication channel (Channel Capacity) with bandwidth B where we can recognize K levels in the signal is
            D = 2 B log2(K)
and if the K levels are encoded using m bits we obtain the channel capacity
            D = 2 B log2(2^m) = 2 m B
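
The Nyquist formula is easy to evaluate directly. The following Python sketch uses a function name chosen here for illustration; the number of levels K is 2 to the power m when m bits are used per level.

```python
import math


def nyquist_data_rate_bps(bandwidth_hz, levels):
    """Maximum data rate D = 2 B log2(K) of a noiseless channel."""
    return 2 * bandwidth_hz * math.log2(levels)


# A 4KHz channel with 256 recognizable levels (m = 8 bits per level).
print(nyquist_data_rate_bps(4000, 256))  # 64000.0, i.e. 2 * 8 * 4000

# With only two levels the data rate is just twice the bandwidth.
print(nyquist_data_rate_bps(3000, 2))    # 6000.0
```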

Another statement of the Sampling Theorem says that we can reconstruct a sinusoidal signal from two samples per period. Thus if the bandwidth of a signal is B, i.e. it contains frequencies up to B Hz, then we can reconstruct the signal from 2*B samples per second. The sampling theorem is the foundation for PCM since it tells us how often to sample the analog signal. For example, to binary encode a voice signal carried on a telephone voice channel (4KHz), by Nyquist we need 8000 samples per second. We choose [see below about Signal-to-Noise ratios] to recognize 256 amplitude levels (8 bits). Thus a voice channel requires 8000 bytes per second, one byte every 125 microseconds.
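
The sampling and quantization steps of PCM can be sketched in Python as follows. This is only an illustration: it assumes a signal with values between -1 and 1 and equally spaced quantization levels, and the function name and the 1000Hz test tone are hypothetical.

```python
import math


def pcm_encode(signal, duration_s, sample_rate_hz=8000, bits=8):
    """Sample signal(t) at the given rate and quantize each sample.

    signal is a function of time returning a value in [-1, 1].
    Returns a list of integer codes in the range 0 .. 2**bits - 1.
    """
    levels = 2 ** bits
    sample_count = int(round(duration_s * sample_rate_hz))
    codes = []
    for i in range(sample_count):
        x = signal(i / sample_rate_hz)
        x = max(-1.0, min(1.0, x))                 # clip out-of-range values
        code = int((x + 1.0) / 2.0 * levels)       # map [-1, 1] onto 0 .. levels
        codes.append(min(levels - 1, code))        # x == 1.0 falls in the top level
    return codes


def tone(t):
    """A 1000Hz test tone, well inside a 4KHz voice channel."""
    return math.sin(2 * math.pi * 1000 * t)


codes = pcm_encode(tone, duration_s=1.0)
print(len(codes))  # 8000 one-byte samples for one second of signal
```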

In theory, in the absence of noise, according to Nyquist's Theorem it is possible to obtain codes that have as high a data rate as desired. Shannon has also derived a theorem (Channel Coding Theorem) to determine the optimal capacity of a noisy channel as a function of the bandwidth of the channel and of the Signal-to-Noise Ratio, i.e. the ratio between the power of the signal and the power of the noise (the power of a signal is proportional to the square of the amplitude of the signal):

    Capacity (in bps) = 1/2 * SamplingRate * log2(1 + Signal-to-Noise-Ratio)
that is, from Nyquist's Sampling Theorem, with SamplingRate = 2*Bandwidth
    Capacity (in bps)  = Bandwidth * log2(1 + Signal-to-Noise-PowerRatio)
Signal-to-Noise ratios are normally measured in decibels.
Given the ratio S/N, its measure in decibels is 10*log10(S/N), where the log is in base 10. Since log10(2) is about 0.3, a Signal-to-Noise ratio of 2 measures about 3db, which is the smallest signal to noise ratio that our ears can detect. Phone companies guarantee a Signal-to-Noise ratio of 48 db on phone lines. That means that 10 * log10(Signal-to-Noise-ratio) is 48. Since 48 = 16 * 3 = 16 * 10 * log10(2) = 10 * log10(2^16) we have that S/N >= 2^16, and if instead of comparing power we compare amplitudes, the ratio between the amplitude of the signal and the amplitude of the noise is 2^8 (remember that the power ratio grows like the square of the amplitude ratio). Going back to the Nyquist Theorem, we get that we can safely recognize 256 different levels, just as we expected.
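
The decibel conversion and Shannon's capacity formula can be combined in a short Python sketch. The function names are chosen here; the example uses the 3000Hz phone bandwidth and the 48 db figure mentioned in these notes. Note that the exact power ratio for 48 db is slightly below the rounded value obtained by counting 3 db per factor of 2.

```python
import math


def power_ratio_to_db(ratio):
    """Measure of a power ratio S/N in decibels."""
    return 10 * math.log10(ratio)


def db_to_power_ratio(db):
    """Inverse conversion: decibels back to a power ratio."""
    return 10 ** (db / 10)


def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Capacity = Bandwidth * log2(1 + Signal-to-Noise power ratio)."""
    return bandwidth_hz * math.log2(1 + db_to_power_ratio(snr_db))


print(power_ratio_to_db(2))            # about 3 db
print(shannon_capacity_bps(3000, 48))  # somewhat under 48000 bps
```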

In general, if we want to recognize 2^m different levels in Nyquist's Theorem, we need a channel with a Signal-to-Noise ratio d in decibels of

    d = 10 * log10((2^m)^2) =
	10 * log10(2^(2m)) =
	2m * 10 * log10(2) = 6m
Vice versa, if we are given a channel with Bandwidth B and a Signal-to-Noise ratio of d decibels, the data rate D of that channel will be
    D = 2 m B = (d/3) * B

For example if we want to transmit stereo sound with a bandwidth of 20KHz and a Signal-to-Noise ratio of 96 decibels, we will need 2 monaural channels, each with a bandwidth of 20KHz and a Signal-to-Noise ratio of 96 decibels. 96 decibels require, as we saw above, 96/6 = 16 bits to represent the required levels. Thus the monaural data rate is ~ (96/3) * 20000 = 640 Kbps and the stereo data rate is 1.28 Mbps. If we were to record this sound on a CD for 70 minutes (the usual CD), we would require a capacity of 2 * 70 * 60 * 640Kb = 2 * 70 * 60 * 80KB = 672MB.
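
The arithmetic of the stereo example can be reproduced with the following Python sketch, which uses the approximations of these notes (6 decibels per bit, data rate of (d/3) * B). The function names are chosen for illustration.

```python
def bits_per_sample(snr_db):
    """Bits needed to represent the levels: m = d / 6."""
    return snr_db / 6


def data_rate_bps(bandwidth_hz, snr_db):
    """Data rate D = 2 m B = (d / 3) * B."""
    return (snr_db / 3) * bandwidth_hz


mono_bps = data_rate_bps(20000, 96)
stereo_bps = 2 * mono_bps
cd_bytes = stereo_bps * 70 * 60 / 8   # 70 minutes of stereo sound

print(bits_per_sample(96))  # 16.0 bits
print(mono_bps)             # 640000.0 bps
print(cd_bytes)             # 672000000.0 bytes, i.e. 672MB
```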
Decibels are used in all sorts of situations. You may have heard that a 130 decibel sound will hurt your ears [130 decibels means that the ratio between this sound and the smallest sound that the ear can hear is 130db, i.e. about 43 * 3db = 43 * 10 * log10(2) = 10 * log10(2^43), i.e. a sound that is 2^43 times stronger than the smallest sound we can hear].
Decibels are also used to measure the attenuation of signals as they propagate. For example some light frequencies in an optic fiber attenuate at 0.2 db per kilometer. That means 3db will be lost in 15km, that is, the power of the signal will be halved in 15km. Attenuation in twisted pair and coaxial cable can be much greater, for example 3db per km at 10KHz, and much higher at higher frequencies.
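
The attenuation figures above can be checked with a small Python sketch that converts a loss in decibels back into the fraction of power that remains. The function name is hypothetical.

```python
def remaining_power_fraction(loss_db_per_km, distance_km):
    """Fraction of the signal power left after the given distance."""
    total_loss_db = loss_db_per_km * distance_km
    return 10 ** (-total_loss_db / 10)


print(remaining_power_fraction(0.2, 15))  # about 0.5: optic fiber, 15km
print(remaining_power_fraction(3.0, 15))  # twisted pair at 3db per km, 15km
```

The second line shows how quickly the signal fades at 3db per km: over the same 15km the power is halved fifteen times.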

Elementary Coding Theory: Suppose that we need to transmit a set A of symbols. We will encode the symbols of A with binary patterns using a code K. Say that a symbol a of A is coded as K(a) with len(K, a) bits. If p(a) is the probability of transmitting the symbol a, then the average number of bits used by K to encode a symbol of A is

	the sum of p(a)*len(K,a) for all a in A
For example, if we are using an alphabet with 4 symbols, we can encode these symbols using two bits. But if we know that the first symbol is used 90% of the time, the second 8%, and the other two each 1%, then if we encode them as 0, 10, 110, and 111 (this is a form of Huffman Coding) then the average length of a symbol becomes 0.9*1+0.08*2+0.01*3+0.01*3 = 1.12 bits, clearly better than 2 bits.
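
The example code can be tried out with the following Python sketch. The symbol names a, b, c, d are hypothetical labels for the four symbols; the codewords and probabilities are the ones given above. Decoding works bit by bit because no codeword is a prefix of another.

```python
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
PROBABILITY = {"a": 0.90, "b": 0.08, "c": 0.01, "d": 0.01}


def average_length(code, probability):
    """Sum of p(a) * len(K, a) over all symbols a."""
    return sum(probability[s] * len(code[s]) for s in probability)


def encode(symbols, code):
    """Concatenate the codewords of the symbols."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Read bits until they match a codeword, emit the symbol, start again."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return symbols


print(average_length(CODE, PROBABILITY))  # about 1.12 bits per symbol
print(encode("abad", CODE))               # 0100111
print(decode("0100111", CODE))            # ['a', 'b', 'a', 'd']
```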

We may wonder what is the best that a code can do to minimize the average length of the code of a symbol of A. The answer is given by the Entropy of the probability of the symbols of A:

	the sum of p(a)*log2(1/p(a)) for all a in A
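
The entropy formula can be evaluated for the four-symbol example above with a few lines of Python. The function name is chosen for this sketch; symbols with probability 0 are skipped since they contribute nothing to the sum.

```python
import math


def entropy_bits(probabilities):
    """Sum of p(a) * log2(1 / p(a)) over all symbols a."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: equally likely symbols
print(entropy_bits([0.90, 0.08, 0.01, 0.01]))  # about 0.56 bits
```

For the skewed probabilities the entropy is well below the 1.12 bits achieved by the code in the example, which encodes one symbol at a time and therefore cannot use less than one bit per symbol.
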
Asynchronous and Synchronous Transmission: RS-232 is an example of asynchronous transmission, that is, of transmission where data is not transmitted continuously; instead it is sent as individual characters, where each character has information to identify its start and its end. It is a simple and cheap way of transmitting information but it has a high overhead (in RS-232 two out of nine bits are overhead, i.e. 22%). In synchronous transmission, whether bit-oriented or character-oriented, each bit occurs at a predictable position. A transmitted block is started and terminated by a well defined delimiter (bit stuffing or byte stuffing - see next section) and in between them the data is transmitted in sequence. Synchronous transmission is more complex but has lower overhead (thus it is more efficient in terms of utilization of the communication channel) than asynchronous transmission. Examples of synchronous protocols are BISYNC (for byte-oriented bodies) and HDLC (for bit-oriented bodies).
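
The overhead of asynchronous transmission can be illustrated with a small Python sketch of character framing. It assumes one start bit (0), seven data bits, and one stop bit (1), with no parity bit, so that two out of nine bits are overhead as in the figure quoted above; the function names and the sample character are hypothetical.

```python
def frame_character(data_bits):
    """Wrap the data bits of one character with a start bit and a stop bit."""
    return [0] + list(data_bits) + [1]


def overhead_fraction(data_bit_count, framing_bit_count=2):
    """Share of the transmitted bits that carry no data."""
    return framing_bit_count / (data_bit_count + framing_bit_count)


frame = frame_character([1, 0, 0, 0, 0, 0, 1])  # seven data bits
print(frame)                 # [0, 1, 0, 0, 0, 0, 0, 1, 1]
print(overhead_fraction(7))  # about 0.22
```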