Naming in the Internet: Routers, IP Addresses

CIS 307: Structure and Naming in the Internet

Routers, IP Addresses, Private Addresses And Network Address Translation (NAT), LAN Addresses

Routers

A router is a box (often a regular computer) with (at least) two ports (i.e. interfaces), used to connect possibly dissimilar networks and help packets go from a source to a destination. It differs from bridges since it operates at the network level. [It will also use different addresses. For example a bridge may use Ethernet addresses while a router uses IP addresses.] It does all the transformations that may be required by the transfer of packets across the networks it connects.
A router is concerned with where to send packets next as they move from a source to a destination through a set of interconnected networks.
It is convenient to distinguish two activities:

Forwarding (or Switching): The process taking place at a router when it receives a packet and has to decide where to send it next on the basis of its destination and of information available at the router.
Routing: The process through which routers receive and process the information that they will need in the forwarding process. Usually this information is gathered and transmitted using protocols called routing protocols, and processed using routing algorithms. The gathered information usually takes the form of routing (or forwarding) tables.

Unfortunately common terminology does not always preserve this distinction, and too often one sees "routing" used in place of "forwarding".

Routing (we really mean forwarding) could be of three kinds (at least!):

Source Routing, where the decision on what intermediate nodes to cross is made before a packet is sent; then the packet knows from the beginning where to go next after arriving at an intermediate node.
Virtual Circuit Routing, where a connection is established before the first packet is sent. Then each packet as it travels will contain as destination the id of a virtual circuit (this id may change as the packet moves across the network) and each intermediate node will contain a table whose rows have 4 fields: an entry-port, an entry-virtual-circuit-id, an exit-port, and an exit-virtual-circuit-id. Routing at a node determines the port the packet arrived from and its virtual circuit, then sends the packet forward with the exit virtual circuit id as new destination, using the exit port indicated by the table.
[Packet] Routing, where each packet is individually routed in accordance with a next-hop routing table. In these tables, for a given destination, there is usually a single next-hop. Even when there is more than one next-hop, the decision on where to go next is not made on the basis of the source of the packet, but on the basis of some form of cost.
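
The virtual-circuit table described above can be pictured with a small sketch. The names VC_TABLE and forward_vc, and the port and circuit numbers, are hypothetical and chosen only for illustration; a real switch would also handle circuit setup and teardown.

```python
# Hypothetical virtual-circuit table for a single node.
# Key:   (entry port, entry virtual-circuit id)
# Value: (exit port, exit virtual-circuit id)
VC_TABLE = {
    (1, 17): (3, 42),
    (2, 17): (3, 8),
    (3, 42): (1, 17),
}


def forward_vc(entry_port, entry_vc):
    """Return (exit_port, exit_vc) for a packet that arrived on entry_port
    carrying entry_vc, or None if no such circuit has been set up."""
    return VC_TABLE.get((entry_port, entry_vc))
```

Note that the same circuit id (17 here) can be reused on different entry ports, which is why the lookup key includes the port.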

We will only consider packet routing.

Routing tables contain information that will indicate for each packet on the basis of its final destination (usually an IP address) where to go next (next-hop forwarding - the address of the next router). If there is no explicit indication of how to get to some destination, a default next-hop will be used. Cycles can exist in the graph that has routers as nodes and links as edges. Routing tables function also in the presence of cycles since packets have a Time-To-Live (TTL) field that is used to limit the number of hops they can go through. It is important that routing tables not be too large.

Evaluation of Routing Algorithms:

Route quality (optimality): network utilization, path length, delay, bandwidth, communication cost, reliability
Overhead (simplicity): control messages, processing, state (i.e. memory required)
Speed of convergence to best routes
Robustness: Responsiveness to topology changes

Routing characteristics:

Centralized/decentralized
Static/Dynamic
Location of decisions (hop-by-hop[decision at each node]/Source-routing[decision at source])
Frequency of decision (per packet, per session, per topology change)
Single Path/Multipath: The routing algorithm may provide alternative routes to be taken to avoid congestion, improve throughput, etc.
Flat or Hierarchical: i.e. all routers are at the same level, or routing takes place at two levels, one to get to the general area, the other to navigate the local neighborhood.
Protocol: Information distribution and route computation algorithm

IP Addresses

Names such as temple.edu are called domain (or network) names and names such as joda.cis.temple.edu are called host names. Domain names and host names are mapped to IP addresses using the Domain Name System (DNS). IP Version 4 addresses, also called IPv4 addresses, or just IP addresses, are 32 bit integers. [IPv6, which is the new version of IP, and which we do not study, uses an IP address with 128 bits.] They are normally written as 4 small integers representing the bytes of the number separated by periods (dotted decimal notation). For example 155.247.182.1 is an IP address. Each IP address consists of two portions, a network identifier and a host identifier. IP addresses are now allocated by IANA and soon will be by ICANN.
There are 5 classes of IP addresses:

Class A: The network identifier is 1 byte and the host identifier is 3 bytes. The network identifier will start with a 0 bit. For example 126.46.31.87 is a class A address. The network identifier is 126, often written 126/8 to stress that it is 8 bits.
Class B: The network identifier is 2 bytes and the host identifier is 2 bytes. The network identifier will start with the bits 10. For example 155.247.170.2 is a class B address. The network identifier is 155.247/16.
Class C: The network identifier is 3 bytes and the host identifier is 1 byte. The network identifier will start with the bits 110. For example 200.77.88.91 is a class C address. The network identifier is 200.77.88/24.
Class D: It starts with the bits 1110 and it is used as a multicast address. For example 225.65.90.3 is a class D address. [unicast = sending from a source to a specific destination; broadcast = sending from a source to every destination within a network; multicast = sending from a source to a set of destinations.]
Class E: It starts with bits 1111 and it is currently not in use.
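
The class rules above depend only on the leading bits of the 32-bit address. As a minimal sketch (the function names to_int and address_class are ours, not part of any standard library), one could convert dotted decimal notation to an integer and test those bits:

```python
def to_int(dotted):
    """Convert dotted decimal notation (e.g. '155.247.182.1') to a 32-bit integer."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(p < 0 or p > 255 for p in parts):
        raise ValueError("not a dotted-decimal IPv4 address: " + dotted)
    value = 0
    for p in parts:
        value = (value << 8) | p   # append the next byte
    return value


def address_class(dotted):
    """Return 'A', 'B', 'C', 'D' or 'E' according to the leading bits."""
    first = to_int(dotted) >> 24           # most significant byte
    if (first & 0b10000000) == 0b00000000:  # starts with 0
        return "A"
    if (first & 0b11000000) == 0b10000000:  # starts with 10
        return "B"
    if (first & 0b11100000) == 0b11000000:  # starts with 110
        return "C"
    if (first & 0b11110000) == 0b11100000:  # starts with 1110
        return "D"
    return "E"                              # starts with 1111
```

Applied to the examples in the list, address_class("126.46.31.87") gives "A" and address_class("225.65.90.3") gives "D".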

A number of IP addresses have a standard meaning:

+------------+------------+----------+-------------------------------+
| Network    | Host       | Type of  | Purpose                       |
| Identifier | Identifier | Address  |                               |
+------------+------------+----------+-------------------------------+
| all 0s     |  all 0s    | this     | Used during bootstrap to      |
|            |            | computer | ask for one's own IP address  |
+------------+------------+----------+-------------------------------+
| Network    |  all 0s    | specified| The specified network,        |
| Identifier |            | network  | independent of its hosts      |
+------------+------------+----------+-------------------------------+
| Network    |  all 1s    | specified| Broadcast address for the     |
| Identifier |            | network  | specified network.            |
+------------+------------+----------+-------------------------------+
| all 1s     |  all 1s    | local    | Broadcast to local network    |
|            |            | network  | only (limited broadcast)      |
+------------+------------+----------+-------------------------------+
| 127        | anything   | loopback | Testing of TCP/IP while not   |
|            |            |          | using the network(loopback)   |
+------------+------------+----------+-------------------------------+

IP addresses are associated with host interfaces, not directly with hosts. In other words, each network interface of a computer system has its own IP address: the map from hosts to IP addresses is one-to-many. In turn a particular host may have more than one host name, though one of the host names is called the canonical name of the host; thus the map from IP addresses to host names is also one-to-many. Mappings between IP addresses and host (and domain) names are managed by DNS. On Unix you can find out about these mappings using the command:

    %  nslookup ip_address-or-host_name

Forwarding with a Simple Routing Table

Here is a portion of a (real) routing table:

	Destination      Gateway            Interface
	================================================
	155.247.71/24    155.247.71.60      ln0
	127.0.0.1        127.0.0.1          lo0
	default          155.247.71.1       ln0
	================================================

155.247.71/24 is the name of the local network; packets to it should be sent out through the interface ln0 to the IP address 155.247.71.60. Notice the notation "../24". It means that we are interested only in the first 24 bits of this address. Consequently, if we are trying to reach 155.247.71.83, that destination will match the entry 155.247.71/24. 127.0.0.1 is the loopback address; we can use it to test our networking software even without a network, and packets to it are sent through the interface lo0. For any other destination, the packet will be sent to IP address 155.247.71.1 through the interface ln0. The routing table of a Unix machine can be obtained with the command

    % netstat -rn

By the way, if you want to know the interfaces of your computer and their characteristics, you can use

    % ifconfig -a

For example on my machine I find 3 interfaces, ln0, sl0, and lo0. I can then find more about each interface with, for example,

    % ifconfig -I ln0

In general, if T is a routing table with entries with fields [destination, gateway, interface], and D is the destination, then we execute the program:

	for each row R of the routing table T
	    if (D == R.destination) // equality is only for the bits that are
	                            // significant in R.destination
	        send packet to R.gateway through interface R.interface;
	        return;
	send packet to the default gateway through its interface.
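
The loop above can be sketched in Python with the standard ipaddress module, using the three rows of the sample table. The names TABLE, DEFAULT and forward are ours; the destinations are written in full (155.247.71.0/24) because the module does not accept the abbreviated form 155.247.71/24. This is only an illustration of the lookup, not how a real kernel stores its table.

```python
import ipaddress

# Rows of the sample table: (destination, gateway, interface).
TABLE = [
    ("155.247.71.0/24", "155.247.71.60", "ln0"),
    ("127.0.0.1/32", "127.0.0.1", "lo0"),
]
# The "default" row: used when no other row matches.
DEFAULT = ("155.247.71.1", "ln0")


def forward(destination):
    """Return (gateway, interface) for the first row matching destination."""
    d = ipaddress.ip_address(destination)
    for network, gateway, interface in TABLE:
        # Membership compares only the bits covered by the prefix length.
        if d in ipaddress.ip_network(network):
            return gateway, interface
    return DEFAULT
```

For example, forward("155.247.71.83") selects the first row, while an address outside the local network falls through to the default gateway.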

Routing and routing tables are more concerned with reaching networks than with reaching hosts. So in the routing table the destination will denote a network, not a host. Once one reaches the correct network, the local system will worry about local delivery [think of delivery to a host on a LAN: the last step involves translation from IP address to physical address and transmission on the shared medium].

Routing algorithms, i.e. algorithms used to exchange the information needed for computing routing tables, are implemented using routing protocols. Examples of such protocols are RIP (Routing Information Protocol), OSPF (Open Shortest Path First), and BGP (Border Gateway Protocol). IRDP (ICMP Router Discovery Protocol) is used to identify routers and to report their identity. The packets exchanged in the routing protocols are called routing packets and they contain control information, i.e. they are overhead.
In a different set of notes we will study the routing algorithms used in conjunction with the OSPF and RIP routing protocols.

Subnetting

The granularity of IP address classes often leads to poor utilization of the address space and to limited ability to address subgroups within a network. The solution is to use subnetting. Assume that we have a class B network like 155.247. We can partition the 16-bit host space into 10 bits for the subnet id and 6 bits for the host id. Thus we have 1024 subnets, each with up to 62 hosts (64 - 1 network - 1 broadcast). Subnetting is based on the use of masks. In our example, the subnet mask is 255.255.255.192. The bitwise AND of an IP address with the subnet mask yields the subnet identifier. In our example, if we have the IP address 155.247.182.98, then the subnet id, also called the extended-network-prefix, is 155.247.182.64, and the subnet is known as 155.247.182.64/26 to stress that it uses 26 bits, leaving the remaining 6 bits for the host id (which is 34). Notice that from far away packets will go to the network 155.247/16, and once there packets will go to the specific subnet, and from there to the intended host. [The address 155.247.71/24 we encountered earlier means that the class B network 155.247 is split into 256 subnets, each with 254 IP host addresses. In other words, it is as if the class B network was split into class C networks.]
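
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the mask computation; the variable names are ours.

```python
import ipaddress

address = int(ipaddress.IPv4Address("155.247.182.98"))
mask = int(ipaddress.IPv4Address("255.255.255.192"))

# Bitwise AND with the mask keeps the network and subnet bits.
subnet_id = address & mask
# AND with the inverted mask keeps the host bits (limited to 32 bits).
host_id = address & ~mask & 0xFFFFFFFF
# The prefix length is the number of 1 bits in the mask.
prefix_len = bin(mask).count("1")

subnet_text = str(ipaddress.IPv4Address(subnet_id))   # dotted form of the subnet id
subnets = 2 ** (prefix_len - 16)                      # subnets inside the class B network
hosts_per_subnet = 2 ** (32 - prefix_len) - 2         # minus network and broadcast
```

With these inputs subnet_text is "155.247.182.64", prefix_len is 26 and host_id is 34, in agreement with the text.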

To account for subnetting, the rows of a routing table T take the form:

[subnet-id, subnet-mask, next-hop] where the subnet-id is uniquely defined for a network (i.e. all the subnets of a network share the same mask, i.e. they have the same number of bits). The next-hop is the port (interface) of the router through which the current packet should be forwarded.

Then when an IP address A has to be routed the algorithm used is:

   For each row i of routing table T
       Let D = T[i].subnet-mask BitwiseAnd A;
       If (D == T[i].subnet-id) then
       {
          Forward packet to T[i].next-hop;
          return;
       }
   Forward packet to default;
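
A runnable version of this loop might look as follows. The table rows, the port names and the helper ip are hypothetical values chosen for illustration; only the masking and comparison step comes from the algorithm above.

```python
import ipaddress


def ip(text):
    """Dotted decimal notation to 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


# Rows of T: (subnet-id, subnet-mask, next-hop). Values are made up.
T = [
    (ip("155.247.182.64"), ip("255.255.255.192"), "port1"),
    (ip("155.247.182.128"), ip("255.255.255.192"), "port2"),
]
DEFAULT_NEXT_HOP = "port0"


def route(a):
    """Return the next hop for IP address a (dotted decimal string)."""
    addr = ip(a)
    for subnet_id, subnet_mask, next_hop in T:
        d = addr & subnet_mask        # keep only network + subnet bits
        if d == subnet_id:
            return next_hop
    return DEFAULT_NEXT_HOP
```

Here route("155.247.182.98") returns "port1", since 98 AND 192 is 64, and an address that matches no row is sent to the default next hop.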

Normally, routing moves packets across the internet until the packet arrives at the destination network. Then the packet is directed to a specific host. With subnetting it becomes possible to route packets across the internet to arrive at a specific subnet, and then to move within the subnet to a specific host.

Classless Inter Domain Routing (CIDR)

The ideas of masks and subnetting have been generalized to allow more complex partitions of networks than the one we have just discussed. In particular, variable length subnet masks have been used. This is done with Classless Inter Domain Routing (CIDR). Now the masks used in routing can be of any size, and in matching IP addresses one aims for the longest match. For example, suppose that in a routing table we have a row for the network 1101011110110 and a row for the network 11010111101. Then, if we are looking for the destination 11010111101101111110010111010010, we will use the first row, since it matches the given destination and it is more specific than the second row. CIDR helps reduce two kinds of problems: the fact that IP addresses are not efficiently allocated using the class-oriented scheme; and the fact that routing tables may grow to be very large. For example, if an ISP controls four Class C addresses:

    200.77.0/24
    200.77.1/24
    200.77.2/24
    200.77.3/24

then these four addresses can be aggregated into a single address

    200.77.0/22

thus requiring a single entry in routing tables instead of four [this is a form of supernetting, the inverse of subnetting].
But what if this ISP has only the networks 200.77.0/24, 200.77.1/24, 200.77.2/24 and another ISP has 200.77.3/24? We can still use aggregation: the first ISP advertises the address 200.77.0/22 and the second ISP advertises 200.77.3/24. Because the longest match wins, the entry for 200.77.3/24 is in effect tested before the entry for 200.77.0/22. Thus a packet destined for one of the first ISP's networks, say 200.77.1.5, uses the 200.77.0/22 entry (it does not match 200.77.3/24), while a packet destined for the second ISP, say 200.77.3.5, uses the 200.77.3/24 entry (it matches both entries, but 200.77.3/24 is the more specific).
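
Both the aggregation and the longest-match rule can be tried out with Python's standard ipaddress module. In this sketch the table contents and the labels "ISP-1" and "ISP-2" are hypothetical, and the linear scan is only for clarity; real routers use specialized data structures for longest-prefix matching.

```python
import ipaddress

# Aggregation: four adjacent /24 networks collapse into one /22.
nets = [ipaddress.ip_network(n) for n in
        ("200.77.0.0/24", "200.77.1.0/24", "200.77.2.0/24", "200.77.3.0/24")]
aggregated = list(ipaddress.collapse_addresses(nets))

# Two overlapping entries, as in the two-ISP example.
TABLE = [
    (ipaddress.ip_network("200.77.0.0/22"), "ISP-1"),
    (ipaddress.ip_network("200.77.3.0/24"), "ISP-2"),
]


def longest_match(destination):
    """Return the next hop of the most specific matching entry, or None."""
    d = ipaddress.ip_address(destination)
    matches = [(net.prefixlen, hop) for net, hop in TABLE if d in net]
    if not matches:
        return None
    return max(matches)[1]   # largest prefix length wins
```

Here aggregated contains the single network 200.77.0.0/22, and longest_match("200.77.3.5") picks the /24 entry even though the /22 entry also matches.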

Private Addresses And Network Address Translation (NAT)

An enterprise may have networks that are mainly intended for internal use, i.e. for communicating within the enterprise, not with the outside. In this case the nodes of the enterprise may use any of the addresses in the following three blocks:

	10.0.0.0    to 10.255.255.255
	172.16.0.0  to 172.31.255.255
	192.168.0.0 to 192.168.255.255

that are guaranteed never to be used anywhere in the (public) internet. As long as the nodes communicate with each other there is no problem since their IP addresses are "unique" within the network. The problem occurs when this private network is connected to the internet. At issue is what to do when communicating to/from an external node [it is assumed that in this situation the communication will be initiated by the local node]. In this case one can use a Network Address Translation (NAT) device (a router, or firewall, or ad-hoc device) to translate between the local addresses and public addresses that belong to the enterprise. This association, local/public, can be static (and usually for only a part of the local network), or dynamic, taking advantage of the fact that at one time only a few nodes will communicate with the environment.
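
A NAT device has to recognize which addresses belong to the three private blocks listed above. A minimal sketch of such a test follows, with the blocks written in prefix notation; the names PRIVATE_BLOCKS and is_private are ours.

```python
import ipaddress

# The three private blocks, in prefix notation:
#   10.0.0.0    to 10.255.255.255   -> 10.0.0.0/8
#   172.16.0.0  to 172.31.255.255   -> 172.16.0.0/12
#   192.168.0.0 to 192.168.255.255  -> 192.168.0.0/16
PRIVATE_BLOCKS = [ipaddress.ip_network(n) for n in
                  ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_private(address):
    """True if address falls inside one of the three private blocks."""
    a = ipaddress.ip_address(address)
    return any(a in block for block in PRIVATE_BLOCKS)
```

The explicit list is used here, rather than the module's own is_private attribute, because that attribute also covers other reserved ranges such as the loopback network.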

Another solution is the use of a more sophisticated NAT that uses Port Address Translation (PAT). It goes as follows: each local node (say 10.0.0.75) is assigned a port number (say, 6523) on the NAT (which, say, has IP address 197.48.73.25). Then when the local node wants to communicate with an external node (say, 155.247.152.12), it communicates through the NAT. The NAT sends the packet to the external node as if it had been sent from port 6523 of 197.48.73.25, and when it receives the reply from 155.247.152.12 on port 6523, it redirects it to the internal node 10.0.0.75.
Things are made complicated by the fact that the source address of the packet being transmitted by the NAT has to be changed from 10.0.0.75 to 197.48.73.25, with consequent change in checksum ... and more complications. Yet people have been able to create very successful NAT products.
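
The bookkeeping behind PAT can be sketched as a pair of lookup tables. This is a deliberately simplified model under the assumptions of the description above (one NAT port per local node); the class name Pat and its methods are hypothetical, and checksum recomputation, timeouts and per-connection state are all left out.

```python
class Pat:
    """Toy Port Address Translation table: one NAT port per local node."""

    def __init__(self, public_ip, first_port=6523):
        self.public_ip = public_ip
        self.next_port = first_port
        self.port_of = {}    # local IP address -> NAT port
        self.local_of = {}   # NAT port -> local IP address

    def outbound(self, local_ip, remote_ip):
        """Rewrite the source of an outgoing packet.
        Returns (source ip, source port, destination ip) as seen outside."""
        if local_ip not in self.port_of:
            port = self.next_port          # assign the next free port
            self.next_port += 1
            self.port_of[local_ip] = port
            self.local_of[port] = local_ip
        return self.public_ip, self.port_of[local_ip], remote_ip

    def inbound(self, nat_port):
        """Return the local node a reply on nat_port belongs to, or None."""
        return self.local_of.get(nat_port)
```

With pat = Pat("197.48.73.25"), the call pat.outbound("10.0.0.75", "155.247.152.12") presents the packet as coming from port 6523 of 197.48.73.25, and pat.inbound(6523) maps the reply back to 10.0.0.75.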

LAN Addresses

Autonomous System Numbers (16 bits), IP Addresses (32 or 128 bits), Domain and Host Names, are all "logical" identifiers: they are not physically tied to a specific hardware device. Only when we get close to the physical level, at the Data Link layer, we encounter physical addresses, called LAN Addresses or MAC Addresses (Media Access Control). These physical addresses are used when we finally want to communicate on a LAN. They are usually physically tied to the device (the Network Interface).

IP addresses have to be converted to LAN addresses before we can actually access the devices. ARP (Address Resolution Protocol) is the protocol used to convert from IP to LAN addresses. The conversion from LAN addresses to IP addresses can be done with RARP (Reverse ARP). LAN addresses are usually 48-bit numbers. At any one instant the map between IP addresses and MAC addresses is one-to-one. The stress here is on "at one instant": though usually IP addresses are permanently bound to MAC addresses, it is now possible for a network to dynamically associate IP addresses to interfaces using the Dynamic Host Configuration Protocol (DHCP). For instance an ISP may allocate IP addresses dynamically to its clients as they get on line.
You can see the information currently available to arp with the command

   % arp -a

Addressing and routing become more complex when we consider mobile computing, i.e. the situation where portable computers move around the world.

Since we are on the issue of names in the Internet, let's remember other names you have encountered in your computing practice:

Universal Resource Identifiers: Universal Resource Identifiers (URI) form a system of universal names for Internet objects. They take the form scheme : path. When the scheme is an existing Internet protocol, the URI is said to be a URL.
Uniform Resource Locators: Uniform Resource Locators (URL) are URIs where the scheme corresponds to existing well-known Internet protocols such as HTTP, FTP, mailto, file, etc. In URLs the scheme names are case-insensitive. Only printable ASCII characters can appear within a URL. In a URL the following characters are unsafe: " " , "<", ">", "#", """, "%", "{", "}", "|", "\", "^", "~", "[", "]", "/", ";", ",", "?", ".", "@", "=", "&" since they may have a special meaning. As such, they can be used only where allowed, with the specified meaning. In all other circumstances these characters should be encoded using the form "%xy" where x and y are hexadecimal digits.
E-Mail Addresses: E-mail addresses are well known to us all, as a way to identify interlocutors on the internet. As you can see from the RFC specification, e-mail addresses can be more complex than we usually expect.
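
Returning to the "%xy" encoding described for URLs above, Python's standard urllib.parse module provides quote and unquote for this purpose. The sample string is made up. Note that quote follows the current URI rules, so it leaves letters, digits and the characters "_", ".", "-" and "~" unencoded, which differs slightly from the list of unsafe characters given in these notes.

```python
from urllib.parse import quote, unquote

original = "my file#1.txt"

# safe="" asks quote to encode "/" as well (by default it is left alone).
encoded = quote(original, safe="")

# unquote reverses the encoding.
decoded = unquote(encoded)
```

Here the space becomes %20 and "#" becomes %23, so encoded is "my%20file%231.txt", and decoded is again the original string.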