Csep-561-Lec-5

Routing, BGP, and Software Defined Networking II

Where we are in the course:
physical -> link -> network -> transport -> application

Follow up from last week:

  • We can't quite use IPs directly just yet. We need to handle:
  • Address assignment
  • Translate IPs to MAC addresses
  • Handle errors in the network
  • Handle packets that are too large/small
  • Running out of addresses

We solve this with the Address Resolution Protocol (ARP).

  • ARP is what the node uses to map a local IP address to its link layer addresses.
  • ARP sits right above L2, below IP. It's like the glue between layers.
  • No servers.
  • Uses L2 broadcast to reach all nodes, asking "who has this IP?".

Such broadcast protocols are very handy glue for nodes that haven't connected to each other yet.

This is a trusted domain; there's not any security at this layer!

Bootstrapping

Where do the addresses come from? To bootstrap, we use DHCP.

  • When a node wakes up for the first time, it needs an IP address, and that IP address needs to make sense to reflect the global hierarchy, i.e. have its subnet's prefix and all that.
  • The ethernet is on NIC, but the IP needs to be dynamic
  • If you take your laptop to NY, you still need to be able to connect to the internet!
  • DHCP (Dynamic Host Configuration Protocol): uses a server on your network automatically configures your IP.
  • Leases IP address to nodes
  • Provides network prefixes
  • Provides address of local router
  • Operates on UDP in ( ethernet -> IP -> UDP -> DHCP ) stack, using UDP ports 67, 68.
  • But how does this sit on top of IP if its assigning IPs!?
  • Solution: Broadcast!
  • IP does have the ability to broadcast, it's just not used in the global internet, but rather only on local networks.
  • So how does a node send a msg to DHCP before its configured?
  1. Node sends broadcast messages to all nodes on network.
  2. Broadcast address is all 1s.
  3. IP: 255.255.255.255
  4. Ethernet: ff:ff:ff:ff:ff:ff
  • Where does the DHCP server run? Depends on the network
  • At home, probably on the box from the ISP
  • At UW or some enterprise-y environment, it's likely on a rack in a server room somewhere.

Detecting errors

What happens when something goes wrong when forwarding?
We use ICMP (Internet Control Message Protocol), a companion to IP.

  • On the wire, ICMP is encapsulated within IP
  • Provides error reporting
  • Potentially even intermediate routers
  • It breaks the end-to-end functionality
  • So that you can debug stuff on intermediate hops
  • ping uses ICMP
  • When router encounters an error while forwarding:
  • It sends ICMP error back to the IP source and discards the packet
  • the host needs to rectify this.
  • Format:
  • Has type, code, checksum
  • Often carries the start of the offending packet as its payload
  • Each message is carried in an IP packet
  • IP header | ICMP header | ICMP data
  • E.g.
  • Destination unreachable (lack of connectivity)
  • Destination unreachable (packet too big)
  • Time exceeded (too many hops)
  • Echo request or reply (ping)
  • Traceroute leverages TTL/ICMP by sending probe packets with TTL = 1, 2, 3,... and observing the Time Exceeded ICMP errors to obtain a list of all the routers on the path to a destination.

Running out of addresses

  • Internet growth has led to billions of addresses.
  • We actually ran out in 2020.
  • The IETF (internet engineering task force) has been working on Ipv6 since 1994
  • Nothing much happened with it until 2011 since IPv4 was working alright. Big push in 2011 as IPv4 exhaustion loomed.

IPv6

  • 128 bits
  • 8 groups of 4 hex digits
  • Can omit leading zeros and groups of zeros
  • Allows link-local and unique-local addresses:
  • use the 64 bit prefix from the router and the bottom 64 bits from the MAC address.
  • No checksum
  • Flow label for grouping packets
  • IPSec by default
  • Better neighbor discovery protocol!
  • Uses ICMPv6 over link-local IPv6 addresses
  • Now you can use IP on your local network more seamlessly
  • Basically replaces DHCP and ARP and puts them into IP
  • SLAAC (Stateless Autoconfiguration)
  • Send local multicast "all routers" msg
  • Get prefixes from routers
  • Attach MAC to router prefix (plus math)

NAT (Network Address Translation)

  • An alternative to IPv6? More of a stopgap.
  • Basic idea: Map many private addresses to one public IP.
  • Allocated IP ranges for private use:
  • 192.168.0.0/16
  • 172.16.0.0/12
  • 10.0.0.0/8
  • Additional raneg for ISP NAP
  • 100.64.0.0/10
  • Breaks the layering!
  • Router was only supposed to care about the IP src/dest
  • NAT does "Middlebox" processing within the network to allow representing multiple addresses as one.
  • Scenario:
  • Home computers use "private" IP addresses (these get reused in different households)
  • NAT connects home to ISP using single external IP address
  • Configured in your router
  • How it works:
  • Keep internal/external translation table
  • Uses IP address + TCP/UDP port
  • E.g., map
  • 192.168.1.12 : 5523 -> 44.25.80.3 : 1500
  • 192.168.1.13 : 1234 -> 44.25.80.3 : 1501
  • Actually modifies the src/dest in the IP packet header as the packet goes out/in.
  • NAT is everywhere right now, but it is really an IPv4 wart and will go away eventually.

Fragmentation

If a packet exceeds MTU (Maximum Transmission Unit), we can fragment into multiple packets. But,

  • there's increased overhead
  • duplication of headers
  • reassembly on reception requires receiver to support buffers, reordering, etc.
  • if a fragment is lost, since fragmentation happens in the network, source has no way to know that only a fragment needs to be resent, but rather needs to send the whole packet again.

This was largely a bad idea and should have been solved in L2; hence IPv6 avoids fragmentation.

  • Mandated minimum MTU of 1280 bytes for L2 technologies that want to support IPv6.
  • Uses path MTU discovery: the host iteratively sends large packets, and listens for ICMP errors, to figure out the minimum MTU along the routing path.
  • Host then responsible for telling transport layer to decrease to the discovered minimum MTU.
  • Thus, no in-network fragmentation. Packets that are too big are dropped, and ICMPv6 error is returned to host.

Routing

We're essentially trying to find the lowest cost (a.k.a. shortest) path from host to host.

  • A sink tree of node A is the union of all the shortest paths from the rest of the network nodes to node A.
  • Given sink trees, we only need the destination address to follow the shortest paths.
  • Each node only needs to send to the next hop.

ECMP (Equal-Cost Multi-Path Routing)

  • Allow multiple routing paths from node to destination to be used at once.
  • Changes the sink tree to a directed acyclic graph.
  • If you just randomly pick just one next hop of "equal cost", this isn't actually equal in practice, and could add jitter.
  • This is where the "flow" field of IPv6 comes into play! The router can be aware that certain packets are part of the same flow, and keep them on the same path.

Classic Routing Algorithm Rules

Always want a decentralized, distributed algorithm:

  • All nodes are alike; no controller
  • Nodes only learn by exchanging messages with neighbors
  • Nodes operate concurrently
  • May be node/link/message failures

Distance Vector Algorithm

Each node maintains vector of distances to all destinations.

  1. Start with 0 to self, $\infty$ to others.
  2. Periodically send vectors to neighbors.
  3. Update vector for each destination by selecting the shortest distance heard, after adding cost of the respective neighbor's link.
  4. Use the best neighbor for forwarding.
  • But failed nodes are hard to discover! It can't be distinguished by just the link to that node being down.
  • Good news travels quickly, bad news travels slowly.
  • Still used today for low-resource or relatively static networks.

Two phases:

  1. Nodes flood topology with Link State Packets (LSPs)
  • Inefficeint, but each node learns the full topology
  1. Each node computes its own forwarding table
  • E.g. via Dikjstra's algorithm
  • Inefficient since the computation is repeated across nodes
  • And it gives the entire source/sink tree, which is more than what's needed for outputting a forwarding table.
    On a change, flood updated LSPs again. Since the flood goes to all nodes, and not just neighbors, this doesn't have the "count to infinity" problem of Distance Vector.

There are real-world issues with this algorithm, mainly they are consistent with the usual distributed systems problems of today.

Comparison

DV is more scalable with respect to compute resources, but LS converges more quickly in the presence of failures. Most scenarios today are not compute limited, so LS is more common.

BGP (Border Gateway Protocol)

Used for interdomain routing.
This is for dealing with the Internet.
Networks are richly interconnected using Internet Exchange Points (IXPs)

  • ISPs come together and agree to put routing equipment in a building, and connect each other with high gig fiber.
    Main problems:
  1. Scaling to very large networks
  2. Incorporating policy decisions
  • Letting big enterprises use routes that fit their interests.
  • This kinda sucks.
    Independent parties can yield inefficient routes for end users. ISPs want to limit traffic within their own network, so they will i.e. choose routes that send traffic out of the network quicker.
    Routing Polices:
  • Capture goals of different parties
  • ISPs give transit service to customers
  • ISP accepts traffic for customer from the rest of the internet
  • ISP sends traffic from customer to the rest of the internet
  • Customer pays ISP for the privilege.
  • ISPs can give peer service to each other
  • Each ISP accepts traffic from the other ISP only for their customers
  • ISPs do not carry traffic to the rest of the internet for each other
  • ISPs don't pay each other
  • (Optimization between equal sized parties)

This is a really flexible protocol.

  • iBGP is for internal routing
  • eBGP is for interdomain routing for the Internet
  • Kinda like Distance Vector but uses path vector.
  • Parties are called Autonomos Systems (AS)
  • E.g. ISP, UW, Microsoft have these
  • Assigned by regional Internet Assigned Numbers Authority (IANA)
  • BGP routers communicate with each other to keep up to date on route rules.
  • AS's self-configure their internal BGP routes
  • External routes go through complicated filters for forwarding/filtering policy.