CSEP 561 - Lecture 8

Network Planning, Datacenters, and Machine Learning

  • Where we are in the course:
    The entire stack:
    physical -> link -> network -> transport -> application
    within a datacenter

Finishing up HTTP notes

Note that the QUIC paper we read was Google's initial design. In the years since, the IETF has standardized QUIC as RFC 9000, and some details have changed.

  • A big takeaway innovation is that by moving more of QUIC up into the application layer, we're less restricted by the underlying TCP socket API between the OS and the application, which allows more flexible stream multiplexing and thus avoids head-of-line blocking.
  • The handshake improvement of QUIC over TCP+TLS is huge: far fewer round trips, down to 0-RTT on resumption (see the sketch below).
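
A back-of-the-envelope sketch of the handshake difference. The 50 ms RTT is an assumed figure for illustration; the round-trip counts are the standard ones for each protocol stack.

```python
# Round trips before application data can flow.
RTT_MS = 50  # assumed, illustrative RTT

handshakes = {
    "TCP + TLS 1.2": 3,          # 1 RTT for TCP, 2 for the TLS 1.2 handshake
    "TCP + TLS 1.3": 2,          # 1 RTT for TCP, 1 for the TLS 1.3 handshake
    "QUIC (first contact)": 1,   # transport + crypto combined in one exchange
    "QUIC (resumed, 0-RTT)": 0,  # client data rides in the very first flight
}

for name, rtts in handshakes.items():
    print(f"{name:24} {rtts} RTT(s) = {rtts * RTT_MS} ms before first byte")
```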

What happens when you move between wifi APs or wifi / cellular?

  • Your IP address changes
  • This "IP mobility" is a hard problem.
    • DNS anchor: a solution involving a DNS server that rapidly updates as a user moves between IP addresses
  • QUIC solves this by expecting the client to notice its own IP change and make a new connection with the server (quick, given 0-RTT)

So why wouldn't we use QUIC?

  • Support for legacy applications that can't be rewritten
  • Want to take advantage of specialized hardware
  • In the cloud, you might be in an environment where the underlying OS changes quickly and is more at the whim of the network operator.

Datacenters

Generally:

  • Handle internet-scale workloads in a cost-effective manner.
  • Use commodity hardware for cost efficiency
  • Workload is often virtualized and portable
    • VMs
      • EC2, Azure, etc.
    • Containers
      • Docker, Kubernetes, etc.
    • Serverless computing / functions
      • AWS Lambda, etc.

Datacenters and Virtual Networks go hand-in-hand

  • Recall that in Virtual Networking
    • the hosts' view of the network can be different from the physical network.
    • we can have a bunch of VMs that have vNICs that all get connected at a virtual switch
  • A tunnel acts as a single link across the network overlay (see the sketch after this list)
    • GRE (Generic Routing Encapsulation) in L3
    • VLAN (802.1Q) in L2
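
A minimal sketch of GRE encapsulation using scapy (assumed available). The addresses are made up for illustration; the point is that the physical underlay only ever routes on the outer header.

```python
# L3 tunneling with GRE: the virtual network's packet rides inside
# a hypervisor-to-hypervisor packet on the physical underlay.
from scapy.all import GRE, IP, TCP

# Inner packet: what the VMs believe they are sending on the virtual network.
inner = IP(src="10.0.0.1", dst="10.0.0.2") / TCP(dport=80)

# Outer packet: the underlay sees only the hypervisors' addresses.
outer = IP(src="192.0.2.1", dst="192.0.2.2") / GRE() / inner

outer.show()  # prints the stacked headers: IP / GRE / IP / TCP
```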

SDN makes this feasible for people to manage

Virtualization can "stack"

  • Maybe a bunch of VMs
  • In a virtual network
  • You can run Kubernetes on those VMs, which also provides a virtual network
  • ... adding layers will give you a simpler abstraction, but ultimately those packets still need to travel across the physical network, so there may be performance impacts (see the overhead sketch below).
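
One concrete cost of stacking: every encapsulation layer spends bytes of each packet on headers. The header sizes below are standard; the particular two-overlay scenario is an illustrative assumption.

```python
# Header overhead of a hypothetical GRE tunnel between hypervisors that
# carries a second GRE-style overlay running inside the VMs.
MTU = 1500  # bytes on the physical underlay

layers = [
    ("outer IP", 20),  # hypervisor-level tunnel
    ("GRE", 4),
    ("mid IP", 20),    # e.g., a second overlay (Kubernetes) inside the VM
    ("GRE", 4),
    ("inner IP", 20),  # what the application actually sees
    ("TCP", 20),
]

overhead = sum(size for _, size in layers)
print(f"header overhead: {overhead} bytes "
      f"({overhead / MTU:.1%} of a {MTU}-byte MTU)")
```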

Datacenter Concerns

  • Tail latency: what is the worst-case latency (e.g., the 99th or 99.99th percentile)? See the sketch after this list.
  • Throughput
  • Management of topology of the network
  • Typically a datacenter tenant will sign an SLA (Service Level Agreement), and if the datacenter's performance falls beneath the statistical guarantees (e.g., tail latency / uptime), there is monetary compensation.
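
A small sketch of why the tail is tracked separately from the mean: a handful of stragglers barely move the average but dominate the high percentiles. The latency distribution here is synthetic.

```python
import random

random.seed(0)
# A mostly-fast service (~100 us) with 0.1% slow outliers (~5 ms),
# e.g., retries, GC pauses, or queueing behind a bulk transfer.
samples_us = [random.gauss(100, 10) for _ in range(100_000)]
samples_us += [random.gauss(5_000, 500) for _ in range(100)]

def percentile(data, p):
    s = sorted(data)
    return s[min(len(s) - 1, int(p / 100 * len(s)))]

print(f"mean   : {sum(samples_us) / len(samples_us):7.1f} us")
for p in (50, 99, 99.99):
    print(f"p{p:<6}: {percentile(samples_us, p):7.1f} us")
```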

Scale

Intra-datacenter service timescales are measured in $\mu$s. At such timescales, everything matters:

  • Interrupt latency to processor
  • Throughput & latency of PCI interconnect between NIC & processor
  • Context-switch overhead for syscalls between userspace & the OS kernel (roughly measured in the sketch below)
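
A rough way to feel the syscall cost from userspace on a Unix-like system. Python's interpreter overhead is included, so treat the result as a loose upper bound on the raw user/kernel crossing.

```python
import os
import time

N = 100_000
fd = os.open("/dev/null", os.O_WRONLY)

start = time.perf_counter_ns()
for _ in range(N):
    os.write(fd, b"x")  # one user -> kernel -> user crossing per iteration
elapsed = time.perf_counter_ns() - start
os.close(fd)

print(f"~{elapsed / N:.0f} ns per write() syscall (incl. Python overhead)")
```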

Kernel Bypass Networking

  • We can give a userspace process exclusive control over the NIC and skip the kernel entirely.
    • This is now common for high-performance networking software
  • DPDK (Data Plane Development Kit) and VPP (Vector Packet Processing) are two common frameworks for this.

eBPF

  • This is the opposite approach: a framework and infrastructure for writing limited programs from userspace that execute in kernel context
  • (eBPF programs are not Turing-complete and have verifier-bounded execution time)
  • Avoids syscall overhead by moving per-packet processing into the kernel, only pushing batches of packets to userspace if needed (see the sketch below)
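
A minimal sketch of the idea using the bcc Python bindings (assumed installed; requires root, and the "eth0" interface name is an assumption). The counting happens in kernel context at the XDP hook; userspace only reads a counter.

```python
import time

from bcc import BPF

prog = r"""
BPF_ARRAY(pkt_count, u64, 1);           // one shared counter

int xdp_counter(struct xdp_md *ctx) {
    int key = 0;
    u64 *val = pkt_count.lookup(&key);
    if (val)
        __sync_fetch_and_add(val, 1);   // update runs in kernel context
    return XDP_PASS;                    // let the packet continue up the stack
}
"""

b = BPF(text=prog)
b.attach_xdp("eth0", b.load_func("xdp_counter", BPF.XDP), 0)

time.sleep(5)  # packets are counted with no per-packet syscalls or copies
total = sum(v.value for v in b["pkt_count"].values())
print(f"packets seen in 5s: {total}")
b.remove_xdp("eth0", 0)
```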

Smart NIC

This goes even further: put a processor right on the NIC.

Question: Why do this? Wouldn't the NIC still need to communicate with the processor on the VM, for an application to actually use the network data?

  • Answer: offload repetitive, common networking tasks, like verifying checksums and sending ACKs (a checksum sketch follows below)
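
For a sense of the kind of repetitive per-packet work worth offloading, here is a pure-Python sketch of the Internet checksum (RFC 1071), which NICs routinely compute and verify in hardware.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                  # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF              # one's complement of the sum

print(hex(internet_checksum(b"hello, datacenter")))
```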

Challenges:

  • Limited resources on NIC: memory, performance
    • because it's optimized for low latency / low power

RDMA (Remote Direct Memory Access)

This allows a NIC to move bytes to/from another server's main memory without involving that server's processor.

  • Extremely low latency, since no CPU interrupt is needed on the remote side

Machine Learning

  • ML workloads are growing
  • Large models are often constrained more by memory than by compute (see the arithmetic below).
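
Back-of-the-envelope arithmetic for "memory-bound": the model size, precision, and per-GPU memory below are assumed, illustrative numbers, not from the lecture.

```python
params = 70e9          # a hypothetical 70B-parameter model
bytes_per_param = 2    # fp16/bf16 weights
hbm_per_gpu_gb = 80    # e.g., a current datacenter GPU

weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: {weights_gb:.0f} GB -> at least "
      f"{weights_gb / hbm_per_gpu_gb:.1f} GPUs just to hold them")
```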