CSEP 564 Lecture 3
Memory Safety and Software Vulnerabilities
Whitebox Fuzz Testing Review
- In blackbox, the fuzzer has no idea what the program looks like.
- Here, we have the binary of the program as context.
- Symbolic execution: an algebraic way of walking through a program execution.
- Path constraints: what leads to a particular point in a program
- Code coverage: fundamentally limited as a metric; we don't know the total number of bugs!
C String functions
- `strcpy` is bad.
- `strncpy` is also bad: it may not leave a null terminator! It just returns a `dest` pointer, with no error indication.
- `strlcpy` (from BSD):
  - always null terminates
  - returns `strlen(src)`
  - so truncation can be detected: the copy was truncated iff the return value is >= the destination size
- As of August 2022 there's a massive migration to `strscpy` throughout the Linux kernel.
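The truncation check can be made concrete. Here is a minimal sketch of an `strlcpy`-style copy with OpenBSD semantics; `my_strlcpy` is our own name (glibc only gained a real `strlcpy` recently, so we spell it out):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* strlcpy-style copy (OpenBSD semantics), sketched by hand: it always
 * NUL-terminates the destination (when dstsize > 0) and returns
 * strlen(src), so the caller can detect truncation. */
size_t my_strlcpy(char *dst, const char *src, size_t dstsize) {
    size_t srclen = strlen(src);
    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen; /* return value >= dstsize means the copy was truncated */
}
```

Usage: `if (my_strlcpy(dst, src, sizeof dst) >= sizeof dst) { /* truncated */ }` — that one comparison is the whole truncation check.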
Defenses
Executable Space Protection
- Early 2000s solution: mark heap/stack (any writeable memory regions) with an `NX` non-executable bit.
- However, we can still overwrite the `RET` to call into any executable library routine (e.g. `ret-to-libc`). By arranging values on the stack, we can actually pass arbitrary arguments to functions and control where the next `RET` goes.
- So we can chain these RETs together and ultimately do arbitrary computation.
  - Known as return-oriented programming (ROP).
Run-time checking: StackGuard
- Solution: embed canaries (aka stack cookies) in stack frames and verify their integrity prior to function return.
- Randomized so attackers cannot guess it.
- Contains null bytes to prevent common string overwrites.
- Performance penalty! (8% for apache web server at one point)
- Bypass: if the frame also holds a pointer that later gets written through, an attacker can overflow the buffer into that pointer, aim it at the RET address, and write past the canary without ever touching it.
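A toy sketch of the StackGuard idea, with illustrative names (these are not the real compiler runtime symbols such as `__stack_chk_fail`): a per-process random value is planted in the frame on entry and verified before return.

```c
#include <assert.h>
#include <string.h>

/* Toy illustration of a stack canary (sketch only; real compilers emit
 * this automatically under -fstack-protector). The guard value contains
 * 0x00 bytes so string functions stop before overwriting it. */
unsigned long stack_chk_guard = 0x00c0ffee00aaUL;
int canary_intact = 1;

void copy_with_canary(const char *input) {
    unsigned long canary = stack_chk_guard;   /* prologue: plant canary */
    char buf[16];
    strncpy(buf, input, sizeof buf - 1);      /* bounded copy */
    buf[sizeof buf - 1] = '\0';
    if (canary != stack_chk_guard)            /* epilogue: verify before RET */
        canary_intact = 0;                    /* real code calls abort() here */
}
```

An overflow that marches linearly from `buf` toward the RET must trample the canary on the way, which the epilogue check catches.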
ASLR: Address Space Randomization
- Randomize processes' address space, in particular:
- base address of the executable (code) region
- position of stack
- position of heap
- position of libraries
- More effective on 64-bit addresses, since the address space is so huge
- Attacks:
- Any vulnerability that prints pointer values can be used to map out the memory space.
- NOP sleds / heap spraying increase the likelihood of landing in the adversary's code.
  - Suppose we can corrupt the return address, but we don't know exactly where to point it. `0x90` (x86 NOP) is a valid instruction that does nothing, so string many of them together before the shellcode.
  - Essentially builds a bigger target for the shellcode payload.
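The pointer-leak attack is easy to see. Under ASLR the addresses printed below change from run to run, and any bug that leaks one of them (e.g. a format-string bug) lets an attacker rebase a payload; the helper names here are our own:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Each run under ASLR gives different values here; leaking even one
 * pointer reveals the base of its region. */
uintptr_t stack_addr(void) { int local = 0; return (uintptr_t)&local; }
uintptr_t code_addr(void)  { return (uintptr_t)&code_addr; }

void print_layout(void) {
    printf("stack: %#lx\n", (unsigned long)stack_addr());
    printf("code:  %#lx\n", (unsigned long)code_addr());
}
```

Comparing the output across two invocations of the same binary shows the randomization directly.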
PointGuard
- Attack: overflow a function pointer so that it points to attack code
- Solution: Encrypt all pointers in memory, and only decrypt them in registers.
- Generate random key on program execution
- Each pointer is XORed with the key on load/store between memory/registers.
- Problems:
  - Must be fast, as pointer dereferences are very common
    - But XOR isn't too bad
  - Compiler must be careful to only enc/dec pointers
    - Compiler sometimes spills register values to memory when registers are full
  - Need to store the key in its own non-writeable memory page / register
  - What about passing a pointer from a user program to OS kernel code?
- Note: not generally adopted, but some successors showing promise.
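The XOR scheme itself is tiny. A sketch with our own names (`pg_key`, `pg_encode`, `pg_decode`); a real implementation would do this transparently in the compiler at every load/store of a pointer:

```c
#include <assert.h>
#include <stdint.h>

/* PointGuard-style pointer encryption (sketch): pointers are XORed with
 * a per-process key when stored to memory and XORed again when loaded
 * into a register for use. An attacker who overwrites the stored value
 * cannot predict what it will decrypt to without the key. */
uintptr_t pg_key = 0x5aa5c3d2e1f00d1eUL; /* would be random per execution */

void *pg_encode(void *p) { return (void *)((uintptr_t)p ^ pg_key); }
void *pg_decode(void *p) { return (void *)((uintptr_t)p ^ pg_key); }
```

XOR is its own inverse, so encode and decode are the same operation; that is what keeps the per-dereference cost low.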
Shadow stacks
- Attack: people commonly overwrite return addresses on the stack.
- Solution: store return addresses on a separate stack. The two stacks live in different regions of memory, so overflowing a buffer on the normal stack won't affect the `RET`.
- Either store/retrieve the RET on function call/return, or duplicate the RET on the normal stack but verify that it matches the shadow-stack copy on function return.
- Hardware support exists
- Problems:
- Where do we put it?
- A static offset is no good, attacker will know it.
- Randomized offset? Store it somewhere? Is it modifiable by attacker?
- How fast is it? Hardware helps...
- How big is the shadow stack?
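A toy software shadow stack, with our own names (real schemes do this in the function prologue/epilogue, or in hardware, e.g. Intel CET, where the shadow stack is not ordinarily writeable by the program):

```c
#include <assert.h>

/* Toy shadow stack: record the expected return target on call, verify
 * it on return. A mismatch means the on-stack RET was corrupted. */
unsigned long shadow[128];
int shadow_top = 0;

void shadow_push(unsigned long ret) { shadow[shadow_top++] = ret; }

int shadow_pop_check(unsigned long ret) {
    /* returns 0 on mismatch/underflow, i.e. corruption detected */
    return shadow_top > 0 && shadow[--shadow_top] == ret;
}
```

The "verify" variant in the notes corresponds to calling `shadow_pop_check` with whatever RET is currently on the normal stack.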
TOCTOU Race Condition
Consider

```c
if (access("file", W_OK) != 0)
    exit(1);
fd = open("file", O_WRONLY);
write(fd, buffer, sizeof(buffer));
```

An attacker might watch for the access check and then run `ln -s /etc/passwd file` before the open call!
Solution: lock the file.
Password checker
Consider
```rust
fn check_pw(real: &[u8], candidate: &[u8]) -> bool {
    for i in 0..8 {
        if real[i] != candidate[i] {
            return false;
        }
    }
    true
}
```
Attack: time it!
- Timing attacks are interesting because the code is correct
- Yet it has side effects
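The usual fix is a constant-time comparison that always examines every byte; a sketch in C (`ct_equal` is our own helper name):

```c
#include <assert.h>
#include <stddef.h>

/* Constant-time comparison: accumulate differences with OR instead of
 * returning at the first mismatch, so the running time does not reveal
 * how many leading bytes matched. */
int ct_equal(const unsigned char *a, const unsigned char *b, size_t n) {
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}
```

Note that an aggressive optimizer can still rewrite code like this, which is why production crypto libraries ship vetted constant-time primitives rather than relying on hand-rolled loops.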
Timing Attacks
These are complex to deal with! In particular, cryptographic code generally cannot branch on secrets, e.g. `if (secret)`.
- This means you need to be careful with compiler optimizations, e.g. use `-O0` to turn them off.
- Over a network? Such attacks are still practical :(
- Examples:
- Cache misses
- Padding oracles
Side channels
Timing is only one such channel. Other examples:
- Power usage
- Audio
- EM outputs
General principles
- Check inputs
- Check all return values (e.g. detect `malloc` failure)
- Least privilege: any piece of code or user should be using the least privileges they need to do their job.
- Securely clear memory (passwords, keys, etc.) when you're done using it.
- Failsafe defaults
- Defense in depth: prevention, detection, and response
- Simplicity, modularity: make it easier to secure
- Minimize attack surface
- Use vetted components
- Security by design
- Define up front your threat model and adversaries
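On the "securely clear memory" principle above: a plain `memset` right before a buffer dies is a dead store the optimizer may delete. `explicit_bzero` (BSD/glibc) and C11's `memset_s` exist for exactly this; a common portable fallback (sketch, `secure_clear` is our name) routes the call through a volatile function pointer so the compiler cannot elide it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Calling memset through a volatile function pointer prevents the
 * compiler from proving the call has no observable effect, so the
 * zeroing of the secret actually happens. */
static void *(*const volatile memset_v)(void *, int, size_t) = memset;

void secure_clear(void *p, size_t n) {
    memset_v(p, 0, n);
}
```

Use it on passwords, keys, and other secrets as soon as they are no longer needed, not just before `free`.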
Open source?
- Positive example: Linux kernel backdoor attempt thwarted by open source review in 2003.
- Negative example: Heartbleed in 2014
- OpenSSL buffer over-read that let attackers read chunks of server memory (including private keys).
- Unclear whether or not open source is actually helpful in this regard.
Vulnerability Analysis & Disclosure
What if you find a security bug? What if you tell the company and they do nothing? There are a few options:
- Tell the company and the public simultaneously
- Tell the company with a deadline for public disclosure
- Are you going to get sued? Some vulnerability disclosure processes suck.