Csep-564-Lec-5

More Cryptography

2DES

So why is 2DES a bad idea?

Recall
- $K_1$ is 56 bits
- $K_2$ is 56 bits
- $E(M_0) = Enc_{DES}(K_2, Enc_{DES}(K_1, M_0))$

Guess $K_1, K_2$ independently as $A, B$. Suppose $C_0, M_0$ is our cipher/plain texts pair.
Now make an encryption table table $E(M_0, A_0), \ldots, E(M_0, A_{2^{56}})$
And a decryption table $D(C_0, B_0), \ldots D(C_0, B_{2^{56}})$.
There will exist an $A_i, B_j$ pair where the encryption and decryption matches $E(M_0, A_i) = D(C_0, B_j)$.
BUT this is $2^{57}$ guesses for a $112$ bit key!

Encryption

We want to encrypt arbitrary length messages. To do so, we use block ciphers.

ECB Mode

This is the obvious mode but it has horrible security properties. In particular, it utterly fails IND-CPA because identical plaintext blocks result in identical ciphertext blocks. There's also no integrity; attackers can shift blocks around without detection.
This is the obvious mode but it has horrible security properties. In particular, it utterly fails IND-CPA because identical plaintext blocks result in identical ciphertext blocks. There's also no integrity; attackers can shift blocks around without detection.

CBC Mode

This mode uses an Initialization Vector (IV) for the first block, and every subsequent block gets XORed with the previous block's cipher output. Still no integrity, but we don't leak plaintext structure like in ECB mode.

Enc not parallelizable
Dec is parallelizable

CTR Mode

Counter Mode uses an initial random ctr key. Then each plaintext block $p_i$ is XORed with the output of block cipher of (ctr + i). This is essentially a one-time pad for each block.

What mode to use?

Just use a good library! But, other good modes are:

GCM (Galois/Counter Mode)
CTR (sometimes)
If messages never repeat and they're all less than 128 bits, ECB is actually okay too.
AES-128 or AES-256 are standard.

When is an encryption scheme secure?

Levels of security:

Ciphertext-only: adversaries only know ciphertext
KPA (Known plaintext attack): adversaries know some plaintext-ciphertext pairs
CPA (Chosen plaintext attack): adversaries can obtain ciphertext for any plaintext they want
CCA (Chosen ciphertext attack): adversaries can decrypt any ciphertext except the target ciphertext

Integrity

So far, we've only discussed achieving privacy. We still need to handle integrity, to ensure that messages aren't modified in transit. We do this with MACs (Message Authentication codes).

CBC-MAC is one method (actually a derivative is used called CMAC).
- Take your ciphertext, run it through the CBC-mode encryption above, and then use the last block output as your tag. This has great properties! The last block depends on all the previous ones, so this has strong integrity. And yet it only takes one block ~ 128 bits of space.

Hash functions

Or, hash functions. A hash function is a lossy compression function that tries to:

look random: $H(x)$ should not be predictable based on $x$
preimage resistance: given $y$ it should be hard to find $x$ such that $h(x) = y$.
collision resistance: it should be hard to find two distinct $x, x'$ such that $h(x) = h(x')$.
weak collision resistance: if you are given $x$, it is hard to find $x'$ such that $h(x) = h(x')$. This does not imply collision resistance, where the attacker gets to choose both inputs.

As an aside, given the birthday paradox, with a $2^{N}$ size output, you can expect a collision with $2^{N/2}$ distinct inputs. So you get 64 bits of security out of a 128 bit hash.

Hashing vs Encryption

Hashing is one-way. There is no "un-hashing" or decryption.
Hash(x) looks random but can be compared for equality with Hash(x'). (It's not randomized!)
TODO what was this bullet point

Applications

Storing passwords: For example, hashing is a much better solution for storing passwords
- They can't be decrypted
- Don't have to store a key
- In practice we actually suffix the passwords with a random salt before hashing, and then store the plaintext salt next to the password. This prevents precomputation attacks where adversaries precompute hashes of common passwords.
Software Integrity: users can download software and check the hash to ensure the file is the intended one. It would be extremely difficult for an attacker to send a bad file that matches the hash of the good file.
Commitments: A user can submit $H(B)$ without revealing $B$. Later, they can reveal $B$ with integrity.

Common hash functions

SHA-2
SHA-3
Don't use MD5
- Some collision problems
- Because it is so fast, it's easy to brute force
Don't use SHA-1
- Broken in theory in 2004
- Broken in practice in 2017 by Google
  - Two different PDFs that output to the same SHA-1.
- Only widely deprecated in 2018.

Evaluating hash functions

Speed
Diffusion: does changing 1 bit affect most of the output bits?
Resistance: to collision attacks, one-wayness attacks, etc.

HMAC

Construct the integrity MAC from a cryptographic hash function.

Why not encryption?
- Block ciphers are pretty expensive
- Can easily replace one hash function with another
- There used to be US export restrictions on encryption

Note: we don't actually need HMACs anymore, as integrity is built into SHA-3.

Authenticated Encryption

What if we want both privacy and integrity?

Don't encrypt and MAC:
- Appending the MAC to the encrypted message leaks information about the plaintext! It's deterministic! (Same problem as ECB).
Encrypt then MAC:
- Encrypt the message $M$ into $C$.
- Append $T = MAC(C)$ to the ciphertext.
- Note: if we MAC then Encrypt, we need to decrypt before checking integrity. This is annoying and also opens up timing attack vectors.

Asymmetric Cryptography

Anyone can encrypt a message (they just need the target's public key).
Only someone who knows private key can decrypt.
Key management is... different.
- It's nice that we don't need to securely share private keys.
- But there are more keys to deal with.
Digitial signatures for authentication:
- We can sign a message with a private key.
Session key establishment
- Exchange messages to create a secret session key.
- Then switch to symmetric encryption (it's faster and easier).

Diffie-Helman

Choose some large $p$ and a generate $g$ of the group $Z_p^*$. There are public values we can use in practice. Then do Diffie-Helman (covered in Applied Crypto notes, on paper).
These days, we use elliptice curve groups instead.
Note that Diffie-Helman does not provide authentication, so, it's secure against passive attackers but not active attackers.

Note: Our asymmetric schemes are grounded in hard math problems. Symmetric encryption (block ciphers) are more contrived things that just look hard.

RSA

NOTE: don't actually use RSA anymore!
The Euler totient function $\phi(n)$ is the number of integers in $1,\ldots,n$ that are relatively prime to $n$. If $a, b$ relatively prime, $\phi(ab) = \phi(a)\phi(b)$.
Key gen:
- Generate large primes $p, q$.
- Compute $n = pq$ where $\phi(n) = (p-1)(q-1)$.
- Choose small $e$ relatively prime to $\phi(n)$, typically $e \in {3, 65537}$.
- Compute a unique modular inverse $d$ such that $ed = 1 \mod \phi(n)$.
- Set public key to $(e, n)$ and private key to $(d, n)$.
- $Enc(m) = m^e \mod n$.
- $Dec(c) = c^d = (m^e)^d = m \mod n$.
Security relies on the hardness of factoring into primes.
Messages must be an integer less than $n$.
Output is deterministic, so it needs to pre-process input somehow.
Plain RSA also does not provide integrity.

In practice, OAEP is used instead of RSA. Why?

Math is OK, but implementations are not.
All implementations are riddled with side channels.
See Fuck RSA by Trail of Bits.

Signatures

We can use RSA for signing messages:

Sign with $s = m^d \mod n$.
Verify with $s^e = (m^d)^e = m \mod n$.

Post-Quantum

If quantum computer becomes a reality, factoring integers becomes very easy (polynomial time), and conventional asymmetric encryption schemes are busted.
Check out NIST's PQC competition

Distributing Public Keys

Common approach: Certificate Authority (CA):
- A single agency responsible for certifying public keys
- Computers are pre-configured with the CA's public key
- Similarly chrome/firefox ship with CA's that they trust
- This is used in SSL/TLS.
We use a hierarchy of these CAs, because a single or a few certifying all public keys is just not practical. At the top is a root authority e.g. Verisign.

SSL/TLS

Consists of two protocols
- Handshake: public-key crypto to establish shared secret between client and server
- Record: Use that secret in symmetric-key crypto to communicate between the client and server.