FHE-Intro

Fully Homomorphic Encryption

Notes from this brief survey.

What is it?

A homomorphism is just a function that respects a given algebraic structure between its domain and its codomain. If the encryption function $Enc : P \to C$ from plaintexts to ciphertexts respects some operation, such as addition, e.g.
$$ Enc(m_1) + Enc(m_2) = Enc(m_1 + m_2) $$
then it would be considered homomorphic with respect to addition. We say an encryption scheme is fully homomorphic if it is homomorphic over both addition and multiplication.

History

The first FHE scheme was published by Craig Gentry in 2009. It was built on a lattice-based cryptosystem (over ideal lattices) that was only somewhat homomorphic (SHE), meaning it was homomorphic for a limited number of operations, but Gentry introduced a way to refresh ciphertexts in a process called bootstrapping. This first scheme was too computation- and memory-intensive to be used in a real-world setting.
However, a number of extensions of the original scheme have been introduced: Brakerski, Gentry, and Vaikuntanathan's (BGV), Fan and Vercauteren's (FV), and YASHE. These are still being actively researched today, with lots of effort going into reducing their performance overheads.

Applications

Today, performance considerations still hinder broad adoption. In Microsoft's SEAL library, a simple logistic regression model acting on 1 MB of data results in 10 GB of encrypted data, with each multiplication taking over 5 seconds.

Assume ciphertexts are $n$-dimensional vectors over $\mathbb{Z}_q$. Increasing multiplicative depth is done by increasing $q$, but this in turn requires increasing the dimension $n$ to keep the underlying noisy lattice problem hard. However, increasing $n$ also increases the time complexity of additions.

Thus, for a given level of security to maintain, there is a trade-off between homomorphic computation capacity and computational overhead.

Background

  • Given some basis $\{v_1,\ldots,v_n\}$ of a vector space, a lattice is the set of vectors $$ \{a_1v_1 + \cdots + a_nv_n \mid a_i \in \mathbb{Z}\} $$ that is, linear combinations of the basis vectors with integer coefficients. (I guess this assumes we're working in $\mathbb{R}$ but restricting the coefficients to $\mathbb{Z}$?) The Closest Vector Problem (CVP) is finding the closest lattice point to a point in the larger vector space.
    • Is the lattice a subspace?
      • Certainly not! It isn't closed under scalar multiplication by arbitrary field elements.
  • The Learning With Errors (LWE) problem concerns how to find the lattice vector closest to a vector with added error noise. There is a relationship between this problem and finding a better lattice basis: a basis with vectors that are as short and as orthogonal as possible makes the problem easier.
  • There is a polynomial-time algorithm for basis reduction called LLL; however, it is only an approximation, and its error is exponential with respect to the dimension. Therefore, for a given $q$, the problem gets harder as $n$ increases.
  • Let $R_p$ denote the ring of integers modulo $p$. For this exploration, our plaintext domain is $R_p$, and our ciphertexts live in the $n$-dimensional vector space over $R_q$, where $p \ll q$.
  • Reduction modulo $q$ produces a result in $[-\frac{q}{2}, \frac{q}{2}]$ (see the small helper sketch below).
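As a tiny aside, here is what that centered reduction looks like in code (my own helper, just to pin down the convention):

```python
def centered_mod(x: int, q: int) -> int:
    """Reduce x modulo q into the centered interval (-q/2, q/2]."""
    r = x % q                  # Python's % already gives a value in [0, q)
    return r - q if r > q // 2 else r

# centered_mod(7, 10) == -3, centered_mod(3, 10) == 3
```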

Lattice Based Cryptography (SHE)

Consider a vector space where $c_1$ is some linear combination of the basis vectors plus some error, and $a_1$ is the same linear combination without the error. Finding $a_1$ from $c_1$ is exactly CVP. The idea is to hide a plaintext message within the error. If we choose some $p \ll q$ and an error $e$ divisible by $p$, then taking the result modulo $p$ removes the error and leaves the message (a toy numeric sketch follows the bullet below).

  • TODO is $a_1$ all integer coefficients? Not explicit, but I assume so.
    In this setting, the properties of the vector space give us homomorphism over addition:
    $$ c_1 + c_2 = (e_1 + m_1) + (e_2 + m_2) = (m_1 + m_2) + (e_1 + e_2)$$
    provided that the errors $e_i$ are kept small.
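To make the error-hiding idea concrete, here is a toy numeric sketch in Python: no lattice and no security, just the arithmetic of hiding a message inside a $p$-divisible error and recovering it modulo $p$ (the parameter and error values are made up):

```python
p, q = 17, 2**20                 # small plaintext modulus, much larger ciphertext modulus

def centered_mod(x, q):
    r = x % q
    return r - q if r > q // 2 else r

def toy_encrypt(m, e):
    """Hide m (mod p) inside an error term divisible by p."""
    return centered_mod(m + p * e, q)

def toy_decrypt(c):
    return centered_mod(c, q) % p

m1, m2 = 5, 9
c1 = toy_encrypt(m1, e=123)      # error values chosen arbitrarily, just kept well below q/p
c2 = toy_encrypt(m2, e=-78)

assert toy_decrypt(c1) == m1
assert toy_decrypt(c2) == m2
# Additive homomorphism: the errors add up but stay small, so reducing mod p still works.
assert toy_decrypt(centered_mod(c1 + c2, q)) == (m1 + m2) % p
```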

To recap: $n$ must be large enough relative to $q$ to keep CVP hard, $p \ll q$ to allow encryption, and the errors must stay small relative to $q$ for decryption to work. And multiplication causes the error to grow even faster (hence the discussion around multiplicative depth).

More Formally

  • Let $s \xleftarrow{\$} \mathcal{N}^n$ denote that $s$ is an $n$-dimensional vector sampled coordinate-wise from the distribution $\mathcal{N}$. (I think the article is missing the exponent; it should be $\mathcal{N}^n$.)
  • Let $s \xleftarrow{\$} \mathcal{R}^n_q$ denote that $s$ is sampled uniformly from $\mathcal{R}^n_q$, i.e. a vector of $n$ elements of the ring $\mathcal{R}_q$.
  • Tidbits that weren't initially clear in the article:
    • $a_i$ is a vector
    • $b_i$ is an integer
    • still no idea where $r$ without a subscript comes from :shrug:
ContextGen
  • Given: the plaintext modulus $p$, the multiplicative depth $L$, and the security parameter $\lambda$
  • Output: the ciphertext modulus $q$, the variance $\sigma$ of the error distribution, and the dimension $n$.

Note that from $p$ and $L$ we can directly find a $q$ that ensures the correct operation of encryption and decryption. It is a bit more difficult to ensure security: we must find an error distribution with a small norm that is nonetheless sufficiently random. Hence we use a normal distribution centered at zero with sufficiently high variance $\sigma$. Given all of these parameters, we can then choose $n$ to ensure security.

  • Caveat: this context generation is a simplification; see BGV / SEAL for a complete version.
KeyGen
  • Given: the parameters from the ContextGen,
  • Output: basis of lattice $A$, a private key $s$, and a public key $(A, b)$.
    $$\begin{align} \ & function \ KeyGen(q, \sigma, n) \\ \ & A \xleftarrow{\$} \mathcal{R}^{n \times n}_q \\ \ & s \xleftarrow{\$} \mathcal{R}^{n}_q \\ \ & e \xleftarrow{\$} \mathcal{N}^{n} \\ \ & b \leftarrow (\langle A,s\rangle + p \times e \mod q) \\ \ & return (A, b) \\ \end{align}$$
    (No algorithms in MathJax :( ) So to find $s$, an adversary would need to first solve CVP for $b$, and then solve the linear equation $\langle A,s\rangle = b - p \times e \mod q$. (A runnable numpy sketch of KeyGen/Encrypt/Decrypt/Add appears after the Add block below.)
Encrypt
  • Given: plaintext message $m_1$ and public key $b$
  • Output: ciphertext $c_1$

$$\begin{align} \ & function \ Encrypt(m_1, b) \\ \ & r_1 \xleftarrow{\$} \mathcal{N}^{n} \\ \ & e_1 \xleftarrow{\$} \mathcal{N}^{n} \\ \ & a_1 \leftarrow \langle A^T,r_1\rangle \mod q \\ \ & b_1 \leftarrow \langle b,r_1\rangle + m_1 + p \times e_1 \mod q \\ \ & c_1 \leftarrow (a_1, b_1) \\ \ & return \ c_1 \\ \end{align}$$

Decrypt
  • Given: ciphertext $c_1$ and private key $s$
  • Output: plaintext $m_1$
    $$\begin{align} \ & function \ Decrypt(c_1, s) \\ \ & (a_1, b_1) \leftarrow c_1 \\ \ & m_1 \leftarrow (b_1 - \langle a_1, s \rangle \mod q) \mod p \\ \ & return \ m_1 \\ \end{align}$$
Add
  • Given: two ciphertexts $c_1, c_2$
  • Output: new ciphertext $c_3$ that decrypts to $m_1 + m_2$
    $$\begin{align} \ & function \ Add(c_1, c_2) \\ \ & (a_1, b_1) \leftarrow c_1 \\ \ & (a_2, b_2) \leftarrow c_2 \\ \ & a_3 \leftarrow a_1 + a_2 \\ \ & b_3 \leftarrow b_1 + b_2 \\ \ & return \ (a_3, b_3) \\ \end{align}$$
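Since MathJax pseudocode can't be executed, here is a small numpy sketch of KeyGen / Encrypt / Decrypt / Add as I read them above. A few assumptions on my part are flagged in the comments: $\langle A, s\rangle$ is read as a matrix-vector product, $e_1$ is sampled as a single integer (so that $b_1$ is an integer, as noted in the tidbits), and the distribution $\mathcal{N}$ is a crudely rounded Gaussian. This is only for checking the arithmetic; the parameters are far too small to be secure.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n, sigma = 17, 2**20, 16, 3.0            # toy parameters, nowhere near secure

def small_noise(shape=None):
    """Crude stand-in for sampling from N: a rounded Gaussian with scale sigma."""
    return np.rint(rng.normal(0, sigma, shape)).astype(np.int64)

def keygen():
    A = rng.integers(0, q, size=(n, n), dtype=np.int64)   # A <-$- R_q^{n x n}
    s = rng.integers(0, q, size=n, dtype=np.int64)        # s <-$- R_q^n   (secret key)
    e = small_noise(n)                                     # e <-$- N^n
    b = (A @ s + p * e) % q                                # b = <A, s> + p*e mod q (read as A·s)
    return s, (A, b)

def encrypt(m, pk):
    A, b = pk
    r = small_noise(n)                                     # r_1 <-$- N^n  (small!)
    e1 = int(small_noise())                                # assumption: a scalar, so b_1 is an integer
    a1 = (A.T @ r) % q                                     # a_1 = <A^T, r_1> mod q
    b1 = (int(b @ r) + m + p * e1) % q                     # b_1 = <b, r_1> + m + p*e_1 mod q
    return a1, b1

def centered(x, q):
    x = x % q
    return x - q if x > q // 2 else x

def decrypt(c, s):
    a1, b1 = c
    return centered(int((b1 - a1 @ s) % q), q) % p         # (b_1 - <a_1, s> mod q) mod p

def add(c1, c2):
    (a1, b1), (a2, b2) = c1, c2
    return (a1 + a2) % q, (b1 + b2) % q

s, pk = keygen()
c1, c2 = encrypt(3, pk), encrypt(10, pk)
assert decrypt(c1, s) == 3
assert decrypt(add(c1, c2), s) == (3 + 10) % p
```
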
Mult
  • Given: two ciphertexts $c_1, c_2$
  • Output: new ciphertext $c_3$ that decrypts to $m_1 \times m_2$

The strategy for multiplication is unfortunately not very clear from the article. An explicit algorithm is not given. If I had to guess, it would look something like this:
$$\begin{align} \ & function \ Mult(c_1, c_2) \\ \ & (a_1, b_1) \leftarrow c_1 \\ \ & (a_2, b_2) \leftarrow c_2 \\ \ & a_3 \leftarrow \langle a_1^T, a_2 \rangle \\ \ & b_3 \leftarrow b_1 \cdot b_2 \\ \ & return \ (a_3, b_3) \\ \end{align}$$
with perhaps the matrix $\langle a_1^T, a_2 \rangle$ arbitrarily reshaped into a vector of length $n \times n$. What is explicit is that computing $c_3$ increases the ciphertext size quadratically (consistent with the conjectured implementation above), and the error grows quadratically as well.

Caveats

To keep the error in check, for a given message space $p$, $q$ needs to grow exponentially with the multiplicative depth $L$. Thus, this SHE scheme is practically limited in the number of multiplications it can support.
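Continuing the earlier toy (non-lattice) sketch, multiplying two ciphertexts as plain integers shows why the error blows up:

```python
p, q = 17, 2**40

def centered_mod(x, q):
    r = x % q
    return r - q if r > q // 2 else r

def toy_encrypt(m, e):
    return centered_mod(m + p * e, q)

def toy_decrypt(c):
    return centered_mod(c, q) % p

m1, m2, e1, e2 = 5, 9, 123, -78
c1, c2 = toy_encrypt(m1, e1), toy_encrypt(m2, e2)

prod = centered_mod(c1 * c2, q)
assert toy_decrypt(prod) == (m1 * m2) % p
# c1*c2 = m1*m2 + p*(m1*e2 + m2*e1 + p*e1*e2): the new error is roughly p times the
# product of the old errors, so each multiplication multiplies the error rather than
# just adding to it, and q must grow accordingly with the depth L.
```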

From SHE to FHE

The main idea behind Gentry's bootstrapping trick is that the decryption operation itself can be computed homomorphically over already-encrypted data.
The first level (we'll subscript with $1$) is just as above: message $m_1$, secret $s_1$, and ciphertext $c_1$. Next, generate a second secret $s_2$. Then set
$$ c_1' := Enc(c_1, s_2) \qquad \text{ and }\qquad s' := Enc(s_1, s_2)$$
so we've moved into the realm of things encrypted under $s_2$. But decryption in this realm, $Dec(c_1', s')$, is the same as decrypting at the first level and then encrypting under $s_2$:
$$m_1 = Dec(c_1, s_1) \quad\text{ and }\quad Enc(m_1, s_2) = Dec(c_1', s')$$
However,

  • decryption has significant multiplicative depth, which is going to increase both error and ciphertext size (as seen above)
  • re-encryption is also memory intensive, with quadratic overhead.

Therefore, research in this area is very much focused on

  • increasing multiplicative depth capabilities
  • reducing ciphertext errors

Ring Learning With Error (RLWE)

These ideas come from algebraic number theory. Evidently, the vectors described thus far can also be seen as polynomials modulo the $n^{th}$ cyclotomic polynomial. All the current practical implementations of FHE are based on RLWE.

  • Pretty much all the important results in LWE can be translated to RLWE.
  • Computation overhead improves when working with polynomials instead of vectors (a small multiplication sketch follows below).
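As a small illustration of the polynomial view, here is multiplication in $\mathbb{Z}_q[x]/(x^n+1)$ (the common power-of-two choice, where $x^n+1$ is the $2n$-th cyclotomic polynomial; that specific choice is my assumption, the article just says "cyclotomic"):

```python
import numpy as np

def ring_mul(a, b, q):
    """Multiply two coefficient vectors of length n in Z_q[x] / (x^n + 1).

    Plain convolution followed by the negacyclic wrap-around x^n = -1.
    Practical RLWE libraries do this with the NTT, which is where the
    speedup over generic vector/matrix arithmetic comes from.
    """
    n = len(a)
    full = np.convolve(a, b)                              # degrees 0 .. 2n-2
    low, high = full[:n], full[n:]
    high = np.concatenate([high, np.zeros(n - len(high), dtype=full.dtype)])
    return (low - high) % q                               # x^(n+k) == -x^k in this ring

q, n = 97, 4
a = np.array([1, 2, 3, 4])                                # 1 + 2x + 3x^2 + 4x^3
b = np.array([5, 0, 0, 1])                                # 5 + x^3
print(ring_mul(a, b, q))                                  # 3 + 7x + 11x^2 + 21x^3
```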

BGV

This was the first practical scheme for FHE, with a number of improvements on Gentry's ideas.

  • Reduced growth in ciphertext/key sizes
  • Reduced noise growth via modulus switching
    • This technique requires multiple keys shared between data owner and computing entity.
  • Reduced multiplicative depth of decryption algorithm
  • Reduced computation overhead for RLWE
  • Batching technique to combine multiple plaintexts into one ciphertext

FV and YASHE

These are popular today. They deal with error in a new "scale-invariant" way, rather than modulus switching (which requires multiple keys shared between parties): messages $m$ are multiplied by $\frac{q}{p}$ to shift them into the "upper part of the cipher" (a numeric sketch follows the list below).

  • TODO what does this mean? a high number $\mod q$?
    Apparently, this means that errors and messages are "separated" so long as $e < \frac{q}{2p}$, in contrast to BGV.
    Multiplied ciphertexts need to be scaled down by $\frac{p}{q}$, which reduces noise growth.
    These schemes cannot keep ciphertexts in "Double CRT" form during multiplication.
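To see what "shifting messages into the upper part of the cipher" means numerically, here is a toy sketch of just the scale-invariant encoding, with all the lattice/polynomial machinery stripped away (parameter values made up):

```python
p, q = 16, 2**20
delta = q // p                       # scale factor: encoded messages are multiples of q/p

def encode(m, e):
    """Message sits in the 'upper bits', noise in the 'lower bits'."""
    return (delta * m + e) % q

def decode(c):
    """Recover m by rescaling by p/q and rounding away the noise."""
    return round(p * c / q) % p

m, e = 11, 137                       # any |e| < q/(2p) leaves m recoverable
assert decode(encode(m, e)) == m
```

Multiplying two such ciphertexts multiplies the scale factors as well, which is why the product has to be scaled back down by $\frac{p}{q}$, as noted above.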

What scheme to use?

  • Very small plaintexts ($2^1$ ~ $2^4$) -> scale-invariant scheme (FV or YASHE)
  • Larger plaintexts ($2^5, \ldots$) -> BGV
  • If you are doing bitwise logic, then encryption must be bitwise, hence plaintexts are literally just binary $\{0, 1\}$, and scale-invariant schemes do better.
    • Note: the paper seems to immediately contradict itself on this point :shrug:.

Implementation Challenges

Bitwise vs Integer Encryption

Often papers will refer to the fact that any operation can be reduced to bitwise logic circuits. However, this glosses over the fact that an AND operation on an encrypted bit actually requires multiplying two full ciphertext encodings, increasing error and requiring greater multiplicative depth. Thus, integer encryption is more efficient for arithmetic-only circuits.
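For intuition about the cost, here is how the basic gates look when bits are encrypted as integers mod 2 (these are standard boolean-to-arithmetic identities, not anything specific to a given library): XOR is only an addition, while every AND is a full ciphertext multiplication.

```python
# Plaintext identities over bits {0, 1}; homomorphically, addition is cheap while
# every multiplication (AND) adds error and consumes multiplicative depth.
def AND(a, b): return a * b                   # one homomorphic multiplication
def XOR(a, b): return (a + b) % 2             # one homomorphic addition
def NOT(a):    return (a + 1) % 2             # addition of a constant
def OR(a, b):  return (a + b + a * b) % 2     # still needs a multiplication

def full_adder(a, b, carry):
    """A single-bit adder already needs several ANDs, i.e. real multiplicative depth."""
    s = XOR(XOR(a, b), carry)
    carry_out = OR(AND(a, b), AND(carry, XOR(a, b)))
    return s, carry_out

assert full_adder(1, 1, 0) == (0, 1)
assert full_adder(1, 0, 1) == (0, 1)
```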

Data Dependent Branching (DDB)

Algorithms that depend on conditional branching become more complicated: they need to be translated into branch-free bit-logic computations that

  • are not trivial to translate
  • result in even more computations (see the selection sketch below)
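One standard trick (a generic one, not something spelled out in the article) is to replace a branch with an arithmetic "multiplexer" driven by an encrypted 0/1 flag, at the price of extra multiplications:

```python
# if cond: x else: y   --->   cond*x + (1 - cond)*y, where cond is an (encrypted) bit
def select(cond, x, y):
    """Branch-free selection; costs two homomorphic multiplications (or one, via y + cond*(x - y))."""
    return cond * x + (1 - cond) * y

# Plaintext sanity check:
assert select(1, 42, 7) == 42
assert select(0, 42, 7) == 7
```
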
Real Numbers

Many real-world computations depend on real numbers (meaning values in $\mathbb{R}$): computing averages, regression lines, etc. We can do this with the schemes described above by choosing a set precision level, but without DDB we can't round ciphertexts, which in the end means computing at even higher levels of precision. So, currently this incurs a ton of overhead.
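A minimal sketch of the fixed-precision idea (my own illustration of "choosing a set precision level", not the article's construction): reals are scaled to integers before encryption, and because we can't round homomorphically, the scale factor, and hence the precision we have to carry, grows with every multiplication.

```python
SCALE = 10**4                               # fixed precision: 4 decimal digits

def encode(x: float) -> int:
    return round(x * SCALE)                 # this integer is what would get encrypted

def decode(v: int, scale: int = SCALE) -> float:
    return v / scale

a, b = encode(3.1415), encode(2.0)
print(decode(a + b))                        # addition keeps the scale: 5.1415
# Multiplication multiplies the scales; without homomorphic rounding we cannot rescale
# the ciphertext, so the working precision (and the required q) keeps growing.
print(decode(a * b, scale=SCALE * SCALE))   # 6.283
```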

Conclusion

Two things are holding us back from offering an automated cryptographic tool that doesn't require deep knowledge of the implementation details:

  • No automated DDB-free translation for algorithms
  • No feasible FHE computations on real numbers

Appendix

  • I haven't found the source where "Double CRT" is defined, but I've deduced that CRT stands for Chinese Remainder Theorem (or Transform), and it probably comes from algebraic number theory: I'm seeing things about roots of unity, prime factors, polynomials, etc.