Security Proofs

Defining MPC

This is a higher-level introduction to security in the context of MPC. Many of the notions are re-defined more precisely in the Proof Technique section. Note that this comes from the Secure Computation book, all chapters of which are available for free. In particular, the Oblivious Data Structures and Alternative Threat Models chapters may be worth exploring.

Preliminaries

  • Assume secure channels among parties.
  • Denote:
    • $\kappa$: computational security param
      • Governs the hardness of the underlying problems, which an adversary could attack with offline computation
      • Typically 128 or 256
    • $\sigma$: statistical security param
      • Typically 40 or 80
    • Interpretation: security is violated with probability at most $2^{-\sigma} + \mu(\kappa)$, where $\mu$ is a negligible function accounting for the computational resources of the adversary.
  • See the Proof Technique preliminaries below for the definition of computational indistinguishability. If we relax $D$ to be any algorithm, regardless of its computational complexity, this becomes statistical indistinguishability.
    • In this case, the probability is bounded by the statistical distance:
      • $\Delta(X(a,n), Y(a,n)) = \frac{1}{2}\sum_x \left| Pr[x = X(a,n)] - Pr[x = Y(a,n)] \right|$
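
As a sanity check on this definition, here is a minimal Python sketch (the function names and the biased-coin example are my own, not from the text) that computes the statistical distance between two sampled distributions:

```python
from collections import Counter

def statistical_distance(xs, ys):
    """Delta(X, Y) = 1/2 * sum_v |Pr[X = v] - Pr[Y = v]|, estimated
    from lists of samples (exact if the lists enumerate the
    distributions with their true frequencies)."""
    px, py = Counter(xs), Counter(ys)
    support = set(px) | set(py)
    return 0.5 * sum(abs(px[v] / len(xs) - py[v] / len(ys)) for v in support)

# A coin with Pr[heads] = 1/2 + 2^-sigma sits at statistical distance
# exactly 2^-sigma from a fair coin (sigma = 3 kept tiny for readability).
sigma, n = 3, 8192
fair = [0] * (n // 2) + [1] * (n // 2)
shift = int(n * 2 ** -sigma)
biased = [0] * (n // 2 + shift) + [1] * (n // 2 - shift)
print(statistical_distance(fair, biased))  # 0.125 = 2^-3
```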

Primitives

  • Secret Sharing: a dealer splits a secret into shares so that authorized subsets of shares can reconstruct the secret, while unauthorized subsets learn nothing about it (see the sketch after this list)
  • Random Oracle: a heuristic model for security of hash functions
    • In this model, all parties have access to public $H:{0,1}^* \to {0,1}^{\kappa}$
    • On input $x$,
      • if $H$ has never seen $x$, it picks a random $r_x \in {0,1}^{\kappa}$, remembers it, and returns it.
      • if $H$ has seen $x$, it returns the remembered $r_x$.
    • This model only captures attacks that treat the hash function $H$ as a black box.
    • You can indeed construct schemes that are secure in the random oracle model, but insecure whenever $H$ is instantiated by a concrete function.
    • Despite this, RO model is often acceptable for practical applications, and can lead to more efficient constructions.
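
Both primitives fit in a few lines of code. Below is an illustrative Python sketch (my own naming): additive $n$-of-$n$ sharing as one simple instantiation of secret sharing, and the random oracle as exactly the lazily sampled lookup table described above.

```python
import secrets

P = 2**127 - 1  # prime modulus; assumes secrets fit in the field

def share(secret, n):
    """Additive n-of-n sharing: the first n-1 shares are uniformly
    random, so any n-1 shares reveal nothing about the secret."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

class RandomOracle:
    """H: {0,1}* -> {0,1}^kappa as a lazily sampled table: fresh
    randomness on first query, replayed from memory afterwards."""
    def __init__(self, kappa=128):
        self.kappa, self.table = kappa, {}
    def query(self, x: bytes) -> int:
        if x not in self.table:
            self.table[x] = secrets.randbits(self.kappa)
        return self.table[x]

assert reconstruct(share(42, 5)) == 42
H = RandomOracle()
assert H.query(b"hi") == H.query(b"hi")  # consistent on repeat queries
```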

Real/Ideal Paradigm

  • In the ideal world, parties securely compute the function $\mathcal{F}$ by privately sending inputs to a completely trusted party $\mathcal{T}$, referred to as the functionality.
    • Each party $P_i$ has associated input $x_i$, sends it to $\mathcal{T}$, who computes $\mathcal{F}(x_1, \ldots, x_n)$ and returns the result to all parties.
    • Consider an adversarial attack:
      • It can take over any $P_i$ but not the trusted party $\mathcal{T}$.
      • $\mathcal{F}(x_1, \ldots, x_n)$ is the only message received, so the adversary learns no more than this.
  • In the real world, there is no trusted party, rather parties communicate using a protocol $\pi$.
    • Protocol specifies "next message" function $\pi_i$ for each party $P_i$
      • $\pi_i$ inputs: security param, private party input $x_i$, a random tape, and all messages $P_i$ received so far
      • $\pi_i$ outputs: either next message to send and destination, or instruct $P_i$ to terminate with some output
    • Adversaries can corrupt parties.
      • Corruption at start of protocol is equivalent to an adversarial party.
    • Corrupt parties might follow the protocol or deviate from it - depends on the threat model.
  • $\pi$ is secure if the effect an adversary can achieve in the real world can also be achieved by a corresponding adversary in the ideal world.
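
To make both worlds concrete, here is a hedged Python sketch with illustrative names not taken from the text. The ideal world is a single call to the trusted party; the real world is a toy round-robin scheduler driving each party's next-message function (real protocols need fancier scheduling, but this suffices for parties that take turns).

```python
import secrets
from typing import Callable

def ideal_world(F: Callable, inputs: list) -> list:
    """Parties privately send inputs to the trusted party T, which
    computes F(x_1, ..., x_n) and returns the result to everyone."""
    y = F(*inputs)
    return [y] * len(inputs)

def real_world(pis: list, inputs: list, kappa: int = 128) -> list:
    """No trusted party: each party i is a next-message function
    pi_i(kappa, x_i, tape, received) that returns ("send", dest, msg)
    or ("output", y)."""
    tapes = [secrets.token_bytes(32) for _ in pis]   # private random tapes
    received = [[] for _ in pis]                     # messages seen so far
    outputs, active = [None] * len(pis), set(range(len(pis)))
    while active:
        for i in sorted(active):                     # round-robin turns
            action = pis[i](kappa, inputs[i], tapes[i], received[i])
            if action[0] == "send":
                received[action[1]].append((i, action[2]))
            else:                                    # ("output", y)
                outputs[i] = action[1]
                active.discard(i)
    return outputs
```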

Semi-Honest Security

A semi-honest a.k.a. honest-but-curious adversary is one who corrupts parties but follows the protocol as specified; it just tries to learn as much as possible from the protocol messages it receives.

  • This may involve several colluding parties pooling their views together.
  • Also considered passive, since they can't take any actions other than running the standard protocol.
  • An "attack" can simply be considered as the adversary outputting its entire view.

Formally:
  • Let $\pi$ be a protocol, $\mathcal{F}$ be a functionality, $C$ be the set of corrupted parties, and $Sim$ a simulator algorithm.
  • Define the distribution $Real_{\pi}(\kappa, C; x_1,\ldots, x_n)$ by:
    • running $\pi$ with security param $\kappa$, where each $P_i$ runs honestly with input $x_i$.
    • denote the final view and output of each party as $V_i$ and $y_i$ respectively.
    • output: ${V_i \ : \ i \in C}$ and $(y_1,\ldots,y_n)$.
  • Define the distribution $Ideal_{\mathcal{F},Sim}(\kappa, C; x_1,\ldots, x_n)$ by:
    • compute $(y_1, \ldots, y_n) \leftarrow \mathcal{F}(x_1, \ldots, x_n)$.
    • output: $Sim(C, {(x_i, y_i) \ : \ i \in C})$ and $(y_1, \ldots, y_n)$.
  • We say a protocol $\pi$ securely realizes $\mathcal{F}$ in the presence of semi-honest adversaries if there exists a simulator $Sim$ such that, for every subset of corrupt parties $C$ and all inputs $x_1, \ldots, x_n$, the above distributions are indistinguishable in $\kappa$.

Malicious Security

A malicious a.k.a. active adversary can deviate arbitrarily from the protocol. Inputs of corrupt parties are not well defined in the ideal world, so the simulator extracts an effective ideal-world input from the real-world adversary that "explains" its real-world effect. In many constructions the simulator only needs black-box oracle access to the real-world adversary.
Formally:

  • Let $\mathcal{A}$ be a real-world adversary, $corrupt(\mathcal{A})$ denote the set of corrupted parties, and $corrupt(Sim)$ denote the set of parties corrupted by the ideal-world adversary $Sim$.
  • Define distribution $Real_{\pi, \mathcal{A}}(\kappa ; {x_i \ : \ i \notin corrupt(\mathcal{A})})$ by:
    • run $\pi$ with security param $\kappa$, where each honest party uses their private input $x_i$ but corrupt parties use whatever inputs $\mathcal{A}$ chooses.
    • let $y_i$ denote output of honest party $P_i$, and $V_i$ be the view of party $P_i$.
    • output: ${V_i \ : \ i \in corrupt(\mathcal{A})}$ and ${ y_i \ : \ i \notin corrupt(\mathcal{A}) }$.
  • Define distribution $Ideal_{\mathcal{F}, Sim}(\kappa ; {x_i \ : \ i \notin corrupt(\mathcal{A})})$ by:
    • run $Sim$ until it outputs a set of inputs ${ x_i \ : \ i \in corrupt(\mathcal{A}) }$.
    • compute $(y_1, \ldots, y_n) \leftarrow \mathcal{F}(x_1, \ldots, x_n)$.
    • give ${y_i \ : \ i \in corrupt(\mathcal{A})}$ to $Sim$.
    • let $V^*$ denote the final set of views output by $Sim$
    • output: $V^*$ and ${y_i \ : \ i \notin corrupt(Sim) }$.
  • We say a protocol $\pi$ securely realizes $\mathcal{F}$ in the presence of malicious adversaries if for every real-world adversary $\mathcal{A}$ there exists a simulator $Sim$ with $corrupt(\mathcal{A}) = corrupt(Sim)$ such that, for all inputs of honest parties ${x_i \ : \ i \notin corrupt(\mathcal{A})}$, the above distributions are indistinguishable in $\kappa$.
  • Functionalities can also be reactive, i.e. have their own internal state and interact with parties over multiple rounds.
  • Not all functionalities can be computed with the implicit output fairness above (every party receives its output). Results in the malicious setting often provide the weaker property of security with abort, wherein the functionality first delivers outputs to the corrupt parties, who can then either instruct the functionality to continue delivering output to the honest parties, or send them an abort instruction $\bot$ (sketched after this list).
  • The above definitions assume static corruption, that is, the set of corrupt parties is fixed for the duration of the protocol. We may also need to consider adaptive corruption, wherein the adversary can choose which parties to corrupt during protocol execution.
    • Here, we must adapt the adversary in our paradigm to have a "corrupt $P_i$" command.
    • In the real world, this means they learn the current view and private randomness of $P_i$, and subsequently take over its messages.
    • Simulating this in the ideal world is rather challenging, and most research assumes static corruption (CITATION NEEDED).
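
A minimal sketch of the security-with-abort delivery order from the list above (function names are illustrative):

```python
BOT = "⊥"  # the abort symbol

def ideal_with_abort(F, inputs, corrupt, adversary_continues):
    """F returns a tuple with one output per party. Corrupt parties
    receive their outputs first; the adversary then decides whether
    the honest parties get their outputs or an abort."""
    ys = F(*inputs)
    if adversary_continues({i: ys[i] for i in corrupt}):
        return {i: ys[i] for i in range(len(inputs))}
    return {i: ys[i] if i in corrupt else BOT for i in range(len(inputs))}
```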

Composition

  • Composing secure protocols does not automatically yield a secure protocol. The universal composability (UC) framework does provide this guarantee. The real/ideal definitions above are augmented with an environment that provides context to the protocol; the environment's goal is to figure out whether it is in the ideal or real world, and the protocol is considered UC-secure if every environment's advantage in guessing correctly is negligible.
  • However, the weaker notion of composing protocols sequentially is often secure and easier to achieve.

Simulation Proof Technique

Preliminaries

Things are defined in these weird, ass-backwards ways because, as it turns out, it is incredibly awkward and difficult to mathematically define "the adversary learns nothing" in a useful way.

Probability ensembles $X, Y$ are computationally indistinguishable, denoted $X \overset{c}{\equiv} Y$, if:

  • For every non-uniform polynomial-time algorithm $D$,
    • there exists negligible function $\mu(\cdot)$
      • such that for all $a \in {0, 1}^*$ and all $n \in \mathbb{N}$,
        • $| Pr[D(X(a, n)) = 1] - Pr[D(Y(a, n)) = 1] | \le \mu(n)$
  • Notice that the negligible function $\mu$ must be valid for all $a$.
  • In the negation, when $X,Y$ are distinguishable, there can be a different $a$ for each $n$.
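
To see what a distinguisher's advantage means operationally, here is a small Monte Carlo sketch (my own construction) that estimates $| Pr[D(X) = 1] - Pr[D(Y) = 1] |$ from samples, using a toy pair of distributions that are easily distinguishable:

```python
import secrets

def advantage(D, sample_x, sample_y, trials=100_000):
    """Monte Carlo estimate of |Pr[D(X) = 1] - Pr[D(Y) = 1]|."""
    hits_x = sum(D(sample_x()) for _ in range(trials))
    hits_y = sum(D(sample_y()) for _ in range(trials))
    return abs(hits_x - hits_y) / trials

# Toy pair: 128 uniform bits vs. 128 bits forced to have even parity
# (flip the low bit whenever the parity comes out odd).
uniform = lambda: secrets.randbits(128)
even = lambda: (r := secrets.randbits(128)) ^ (bin(r).count("1") & 1)
D = lambda v: bin(v).count("1") & 1  # D outputs 1 on odd parity
print(advantage(D, uniform, even))   # ~0.5, so these are distinguishable
```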

Semantic Security

A private key encryption scheme $(G, E, D)$ is semantically secure if:

  • For every non-uniform probabilistic-polynomial-time algorithm $\mathcal{A}$,
    • there exists a non-uniform probabilistic-polynomial-time algorithm $\mathcal{A}'$,
      • such that for every probability ensemble ${X_n}_{n \in \mathbb{N}}$ with $|X_n| \le poly(n)$,
      • every pair of polynomially-bounded functions $f, h : {0, 1}^* \to {0, 1}^*$,
      • every positive polynomial $p(\cdot)$,
      • all sufficiently large $n$:
        $$\begin{align*}
        &\underset{k\leftarrow G(1^n)}{Pr}\left[\mathcal{A}(1^n,E_k(X_n),1^{|X_n|},h(1^n,X_n))=f(1^n,X_n)\right]\\
        &\quad < Pr\left[\mathcal{A}'(1^n,1^{|X_n|},h(1^n,X_n))=f(1^n,X_n)\right]+\frac{1}{p(n)}
        \end{align*}$$

This actually isn't so bad. First, the notation is weird, but here's how to unpack it:

  • $k\leftarrow G(1^n)$ generates a key using security parameter $n$. The parameter is passed in unary ($n$ ones) so that an algorithm running in time polynomial in its input length runs in time polynomial in $n$; adversaries are modeled as machines reading input tapes, not integers.
  • $E_k(X_n)$ is of course the encryption of a random plaintext $X_n$ under key $k$
  • $h$ denotes auxiliary information that could be leaked about the plaintext
  • $f$ is what the adversary tries to learn, given the plaintext length, ciphertext, and $h$.
  • So basically, this says that $\mathcal{A}'$, which by definition doesn't even see the ciphertext, learns roughly the same amount as $\mathcal{A}$.
  • We can view $\mathcal{A}'$ as being in an "ideal world" where the encryption is 100% secure.

The proof technique to show semantic security is called a simulation: $\mathcal{A}'$ simulates execution of $\mathcal{A}$ and relays its output. That is, given its input $1^n, 1^{|X_n|}, h = h(1^n, X_n)$:

  1. $\mathcal{A}'$ runs $G(1^n)$ to get $k$.
  2. $\mathcal{A}'$ computes $c = E_k(0^{|X_n|})$ as an encryption of zeroes, or "garbage"
  3. $\mathcal{A}'$ runs $\mathcal{A}(1^n, c, 1^{|X_n|}, h)$ and outputs whatever it gets.
  • Notice that if encryptions of $X_n$ and of $0^{|X_n|}$ are indistinguishable, then $\mathcal{A}$ should output $f(1^n, X_n)$ with approximately the same probability whether given $E_k(X_n)$ or $E_k(0^{|X_n|})$. So, these proofs proceed by showing that any non-negligible difference in the above probabilities can be converted into a distinguisher between encryptions of the real plaintext and encryptions of garbage (see the sketch below).
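
The three steps translate directly to code. A minimal sketch, with a toy counter-mode construction standing in for $(G, E, D)$ (my own illustrative choice; any suitable scheme would do):

```python
import hashlib, secrets

# Toy private-key scheme standing in for (G, E, D): SHA-256 in counter
# mode as a keystream. Illustrative only.
def G(n: int) -> bytes:
    return secrets.token_bytes(n // 8)            # key for security param n

def E(k: bytes, m: bytes) -> bytes:
    stream = b"".join(hashlib.sha256(k + i.to_bytes(8, "big")).digest()
                      for i in range(len(m) // 32 + 1))
    return bytes(a ^ b for a, b in zip(m, stream))

def A_prime(A, n: int, msg_len: int, h):
    """The simulator: A' never sees the real ciphertext."""
    k = G(n)                        # 1. generate its own key
    c = E(k, b"\x00" * msg_len)     # 2. encrypt all-zero "garbage"
    return A(n, c, msg_len, h)      # 3. run A and relay its output
```

If $\mathcal{A}$ behaved measurably differently here than on the real ciphertext, that gap would itself be a distinguisher between $E_k(X_n)$ and $E_k(0^{|X_n|})$.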

Secure Computation - Simulation for Semi-Honest Adversaries

Consider a model of two-party computation in the presence of static semi-honest adversaries. Such an adversary:

  • controls one of the parties
  • is static, so defined at the onset of computation
  • follows the protocol specification exactly
  • but, it tries to learn more information by viewing the transcript and its own internal state.
  • This is a fairly weak security model; it only guarantees that there is no inadvertent leakage,
  • so it should be used only when parties essentially trust each other, but want to make sure that no record of their inputs is found elsewhere

We model the two-party computation as a possibly random process called a functionality $f: {0,1}^* \times {0,1}^* \to {0,1}^* \times {0,1}^*$ (one tuple element per party). For every pair of inputs $x, y \in {0, 1}^n$, the output pair is a random variable $(f_1(x,y), f_2(x,y))$ ranging over pairs of strings. The $i$th party wishes to obtain $f_i(x,y)$.

We proceed with a simulation proof demonstrating that whatever a party can compute from the protocol, it could compute from just its own input and output. In our simulation, because we assume parties are semi-honest, their outputs are well defined; that is, the output $f(x,y)$ does not depend on the adversary, and so the simulator can be given this value. A malicious adversary could ignore its input tape and use any other input (which makes proofs much more difficult!).

We say that a two party protocol $\pi$ securely computes functionality $f = (f_1, f_2)$ in the presence of static semi-honest adversaries if there exist probabilistic polynomial-time algorithms $\mathcal{S}_1, \mathcal{S}_2$ such that

  • $\left\{\left(\mathcal{S}_1(1^n, x, f_1(x, y)),\, f(x,y)\right)\right\}_{x,y,n} \overset{c}{\equiv} \left\{\left(view_1^{\pi}(x,y,n),\, output^{\pi}(x,y,n)\right)\right\}_{x,y,n}$
  • $\left\{\left(\mathcal{S}_2(1^n, y, f_2(x, y)),\, f(x,y)\right)\right\}_{x,y,n} \overset{c}{\equiv} \left\{\left(view_2^{\pi}(x,y,n),\, output^{\pi}(x,y,n)\right)\right\}_{x,y,n}$
  • where $x,y \in {0,1}^*$ such that $|x| = |y|$ and $n \in \mathbb{N}$.

Notice the simulator $\mathcal{S}_i$ doesn't just need to generate a string indistinguishable from the view of party $i$: the joint distribution of the simulated view and the functionality's output must be indistinguishable from the joint distribution of the real view and the real output. This is necessary for probabilistic functionalities; for deterministic functionalities, it suffices to handle the view alone. A toy example with both simulators is sketched below.
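
To ground the definition, consider a toy protocol (my own illustration, not from the text) computing $f(x,y) = x + y \bmod P$ for both parties: $P_1$ masks its input with randomness $r$, and each party's view can be simulated from its own input and output alone (the $1^n$ argument is dropped for brevity).

```python
import secrets

P = 2**61 - 1  # prime modulus (illustrative)

def real_views(x, y):
    """P1 -> P2: m1 = x + r;  P2 -> P1: m2 = m1 + y;
    P1 -> P2: s = m2 - r = x + y (the common output)."""
    r = secrets.randbelow(P)
    m1 = (x + r) % P
    m2 = (m1 + y) % P
    s = (m2 - r) % P
    view1 = (x, r, m2)   # P1's input, randomness, and received message
    view2 = (y, m1, s)   # P2's input and received messages
    return view1, view2, s

def S1(x, f1):
    """Simulate P1's view from (x, f1 = x + y): m2 is f1 remasked."""
    r = secrets.randbelow(P)
    return (x, r, (f1 + r) % P)

def S2(y, f2):
    """Simulate P2's view from (y, f2 = x + y): m1 = x + r is uniform
    and independent of everything else, so sample it fresh."""
    return (y, secrets.randbelow(P), f2)
```

Here the simulation is perfect rather than merely computationally indistinguishable, and since $f$ is deterministic, matching the view alone suffices, per the note above.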