Security Proofs
Defining MPC
This is a higher-level introduction to security in the context of MPC. Many of the notions are re-defined more precisely in the Proof Technique section. Note that this material comes from the Secure Computation book, all chapters of which are available for free. In particular, the Oblivious Data Structures and Alternative Threat Models chapters may be worth exploring.
Preliminaries
- Assume secure channels among parties.
- Denote:
- $\kappa$: computational security param
- Governs hardness of problems that can be broken by the adversary's offline computation
- Typically 128 or 256
- $\sigma$: statistical security param
- Typically 40 or 80
- Interpretation: security is violated with probability at most $2^{-\sigma} + \mu(\kappa)$, where $\mu$ is a negligible function depending on the resources of the adversary.
- See below for the computationally indistinguishable definition. If we relax the distinguisher $\mathcal{D}$ to be any algorithm, regardless of its computational complexity, this becomes statistical indistinguishability.
- In this case, the probability of distinguishing is bounded by the statistical distance: $\Delta(X, Y) = \frac{1}{2}\sum_{z}\left|\Pr[X = z] - \Pr[Y = z]\right|$
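The statistical distance (half the L1 distance between the two distributions) can be computed directly for small, explicitly tabulated distributions. A minimal sketch, using made-up coin distributions as the example:

```python
from fractions import Fraction

def statistical_distance(p: dict, q: dict) -> Fraction:
    """Delta(X, Y) = 1/2 * sum over z of |Pr[X = z] - Pr[Y = z]|,
    taken over the union of the two supports."""
    support = set(p) | set(q)
    return sum((abs(p.get(z, Fraction(0)) - q.get(z, Fraction(0)))
                for z in support), Fraction(0)) / 2

# A fair coin vs. a coin biased by 1/100 are at distance exactly 1/100.
fair = {0: Fraction(1, 2), 1: Fraction(1, 2)}
biased = {0: Fraction(51, 100), 1: Fraction(49, 100)}
assert statistical_distance(fair, biased) == Fraction(1, 100)
```

No unbounded algorithm can distinguish these two coins with advantage better than 1/100 per sample, which is exactly what the bound above says.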
Primitives
- Secret Sharing: a dealer splits a secret into shares such that authorized subsets of shares reconstruct the secret, while unauthorized subsets reveal nothing about it
- Random Oracle: a heuristic model for the security of hash functions
- In this model, all parties have access to a public random function $H$
- On input $x$,
- if $H$ has never seen $x$, it picks a random output $r$, remembers the pair $(x, r)$, and returns $r$.
- if $H$ has seen $x$, it returns the remembered $r$.
- This model only captures attacks that treat the hash function as a black box.
- You can indeed construct (contrived) schemes that are secure in the random oracle model, but insecure whenever $H$ is instantiated by any concrete function.
- Despite this, the RO model is often acceptable for practical applications, and can lead to more efficient constructions.
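To make both primitives concrete, here is a minimal sketch: n-out-of-n XOR secret sharing (the simplest possible scheme, where any n−1 shares are jointly uniform) and the lazily-sampled random oracle described above. The names are my own, not from the book:

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def share(secret: bytes, n: int) -> list:
    """n-out-of-n XOR sharing: any n-1 shares are uniformly random and so
    reveal nothing; XORing all n shares recovers the secret."""
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    last = secret
    for s in shares:
        last = xor_bytes(last, s)
    return shares + [last]

def reconstruct(shares: list) -> bytes:
    out = bytes(len(shares[0]))
    for s in shares:
        out = xor_bytes(out, s)
    return out

class RandomOracle:
    """Lazily sampled random function H, exactly as described above."""
    def __init__(self, out_len: int = 32):
        self.table = {}
        self.out_len = out_len
    def query(self, x: bytes) -> bytes:
        if x not in self.table:              # fresh input: sample and remember
            self.table[x] = os.urandom(self.out_len)
        return self.table[x]                 # repeated input: same answer

secret = b"attack at dawn"
assert reconstruct(share(secret, 3)) == secret
H = RandomOracle()
assert H.query(b"x") == H.query(b"x")
```

The `table` dict is what makes the oracle a *function*: consistency on repeated queries is the only structure an attacker can exploit in this model.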
Real/Ideal Paradigm
- In the ideal world, parties securely compute the function by privately sending their inputs to a completely trusted party $\mathcal{T}$, referred to as the functionality $\mathcal{F}$.
- Each party $P_i$ has an associated input $x_i$, which it sends to $\mathcal{T}$, who computes $f(x_1, \ldots, x_n)$ and returns the result to all parties.
- Consider an adversarial attack:
- It can take over any $P_i$, but not the trusted party $\mathcal{T}$.
- $f(x_1, \ldots, x_n)$ is the only message received, so the adversary learns no more than this.
- In the real world, there is no trusted party; rather, parties communicate using a protocol $\pi$.
- The protocol specifies a "next message" function for each party
- inputs: security param, the party's private input $x_i$, a random tape, and all messages received so far
- outputs: either the next message to send and its destination, or an instruction to terminate with some output
- Adversaries can corrupt parties.
- Corruption at the start of the protocol is equivalent to an adversarial party.
- Corrupt parties might follow the protocol or deviate from it - depends on the threat model.
- $\pi$ is secure if the effect an adversary can achieve in the real world can also be achieved by a corresponding adversary in the ideal world.
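The ideal world is deliberately trivial to write down, which is the whole point of the paradigm. A minimal sketch (names are illustrative, and the functionality here hands one common output to every party):

```python
# Ideal world: every party privately hands its input to a trusted party T,
# which applies f and returns only the result. That single value is the only
# "message" any party -- or any adversary corrupting parties -- ever sees.
def ideal_world(f, inputs):
    y = f(*inputs)                # T computes f(x_1, ..., x_n)
    return [y] * len(inputs)      # each party receives exactly f(x_1, ..., x_n)

# e.g. three parties jointly computing a sum without revealing their inputs
assert ideal_world(lambda a, b, c: a + b + c, [1, 2, 3]) == [6, 6, 6]
```

Everything hard about MPC lives in the real world: building a protocol whose adversarial "effects" can be replicated inside this trivially secure picture.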
Semi-Honest Security
A semi-honest (a.k.a. honest-but-curious) adversary is one who corrupts parties but follows the protocol as specified; it just tries to learn as much as possible from the protocol messages it receives.
- This may involve several colluding parties pooling their views together.
- Also considered passive, since it can't take any actions other than running the standard protocol.
- An "attack" can simply be considered as the adversary outputting its entire view.
Formally:
- Let $\pi$ be a protocol, $\mathcal{F}$ be a functionality, $C$ be the set of corrupted parties, and $\mathcal{S}$ a simulator algorithm.
- Define the distribution $\mathrm{Real}_{\pi}(\kappa, C; x_1, \ldots, x_n)$ by:
- running $\pi$ with security param $\kappa$, where each party $P_i$ runs honestly with input $x_i$.
- denote the final view and output of each party $P_i$ as $V_i$ and $y_i$ respectively.
- output: $\{V_i \mid i \in C\}$ and $(y_1, \ldots, y_n)$.
- Define the distribution $\mathrm{Ideal}_{\mathcal{F},\mathcal{S}}(\kappa, C; x_1, \ldots, x_n)$ by:
- compute $(y_1, \ldots, y_n) \leftarrow \mathcal{F}(x_1, \ldots, x_n)$.
- output: $\mathcal{S}(C, \{(x_i, y_i) \mid i \in C\})$ and $(y_1, \ldots, y_n)$.
- We say a protocol $\pi$ securely realizes $\mathcal{F}$ in the presence of semi-honest adversaries if there exists a simulator $\mathcal{S}$ such that, for every subset of corrupt parties $C$ and all inputs $x_1, \ldots, x_n$, the above distributions are indistinguishable in $\kappa$.
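As a toy illustration of the definition, consider a two-party protocol for f(x, y) = x + y mod P in which P1 masks its input with fresh randomness. The only message a corrupt P2 receives is uniformly random, so a simulator can reproduce P2's view from P2's input and output alone. All names and the modulus are illustrative, not from the book:

```python
import secrets

P = 2**61 - 1  # public prime modulus (illustrative choice)

def protocol(x, y):
    """Toy protocol for f(x, y) = x + y mod P. P1 masks x with fresh
    randomness r, P2 adds y, P1 unmasks and announces the result.
    Returns (output, corrupt P2's view)."""
    r = secrets.randbelow(P)
    m1 = (x + r) % P            # P1 -> P2: uniform, independent of x
    m2 = (m1 + y) % P           # P2 -> P1
    out = (m2 - r) % P          # P1 unmasks; both parties learn out
    view2 = (y, m1, out)        # everything corrupt P2 sees
    return out, view2

def sim2(y, out):
    """Simulator for corrupt P2: given only P2's input and output, it samples
    a fresh uniform value in place of m1 -- identically distributed to the
    real message, since m1 = x + r mod P is uniform for uniform r."""
    return (y, secrets.randbelow(P), out)
```

Here the real and simulated views are not merely computationally indistinguishable but identically distributed, which is the strongest possible instance of the definition.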
Malicious Security
A malicious (a.k.a. active) adversary can deviate arbitrarily from the protocol. Malicious inputs are not well defined in the ideal world, so the simulator extracts an effective ideal-world input from the real-world adversary that "explains" the adversary's real-world effect. In many constructions, this extraction only requires black-box oracle access to the real-world adversary.
Formally:
- Let $\mathcal{A}$ be a real-world adversary, $C$ denote the set of corrupted parties, and $C'$ denote the set of parties corrupted by the ideal-world adversary $\mathcal{S}$.
- Define distribution $\mathrm{Real}_{\pi,\mathcal{A}}(\kappa; \{x_i\}_{i \notin C})$ by:
- run $\pi$ with security param $\kappa$, where each honest party $P_i$ uses their private input $x_i$ but corrupt parties use whatever inputs $\mathcal{A}$ chooses.
- let $y_i$ denote the output of honest party $P_i$, and $V_i$ be the view of party $P_i$.
- output: $\{V_i \mid i \in C\}$ and $\{y_i \mid i \notin C\}$.
- Define distribution $\mathrm{Ideal}_{\mathcal{F},\mathcal{S}}(\kappa; \{x_i\}_{i \notin C'})$ by:
- run $\mathcal{S}$ until it outputs a set of inputs $\{x_i \mid i \in C'\}$.
- compute $(y_1, \ldots, y_n) \leftarrow \mathcal{F}(x_1, \ldots, x_n)$.
- give $\{y_i \mid i \in C'\}$ to $\mathcal{S}$.
- let $V^*$ denote the final set of views output by $\mathcal{S}$
- output: $V^*$ and $\{y_i \mid i \notin C'\}$.
- We say a protocol $\pi$ securely realizes $\mathcal{F}$ in the presence of malicious adversaries if for every real-world adversary $\mathcal{A}$ there exists a simulator $\mathcal{S}$ with $C' = C$ such that, for all inputs of honest parties $\{x_i\}_{i \notin C}$, the above distributions are indistinguishable in $\kappa$.
- Functionalities can also be reactive, i.e. have their own internal state and interact with parties over multiple rounds.
- Not all functionalities can be computed with the implicit output fairness above. Results in the malicious setting often provide a weaker property of security with abort, wherein the functionality first delivers outputs to the corrupt parties, who can then instruct the functionality to continue delivering output to honest parties, or send them an abort instruction.
- The above definitions assume static corruption, that is, the set of corrupted parties is fixed before the protocol begins. We may also need to consider adaptive corruption, wherein an adversary can choose which parties to corrupt during protocol execution.
- Here, we must adapt the adversary in our paradigm to have a "corrupt $P_i$" command.
- In the real world, this means it learns the current view and private randomness of $P_i$, and subsequently takes over its messages.
- Simulating this in the ideal world is rather challenging, and most research assumes static corruption (CITATION NEEDED).
Composition
- Composing secure protocols does not, in general, guarantee security. The universal composability (UC) framework does provide this guarantee.
- In UC, the above definitions of real/ideal world are both augmented with an environment that provides context to the protocol. The environment's goal is to learn whether it is in the ideal or real world, and the protocol is considered UC-secure if the environment has no non-negligible advantage in guessing correctly.
- However, the weaker notion of sequential composition (running protocols one after another, rather than concurrently) often does preserve security.
Simulation Proof Technique
Preliminaries
Things are defined in these weird, ass-backward ways because, as it happens, it is incredibly awkward and difficult to mathematically define "the adversary learns nothing" in a useful way.
Probability ensembles $X = \{X(a, n)\}_{a \in \{0,1\}^*; n \in \mathbb{N}}$ and $Y = \{Y(a, n)\}_{a \in \{0,1\}^*; n \in \mathbb{N}}$ are computationally indistinguishable, denoted $X \overset{c}{\equiv} Y$, if:
- For every non-uniform polynomial-time algorithm $D$,
- there exists a negligible function $\mu(\cdot)$
- such that for all $a$ and all $n$, $$\left|\Pr[D(X(a,n))=1] - \Pr[D(Y(a,n))=1]\right| \leq \mu(n)$$
- Notice that the negligible function $\mu$ must be valid for all $a$.
- In the negation, when $X$ and $Y$ are distinguishable, there can be a different $a$ for each $n$.
Semantic Security
A private-key encryption scheme $(G, E, D)$ is semantically secure if:
- For every non-uniform probabilistic-polynomial-time algorithm $\mathcal{A}$,
- there exists a non-uniform probabilistic-polynomial-time algorithm $\mathcal{A}'$,
- such that for every probability ensemble $\{X_n\}_{n \in \mathbb{N}}$ with $|X_n| \leq \mathrm{poly}(n)$,
- every pair of polynomially-bounded functions $f, h : \{0,1\}^* \to \{0,1\}^*$,
- every positive polynomial $p$,
- and all sufficiently large $n$: $$\begin{align*}
\underset{k\leftarrow G(1^n)}{\Pr}\left[\mathcal{A}(1^n,E_k(X_n),1^{|X_n|},h(1^n,X_n))=f(1^n,X_n)\right]&\\ <\Pr\left[\mathcal{A}'(1^n,1^{|X_n|},h(1^n,X_n))=f(1^n,X_n)\right]&+\frac{1}{p(n)}
\end{align*}$$
This actually isn't so bad. First, the notations are weird, but I assume:
- $G(1^n)$ is generating a key, polynomial in $n$, trivially passed $n$ ones for some reason. I think the reason is that the adversaries are given "tapes" and not integers, so "polynomial time" in the input length means polynomial in $n$. Something like that.
- $E_k(X_n)$ is of course the encryption of a random plaintext $X_n$ under key $k$
- $h(1^n, X_n)$ denotes auxiliary information that could be leaked about the plaintext
- $f(1^n, X_n)$ is what the adversary tries to learn, given the plaintext length, the ciphertext, and $h$.
- So basically, this says that $\mathcal{A}'$, which by definition doesn't even see the ciphertext, learns roughly the same amount as $\mathcal{A}$.
- We can view $\mathcal{A}'$ as being in an "ideal world" where the encryption is 100% secure.
The proof technique to show semantic security is called a simulation: $\mathcal{A}'$ simulates an execution of $\mathcal{A}$ and relays its output. That is, given its input $(1^n, 1^{|X_n|}, h(1^n, X_n))$:
- $\mathcal{A}'$ runs $G(1^n)$ to get a key $k$.
- computes $c = E_k(0^{|X_n|})$ as an encryption of zeroes, or "garbage"
- runs $\mathcal{A}(1^n, c, 1^{|X_n|}, h(1^n, X_n))$ and outputs whatever it gets.
- Notice that if the encryptions are indistinguishable, then $\mathcal{A}$ should output $f(1^n, X_n)$ with approximately the same probability whether given $E_k(X_n)$ or $E_k(0^{|X_n|})$. So, these proofs proceed by showing that any non-negligible difference in the above probabilities can be converted into a distinguisher of, e.g., the encryption method vs. random.
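A toy instantiation of this simulation, using a one-time pad so that the "garbage" ciphertext is not merely indistinguishable from a real one but identically distributed. All names are illustrative; `adversary` stands in for any callable with $\mathcal{A}$'s interface:

```python
import os

def keygen(n: int) -> bytes:
    """G(1^n): a fresh one-time-pad key (illustrative scheme choice)."""
    return os.urandom(n)

def encrypt(k: bytes, m: bytes) -> bytes:
    """E_k(m): XOR with the key; ciphertexts are uniform for any fixed m."""
    return bytes(a ^ b for a, b in zip(k, m))

def a_prime(adversary, n: int, msg_len: int, aux):
    """A': never sees the real ciphertext. It encrypts zeroes ("garbage")
    under a fresh key, feeds that to A, and relays A's guess."""
    k = keygen(msg_len)
    fake_c = encrypt(k, bytes(msg_len))   # E_k(0^{|X_n|})
    return adversary(n, fake_c, msg_len, aux)

# With a one-time pad the fake ciphertext is *identically* distributed to a
# real one, so A' predicts f(X_n) with exactly the same probability as A.
guess = a_prime(lambda n, c, l, aux: c[0] & 1, 16, 8, None)
assert guess in (0, 1)
```

For a computationally secure scheme the two ciphertext distributions only agree up to a negligible gap, which is where the distinguisher-reduction step of the proof comes in.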
Secure Computation - Simulation for Semi-Honest Adversaries
Consider a model of two-party computation in the presence of static semi-honest adversaries. Such an adversary:
- controls one of the parties
- is static, so its corruption is fixed at the onset of the computation
- follows the protocol specification exactly
- but tries to learn more information by viewing the transcript and its own internal state.
- this is a very weak security model; it just provides a guarantee of no inadvertent leakage,
- so this should only be used when parties largely trust each other, but want to make sure that no record of their input is found elsewhere
We model the two-party protocol as a possibly random process $f = (f_1, f_2)$ called a functionality (one tuple element for each party). For every pair of inputs $(x, y)$, the output-pair $(f_1(x,y), f_2(x,y))$ is a random variable ranging over pairs of strings. The $i$th party wishes to obtain $f_i(x, y)$.
We proceed with a simulation proof that demonstrates that whatever a party can compute, it could compute from its input and output alone. In our simulation, because we assume parties are semi-honest, their outputs are well defined; that is, the output does not depend on the adversary, and so the simulator can be given this value. A malicious adversary could ignore the input tape and use any other input (which makes proofs much more difficult!).
We say that a two-party protocol $\pi$ securely computes a functionality $f = (f_1, f_2)$ in the presence of static semi-honest adversaries if there exist probabilistic polynomial-time algorithms $\mathcal{S}_1, \mathcal{S}_2$ such that
- $\left\{\left(\mathcal{S}_1(1^n, x, f_1(x, y)), f(x,y)\right)\right\}_{x,y,n} \overset{c}{\equiv} \left\{\left(\mathrm{view}_1^{\pi}(x,y,n), \mathrm{output}^{\pi}(x,y,n)\right)\right\}_{x,y,n}$
- $\left\{\left(\mathcal{S}_2(1^n, y, f_2(x, y)), f(x,y)\right)\right\}_{x,y,n} \overset{c}{\equiv} \left\{\left(\mathrm{view}_2^{\pi}(x,y,n), \mathrm{output}^{\pi}(x,y,n)\right)\right\}_{x,y,n}$
- where $x, y \in \{0,1\}^*$ such that $|x| = |y|$, and $n \in \mathbb{N}$.
Notice the simulator doesn't just need to generate a string indistinguishable from the view of party $i$; rather, the *joint* distribution of the simulated view and the functionality output must be indistinguishable from the joint distribution of the real view and the protocol output. This is necessary for probabilistic functionalities. For deterministic functionalities, it suffices to just handle the former (alongside showing the protocol computes the correct output).
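A toy illustration of why the joint distribution matters: take a coin-tossing functionality that hands both parties the same random bit. A simulator that outputs a fresh, independent coin produces a perfectly distributed view in isolation, yet is trivially distinguishable once paired with the functionality's output. Names are illustrative:

```python
import secrets

def coin_toss_view():
    """Real execution of a coin-tossing functionality: both parties receive
    the same random bit, so P1's view is perfectly correlated with the
    output. Returns (P1's view, functionality output)."""
    b = secrets.randbelow(2)
    view1 = b
    return view1, b

def bad_sim():
    """A 'simulator' whose view is a perfect uniform bit on its own, but
    sampled independently of the functionality's output."""
    return secrets.randbelow(2), secrets.randbelow(2)

# Marginally, both produce uniform views; jointly, the real pair always
# agrees, while the bad simulator's pair disagrees half the time -- an easy
# distinguisher: output 1 iff view == output.
assert all(v == out for v, out in (coin_toss_view() for _ in range(100)))
```

For a deterministic functionality this gap can't arise: the output is a fixed function of the inputs, so view-indistinguishability plus correctness already pins down the joint distribution.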