Timing-Attacks-Reading
Timing Attacks on Diffie Helman, RSA, et. al.
This paper presents a powerful timing attack that can recover an entire secret key from a cryptosystem. The submitted review is here.
The attack
First, a demonstration of the attack is shown against a simple modular exponentiation algorithm which computes $R = y^x \mod n$. Here, $x$ is the secret key and is $w$ bits long. The algorithm only computes the multiplication $R_b = s_b \cdot y) \mod n$ if the $b$th bit of $x$ is 1. The attack is inductive; at each step, we use the knowledge of the first $b$ bits of $x$ to find bit $b+1$. If we know that computing $R_b = (s_b \cdot y) \mod n$ is slow for certain $s_b$ and $y$ values, and we observe that total modular exponentiation is fast, then we can conclude bit $b$ is zero, and vice versa. One advantageous aspect of this algorithm is that wrong steps are pretty simple to detect statistically; once a wrong bit is guessed, future timings will have greater variances. Keeping track of guesses and their probabilities reduces the number of samples needed. Using this technique results in an attack where the number of samples needed is proportional to the size $w$ of the key.
This attack is then put to the test against RSAREF. The attack is slightly modified, because RSAREF precomputes $y^2$ and $y^3$ to process two exponent bits at a time, cutting the number of loop iterations in half. So the attack needs to work with four candidates at a time ($x_{b-1}x_b \in {0, 1}^2$), instead of two. However, using this attack, correct guesses at each exponent bit are correct with probability 0.885, and it is therefore extremely effective.
A discussion follows about Montgomery Multiplication, CRT, and DSS. In each of these, the author lays out a hand-wavy explanation of how the same style of attack may be theoretically possible, albiet more difficult, in these scenarios.
The defense
The author explains that the initial obvious solutions, e.g. adding random delays or always computing the multiplication steps instead of branching, don't actually work. Random delays just requires a larger sample size. Always computing the $(s_b \cdot y) \mod n$ calculation doesn't solve the problem either, since the next squaring step still depends on $x_b$, and is therefore still subject to timing attacks.
A better solution is to use blinding. Essentially, a secret masking pair ($v_i, v_f$) is chosen based on $x$ and $n$. The input message gets multiplied by $v_i \mod n$ before going through the modular exponentiation function, and then it is corrected afterwards by multiplying the output by $v_f \mod n$. This way, the attacker doesn't know the input to the modular exponentiation function, so the attack above is not possible.