Pixel-Perfect-Timing-Attacks-Review

Name: Samuel Tay
UW Net ID: tays
Paper: Pixel Perfect Timing Attacks with HTML5
Authors: Paul Stone

Paper Review

Problem: The problem presented in this paper is that the requestAnimationFrame browser API can be used to time rendering actions, leaking unintended information. This is demonstrated explicitly in two separate ways:
- leaking browser history by timing the styling of visited/unvisited links
- leaking arbitrary data (though mostly restricted in this paper to text content with known font face/sizes) from an embedded iframe.
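The timing primitive behind both leaks can be sketched roughly as follows. This is a minimal illustration, not the paper's code: `measureFrameGaps` and `makeFakeRaf` are hypothetical names, and the scheduler is passed in as a parameter so the sketch runs outside a browser. In a page, `raf` would be `window.requestAnimationFrame`, which hands each callback a timestamp in milliseconds.

```javascript
// Record the gaps between successive animation-frame callbacks.
// `raf` is injected (assumption: it behaves like requestAnimationFrame
// and calls back with a millisecond timestamp).
function measureFrameGaps(raf, frameCount, onDone) {
  const gaps = [];
  let previous = null;

  function onFrame(timestamp) {
    if (previous !== null) {
      // A long gap suggests the frame in between was expensive to render.
      gaps.push(timestamp - previous);
    }
    previous = timestamp;
    if (gaps.length < frameCount) {
      raf(onFrame);
    } else {
      onDone(gaps);
    }
  }

  raf(onFrame);
}

// Hypothetical stand-in for the browser scheduler: replays fixed timestamps.
function makeFakeRaf(timestamps) {
  let i = 0;
  return (callback) => callback(timestamps[i++]);
}

let recorded = null;
measureFrameGaps(makeFakeRaf([0, 16, 32, 112, 128]), 4, (gaps) => {
  recorded = gaps;
});
console.log(recorded); // [ 16, 16, 80, 16 ]
```

In the replayed timestamps, the third gap (80 ms against roughly 16 ms elsewhere) is the kind of outlier the attacks look for.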
Approach: The paper first explains how requestAnimationFrame can be used to time rendering actions in a general sense.
- Next the paper walks through a browser history sniffing attack that works across Firefox, IE, and Chrome (albeit with minor differences in the approach for certain browsers). The idea is to apply an expensive text-shadow property to links, so that they are slow to render, and use the requestAnimationFrame API to time the rendering. In IE and Firefox, the asynchronous restyling of visited URLs will run and be detected after a few frames. In Chrome, the algorithm has an additional step that updates the hrefs after the page has already loaded. Because visited URL styling happens synchronously in Chrome, one can simply time the next frame after the link update and see if it took a while to render; if so, conclude that the expensive link styling was redrawn and that the URL has been visited.
- As a segue to the final attack, sniffing history via SVG filters is explained. This involves a few moving pieces. First, style visited links as a white square and unvisited links as a black square:
```
  a {
    background-color: black;
  }
  a:visited {
    background-color: white;
  }
```
Then two SVG filters turn the white squares into noisy squares, and a final morphology filter is applied that takes longer to render on the noisy squares than on the solid black ones. The same frame-timing technique as in the first attack then distinguishes visited from unvisited links.
- Finally, a more flexible attack is explained that reads arbitrary pixels from an iframe. This is done by using -moz-element to mirror the iframe into a separate div, and then using CSS properties on that mirror to scale up individual pixels into larger squares, which can then be timed in the same way as the black and white squares above. In particular, when using the view-source feature in Firefox, the source code of the website is loaded into the iframe with a known font face and size, which makes it fairly easy to turn the recovered pixels back into text. And if the iframe loads a site like mail.google.com, which the user is likely to be logged into, the attacker can obtain sensitive information such as email addresses or login usernames.
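The decision step in the Chrome variant described above (time the first frame after the href change and check whether it was slow) might look something like this. It is a sketch under assumptions: `looksVisited` is a hypothetical helper, the baseline is assumed to be frame gaps recorded before the href change, and the `slowdownFactor` of 3 is an illustrative tuning value rather than a number from the paper.

```javascript
// Median of a list of numbers (robust to an occasional janky frame).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// baselineGaps: frame gaps (ms) recorded before the href change.
// probeGap: gap (ms) of the first frame after the href change.
// slowdownFactor: assumed tuning knob, not taken from the paper.
function looksVisited(baselineGaps, probeGap, slowdownFactor = 3) {
  return probeGap > slowdownFactor * median(baselineGaps);
}

console.log(looksVisited([16, 17, 16, 18], 95)); // true
console.log(looksVisited([16, 17, 16, 18], 20)); // false
```

Using the median rather than the mean keeps a single slow baseline frame from inflating the threshold.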
Conclusions: The conclusions are that these attacks are quite feasible in a controlled environment, but tricky, slow, and not totally reliable in the wild (at least not circa 2013, when the paper was written). Some context is given on the difficulty faced by browser vendors, and on how often performance and privacy are fundamentally at odds. Finally, advice is offered for both website owners and end users on how best to protect against the attacks described.
New ideas:
- Perhaps we can assume that using requestAnimationFrame for timing is a novel idea at the time of writing.
- The idea to use view-source within the iframe makes this attack awfully plausible.
Improvements:
- The stated attack requires a calibration on known visited links; however, I feel like a basic statistical measurement could be taken during the algorithm which would make the attack more feasible.
- I think the author undercuts the severity of the findings by describing them as too slow to be realistic. While the attack is quite slow, if it were injected into a website where you might spend a while (reading an article, watching a video, etc.), that website might indeed have a full 60s in which to run the algorithm. The same holds if you simply leave a tab open, which everyone does.
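As one possible reading of the "basic statistical measurement" suggested above, the probe timings themselves could be split into a fast and a slow group without a separate calibration pass. The sketch below uses a widest-gap split on sorted timings. This is my own stand-in technique, not something from the paper, and `splitAtWidestGap` and `classifyTimings` are hypothetical names.

```javascript
// Find a threshold halfway across the widest gap between neighbouring
// sorted timings. Assumes both classes (fast and slow) are present.
function splitAtWidestGap(timings) {
  if (timings.length < 2) {
    throw new Error("need at least two timings");
  }
  const sorted = [...timings].sort((a, b) => a - b);
  let bestIndex = 1;
  let bestGap = -Infinity;
  for (let i = 1; i < sorted.length; i++) {
    const gap = sorted[i] - sorted[i - 1];
    if (gap > bestGap) {
      bestGap = gap;
      bestIndex = i;
    }
  }
  const threshold = (sorted[bestIndex - 1] + sorted[bestIndex]) / 2;
  return { threshold, gap: bestGap };
}

// true = slow frame (candidate "visited"), false = fast frame.
function classifyTimings(timings) {
  const { threshold } = splitAtWidestGap(timings);
  return timings.map((t) => t > threshold);
}

const sample = [17, 16, 92, 18, 88, 16];
console.log(splitAtWidestGap(sample).threshold); // 53
console.log(classifyTimings(sample)); // [ false, false, true, false, true, false ]
```

The obvious weakness is that the split is meaningless when every probed URL falls in the same class, so a real attack would still want a sanity check on the size of the gap.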
New directions:
- The paper talks about needing to know the font family and size. That may be true if the entire attack runs within the script in the browser, but a far more flexible and lethal attack would be to simply scrape the raw pixel data and send it to an ML algorithm offline, which could quite easily extract the text from arbitrary pixelated fonts. This would be an interesting extension of the ideas in the paper.
- Another new direction or extension would be a more sophisticated encoding of pixels. The current attack is limited to white/black pixels, but a cleverer SVG encoding, perhaps one that sends N ranges of RGB values into N images with distinct noise levels, could obtain a lossy but richer encoding of arbitrary web content.
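To make the N-range idea a little more concrete, here is a small offline sketch of the quantization it implies, applied to known grey levels rather than to pixels recovered through timing. `bandOf` and `toBandMasks` are hypothetical names, and equal-width bands over 0-255 are an assumption; the SVG filter side of such an encoding is not attempted here.

```javascript
// Map an 8-bit channel value (0-255) to one of `bands` equal-width ranges.
function bandOf(value, bands) {
  return Math.min(bands - 1, Math.floor((value * bands) / 256));
}

// Turn a flat array of grey levels into `bands` binary masks; mask k holds
// a 1 wherever the pixel falls in band k. In the proposed extension, each
// mask would correspond to one image with its own noise level.
function toBandMasks(pixels, bands) {
  const masks = Array.from({ length: bands }, () =>
    new Array(pixels.length).fill(0)
  );
  pixels.forEach((value, i) => {
    masks[bandOf(value, bands)][i] = 1;
  });
  return masks;
}

const masks = toBandMasks([0, 100, 200, 255], 4);
console.log(masks);
// [ [ 1, 0, 0, 0 ], [ 0, 1, 0, 0 ], [ 0, 0, 0, 0 ], [ 0, 0, 1, 1 ] ]
```

With N = 2 this reduces to the black/white case the paper already handles; larger N trades more timing passes per pixel for finer tonal detail.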