Information risks a-gurgling

There are clearly substantial information risks associated with the redaction of sensitive elements from disclosed reports and other formats, risks that the controls don't necessarily fully mitigate.

Yes, controls are fallible and constrained, leaving residual risks. This is hardly Earth-shattering news to any competent professional or enlightened infidel, and yet others are frequently shocked.

A new report* from a research team at the University of Illinois looks specifically at failures in the redaction processes and tools applied to PDF documents. When redacted text is covered or replaced by a variable-length black rectangle, the physical size of that rectangle may give clues to the original content. Meanwhile, a disappointing number of redaction attempts have historically failed to prevent the original information being recovered simply by removing the cover images, or by selecting and pasting the underlying text. Doh!

If the original text uses monospaced fonts and left or right paragraph justification, the number of redacted characters is trivial to determine. Proportional fonts and full justification make the information recovery process a little more involved and less accurate but certainly not impossible.
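To illustrate the monospaced case, here is a minimal Python sketch. It assumes the redaction box hugs the hidden text exactly and that the advance width of one character is known. In the standard Courier font every glyph is 600/1000 of an em wide, which gives 7.2 points per character at 12 point. The function names are hypothetical.

```python
def courier_advance_pt(font_size_pt: float) -> float:
    """Advance width of one Courier glyph: 600/1000 em at the given size."""
    return font_size_pt * 600 / 1000


def estimate_char_count(box_width_pt: float, advance_pt: float) -> int:
    """Estimate how many monospaced characters (spaces included) fit under a box.

    Assumes the box width equals the width of the hidden text, i.e. no padding.
    """
    if advance_pt <= 0:
        raise ValueError("advance width must be positive")
    # Every character has the same width, so the count is a simple division.
    return round(box_width_pt / advance_pt)
```

With a 72-point-wide box over 12 point Courier, `estimate_char_count(72.0, courier_advance_pt(12.0))` gives 10 characters. Any padding added around the box would need to be subtracted first.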

On top of that, there may be many other clues to the redacted content, either within the document (e.g. the context and metadata, plus unredacted copies) or from other sources (e.g. correlated information). Even wild guesses narrow down the possibilities compared to an effectively redacted document. If you work in, say, countersurveillance or counterterrorism and are actively hunting for moles, spies or whistleblowers in the camp, you may well start with a shortlist of named suspects. That significantly increases the probability of correctly guessing their names from the character counts if they appear in ineptly-redacted disclosures.
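The shortlist trick with a proportional font might look something like the Python sketch below: work out how wide each suspect's name would be when typeset, then keep only the names within a tolerance of the measured box. The width table is made up for illustration and is not taken from any real font. A real attempt would need the document's actual font metrics, and would also have to allow for kerning and the extra spacing added by full justification, both of which this sketch ignores.

```python
# HYPOTHETICAL advance widths in 1/1000 em units, for illustration only.
# A real analysis would use the metrics of the font actually in the document.
HYPOTHETICAL_WIDTHS = {" ": 250, "i": 250, "l": 250, "m": 800, "w": 750}
DEFAULT_WIDTH = 500  # assumed width for any character not listed above


def text_width_pt(text: str, font_size_pt: float) -> float:
    """Sum the per-character widths and scale from em units to points."""
    units = sum(HYPOTHETICAL_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text)
    return units * font_size_pt / 1000


def plausible_candidates(box_width_pt: float, font_size_pt: float,
                         candidates: list[str],
                         tolerance_pt: float = 1.0) -> list[str]:
    """Keep the candidates whose typeset width is close to the redaction box."""
    return [name for name in candidates
            if abs(text_width_pt(name, font_size_pt) - box_width_pt) <= tolerance_pt]


suspects = ["Bill", "Anna", "Tom", "Jim"]
print(plausible_candidates(16.0, 10.0, suspects))
```

Under these made-up widths, a 16-point box at 10 point size leaves only "Jim" (15.5 points wide) on the list. The shorter the shortlist and the tighter the measurement, the less the redaction protects.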

Furthermore, there are information risks associated with the redaction and publication process as a whole, including governance and transparency issues. These risks and the possible controls are outside the scope of the research paper, and are largely ignored by redaction standards and advisories such as ISO/IEC 27038, NSA Report # I333-015R-2005 and assorted guidance from the US courts**.

As if that's not enough already, most of the advice in this area concerns redaction of documents produced by office software: redaction of audio and video streams, or databases, or statistical data sets, or biometrics, faces some shared and some novel risks requiring still further controls.

Hinson tip: whispering top secret spy matters in a hotel bathroom with the shower running, or in a tawdry bedsit with the radio turned up loud, is not a terribly effective form of audio redaction. As to spooks using bugs, or laser/microwave monitoring of sound waves on reflective surfaces, or lip-reading through binoculars or drones, well if that kind of stuff truly matters to you, I sincerely hope you know a lot more than me about it!

The assurance aspect intrigues me. Particularly if the risks - the combination of threats, vulnerabilities and impacts, remember - are significant (such as when disclosing reports that name victims, suspects, informants or spies), it makes sense to invest effort in detecting as well as avoiding or preventing redaction failures, yet I am not presently aware of any readily-available standards, advice, methods or tools in this area. Of course those too would have information risks (such as access to the redacted documents submitted for testing, plus procedural errors, design flaws, bugs And All That) as information risk spirals away down the plughole.
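As a taste of what a basic detective check might involve, here is a hedged Python sketch that searches the text layer of a supposedly redacted document for terms that should have been removed. That would catch the select-and-paste failure described earlier. It assumes the text has already been extracted by some PDF tool (not shown), and the function name is hypothetical. A clean result proves little: this check won't spot width clues, metadata, images of text, or words hyphenated across lines.

```python
import re


def find_leaks(extracted_text: str, sensitive_terms: list[str]) -> list[str]:
    """Return the sensitive terms that still appear in a document's text layer.

    Matching is case-insensitive and treats any run of whitespace (spaces,
    tabs, line breaks) as a single space, so a name split over two lines
    is still found.
    """
    normalised = re.sub(r"\s+", " ", extracted_text).casefold()
    leaks = []
    for term in sensitive_terms:
        needle = re.sub(r"\s+", " ", term).casefold()
        if needle and needle in normalised:
            leaks.append(term)
    return leaks


text_layer = "Meeting with  JOHN\nSmith at the safe house."
print(find_leaks(text_layer, ["John Smith", "Operation X", "safe house"]))
```

Here the "blacked out" name and location both survive in the text layer, so the redaction has failed regardless of how convincing the black boxes look on screen.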

Maybe it's time to accept that redaction, as currently recommended and practised, is a rotten, unreliable, fragile control. There's a lot to be said for preparing, reviewing and releasing carefully and competently crafted summaries or précis of important content rather than attempting to redact the originals. Or, of course, simply refusing outright to divulge the information at all, or denying its very existence, Ultra-style (provided those information risks are appropriately managed: plughole, here we come again).

* Thanks to someone's heads-up on LinkedIn and a piece in Wired for pointing it out.

** By the way, US District Court of the Southern District of Alabama, physically snipping out and shredding pieces of paper containing redacted print may be close to but is not 100% effective. See the 2nd paragraph of this very piece. Security is asymptotic.