This page contains a brief description of Vanish's basic concepts and high-level architecture. We defer to our paper for a more detailed explanation of how Vanish works. We focus the discussion primarily on Web usages of Vanish, although Vanish can be used more broadly than that.
A VDO is the object that results from encapsulating some clear-text data using Vanish. It contains an encrypted version of your data and some metadata necessary for Vanish at decryption. The VDO is what you should upload to Web sites.
A VDO looks just like data — albeit scrambled data — and can be included in virtually any object that is rendered by the browser: e.g., Facebook messages, Google Docs, Gmail or Yahoo! Mail emails, blog posts, etc. The screenshots below show how VDOs can be incorporated into a Gmail email, a Google Doc, and a Facebook message:
[Screenshots: VDOs on Gmail | VDOs on Google Docs | VDOs on Facebook]
For those of you familiar with PGP or GPG, a VDO looks similar to a PGP-encrypted message, as illustrated below:
-----BEGIN VANISH MESSAGE-----
This message will self-destruct by 04:14 on 07/05/09.
Use http://vanish.cs.washington.edu to read this message.
AKztAAVzcgBGZWR1Lndhc2hpbmd0b24uY3MudmFuaXNoLmludGVybmFsLm1ldGFkYXRhLmltcGwuRXBvY2hBd2FyZU1ldGFkYXRhSW1wbE1yi
FVDGn2bAgACSgAMZXBvY2hfbGVuZ3RoTAAIbWV0YWRhdGF0ADVMZWR1L3dhc2hpbmd0b24vY3MvdmFuaXNoL2ludGVybmFsL21ldGFkYXRhL0
1ldGFkYXRhO3hwAAAAAAG3dABzcgBHZWR1Lndhc2hpbmd0b24uY3MudmFuaXNoLmludGVybmFsLm1ldGFkYXRhLmltcGwuSW5kaXJlY3RLZXl
NZXRhZGF0YUltcGw6bcmI6fsf7QIAAlsAEmVuY3J5cHRlZF9kYXRhX2tleXQAAltCTAAIbWV0YWRhdGFxAH4AAXhwcHNyAEFlZHUud2FzaGlu
Z3Rvbi5jcy52YW5pc2guaW50ZXJuYWwubWV0YWRhdGEuaW1wbC5CYXNpY01ldGFkYXRhSW1wbNgVQUjt/E3XAgACSgANbG9jYXRpb25fc2VlZ
EwABnBhcmFtc3QANkxlZHUvd2FzaGluZ3Rvbi9jcy92YW5pc2gvaW50ZXJuYWwvbWV0YWRhdGEvVkRPUGFyYW1zO3hwI0GX1yE7og9zcgA0ZW
R1Lndhc2hpbmd0b24uY3MudmFuaXNoLmludGVybmFsLm1ldGFkYXRhLlZET1BhcmFtc7292Mmleh6MAgAISgALY3JlYXRpb25fdHNJABVlbmN
yeXB0aW9uX2tleV9sZW5ndGhJAApudW1fc2hhcmVzSQAJdGhyZXNob2xkSQAJdGltZW91dF9oSgAGdmRvX2lkTAAUZW5jcnlwdGlvbl9hbGdv
cml0aG10ABJMamF2YS9sYW5nL1N0cmluZztMAA9lbmNyeXB0aW9uX21vZGVxAH4ACnhwAAABIkcjCoUAAACAAAAACgAAAAcAAAAITEjPY9yDp
sh0AANBRVN0AANDQkPaxwpTkdhvG0nYDtLWr2PF
-----END VANISH MESSAGE-----

(This VDO has long expired, so any attempt to decapsulate it will fail.)
However, VDOs and PGP messages differ in the threats they counter. A VDO ensures that the encapsulated data becomes permanently unavailable to everyone at some point, even to those who can obtain the user's decryption keys or passphrases. During the lifetime of a VDO, though, anyone with access to the VDO can read the clear-text data encapsulated in it. In contrast, PGP denies access to adversaries who are assumed to be incapable of obtaining the user's passphrases, and it maintains this protection for all time. Thus, VDOs protect against extremely powerful retroactive attacks (such as a subpoena issued during a child dispute for an email that was sent a year ago), while PGP protects against less sophisticated attackers, such as communication eavesdroppers or sniffers, but does so at all times. That said, under certain usage scenarios Vanish and PGP/GPG complement each other nicely and can protect you against both types of attacks.
Once a VDO's timeout has passed, the decryption key is lost, and both the metadata and the encrypted data inside the VDO become permanently worthless.
The figure to the right illustrates the high-level system architecture. At its core, Vanish takes a data object D (and possibly an explicit timeout T), and encapsulates it into a VDO V.
In more detail, to encapsulate the data D, Vanish picks a random data key, K, and encrypts D with K to obtain a ciphertext C. Vanish then uses threshold secret sharing to split the data key K into N pieces (shares) K1, ..., KN. One parameter of the secret sharing is a threshold, which can be set by the user or by an application using Vanish. The threshold determines how many of the N shares are required to reconstruct the original key. For example, if we split the key into N=20 shares and set the threshold to 10, then we can compute the key given any 10 of the 20 shares.
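To make the threshold idea concrete, here is a minimal sketch of threshold secret sharing using Shamir's scheme over a prime field. Shamir's scheme is one standard way to obtain this property; this page does not state which scheme or parameters the Vanish implementation uses, so the function names (`split_key`, `reconstruct_key`), the choice of prime, and the 128-bit key size are assumptions made for illustration only.

```python
import secrets

# Assumption: a prime comfortably larger than the key, so the key fits in the field.
PRIME = 2**521 - 1  # a Mersenne prime


def split_key(key: int, n: int, threshold: int) -> list:
    """Split `key` into n shares; any `threshold` of them recover it."""
    if not 1 <= threshold <= n:
        raise ValueError("threshold must satisfy 1 <= threshold <= n")
    if not 0 <= key < PRIME:
        raise ValueError("key must fit in the field")
    # Random polynomial of degree threshold-1 whose constant term is the key.
    coeffs = [key] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct_key(shares: list) -> int:
    """Recover the constant term by Lagrange interpolation at x = 0."""
    key = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        key = (key + yi * num * pow(den, -1, PRIME)) % PRIME
    return key


# The example from the text: N = 20 shares, threshold 10.
data_key = secrets.randbits(128)
shares = split_key(data_key, n=20, threshold=10)
any_ten = secrets.SystemRandom().sample(shares, 10)
assert reconstruct_key(any_ten) == data_key
```

In this sketch, which 10 shares are chosen does not matter, which is what lets a VDO tolerate the loss of some shares during its lifetime.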
Once Vanish has computed the key shares, it sprinkles those shares at randomly generated locations in a gigantic-scale, geographically-distributed, peer-to-peer network arranged in a structure called a distributed hashtable (DHT). Each key share, then, will be entrusted to a small set of nodes in this massive-scale DHT, which can be running in different parts of the world and which store the share for a pre-configured period of time.
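The sketch below illustrates one way "randomly generated locations" can still be found again later: derive them deterministically from a random secret seed. It also models a store whose entries expire after a fixed period. Everything here is a stand-in chosen for illustration: `derive_locations`, `ToyDHT`, the use of HMAC-SHA256, the 20-byte index size, and the one-hour timeout are assumptions, not the Vanish or Vuze implementation.

```python
import hashlib
import hmac
import secrets


def derive_locations(seed: bytes, n: int) -> list:
    """Derive n pseudorandom 20-byte indices from a secret seed (hypothetical scheme)."""
    return [
        hmac.new(seed, i.to_bytes(4, "big"), hashlib.sha256).digest()[:20]
        for i in range(n)
    ]


class ToyDHT:
    """In-memory stand-in for a DHT whose entries expire after a fixed timeout."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self._store = {}  # index -> (value, expiry time)

    def put(self, index: bytes, value: bytes, now: float) -> None:
        self._store[index] = (value, now + self.timeout_s)

    def get(self, index: bytes, now: float):
        entry = self._store.get(index)
        if entry is None or now >= entry[1]:
            self._store.pop(index, None)  # expired entries are forgotten
            return None
        return entry[0]


# Sprinkle 20 placeholder shares, then look one up before and after expiry.
seed = secrets.token_bytes(16)
locations = derive_locations(seed, 20)
dht = ToyDHT(timeout_s=3600.0)  # illustrative timeout only
for i, loc in enumerate(locations):
    dht.put(loc, b"share-%d" % i, now=0.0)

assert dht.get(locations[3], now=1800.0) == b"share-3"
assert dht.get(locations[3], now=3600.0) is None
```

Anyone holding the seed can recompute the same locations, while someone without it only sees unrelated-looking indices. A real DHT differs from this toy in the ways that matter most here: values are replicated across nodes, and they also disappear through churn, not only through explicit expiry.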
The final VDO V consists of (L, C, N, threshold), where L is the access key from which the locations of the shares are derived. Upon encapsulation, V is sent over to the email server or stored in the file system. The decapsulation of V happens in the natural way, assuming that it has not timed out. Given VDO V, Vanish (1) extracts the access key, L, (2) derives the locations of the shares of K, (3) retrieves the required number of shares as specified by the threshold, (4) reconstructs K, and (5) decrypts C to obtain D.
Our choice of DHTs as storage backends for Vanish stems from three unique DHT properties that make them attractive for our data destruction goals. First, their huge scale (over 1 million nodes for the Vuze DHT), geographical distribution of nodes across many countries, and complete decentralization make them robust to powerful and legally influential adversaries.
Second, DHTs are designed to provide reliable distributed storage; we leverage this property to ensure that the protected data remains available to the user for a desired interval of time.
Last but not least, DHTs have an inherent property that we leverage in a unique and non-standard way: the fact that the DHT is constantly changing means that the sprinkled information will naturally disappear (vanish) as the DHT nodes churn or internally cleanse themselves, thereby rendering the protected data permanently unavailable over time. In fact, it may be impossible to determine retroactively which nodes were responsible for storing a given value in the past.
However, our choice of an existing DHT also introduces limitations to the current implementation. First, Vuze only supports approximately 8-hour timeouts for (index, value) pairs stored in the DHT. Because of this limit, Vanish cannot provide longer timeouts without additional mechanisms, such as a refresh mechanism for VDO shares in the DHT. Our implementation does provide such a mechanism; however, it requires a continuously online Vanish server to handle the refreshes. As we argue here, such a setup makes a lot of sense in certain scenarios, but less in others.
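As a rough illustration of what a refresh mechanism has to do, the sketch below computes when shares would need to be re-published to stretch a VDO's lifetime beyond the DHT's own timeout. The function name `refresh_times_h` and the scheduling policy (refresh exactly once per DHT timeout period) are assumptions for illustration; the DHT timeout is passed in as a parameter, and the values in the example are illustrative.

```python
import math


def refresh_times_h(desired_timeout_h: float, dht_timeout_h: float) -> list:
    """Hours after encapsulation at which shares must be re-published.

    The desired lifetime is covered by consecutive periods of length
    dht_timeout_h; a refresh is needed at the end of every period
    except the last, after which the shares are left to expire.
    """
    if desired_timeout_h <= 0 or dht_timeout_h <= 0:
        raise ValueError("timeouts must be positive")
    periods = math.ceil(desired_timeout_h / dht_timeout_h)
    return [k * dht_timeout_h for k in range(1, periods)]


# A 24-hour VDO on a DHT with an 8-hour timeout needs two refreshes.
print(refresh_times_h(24, 8))  # [8, 16]
```

Two consequences follow from this simple model. The effective lifetime is rounded up to a multiple of the DHT timeout, and something must be online at each refresh time, which is why a continuously running server is needed.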
Second, our security evaluation shows that Vuze tends to over-replicate stored data, which raises challenges for a self-destructing data system such as Vanish. The primary effect of over-replication is to expose Vanish shares to needlessly many nodes in the DHT.
We are currently designing a new backend for the Vanish system, which we hope will mitigate these limitations to a large extent.