DupLESS

Server aided encryption for deduplicated storage

Welcome! Look below for papers on DupLESS and the underlying cryptographic tool, message-locked encryption, links to the source code, and a bunch of FAQs.

Publications

DupLESS: Server-Aided Encryption for Deduplicated Storage
USENIX Security Symposium 2013

Message-Locked Encryption and Secure Deduplication
Eurocrypt 2013

Download

Code • Instructions

FAQs

What is deduplication?

Deduplication is avoiding storage of multiple copies of the same data. Cloud storage service providers such as Dropbox, Mozy, and others perform deduplication to save space by only storing one copy of each file uploaded.

Can deduplication be combined with encryption?

Deduplication does not work with conventional encryption of course: should clients locally encrypt their files with their individual keys, different ciphertexts would emerge even for identical plaintexts, making deduplication impossible.

Convergent encryption is a widely used technique to combine the storage savings of deduplication while still providing decent privacy guarantees for the data. In convergent encryption, the message is encrypted under a key derived by the hashing the message itself, along with a public parameter.

For example, to encrypt M, we could get K ← H(P, M) and derive C ← Enc(K, M, IV_0). Here, P is a 128-bit string picked at random and fixed throughout the system, IV_0 is the all-0 initialization vector, H could implemented through SHA256 and Enc through AES128 in CTR mode.

Message-locked encryption is a cryptographic primitive that broadens convergent encryption and captures the properties needed for enabling deduplication over ciphertexts (see the Eurocrypt 2013 paper for more on message-locked encryption).

However, convergent encryption (and message-locked encryption in general) is inherently subject to brute-force attacks that can recover files falling into a known set. If a file M is picked from a set S, given a convergent encryption ciphertext C of M, an attacker can trial-decrypt with each candidate file M' in S by deriving K ← H(P, M') and checking if M' = Dec(K, C).

How serious are brute-force attacks on convergent encryption?

The brute-force attack described above can reach high speeds (up to 12000 attempts per second on a single processor, see the DupLESS paper for details), especially if S consists of short files. Moreover, the attack is completely parallelizable.

How does DupLESS protect against brute-force attacks?

DupLESS was built from the observation that the brute-force attack can be dealt with by using a key server (KS) to derive keys, instead of setting keys to be hashes of messages.

Consider an enterprise network, consisting of a group of affiliated clients (for example, employees of a company) using an external deduplicated cloud storage service. Along with the clients, the network also includes a KS. Access to the KS is preceded by authentication, which stops external attackers. The increased cost slows down brute-force attacks from compromised clients, and now the KS can function as a (logically) single point of control for implementing rate-limiting measures.

If the attacker cannot authenticate herself to the KS, we ensure high security. (Effectively, semantic security, except that ciphertexts leak equality of the underlying plaintexts. The latter is necessary for deduplication.)

Even if the attacker compromises a client and gains access to the KS, the attacker has to launch an online attack, interacting with the KS for each attempt. Moreover, the KS limits the number of queries an attacker can make in a given time period. The limit is set after analysing general usage patterns in the system: for example, if no client performs more than 2 million writes per week, and the pattern keeps repeating on a weekly basis, the KS will not reply to more than 2 million queries in a week. Effectively, this brings the brute-force attack down to 3 queries per second, if the size of S is much larger than 2 million.

Finally, even if the KS is compromised, DupLESS falls back to a usual convergent encryption style guarantee.

How much overhead will the system incur with DupLESS?

DupLESS incurs only slight overheads compared to using no encryption at all.

We deployed a prototype KS on Amazon EC2 and experimentally evaluated its performance in a bunch of experiments. For example, in one of our experiments, we uploaded a 1 MB file to Dropbox, first without any encryption, and then with DupLESS. The bandwidth overhead is less than 1% and the overhead in the time to store a file is about 17%.

People

Mihir Bellare • Sriram Keelveedhi • Thomas Ristenpart