Advanced cryptography: Symbolic methods for security analysis

Instructor: Daniele Micciancio

Introduction

The topic of this course is the study of symbolic models for the analysis of security protocols, the application of formal methods and tools for automatic or computer assisted protocol analysis, and the relation between symbolic security models and the standard computational model used by modern cryptography.

In this first lecture we discuss:

Background

This course assumes some background in cryptography and implied prerequisites. In particular, it is assumed that students

Typically, students taking this course are expected to have already taken introductory courses in algorithms (CSE202), computational complexity (CSE200), and cryptography (CSE207). If you have not taken all of these courses, but think you have the necessary background consult the instructor to determine if you should take the course or not. E.g., if you have taken the undergradaute cryptography course CSE107, you can probably take this course and do just fine with a little extra reading.

The most important prerequisite for this course is that you have already been exposed to the definitional approach and provable security methodology of modern (computational) cryptography, as described below.

Encryption

The typical problem in cryptography involves two parties (Alice and Bob) that want to communicate over an insecure channel. The channel is inseure in the sense that it can be tapped by an adversary Eve:

Alice ------[Eve]-----> Bob

Alice's goal is to send a message m to Bob in such a way that the message is kept secret from Eve. This can be achieved using encryption.

An encryption scheme is a triple of algorithms (G,E,D) where

  1. G is the key generation algorithm. It takes as input a security parameter n, and outputs a pair of keys (e,d)
  2. E is the encryption algorithm that on input a key e and a message m, outputs a ciphertext E(e,m)
  3. D is the decryption algorithm that on input a key d and a ciphertext c, outputs the underlying message D(d,c)

The algorithms are required to satisfy D(d,E(e,m))=m for all (e,d) in the range of G, so that Bob can recover the message sent by Alice:

Alice: E(e,m) -----[Eve]------> Bob: D(d,E(e,m))=m

It is assumed that Alice and Bob know the encryption/decryption keys (e,d) and that at least the decryption key d is not known to Eve. In a public key encryption scheme (i.e., a scheme where the encryption key e can be made public without compromising security) this can be achieved by having Bob generate the key pair G(n)->(e,d) using the key generation algorithm and publishing e.

The scheme is secure if no adversary can efficiently recover any information about m, given the ciphertext E(e,m) and (in the case of a public key system) the encryption key e. A precise mathematical definition of this security requirement is one of the cornerstones of modern cryptography [Goldwasser & Micali, 1982]. The definition, now standard, is the following.

Definition: A public key encryption scheme (G,E,D) satisfies indistinguishability under chosen plaintext attack (IND-CPA) if for any pair of PPT algorithms A1,A2 (the adversary) the difference between the probability that the following experiment outputs 1 when b=0 or b=1 is negligible

  1. Run the key generation algorithm G(n) -> (e,d)
  2. Run the first stage of the adversary A1(e,n) = (m[0],m[1],state)
  3. Check if |m[0]|=|m[1]| have the same length, and let c = E(e,m[b])
  4. Rum the second stage A2(state,c) and output the result.

Note on asymptotic security: The above definition is asymptotic, i.e., it abstracts from the specific value of the running time of adversary (A1,A2) and the actual success probability of the attack, and offers a concise qualitative statement about the security of the encryption scheme. In practice, when the actual running time and success probability of the attack are important to determine an appropriate value of the security parameter, one can make the definition of security more concrete, by saying that the encryption scheme is (t,s)-secure if for every adversary (A1,A2) running in time t, the difference between the two probabilities is at most s. Asymptotic security requires that for any polynomial t(n)=n^{O(1)} there is a negligible function s(n)=n^{-omega(1)} such that the scheme is (t(n),s(n))-secure for every n.

Whether or not you are familiar with the above definition, a few words of explaination are due. (If you are familiar with the definition, you probably got so used to it that it is good to explicitly think about the implications of specific definitional choices.) The definition captures the following important properties:

All of the above points can be formalized by giving precise definitions, and proving that the new definitions are implied by IND-CPA. An example of such definition is the "semantic security" property originally defined by Goldwasser & Micali (1982), which informally states that for any adversary A that tries to compute partial information f(m) given the public key e, ciphertext E(e,m) and apriori information h(m) about the message, there is a simulator S that computes f(m) with essentially the same probability given only e, h(m), |m|, but not the ciphertext.

Provable security

One of the advantages of a rigorous mathematical treatments of cryptography is that we can formulate different versions of a definition, and formally prove that they are equivalent, e.g., we can prove that an encryption scheme satisfies a definition if and only if it satisfies the other one. This is the case for example for IND-CPA security for single or multiple messages, or IND-CPA and an appropriate formulation of semantic security. This allows to use simpler definitions, and still allow for great generality, e.g., an encryption scheme satisfying the given definition of single message IND-CPA security can be safely used in a context where several messages are encrypted.

Beside proving the equivalence of different formulations of a definition, the mathematical treatment allows to prove the security of cryptographic applications bottom up. The provable security methodology involves several steps:

  1. Giving a definition for the security notion achieved by the cryptographic primitives used in the application
  2. Giving, in a similar style, a definition for the security goals of the application
  3. Coming up with a protocol to solve the application, using the cryptographic primitives as building blocks
  4. Prove that if the cryptographic primitives are secure, then the application is also secure. This typically involves showing that any adversary that successfully attacks the application, can be efficiently converted into an adversary that breaks one of the primitives.

This way, cryptanalytic efforts can focus on the security of the primitive, and a small set of primitives can be used in the construction of provably secure solutions to many different applications.

Symbolic cryptography

No matter how useful and powerful the computational definitions from modern cryptography are, using them can get pretty complicated as we analyze larger and larger protocols. Intuitively, it would be more desirable to be able to reason about cryptographic applications in abstract terms, so that we can make assertions of the form

"If the adversary sees E(k1,k2) and E(k2,b), then it cannot tell if b=0 or b=1 because b is protected by k2 and k2 is protected by k1"

rather than

"If the adversary sees E(k1,k2) and E(k2,b), then it cannot tell if b=0 or b=1 because otherwise we could break the encryption scheme as follows. If we can distinguish E(k1,k2),E(k2,0) from E(k1,k2),E(k2,1) with probability >p, then we can

  1. either distinguish E(k1,k2),E(k2,b) from E(k1,k3),E(k2,b) with probability >p/2
  2. or distinguish E(k1,k3),E(k2,0) from E(k1,k3),E(k2,1) with probability >p/2.

In the first case, we can break E(k1,*) on message space {k2,k3} as follows: given a ciphertext c, choose b at random, and run the attack on c,E(k2,b). In the second case, we can break E(k2,*) on message space {0,1} as follows: given a ciphertext c, choose k1 and k3 at random, and run the attack on E(k1,k2),c"

The first kind of argument can be formalized by setting up an abstract model of computation, where encryption (or other cryptographic primitives) are considered much like abstract data types, so that cryptographic objects can only be manipulated using a restricted set of operations. Symbolic treatments of cryptography of this kind have been developed in parallel to the computational approach during the 80', and continue today. One of the advantages of the symbolic approach is that it allows for partial automation of security proofs. The drawback is that it does not seem to provide the same level of provable security guarantees of the computational approach. E.g., it only establishes security against adversaries that respect the cryptographic abstraction. Since adversaries can behave arbitrarily, security in the symbolic model has been largely disregarded in the computational cryptography community as too weak of a notion for real world applications. Still the symbolic approach has proved useful in uncovering subtle bugs in many published protocols, so it seems to be useful somehow.

We remark that both the computational and symbolic treatments try to address the same general problems associated to the analysis of security protocols:

The difference between the symbolic and computational approach is in the choice of execution model:

  1. Symbolic approach: abstract computation and communication model. Cryptography is part of the model, i.e., its properties are axiomatized rather than defined in terms of simpler notions
  2. Computational approach: detailed model of computation / communication. Cryptographic operations are not modeled, but defined within the model in terms of efficient computations, randomness and success probabilities..

These are both reasonable execution models, but at a different level of abstraction. In fact, we can consider even more detailed models than the computational one. As we increase the level of abstraction we find:

  1. Physics / EE: atoms, wires, signals, time, power. This details are typically disregarded by moderncryptography, but there are some exceptions. E.g., some cryptanalytic attacks exploit time and power analysis, while time can be used to devise protocols achieving concurrent zero knowledge.
  2. Digital circuits: this is basically the model used by the computational approach. Cryptography is defined and studied in terms of circuit implementations, randomness and success probabilities.
  3. Symbolic computation: cryptography becomes part of the model, and we abstract away from its implementation details in much the same way the computational approach abstracts from physical layer.

So, which model should be used? Each model has its own advantages and shortcomings:

Advantages Disadvantages
computational model Hish security assurance

Provides guidance to design of crypto primitives

Allows definition of new crypto primitives

Proofs are long and hard

Security intuition is easily lost

Few cryptographers still write full proofs,
and even fewer people read them anyway

symbolic model Simpler, higher level proofs (e.g., no probabilities)

Potential for automatic proof verification

Security holds only against abstract adversaries

Unclear assumption about primitives

Tailored to specific security properties and
classes of protocols

In this course we will cover the following topics:

The last part (computational soundness of symbolic models) is a very recent and active area of research, and that's where a solid cryptographic background will be more important. But for the first part of the course, we will put the computational approach temporarily aside, and concentrate on the symbolic model.

The computaitonal and symbolic security models evolved in parallel pretty much at the same time, shortly after the invention of public key cryptography in the 1970's. In order to better understand the symbolic model, we will start by making a jump into the past, and read some old classics. The goal is to understnand how cryptography used to be at the time when both the symbolic and computational security models started evolving.

Reading assignments for next time:

These papers mark the birth of modern cryptography. Definitions in the papers are rather informal, but still one can recognize many of the most important ideas that contributed to the development of modern cryptography. These are among the most influential papers in the whole cryptographic literature, a must read for anybody interested in cryptography. They are also an excellent reading to get an idea of what cryptography looked like in the 70's.

All students taking the class are expected to read the papers so they can actively partecipate in the discussion in the next lecture.