# Advanced cryptography: Symbolic methods for security analysis

Instructor: Daniele Micciancio

# Introduction

The topic of this course is the study of symbolic models for the analysis of security protocols, the application of formal methods and tools for automatic or computer assisted protocol analysis, and the relation between symbolic security models and the standard computational model used by modern cryptography.

In this first lecture we discuss:

• Background and prerequisites to take this course
• Course overview

### Background

This course assumes some background in cryptography and implied prerequisites. In particular, it is assumed that students

• have a general computer science background, e.g., programming, discrete math, probability, algorithms design and analysis, automata and computability theory, computational complexity, etc.
• have taken an Introductory cryptography course

Typically, students taking this course are expected to have already taken introductory courses in algorithms (CSE202), computational complexity (CSE200), and cryptography (CSE207). If you have not taken all of these courses, but think you have the necessary background consult the instructor to determine if you should take the course or not. E.g., if you have taken the undergradaute cryptography course CSE107, you can probably take this course and do just fine with a little extra reading.

The most important prerequisite for this course is that you have already been exposed to the definitional approach and provable security methodology of modern (computational) cryptography, as described below.

### Encryption

The typical problem in cryptography involves two parties (Alice and Bob) that want to communicate over an insecure channel. The channel is inseure in the sense that it can be tapped by an adversary Eve:

Alice ------[Eve]-----> Bob

Alice's goal is to send a message m to Bob in such a way that the message is kept secret from Eve. This can be achieved using encryption.

An encryption scheme is a triple of algorithms (G,E,D) where

1. G is the key generation algorithm. It takes as input a security parameter n, and outputs a pair of keys (e,d)
2. E is the encryption algorithm that on input a key e and a message m, outputs a ciphertext E(e,m)
3. D is the decryption algorithm that on input a key d and a ciphertext c, outputs the underlying message D(d,c)

The algorithms are required to satisfy D(d,E(e,m))=m for all (e,d) in the range of G, so that Bob can recover the message sent by Alice:

Alice: E(e,m) -----[Eve]------> Bob: D(d,E(e,m))=m

It is assumed that Alice and Bob know the encryption/decryption keys (e,d) and that at least the decryption key d is not known to Eve. In a public key encryption scheme (i.e., a scheme where the encryption key e can be made public without compromising security) this can be achieved by having Bob generate the key pair G(n)->(e,d) using the key generation algorithm and publishing e.

The scheme is secure if no adversary can efficiently recover any information about m, given the ciphertext E(e,m) and (in the case of a public key system) the encryption key e. A precise mathematical definition of this security requirement is one of the cornerstones of modern cryptography [Goldwasser & Micali, 1982]. The definition, now standard, is the following.

Definition: A public key encryption scheme (G,E,D) satisfies indistinguishability under chosen plaintext attack (IND-CPA) if for any pair of PPT algorithms A1,A2 (the adversary) the difference between the probability that the following experiment outputs 1 when b=0 or b=1 is negligible

1. Run the key generation algorithm G(n) -> (e,d)
2. Run the first stage of the adversary A1(e,n) = (m[0],m[1],state)
3. Check if |m[0]|=|m[1]| have the same length, and let c = E(e,m[b])
4. Rum the second stage A2(state,c) and output the result.

Note on asymptotic security: The above definition is asymptotic, i.e., it abstracts from the specific value of the running time of adversary (A1,A2) and the actual success probability of the attack, and offers a concise qualitative statement about the security of the encryption scheme. In practice, when the actual running time and success probability of the attack are important to determine an appropriate value of the security parameter, one can make the definition of security more concrete, by saying that the encryption scheme is (t,s)-secure if for every adversary (A1,A2) running in time t, the difference between the two probabilities is at most s. Asymptotic security requires that for any polynomial t(n)=n^{O(1)} there is a negligible function s(n)=n^{-omega(1)} such that the scheme is (t(n),s(n))-secure for every n.

Whether or not you are familiar with the above definition, a few words of explaination are due. (If you are familiar with the definition, you probably got so used to it that it is good to explicitly think about the implications of specific definitional choices.) The definition captures the following important properties:

• The encryption scheme does not hide the length of the message. This is necessary for efficiency, and it can be proved that in order to be able to encrypt arbitrary messages, some information about the length must be leaked.
• The encryption does not necessarily hides all partial information about the key. E.g., the first bit of the key may be leaked, and still meet the security definition.
• Security holds with respect to any probability distribution on messages. (This is implicitly captured by having a message space of size 2.)
• The scheme is secure even if the adversary can see several ciphertexts. (This is proved using a standard hybrid argument.)
• The definition captures secrecy under passive attacks, where the adversary can tap the channel, but cannot otherwise alter the messages, and, say, trick Bob into decrypting badly formed ciphertexts. There are stronger notions of security for encryption that capture these additional threats, but for now we will focus on the simplest definition.

All of the above points can be formalized by giving precise definitions, and proving that the new definitions are implied by IND-CPA. An example of such definition is the "semantic security" property originally defined by Goldwasser & Micali (1982), which informally states that for any adversary A that tries to compute partial information f(m) given the public key e, ciphertext E(e,m) and apriori information h(m) about the message, there is a simulator S that computes f(m) with essentially the same probability given only e, h(m), |m|, but not the ciphertext.

### Provable security

One of the advantages of a rigorous mathematical treatments of cryptography is that we can formulate different versions of a definition, and formally prove that they are equivalent, e.g., we can prove that an encryption scheme satisfies a definition if and only if it satisfies the other one. This is the case for example for IND-CPA security for single or multiple messages, or IND-CPA and an appropriate formulation of semantic security. This allows to use simpler definitions, and still allow for great generality, e.g., an encryption scheme satisfying the given definition of single message IND-CPA security can be safely used in a context where several messages are encrypted.

Beside proving the equivalence of different formulations of a definition, the mathematical treatment allows to prove the security of cryptographic applications bottom up. The provable security methodology involves several steps:

1. Giving a definition for the security notion achieved by the cryptographic primitives used in the application
2. Giving, in a similar style, a definition for the security goals of the application
3. Coming up with a protocol to solve the application, using the cryptographic primitives as building blocks
4. Prove that if the cryptographic primitives are secure, then the application is also secure. This typically involves showing that any adversary that successfully attacks the application, can be efficiently converted into an adversary that breaks one of the primitives.

This way, cryptanalytic efforts can focus on the security of the primitive, and a small set of primitives can be used in the construction of provably secure solutions to many different applications.

### Symbolic cryptography

No matter how useful and powerful the computational definitions from modern cryptography are, using them can get pretty complicated as we analyze larger and larger protocols. Intuitively, it would be more desirable to be able to reason about cryptographic applications in abstract terms, so that we can make assertions of the form

"If the adversary sees E(k1,k2) and E(k2,b), then it cannot tell if b=0 or b=1 because b is protected by k2 and k2 is protected by k1"

rather than

"If the adversary sees E(k1,k2) and E(k2,b), then it cannot tell if b=0 or b=1 because otherwise we could break the encryption scheme as follows. If we can distinguish E(k1,k2),E(k2,0) from E(k1,k2),E(k2,1) with probability >p, then we can

1. either distinguish E(k1,k2),E(k2,b) from E(k1,k3),E(k2,b) with probability >p/2
2. or distinguish E(k1,k3),E(k2,0) from E(k1,k3),E(k2,1) with probability >p/2.

In the first case, we can break E(k1,*) on message space {k2,k3} as follows: given a ciphertext c, choose b at random, and run the attack on c,E(k2,b). In the second case, we can break E(k2,*) on message space {0,1} as follows: given a ciphertext c, choose k1 and k3 at random, and run the attack on E(k1,k2),c"

The first kind of argument can be formalized by setting up an abstract model of computation, where encryption (or other cryptographic primitives) are considered much like abstract data types, so that cryptographic objects can only be manipulated using a restricted set of operations. Symbolic treatments of cryptography of this kind have been developed in parallel to the computational approach during the 80', and continue today. One of the advantages of the symbolic approach is that it allows for partial automation of security proofs. The drawback is that it does not seem to provide the same level of provable security guarantees of the computational approach. E.g., it only establishes security against adversaries that respect the cryptographic abstraction. Since adversaries can behave arbitrarily, security in the symbolic model has been largely disregarded in the computational cryptography community as too weak of a notion for real world applications. Still the symbolic approach has proved useful in uncovering subtle bugs in many published protocols, so it seems to be useful somehow.

We remark that both the computational and symbolic treatments try to address the same general problems associated to the analysis of security protocols:

• We want to analyze security in the presence of malicious adversaries. This makes cryptographic protocol verification a much harder task than traditional program verification because of the adversarial and unpredictable execution environment. .
• Security cannot be tested, but only proved, because we want to counter arbitrary attacks associated to unknown execution environments.
• Need of a formal (mathematical) execution model to precisely formulate and prove security properties

The difference between the symbolic and computational approach is in the choice of execution model:

1. Symbolic approach: abstract computation and communication model. Cryptography is part of the model, i.e., its properties are axiomatized rather than defined in terms of simpler notions
2. Computational approach: detailed model of computation / communication. Cryptographic operations are not modeled, but defined within the model in terms of efficient computations, randomness and success probabilities..

These are both reasonable execution models, but at a different level of abstraction. In fact, we can consider even more detailed models than the computational one. As we increase the level of abstraction we find:

1. Physics / EE: atoms, wires, signals, time, power. This details are typically disregarded by moderncryptography, but there are some exceptions. E.g., some cryptanalytic attacks exploit time and power analysis, while time can be used to devise protocols achieving concurrent zero knowledge.
2. Digital circuits: this is basically the model used by the computational approach. Cryptography is defined and studied in terms of circuit implementations, randomness and success probabilities.
3. Symbolic computation: cryptography becomes part of the model, and we abstract away from its implementation details in much the same way the computational approach abstracts from physical layer.

So, which model should be used? Each model has its own advantages and shortcomings:

 Advantages Disadvantages computational model Hish security assurance Provides guidance to design of crypto primitives Allows definition of new crypto primitives Proofs are long and hard Security intuition is easily lost Few cryptographers still write full proofs, and even fewer people read them anyway symbolic model Simpler, higher level proofs (e.g., no probabilities) Potential for automatic proof verification Security holds only against abstract adversaries Unclear assumption about primitives Tailored to specific security properties and classes of protocols

In this course we will cover the following topics:

• Study various methods to formalize the symbolic execution or the process of analyzing security protocols
• State transition systems (e.g., Dolev-Yao)
• Belief logics (e.g., BAN)
• Process algebras (spi-calculus)
• Type systems, etc.
• Get some experience with a selection of tools for the automatic (or computer assisted) analysis of protocols:
• Model checkers (e.g., Murphi)
• General purpose theorem provers (e.g., Isabelle / HOL)
• Specialized systems for security analysis (e.g., Casrul, ProVerif)
• Explore the relation between the computational and symbolic model
• Computational soundness of certain symbolic models, i.e., are symbolic proofs computationally meaningful?
• Completeness issues, i.e., can all true computational statements be proved within a given symbolic model?
• Applications to specific areas (e.g., multicast security, database publishing)

The last part (computational soundness of symbolic models) is a very recent and active area of research, and that's where a solid cryptographic background will be more important. But for the first part of the course, we will put the computational approach temporarily aside, and concentrate on the symbolic model.

The computaitonal and symbolic security models evolved in parallel pretty much at the same time, shortly after the invention of public key cryptography in the 1970's. In order to better understand the symbolic model, we will start by making a jump into the past, and read some old classics. The goal is to understnand how cryptography used to be at the time when both the symbolic and computational security models started evolving.