**Instructor: **Daniele Micciancio

The topic of this course is the study of symbolic models for the analysis of security protocols, the application of formal methods and tools for automatic or computer assisted protocol analysis, and the relation between symbolic security models and the standard computational model used by modern cryptography.

In this first lecture we discuss:

- Background and prerequisites to take this course
- Course overview

This course assumes some background in cryptography and implied prerequisites. In particular, it is assumed that students

- have a general computer science background, e.g., programming, discrete math, probability, algorithms design and analysis, automata and computability theory, computational complexity, etc.
- have taken an Introductory cryptography course

Typically, students taking this course are expected to have already taken introductory courses in algorithms (CSE202), computational complexity (CSE200), and cryptography (CSE207). If you have not taken all of these courses, but think you have the necessary background consult the instructor to determine if you should take the course or not. E.g., if you have taken the undergradaute cryptography course CSE107, you can probably take this course and do just fine with a little extra reading.

The most important prerequisite for this course is that you have already been exposed to the definitional approach and provable security methodology of modern (computational) cryptography, as described below.

The typical problem in cryptography involves two parties (Alice and Bob) that want to communicate over an insecure channel. The channel is inseure in the sense that it can be tapped by an adversary Eve:

Alice ------[Eve]-----> Bob

Alice's goal is to send a message m to Bob in such a way that the message is kept secret from Eve. This can be achieved using encryption.

An encryption scheme is a triple of algorithms (G,E,D) where

- G is the key generation algorithm. It takes as input a security parameter n, and outputs a pair of keys (e,d)
- E is the encryption algorithm that on input a key e and a message m, outputs a ciphertext E(e,m)
- D is the decryption algorithm that on input a key d and a ciphertext c, outputs the underlying message D(d,c)

The algorithms are required to satisfy D(d,E(e,m))=m for all (e,d) in the range of G, so that Bob can recover the message sent by Alice:

Alice: E(e,m) -----[Eve]------> Bob: D(d,E(e,m))=m

It is assumed that Alice and Bob know the encryption/decryption keys (e,d) and that at least the decryption key d is not known to Eve. In a public key encryption scheme (i.e., a scheme where the encryption key e can be made public without compromising security) this can be achieved by having Bob generate the key pair G(n)->(e,d) using the key generation algorithm and publishing e.

The scheme is secure if no adversary can efficiently recover any information about m, given the ciphertext E(e,m) and (in the case of a public key system) the encryption key e. A precise mathematical definition of this security requirement is one of the cornerstones of modern cryptography [Goldwasser & Micali, 1982]. The definition, now standard, is the following.

**Definition:** *A public key encryption scheme (G,E,D)
satisfies indistinguishability under chosen plaintext attack (IND-CPA) if for
any pair of PPT algorithms A1,A2 (the adversary) the difference between the
probability that the following experiment outputs 1 when b=0 or b=1 is
negligible*

*Run the key generation algorithm G(n) -> (e,d)**Run the first stage of the adversary A1(e,n) = (m[0],m[1],state)**Check if |m[0]|=|m[1]| have the same length, and let c = E(e,m[b])**Rum the second stage A2(state,c) and output the result.*

**Note on asymptotic security:** The above definition is
asymptotic, i.e., it abstracts from the specific value of the running time of
adversary (A1,A2) and the actual success probability of the attack, and
offers a concise qualitative statement about the security of the encryption
scheme. In practice, when the actual running time and success probability of
the attack are important to determine an appropriate value of the security
parameter, one can make the definition of security more concrete, by saying
that the encryption scheme is (t,s)-secure if for every adversary (A1,A2)
running in time t, the difference between the two probabilities is at most s.
Asymptotic security requires that for any polynomial t(n)=n^{O(1)} there is a
negligible function s(n)=n^{-omega(1)} such that the scheme is
(t(n),s(n))-secure for every n.

Whether or not you are familiar with the above definition, a few words of explaination are due. (If you are familiar with the definition, you probably got so used to it that it is good to explicitly think about the implications of specific definitional choices.) The definition captures the following important properties:

- The encryption scheme does not hide the length of the message. This is necessary for efficiency, and it can be proved that in order to be able to encrypt arbitrary messages, some information about the length must be leaked.
- The encryption does not necessarily hides all partial information about the key. E.g., the first bit of the key may be leaked, and still meet the security definition.
- Security holds with respect to any probability distribution on messages. (This is implicitly captured by having a message space of size 2.)
- The adversary may have partial information about the message, and still cannot gain additional information.
- The scheme is secure even if the adversary can see several ciphertexts. (This is proved using a standard hybrid argument.)
- The definition captures secrecy under passive attacks, where the adversary can tap the channel, but cannot otherwise alter the messages, and, say, trick Bob into decrypting badly formed ciphertexts. There are stronger notions of security for encryption that capture these additional threats, but for now we will focus on the simplest definition.

All of the above points can be formalized by giving precise definitions, and proving that the new definitions are implied by IND-CPA. An example of such definition is the "semantic security" property originally defined by Goldwasser & Micali (1982), which informally states that for any adversary A that tries to compute partial information f(m) given the public key e, ciphertext E(e,m) and apriori information h(m) about the message, there is a simulator S that computes f(m) with essentially the same probability given only e, h(m), |m|, but not the ciphertext.

One of the advantages of a rigorous mathematical treatments of cryptography is that we can formulate different versions of a definition, and formally prove that they are equivalent, e.g., we can prove that an encryption scheme satisfies a definition if and only if it satisfies the other one. This is the case for example for IND-CPA security for single or multiple messages, or IND-CPA and an appropriate formulation of semantic security. This allows to use simpler definitions, and still allow for great generality, e.g., an encryption scheme satisfying the given definition of single message IND-CPA security can be safely used in a context where several messages are encrypted.

Beside proving the equivalence of different formulations of a definition, the mathematical treatment allows to prove the security of cryptographic applications bottom up. The provable security methodology involves several steps:

- Giving a definition for the security notion achieved by the cryptographic primitives used in the application
- Giving, in a similar style, a definition for the security goals of the application
- Coming up with a protocol to solve the application, using the cryptographic primitives as building blocks
- Prove that if the cryptographic primitives are secure, then the application is also secure. This typically involves showing that any adversary that successfully attacks the application, can be efficiently converted into an adversary that breaks one of the primitives.

This way, cryptanalytic efforts can focus on the security of the primitive, and a small set of primitives can be used in the construction of provably secure solutions to many different applications.

No matter how useful and powerful the computational definitions from modern cryptography are, using them can get pretty complicated as we analyze larger and larger protocols. Intuitively, it would be more desirable to be able to reason about cryptographic applications in abstract terms, so that we can make assertions of the form

"If the adversary sees E(k1,k2) and E(k2,b), then it cannot tell if b=0 or b=1 because b is protected by k2 and k2 is protected by k1"

rather than

"If the adversary sees E(k1,k2) and E(k2,b), then it cannot tell if b=0 or b=1 because otherwise we could break the encryption scheme as follows. If we can distinguish E(k1,k2),E(k2,0) from E(k1,k2),E(k2,1) with probability >p, then we can

either distinguish E(k1,k2),E(k2,b) from E(k1,k3),E(k2,b) with probability >p/2or distinguish E(k1,k3),E(k2,0) from E(k1,k3),E(k2,1) with probability >p/2.

In the first case, we can break E(k1,*) on message space {k2,k3} as follows: given a ciphertext c, choose b at random, and run the attack on c,E(k2,b). In the second case, we can break E(k2,*) on message space {0,1} as follows: given a ciphertext c, choose k1 and k3 at random, and run the attack on E(k1,k2),c"

The first kind of argument can be formalized by setting up an abstract model of computation, where encryption (or other cryptographic primitives) are considered much like abstract data types, so that cryptographic objects can only be manipulated using a restricted set of operations. Symbolic treatments of cryptography of this kind have been developed in parallel to the computational approach during the 80', and continue today. One of the advantages of the symbolic approach is that it allows for partial automation of security proofs. The drawback is that it does not seem to provide the same level of provable security guarantees of the computational approach. E.g., it only establishes security against adversaries that respect the cryptographic abstraction. Since adversaries can behave arbitrarily, security in the symbolic model has been largely disregarded in the computational cryptography community as too weak of a notion for real world applications. Still the symbolic approach has proved useful in uncovering subtle bugs in many published protocols, so it seems to be useful somehow.

We remark that both the computational and symbolic treatments try to address the same general problems associated to the analysis of security protocols:

- We want to analyze security in the presence of malicious adversaries. This makes cryptographic protocol verification a much harder task than traditional program verification because of the adversarial and unpredictable execution environment. .
- Security cannot be tested, but only proved, because we want to counter arbitrary attacks associated to unknown execution environments.
- Need of a formal (mathematical) execution model to precisely formulate and prove security properties

The difference between the symbolic and computational approach is in the choice of execution model:

- Symbolic approach: abstract computation and communication model. Cryptography is part of the model, i.e., its properties are axiomatized rather than defined in terms of simpler notions
- Computational approach: detailed model of computation / communication. Cryptographic operations are not modeled, but defined within the model in terms of efficient computations, randomness and success probabilities..

These are both reasonable execution models, but at a different level of abstraction. In fact, we can consider even more detailed models than the computational one. As we increase the level of abstraction we find:

- Physics / EE: atoms, wires, signals, time, power. This details are typically disregarded by moderncryptography, but there are some exceptions. E.g., some cryptanalytic attacks exploit time and power analysis, while time can be used to devise protocols achieving concurrent zero knowledge.
- Digital circuits: this is basically the model used by the computational approach. Cryptography is defined and studied in terms of circuit implementations, randomness and success probabilities.
- Symbolic computation: cryptography becomes part of the model, and we abstract away from its implementation details in much the same way the computational approach abstracts from physical layer.

So, which model should be used? Each model has its own advantages and shortcomings:

Advantages | Disadvantages | |

computational model | Hish security assurance
Provides guidance to design of crypto primitives Allows definition of new crypto primitives |
Proofs are long and hard
Security intuition is easily lost Few cryptographers still write full proofs, |

symbolic model | Simpler, higher level proofs (e.g., no probabilities)
Potential for automatic proof verification |
Security holds only against abstract adversaries
Unclear assumption about primitives Tailored to specific security properties and |

In this course we will cover the following topics:

- Study various methods to formalize the symbolic execution or the
process of analyzing security protocols
- State transition systems (e.g., Dolev-Yao)
- Belief logics (e.g., BAN)
- Process algebras (spi-calculus)
- Type systems, etc.

- Get some experience with a selection of tools for the automatic (or
computer assisted) analysis of protocols:
- Model checkers (e.g., Murphi)
- General purpose theorem provers (e.g., Isabelle / HOL)
- Specialized systems for security analysis (e.g., Casrul, ProVerif)

- Explore the relation between the computational and symbolic model
- Computational soundness of certain symbolic models, i.e., are symbolic proofs computationally meaningful?
- Completeness issues, i.e., can all true computational statements be proved within a given symbolic model?
- Applications to specific areas (e.g., multicast security, database publishing)

The last part (computational soundness of symbolic models) is a very recent and active area of research, and that's where a solid cryptographic background will be more important. But for the first part of the course, we will put the computational approach temporarily aside, and concentrate on the symbolic model.

The computaitonal and symbolic security models evolved in parallel pretty much at the same time, shortly after the invention of public key cryptography in the 1970's. In order to better understand the symbolic model, we will start by making a jump into the past, and read some old classics. The goal is to understnand how cryptography used to be at the time when both the symbolic and computational security models started evolving.

Reading assignments for next time:

- Diffie, Hellman,
*New directions in cryptography*, IEEE TIT 22(6):644-654 (1976) - Rivest , Shamir, Adleman,
*A method for obtaining digital signatures and public-key cryptosystems*, CACM 21(2):120-126 (1978)

These papers mark the birth of modern cryptography. Definitions in the papers are rather informal, but still one can recognize many of the most important ideas that contributed to the development of modern cryptography. These are among the most influential papers in the whole cryptographic literature, a must read for anybody interested in cryptography. They are also an excellent reading to get an idea of what cryptography looked like in the 70's.

All students taking the class are expected to read the papers so they can actively partecipate in the discussion in the next lecture.