Homework 5: Computing discrete logs

The Pretty Bad Privacy encryption tool can be used to insecurely encrypt files to a 512-bit ElGamal public key using 128-bit AES.

The pdf file for your next homework assignment has been encrypted using PBP to this ElGamal public key.

The encrypted file is available here. Your task is to use the attack described in this paper by van Oorschot and Wiener to compute the discrete log of the public key and use it to decrypt the homework so you can do the rest of the problems. This is a real vulnerability when implementations don't generate primes carefully: your professor has published a paper where we used this exact attack to compromise several hundred hosts on the Internet. (See Section 3.5, "Attacks on composite-order subgroups".)

The following steps may be helpful:

- Install Sage and get it working. Anaconda may come in handy if you run into package issues.
- Implement a brute force discrete log algorithm. Your function should take a generator g, a prime p, a target t, and a subgroup order q, and brute force all possible exponents until it finds an x such that g^x = t mod p, when g generates a subgroup of order q mod p.
- Implement Pollard rho or baby step-giant step. Your function should take a generator g, a prime p, a target t, and a subgroup order q, and in expected time sqrt(q) output x such that g^x = t mod p.
- Implement the Chinese remainder theorem. Your function should take a list of pairs (xi,pi) of residues xi and primes pi and output a z such that z = xi mod pi for each pi.
- Use your functions above to implement the Pohlig-Hellman algorithm when the prime factorization of the group order is known.
- Factor p-1 for the public key (you can use Sage's factor() function or any other computer algebra package or implementation you like) and identify a subgroup with two properties: (1) you can efficiently compute discrete logs in this subgroup, and (2) the order of the subgroup is large enough that you can uniquely recover a private key as short as the ones PBP uses.
- Use your Pohlig-Hellman implementation to recover the private key, and then use the private key to decrypt the ciphertext. You can use our code to generate your own keys with known private keys and plaintext to test your implementation.
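
The interfaces described in the steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference solution: the function names are hypothetical, it uses Python integers rather than Sage, it picks baby step-giant step over Pollard rho, and the Pohlig-Hellman routine only handles a subgroup whose order is a product of distinct primes (prime powers need an extra lifting step that is not shown).

```python
from math import isqrt


def brute_force_dlog(g, p, t, q):
    """Try exponents 0..q-1 until g^x = t (mod p)."""
    acc = 1
    for x in range(q):
        if acc == t % p:
            return x
        acc = acc * g % p
    raise ValueError("t is not in the subgroup generated by g")


def bsgs_dlog(g, p, t, q):
    """Baby step-giant step: roughly sqrt(q) time and memory."""
    m = isqrt(q) + 1
    # Baby steps: remember g^j for 0 <= j < m.
    table = {}
    acc = 1
    for j in range(m):
        table.setdefault(acc, j)
        acc = acc * g % p
    # Giant steps: multiply t by g^(-m) until we land in the table.
    step = pow(g, -m, p)  # negative exponents need Python 3.8+
    gamma = t % p
    for i in range(m):
        if gamma in table:
            return (i * m + table[gamma]) % q
        gamma = gamma * step % p
    raise ValueError("t is not in the subgroup generated by g")


def crt(pairs):
    """Given (x_i, p_i) with pairwise coprime p_i, return z with z = x_i mod p_i."""
    z, modulus = 0, 1
    for x_i, p_i in pairs:
        # Choose k so that z + modulus * k = x_i (mod p_i).
        k = (x_i - z) * pow(modulus, -1, p_i) % p_i
        z += modulus * k
        modulus *= p_i
    return z


def pohlig_hellman(g, p, t, prime_factors):
    """Discrete log of t base g, where g has order prod(prime_factors).

    ASSUMPTION: the primes are distinct (squarefree subgroup order).
    """
    order = 1
    for q in prime_factors:
        order *= q
    residues = []
    for q in prime_factors:
        cofactor = order // q
        g_q = pow(g, cofactor, p)  # generates the subgroup of order q
        t_q = pow(t, cofactor, p)  # t projected into that subgroup
        residues.append((bsgs_dlog(g_q, p, t_q, q), q))
    return crt(residues)
```

For example, 3 generates the full group of order 30 = 2 * 3 * 5 modulo 31, so `pohlig_hellman(3, 31, pow(3, 17, 31), [2, 3, 5])` should recover the exponent 17. Small cases like this are a convenient way to check your own implementation before running it on the 512-bit key.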

Please implement your own discrete log algorithms. You do not need to implement your own factoring algorithm. The code used to encrypt the homework is here. It uses Sage for mathematical calculations and PyCryptodome for symmetric encryption.

You may discuss this assignment in small groups with classmates, but please code and write up your solutions yourself. Please credit any collaborators you discussed with and any references you used.

- Python3 solver script named "hw5-sol.py"
- LaTeX-typeset answer PDF named "hw5-solutions.pdf"

The autograder...

- ... runs your script on an Ubuntu 22.04 Docker image
- ... has a 10-minute time limit and is resource constrained (0.5 vCPU, 0.75 RAM)
- ... runs your code in a directory that contains an encrypted file named "hw5.pdf.enc.asc"
- ... will use these file names, but the file contents will not be the same as for your decryption challenge! We will encrypt a different version of hw5.pdf, using the same code that was used to encrypt the homework and the same value of p. Because this code selects a random value for x, the answer to the discrete log problem will be different (as will the contents of key.pub). Feel free to ask any clarifying questions on Piazza.

For reference, we give some excerpts from the OpenPBP RFC, inspired by the OpenPGP RFC.

3.2. Multiprecision Integers

Multiprecision integers (also called MPIs) are unsigned integers used to hold large integers such as the ones used in cryptographic calculations. An MPI consists of two pieces: a four-octet little-endian scalar that is the length of the MPI followed by a string of octets that contain the actual integer.

5.5.2. Public-Key Formats

A public key contains:

- MPI of Elgamal prime p;
- MPI of Elgamal group generator g;
- MPI of Elgamal public key value y (= g**x mod p where x is secret).

5.1. Public-Key Encrypted Messages

The body of the message consists of a string of octets that is the encrypted session key, followed by the symmetrically encrypted data. The symmetric session key is derived from m by interpreting m as an appropriate length string of octets.

- MPI of Elgamal (Diffie-Hellman) value g**k mod p.
- MPI of Elgamal (Diffie-Hellman) value m * y**k mod p.
- Encrypted data, the output of the AES symmetric-key cipher operating in CBC mode, with PKCS 7 padding.

6.2. Forming ASCII Armor

When PBP encodes data into ASCII Armor, it puts specific headers around the Radix-64 encoded data, so PBP can reconstruct the data later. A PBP implementation MAY use ASCII armor to protect raw binary data. PBP informs the user what kind of data is encoded in the ASCII armor through the use of the headers.

Concatenating the following data creates ASCII Armor:

- An Armor Header Line, appropriate for the type of data
- The ASCII-Armored data
- The Armor Tail, which depends on the Armor Header Line

An Armor Header Line consists of the appropriate header line text surrounded by five (5) dashes ('-', 0x2D) on either side of the header line text. The header line text is chosen based upon the type of data that is being encoded in Armor, and how it is being encoded. Header line texts include the following strings:

BEGIN PRETTY BAD ENCRYPTED MESSAGE
    Used for encrypted files.

BEGIN PRETTY BAD PUBLIC KEY BLOCK
    Used for armoring public keys.

Note that all these Armor Header Lines are to consist of a complete line. That is to say, there is always a line ending preceding the starting five dashes, and following the ending five dashes. The header lines, therefore, MUST start at the beginning of a line, and MUST NOT have text other than whitespace following them on the same line. These line endings are considered a part of the Armor Header Line for the purposes of determining the content they delimit.
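
The excerpts above suggest roughly the following parsing steps, sketched here in plain Python with hypothetical function names. Several details are assumptions that the excerpts do not settle and that you should check against the provided encryption code: that the four-octet length counts octets (not bits), that the integer octets themselves are big-endian as in OpenPGP, and that the armor body contains only Radix-64 (base64) lines with no extra header fields or checksum line.

```python
import base64
import struct


def strip_armor(text):
    """Return the raw bytes between the armor header and tail lines."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Drop blank lines and any line that starts with five dashes.
    body = [ln for ln in lines if ln and not ln.startswith("-----")]
    return base64.b64decode("".join(body))


def read_mpi(buf, offset=0):
    """Parse one MPI from buf; return (value, offset of the next field)."""
    # Four-octet little-endian length, as stated in section 3.2.
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length  # ASSUMPTION: length is a count of octets
    if end > len(buf):
        raise ValueError("truncated MPI")
    # ASSUMPTION: integer octets are big-endian (not stated in the excerpt).
    return int.from_bytes(buf[start:end], "big"), end


def read_public_key(raw):
    """Read p, g, y in the order listed in section 5.5.2."""
    p, off = read_mpi(raw)
    g, off = read_mpi(raw, off)
    y, off = read_mpi(raw, off)
    return p, g, y


def recover_m(c1, c2, x, p):
    """Undo Elgamal: c1 = g^k, c2 = m * y^k, so m = c2 * c1^(-x) mod p."""
    return c2 * pow(c1, -x, p) % p  # negative exponents need Python 3.8+
```

Once the private exponent x is known, `recover_m` applied to the two MPIs at the start of the encrypted message gives m, from which the AES session key is derived as described in section 5.1.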