CSE 207B, Fall 2023
Homework 5: Computing discrete logs

The Pretty Bad Privacy encryption tool can be used to insecurely encrypt files to a 512-bit ElGamal public key using 128-bit AES.

The pdf file for your next homework assignment has been encrypted using PBP to this ElGamal public key.

The encrypted file is available here. Your task is to use the attack described in this paper by van Oorschot and Wiener to compute the discrete log of the public key and use it to decrypt the homework so you can do the rest of the problems. This is a real vulnerability when implementations don't generate primes carefully: your professor has published a paper where we used this exact attack to compromise several hundred hosts on the Internet. (See Section 3.5, "Attacks on composite-order subgroups".)

The following steps may be helpful:

  1. Install Sage and get it working. Anaconda may come in handy if you run into package issues.
  2. Implement a brute force discrete log algorithm. Your function should take a generator g, a prime p, a target t, and a subgroup order q, and brute force all possible exponents until it finds an x such that g^x = t mod p, when g generates a subgroup of order q mod p.
  3. Implement Pollard rho or baby step-giant step. Your function should take a generator g, a prime p, a target t, and a subgroup order q, and in expected time sqrt(q) output x such that g^x = t mod p.
  4. Implement the Chinese remainder theorem. Your function should take a list of pairs (xi,pi) of residues xi and primes pi and output a z such that z = xi mod pi for each pi.
  5. Use your functions above to implement the Pohlig-Hellman algorithm when the prime factorization of the group order is known.
  6. Factor p-1 for the public key (you can use Sage's factor() function or any other computer algebra package or implementation you like) and identify a subgroup with the properties that (1) you can efficiently compute discrete logs in this subgroup (2) the order of your subgroup is large enough that you can uniquely recover a private key as short as PBP uses.
  7. Use your Pohlig-Hellman implementation to recover the private key, and then use the private key to decrypt the ciphertext. You can use our code to generate your own keys with known private keys and plaintext to test your implementation.

Please implement your own discrete log algorithms. You do not need to implement your own factoring algorithm. The code used to encrypt the homework is here. It uses Sage for mathematical calculations and PyCryptodome for symmmetric encryption.

You may discuss this assignment in small groups with classmates, but please code and write up your solutions yourself. Please credit any collaborators you discussed with and any references you used.

Submission requirements and autograder

Submission requirements:

The autograder...

Additional resources

For reference, we give some excerpts from the OpenPBP RFC, inspired by the OpenPGP RFC.

3.2.  Multiprecision Integers

   Multiprecision integers (also called MPIs) are unsigned integers used
   to hold large integers such as the ones used in cryptographic
   calculations.

   An MPI consists of two pieces: a four-octet little-endian scalar
   that is the length of the MPI followed by a string of octets that
   contain the actual integer.

5.5.2.  Public-Key Formats

   A public key contains:
        - MPI of Elgamal prime p;

        - MPI of Elgamal group generator g;

        - MPI of Elgamal public key value y (= g**x mod p where x
           is secret).

5.1.  Public-Key Encrypted Messages

The body of the message consists of a string of octets that is the
encrypted session key, followed by the symmetrically encrypted data.
The symmetric session key is derived from m by interpreting m as an
appropriate length string of octets.

     - MPI of Elgamal (Diffie-Hellman) value g**k mod p.

     - MPI of Elgamal (Diffie-Hellman) value m * y**k mod p.

     - Encrypted data, the output of the AES symmetric-key cipher
       operating in CBC mode, with PKCS 7 padding.

6.2.  Forming ASCII Armor

   When PBP encodes data into ASCII Armor, it puts specific headers
   around the Radix-64 encoded data, so PBP can reconstruct the data
   later.  A PBP implementation MAY use ASCII armor to protect raw
   binary data.  PBP informs the user what kind of data is encoded
   in the ASCII armor through the use of the headers.

   Concatenating the following data creates ASCII Armor:

     - An Armor Header Line, appropriate for the type of data

     - The ASCII-Armored data

     - The Armor Tail, which depends on the Armor Header Line

   An Armor Header Line consists of the appropriate header line text
   surrounded by five (5) dashes ('-', 0x2D) on either side of the
   header line text.  The header line text is chosen based upon the type
   of data that is being encoded in Armor, and how it is being encoded.
   Header line texts include the following strings:

   BEGIN PRETTY BAD ENCRYPTED MESSAGE
       Used for encrypted files.

   BEGIN PRETTY BAD PUBLIC KEY BLOCK
       Used for armoring public keys.

   Note that all these Armor Header Lines are to consist of a complete
   line.  That is to say, there is always a line ending preceding the
   starting five dashes, and following the ending five dashes.  The
   header lines, therefore, MUST start at the beginning of a line, and
   MUST NOT have text other than whitespace following them on the same
   line.  These line endings are considered a part of the Armor Header
   Line for the purposes of determining the content they delimit.