Project Euler Problem 59

Another fun one! Problem 59 reminds me of those long-past days when I was a young boy, heavily armed with water pistols and secure in my friend’s tree house, trying to develop secret alphabets and substitutions ciphers to keep our secret documents out of the hands of our evil sisters, who generally had no interest in what we were writing anyway.

But wait! This problem gives us a secret message stolen straight from the sisters’ secret laboratory. Let’s see if we can’t first break the simple rotating XOR cipher of this problem using sheer brute force. Since we know the keylength (3), and that the key is lowercase ASCII and the text is also mostly lowercase ascii, this will be a very easy task indeed. We’ll just try to find the particular keys which maximize the number of ascii characters present in the decoded message.

(defn num-ascii-chars 
  "Returns the number of ascii characters found in string s."
  [s]
  (count (filter #(or (and (<= (int \a) %) (<= % (int \z)))
                      (and (<= (int \A) %) (<= % (int \Z)))
                      (= (int \newline) %)
                      (= (int \space) %))
                 (map #(int %) s))))

(defn decrypt-xor
  "Try to decrypt integer seq s with cyclically repeating key k."
  [s k]
  (map bit-xor s (cycle k)))

(defn euler-59 [file]
  (let [chars (map #(Integer/parseInt %) (re-seq #"\d+" (rp file)))
        lowers (map #(int %) "abcdefghijklmnopqrstuvwxyz")
        best-match (reduce
                    #(if (> (second %1) (second %2)) %1 %2)
                    (for [a lowers
                          b lowers
                          c lowers]
                      [(str (char a) (char b) (char c))
                       (num-ascii-chars (decrypt-xor chars [a b c]))]))
        decrypted (decrypt-xor chars (map #(int %) (first best-match)))]
    (println "key: " best-match)
    (println "decrypted: " (apply str (map #(char %) decrypted)))
    (reduce + decrypted)))

(time (euler-59 "/path/to/cipher1.txt")) ;; "Elapsed time: 21275.297615 msecs"

We were able to safely decrypt the encrypted message (obviously a red herring planted to disguise their true intentions…it mentions nothing about an attack on our fortified encampment) in under a minute this time, but what if our scheming sisters start using longer keys with which to communicate amongst themselves? The complexity of this type of exhaustive search is going to grow exponentially with each new digit they add! We will be buried in computational complexity long before we learn of their true plans.

We need to find an algorithm that works in linear time if possible, and improve the scoring system that we have been using. A simple way to do this for substitution ciphers is to perform a letter frequency analysis; in written English, vowels appear far more often than consonants like q or z, and we can make use of this fact to select the letter that maximizes the likelihood of the decrypted text having a letter frequency similar to English.

Since we know that our particular rotating cipher uses a 3-letter key, we can make three lists and solve for each digit independently. The most likely candidate for the key is thus the letter that maximizes the letter frequency probability when XOR’d over the list.

(def letters "abcdefghijklmnopqrstuvwxyz") 

;; For English text. From http://en.wikipedia.org/wiki/Letter_frequency
(def letter-freq {\a 11.602, \b 4.702, \c 3.511, \d 2.670, \e 2.000,
                  \f 3.779,  \g 1.950, \h 7.232, \i 6.286, \j 0.631,
                  \k 0.690,  \l 2.705, \m 4.374, \n 2.365, \o 6.264,
                  \p 2.545,  \q 0.173, \r 1.653, \s 7.755, \t 16.671
                  \u 1.487,  \v 0.619, \w 6.661, \x 0.005, \y 1.620,
                  \z 0.050})

(defn xor-chars [a b]
  (char (bit-xor (int a) (int b))))

(defn char-score [c s]
  (reduce + (remove nil? (map #(letter-freq (xor-chars % c)) s))))

(defn best-char [s]
  (first (first (sort-by val > (zipmap letters
                                       (map #(char-score % s) letters))))))

(defn euler-59-revised [keylength file]
  (let [msg (map #(char (Integer/parseInt %)) (re-seq #"\d+" (rp file)))
        ks (apply str (for [k (range 0 keylength)]
                        (best-char (take-nth keylength (drop k msg)))))
        decrypted (apply str (map xor-chars msg (cycle ks)))]
    (println "key: " ks)
    (println "decrypted: " decrypted)
    (reduce + (map #(int %) decrypted))))

(time (euler-59-revised 3 "/zzz/work/cipher1.txt"))
;; "Elapsed time: 27.96454 msecs"

Much better! Now if those pesky girls add extra digits to their rotating XOR cipher, it only adds a linear increase in complexity to our decoding operation.

As an aside: getting the key with the maximum value out of a hash map isn’t as straightforward as I would like. For the curious, here is another idiom for computing best-char:

(defn best-char [s]
  (first (apply max-key second (zipmap letters (map #(char-score % s) letters)))))

And for our final trick, we can avoid even the linear increase in computation time by performing the computation of all the digits in parallel, meaning that finding keys up to n-digits long (where n is the number of cores in your computer) will take roughly the same amount of time as it takes to find a 3-digit key.

Here’s our final version, with the println’s stripped out and a pmap introduced. Note the single-character difference between the sequential version and the parallel version:

(defn euler-59-sequential [keylength file]
  (let [msg (map #(char (Integer/parseInt %)) (re-seq #"\d+" (rp file)))
        ks (map (fn [k] (best-char  (take-nth keylength (drop k msg))))
                 (range 0 keylength))
        decrypted (apply str (map xor-chars msg (cycle ks)))]
    (reduce + (map #(int %) decrypted))))

(defn euler-59-parallel [keylength file]
  (let [msg (map #(char (Integer/parseInt %)) (re-seq #"\d+" (rp file)))
        ks (pmap (fn [k] (best-char  (take-nth keylength (drop k msg))))
                 (range 0 keylength))
        decrypted (apply str (map xor-chars msg (cycle ks)))]
    (reduce + (map #(int %) decrypted))))

(dotimes [_ 5]
  (time (euler-59-sequential 3 "/zzz/work/cipher1.txt")))

"Elapsed time: 26.849201 msecs"
"Elapsed time: 25.795924 msecs"
"Elapsed time: 25.956343 msecs"
"Elapsed time: 26.187312 msecs"
"Elapsed time: 29.941494 msecs"


(dotimes [_ 5]
  (time (euler-59-parallel 3 "/zzz/work/cipher1.txt")))
"Elapsed time: 11.838932 msecs"
"Elapsed time: 11.529343 msecs"
"Elapsed time: 19.139515 msecs"
"Elapsed time: 11.350502 msecs"
"Elapsed time: 10.873593 msecs"

A simple change has more than doubled the speed of the algorithm, although not quite to the 3x faster. Regardless, we are roughly 2000 times faster than our original naive implementation!

The taste of success brought on by efficient, parallel decryption has bolstered my juvenile mood and I will now take this moment to laugh maniacally: Muhuhuhahahaha!

Art is Never Finished

...Only Abandoned. – Leonardo da Vinci

Project Euler Problem 59