Breaking Civil War Codes and Encryption
by Lee A. Taylor
I will deal with two issues here. The first is recovering the correspondence between flag waves and letters in wig-wag encoding, as used by both sides during the War for Southern Independence (sometimes known as the American Civil War). The second, which is much more complicated, is reading encrypted messages.
FIGURING OUT THE ENCODING OF LETTERS USING FLAG WAVES
The recovery of the correspondence between flag waves and the underlying letters is pretty straightforward and could easily have been done by an experienced signalman, even over 140 years ago. All it requires is the ability to translate the 3 flag waves, with their corresponding pauses, into numbers, so that each message sent yields a sequence such as:
2-pause-1-1-1-pause-1-3-1-pause-2-1-pause-1-pause-1-1-1-1-pause-1-2-2-1-3-1-1-2-pause-2-2-3-1-1-pause-2-pause-2-pause-1-1-pause-1-2-1-2-pause-2-1-2-2-pause-1-1-2-pause-2-1-pause-2-1-1-2-3-3-3
NOTE: I purposely did not use any preconcerted codes or short endings, as that complicates the discussion slightly. This message can be parsed into "letter groups" by using the pauses and 3's as the separators between groups. Doing this, we see that
1 occurs 3 times
2 occurs 3 times
11 occurs 2 times
12 occurs 0 times
and so on…
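The parsing and counting just described can be sketched in a few lines of C. This is only an illustration: the function names parse_groups and count_group and the size limits are assumptions made for this sketch, and the signal is assumed to be written exactly as in the sequence above (the digits 1, 2, 3 and the word "pause", joined by dashes).

```c
#include <string.h>

#define MAX_GROUPS 256      /* assumed limit on groups per message */
#define MAX_GROUP_LEN 8     /* assumed limit on waves per group */

/* Split a transcribed signal such as "2-pause-1-1-1-pause-1-3-1" into
   letter groups. Waves 1 and 2 build up a group; "pause" or a 3 closes it.
   Returns the number of groups stored in groups[]. */
int parse_groups(const char *signal, char groups[][MAX_GROUP_LEN + 1],
                 int max_groups)
{
    int count = 0;          /* groups stored so far */
    int len = 0;            /* length of the group being built */
    const char *p = signal;

    while (*p != '\0') {
        if (*p == '1' || *p == '2') {
            if (count < max_groups && len < MAX_GROUP_LEN)
                groups[count][len++] = *p;
            p++;
        } else if (*p == '3' || strncmp(p, "pause", 5) == 0) {
            if (len > 0 && count < max_groups) {
                groups[count][len] = '\0';
                count++;
            }
            len = 0;
            p += (*p == '3') ? 1 : 5;
        } else {
            p++;            /* skip the dashes */
        }
    }
    if (len > 0 && count < max_groups) {   /* close a trailing group */
        groups[count][len] = '\0';
        count++;
    }
    return count;
}

/* Count how many of the n parsed groups equal the wanted group. */
int count_group(char groups[][MAX_GROUP_LEN + 1], int n, const char *wanted)
{
    int i, hits = 0;

    for (i = 0; i < n; i++)
        if (strcmp(groups[i], wanted) == 0)
            hits++;
    return hits;
}
```

Calling parse_groups on the sample message and then count_group for "1", "2", "11" and "12" gives the counts listed above. With many intercepted messages you would keep one running total per group.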
After collecting enough messages, you could compare these counts against the monographic distribution of letters in the English language (i.e., the probability of an E occurring in text, the probability of a T occurring in text, and so on) to find out which letter is which. See the table below for an example distribution from a corpus of English text.
| LETTER | % time letter occurs among all letters in sample text |
|---|---|
| E | 12.350% |
| T | 9.116% |
| N | 8.445% |
| O | 8.327% |
| A | 7.986% |
| I | 7.904% |
| R | 6.845% |
| S | 5.505% |
| H | 5.281% |
| L | 4.364% |
| D | 3.823% |
| C | 3.093% |
| F | 2.646% |
| U | 2.329% |
| M | 2.235% |
| G | 1.964% |
| Y | 1.952% |
| P | 1.788% |
| B | 1.282% |
| V | 1.176% |
| W | 0.882% |
| K | 0.212% |
| Q | 0.188% |
| J | 0.176% |
| X | 0.082% |
| Z | 0.047% |
A much more familiar way to
solve this puzzle would be to assign letters to each group, such as A for 1, B
for 2, C for 11, D for 12, E for 21, F for 22, G for 111, H for 112, I for 121,
J for 122, K for 211, L for 212, M for 221, N for 222, O for 1111, P for 1112,
Q for 1121, R for 1122, S for 1211, T for 1212, U for 1221, V for 1222, W for
2111, X for 2112, Y for 2121, Z for 2122, 0 (since I ran out of letters) for
2211, 1 for 2212, 2 for 2221 and 3 for 2222 and then the above message
translates to
BGA AEAOU HF CBBCTZHEX
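The assignment above follows a regular pattern (shorter groups first, and 1 before 2 within a length), so it can be computed rather than looked up. The following C sketch is one way to do that under that assumption. The names group_to_symbol and translate_signal are invented for this illustration, and showing each 3 as a space is a choice made here to keep the word breaks visible.

```c
#include <string.h>

/* Provisional symbols in the order used above. The digits 0-3 cover the
   four groups left over after Z. */
static const char SYMBOLS[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123";

/* Map one group of 1 to 4 waves (e.g. "1221") to its provisional symbol,
   or '?' if the group is empty, too long, or has other characters. */
char group_to_symbol(const char *group)
{
    int len = (int)strlen(group);
    int i, value = 0;

    if (len < 1 || len > 4)
        return '?';
    for (i = 0; i < len; i++) {
        if (group[i] != '1' && group[i] != '2')
            return '?';
        value = value * 2 + (group[i] - '1');   /* read 1 as 0 and 2 as 1 */
    }
    /* 2^len - 2 shorter groups come first: 0, 2, 6, 14 for len 1..4 */
    return SYMBOLS[(1 << len) - 2 + value];
}

/* Translate a whole transcribed signal. A pause ends a letter; a 3 ends a
   letter and is shown as one space (runs of 3s collapse, and a trailing
   space is dropped). out_size is the size of the out buffer. */
void translate_signal(const char *signal, char *out, size_t out_size)
{
    char group[8];
    size_t glen = 0, o = 0;
    const char *p = signal;

    if (out_size == 0)
        return;
    for (;;) {
        int at_end = (*p == '\0');
        int is_three = (*p == '3');
        int is_pause = (strncmp(p, "pause", 5) == 0);

        if (*p == '1' || *p == '2') {
            if (glen < sizeof group - 1)
                group[glen++] = *p;
        } else if (at_end || is_three || is_pause) {
            if (glen > 0) {
                group[glen] = '\0';
                if (o + 1 < out_size)
                    out[o++] = group_to_symbol(group);
                glen = 0;
            }
            if (is_three && o > 0 && out[o - 1] != ' ' && o + 1 < out_size)
                out[o++] = ' ';
        }
        if (at_end)
            break;
        p += is_pause ? 5 : 1;
    }
    while (o > 0 && out[o - 1] == ' ')
        o--;
    out[o] = '\0';
}
```

Run on the sample signal, translate_signal produces the same provisional text as the hand translation above.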
After you've collected
enough of these messages, the problem becomes EXACTLY like solving Cryptograms
in the newspaper or in a puzzle book.
BREAKING THE ENCRYPTION USED BY THE SOUTH DURING THE WAR FOR SOUTHERN INDEPENDENCE
First, I need to discuss the
method of encryption used by The South and warn you that this section (after
the description of the method of encryption) gets pretty heavy into the
mathematics. The encryption scheme uses the following array of letters:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
Once you have this array in
hand, you next need a code phrase (the above table was the only table that
appears to have been used during The War). Let's use the key phrase
COME RETRIBUTION
Next, remove the spaces to get:
COMERETRIBUTION
Now, if you want to encode
the message
THE ENEMY IS ATTACKING
you would first remove all of the spaces from the
message to get
THEENEMYISATTACKING
and then pair the letters of your message (the
"plain text") up with the code phrase (the "key") to get
COMERETRIBUTION
THEENEMYISATTACKING
Notice that there are not enough letters in our key phrase to match up with all of the letters of the message we wish to send. So, the procedure is to repeat the key phrase as many times as necessary. For this example, you get:
COMERETRIBUTIONCOME
THEENEMYISATTACKING
The letter in the top row tells you which COLUMN to
go to in the encryption array and the corresponding letter in the bottom row
tells you which ROW to go to in the encryption array. So, for example, the
first C of COMERETRIBUTION and the T of THE line up, so you would look up the
character in column C of the array and row T, which is the letter V, which is
the first letter of the encrypted message that you will send (the encrypted
message is called the "cipher text"). The first O of COMERETRIBUTION
and the H of THE line up, so the next letter in the encrypted message comes
from COLUMN O and ROW H, which is the letter V. Continuing in this way, we get
the following letters:
| COLUMN | ROW | CIPHER TEXT |
|---|---|---|
| C | T | V |
| O | H | V |
| M | E | Q |
| E | E | I |
| R | N | E |
| E | E | I |
| T | M | F |
| R | Y | P |
| I | I | Q |
| B | S | T |
| U | A | U |
| T | T | M |
| I | T | B |
| O | A | O |
| N | C | P |
| C | K | M |
| O | I | W |
| M | N | Z |
| E | G | K |
Which gives the encrypted message:
VVQIEIFPQTUMBOPMWZK
into which you could put the original spacing back to get
VVQ IEIFP QT UMBOPMWZK
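The lookup procedure can be sketched in C as follows. This is an illustration rather than anything the signal officers had: the names build_table and encrypt_with_table are made up for this sketch, and the key is assumed to be a non-empty string of capital letters with the spaces already removed.

```c
#include <ctype.h>
#include <string.h>

/* Build the 26 x 26 array: the first row is the plain alphabet and every
   later row is the row above it shifted one place to the left. */
void build_table(char table[26][26])
{
    int r, c;

    for (c = 0; c < 26; c++)
        table[0][c] = (char)('A' + c);
    for (r = 1; r < 26; r++) {
        for (c = 0; c < 25; c++)
            table[r][c] = table[r - 1][c + 1];
        table[r][25] = table[r - 1][0];
    }
}

/* Encrypt plain with a repeating key by table lookup. Non-letters in plain
   are skipped and letters are upper-cased. out needs room for
   strlen(plain) + 1 characters. */
void encrypt_with_table(char table[26][26], const char *plain,
                        const char *key, char *out)
{
    size_t klen = strlen(key);
    size_t k = 0, o = 0;
    const char *p;

    for (p = plain; *p != '\0'; p++) {
        int row, col;

        if (!isalpha((unsigned char)*p))
            continue;
        row = toupper((unsigned char)*p) - 'A';   /* plain-text letter */
        col = key[k] - 'A';                       /* key letter */
        out[o++] = table[row][col];
        k = (k + 1) % klen;                       /* repeat the key phrase */
    }
    out[o] = '\0';
}
```

Encrypting "THE ENEMY IS ATTACKING" with the key "COMERETRIBUTION" this way reproduces the cipher text VVQIEIFPQTUMBOPMWZK worked out by hand above.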
Now, I know this looks like it is probably a pretty good method of encryption and, if the code phrase is changed often enough (like daily!) and messages are kept very short, it can be, at least in a world without computers.
The first HUGE flaw in this
scheme is in how it was used by the Confederates. For common usage, there were
only 3 key phrases ever used! They were
COMPLETE VICTORY
COME RETRIBUTION
MANCHESTER BLUFF
I’ve read numerous references that hint at the use of private keys by various spies, such as Belle Boyd and Rose O’Neal Greenhow. I’ve never seen what these keys actually were; each spy supposedly only ever had one, and only one or two other people actually knew it (e.g., Major/Lt. Col. William Norris, Chief of the Confederate Signal and Secret Service in Richmond, would have been one of the people who knew the keys). Anyhow, with only 3 key phrases, it wasn’t too hard to figure out how to break the code, once you had solved a few messages.
But the Confederates made it even easier! Before sending text enciphered with
one of the 3 keys above, they would send a plaintext message with the key
phrase in it! For example, “With the outcome of this battle, complete victory
is ours”. The signal officer at the other end, knowing what the 3 possible keys
were, could easily pick out which one was used and decipher the message. It is
important to note that only the 30-40 signal officers in the Confederate Signal
and Secret Service (ranging in rank from Sergeant to Captain, except for
Norris) were actually taught the encryption method.
The technique I’m going to describe in this article is actually not a 21st century or even a 20th century technique. During the Civil War, if one of the Yankees had been up on his mathematical literature from Prussia, he would have found the following work:
Friedrich Kasiski, Die Geheimschriften und die Dechiffrierkunst ("Secret Writing and the Art of Deciphering"), 1863.
It described a complete attack on the Vigenère cipher as used by the Confederates. If you search for “Kasiski test” on the Internet, you can find a number of readable versions of his algorithm that (at their core) use a higher-order language model to solve the cipher. It turns out that Charles Babbage had actually developed this attack in 1854. However, due to the need for Information Security in Great Britain, it was decided not to publish his results (the Russians also used the Vigenère cipher, and the English didn’t want them to stop).
TACTICAL VS. STRATEGIC
Before proceeding, let’s stop for a reality check. If you are a field commander and you would like an artillery unit a mile away to open fire, do you encrypt the message? Probably not (at least not back then; encrypted radios make this a triviality today). First, it would take too long to encrypt the message; you want the artillery to open fire NOW! Second, the lifetime of the message is very short. Within a few moments of the message being sent, the Yankees are going to know what it said, so it doesn’t really matter if they are able to read it five minutes after intercepting it. Encryption was primarily reserved for messages of strategic importance (e.g., troop movements over the next few days or supply line information). These messages tended to be longer and the enemy would be willing to spend more time decrypting them. It is the ratio of the length of the message to the length of the key that is going to be the main lever in breaking this code.
For a mathematical diversion (which could lead to a more sophisticated attack), let me point out something of the mathematical structure of this code. If you’ve studied modular arithmetic, you might have recognized the structure in the array of letters used to encrypt in the Vigenère cipher. Except for a handful of mathematicians, modular arithmetic was not commonly taught at the time (its main use was in Number Theory; see, for example, Carl Friedrich Gauss’ 1801 text, Disquisitiones Arithmeticae). Notice that if we assign the number 0 to A, 1 to B, and so on up to 25 for Z, then the "letter" (encoded using the numbers 0, 1, …, 25) in row i and column j of the array can be figured out by the simple formula
(i + j) (mod 26)
For example, in encoding the first letter of our message, we went to ROW T (the T from the word THE in the plain text) and COLUMN C (C from COMERETRIBUTION). T is the 20th letter of the alphabet, so it has the code 19, and C is the 3rd letter, so it has the code 2 (remember, we started with A=0 NOT A=1). So the letter at position (T,C) = (19,2) is (19 + 2) (mod 26) = 21, which corresponds to the letter V, which is exactly what we got before!
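One consequence of the formula is that decryption is just the same arithmetic run backwards: subtract the key letter instead of adding it. The short C sketch below shows both directions. The function names are invented for this illustration, and both the text and the key are assumed to contain only the capital letters A to Z, with a non-empty key.

```c
#include <string.h>

/* Letters are treated as numbers: A=0, B=1, ..., Z=25. */

/* Row i, column j of the array holds letter (i + j) mod 26. */
char encrypt_letter(char plain, char key)
{
    return (char)('A' + ((plain - 'A') + (key - 'A')) % 26);
}

/* Undo the addition. Adding 26 before taking the remainder keeps the
   value from going negative. */
char decrypt_letter(char cipher, char key)
{
    return (char)('A' + ((cipher - 'A') - (key - 'A') + 26) % 26);
}

/* Decrypt a whole string with a repeating key. out needs room for
   strlen(cipher) + 1 characters. */
void decrypt_text(const char *cipher, const char *key, char *out)
{
    size_t klen = strlen(key);
    size_t i;

    for (i = 0; cipher[i] != '\0'; i++)
        out[i] = decrypt_letter(cipher[i], key[i % klen]);
    out[i] = '\0';
}
```

For instance, encrypt_letter('T', 'C') gives 'V', and decrypt_text applied to VVQIEIFPQTUMBOPMWZK with the key COMERETRIBUTION gives back THEENEMYISATTACKING.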
Now, I'm going to assume that the code phrase and the message were both in American English as spoken during The War (the method gets a little more complicated when shorthand or preconcerted codes are introduced, but not too much, and the general technique will still work, so I'll stick with the easy case). This means that we can use the monographic distribution table found earlier in this article (for different languages, you can find appropriate distributions on the Internet or use a large body of text in the language and compute estimates for the probabilities yourself).
Before getting to the point, though, where we can
use the distribution of letters in English, we’re going to need to figure out
how long the key phrase is. Kasiski’s approach simply notes that in
Indo-European languages, there are a lot of common trigraphs (3 letters
occurring in a row). In English, “THE” is the most frequent trigraph (not only
is it a word, but it occurs in common words such as THEre, THEir, farTHEr,
norTHErn, moTHEr, souTHErn, etc.). Kasiski’s approach is to find how far apart
repeated trigraphs are in the encrypted message and stare at these to figure
out how long the key phrase is. Let’s use the following encrypted message as
our example:
JSMHHYTIBFLLIFZACRRFVMYMSHOQFTKBUEAYGVXNGTRURPXQFJXNRZUWHUANPRURXGTMIMLROSAGFMPPSNIVPNXACSCBPEDXHUIZBTDSWWGFFVIGIMDYBDSQCGDIXEKUAUBXXIEEVMWVSYKWCUVKCSQFKSLVFVVTOIXCFKBSXYEMTWOZXLSECHQQFRXPEJFEVCGDSFEBIGZINQBTZVPURSICHLZDIFUWFUODMVWHIYVUKBSEOOEXVVLKWQOKKVNUSFLZWMFJBWVWOAFZQXKLXDMOAXBWGHFAQKLXDJVNBKOAJOHIESMYQOALMWMGRNCKLXDMOCYOSAGFMPYSHBMSMTZALTSYEZRLZVBWMQJRACGGRREVIWYMECOTWSEUILKWXUMKVUKAMRUABKPELTEKVVVFLVXAIMFIMPSEUPGXJLHLTEBXVCGCDBIRVMFJFGHDWAIBAVKLPRZECMPWAMMAYYEWSMUNXZKVVVPVRAMYQTMBLSBHHTIDSNEBBCGBCZQFDSNRBXPUWKWGFCHELVTAVZEMMWKAPSJXUERRVEGHDSBXSDXFJKVLFLBKYGQKZCFYPZTMBHESIGFNIRFEVBPDNLURYVQXYIKPWVWTVDNUGMVFYGUBIYBZOEOMIMKLHLBICGLSECBOIUSBEOUBXUOYNHTIUEFROFSHCQNPOZHTVHJAUBXZWIGFQEJXHWBIYFWIAVOURJMGVQUBXZQNUSMJKIKTZPMLQBTVVQVZZXIGPOFCGGOCHIFRTELGYXTHUGFUKYXHWMXYETGGTCATJGHCTFWMQBTKBRSIQTKQPHIZCIKGUSEWVXQWYBVGGTIOXZSGJBPNAMQBOAMRUIKFNUBXJFVIOPIJPXWBCYAQBQVCIEKGAKPFZEIBXCBPVVEKFNUBXIFZAOZHZRMYMFPXVHBHHTIVRXDGMYTDWAIHTIZVYIWONKMHVTSRVFQMYMNINVHNKBEAVWMFNUBXAVRPOZHFEACMBPBVUFWTRMTMXEBQCVSSGUHAKLEKUBIYIIGFGGMRUFKZVHCGOSIGFKXYMGXKMYTVOYQBSXYIORTMYRKZBUWZKLTHEBIYKMOEQTFLVEKDGBMKMUNTREXYIFFDFGXVHFQTFLVXPFJSCZIRRUCRXYIXEMNSFWJVPUFSNEKUEBLKMBGQBFLVGHDUBHWMFBHHTISVBXIEYLBCOGZQJKMGKPFGHCBGCWZWDYLKLPQAIHUGQMRKSVFCONXZOPVHTIDFNKQUBBVYGJSESFRXIGPOVZCFUWZXFQTIGMUGLOSVSDXFQHIZPQMPSOGHFIIXAVUPPXUSAVGAJVAXCTTWHZDFCFQEJWMRBFXBVALHCDQVVEVBUYKPWYNGRMIWMUQWCLQCAYWXPIITTPUBXXCGQAMGKSWRGBHWTCAIGFVVIMNQMFYWZYQKFSDSKIWXVXEOGEVRYCEGUKJLVCAFRSOXZRTCTZINZABXSYIEXLZINPXZMEGGBITXYLTMSTVRGTIXCPSNIASYEMSTGBQVRP
This is the encrypted text of a message sent to General Stuart by General Lee just before the Battle of Gettysburg, as we shall see shortly. Anyhow, we get the following distribution of distances between repeated trigraphs in the enciphered text (COMERETRIBUTION is 15 letters long):
| DISTANCE | # common trigraphs that are DISTANCE letters apart |
|---|---|
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
| 7 | 0 |
| 8 | 0 |
| 9 | 0 |
| 10 | 0 |
| 11 | 0 |
| 12 | 0 |
| 13 | 2 |
| 14 | 0 |
| 15 | 4 |
| 16 | 0 |
| 17 | 1 |
| 18 | 1 |
| 19 | 1 |
| 20 | 0 |
| 21 | 0 |
| 22 | 0 |
| 23 | 0 |
| 24 | 0 |
| 25 | 0 |
| 26 | 0 |
| 27 | 1 |
| 28 | 0 |
| 29 | 0 |
| 30 | 9 |
| 31-40 | 0 |
| 41 | 1 |
| 42-44 | 0 |
| 45 | 12 |
| 46 | 1 |
| 47-59 | 0 |
| 60 | 4 |
| 61-70 | 0 |
| 71 | 1 |
| 72 | 0 |
| 73 | 1 |
| 74 | 0 |
| 75 | 10 |
If
I were to continue, you would see that the count is much higher at multiples of
15 (15, 30, 45, 60, 75, 90, 105, 120, 135, …) than anywhere else. This leads us
to guess that the key phrase is 15 letters long (REMEMBER: we don’t know what
the key phrase is yet if we’re the Yankees).
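To make the counting concrete, here is a small C sketch of the tally on its own, plus a helper that adds up the counts at multiples of a candidate key length. The names trigraph_distances and repeats_at_multiples are assumptions of this sketch, and the helper is just one simple way to summarize the table, not part of Kasiski's published method.

```c
#include <string.h>

/* For every pair of identical trigraphs in text, add one to hist[d], where
   d is the distance between their starting positions. hist must have
   max_dist + 1 entries; pairs farther apart than max_dist are ignored. */
void trigraph_distances(const char *text, int hist[], int max_dist)
{
    int n = (int)strlen(text);
    int d, i;

    for (d = 0; d <= max_dist; d++)
        hist[d] = 0;
    for (d = 1; d <= max_dist; d++)
        for (i = 0; i + d + 3 <= n; i++)   /* both trigraphs must fit */
            if (strncmp(text + i, text + i + d, 3) == 0)
                hist[d]++;
}

/* Add up the counts at step, 2 * step, 3 * step, ... up to max_dist. */
int repeats_at_multiples(const int hist[], int max_dist, int step)
{
    int d, total = 0;

    if (step <= 0)
        return 0;
    for (d = step; d <= max_dist; d += step)
        total += hist[d];
    return total;
}
```

In the toy string "ABCXXABCYYABC", the trigraph ABC starts at positions 0, 5 and 10, so the tally is 2 at distance 5 and 1 at distance 10. Keep in mind that any divisor of the true key length (3 and 5 for a 15-letter key) collects the same repeats as the key length itself, so the totals still need to be read with some judgment.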
The next step is to divide the encrypted message into 15 groups, one for each letter of the (as yet unknown) key. The first group will consist of the first letter, the 16th letter, the 31st letter, the 46th letter, etc., as each letter in this group will have been encrypted with the same letter of the key phrase. Similarly, the second group will consist of the second letter, the 17th letter, the 32nd letter, the 47th letter, and so on. The 15 groups are, then, as follows:
JAKPPGCWCEQKCDPUOUFHJGGTATKVUCIMVHQCPXQGYUOCNPGVUVOGTKKTOIVCAHITKPWUGGQUQTQUPQHGCGVJUVGVCHNYQIQERXGTG
SCBXRFBGGVFBHSUOOSZFORFSCWAVPDBMVHFHSSKFVGMBHOFOSVCFCBGIAOCBOHHSBOTHGFBWTRTCUBHZWQHSWSHGFCGWAGKVSSGIB
MRUQUMPFDMKSQFRDEFQAHNMYGSMFGBAAPTDEJDZNQMIOTZQUMQHUARUOMPIPZTTREZRAMKSZFEFRFFTQZMTEZDFAQDRXMFFROYBXQ
HREFRPEFIWSXQESMXLXQICPEGERLXIVYVISLXXCIXVMIIHERJVIKTSSXRIEVHIIVAHMKRXXKLXLXSLIJWRISXXIJEQMPGVSYXIICV
HFAJXPDVXVLYFBIVVZKKEKYZRUUVJRKYRDNVUFFRYFKUUTJJKZFYJIEZUJKVZVZFVFTLUYYLVYVYNVSKDKDFFFIVJVIIKVDCZETPR
YVYXGSXIESVERICWVWLLSLSRRIAXLVLEASRTEJYFIYLSEVXMIZRXGQWSIPGERRVQWEMEFMITEIXIEGVMYSFRQQXAWVWISISERXXSP
TMGNTNHGKYFMXGHHLMXXMXHLELBAHMPWMNBARKPEKGHBFHHGKXTHHTVGKXAKMXYMMAXKKGOHKFPXKHBGLVNXTHAXMEMTWMKGTLYN
IYVRMIUIUKVTPZLIKFDDYDBZVKKILFRSYEXVRVZVPULERJWVTIEWCKXJFWKFYDIYFCEUZXREDFFEUDXKKFKIIIVCRVUTRNIUCZLI
BMXZIVIMAWVWEIZYWJMJQMMVIWPMTJZMQBPZVLTBWBBOOABQZGLMTQQBNBPNMGWMNMBBVKTBGDJMEUIPLCQGGZUTBBQPGQWKTITA
FSNUMPZDUCTOJNDVQBOVOOSBWXEFEFEUTBUEEFMPVIIUFUIUPPGXFPWPUCFUFMONUBQIHMMIBFSNBBEFPOUPMPPTFUWUBMXJZNMS
LHGWLNBYBUOZFQIUOWANACMWYULIBGCNMCWMGLBDWYCBSBYBMOYYWHYNBYZBPYNIBPCYCYYYMGCSLHYGQNBOUQPWXYCBHFVLIPSY
LOTHRXTBXVIXEBFKKVXBLYTMMMTMXHMXBGKMHBHNTBGXHXFXLFXEMIBAXAEXXTKNXBVIGTRKKXZFKWLHAXBVGMXHBKLXWYXVNXTE
IQRUOADDXKXLVTUBKWBKMOZQEKEPVDPZLBWWDKELVZLUCZWZQCTTQZVMJQIIVDMVAVSIOVKMMVIWMMBCIZVZLPUZVPQXTWECZZVM
FFUASCSSICCSCZWSVOWOWSAJCVKSCWWKSCGKSYSUDOSOQWIQBGHGBCGQFBBFHWHHVUSGSOZOUHRJBFCBHOYCOSSDAWCCCZOAAMRS
ZTRNASWQESFEGVFENAGAMALROUVEGAAVBZFABGIRNEEYNIANTGUGTIGBVQXZBAVNRFGFIYBENFRVGBOGUPGFSOAFLYAGAYGFBEGT
In each group, see which letter occurs the most
often, and guess that it is the encrypted form of the letter “E”. This first
attempt gives us the following key phrase:
COBERADEXQUTROX
Now, we could do one of 2 things here. We could either attempt to decrypt with this key phrase, then fix the wrong letters in the decrypted message and try again, or do something more sophisticated, like looking at the entire letter distribution in each group to see which possible key letter gives the best decrypt. Let’s try the first way and see what happens:
HELDQYQEEPRSRRXYOQNOVJUPCNVZRRINTAJYDRAXMAAGPNJPBSXKNCECODMLNDTNGGQILWRYXEYERLLYSKEYZTEJOQANOAMXEQLJHAMEUUSEBEIDEPNEIMEOASCEGEHQDEHEGUCCHLSESVGZMACTOQORJOUVCRYDUPGODINRTHEJPZYFEUECATPMORULHTLLEOEBEEAKIDVLXWICLTNGQORCEHCNOMDIDSACIEWEEBFARKECMADTEVIGZAURTHLSEEHIWJBMLCCFAYDLPTTLUZPYGEKIEFRZMTLUZMFTITAYHAGENSJUTYGSVIKEDMYTLUZPYIFXEYERLLHSEXPCSAIMJREXAIRIVYLCTZVPYOFCARBRLGETNOMRIRADIIGZHATTHSIMLNDAYGSORANWTTHEHEXXEPPOTYECSBFTSLEHWOHEEOEAPAEAVJBMPMOMIYGNZRTLMNCOITYIYKYZUHETOPETEIWTTHORAAJUTDSIUEZFTSEMSKAELINKOXORCOWRYTSECRFSDATDHETXRCOSTFWYNEITDEONYOMOMEZVECTOJHROPRITKEOWYYOYMVWWHONEGERMEAFBREZJUUGPWHPTHIHLZFCAEPLSSLROYDQESEIIACMYHITLEHESINUECANNEDSYAREHEDAWLTSEDECNRPYOLCLNAYDCVEFDEHEIIGERPASXESESEMFUYTATNSMDRTEHEICLSELFTIHPCZSSZNRTHPRIZUEJZUMLSEMOGEORQAOQEECTSERTGHXESPHELCSEROZPSGEYWPCTZNRINQORQQGTZNPIOGISTONWSTTGEIESERUNTISDFEZTHVCZMMLNDIHBQEHESRTGAOESPUSEMEHZNOTOHATGXGSPFLRNVANOREEHBQEHERRXYAYDIRJUPPVEETZFTSEERUZJWEAMIYGTSEIVVEZYTRVTTREQROQJUPXOUETLINDWEWJBQEHEJHPNAYDOEXYPLVIEGDUFQICMUAEAICBEESTZGUEHQESEPRSDESLNDFHVYRINXEGERJTHMDTNWEAEAWONRTHILNWWEYTLZSIYGUTEAESERVACOFEHEEHZJLSRVGLRDDTHICBGPMEETDOFEHEXMBMCIGRDPSOQTHIUAPXYMFVTNGEOWEHQHLRRVNEONEHEGEZXLNDVRZFTSEBVYTLOESKOMELPFTMDGSPMOLNEAIYSMYIGOZWHRTSECLNTSSBFYTEIANTTSEMFKGTEHIEKEHEDOORUEJZUCIODSIYTOQQEJWANUAQTECTOQEECZWTYEMETEERXXRXZVEDEYTSZFEAUYWDCOIPDARPASWJNEPDIEMJFOCMEVBREEERYIWLSQIRWJQTGISZOYWIWLRIQPSEHEGOEOMLCTSTNJLNDCOYGSEREIJJTWLFFLWOWEOMSHEZHBENAECHQULEDQNTRCLMDPENTIRQYWJOUIMZVEXENXIVLXVEIYCESAECXVHWWYAEDERUWYYSKEDCELVERENPRAP
This isn’t quite so easy to figure out. So now,
let’s go back and try the second way, where we declare a key letter to be good
if, when decrypting its corresponding group, we get something that looks like a
distribution where “E” is the most common letter, “T”, the second, and so on.
Using this technique, we get the following guess at a key phrase:
COMERETRIBUTION
To see this entire attack at work, here is a C program that will take you through the entire process described in this article. It writes its results to the file C:\junkout.txt; change that path to suit your system:
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define KEYLEN 15   /* key length suggested by the trigraph distances */

/* Monographic probabilities for A..Z, from the table earlier in this article */
double monoprobs[26] = { 0.07986, 0.01282, 0.03093, 0.03823, 0.12350, 0.02646,
    0.01964, 0.05281, 0.07904, 0.00176, 0.00212, 0.04364, 0.02235, 0.08445,
    0.08327, 0.01788, 0.00188, 0.06845, 0.05505, 0.09116, 0.02329, 0.01176,
    0.00882, 0.00082, 0.01952, 0.00047 };

/* Plain text; everything that is not a letter is stripped before encryption */
char message[5000] =
    "Headquarters Army of Northern "
    "VirginiaJune 23, 1863--5 p.mMaj. Gen. J. E. B. STUART, Commanding Cavalry: "
    "General, Your notes of 9 and 10.30 "
    "a.m. to-day have just been received. As regards the purchase of tobacco for "
    "your men, supposing that Confederate money will not be taken, I am willing for "
    "your commissaries or quartermasters to purchase this tobacco and let the men "
    "get it from them, but I can have nothing seized by the men. If General Hooker's "
    "army remains inactive, you can leave two brigades to watch him, and withdraw "
    "with the three others, but should he not appear to be moving northward, I think "
    "you had better withdraw this side of the mountain to-morrow night, cross at "
    "Shepherdstown next day, and move over to Fredericktown.You will, however, be "
    "able to judge whether you can pass around their army without hinderance, doing "
    "them all the damage you can, and cross the river east of the mountains. In "
    "either case, after crossing the river, you must move on and feel the right of "
    "Ewell's troops, collecting information, provisions, &c.Give instructions to "
    "the commander of the brigades left behind, to watch the flank and rear of the "
    "army, and (in the event of the enemy leaving their front) retire from the "
    "mountains west of the Shenandoah, leaving sufficient pickets to guard the "
    "passes, and bringing everything clean along the Valley, closing upon the rear "
    "of the army.As regards the movements of the two brigades of the enemy moving "
    "toward Warrenton, the commander of the brigades to be left in the mountains "
    "must do what he can to counteract them, but I think the sooner you cross into "
    "Maryland, after to-morrow, the better.The movements of Ewell's corps are as "
    "stated in my former letter. Hill's first division will reach the Potomac "
    "to-day, and Longstreet will follow to-morrow.Be watchful and circumspect in all "
    "your movements. I am, "
    "very respectfully and truly, yours,R. E. Lee General.";

int delta[5000];                /* delta[d - 1] = trigraphs repeated d letters apart */
char key[16] = "COMERETRIBUTION";
char encrypted[5000];
char atoffsets[KEYLEN][1000];   /* one group of cipher letters per key position */

void removespaces(char *s);
void encryptit(char *s, char *k);
void decryptit(char *s, char *k);
char mostfreq(char *s);
char tryaletter(char *s);

int main(void)
{
    char x, guess, keyguess[100];
    int i, j, k, len;
    FILE *fout;

    fout = fopen("C:\\junkout.txt", "w");
    if (fout == NULL) {
        perror("C:\\junkout.txt");
        return 1;
    }

    /* Build the cipher text that the Yankees would have intercepted */
    removespaces(message);
    fprintf(fout, "Message with spaces removed: %s\n", message);
    encryptit(message, key);
    fprintf(fout, "Encrypted with '%s': %s\n", key, message);
    strcpy(encrypted, message);
    len = (int)strlen(encrypted);

    /* Kasiski: for each distance i, count the trigraphs repeated i letters apart */
    for (i = 1; i < len - 1; i++) {
        delta[i - 1] = 0;
        for (j = 0; j + i + 3 <= len; j++)   /* include the final trigraph */
            if (strncmp(&encrypted[j], &encrypted[j + i], 3) == 0)
                delta[i - 1]++;
    }
    for (i = 1; i < len - 1; i++)
        fprintf(fout, "delta[%d] = %d\n", i, delta[i - 1]);

    /* Split into 15 groups; first guess: the most frequent letter stands for E */
    for (i = 0; i < KEYLEN; i++) {
        for (j = 0, k = 0; i + j < len; j += KEYLEN, k++)
            atoffsets[i][k] = encrypted[i + j];
        atoffsets[i][k] = '\0';
        fprintf(fout, "Group %d = %s\n", i, atoffsets[i]);
        x = mostfreq(atoffsets[i]);
        guess = (char)((x - 'E' + 26) % 26 + 'A');
        keyguess[i] = guess;
        fprintf(fout, "Most frequent letter = %c, so guess at key letter is %c\n",
                x, guess);
    }
    keyguess[KEYLEN] = '\0';
    strcpy(message, encrypted);
    decryptit(message, keyguess);
    fprintf(fout, "Attempted decrypted message is: %s\n", message);

    /* Second guess: score each group's whole letter distribution */
    for (i = 0; i < KEYLEN; i++) {
        x = tryaletter(atoffsets[i]);
        keyguess[i] = x;
        fprintf(fout, "Guess at key letter is %c\n", x);
    }
    fclose(fout);
    return 0;
}

/* Keep only the letters of s, converted to upper case */
void removespaces(char *s)
{
    int i, j, slen;

    slen = (int)strlen(s);
    for (i = 0, j = 0; j < slen; j++) {
        if (isalpha((unsigned char)s[j])) {
            s[i] = (char)toupper((unsigned char)s[j]);
            i++;
        }
    }
    s[i] = '\0';
}

/* Encrypt s in place with the repeating key k (both upper-case letters only) */
void encryptit(char *s, char *k)
{
    int i, j, klen, slen;

    slen = (int)strlen(s);
    klen = (int)strlen(k);
    for (i = 0, j = 0; j < slen; j++) {
        s[j] = (char)(((s[j] + k[i] - 2 * 'A') % 26) + 'A');
        i++;
        if (i >= klen)
            i = 0;
    }
}

/* Decrypt s in place with the repeating key k */
void decryptit(char *s, char *k)
{
    int i, j, klen, slen;

    slen = (int)strlen(s);
    klen = (int)strlen(k);
    for (i = 0, j = 0; j < slen; j++) {
        s[j] = (char)(((s[j] - k[i] + 26) % 26) + 'A');
        i++;
        if (i >= klen)
            i = 0;
    }
}

/* Return the most frequent letter in s ('A' if s is empty) */
char mostfreq(char *s)
{
    char retval = 'A';
    int freqs[26], i, max;

    for (i = 0; i < 26; i++)
        freqs[i] = 0;
    for ( ; *s; s++)
        freqs[(*s) - 'A']++;
    max = 0;
    for (i = 0; i < 26; i++) {
        if (freqs[i] > max) {
            max = freqs[i];
            retval = (char)(i + 'A');
        }
    }
    return retval;
}

/* Try all 26 key letters on group s and return the one whose decrypt best
   matches the English letter distribution in monoprobs */
char tryaletter(char *s)
{
    char c, retval = 'A';
    double highscore, score;
    int distro[26];
    int i, slen;

    slen = (int)strlen(s);
    highscore = 0;
    for (c = 'A'; c <= 'Z'; c++) {
        for (i = 0; i < 26; i++)
            distro[i] = 0;
        for (i = 0; i < slen; i++)
            distro[(s[i] - c + 26) % 26]++;   /* decrypt with key letter c */
        score = 0;
        for (i = 0; i < 26; i++)
            score += distro[i] * monoprobs[i];
        if (score > highscore) {
            highscore = score;
            retval = c;
        }
    }
    return retval;
}
YANKEE ENCRYPTION (THE ROUTE CIPHER)
The North didn’t use the Vigenère Cipher. For the most part they used a Route Cipher, which the South never broke. Union messages were often put in newspapers in the South with rewards offered for their solution. The key components of the Route Cipher as used by the North were:
- common political, military and geographic terms
were substituted with more common words (for example, Abraham Lincoln might be
an “oak tree” and Washington might be “brick”)
- the words of the message (after the above
substitution) were written out in a rectangular matrix of some prescribed shape
(sometimes pad words were added which had nothing to do with the message, just
filler to complete the rectangular shape)
- a prescribed path through the rectangular matrix
(usually things like “up the second column”, then “across to the right”, etc.)
was followed and the words copied out in this new, jumbled order
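The last two steps (fill the rectangle, then follow a path through it) can be sketched in C as below. This is a simplified illustration only: it leaves out the code-word substitution, it assumes a route made up entirely of whole columns read either up or down, and the struct leg, the function route_encipher and the sample message in the usage note are all invented for this sketch rather than taken from any actual Union cipher book.

```c
#include <string.h>

/* One leg of a (hypothetical) route: which column to read, and which way. */
struct leg {
    int column;   /* 0-based column number */
    int upward;   /* nonzero: read bottom to top; zero: read top to bottom */
};

/* The words are taken as already written row by row into a rows x cols
   rectangle, so words[] must hold exactly rows * cols entries (pad words
   included). The words are copied out leg by leg into out[], which needs
   room for legs * rows pointers. */
void route_encipher(const char *words[], int rows, int cols,
                    const struct leg route[], int legs, const char *out[])
{
    int l, r, o = 0;

    for (l = 0; l < legs; l++) {
        for (r = 0; r < rows; r++) {
            int row = route[l].upward ? rows - 1 - r : r;

            out[o++] = words[row * cols + route[l].column];
        }
    }
}
```

As a made-up example, take the eight words ENEMY ADVANCING ON RICHMOND FROM THE NORTH TONIGHT plus the pad word APPLE, written into a 3 x 3 rectangle. The route "up the second column, down the first, up the third" then sends TONIGHT FROM ADVANCING ENEMY RICHMOND NORTH APPLE THE ON.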
I think this could have been solved by 19th Century techniques, but it would have required a great deal of organization and a fairly large number of methodical clerks, as counts would have to have been kept across hundreds of messages and correlated with events of the time each message was received to try to uncover what common terms had been substituted (e.g., messages coming out of Washington are more likely to mention President Lincoln or Secretary of War Stanton than a message coming out of Kentucky). Major Norris could have even caused messages to be sent by having his spies in Washington cause some sort of uproar and then seeing what messages came out. Had the South known about the rectangular shapes, they could have tried various spacings of words to put the messages back together again (if you go “up a column”, the words will have originally been equally spaced apart, dependent only on the number of words in each row of the matrix).