Stay ahead of learning milestones! Enroll in a class over the summer!

G
Topic
First Poster
Last Poster
Cracking the Vigenere Cipher knowing the keyword length.
fortenforge   0
Sep 7, 2009
VNZZNXVRBEGBJAZIETKPKFFXJSBFNMYEKVILKHXJAMZSYRCMZOGFFPRTVYIG
XAYZGFVNMFFMYEBDAZZNTKIHEEFVRZVTAIONXHMYETZDHWSVZEGTEMFAICAG
FNIRPXITAVNBKMHMELKOKVAEZSTKIHEIGJTHEEHIMXKAEFRXEEKXYMYEGZTU
IIGXSAFMXJTHDEGFRPFMXETAVNBKEEVVTKELKHXJTTEDTIDHWLBMIGXAGUAW
USMFTAVCHDFHITLFFEZFXKHBJILKHXVNZZNXVRLYIZYPKZVBCEZVHXIBXITA
FOOVR

Here is our ciphertext, we know that it was encrypted using a vigenere cipher of length 3.

We split the text into 3 parts, if we call the first letter $ 0$, the second letter $ 1$, the third letter $ 2$, and so on, the $ 0$th group has all letters $ 0 \pmod{3}$, the first group has all letters $ 1 \pmod{3}$, and the second group has all letters $ 2 \pmod{3}$.

VZVEJIKFJFYVKJZRZFRYXZVFYDZKEVVIXYZWZTFCFRIVKMKVZKEJEIKFEXYZ
IXFJDFFEVKVKKJEIWMXUUFVDIFZKJKVZVYYZCVIIFV

NNRGAEPFSNEIHASCOFTIAGNFEANIERTOHEDSEEAANPTNMEOASIITEMAREYET
ISMTERMTNEVEHTDDLIAASTCFTFFHIHNNRIPVEHBTOR

ZXBBZTKXBMKLXMYMGPVGYFMMBZTHFZANMTHVGMIGIXABHLKETHGHHXEXKMGU
GAXHGPXABETLXTTHBGGWMAHHLEXBLXZXLZKBZXXAO

Let us start with group 1.

VZVEJIKFJFYVKJZRZFRYXZVFYDZKEVVIXYZWZTFCFRIVKMKVZKEJEIKFEXYZ
IXFJDFFEVKVKKJEIWMXUUFVDIFZKJKVZVYYZCVIIFV

We know that all the letters in this group have been encrypted in a caesar cipher of a certain key, to find the key we can sort through all 26 keys and decide which one matches the frequency analysis of normal english text.

A B C D E _F _G H I J K _L M N O P Q R S T U V _W X Y Z
0 0 2 3 7 _13 0 0 9 7 12 0 2 0 0 0 0 3 0 1 2 15 2 5 7 12
8 2 3 4 13 2 _2 6 7 0 1 _4 2 7 8 2 0 6 6 9 3 1 _2 0 2 0

The first row is the percentages of frequency in the ciphertext and the second row is normal english. We see a row of 4 0's in the cipher text row, that probably corresponds to WXYZ in the english row. We have to shift the first 0 backwards 17 spaces to correspond to the W. But are we convinced that the key is 17? Well, when we shift the 15 in the first row back 17 spaces it corresponds to E, because E is the most commonly occurring letter in the English language and because 15 is the highest number in the first row, we can be pretty confident that 17 is the key. If A = 0 , B = 1 and so on 17 = R. R is the first letter of the keyword for the vigenere cipher. When decrypted in a caesar cipher of key R, our text becomes

EIENSRTOSOHETSIAIOAHGIEOHMITNEERGHIFICOLOARETVTEITNSNRTONGHI
RGOSMOONETETTSNRFVGDDOEMROITSTEIEHHILERROE.

When inserted back in our ciphertext in the correct place our ciphertext becomes

E**I**E**N**S**R**T**O**S**O**H**E**T**S**I**A**I**O**A**H**
G**I**E**O**H**M**I**T**N**E**E**R**G**H**I**F**I**C**O**L**
O**A**R**E**T**V**T**E**I**T**N**S**N**R**T**O**N**G**H**I**
R**G**O**S**M**O**O**N**E**T**E**T**T**S**N**R**F**V**G**D**
D**O**E**M**R**O**I**T**S**T**E**I**E**H**H**I**L**E**R**R**
O**E*

* represents an unknown letter.

If we examine the 2nd group

NNRGAEPFSNEIHASCOFTIAGNFEANIERTOHEDSEEAANPTNMEOASIITEMAREYET
ISMTERMTNEVEHTDDLIAASTCFTFFHIHNNRIPVEHBTOR

you will find that it's frequency analysis is already very close to the English language. This tells us that it was encrypted with a caesar cipher of shift 0. The corresponding letter is A. (If you don't believe me, use the same method that we did on the 1st group).

Now we can plug these letters strait back into our ciphertext we get:

EN*IN*ER*NG*SA*RE*TP*OF*SS*ON*HE*EI*TH*SA*IS*AC*IO*OF*AT*HI*
GA*IG*EN*OF*HE*MA*IN*TI*NE*ER*ET*RO*GH*HE*ID*FS*IE*CE*OA*LA*
ON*AP*RT*EN*TM*VE*TO*EA*IS*TI*NI*ST*NE*RM*TA*OR*NE*GY*HE*IT*
RI*GS*OM*ST*ME*OR*OM*NT*EN*TE*EV*TE*TH*ST*ND*RD*FL*VI*GA*DA*
DS*OT*EC*MF*RT*OF*IF*TH*SI*TH*EN*IN*ER*HI*HP*IV*LE*EH*RB*RT*
OO*ER

Now at this point we have several options. We could use the same method on group 3 to find the plaintext as we did on groups 1 and 2. We know 2/3 of the plaintext so we can guess with fairly good accuracy the rest of the plaintext. We also know what 2/3 of the letters in the keyword are. RA* is the current keyword. Because the keyword is usually an actual word, we can guess at the key letter for the last group. Possible endings are N for 'RAN', T for 'RAT', B for 'RAB' (the person sending the message might be a Harry Potter fan) or G for 'RAG'. We could then try each of these letters until we find the correct answer.

Using any of the methods you would probably find that the last key letter was 'T', and that the decrypted text was,

GEIIGAREITRSETFTNWCNFMTTIGAOMGHUTAOCNTPNPEHIOSRLAONOOELERTNB
NHEONWEHILASEAAOINNDTHOOSLEISEGESGRIGEEHV

Plugging this back into our partially decrypted message and formatting it a bit we get:

Engineering is a great profession. There is the satisfaction of watching a figment of the imagination emerge through the aid of science to a plan on paper. Then it moves to realisation in stone or metal or energy. Then it brings homes to men or women. Then it elevates the standard of living and adds to the comforts of life. This is the engineer's high privilege.
--Herbert Hoover
0 replies
fortenforge
Sep 7, 2009
0 replies
The Keyword Length
fortenforge   0
Sep 6, 2009
If you have been paying any attention to the mathematics of the vigenere cipher, you should realize that one variable we have always included is $ k$ for the length of the key word.

Why is $ k$ so important? If you know what $ k$ is can you decrypt the ciphertext?

It turns out, yes! If you know that $ k = 5$ decrypt the plaintext like this:

Split the ciphertext into $ 5$ parts. The first letter should go in the first part the second in the second part the third in the third part the fourth in the fourth part the fifth in the fifth part the sixth in the first part the seventh in the second part...

Basically split the plaintext into $ 5$ groups where if you assume the first letter is letter $ 0$, and the first group is group $ 0$, group letter $ n$ into the $ n \pmod{5}$ group.

Each of these groups is encrypted in a regular caesar cipher. If you can find the caesar cipher's key for one of the groups, you can find $ 1/5$th of the plaintext.

But how do you find the key for each of the groups? Well we know how to decrypt a caesar cipher, usually we just try all 26 possible keys and look for English text, but that will not work here. Each of the groups is a meaningless jumble of letters, when we find the correct key we will not know it.

We could use frequency analysis, we could try all 26 possible keys and see which one gives us the most frequency analysis accurate result. If we do this for each of the groups we can find that the keys we get for the individual parts are P, A, S, T, and A we can be sure that the key word for the vigenere cipher is PASTA meaning that we can decrypt it. I will show an example of this method in the next post.
0 replies
fortenforge
Sep 6, 2009
0 replies
Why frequency analysis does not work
fortenforge   0
Aug 30, 2009
Frequency Analysis works because there is a one to one correspondence between the plaintext alphabet and the ciphertext alphabet. If frequency analysis is going to work, the letter $ p$ should ALWAYS be encrypted as the letter $ c$. In a Vigenere cipher this does not occur. Depending of $ p$'s position in the plaintext, $ p$ could be encrypted as one of several letters. If the keyword has length $ 5$, then $ p$ could be encrypted as $ c_1,c_2,c_3,c_4,c_5$. This is not a one to one correspondence so frequency analysis does not work.

Let us say that the frequency of $ p$ in normal English was $ i$. If the key word was of length $ 1$, the frequency of $ c$ in the cipher text would be $ i$ as well. But if the key word was of length $ 2$, then the frequency of $ c_1 = i/2$ and the frequency of $ c_2 = i/2$. Basically frequency analysis works if there is one alphabet that corresponds to another alphabet in a 1 to 1 correspondence.
To find a method for cryptanalysis we need to be more creative.
0 replies
fortenforge
Aug 30, 2009
0 replies
Good stuff to know
fortenforge   0
Jul 21, 2009
Here are some things that will help you a lot when trying to decrypt a substitution cipher.

Order Of Frequency Of Single Letters

E T A O I N S H R D L U


Order Of Frequency Of Digraphs

th er on an re he in ed nd ha at en es of or nt ea ti to it st io le is ou ar as de rt ve


Order Of Frequency Of Trigraphs

the and tha ent ion tio for nde has nce edt tis oft sth men


Order Of Frequency Of Most Common Doubles

ss ee tt ff ll mm oo


Order Of Frequency Of Initial Letters

T O A W B C D S F M R H I Y E G L N P U J K


Order Of Frequency Of Final Letters

E S T D N R Y F L O G H A K M P U W


One-Letter Words

a, I.


Most Frequent Two-Letter Words

of, to, in, it, is, be, as, at, so, we, he, by, or, on, do, if, me, my, up, an, go, no, us, am


Most Frequent Three-Letter Words

the, and, for, are, but, not, you, all, any, can, had, her, was, one, our, out, day, get, has, him, his, how, man, new, now, old, see, two, way, who, boy, did, its, let, put, say, she, too, use


Most Frequent Four-Letter Words

that, with, have, this, will, your, from, they, know, want, been, good, much, some, time
0 replies
fortenforge
Jul 21, 2009
0 replies
Example of Decryption of a Substitution Cipher.
fortenforge   2
N Jul 21, 2009 by fortenforge
lnbd qnbh egpvskbm qnb vbdqbm so qnb cdgkbmpb, f asq so jbsjab lgaa ib egpfjjsgdqbe qs egpvskbm qnbh fmb dsq gq.
- ibmdfme ifgabh

Let us say that this is our ciphertext that we need to decrypt. We can tell that this is a quote by the "-" at the end. Ciphertext letters will be in lowercase and plaintext letters will be in uppercase.

'b' appears 17 times in the ciphertext. This is way more than any other letter so 'b' is likely to be 'E'. Let's replace all 'b's with 'E's.

lnEd qnEh egpvskEm qnE vEdqEm so qnE cdgkEmpE, f asq so jEsjaE lgaa iE egpfjjsgdqEe qs egpvskEm qnEh fmE dsq gq. - iEmdfme ifgaEh

The digraph 'qn' appears 4 times at the beginning of words, in all times it is followed by 'E'. The most common word in the English language is 'THE'. There are other words that begin with 'THE' such as 'THERE', 'THEY', and 'THEM'. This means that 'qn' is likely to be the digraph 'TH'. So 'q' = 'T' and 'n' = 'H'. Let's replace these letters:

lHEd THEh egpvskEm THE vEdTEm so THE cdgkEmpE, f asT so jEsjaE lgaa iE egpfjjsgdTEe Ts egpvskEm THEh fmE dsT gT. - iEmdfme ifgaEh

The 16th word is 'Ts'. The only 2 letter word that begins with 'T' is 'TO'. This means 's' = 'O'. Let's replace this:

lHEd THEh egpvOkEm THE vEdTEm Oo THE cdgkEmpE, f aOT Oo jEOjaE lgaa iE egpfjjOgdTEe TO egpvOkEm THEh fmE dOT gT. - iEmdfme ifgaEh

Now things become a little more complicated. The 9th word is a single letter word: 'f'. This means it can be only 'A' or 'I'. The 21st word is 'gT'. The only possible words are 'AT' or 'IT'. This means either 'f' or 'g' is 'A' or 'I'. To figure out which is which we can look at the 19th word 'fmE'. There is no 3 letter word that begins with 'I' and ends in 'E'. There is however a very common 3 letter word that begins in 'A' and ends in 'E'. The word 'ARE'. This is the more likely word. This gives us that 'f' = 'A', 'g' = 'I', and 'm' = 'R'. Now we should replace these in:

lHEd THEh eIpvOkER THE vEdTER Oo THE cdIkERpE, A aOT Oo jEOjaE lIaa iE eIpAjjOIdTEe TO eIpvOkER THEh ARE dOT IT. - iERdARe iAIaEh

The 9th and 10th words are "A aOT". 3 letter words that end in 'OT' are 'NOT', 'LOT', and 'ROT'. 'A LOT', is a much more common phrase than 'A NOT' or 'A ROT'. This tells us that 'a' = 'L'. Now we replace this:

lHEd THEh eIpvOkER THE vEdTER Oo THE cdIkERpE, A LOT Oo jEOjLE lILL iE eIpAjjOIdTEe TO eIpvOkER THEh ARE dOT IT. - iERdARe iAILEh

Now we can notice 2 things. Look at the twelfth word 'jEOjLE'. The only thing that it can be is the word 'PEOPLE'. This means 'j' = 'P'. Also we can notice that the 4th to last word, 'dOT' must be the word 'NOT' because of the words before and after it. This means 'd' = 'N'. Now we replace this:

lHEN THEh eIpvOkER THE vENTER Oo THE cNIkERpE, A LOT Oo PEOPLE lILL iE eIpAPPOINTEe TO eIpvOkER THEh ARE NOT IT. - iERNARe iAILEh

The only 12 letter word that has 'APPOINTE' in it is 'DISAPPOINTED'. So, 'e' = 'D', 'p' = 'S', and 'e' = 'D'.
The first word, 'lHEN' must be the word 'WHEN' because there is no other word that fits. So, 'l' = 'W'. The 5th word, 'vENTER' must be 'CENTER' for the same reason. So, 'v' = 'C'.
Replace these:

WHEN THEh DISCOkER THE CENTER Oo THE cNIkERSE, A LOT Oo PEOPLE WILL iE DISAPPOINTED TO DISCOkER THEh ARE NOT IT. - iERNARD iAILEh

Now it is trivial to find the actual plaintext:

WHEN THEY DISCOVER THE CENTER OF THE UNIVERSE, A LOT OF PEOPLE WILL BE DISAPPOINTED TO DISCOVER THEY ARE NOT IT. - BERNARD BAILEY. :rotfl:

Notice that we did not just use letter frequency analysis, we also used digraph analysis, and word analysis. We also built upon our previous discoveries. Just from guessing that 'b' was 'E', we were able to decrypt the whole ciphertext. This algorithm for decryption is very hard to computer program so most ciphers like these are better to be decrypted by hand using this same method.
2 replies
fortenforge
Jul 20, 2009
fortenforge
Jul 21, 2009
Frequency Analysis
fortenforge   0
Jul 16, 2009
If you have been looking at the shouts you will notice someone mentioned something called "frequency analysis".
Frequency Analysis is an algorithm to easily decrypt substitution ciphers.

To understand how it works, you have to first understand that in the English language, or in any language, some letters will appear more often than others and some will appear less often.

For example, in English the most common letters are E, R, S, T, and A. The least common letters are Q, X, Z, and J.

In the table below, you can see the frequency analysis for the first 2 paragraphs of this Entry, and the average frequency analysis for English. The first row is the alphabet, the second the frequency analysis for the first 2 paragraphs, and the third is the average frequency analysis for english.

A B C D E .F G H I J K L M N O P Q R S T U V W X Y Z
8 1 2 3 11 2 3 4 7 0 1 6 2 9 8 2 1 5 8 9 4 1 2 0 3 0
8 2 3 4 13 2 2 6 7 0 1 4 2 7 8 2 0 6 6 9 3 1 2 0 2 0


The most difference between the actual English text and the average for English is the letters E, N, H, and S in which the difference is 2. The same is true for any piece of English text. The frequency of the letters is always going to be the same. Of course the larger the text you have the more the text will correspond to the averages. It is just like a coin flip. The more times you flip the coin the more the percentage of Heads is closer to 50%.

If I encrypt the first two paragraphs with a Caesar cipher of key 1 I get:

JGZPV IBWFC FFOMP PLJOH BUUIF TIPVU TZPVX
JMMOP UJDFT PNFPO FNFOU JPOFE TPNFU IJOHD
BMMFE GSFRV FODZB OBMZT JTGSF RVFOD ZBOBM
ZTJTJ TBOBM HPSJU INUPF BTJMZ EFDSZ QUTVC
TUJUV UJPOD JQIFS TUPVO EFSTU BOEIP XJUXP
SLTZP VIBWF UPGJS TUVOE FSTUB OEUIB UJOUI
FFOHM JTIMB OHVBH FPSJO BOZMB OHVBH FTPNF
MFUUF STXJM MBQQF BSNPS FPGUF OUIBO PUIFS
TBOET PNFXJ MMBQQ FBSMF TTPGU FO.

Let's do the frequency analysis on this ciphertext:

A B C D E .F .G H I J K L M N O P Q R S T U V W X Y Z
0 8 1 2 3 .11 2 3 4 7 0 1 6 2 9 8 2 1 5 8 9 4 1 2 0 3
8 2 3 3 13 2 .2 6 7 0 1 4 2 7 8 2 0 6 6 9 3 1 2 0 2 0

It is clear from looking at the frequency analysis that the second row has been shifted by 1 column. This tells us that the message has been encrypted using a Caesar cipher of key 1. If we shift it back one column the numbers match up better. Now we can easily decrypt it. If I had used a substitution cipher, it would have been a little more difficult, but still doable. We could immediately notice that whichever letter had 11 occurrences, must be E, because on average E has 13 occurrences. We could move on to the other letters from here thereby decrypting the cipher. In the next post I will show you an actual example.

One note is that if the language of the plaintext had not been English, the average frequency analysis would have been different thereby making it difficult to decrypt it well.
0 replies
fortenforge
Jul 16, 2009
0 replies
No more topics!
an incenter problem
guangzhou-2015   3
N Aug 10, 2017 by hectorraul
Let $I$ be the incenter of $\triangle ABC,$ $AI \cap \odot ABC = M,$ Point $K$ satisfied $KB=IB,KC=IC,$$KM \cap \odot ABC = L,$Prove that: $IA=IL$
3 replies
guangzhou-2015
Aug 8, 2017
hectorraul
Aug 10, 2017
an incenter problem
G H J
G H BBookmark kLocked kLocked NReply
The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
guangzhou-2015
778 posts
#1 • 2 Y
Y by Adventure10, Mango247
Let $I$ be the incenter of $\triangle ABC,$ $AI \cap \odot ABC = M,$ Point $K$ satisfied $KB=IB,KC=IC,$$KM \cap \odot ABC = L,$Prove that: $IA=IL$
Z K Y
The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
hectorraul
361 posts
#2 • 1 Y
Y by Adventure10
Let $ML\cap BC=N$ and $AM\cap BC=D$, it is well known that $MI=MB=MC$ and with inversion centered at $M$ with radius $MI$ you get that $N$ becomes $L$ and $D$ becomes $A$, the rest is angle chasing.
Z K Y
The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
guangzhou-2015
778 posts
#3 • 2 Y
Y by Adventure10, Mango247
Who can give a pure geometry solution
Z K Y
The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
hectorraul
361 posts
#4 • 1 Y
Y by Adventure10
Actually, you can skip inversion in my above solution and use triangle similarity.
Z K Y
N Quick Reply
G
H
=
a