The Kasiski Examination
by fortenforge, Sep 27, 2009, 2:38 AM
The Kasiski Examination is a method for finding the key length of a polyalphabetic cipher. It was discovered independently by Charles Babbage and Friedrich Kasiski. Let us say that this is our ciphertext:
LKRRGOGSVWYEJUHDABDYWZOOOKROVOQDKCURQWFOMZGNGHWRSJHLTCWHSBGB
WCQELFDVWZHRDCQGAGWOGRDNVZROCSGDGKQOFSDSXOUAKWFOMZGTGKKEJSLT
TSQTABWHWIQDWFJRGKWHLVHNLCRKLVHOLVHRSGMUKHDSXOLRSBGHSJLNYDHR
ZOSSLVHBWHWEJQOAAAEEUOXSWWWWSGJRSGVYSBGWSBWEVKHAJHKOMUKAKTRR
LVDTLVHPSGVIFUWHWFHHSRZOJBWHWAUESZOYSPRULHKEKO
PESBGBGHKTZOWMGFQIFUHQMOOLQZDYABOESJHSFCVTWDKAVHUOVRHNTZDCCC
KICSSTLVHFAFVTXCUAFCWHWFGAQMHTCBRWABJHGKZAQZHAVGRNLCZAQWGOMP
WEVWIIKVRUDRHVWFFOESEAUYLSZOOLTSWEDZLNYHKIKKLTZOVIYVVOESZHWF
HAYSVAFRDGWGKEFQHTOCUOSRVDAJHRYSGIFOZOGRDNVWLTGCNTZSRNWZHSKH
UANSOEVPBAFRWHSHKAKADDWOOLLVHDATIEJSQCW
We look for repeated sequences in the ciphertext and we count the number of letters between those sequences. This is a list of all the repeated sequences and the number of letters between them. The number of letters between them includes the number of letters of the second repeated sequence. Look at the bolded letters in the ciphertext "WHW" We call the distance between them 12 not 9.
HDA 540
ZOO 411
WFO 68
FOM 68
OMZ 68
MZG 68
SJH 272
CWH 312
WHS 484
SBG 112
SBG 48
SBG 72
BGB 232
WZH 452
OGR 424
GRD 424
RDN 424
DNV 424
DSX 64
SXO 64
EJS 444
BWH 62
BWH 78
WHW 62
WHW 66
WHW 12
WHW 100
LVH 8
LVH 4
LVH 32
LVH 60
LVH 108
LVH 200
NLC 252
RSG 56
MUK 74
BGH 122
LNY 268
RZO 82
WWW 1
SGV 36
WEV 184
AJH 265
KAK 308
TLV 108
IFU 52
HWF 112
HWF 96
WFH 208
ESZ 189
SZO 159
TZO 156
OOL 124
OOL 116
OES 105
OES 36
NTZ 175
ZAQ 12
AFR 64
WFOM 68
FOMZ 68
OMZG 68
SBGB 232
OGRD 424
GRDN 424
RDNV 424
DSXO 64
BWHW 62
BWHW 78
TLVH 108
WHWF 112
HWFH 208
WFOMZ 68
FOMZG 68
OGRDN 424
GRDNV 424
Looking at these statistics you will find that 79% of the repeated sequences are multiples of four. This means that the length of the key is probably 4. Why? Well, in the plaintext there are going to be a lot of the word 'the'. What makes a vigenere cipher so strong is that 'the' will not be encrypted the same way all the time. Sometimes 'the' will become 'kde' and other times it might become 'pxh'. But , in a large enough plaintext 'the' will get encrypted the same way. When this happens we know that the key letter used to encrypt the first 'T' of the word 'THE' is the same as the one used to encrypt the second 'T'. Because the key repeats itself (KEYKEYKEYKEYKEYKE) over the whole plaintext we know that the distance between the first and second 'T' must be a multiple of the key length. It is not a good idea to just find all the repeated sequences and then just find the GCF of the distances. There will be repeated sequences that just occurred out of pure randomness. This is what makes the Kasiski Examination difficult to program. If you really wanted to program it, you could find the GCF of all repeated sequences that are 5 letters or more. These are not likely to occur just by chance so you will probably get the correct length of the key as the GCF.
Now knowing that the key length is 4, you should be able to decrypt it easily.
One note: you might notice here that a large number of the distances are multiples of 2 (even), similarly if the key length had been 6 then you would notice a large number of the distances are multiples of 3. There is no way to know exactly what the key size is so you can just try both 3 and 6. You should first try 6 because most key sizes are 4, 5, or 6.
The Kasiski examination was a revolutionary breakthrough in cryptography. The Vigenere cipher had been described as "Le chiffre indechiffrable" (The undecipherable cipher) but it had finally been broken. There are however many problems with the Kassiski analysis that makes it less useful then the Friedman Test. Finding repeated sequences is not easy. It becomes much easier with a computer but human interaction still must exist because some of the repeated sequences are random. The Index of Coincidence is a much better method.
LKRRGOGSVWYEJUHDABDYWZOOOKROVOQDKCURQWFOMZGNGHWRSJHLTCWHSBGB
WCQELFDVWZHRDCQGAGWOGRDNVZROCSGDGKQOFSDSXOUAKWFOMZGTGKKEJSLT
TSQTABWHWIQDWFJRGKWHLVHNLCRKLVHOLVHRSGMUKHDSXOLRSBGHSJLNYDHR
ZOSSLVHBWHWEJQOAAAEEUOXSWWWWSGJRSGVYSBGWSBWEVKHAJHKOMUKAKTRR
LVDTLVHPSGVIFUWHWFHHSRZOJBWHWAUESZOYSPRULHKEKO
PESBGBGHKTZOWMGFQIFUHQMOOLQZDYABOESJHSFCVTWDKAVHUOVRHNTZDCCC
KICSSTLVHFAFVTXCUAFCWHWFGAQMHTCBRWABJHGKZAQZHAVGRNLCZAQWGOMP
WEVWIIKVRUDRHVWFFOESEAUYLSZOOLTSWEDZLNYHKIKKLTZOVIYVVOESZHWF
HAYSVAFRDGWGKEFQHTOCUOSRVDAJHRYSGIFOZOGRDNVWLTGCNTZSRNWZHSKH
UANSOEVPBAFRWHSHKAKADDWOOLLVHDATIEJSQCW
We look for repeated sequences in the ciphertext and we count the number of letters between those sequences. This is a list of all the repeated sequences and the number of letters between them. The number of letters between them includes the number of letters of the second repeated sequence. Look at the bolded letters in the ciphertext "WHW" We call the distance between them 12 not 9.
HDA 540
ZOO 411
WFO 68
FOM 68
OMZ 68
MZG 68
SJH 272
CWH 312
WHS 484
SBG 112
SBG 48
SBG 72
BGB 232
WZH 452
OGR 424
GRD 424
RDN 424
DNV 424
DSX 64
SXO 64
EJS 444
BWH 62
BWH 78
WHW 62
WHW 66
WHW 12
WHW 100
LVH 8
LVH 4
LVH 32
LVH 60
LVH 108
LVH 200
NLC 252
RSG 56
MUK 74
BGH 122
LNY 268
RZO 82
WWW 1
SGV 36
WEV 184
AJH 265
KAK 308
TLV 108
IFU 52
HWF 112
HWF 96
WFH 208
ESZ 189
SZO 159
TZO 156
OOL 124
OOL 116
OES 105
OES 36
NTZ 175
ZAQ 12
AFR 64
WFOM 68
FOMZ 68
OMZG 68
SBGB 232
OGRD 424
GRDN 424
RDNV 424
DSXO 64
BWHW 62
BWHW 78
TLVH 108
WHWF 112
HWFH 208
WFOMZ 68
FOMZG 68
OGRDN 424
GRDNV 424
Looking at these statistics you will find that 79% of the repeated sequences are multiples of four. This means that the length of the key is probably 4. Why? Well, in the plaintext there are going to be a lot of the word 'the'. What makes a vigenere cipher so strong is that 'the' will not be encrypted the same way all the time. Sometimes 'the' will become 'kde' and other times it might become 'pxh'. But , in a large enough plaintext 'the' will get encrypted the same way. When this happens we know that the key letter used to encrypt the first 'T' of the word 'THE' is the same as the one used to encrypt the second 'T'. Because the key repeats itself (KEYKEYKEYKEYKEYKE) over the whole plaintext we know that the distance between the first and second 'T' must be a multiple of the key length. It is not a good idea to just find all the repeated sequences and then just find the GCF of the distances. There will be repeated sequences that just occurred out of pure randomness. This is what makes the Kasiski Examination difficult to program. If you really wanted to program it, you could find the GCF of all repeated sequences that are 5 letters or more. These are not likely to occur just by chance so you will probably get the correct length of the key as the GCF.
Now knowing that the key length is 4, you should be able to decrypt it easily.
One note: you might notice here that a large number of the distances are multiples of 2 (even), similarly if the key length had been 6 then you would notice a large number of the distances are multiples of 3. There is no way to know exactly what the key size is so you can just try both 3 and 6. You should first try 6 because most key sizes are 4, 5, or 6.
The Kasiski examination was a revolutionary breakthrough in cryptography. The Vigenere cipher had been described as "Le chiffre indechiffrable" (The undecipherable cipher) but it had finally been broken. There are however many problems with the Kassiski analysis that makes it less useful then the Friedman Test. Finding repeated sequences is not easy. It becomes much easier with a computer but human interaction still must exist because some of the repeated sequences are random. The Index of Coincidence is a much better method.