How xor encryption works.

So what do I mean by an xor encoder and how does it all work?

binary logic functions

xor stands for eXclusive OR, a certain type of logic function. At the heart of any binary (or digital) system everything ends up coming down to a collection of logic functions. These functions are AND, OR, NAND, NOR, XOR and NOT. Each of them has two inputs and one output except for NOT which only has one input.
Since this is binary logic each input is either on or off. Most of the functions are fairly self explanatory e.g. the output of AND is on when both inputs are on. The output of OR is on when either one or the other or both inputs are on.
These functions are the building blocks of any sort of digital system, the processor in the computer that you're using could be described purely using those functions. It would be a rather long description but it could be done.
Exactly what all the different functions do is shown in these tables.

InputNot
On Off

Off On

 

Input 1Input 2AndOrNandNorXor

On

On

On

On

Off

Off

Off

On

Off

Off

On

On

Off

On

Off

On

Off

On

On

Off

On

Off

Off

Off

Off

On

On

Off

So what has this got to do with encryption?

Look again at the table for the xor function. It has a very useful property, if I take an unknown input and xor it with a known value twice the output is the same as my original unknown input.
No matter what value I start with if I XOR it twice with On or twice with Off I'll end up back where I started.
You still with me?
[unknown] xor On = [the opposite of unknown], [the opposite of unknown] xor On = [unknown]
[unknown] xor Off = [unknown], [unknown] xor Off = [unknown]

Let's say I need to secretly tell someone whether something is on or off. First of all I agree with him in advance what the first input to our xor functions is. I then take my secret information and put it into the second input to my xor function. I can now openly tell the recipient the result of my xor. The recipient sets the second input of their xor to the value I tell them and the output will give them the secret information. Since no one else knows the pre-arranged value of the first input the message I passed on is meaningless, that have a 50/50 chance of guessing the correct value.
OK so a 50/50 chance of guessing the correct information isn't exactly very secure but that is for one bit of information. If I used the same pre-arranged value all the time then it is normally going to be easy to guess what it was and so guess the information I want to keep secure.
The easy way around this is to have more than one pre-arranged value. A computer will use a sequence of 8 ons and offs (8 bits of information) to describe each letter in a message. If I use a different pre-arranged value for each one of those bits then someone listening will only have a 1 in 256 chance of guessing the correct value. This pre-arranged sequence is called the encryption key and it can be any length we like, if the message is longer than the key then once we reach the end of the key we loop around and start again at the beginning.
Each bit I add to the length of the key will double the number of possible keys so double the amount of time it would take someone to try all the possible values. A 1 bit key only has 2 possible values, 2 bits give 4 possibilities, 3 bits give 8. Once the key reaches 128 bits (a tiny length for modern electronics) there are just over 340 billion billion billion billion possible values. To put it a different way, if I could try a trillion keys a second (there may be one or two computers in the world that could just about do that) then my chances of finding the correct key within a billion years are still less than winning the lottery jackpot next week with a single entry.

Real world encryption

The numbers above are real and a 128 bit encryption key is considered very secure for a short message. Normally it's going to work out far easier to steal the key or obtain the information from either the sender or the receiver than it is to break the encryption.
There is however one serious flaw in the system described. The message is often far longer than the key used, this means that the key is re-used lots of times within the same message. Plus unless I have a larger number of pre-arranged keys handy I'm going to have to re-use the same key for several messages. As with any type of encryption repeating yourself is a bad idea, it can create patterns and patterns give clues to the message.
e.g. A 128 bit key is going to repeat every 16 letters. On any reasonable length piece of text the chances are that some common letter groups (e.g. t-h-e) will happen to appear at some multiple of 16 characters apart. This means that my encrypted message will also have this same repetition, by guessing that this repetition represents a common letter grouping it becomes possible to make a guess at some of the key. This is by no means an instant solution but it is a foot in the door, once you have that the rest is easy.
Armies of mathematicians have spent years looking for more and more complex flaws like this and finding ways of using them to crack open messages. An equally large number have been working on ways to remove these patterns or at least make the patterns meaningless. The simple solution to the problem mentioned above would be to compress the text (e.g. put it in a zip file) before encrypting it, that would remove these obvious patterns and so make the message harder to crack. The counter to that counter is that a zip file has a standard start and end format, if you assume that the file that has been encrypted is a .zip then you can guess some of the key by working out what it would have to be to give you that standard format. The counter to the counter to the counter is to either remove the standard header or deliberately corrupt it before encrypting the file or simply not encrypt the header section of the file. The counter to the counter to the counter to the counter is... You get the idea.
The ultimate fallback is a system called a one time pad. This uses a key that is as long as the message to be sent and as the name implies it is only used once and then thrown away. This is mathematically un-crackable for messages of any length but has one drawback: It requires the transport from the sender to the recipient of a key that is as long as the message and just as sensitive. In many ways that sort of defeats the whole purpose. For this reason one time pads are only of use when speed is critical or secure transport of data is only possible in one direction.

Public key encryption

Public key encryption is a cunning trick to work around the whole problem of passing keys around. It relies on the fact that multiplying is easy but factoring is hard. i.e. 4*7 = 28 is easy, working out what 28 can be divided by is a bit harder. When the two numbers are 100 digit long prime numbers it's still easy to multiply them but figuring out what the two numbers were from the result is almost impossible.
This is the basis of public key encryption, I can publish the result of the multiplication and anyone can use it to encrypt a message. But only by using the two original numbers (which no one else knows) is it possible to decrypt the message.
Public key encryption has become the backbone of internet encryption but it has one big drawback. It is always crackable. You can make it hard to break by using very long keys but it will always be possible given enough time. Conventional encryption is far less convenient but if done right (e.g. a one time pad) it is virtually impossible to crack.
To give you a rough idea, a 3000 bit public key is about as hard to crack as a 128 bit conventional key. Remember adding one bit to the key doubles the complexity to crack. Most systems on the internet use 128 bit public keys, banks internal systems use 512 bit public keys. The current record for cracking a 512 bit public key encrypted message is under 24 hours.

So getting back to how this encoder works...

So what we need to do is come up with a key value that we can easily exchange in advance and somehow XOR this with the message to be sent.
First of all each letter in the message is looked up and converted into a 6 bit binary number (6 on's and off's). We then take the number that was entered as the key and XOR it's right hand 6 bits with the first letter in the message.
The binary result of this XOR is looked up in a table that has the reverse of the table originally used to convert the letter into binary, the result of this gives us the first encrypted letter of our message.
The right hand 6 bits of the key are thrown away and we perform a couple of mathematical operations on the remaining key to create a new key value, the purpose of these operations are to increase the length of the key so that we never run out of bits.
We take this new key and start everything again on the second letter of the message.
This process is repeated until the entire message is encoded.
The strength of the encryption is decided by how "random" the sequence of keys is, remember it can't be truly random since the recipient needs to be able to recreate the sequence. The hacked together method used here may be reasonable at first glance but wouldn't stand up to anything beyond the most casual attempts to crack it.

back to the index