So what do I mean by an xor encoder and how does it all work?

xor stands for eXclusive OR, a certain type of logic
function. At the heart of any binary (or digital) system everything ends up
coming down to a collection of logic functions. These functions are AND, OR,
NAND, NOR, XOR and NOT. Each of them has two inputs and one output except for
NOT which only has one input.

Since this is binary logic each input is either on or off. Most of the
functions are fairly self explanatory e.g. the output of AND is on when both
inputs are on. The output of OR is on when either one or the other or both
inputs are on.

These functions are the building blocks of any sort of digital system, the
processor in the computer that you're using could be described purely using
those functions. It would be a rather long description but it could be done.

Exactly what all the different functions do is shown in these tables.

Input | Not |

On | Off |

Off | On |

Input 1 | Input 2 | And | Or | Nand | Nor | Xor |

On |
On |
On |
On |
Off |
Off |
Off |

On |
Off |
Off |
On |
On |
Off |
On |

Off |
On |
Off |
On |
On |
Off |
On |

Off |
Off |
Off |
Off |
On |
On |
Off |

Look again at the table for the xor
function. It has a very useful property, if I take an unknown input and xor it with a known value twice the output is the same as
my original unknown input.

No matter what value I start with if I XOR it twice with On
or twice with Off I'll end up back where I started.

You still with me?

[unknown] xor On = [the opposite of unknown], [the
opposite of unknown] xor On = [unknown]

[unknown] xor Off = [unknown], [unknown] xor Off = [unknown]

Let's say I need to secretly tell someone whether something is on or off. First
of all I agree with him in advance what the first input to our xor functions is. I then take my secret information and put
it into the second input to my xor function. I can
now openly tell the recipient the result of my xor.
The recipient sets the second input of their xor to
the value I tell them and the output will give them the secret information.
Since no one else knows the pre-arranged value of the first input the message I
passed on is meaningless, that have a 50/50 chance of guessing the correct
value.

OK so a 50/50 chance of guessing the correct information isn't exactly very
secure but that is for one bit of information. If I used the same pre-arranged
value all the time then it is normally going to be easy to guess what it was
and so guess the information I want to keep secure.

The easy way around this is to have more than one pre-arranged value. A
computer will use a sequence of 8 ons and offs (8
bits of information) to describe each letter in a message. If I use a different
pre-arranged value for each one of those bits then someone listening will only
have a 1 in 256 chance of guessing the correct value. This pre-arranged
sequence is called the encryption key and it can be any length we like, if the
message is longer than the key then once we reach the end of the key we loop
around and start again at the beginning.

Each bit I add to the length of the key will double the number of possible keys
so double the amount of time it would take someone to try all the possible
values. A 1 bit key only has 2 possible values, 2 bits give 4 possibilities, 3 bits give 8. Once the key reaches 128 bits (a tiny length
for modern electronics) there are just over 340 billion billion
billion billion possible
values. To put it a different way, if I could try a trillion keys a second
(there may be one or two computers in the world that could just about do that)
then my chances of finding the correct key within a billion years are still
less than winning the lottery jackpot next week with a
single entry.

The numbers above are real and a 128 bit encryption key is
considered very secure for a short message. Normally it's going to work out far
easier to steal the key or obtain the information from either the sender or the
receiver than it is to break the encryption.

There is however one serious flaw in the system described. The message is often
far longer than the key used, this means that the key is re-used lots of times
within the same message. Plus unless I have a larger number of pre-arranged
keys handy I'm going to have to re-use the same key for several messages. As
with any type of encryption repeating yourself is a bad idea, it can create
patterns and patterns give clues to the message.

e.g. A 128 bit key is going to repeat every 16
letters. On any reasonable length piece of text the chances are that some
common letter groups (e.g. t-h-e) will happen to appear at some multiple of 16
characters apart. This means that my encrypted message will also have this same
repetition, by guessing that this repetition represents a common letter grouping
it becomes possible to make a guess at some of the key. This is by no means an
instant solution but it is a foot in the door, once you have that the rest is
easy.

Armies of mathematicians have spent years looking for more and more complex
flaws like this and finding ways of using them to crack open messages. An
equally large number have been working on ways to remove these patterns or at
least make the patterns meaningless. The simple solution to the problem
mentioned above would be to compress the text (e.g. put it in a zip file)
before encrypting it, that would remove these obvious patterns and so make the
message harder to crack. The counter to that counter is that a zip file has a
standard start and end format, if you assume that the file that has been
encrypted is a .zip then you can guess some of the key by working out what it
would have to be to give you that standard format. The counter to the counter
to the counter is to either remove the standard header or deliberately corrupt
it before encrypting the file or simply not encrypt the header section of the
file. The counter to the counter to the counter to the counter is... You get
the idea.

The ultimate fallback is a system called a one time pad. This uses a key that
is as long as the message to be sent and as the name implies it is only used
once and then thrown away. This is mathematically un-crackable
for messages of any length but has one drawback: It requires the transport from
the sender to the recipient of a key that is as long as the message and just as
sensitive. In many ways that sort of defeats the whole purpose. For this reason
one time pads are only of use when speed is critical or secure transport of
data is only possible in one direction.

Public key encryption is a cunning trick to work around the
whole problem of passing keys around. It relies on the fact that multiplying is
easy but factoring is hard. i.e. 4*7 = 28 is easy,
working out what 28 can be divided by is a bit harder. When the two numbers are
100 digit long prime numbers it's still easy to multiply them but figuring out
what the two numbers were from the result is almost impossible.

This is the basis of public key encryption, I can publish the result of the
multiplication and anyone can use it to encrypt a message. But only by using
the two original numbers (which no one else knows) is it possible to decrypt
the message.

Public key encryption has become the backbone of internet encryption but it has
one big drawback. It is always crackable. You can
make it hard to break by using very long keys but it will always be possible
given enough time. Conventional encryption is far less convenient but if done
right (e.g. a one time pad) it is virtually impossible to crack.

To give you a rough idea, a 3000 bit public key is about as hard to crack as a
128 bit conventional key. Remember adding one bit to the key doubles the
complexity to crack. Most systems on the internet use 128 bit public keys,
banks internal systems use 512 bit public keys. The current record for cracking
a 512 bit public key encrypted message is under 24 hours.

So what we need to do is come up with a key value that we
can easily exchange in advance and somehow XOR this with the message to be
sent.

First of all each letter in the message is looked up and converted into a 6 bit
binary number (6 on's and off's). We then take the
number that was entered as the key and XOR it's right
hand 6 bits with the first letter in the message.

The binary result of this XOR is looked up in a table that has the reverse of
the table originally used to convert the letter into binary,
the result of this gives us the first encrypted letter of our message.

The right hand 6 bits of the key are thrown away and we perform a couple of mathematical
operations on the remaining key to create a new key value, the purpose of these
operations are to increase the length of the key so that we never run out of
bits.

We take this new key and start everything again on the second letter of the
message.

This process is repeated until the entire message is encoded.

The strength of the encryption is decided by how "random" the sequence
of keys is, remember it can't be truly random since
the recipient needs to be able to recreate the sequence. The hacked together
method used here may be reasonable at first glance but wouldn't stand up to
anything beyond the most casual attempts to crack it.

back to the index