The Annotated Turing – Abridged

I recently read the book The Annotated Turing by Charles Petzold. It was really enlightening. Though all computer science graduates study concepts like Turing machines, Turing Completeness etc. as part of their courses in colleges, questions such as how and why did Turing come up with these concepts is not taught usually (spoiler alert: Turing did not come up with either terminology  that I just mentioned). Charles does a good job of making Turing’s classical 1937 paper more accessible without dumbing it down too much. He also gives a lot of historical context as well as background knowledge that Turing assumes his readers to have.

The book cover

The book is aptly sub-titled A Guided Tour through Alan Turing’s Historic Paper on Computability and the Turing Machine.  In it, Turing’s paper is shown as is, but is supplemented with the author’s own annotations in between. In this post, I will try to present a very concise overview of the key takeaways from the book.

Das Entscheidungs problem

Turing’s actual 1937 paper was titled On Computable Numbers, With An Application To The Entscheidungsproblem. To understand the paper, we need to understand what that problem is. Entshcheidung means decision or decidability. In 1900, during his address at the Second International Congress of Mathematicians,  one of the prominent mathematicians at the time, David Hilbert, presented 23 questions as challenges for mathematicians to solve in the next century. The 10th problem in the list was this:

10. Determination of the Solvability of a Diophantine Equation

Diophantus was a Greek mathematician who is estimated to have lived around 250 CE and wrote a book called Arithmetica. Diophantine equation has come to mean an algebraic equation in which only whole number solutions are allowed.

Given such a problem, Hilbert was interested in a process with finite number of steps to determine whether the problem has a solution or not i.e. is the problem solvable? This problem came to be known as the Entscheidungsproblem or decision problem. Note, Hilbert is not interested in a general method to solve Diophantine equations here.

Turing proves that no such general decision process exists.

 Gödel’s Incompleteness Theorem

As Turing himself remarks in the paper, his results are superficially similar to the famous Incompleteness Theorem by Kurt Gödel from 1931. However, he clarifies the difference between both results.

Gödel essentially shows that any consistent formal system is necessarily incomplete. We can find propositions P such that neither P nor ~P (negation of P) is provable within that system.

Till that time, mathematicians were attempting to create flawless systems which were both complete and consistent. Gödel’s results proved that such attempts were in vain.

Even with these results, a general decision process as described by Hilbert could conceivably exist. Given an unprovable proposition, it would identify it as unprovable. It would similarly identify provable propositions also correctly. Such a decision process is what Turing shows as cannot exist.

Turing’s solution

Turing’s approach leverages Predicate Calculus extensively. I will not try to explain the details of Turing’s proof here (for that, you need to read the book), but I will try to summarize the key ideas that he builds up and uses in his proof.

Computing Machine (M): is a machine which

  1. is supplied with a tape running through it, divided into sections (squares) capable of bearing a symbol each.
  2. whose motion at each stage is completely defined by its configuration.
  3. and can perform one of the following operations at any moment:
    1. read one symbol
    2. print one of two kinds of symbols onto a square:
      1. figures (either 0 or 1)
      2. other symbols
    3. change the square being scanned (by shifting left or right)

Here configuration is what tells the machine what action to perform when it encounters a symbol. Henceforth, I’ll refer to Computing Machines as just machines.

Circular and Circle free machines: If a machine never writes more than a finite number of symbols of the first kind (0 or 1), it is called circular. Otherwise, it is circle-free. This notion becomes one of the most crucial aspects of Turing’s proof later.

Notice that the terminology here is a bit counter-intuitive. In Turing’s formulation, a machine which does not stops is considered good (circle-free) whereas one which stops after finite steps is bad (circular).

Computable Sequences: A sequence is said to be computable if it can be computed by a circle-free machine. Turing shows that his machines are capable of computing all sequences that we generally consider as computable (such as pi).

Description Number (D.N): Turing shows how all machines can be represented using a standard form such as:

q1 S0 S1 R q2; q2 S0 S0 R q3; q3 S0 S2 R q4; q4 S0 S0 R q1;

Here the first section (separated by a ; ) => q1 S0 S1 R q2;

can be interpreted as => From q1 state, if the machine reads symbol S0, print the symbol S1, shift right and transition to state q2.

He modifies (serializes) the form to a notation that can be more easily parsed by another machine. This notation is called the Standard Description (S.D) of the machine. S.D for the above machine for eg:


He further says each such S.D can be represented by a number (by replacing A with 1, C with 2 etc.). This is the Description Number (D.N) for that machine. Note that S.D and D.N can be used inter changeably as you can always convert from to the other.

Universal Machine (U): This is one of the most brilliant insights of Alan Turing. Once he shows that any machine can be represented as a number, he creates a machine which can take the Description Number of any other machine M as input and prints the same output as M. This can be considered as the first ‘general-purpose’ computer ever conceived.

In the next section, I will describe how Turing uses this universal machine to prove that the decision problem cannot be solved. However, it is worth appreciating that we remember Turing in the present more for his conception of Computing Machines (later known as Turing Machines), the Universal Machine and its limitations rather than for proving the Decision Problem. This is one of the cases in history where the journey contributed more than the destination (end goal).

The Proof

The high level approach that Turing takes to solve the decision problem can be outlined as follows:


Each of these steps might seem like a logical jump from the previous one, but they do follow from the previous ones consistently. However, there is a lot of tedious details to the representation of these statements in predicate calculus and hence in the proofs.

I will try to present an brief explanation of the first of these four steps. However, by definition, this will not be rigorous or complete.

No general process exists to determine whether a machine is circle free

Turing proves this by reductio ad absurdum or proof by contradiction. First, he assumes that there exists a machine D which given the S.D of any machine M, tests this S.D and if M is circular, outputs “u” (unsatisfactory) and if it is circle-free, outputs “s” (satisfactory) in a finite number of steps.

If M is circle-free, we can give the S.D of M to the Universal Machine (U) to compute the output of M. Turing combines D and U to create a machine H which works as follows:

H flowchart

Here R(N) keeps track of the number of valid circle-free machines encountered so far.
If you follow along the flow-chart, you’ll see that the Nth digit of H‘s output is the Nth digit in the output of a Nth valid circle-free machine.

For e.g., if you count up from 1:

313,325,317 -> M1 - Prints 0s to the right
3,133,225,317 -> M2 - Prints 1s to the right

The first valid D.N that you hit upon is 313,325,317. Let’s call this machine M1. Its output is an unending sequence (as it is circle-free) of 0s. So the first digit in the output of H will be 0 (the first digit of M1).

Similarly, the second valid D.N that you will find will be 3,133,225,317, for M2. Its output will be an unending sequence of 1s. Hence the second digit in the output of H will be 1 (the second digit of M2).

Here, Turing points out that the machine H has to be circle-free by our construction. It is composed on D and U. D is circle-free by our assumption. U only computes finite digits of outputs for circle-free machines. Hence H, the combination of both has to be circle-free.

Now the problem arises when H reaches its own D.N, let’s call it K. Now, how will D classify K? It cannot be circular (or unsatisfactory, u) as pointed out above. But if it is circle-free, U has to print the R(K)th digit of H.

Let’s see..
How many digits would have H printed by the time it reaches K?  R(K – 1).
Now, it order to compute its R(K)th (i.e. R(K – 1) + 1) digit, it needs to evaluate itself using U, compute R(K) digits and then print its R(K)th digit. When U evaluates H and reaches the R(K)th digit, it needs to evaluate H again. This leads to an infinite recursion!

By contradiction, Turing proves that there cannot be a general process by which we can determine whether a given S.D is that of a circle-free machine.

This is the crux of Turing’s proof. A variation of this problem is what has come to be known as the Halting Problem (defined by Martin Davis in 1958).

From here, Turing proves the 2nd step by reducing it to the first. Then he proves the 3rd step by reducing it to the 2nd and so on. His machines do not have a notion of algebra or even numbers to begin with. Like I said earlier, there is extensive use of first-order predicate calculus to formulate a number system and to actually get to the decision problem from here.
Maybe that can be a subject for a second post later.

Turing Completeness

As mentioned before, this terminology was coined later and was never used by Turing himself. In Computability Theory, a system is said to be Turing Complete if it can be used to simulate any Turing machine. It implies that the system is theoretically capable of expressing all tasks accomplishable by computers. Most general-purpose programming languages are Turing Complete.

Turing and Lambda Calculus

This book brought a lot of clarity to the relationship between Lambda Calculus and Turing. Before that, for some reason, I had always assumed Lambda Calculus to be more recent than Turing.

In reality, Alonzo Church, creator of Lambda Calculus beat Turing by a very close margin in solving the decision problem. Church’s paper A Note on the Enscheidungsproblem which also proved that the decision problem was unsolvable was received by The Journal of Symbolic Logic on April 15, 1936, six weeks before Turing’s submission to the London Mathematical Society on May 28, 1936!

Turing then had to publish a 3-page appendix to his paper (in August 1936) proving that his notion of computability and Church’s notion of effective calculability are both effectively equal. Alonzo Church had got his Ph.D in mathematics from Princeton and had taught there as a professor from 1929 to 1967. In the fall of 1936, Turing went to Princeton to get his Ph.D under Alonzo Church. Among his students, Church had other famous mathematicians such as Stephen Kleene and Martin Davis who expanded on both Turing’s and Church’s works later.
Lambda Calculus has had a direct influence on functional programming languages such as Lisp, Haskell, Scheme, Clojure etc.


The Art of Thinking Clearly

I recently completed reading The Art of Thinking Clearly (link) by Rolf Dobelli. I found the book an interesting, concise and useful read on the many biases of the human mind.

If you go through the reviews of the book on Goodreads (link) or anywhere else online, you are likely to end up with mixed reviews. The negative ones mostly criticizing the author of plagiarism. In fact, N.N. Taleb, the bestselling author of many books including The Black Swan (link) has gone ahead and written a detailed account of the instances where his ideas were plagiarized by Dobelli – link.

Interestingly, these happen to be the exact reasons why I ended up reading Dobelli’s book! Let me explain myself a bit here though that would mean slightly digressing from the subject of this post.

Reading Summary Books

On multiple accounts, I had considered buying books such as Taleb’s The Black Swan, Fooled By Randomness (link) or Nobel laureate Daniel Kahneman’s Thinking Fast and Slow (link) etc. to understand more about the human mind and it’s blind spots. Each time, the sheer size of these books have made me put off the task to a distant future.

It might very well be the case that these books (among others) were the first ones to discuss many of the ideas mentioned in Dobelli’s book and that they discuss these ideas with much more rigor. But for a casual reader like me who is looking for a high-level overview of the core essence of these books without taking the actual effort of reading these, books like Dobelli’s are the best options.

On a related note, I recently came across this extremely nice Youtube channel – link that takes this concept of reading summaries a step further by presenting 5-10 minute illustrative videos that summarise the essence of various famous self-help/philosophical books.

The Art of Thinking Clearly

Getting back to the book, there were two other negative points mentioned in the reviews that I was careful to watch out for while reading:

  1. In an effort to come up with ‘100‘ limitations of the human mind, Dobelli has added many somewhat obvious/insignificant ones also to the list. This can make the real insights hard to separate out for the casual reader esp. since they are given in no particular order.
  2.  Some of the anecdotes used are contextually inappropriate.

Keeping all these in mind, I’ve been able to get some good insights out of the book. A few of the ones that come to my mind include (not including bias definitions for brevity):

  • Confirmation bias (link) being the mother of all biases. This explains why you will never be able to convince someone in arguments where the topic has inherent uncertainties which are open to interpretations. Political discussions on the internet seem to be a good example.
  • Swimmer’s Body Illusion (link) – Also answers the question ‘Does Harvard make you smarter?’
  • Action Bias (link) – where we feel doing something is more productive than doing nothing, even though what we do might be counter-productive.
  • Effort Justification (link) – where we tend to value something acquired with more effort as more valuable rather than objectively valuing the utility of the item.
  • Illusion of Attention (link) – This one was an eye-opener. Particularly the observation that drivers talking on the phone are as susceptible to accidents as a drunken driver, even if you are on hands-free.
  • Survivorship Bias (link) – probably explains why people fail to understand the risks involved in starting startups and overestimate the chances of success.
  • Forer Bias (link) –  refers to the tendency of people to rate sets of statements as highly accurate for them personally even though the statements could apply to many people.

The entire list of biases can be found here – link.