Solving Cryptic Crossword Clues by Computer
I am the only person to succeed in programming a computer to solve cryptic crossword puzzles. The technology I developed can not only solve a significant majority of commercially published clues but also explains how the clues work in plain English.
Cryptic crosswords became widespread early in the twentieth century and thus have been around for the entire history of computers. Many programs have been written to find words which match a pattern of checked and unchecked letters (e.g. "h??lo"). However, searching a word list is a trivial task for a computer. Finding the answer from the clue itself though is a completely different problem.
Cryptic crosswords are a type of crossword common in the United Kingdom, Australia, New Zealand, South Africa, India and Canada. They can also be found occasionally in the New York Times and other American publications.
Very simply a cryptic crossword describes its answer completely, often more than once, but to decode the clue you have to find a very devious way of reading it. What the clue appears to be saying (the "surface reading") is almost never anything to do with the answer and is there as a distraction. (Creating a completely convincing surface reading is also part of the "art" of setting such clues.)
For example, the clue:
It's indecent to let little Albert roam around inside (6)
reads straight as a disapproving statement about a child. To solve the clue, however, you need to find a cryptic way of reading the clue. In this case it is "It's indecent; to let the short form of "Albert" (have) ROAM (around) inside " i.e. "The answer is another word for "indecent" (or) "Al" with the letters ROAM moved around and placed inside". Another word for "indecent" is "amoral". Rearranging "ROAM" gives "MORA" and placing that inside "AL" also gives "amoral" which is the correct answer. Notice how the two halves of the clue reinforce each other so that when you find the solution you can be almost certain it is correct.
In the cryptic reading words can have their usual meaning (perhaps with a sense different from the one in the surface reading), they can denote the letters that spell the word or they can denote an operation which the solver needs to do to form the correct answer (anagramming, inserting letters, reversing the letter order etc.)
(See this introduction, written by myself, for more information on cryptic clues.)
Giving the above clue to my software would rapidly produce the following correct answer, interpretation and natural language explanation of the clue:
The answer is almost certainly amoral.
'it's indecent' is the definition.
('indecent'->'amoral' is in my crossword knowledge base)
'little albert roam around inside' is the subsidiary indication.
'little albert' becomes 'al' (short form of the name).
'around' indicates an anagram.
'inside' means one lot of letters goes inside another.
'roam' anagrammed gives 'mora'.
'mora' placed inside 'al' is 'amoral'.
Cryptic clues contrast with "straight" or "concise" clues in that for these clues it is the straight reading that counts. Examples of straight clues include "Boy's name", "type of dog", Capital city of Australia" where the answer involves only knowledge and not a devious mind.
In many ways solving cryptic crosswords is a classic Artificial Intelligence problem. In the UK alone there are millions of people each day who try to solve them. (Every daily newspaper here carries one.) They are also universally regarded as an activity which requires high levels of thought and intelligence. People attach a great deal of pride to their ability to solve crosswords - especially the more difficult ones such as from the Times and crossword competitions attract thousands of correct solutions.
When I set out to write the crossword solver I imagined I would only create a system that can solve a small minority of clues. In fact the latest version of the technology solves a significant majority of the cryptic clues found in a serious UK broadsheet (such as The Daily Telegraph).
The technology was a commercial project and I haven't yet found the time to publish technical details in the academic literature (though I'm willing to in principle if I ever find the time). However, it has been described in a review paper of Computer Language Games and their role in Artificial Intelligence. The reference is:
Michael L. Littman. Computer Language Games. 2001. in Computers and Games, pages 396--404. Lecture Notes in Computer Science. Springer-Verlag.
In the article, Michael Littman finishes his description with the summary:
"All in all, Crossword Maestro is one of the most sophisticated language-game-playing programs that has been written, successfully combining a detailed lexicon, puzzle-specific knowledge, search, and even natural language explanations into a single system."
(The Anagram Genius technology is also mentioned briefly in this review paper.)
The only other system I'm aware of for solving crosswords was written for non-cryptic clues by Michael L. Littman, Greg A. Keim, and Noam Shazeer. It was written after I published the first version of Crossword Maestro. The system is called Proverb and was designed to solve American-style non-cryptic crossword puzzles. The system did well in a crossword competition competing against humans and the authors won awards for the paper describing it.
Proverb includes a way of exploiting a large database of previous clues (American clues are frequently repeated unlike cryptic clues which are almost always new). It also includes a method of filling the grid from many candidate answers. This is needed for non-cryptic clues where there are often many possible answers for a single clue (unlike cryptics where there is usually a unique answer and a high degree of confidence the answer is correct when it is found). American grids also have every letter "checked" (appearing in both an across and a down clue) so exploiting this information is particularly important. ("British-style" grids common for cryptic clues only have around half the letters checked.)
However, solving a non-cryptic clue is in many ways a far simpler task than solving cryptic clues as the structure of a cryptic clue is far more complicated and far more analysis needs to be done to interpret it. Crossword Maestro has always been able to solve non-cryptic clues too and in fact doing this was almost a drop-out feature of Crossword Maestro: All cryptic clues contain a definition so non-cryptic clues are simply analysed with this already-existing routine. My system lacks much of the cultural knowledge (actresses, pop stars, TV shows etc.) needed to solve many American non-cryptic clues (unlike Proverb it has no database of previous American clues to consult) but it can still solve many clues when the answer relies on more general knowledge.