Thinking About AI: Part II - Early Experiences
We continue our multi-part series on AI, drawing lessons from early experiences with programs that humans perceived as human and with programs that play strategy games at championship level.
Welcome Back to Win-Win Democracy
Last time we saw some compelling chats with ChatGPT, some well-written but mostly false responses to prompts, and some dismal failures of logic and mathematical abilities. New Bing seemed to do a better job getting its facts right. I also pointed you to the work of several journalists who reported on long conversations with ChatGPT and Bing that wandered into bizarre, even disturbing, interactions in which these programs responded as if they had feelings and emotions.
In this issue, we’ll explore lessons from earlier work in AI and how those lessons might help us in thinking about the new technology that we’re seeing today.
But first: Chat is now available on the web
I mentioned in December that I opened up a few chat threads using Substack’s new chat facility for subscribers, but it was only available on phones and tablets. Several of you said that the phone/tablet requirement made it useless.
The chat facility is now also on the web at https://substack.com/chat. You'll need to be logged in to your Substack account (you already have one if you get my newsletter in your inbox).
I’ve created a new thread as a place where we can share our experiences with ChatGPT, new Bing, and other AI-based systems. You can access that thread directly here:
Just Around the Corner
Here’s a dialogue between a young woman and a computer program (not ChatGPT or Bing) playing a psychotherapist. The woman speaks first; the program’s responses alternate with hers:
Men are all alike.
In what way?
They're always bugging us about something or other.
Can you think of a specific example?
Well, my boyfriend made me come here.
Your boyfriend made you come here
He says I'm depressed much of the time.
I am sorry to hear you are depressed
It's true. I am unhappy
Do you think coming here will help you not to be unhappy
I need some help, that much seems certain.
What would it mean to you if you got some help
Perhaps I could learn to get along with my mother.
Tell me more about your family
My mother takes care of me.
Who else in your family takes care of you
My father.
Your father
You are like my father in some ways.
What resemblance do you see
You are not very aggressive but I think you don't want me to notice that.
What makes you think I am not very aggressive
You don't argue with me.
Why do you think I don’t argue with you
You are afraid of me.
Does it please you to believe I am afraid of you
My father is afraid of everybody.
What else comes to your mind when you think of your father
Bullies.
Does that have anything to do with the fact that your boyfriend made you come here
Before reading on, think about this dialogue. If I hadn’t already told you that the psychotherapist was a computer program, would you find it plausible that it was a human psychotherapist?
This computer program, called ELIZA, was written by MIT Professor Joseph Weizenbaum between 1964 and 1966. ELIZA could be given a script for particular kinds of conversations. ELIZA, when running the script that yielded the above conversation, was called DOCTOR.
Shocking Responses
In his 1976 book, Computer Power and Human Reason: From Judgment to Calculation (now out of print, but available digitally as an Amazon Kindle book), Weizenbaum relates (p. 5) three “shocks” (his word) that he experienced as DOCTOR became widely known:
“A number of practicing psychiatrists seriously believed the DOCTOR computer program could grow into a nearly completely automatic form of psychotherapy.”
“I was startled to see how quickly and how very deeply people conversing with DOCTOR became emotionally involved with the computer and how unequivocally they anthropomorphized it. Once my secretary, who had watched me work on the program for many months and therefore surely knew it to be merely a computer program, started conversing with it. After only a few interchanges with it, she asked me to leave the room.”
“Another widespread, and to me surprising, reaction to the ELIZA program was the spread of a belief that it demonstrated a general solution to the problem of computer understanding of natural language. In my paper, I had tried to say that no general solution to that problem was possible, i.e., that language is understood only in contextual frameworks, that even these can be shared by people to only a limited extent, and that consequently even people are not embodiments of any such general solution. But these conclusions were often ignored. … This reaction to ELIZA showed me more vividly than anything I had seen hitherto the enormously exaggerated attributions an even well-educated audience is capable of making, even strives to make, to a technology it does not understand.”
How Did ELIZA and DOCTOR Work?
To understand why Weizenbaum was shocked at the response to DOCTOR, it’s necessary to understand a bit about how it worked. In Weizenbaum’s own words (from his 1966 paper):
Consider the sentence "I am very unhappy these days". Suppose a foreigner with only a limited knowledge of English but with a very good ear heard that sentence spoken but understood only the first two words "I am". Wishing to appear interested, perhaps even sympathetic, he may reply "How long have you been very unhappy these days?" What he must have done is to apply a kind of template to the original sentence, one part of which matched the two words "I am" and the remainder isolated the words "very unhappy these days". He must also have a reassembly kit specifically associated with that template, one that specifies that any sentence of the form "I am BLAH" can be transformed to "How long have you been BLAH", independently of the meaning of BLAH. A somewhat more complicated example is given by the sentence "It seems that you hate me". Here the foreigner understands only the words "you" and "me"; i.e., he applies a template that decomposes the sentence into the four parts
\(\begin{array}{|c|c|c|c|} \hline (1)&(2)&(3)&(4)\\ \hline \mbox{It seems that}&\mbox{you}&\mbox{hate}&\mbox{me} \\ \hline \end{array}\)

of which only the second and fourth parts are understood. The reassembly rule might then be "What makes you think I hate you"; i.e., it might throw away the first component, translate the two known words ("you" to "I" and "me" to "you") and tack on a stock phrase (What makes you think) to the front of the reconstruction.
An ELIZA script like DOCTOR defines a set of relevant keywords, each with a priority. When ELIZA “reads” a sentence, it looks for the highest-priority keyword defined by its script and then for a transformation, like the ones Weizenbaum described above, that applies to sentences containing that keyword in a particular context. Add some default responses for when no keyword is found, and you pretty much have it.
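To make this concrete, here is a minimal sketch in Python of the keyword-and-template idea. The keywords, priorities, and rules are invented for illustration; the original ELIZA was written in MAD-SLIP, with scripts in a notation of Weizenbaum's own design:

```python
# A minimal sketch of ELIZA-style keyword matching and reassembly.
# The keywords, priorities, and rules are invented for illustration.
import random
import re

# Each keyword has a priority and a list of (decomposition, reassembly) rules.
SCRIPT = {
    "i am": (3, [(r"i am (.*)", "How long have you been {0}?")]),
    "mother": (5, [(r".*mother.*", "Tell me more about your family.")]),
    "you": (2, [(r".*you (.*) me.*", "What makes you think I {0} you?")]),
}

DEFAULTS = ["Please go on.", "What does that suggest to you?"]

# Swap first and second person so echoed fragments read naturally.
PRONOUNS = {"i": "you", "me": "you", "my": "your", "you": "i", "your": "my"}

def reflect(fragment):
    return " ".join(PRONOUNS.get(word, word) for word in fragment.split())

def respond(sentence):
    s = sentence.lower().strip(".!?")
    # Find the highest-priority keyword present in the input.
    matches = [(prio, kw, rules) for kw, (prio, rules) in SCRIPT.items() if kw in s]
    if not matches:
        return random.choice(DEFAULTS)
    _, _, rules = max(matches)
    for pattern, template in rules:
        m = re.match(pattern, s)
        if m:
            return template.format(*(reflect(g) for g in m.groups()))
    return random.choice(DEFAULTS)

print(respond("I am very unhappy these days"))
# -> How long have you been very unhappy these days?
print(respond("It seems that you hate me"))
# -> What makes you think I hate you?
```

Note how little the program "understands": both example responses are produced purely by pattern matching and pronoun swapping.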
ELIZA ran on an IBM 7094 computer, a then-recently-introduced high-end computer for scientific computation. To give you a ballpark idea of the 7094’s computing power, consider that it could do 200,000 floating point arithmetic operations per second (flops). Today's NVIDIA H100, a modern processor designed for AI computations, can do more than 60 trillion flops, roughly 300 million times the 7094's rate. Microsoft’s cloud service offers configurations of thousands of H100 processors all working together for AI applications.
This extremely basic approach, running on what is by today's standards an incredibly low-performance computer, was enough to convince people that ELIZA was capable of broad language understanding and could be the basis for many applications.
For example, MIT’s archives contain a manual from the MIT Education Research Center for writing ELIZA scripts, aimed at building an ELIZA-based system for “tutoring students and assisting them in calculations and problem solving”. A 1967 technical paper on experience using ELIZA for tutoring is available here. The effort to create an ELIZA-based tutorial system revealed that writing a new script was, in fact, quite difficult.
Lessons
Lesson #1: After experiencing apparently human-like capabilities in a narrow domain, people eagerly generalize to assume that much broader, more powerful capabilities are just around the corner. This is usually false.
ELIZA was intended to “learn” new domains simply through the writing of new scripts, but it didn’t turn out that way.
Lesson #2: Generalizing to new domains is difficult.
As we continue to sample AI’s history, we’ll see these lessons repeatedly. That’s not to say that AI technology never achieves enough to be useful — we all use some of it today — but that it is easy to be fooled into believing that the technology will advance quickly to human levels or even beyond.
Here's a well-known example of both lessons: In 2011, IBM’s Watson question-answering system beat two of the best contestants ever on the TV game show Jeopardy!. Following this demonstration of Watson’s question-answering prowess across a wide range of knowledge, IBM announced plans to apply the technology in other domains, including medical diagnosis. It collaborated with major medical centers to develop Watson-based systems that would revolutionize medicine by helping doctors make diagnoses, drawing on an exhaustive knowledge base covering even rare diseases. After billions of dollars invested, the effort failed.
Another example of Lesson #1 is the effort by several companies to produce self-driving automobiles. There are many compelling demonstrations that handle many driving situations, but getting from 90% of the way there to full self-driving has thus far proven insurmountable.
For example, when, in 2019, I bought a Tesla with its $6,000 "full self-driving" feature, the promise was that it would soon be able to drive without human intervention, and more safely than human drivers. The feature was in beta then and still is, 3 ½ years and many software updates later. Full self-driving is years behind the announced schedule, and its success or failure is still to be determined.
Historical Notes
ELIZA was groundbreaking work in its day, probably the first AI technology that people perceived as human-like, although its real capabilities were extremely limited. Nevertheless, it remains a subject of fascination; there is an entire website devoted to the “genealogy of ELIZA.”
Weizenbaum recognized how eager people were to believe that ELIZA and other AI technologies could easily evolve into general artificial intelligence. His book was his rebuttal.
In the 1960s and into the 1970s, there were extensive AI efforts at leading universities, with both Stanford and MIT hotbeds of exciting activity. The milieu was optimistic and full of promise. General artificial intelligence was just around the corner.
Hubert Dreyfus, a philosophy professor at UC Berkeley, was a leading critic of this optimistic view. His books were met with hostility and derision, but his major criticisms of AI have proven true. The Wikipedia article on Dreyfus’s views of AI is worth reading.
Playing Games
Playing strategy games has been a driver for artificial intelligence research since the early days of computing. Strategy games are a fruitful area of study because the best humans have extraordinary skills compared to typical players and the rules of the game are generally straightforward.
Checkers
Arthur Samuel created a checkers-playing program, which he described in detail in a 1959 paper in the IBM Journal of Research and Development. I won’t go into details of how it works except to point out that it was an early example of machine learning. The program could play what Samuel called “a fairly interesting game” ab initio but could learn to play better over time. The program could learn both by playing games (including against itself) and by being fed “book games” from checkers masters. Samuel allowed the computer 30 seconds to decide on its move, which was not much computation on the IBM 704 vacuum tube computer on which the program ran.
Samuel continued work on his program into the mid-1970s, by which time it could challenge a respectable amateur player. Samuel's work was among the first to demonstrate the possibility and power of machine learning, something we'll see appear over and over again.
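To convey the flavor of what "learning" meant here, the following toy sketch shows one of Samuel's core ideas, adjusting the coefficients of a scoring function so that it agrees with more reliable evidence. The features and numbers are my invented placeholders, not Samuel's actual code:

```python
# Toy sketch: score a board as a weighted sum of hand-chosen features, then
# nudge the weights so the static score agrees better with a more reliable
# target value, such as the score obtained by looking several moves ahead.
# Features and numbers are invented placeholders for illustration.

def evaluate(features, weights):
    """Static evaluation: a weighted sum of board features."""
    return sum(w * f for w, f in zip(weights, features))

def update(features, weights, target, lr=0.01):
    """Move the weights so evaluate(features) gets closer to target."""
    error = target - evaluate(features, weights)
    return [w + lr * error * f for w, f in zip(weights, features)]

# Invented features: piece advantage, king advantage, center control.
features = [2.0, 1.0, 3.0]
weights = [0.5, 0.5, 0.5]   # initial guesses
target = 4.0                # e.g., the value reported by deeper lookahead

for _ in range(200):
    weights = update(features, weights, target)

print(round(evaluate(features, weights), 3))  # approaches 4.0
```

Repeat this over thousands of positions, from self-play and from masters' "book games," and the scoring function gradually improves without anyone programming checkers knowledge explicitly.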
The more modern program Chinook, developed at the University of Alberta in the 1990s, can beat the best human players. Interestingly, Chinook does not use machine learning; instead, its creators explicitly wrote all of its checkers knowledge into the program. A popular account of the competition between Chinook and Marion Tinsley, considered the best human player for 40 years, appeared in The Atlantic in 2017.
Chess
There is a long history, dating back to the late 1950s, of computer programs playing chess. By the late 1980s, the best programs could defeat some human grandmasters. In 1996, IBM’s Deep Blue lost a match to World Chess Champion Garry Kasparov. For a 1997 rematch, IBM updated Deep Blue’s special-purpose hardware, doubling its speed, and Deep Blue defeated Kasparov. This was considered a major accomplishment in artificial intelligence.
Achieving it required not only extensive efforts to develop the algorithms for playing, including the work of expert human players, but also developing custom hardware to provide the brute force computation required to evaluate moves.
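At the heart of chess programs of that era was minimax search with alpha-beta pruning, which avoids evaluating branches that cannot affect the final choice of move. Here is a generic textbook sketch, not Deep Blue's actual implementation, which combined search with custom chess chips and extensive hand-tuned knowledge:

```python
# A minimal sketch of minimax search with alpha-beta pruning. This is the
# generic algorithm; real chess engines add move ordering, transposition
# tables, quiescence search, and much more.

def alphabeta(state, depth, alpha, beta, maximizing, moves, evaluate):
    """Return the best score reachable from state, pruning branches that
    cannot change the final decision."""
    children = moves(state)
    if depth == 0 or not children:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in children:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, moves, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the opponent will never allow this line: prune
        return value
    else:
        value = float("inf")
        for child in children:
            value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                         True, moves, evaluate))
            beta = min(beta, value)
            if beta <= alpha:
                break
        return value

# A toy two-ply game tree: lists are positions, numbers are leaf scores.
tree = [[3, 5], [2, 9], [0, 7]]
moves = lambda s: s if isinstance(s, list) else []
evaluate = lambda s: s  # a leaf's score is just its value

print(alphabeta(tree, 2, float("-inf"), float("inf"), True, moves, evaluate))
# -> 3: the maximizer picks the branch whose worst case is best.
```

The brute-force part is the sheer number of positions examined; Deep Blue's custom hardware let it evaluate on the order of 200 million positions per second.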
Go
Go is a strategy game that is especially popular in East Asia, with an estimated 20 million active players. Despite its simple rules, Go is extremely complex. It is played on a 19x19 grid, which means that there are many more alternatives to evaluate at each move than in chess and checkers, which are played on an 8x8 board.
Approaches that worked well for checkers and chess failed miserably at Go because of the large number of possibilities that must be considered at each move (this discussion is based on the Wikipedia article Computer Go). Beginning human players could defeat the best programs of the 1980s and 1990s; even in the early 2000s, intermediate human players could defeat the best computer players.
In the late 2000s, a new approach for searching among the possible moves, called Monte Carlo tree search, was introduced. It led to Go programs that could defeat advanced amateur human players, but still not the best human players.
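To give a feel for the idea, here is a minimal sketch of Monte Carlo tree search applied to a deliberately tiny game, one-pile Nim, rather than Go. The four phases (selection, expansion, simulation, backpropagation) are the heart of the method; everything about this particular game and the parameter choices is my illustration:

```python
# A minimal sketch of Monte Carlo tree search (MCTS) on a toy game:
# one-pile Nim, where players alternately remove 1-3 stones and whoever
# takes the last stone wins. Real Go programs add far more machinery.
import math
import random

MOVES = (1, 2, 3)  # a player may remove 1, 2, or 3 stones

class Node:
    def __init__(self, stones, just_moved, parent=None):
        self.stones = stones          # stones remaining in the pile
        self.just_moved = just_moved  # player (1 or 2) who moved into this state
        self.parent = parent
        self.children = []
        self.untried = [m for m in MOVES if m <= stones]
        self.visits = 0
        self.wins = 0.0               # wins from just_moved's perspective

    def best_child(self, c=1.4):
        # Selection rule (UCB1): balance win rate against curiosity.
        return max(self.children,
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def mcts(stones, to_move, iterations=3000):
    root = Node(stones, just_moved=3 - to_move)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while fully expanded and not terminal.
        while not node.untried and node.children:
            node = node.best_child()
        # 2. Expansion: try one previously unexplored move.
        if node.untried:
            move = random.choice(node.untried)
            node.untried.remove(move)
            child = Node(node.stones - move, 3 - node.just_moved, parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: play random moves to the end of the game.
        stones_left, mover = node.stones, 3 - node.just_moved
        winner = node.just_moved if stones_left == 0 else None
        while winner is None:
            stones_left -= random.choice([m for m in MOVES if m <= stones_left])
            if stones_left == 0:
                winner = mover  # this player took the last stone and wins
            mover = 3 - mover
        # 4. Backpropagation: credit the outcome along the path.
        while node is not None:
            node.visits += 1
            if node.just_moved == winner:
                node.wins += 1.0
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return stones - best.stones  # number of stones to take

# From 5 stones the winning move is to take 1, leaving a multiple of 4.
print(mcts(5, to_move=1))  # usually prints 1
```

Rather than examining every continuation exhaustively, MCTS concentrates its random playouts on the moves that have been winning so far, which is what made it feasible for a game as wide as Go.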
In 2015, the newly revealed program AlphaGo, created by Google’s DeepMind, beat the European Go champion. DeepMind continued to improve AlphaGo, which in 2017 beat the top-ranked Go player in the world.
AlphaGo differed from its predecessors in that it combined tree search methods for choosing the next move, which had been used for other games, with machine learning approaches based on what are anthropomorphically called neural networks. AlphaGo learned to play Go by playing many matches against amateur and professional human players.
A newer version, AlphaGo Zero, learned entirely by playing against itself and is now considered the best Go player in the world. A successor, AlphaZero, used the same approach to teach itself chess and has defeated world-champion computer chess programs.
Other Games
A brief, easy-to-read, non-technical chronology of milestones (through 2006) of computers playing games is available here. It illustrates the prolonged efforts that led to computers being proficient players.
Lessons
Games have been a great vehicle for driving many advancements for several reasons:
Games make compelling demonstrations. Who wouldn’t be excited by a competition between the best human players and a machine, the John Henry legend played out in modern times?
Games have specific rules and clear, objective ways to measure outcomes.
Humans and computers appear to play games very differently. Humans don’t search huge trees of alternative moves, at least not consciously; instead, the best players quickly recognize patterns and situations through some mostly unknown process.
Our experience with humans suggests that a person who is a superb chess (or other strategy game) player is likely an extremely intelligent person generally.
That inference doesn’t apply to machines playing games:
Lesson #3: A machine’s ability to play a strategy game, even at world-champion levels, doesn’t imply that it can do anything else well.
What’s Next?
Humans have incredible capabilities that operate at a subconscious level. We effortlessly analyze what we see, express complex thoughts both orally and in writing, remember and organize knowledge, move our bodies, and drive vehicles through complex, ever-changing environments. With effort we can learn new skills and thereafter perform them easily. We are creative, building on what we learn from others.
AI researchers have worked for seventy years trying to replicate these and other innate human capabilities. Despite early predictions of rapid progress, it is only in the last decade that we’ve made substantial progress on even the most basic aspects of such challenges.
Next time, we’ll discuss this progress and how it is impacting our lives now. I expect that this discussion will complete the second item on the roadmap I previously outlined.