Dendral was a project in artificial intelligence (AI) of the 1960s, and the computer software expert system that it produced. Its primary aim was to study hypothesis formation and discovery in science. For that, a specific task in science was chosen: help organic chemists in identifying unknown organic molecules, by analyzing their mass spectra and using knowledge of chemistry. It was done at Stanford University by Edward Feigenbaum, Bruce G. Buchanan, Joshua Lederberg, and Carl Djerassi, along with a team of highly creative research associates and students. It began in 1965 and spans approximately half the history of AI research.
The software program Dendral is considered the first expert system because it automated the decision-making process and problem-solving behavior of organic chemists. The project consisted of research on two main programs Heuristic Dendral and Meta-Dendral, and several sub-programs. It was written in the Lisp programming language, which was considered the language of AI because of its flexibility.
Many systems were derived from Dendral, including MYCIN, MOLGEN, PROSPECTOR, XCON, and STEAMER. There are many other programs today for solving the mass spectrometry inverse problem, see List of mass spectrometry software, but they are no longer described as 'artificial intelligence', just as structure searchers.
The name Dendral is an acronym of the term "Dendritic Algorithm".
Heuristic Dendral
Heuristic Dendral is a program that uses mass spectra or other experimental data together with a knowledge base of chemistry to produce a set of possible chemical structures that may be responsible for producing the data. A mass spectrum of a compound is produced by a mass spectrometer, and is used to determine its molecular weight, the sum of the masses of its atomic constituents. For example, the compound water (H2O), has a molecular weight of 18 since hydrogen has a mass of 1.01 and oxygen 16.00, and its mass spectrum has a peak at 18 units. Heuristic Dendral would use this input mass and the knowledge of atomic mass numbers and valence rules, to determine the possible combinations of atomic constituents whose mass would add up to 18. As the weight increases and the molecules become more complex, the number of possible compounds increases drastically. Thus, a program that is able to reduce this number of candidate solutions through the process of hypothesis formation is essential.
New graph-theoretic algorithms were invented by Lederberg, Harold Brown, and others that generate all graphs with a specified set of nodes and connection-types (chemical atoms and bonds) -- with or without cycles. Moreover, the team was able to prove mathematically that the generator is complete, in that it produces all graphs with the specified nodes and edges, and that it is non-redundant, in that the output contains no equivalent graphs (e.g., mirror images). The CONGEN program, as it became known, was developed largely by computational chemists Ray Carhart, Jim Nourse, and Dennis Smith. It was useful to chemists as a stand-alone program to generate chemical graphs showing a complete list of structures that satisfy the constraints specified by a user.
Meta-Dendral
Meta-Dendral is a machine learning system that receives the set of possible chemical structures and corresponding mass spectra as input, and proposes a set of rules of mass spectrometry that correlate structural features with processes that produce the mass spectrum. These rules would be fed back to Heuristic Dendral (in the planning and testing programs described below) to test their applicability. Thus, "Heuristic Dendral is a performance system and Meta-Dendral is a learning system". The program is based on two important features: the plan-generate-test paradigm and knowledge engineering.
Plan-generate-test paradigm
The plan-generate-test paradigm is the basic organization of the problem-solving method, and is a common paradigm used by both Heuristic Dendral and Meta-Dendral systems. The generator (later named CONGEN) generates potential solutions for a particular problem, which are then expressed as chemical graphs in Dendral. However, this is feasible only when the number of candidate solutions is minimal. When there are large numbers of possible solutions, Dendral has to find a way to put constraints that rules out large sets of candidate solutions. This is the primary aim of Dendral planner, which is a “hypothesis-formation” program that employs “task-specific knowledge to find constraints for the generator”. Last but not least, the tester analyzes each proposed candidate solution and discards those that fail to fulfill certain criteria. This mechanism of plan-generate-test paradigm is what holds Dendral together.
Knowledge Engineering
The primary aim of knowledge engineering is to attain a productive interaction between the available knowledge base and problem solving techniques. This is possible through development of a procedure in which large amounts of task-specific information is encoded into heuristic programs. Thus, the first essential component of knowledge engineering is a large “knowledge base.” Dendral has specific knowledge about the mass spectrometry technique, a large amount of information that forms the basis of chemistry and graph theory, and information that might be helpful in finding the solution of a particular chemical structure elucidation problem. This “knowledge base” is used both to search for possible chemical structures that match the input data, and to learn new “general rules” that help prune searches. The benefit Dendral provides the end user, even a non-expert, is a minimized set of possible solutions to check manually.
Heuristics
A heuristic is a rule of thumb, an algorithm that does not guarantee a solution, but reduces the number of possible solutions by discarding unlikely and irrelevant solutions. The use of heuristics to solve problems is called "heuristics programming", and was used in Dendral to allow it to replicate in machines the process through which human experts induce the solution to problems via rules of thumb and specific information.
Heuristics programming was a major approach and a giant step forward in artificial intelligence, as it allowed scientists to finally automate certain traits of human intelligence. It became prominent among scientists in the late 1940s through George Polya’s book, How to Solve It: A New Aspect of Mathematical Method. As Herbert A. Simon said in The Sciences of the Artificial, "if you take a heuristic conclusion as certain, you may be fooled and disappointed; but if you neglect heuristic conclusions altogether you will make no progress at all."
History
During the mid 20th century, the question "can machines think?" became intriguing and popular among scientists, primarily to add humanistic characteristics to machine behavior. John McCarthy, who was one of the prime researchers of this field, termed this concept of machine intelligence as "artificial intelligence" (AI) during the Dartmouth summer in 1956. AI is usually defined as the capacity of a machine to perform operations that are analogous to human cognitive capabilities. Much research to create AI was done during the 20th century.
Also around the mid 20th century, science, especially biology, faced a fast-increasing need to develop a "man-computer symbiosis", to aid scientists in solving problems. For example, the structural analysis of myogoblin, hemoglobin, and other proteins relentlessly needed instrumentation development due to its complexity.
In the early 1960s, Joshua Lederberg started working with computers and quickly became tremendously interested in creating interactive computers to help him in his exobiology research. Specifically, he was interested in designing computing systems to help him study alien organic compounds. Lederberg had been heading a team designing instruments for the Mars Viking lander to search for precursor molecules of life in samples of the Mars surface, using a mass spectrometer coupled with a minicomputer. As he was not an expert in either chemistry or computer programming, he collaborated with Stanford chemist Carl Djerassi to help him with chemistry, and Edward Feigenbaum with programming, to automate the process of determining chemical structures from raw mass spectrometry data. Feigenbaum was an expert in programming languages and heuristics, and helped Lederberg design a system that replicated the way Djerassi solved structure elucidation problems. They devised a system called Dendritic Algorithm (Dendral) that was able to generate possible chemical structures corresponding to the mass spectrometry data as an output.
Dendral then was still very inaccurate in assessing spectra of ketones, alcohols, and isomers of chemical compounds. Thus, Djerassi "taught" general rules to Dendral that could help eliminate most of the "chemically implausible" structures, and produce a set of structures that could now be analyzed by a "non-expert" user to determine the right structure. The new rules include more knowledge of mass spectrometry and general chemistry. He also expanded the system so it can incorporate to NMR spectroscopy data in addition to mass spectrometry data.
The Dendral team recruited Bruce Buchanan to extend the Lisp program initially written by Georgia Sutherland. Buchanan had similar ideas to Feigenbaum and Lederberg, but his special interests were scientific discovery and hypothesis formation. As Joseph November said in Digitizing Life: The Introduction of Computers to Biology and Medicine, "(Buchanan) wanted the system (Dendral) to make discoveries on its own, not just help humans make them". Buchanan, Lederberg and Feigenbaum designed "Meta-Dendral", which was a "hypothesis maker". Heuristic Dendral "would serve as a template for similar knowledge-based systems in other areas" rather than just concentrating in the field of organic chemistry. Meta-Dendral was a model for knowledge-rich learning systems that was later codified in Tom Mitchell's influential Version Space Model of learning.
By 1970, Dendral was performing structural interpretation at post-doc level. Djerassi and his group would take over the program for their own research for a decade.
Notes
- ^ November, 2006
- Oral history interview with Bruce G. Buchanan, Charles Babbage Institute, University of Minnesota.
- Lederberg, 1987
- ^ Lindsay et al., 1980
- Berk, 1985
- Lederberg, 1963
- ^ McCorduck, Pamela (2022-01-01). "The Scientific Life of Edward A. Feigenbaum". IEEE Annals of the History of Computing. 44 (1): 123–128. doi:10.1109/MAHC.2022.3145216. ISSN 1058-6180.
References
- Berk, A A. LISP: the Language of Artificial Intelligence. New York: Van Nostrand Reinhold Company, 1985. 1-25.
- Lederberg, Joshua. An Instrumentation Crisis in Biology. Stanford University Medical School. Palo Alto, 1963.
- Lederberg, Joshua. How Dendral Was Conceived and Born. ACM Symposium on the History of Medical Informatics, 5 November 1987, Rockefeller University. New York: National Library of Medicine, 1987.
- Lindsay, Robert K., Bruce G. Buchanan, Edward A. Feigenbaum, and Joshua Lederberg. Applications of Artificial Intelligence for Organic Chemistry: The Dendral Project. McGraw-Hill Book Company, 1980.
- Lindsay, Robert K., Bruce G. Buchanan, E. A. Feigenbaum, and Joshua Lederberg. DENDRAL: A Case Study of the First Expert System for Scientific Hypothesis Formation. Artificial Intelligence 61, 2 (1993): 209-261.
- November, Joseph A. “Digitizing Life: The Introduction of Computers to Biology and Medicine.” Doctoral dissertation, Princeton University, 2006