Artificial General Intelligence: A Gentle Introduction
[The video of a lecture based on this note is on-line.]
Artificial Intelligence (AI) started with a "thinking machine" of human-comparable
intelligence as its ultimate goal, as documented by the following literature:
In the past, there were some ambitious projects aiming at this goal, though they all failed.
The best-known examples include the following:
Partly because the difficulty of the problem had become apparent, in the 1970s and 1980s
mainstream AI moved away from general-purpose intelligent systems and turned to
domain-specific problems and special-purpose solutions, though there are opposing
attitudes toward this change:
Consequently,
the field currently called "AI" consists of many loosely related
subfields without a common foundation or framework, and suffers from an
identity crisis:
Since 2004, calls for research on general-purpose systems have returned, both inside and
outside mainstream AI.
Anniversaries are a good time to review the big picture of the field. In the following
collections and events, many well-established AI researchers raised the topic of
general-purpose and human-level intelligence:
More or less
coincidentally, from outside mainstream AI, there are several books with bold
titles and novel technical plans to produce intelligence as a whole in
computers:
There are also
several less technical but more influential books, with the same optimism on
the possibility of building general-purpose AI:
So after several decades, "general-purpose system", "integrated AI", and
"human-level AI" have become less forbidden (though far from popular) topics,
as reflected by several recent meetings (an incomplete list):
2. AGI Overview
What is Artificial General Intelligence (AGI)
AGI research treats "intelligence" as a whole. Therefore, "AI" and
"AGI" were originally the same, but are currently different. Similar
notions include "strong AI", "human-level AI", "real AI", "thinking machine",
and many others.
AGI differs from mainstream AI in the following points:
· Stressing the general-purpose nature of intelligence,
· Taking a holistic or integrative viewpoint on intelligence,
· Believing in the possibility of an AI that is comparable to human
intelligence in the near future.
AGI
research has a science (theory) aspect and an engineering (technique) aspect. A
complete AGI work normally includes
The book chapter "Aspects of Artificial General Intelligence" clarified the notion
of AGI, and responded to the following common doubts about and objections to this
research:
The most general
theoretical questions every AI (AGI) researcher needs to answer include:
My own answers
to these questions are summarized here.
Most AI (AGI) researchers answer "Yes" to the 2nd and 4th questions, though some
people outside the field say "No" to one of them. In the following, we
compare the different answers to the 1st and 3rd questions, which concern the
research goal and technical strategy of AI (AGI), respectively.
What is the concrete goal of AI research? Of course, it is "to make computers that are
similar to the human mind", but at which level of description,
generalization, or abstraction should this similarity be obtained? As analyzed
in What Do You Mean by "AI"?, there are five typical types of answer:
They are all
valid scientific research goals, but lead to quite different results!
Answers to the 3rd question, in AGI context
Though the goal
is to produce intelligence as a whole, each AGI project still needs to divide
the problem into subproblems to be solved one by one.
In doing so, existing AGI projects follow technical
paths that can be roughly divided into three types:
Common techniques in AGI projects include, but are not limited to:
Though each of these techniques is also explored in mainstream AI, using it in a
general-purpose system leads to very different design decisions in the technical details.
3. Representative AGI Projects
The following
projects are selected to represent existing AGI research, because each of them
(1) is clearly oriented to AGI, (2) is still very active, and (3) has ample
publications of technical details.
Each project is linked to the project website and two selected publications, from which
the following quotations are extracted. The focus of the quotations is on the
research goal (the 1st question) and technical path (the 3rd question).
Soar
[Unified Theories of Cognition, A Gentle Introduction to Soar]
The ultimate
in intelligence would be complete rationality which would imply the ability to
use all available knowledge for every task that the system encounters.
Unfortunately, the complexity of retrieving relevant knowledge puts this goal
out of reach as the body of knowledge increases, the tasks are made more
diverse, and the requirements in system response time more stringent. The best
that can be obtained currently is an approximation of complete rationality. The
design of Soar can be seen as an investigation of one such approximation.
For
many years, a secondary principle has been that the number of distinct
architectural mechanisms should be minimized. Through Soar 8, there has been a
single framework for all tasks and subtasks (problem spaces), a single
representation of permanent knowledge (productions), a single representation of
temporary knowledge (objects with attributes and values), a single mechanism
for generating goals (automatic subgoaling), and a
single learning mechanism (chunking). We have revisited this assumption as we
attempt to ensure that all available knowledge can be captured at runtime
without disrupting task performance. This is leading to multiple learning
mechanisms (chunking, reinforcement learning, episodic learning, and semantic
learning), and multiple representations of long-term knowledge (productions for
procedural knowledge, semantic memory, and episodic memory).
Two
additional principles that guide the design of Soar are functionality and
performance. Functionality involves ensuring that Soar has all of the primitive
capabilities necessary to realize the complete suite of cognitive capabilities
used by humans, including, but not limited to reactive decision making,
situational awareness, deliberate reasoning and comprehension, planning, and
all forms of learning. Performance involves ensuring that there are
computationally efficient algorithms for performing the primitive operations in
Soar, from retrieving knowledge from long-term memories, to making decisions,
to acquiring and storing new knowledge.
ACT-R [The Atomic Components of Thought, An Integrated Theory of the Mind]
ACT-R is a
cognitive architecture: a theory for simulating and understanding human
cognition. Researchers working on ACT-R strive to understand how people
organize knowledge and produce intelligent behavior. As the research continues,
ACT-R evolves ever closer into a system which can perform the full range of
human cognitive tasks: capturing in great detail the way we perceive, think
about, and act on the world.
On
the exterior, ACT-R looks like a programming language; however, its constructs
reflect assumptions about human cognition. These assumptions are based on
numerous facts derived from psychology experiments. Like a programming
language, ACT-R is a framework: for different tasks (e.g., Tower of Hanoi,
memory for text or for list of words, language comprehension, communication,
aircraft controlling), researchers create models (aka programs) that are
written in ACT-R and that, beside incorporating the ACT-R's view of cognition,
add their own assumptions about the particular task. These assumptions can be
tested by comparing the results of the model with the results of people doing
the same tasks.
ACT-R
is a hybrid cognitive architecture. Its symbolic
structure is a production system; the subsymbolic
structure is represented by a set of massively parallel processes that can be
summarized by a number of mathematical equations. The subsymbolic
equations control many of the symbolic processes. For instance, if several
productions match the state of the buffers, a subsymbolic
utility equation estimates the relative cost and benefit associated with each
production and decides to select for execution the production with the highest
utility. Similarly, whether (or how fast) a fact can be retrieved from
declarative memory depends on subsymbolic retrieval
equations, which take into account the context and the history of usage of that
fact. Subsymbolic mechanisms are also responsible for
most learning processes in ACT-R.
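The selection step described in the quotation above (several productions match, and the one with the highest utility fires) can be illustrated with a minimal sketch. This is not ACT-R's actual API or its utility equations: the names `Production`, `select_production`, and the example productions are hypothetical, utilities are fixed numbers rather than learned values, and the noise that a real subsymbolic mechanism would involve is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the buffer state: buffer name -> content.
Buffers = Dict[str, str]


@dataclass
class Production:
    name: str
    condition: Callable[[Buffers], bool]  # symbolic part: does the rule match?
    utility: float                        # subsymbolic part: assumed benefit minus cost


def select_production(productions: List[Production],
                      buffers: Buffers) -> Optional[Production]:
    """Return the matching production with the highest utility, or None."""
    matching = [p for p in productions if p.condition(buffers)]
    if not matching:
        return None
    return max(matching, key=lambda p: p.utility)


# Illustrative productions for a toy addition task (all values are made up).
productions = [
    Production("retrieve-fact", lambda b: b.get("goal") == "add", 2.5),
    Production("count-up", lambda b: b.get("goal") == "add", 1.0),
    Production("stop", lambda b: b.get("goal") == "done", 5.0),
]

chosen = select_production(productions, {"goal": "add"})
print(chosen.name if chosen else "no production matches")
```

Two productions match the goal "add" here, so the symbolic match alone does not decide; the numeric utility breaks the tie, which is the sense in which the subsymbolic level controls the symbolic one.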
Polyscheme is a cognitive architecture designed to model and achieve
human-level intelligence by integrating multiple methods of representation,
reasoning and problem solving.
A
system will be said to have human-level intelligence if it can solve the same
kinds of problems and make the same kinds of inferences that humans can, even
though it might not use mechanisms similar to those in the human brain.
The modifier "human-level" is intended to differentiate such systems
from artificial intelligence systems that excel in some relatively narrow
realm, but do not exhibit the wide-ranging cognitive abilities that humans do.
A
key insight ... is that AI algorithms from different subfields based on
different computational formalisms can all be conceived of as strategies
guiding attention through propositions in the multiverse
[the set of all possible worlds].
LIDA [The Lida Architecture, A Cognitive Theory of Everything]
Implementing
and fleshing out a number of psychological and neuroscience theories of
cognition, the LIDA conceptual model aims at being a cognitive "theory of
everything." With modules or processes for perception, working memory,
episodic memories, "consciousness," procedural memory, action
selection, perceptual learning, episodic learning, deliberation, volition, and
non-routine problem solving, the LIDA model is ideally suited to provide a
working ontology that would allow for the discussion, design, and comparison of
AGI systems. The LIDA technology is based on the LIDA cognitive cycle, a sort
of "cognitive atom." The more elementary cognitive modules play a
role in each cognitive cycle. Higher-level processes are performed over
multiple cycles.
The
LIDA architecture represents perceptual entities, objects, categories,
relations, etc., using nodes and links .... These
serve as perceptual symbols acting as the common currency for information
throughout the various modules of the LIDA architecture.
SNePS [SNePS: A Logic for Natural Language Understanding and Commonsense Reasoning, The GLAIR Cognitive Architecture]
The long term
goal of the SNePS Research Group is to understand the
nature of intelligent cognitive processes by developing and experimenting with
computational cognitive agents that are able to use and understand natural
language, reason, act, and solve problems in a wide variety of domains.
The
SNePS knowledge representation, reasoning, and acting
system has several features that facilitate metacognition in SNePS-based
agents. The most prominent is the fact that propositions are represented in SNePS as terms rather than as logical sentences. The effect
is that propositions can occur as arguments of propositions, acts, and policies
without limit, and without leaving first-order logic.
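To make the point about propositions as terms concrete, here is a minimal sketch in Python. It is not SNePS syntax or its API: the `Term` class, the `depth` helper, and the example propositions are hypothetical and only show how a proposition, once it is an ordinary term, can appear as an argument of another proposition to any nesting depth.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    """A functional term; a proposition is just a term like any other."""
    functor: str
    args: Tuple["Node", ...] = ()


# An argument is either an atomic individual (a string) or another term.
Node = Union[str, Term]


def depth(node: Node) -> int:
    """Nesting depth of a term: 0 for an atom, 1 + deepest argument otherwise."""
    if isinstance(node, str):
        return 0
    return 1 + max((depth(a) for a in node.args), default=0)


# "John loves Mary" is a term ...
loves = Term("Loves", ("John", "Mary"))
# ... so it can be an argument of "Sue believes that John loves Mary" ...
believes = Term("Believes", ("Sue", loves))
# ... which can itself be an argument of a further proposition.
doubts = Term("Doubts", ("Tom", believes))

print(depth(loves), depth(believes), depth(doubts))
```

Because `Believes` takes a term rather than a sentence as its second argument, no higher-order construct is needed to talk about a proposition, which is the design choice the quotation highlights.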
Cyc [Building Large Knowledge-Based Systems, Common Sense Reasoning]
Vast amounts
of commonsense knowledge, representing human consensus reality, would need to
be encoded to produce a general AI system. In order to mimic human reasoning, Cyc would require background knowledge regarding science,
society and culture, climate and weather, money and financial systems, health
care, history, politics, and many other domains of human experience. The Cyc Project team expected to encode at least a million
facts spanning these and many other topic areas.
The
Cyc knowledge base (KB) is a formalized
representation of a vast quantity of fundamental human knowledge: facts, rules
of thumb, and heuristics for reasoning about the objects and events of everyday
life. The medium of representation is the formal language CycL.
The KB consists of terms -- which constitute the vocabulary of CycL -- and assertions which relate those terms. These
assertions include both simple ground assertions and rules.
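The distinction between ground assertions and rules can be sketched as follows. This is a toy illustration in Python, not CycL and not the Cyc inference engine: assertions are plain tuples, the predicate names `isa` and `genls` are borrowed only as labels, the individuals and collections are made up, and the two rules (instances inherit membership from more general collections; generalization is transitive) are hard-coded assumptions.

```python
from typing import Set, Tuple

# A ground assertion relates terms: (predicate, arg1, arg2).
Assertion = Tuple[str, str, str]

ground: Set[Assertion] = {
    ("isa", "Fido", "Dog"),        # Fido is an instance of Dog
    ("genls", "Dog", "Mammal"),    # every Dog is a Mammal
    ("genls", "Mammal", "Animal"), # every Mammal is an Animal
}


def apply_rules(facts: Set[Assertion]) -> Set[Assertion]:
    """Forward-chain two illustrative rules until nothing new is derived.

    Rule 1: isa(x, C) and genls(C, D)   => isa(x, D)
    Rule 2: genls(A, B) and genls(B, C) => genls(A, C)
    """
    derived = set(facts)
    while True:
        new = {
            (p1, a, d)
            for (p1, a, b) in derived
            for (p2, c, d) in derived
            if p2 == "genls" and b == c and p1 in ("isa", "genls")
        }
        if new <= derived:  # fixed point reached
            return derived
        derived |= new


kb = apply_rules(ground)
print(sorted(kb - ground))
```

Three stored assertions yield three further ones, which hints at why a knowledge base built from terms, ground assertions, and rules can cover far more than what is entered by hand.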
AIXI [Universal Artificial Intelligence, Universal Algorithmic Intelligence: A mathematical top->down approach]
An important
observation is that most, if not all known facets of intelligence can be
formulated as goal driven or, more precisely, as maximizing some utility
function.
Sequential
decision theory formally solves the problem of rational agents in uncertain
worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves
the problem of sequence prediction for unknown prior distribution. We combine
both ideas and get a parameter-free theory of universal Artificial
Intelligence. We give strong arguments that the resulting AIXI model is the
most intelligent unbiased agent possible.
The
major drawback of the AIXI model is that it is uncomputable, ... which
makes an implementation impossible. To overcome this problem, we constructed a
modified model AIXItl, which is still effectively
more intelligent than any other time t and length l bounded algorithm.
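For readers who want to see what combining the two ideas looks like, the AIXI action choice is commonly written in roughly the following form in Hutter's publications. The notation is not part of the quotations above and is supplied here as a summary under stated assumptions: $a_i$, $o_i$, and $r_i$ are the action, observation, and reward in cycle $i$, $k$ is the current cycle, $m$ is the horizon, $U$ is a universal Turing machine, $q$ ranges over programs, and $\ell(q)$ is the length of $q$.

```latex
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
\bigl[\, r_k + \cdots + r_m \,\bigr]
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum over programs plays the role of the Solomonoff-style universal prior, and the alternating maximization and summation is the sequential decision part. The sum over all programs consistent with the history is also what makes the model uncomputable.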
NARS [Rigid Flexibility: The Logic of Intelligence, From NARS to a Thinking Machine]
What makes
NARS different from conventional reasoning systems is its ability to learn from
its experience and to work with insufficient knowledge and resources. NARS attempts to uniformly explain and reproduce many cognitive
facilities, including reasoning, learning, planning, etc, so as to provide a
unified theory, model, and system for AI as a whole. The ultimate goal
of this research is to build a thinking machine.
The
development of NARS takes an incremental approach consisting of four major stages.
At each stage, the logic is extended to give the system a more expressive
language, a richer semantics, and a larger set of inference rules; the memory
and control mechanism are then adjusted accordingly to support the new logic.
In
NARS the notion of "reasoning" is extended to represent a system's
ability to predict the future according to the past, and to satisfy the
unlimited resources demands using the limited resources supply, by flexibly
combining justifiable micro steps into macro behaviors in a domain-independent
manner.
Novamente [The Hidden Pattern: A Patternist Philosophy of Mind, An Integrative Architecture for General Intelligence]
Novamente incorporates aspects of many previous AI paradigms such as agent
systems, evolutionary programming, reinforcement learning, automated theorem-proving, and probabilistic reasoning. However, it is
unique in its overall architecture, which confronts the problem of creating a
holistic digital mind in a direct way that has not been done before.
General
Intelligence is the ability to achieve complex goals in complex environments.
Novamente essentially consists of a framework for
tightly integrating various AI algorithms in the context of a highly flexible
common knowledge representation, and a specific
assemblage of AI algorithms created or tweaked for tight integration in an
integrative AGI context.
HTM [On Intelligence, Hierarchical Temporal Memory]
The brain
uses vast amounts of memory to create a model of the world. Everything you know
and have learned is stored in this model. The brain uses this memory-based
model to make continuous predictions of future events. It is the ability to
make predictions about the future that is the crux of intelligence.
Hierarchical
Temporal Memory (HTM) is a technology that replicates the structural and
algorithmic properties of the neocortex. HTM
therefore offers the promise of building machines that approach or exceed human
level performance for many cognitive tasks.
HTMs
are organized as a tree-shaped hierarchy of nodes, where each node implements a
common learning and memory function. HTMs store information throughout the
hierarchy in a way that models the world. All objects in the world, be they
cars, people, buildings, speech, or the flow of information across a computer
network, have structure. This structure is hierarchical in both space and time.
HTM memory is also hierarchical in both space and time, and therefore can
efficiently capture and model the structure of the world.
The above AGI
projects are roughly classified in the following table, according to the type
of their answers to the previously listed 1st question (on research goal) and
3rd question (on technical path).
| goal \ path | hybrid | integrated | unified |
|---|---|---|---|
| principle | | | AIXI, NARS |
| function | | LIDA, Novamente, Polyscheme | SNePS, Soar |
| capability | | | Cyc |
| behavior | | | ACT-R |
| structure | | | HTM |
Since
this classification is made at a high level, projects in the same entry of the
table are still quite different in the details of their research goals and
technical paths.
In summary, the
current AGI projects are based on very different theories and techniques.
4. AGI Literature and Resources
The earliest collection of AGI works is Artificial General Intelligence. Though this
book was published in 2007, the manuscript was finished in 2003. The publisher website
offers free downloads of the table of contents and the introductory chapter
"Contemporary Approaches to Artificial General Intelligence". Most chapters in the
collection can be found at the authors' websites.
Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms is
the post-conference proceedings of the 2006 AGI Workshop. The introductory chapter
"Aspects of Artificial General Intelligence" clarified the notion of AGI and
summarized the other chapters. The Workshop website contains links to all the
chapters in the collection, plus some presentations and videos.
The annual AGI international conference series was started in 2008. The conference
websites link to all accepted papers, plus additional materials.
Journal of Artificial General Intelligence
is a peer-reviewed journal with open access.
An AGI Network website is under
construction.
There
is a mailing list, a Google Group, and a LinkedIn Group, all dedicated
to AGI.
Many AGI-related resources are collected on the AGIRI website.
Here
only resources dedicated to AGI are listed, though there are many other related
works in AI and Cognitive Science literature. Some of them are assembled into
the following reading lists: