0203. Introduction to Artificial Intelligence

Learning: Connectionist

 

1. Neural network

A neural network (NN) takes its inspiration from the human brain, though most NNs do not attempt to simulate a real neural net accurately. The emphasis is on accounting for intelligence via the statistical and dynamic regularities of highly interconnected, large-scale networks. For this reason, it is also called a connectionist model or PDP (Parallel Distributed Processing) model (of intelligence or cognition).

An NN consists of interconnected nodes. Each node in the network is similar to a neuron in that it receives some input signals, computes a weighted sum of them as the total input to the unit, and generates an output according to a simple (but usually nonlinear) function of that sum. For example, it can be a threshold function of the input.
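The behavior of a single node can be sketched in a few lines of Python. This is only an illustration: the function name `threshold_node`, the weight values, and the threshold of 1.5 are assumptions chosen for the example, not something specified in these notes.

```python
def threshold_node(inputs, weights, threshold=0.0):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Hypothetical example: with these weights and this threshold,
# a two-input node behaves like logical AND.
and_weights = [1.0, 1.0]
and_outputs = [threshold_node([a, b], and_weights, threshold=1.5)
               for a in (0, 1) for b in (0, 1)]
print(and_outputs)
```

Changing only the weights or the threshold changes the function the node computes, which is why learning in an NN can be expressed as weight adjustment.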

Each time the NN is used, an input activation vector is applied to the input nodes. The activation values of the other nodes are then calculated from the activation they receive, each node applying its own activation function. The same process is repeated until the network reaches a stable state, and the activation values of certain nodes are then taken to be the output. Overall, a neural network often corresponds to a function that maps input vectors to output vectors.

 

2. Learning by back-propagation

The function implemented in a neural network depends both on the structure of the network and on the weights on the links. Learning usually takes the form of gradual adjustment of the weights according to training instances, that is, concrete input/output pairs. For example, a simple "perceptron" can learn a linear function from given examples, by adjusting weight values according to error signals. The learning rule is explained in Section 11.2. See an on-line demo.
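As a rough sketch of error-driven weight adjustment, the following trains a single threshold unit with the classic perceptron rule. The function names, the learning rate, the bias term, and the choice of logical OR as the target are all assumptions made for illustration; the textbook's formulation in Section 11.2 may differ in detail.

```python
def train_perceptron(examples, learning_rate=0.1, epochs=20):
    """Adjust weights after each example in proportion to the error signal."""
    n_inputs = len(examples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0  # assumed extra weight on a constant input of 1
    for _ in range(epochs):
        for inputs, target in examples:
            output = predict(weights, bias, inputs)
            error = target - output  # error signal: +1, 0, or -1
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

def predict(weights, bias, inputs):
    """Threshold output of the unit for one input vector."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

# Hypothetical training instances: input pairs with the OR of the two inputs.
or_examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_perceptron(or_examples)
predictions = [predict(weights, bias, x) for x, _ in or_examples]
print(predictions)
```

When an example is already classified correctly the error is 0 and the weights are left alone, so the weights stop changing once every training instance is handled correctly.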

For more complicated functions, one popular learning algorithm is "back propagation" in multilayer perceptrons. This algorithm is used to train fully connected, layered, feedforward networks. Typically, such an NN has an input layer, a hidden layer, and an output layer. The nodes in one layer are connected to the nodes in the next layer by links, and there is a weight value attached to each link. At each node, the input values are first combined into a weighted sum, which is fed into an S-shaped (sigmoid) function to generate an output value between 0 and 1.

              Hidden1
   Input1  o --- o --- o Output1
            \   / \   /
             \ /   \ /
              X     X          
             / \   / \
            /   \ /   \   
   Input2  o --- o --- o Output2
              Hidden2
After the number of nodes in each layer is determined, the weights of the links are initialized to random numbers. Then, the network is trained by repeating the following procedure for each training case:
  1. Apply the input values to the input layer, and use the current weights to calculate the activation value of the hidden layer, then the output layer.
  2. Compute the difference between the actual output and the target output.
  3. The weights of the links connecting the output layer and the hidden layer are adjusted to reduce the difference as much as possible (given the current activation of the hidden layer).
  4. The error is then propagated back to the hidden layer, and the weights of the links connecting the hidden layer and the input layer are adjusted in the same way.
See Section 11.3 of the textbook for a detailed description of the learning algorithm. A demo applet is here.
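The four steps above can be sketched for the 2-2-2 network in the diagram. This is a minimal illustration under several assumptions that are not taken from these notes or the textbook: each node has an extra bias weight, each adjustment is a small gradient step scaled by a hypothetical `rate` rather than a full correction, and the training cases (Output1 = AND, Output2 = OR of the two inputs) are made up for the example.

```python
import math
import random

def sigmoid(x):
    """S-shaped function mapping a weighted sum to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def make_network(n_in, n_hidden, n_out, rng):
    """Initialize all weights (and assumed bias weights) to small random numbers."""
    def rand():
        return rng.uniform(-0.5, 0.5)
    return {
        "w_hidden": [[rand() for _ in range(n_in)] for _ in range(n_hidden)],
        "b_hidden": [rand() for _ in range(n_hidden)],
        "w_output": [[rand() for _ in range(n_hidden)] for _ in range(n_out)],
        "b_output": [rand() for _ in range(n_out)],
    }

def forward(net, inputs):
    """Step 1: compute hidden activations, then output activations."""
    hidden = [sigmoid(b + sum(w * x for w, x in zip(ws, inputs)))
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    outputs = [sigmoid(b + sum(w * h for w, h in zip(ws, hidden)))
               for ws, b in zip(net["w_output"], net["b_output"])]
    return hidden, outputs

def train_case(net, inputs, targets, rate=0.5):
    """Apply steps 1-4 once for a single training case."""
    hidden, outputs = forward(net, inputs)
    # Step 2: output error, scaled by the slope of the sigmoid.
    delta_out = [(t - o) * o * (1 - o) for t, o in zip(targets, outputs)]
    # Propagate the error back to the hidden nodes (uses the old output weights).
    delta_hid = [h * (1 - h) * sum(d * net["w_output"][k][j]
                                   for k, d in enumerate(delta_out))
                 for j, h in enumerate(hidden)]
    # Step 3: adjust the links between the hidden and output layers.
    for k, d in enumerate(delta_out):
        for j, h in enumerate(hidden):
            net["w_output"][k][j] += rate * d * h
        net["b_output"][k] += rate * d
    # Step 4: adjust the links between the input and hidden layers.
    for j, d in enumerate(delta_hid):
        for i, x in enumerate(inputs):
            net["w_hidden"][j][i] += rate * d * x
        net["b_hidden"][j] += rate * d

def total_error(net, cases):
    """Sum of squared differences between target and actual outputs."""
    return sum((t - o) ** 2
               for inputs, targets in cases
               for t, o in zip(targets, forward(net, inputs)[1]))

# Hypothetical training cases: Output1 = AND, Output2 = OR.
cases = [([0, 0], [0, 0]), ([0, 1], [0, 1]), ([1, 0], [0, 1]), ([1, 1], [1, 1])]
net = make_network(2, 2, 2, random.Random(0))
error_before = total_error(net, cases)
for _ in range(2000):
    for inputs, targets in cases:
        train_case(net, inputs, targets)
error_after = total_error(net, cases)
print(error_before, error_after)
```

The hidden-layer error terms are computed before any weight is changed, so that the error sent back through a link reflects the weight that actually produced the output.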

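In principle, a three-layer network with enough hidden nodes can approximate any continuous function, and the supervised learning algorithm above is the usual way to search for suitable weights, though it is not guaranteed to find them. It uses repeated error back-propagation to solve the credit assignment problem, that is, to decide how much each weight contributed to the error at the output.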

 

3. Hebbian coincidence learning

An NN can be used for unsupervised learning, too. The approach suggested by Hebb is to increase the weight of a link between two nodes if both are activated by the same input signal, and to decrease the weight if one is activated and the other isn't.
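The rule just described can be written as a small update function. This is a sketch only: the name `hebbian_update`, the fixed step size `rate`, and the decision to leave the weight unchanged when neither node is active are assumptions, since the rule as stated covers only the other two cases.

```python
def hebbian_update(weight, a_active, b_active, rate=0.1):
    """Return the new weight of the link between nodes a and b.

    a_active and b_active are 1 if the node was activated by the
    current input signal and 0 otherwise.
    """
    if a_active and b_active:
        return weight + rate      # both activated: strengthen the link
    if a_active != b_active:
        return weight - rate      # only one activated: weaken the link
    return weight                 # neither activated: assumed no change

# Hypothetical sequence of co-activation observations for one link.
w = 0.0
for a, b in [(1, 1), (1, 1), (1, 0)]:
    w = hebbian_update(w, a, b)
print(w)
```

No target output appears anywhere in the update, which is what makes this form of learning unsupervised.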

After repeated training, an "associative memory" will be formed, such that when part of an input pattern is activated, the other part will become active, too. A demo applet is here.
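Pattern completion of this kind can be illustrated with a small Hopfield-style memory, which is one common way to realize an associative memory with Hebbian weights and is named here as a substitute for whatever the demo used. The sketch assumes activations are coded as +1 (active) and -1 (inactive), with 0 meaning "unknown" in the probe; under this coding two nodes that are both inactive also strengthen their link, which goes slightly beyond the rule as stated above. All names are hypothetical.

```python
def train_associative_memory(patterns):
    """Build a weight matrix by Hebbian coincidence: the weight between two
    nodes rises when they agree in a pattern and falls when they differ."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights

def recall(weights, state, max_sweeps=10):
    """Update nodes one at a time until the network reaches a stable state."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i, row in enumerate(weights):
            total = sum(w * s for w, s in zip(row, state))
            if total > 0:
                new_value = 1
            elif total < 0:
                new_value = -1
            else:
                new_value = state[i]  # no evidence either way: keep as is
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

# Store one pattern, then present only its first two elements.
stored = [1, 1, 1, -1, -1, -1]
weights = train_associative_memory([stored])
partial = [1, 1, 0, 0, 0, 0]
completed = recall(weights, partial)
print(completed)
```

Here the two known elements are enough to drive every other node to the value it had in the stored pattern.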

Hebbian coincidence learning can also be used for supervised learning, by remembering input/output pairs according to the Hebbian rule. It is also a kind of incremental reinforcement learning.

 

4. Capability and limitation of NN

Compared to symbolic AI, an NN is characterized by its emphasis on learning and its tolerance of uncertainty.

Typical applications: categorization, pattern recognition, data mining, and so on.

Limitations:

Further reading: links at AAAI, a simulation software, a list of companies, a start-up company.