Reinforcement Learning
1. The Problem
learn what to do from reward and punishment
optimize utility or fitness function with selected actions
learning about actions
2. Proposed Solutions
"Reinforcement learning" in machine learning study
- Multi-step decision-making in a Markov decision process: given an initial state, a transition model T(s, a, s'), and a reward function R(s)
- Policy: state-action mapping; optimal policy: maximum expected utility (sum of rewards)
- Policy learning in a complete model: value iteration (calculating expected utility for each state) or policy iteration (evaluating and improving a working policy)
- Passive learning: evaluate a fixed policy, and estimate transition model and reward function from reinforcement in trials
- Active learning: exploitation vs. exploration tradeoff, bandit problem
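As a rough illustration of value iteration in a complete model, the sketch below repeatedly applies the update U(s) = R(s) + gamma * max_a sum_s' T(s, a, s') U(s') until the utilities stop changing, then reads off a greedy policy. The discount factor gamma, the tolerance tol, the dictionary layout of T and R, and the two-state toy environment are all assumptions made for this example; they are not part of the notes above.

```python
def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-6):
    """Estimate the expected utility of every state.

    T[(s, a)] is a dict {next_state: probability} (assumed layout),
    R[s] is the reward received in state s.
    """
    U = {s: 0.0 for s in states}
    while True:
        new_U = {}
        delta = 0.0
        for s in states:
            # Expected utility of the best action available in s.
            best = max(
                sum(p * U[s2] for s2, p in T[(s, a)].items())
                for a in actions
            )
            new_U[s] = R[s] + gamma * best
            delta = max(delta, abs(new_U[s] - U[s]))
        U = new_U
        if delta < tol:
            return U


def greedy_policy(states, actions, T, U):
    """Map each state to the action with the highest expected utility."""
    return {
        s: max(
            actions,
            key=lambda a: sum(p * U[s2] for s2, p in T[(s, a)].items()),
        )
        for s in states
    }


# Hypothetical two-state environment: only state "B" gives a reward.
states = ["A", "B"]
actions = ["stay", "move"]
T = {
    ("A", "stay"): {"A": 1.0},
    ("A", "move"): {"B": 1.0},
    ("B", "stay"): {"B": 1.0},
    ("B", "move"): {"A": 1.0},
}
R = {"A": 0.0, "B": 1.0}

U = value_iteration(states, actions, T, R)
policy = greedy_policy(states, actions, T, U)
print(U, policy)
```

In this toy case the policy moves from A to B and then stays in B. Policy iteration would instead alternate between evaluating a fixed policy and improving it greedily; passive and active learning must additionally estimate T and R from trials, which this sketch takes as given.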
Sequential decision-making based on algorithmic probability
- Solomonoff: predicting future data given previous observations, provided the data is sampled from a computable probability distribution.
- Hutter: maximizing reward in an unknown environment with a computable probability distribution.
- Searching through all Turing machines to find the most probable one, given the observed data.
- Using the algorithmic complexity (description length) of a Turing machine to decide its prior probability.
- Computability and computational complexity of the related models.
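For reference, the prior used in Solomonoff's approach is usually written as follows. The notation is an assumption chosen for this note: U is a universal (monotone) Turing machine, p ranges over programs whose output starts with the string x, and \ell(p) is the length of p in bits.

```latex
% Algorithmic (universal) prior: shorter programs get more weight.
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)}

% Prediction of the next symbol given the observations so far.
M(x_{t+1} \mid x_{1:t}) = \frac{M(x_{1:t}\, x_{t+1})}{M(x_{1:t})}
```

The sum runs over all programs, so M can only be approximated, which is why computability and complexity are listed as concerns for these models.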
Evolutionary learning
3. Issues
various assumptions about the environment and the system
delayed feedback, credit/blame assignment
incomplete description, non-existent function, non-repeated decision
evolution and intelligence
4. Reading
Sections 17.2-3, 21.1-3, 4.3
5. Ideas
solving problems with action sequences: static vs. dynamic
[Computation and Intelligence in Problem Solving]
knowledge about action:
- (starting and ending) states vs. (sufficient and necessary) conditions
- repeated situation vs. one-time situation
- objective probability vs. experience-based expectation
- state-based reward vs. dynamic goal structure
changing the knowledge and resource restrictions may completely change the problem
intelligence and evolution:
- both lead to adaptation
- individual vs. species
- experience-driven changes vs. experience-independent changes
- both evaluate changes according to feedback --- reinforcement
- gradual modifications vs. radical modifications
- advantages and disadvantages