Today we live in a world that is largely man-made.
In such a world there are many systems and
environments, both real and virtual, that can be
described well by formal models. This creates an
opportunity to develop a "synthetic intelligence":
artificial systems that cohabit these environments
with human beings and carry out useful functions.
In this book we address some aspects of this
development in the framework of reinforcement
learning: learning to map sensations to actions by
trial and error, guided by feedback. In challenging
cases, an action may affect not only the immediate
reward but also the next sensation and, through it,
all subsequent rewards. Stated in the traditional
way, the general reinforcement learning task is
ambitious precisely because of these two
characteristics: trial-and-error search and delayed
reward. We investigate general ways of breaking the
task of designing a controller down into more
tractable sub-tasks that can be solved
independently. We propose two complementary
approaches: taking advantage of past experience by
reusing parts of other systems, and facilitating
the learning phase by biasing the initial
configuration.
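To make the two characteristics concrete, the following is a minimal sketch (not taken from the book) of tabular Q-learning on a five-state corridor. All names and parameter values here are illustrative assumptions: reward arrives only at the rightmost state, so credit for earlier actions must propagate backwards through the value estimates, and exploration happens by occasional random moves.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (illustrative toy task)
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1  # assumed learning rate, discount, exploration rate

def step(state, action):
    """Deterministic corridor dynamics; reward is delayed until the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(s):
        # break ties randomly so the untrained agent behaves as a random walk
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # trial and error: explore with probability EPS, else act greedily
            a = rng.choice(ACTIONS) if rng.random() < EPS else greedy(s)
            s2, r, done = step(s, a)
            # bootstrapped target: propagates the delayed reward backwards
            target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s = s2
    return q

q = train()
# greedy action in each non-terminal state after learning
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy moves right in every state, even though only the final transition ever yields a reward; this is the delayed-reward credit assignment that makes the general task hard in richer environments.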