This dissertation presents concepts, problems, proposed solutions,
algorithms, and application examples that combine reinforcement
learning (RL) and simulated annealing (SA).
RL models a decision maker as a goal-driven agent that aims to reach
goal states in the state space of the problem representation. The agent
chooses among numerous possible actions, and each choice can affect the
environment differently. The effect of each decision is expressed
numerically as a reward value, which either honors or dishonors the
decision. The agent uses this feedback to recognize which actions are
honored and which are not, and then steers its sequence of decisions in
the direction that maximizes the environment's satisfaction, that is,
the accumulated reward.
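
The feedback loop described above can be illustrated with a minimal sketch. It assumes tabular Q-learning on a hypothetical five-state chain whose last state is the goal; the text does not name a specific algorithm or environment, so the function names, the reward values (1.0 at the goal, -0.01 per step), and the learning parameters are all illustrative assumptions.

```python
import random


def train_chain_agent(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                      epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain; the last state is the goal."""
    rng = random.Random(seed)
    # q[s][a] estimates the value of action a in state s (0 = left, 1 = right).
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Explore with probability epsilon (or on ties), otherwise act greedily.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            # Reward: the environment honors reaching the goal and
            # slightly dishonors every other step (assumed values).
            r = 1.0 if s_next == goal else -0.01
            best_next = 0.0 if s_next == goal else max(q[s_next])
            # Move the estimate toward the reward plus discounted future value.
            q[s][a] += alpha * (r + gamma * best_next - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Preferred action in every non-goal state (0 = left, 1 = right)."""
    return [0 if left > right else 1 for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_chain_agent()))
```

Under these assumptions the learned greedy policy should move right, toward the goal, in every state, because that is the action sequence the reward signal honors most.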
The concept of SA is based on an analogy with the way liquids freeze:
an initially hot, disordered melt is cooled slowly until it reaches
thermal equilibrium. While in physical annealing the bounds of the
temperature parameter are straightforward, in SA they may depend on the
problem and its numeric representation.
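
The dependence on the numeric representation can be made concrete with a small sketch. It assumes the standard Metropolis acceptance rule, in which a move that worsens the cost by `delta_cost` is accepted with probability exp(-delta_cost / temperature); the text does not state which acceptance rule the dissertation uses, and both function names are illustrative. This is not the dissertation's method for choosing bounds, only a demonstration of why a fixed temperature range does not transfer between cost scales.

```python
import math


def acceptance_probability(delta_cost, temperature):
    """Metropolis rule (assumed): improvements are always accepted; a
    worsening of size delta_cost is accepted with probability
    exp(-delta_cost / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if delta_cost <= 0:
        return 1.0
    return math.exp(-delta_cost / temperature)


def temperature_for_acceptance(delta_cost, probability):
    """Invert the rule: the temperature at which a worsening of size
    delta_cost (> 0) is accepted with the given probability (0 < p < 1)."""
    if delta_cost <= 0 or not 0.0 < probability < 1.0:
        raise ValueError("need delta_cost > 0 and 0 < probability < 1")
    return -delta_cost / math.log(probability)


if __name__ == "__main__":
    # The same worsening move, with the cost expressed in two different units.
    for delta in (0.5, 50.0):
        p = acceptance_probability(delta, temperature=1.0)
        t = temperature_for_acceptance(delta, probability=0.8)
        print(f"delta={delta}: p(T=1)={p:.3g}, T for p=0.8: {t:.3g}")
```

Rescaling the cost by a factor of 100 rescales the temperature needed for the same acceptance behavior by the same factor, so sensible upper and lower temperature bounds have to be derived from the cost differences the problem actually produces.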
This dissertation gives a method for defining temperature bounds in an
RL environment.