Reinforcement learning is the problem faced by an
agent that must learn behavior through
trial-and-error interactions with a dynamic
environment. Usually, the problem to be solved
contains subtasks that repeat in different regions of
the state space. Without any guidance,
an agent has to learn the solutions of all subtask
instances independently, which in turn degrades the
performance of the learning process. In this work, we
propose two novel approaches for building
connections between different regions of the search
space. The first approach efficiently discovers
abstractions in the form of conditionally terminating
sequences and represents these abstractions compactly
as a single tree structure; this structure is then
used to determine the actions to be executed by the
agent. In the second approach, a similarity function
between states is defined based on the number of
common action sequences; by using this similarity
function, updates on the action-value function of a
state are reflected to all similar states, which allows
experience acquired during learning to be applied in a
broader context. The effectiveness of both approaches
is demonstrated empirically over various domains.
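
As a concrete illustration of the first idea, the sketch below stores conditionally terminating sequences in a single tree: each node carries the action that extends a sequence and the set of states in which execution may continue. The class and method names (SequenceTreeNode, insert, next_child) are assumptions made for illustration, not the implementation described in the paper.

    # Illustrative sketch (assumed names): conditionally terminating
    # sequences stored compactly in a single tree. Each node holds the
    # action that extends the sequence and the set of states in which
    # execution of the sequence may continue.
    class SequenceTreeNode:
        def __init__(self, action=None):
            self.action = action        # action taken to reach this node
            self.continuation = set()   # states in which execution may continue
            self.children = {}          # action -> SequenceTreeNode

        def insert(self, sequence):
            """Add one sequence given as a list of (state_set, action) pairs."""
            node = self
            for state_set, action in sequence:
                child = node.children.setdefault(action, SequenceTreeNode(action))
                child.continuation |= set(state_set)
                node = child

        def next_child(self, state):
            """Return a child whose continuation condition covers `state`, if any."""
            for child in self.children.values():
                if state in child.continuation:
                    return child
            return None

A similarly hedged sketch of the second idea follows: a tabular Q-learning agent that measures the similarity of two states by the fraction of observed action sequences they share, and reflects each action-value update to similar states with a weight proportional to that similarity. The agent class, the constants ALPHA, GAMMA and SEQ_LEN, and the similarity function are illustrative assumptions only.

    # Illustrative sketch (assumed names and parameters): tabular Q-learning
    # in which each update is also reflected, with a similarity weight, to
    # states that share observed action sequences with the updated state.
    from collections import defaultdict

    ALPHA, GAMMA = 0.1, 0.95   # assumed learning rate and discount factor
    SEQ_LEN = 3                # length of action sequences used for similarity

    def similarity(seqs_s, seqs_t):
        """Fraction of observed action sequences shared by two states."""
        if not seqs_s or not seqs_t:
            return 0.0
        return len(seqs_s & seqs_t) / max(len(seqs_s), len(seqs_t))

    class SimilarityQAgent:
        def __init__(self, actions):
            self.actions = actions
            self.q = defaultdict(float)    # (state, action) -> value
            self.seqs = defaultdict(set)   # state -> set of action sequences
            self.history = []              # recent (state, action) pairs

        def reset(self):
            """Call at episode boundaries so sequences do not span episodes."""
            self.history.clear()

        def record(self, state, action):
            """Remember the SEQ_LEN-step action sequence starting at each state."""
            self.history.append((state, action))
            if len(self.history) >= SEQ_LEN:
                start_state = self.history[-SEQ_LEN][0]
                seq = tuple(a for _, a in self.history[-SEQ_LEN:])
                self.seqs[start_state].add(seq)

        def update(self, s, a, r, s_next):
            """Standard Q-learning update, then a weighted copy to similar states."""
            best_next = max(self.q[(s_next, b)] for b in self.actions)
            delta = r + GAMMA * best_next - self.q[(s, a)]
            self.q[(s, a)] += ALPHA * delta
            seqs_s = self.seqs.get(s, set())
            for t, seqs_t in list(self.seqs.items()):
                if t == s:
                    continue
                w = similarity(seqs_s, seqs_t)
                if w > 0.0:
                    # Reflect the update to a similar state, scaled by similarity.
                    self.q[(t, a)] += ALPHA * w * delta

In practice the sequence length and the set of candidate similar states would be chosen per domain; the sketch only makes the propagation step explicit.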