Saturday, July 14, 2018

Reinforcement Learning (Machine Learning)

In reinforcement learning (RL) there’s no answer key, but your RL agent still has to decide how to act to perform its task. In the absence of existing training data, the agent learns from experience. It collects training examples (“this action was good, that action was bad”) through trial and error as it attempts its task, with the goal of maximizing long-term reward.

One simple strategy for exploration would be to take the best known action most of the time (say, 80% of the time), but occasionally explore a new, randomly selected action even though it might be moving away from known reward. This strategy is called the epsilon-greedy strategy, where epsilon is the percent of the time that the agent takes a randomly selected action rather than taking the action that is most likely to maximize reward.
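As a rough sketch (the q_values list and function name are made up for illustration, not from the article), an epsilon-greedy choice in Python might look like:

    import random

    def epsilon_greedy(q_values, epsilon=0.2):
        # Explore with probability epsilon: pick a random action index.
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        # Otherwise exploit: pick the action with the highest estimated value.
        return max(range(len(q_values)), key=lambda a: q_values[a])

    # With epsilon = 0.2 the agent exploits the best-known action ~80% of the time.
    print(epsilon_greedy([1.0, 0.5, 2.3], epsilon=0.2))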

A Markov Decision Process (MDP) is the framework that formalizes this setup: a set of states, a set of actions, transition probabilities from state to state given an action, and rewards for those transitions.
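To make that concrete, here is a tiny MDP written as plain Python dictionaries; the states, actions, probabilities, and rewards are invented purely for illustration:

    # Toy MDP sketch: transitions[(state, action)] gives P(next_state | state, action),
    # and rewards[(state, action)] gives the immediate reward. All values are made up.
    states = ["standing", "moving", "fallen"]
    actions = ["slow", "fast"]

    transitions = {
        ("standing", "slow"): {"moving": 1.0},
        ("standing", "fast"): {"moving": 0.6, "fallen": 0.4},
        ("moving", "slow"):   {"moving": 1.0},
        ("moving", "fast"):   {"moving": 0.8, "fallen": 0.2},
        ("fallen", "slow"):   {"standing": 0.4, "fallen": 0.6},
    }

    rewards = {
        ("standing", "slow"): 1.0,
        ("standing", "fast"): 2.0,
        ("moving", "slow"):   1.0,
        ("moving", "fast"):   2.0,
        ("fallen", "slow"):  -1.0,
    }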

Q-learning is a technique that evaluates which action to take based on an action-value function that determines the value of being in a certain state and taking a certain action at that state.
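A minimal tabular sketch of the Q-learning update (the alpha, gamma, and Q-table names are my own assumptions, not from the article):

    from collections import defaultdict

    alpha = 0.1   # learning rate
    gamma = 0.9   # discount factor on future reward

    Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term value

    def q_update(state, action, reward, next_state, actions):
        # Best value achievable from the next state under current estimates.
        best_next = max(Q[(next_state, a)] for a in actions)
        # Nudge Q(s, a) toward reward + gamma * max_a' Q(s', a').
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

    # e.g. after observing one transition of the toy MDP above:
    q_update("standing", "fast", 2.0, "moving", ["slow", "fast"])
    print(Q[("standing", "fast")])  # 0.2 after this single update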

Policy learning is a more straightforward alternative in which we learn a policy function, which is a direct map from each state to the best corresponding action at that state. Think of it as a behavioral policy: “when I observe state s, the best thing to do is take action a”.
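A toy illustration, reusing the made-up states from above, of a policy as a direct state-to-action lookup:

    # A policy is a direct map from state to action. This table is an invented
    # example of what a learned policy for the toy states above might look like.
    policy = {
        "standing": "fast",
        "moving":   "slow",
        "fallen":   "slow",
    }

    def act(state):
        # "When I observe state s, the best thing to do is take action a."
        return policy[state]

    print(act("standing"))  # -> "fast"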

source: Machine Learning for Humans, Part 5: Reinforcement Learning
