Implementation of the Q-Learning algorithm.
The agent is going to learn in the following environment:
where S = Start, E = Empty, M = Mine and G = Goal
For each step the agent takes, it receives a reward of -1, which discourages it from taking a longer path than necessary. Furthermore, stepping on a mine yields a reward of -100, and reaching the goal yields a positive reward of +100.
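As a minimal sketch of this reward scheme (the cell labels and the helper function are illustrative assumptions, not the project's actual environment code):

```python
# Sketch of the reward scheme described above.
# The cell labels ('S', 'E', 'M', 'G') follow the legend; the function
# itself is a hypothetical helper, not taken from the repository.
STEP_REWARD = -1    # every move costs 1, discouraging long paths
MINE_REWARD = -100  # stepping on a mine is heavily penalized
GOAL_REWARD = 100   # reaching the goal is rewarded

def reward_for(cell: str) -> int:
    """Return the reward for entering a cell of the given type."""
    if cell == "M":
        return MINE_REWARD
    if cell == "G":
        return GOAL_REWARD
    return STEP_REWARD
```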
To make the agent learn as much as possible in a short amount of time, a decaying epsilon-greedy strategy was implemented: the exploration rate decays exponentially over the episodes, so the agent gradually exploits its environment more and explores it less.
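A minimal sketch of such a decaying epsilon-greedy selection could look like this (the parameter names and decay constants are assumptions for illustration):

```python
import numpy as np

# Assumed hyperparameters for the exponential epsilon decay.
EPS_MAX, EPS_MIN, DECAY_RATE = 1.0, 0.01, 0.001

def epsilon(episode: int) -> float:
    """Exploration rate decays exponentially with the episode number."""
    return EPS_MIN + (EPS_MAX - EPS_MIN) * np.exp(-DECAY_RATE * episode)

def choose_action(q_table: np.ndarray, state: int, episode: int, n_actions: int) -> int:
    """Explore with probability epsilon, otherwise exploit the best known action."""
    if np.random.rand() < epsilon(episode):
        return np.random.randint(n_actions)  # explore: random action
    return int(np.argmax(q_table[state]))    # exploit: greedy action
```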
You can see that the agent does not step onto the field (2|1), because it has had bad experiences with this field, as there are two mines nearby. So the agent finds not only the quickest but also the safest path.
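This behaviour emerges from the standard Q-learning update, which propagates the mines' -100 rewards back into the values of neighbouring cells. A minimal sketch of the update (learning rate and discount factor are assumed values) looks like this:

```python
import numpy as np

# Sketch of the standard Q-learning update rule.
# ALPHA (learning rate) and GAMMA (discount factor) are assumed values.
ALPHA, GAMMA = 0.1, 0.9

def q_update(q_table: np.ndarray, state: int, action: int,
             reward: float, next_state: int) -> None:
    """Move Q(s, a) toward the observed reward plus the discounted best future value."""
    best_next = np.max(q_table[next_state])
    td_target = reward + GAMMA * best_next
    q_table[state, action] += ALPHA * (td_target - q_table[state, action])
```

Because actions that lead toward a mine keep receiving low targets, the Q-values around (2|1) stay low, and the greedy policy steers around that field.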