Gambit: an implementation of Multi-Arm Bandit for doing real time stochastic Reinforcement Learning

br4nd0n/gambit

Gambit

A real-time Multi-Armed Bandit implementation that uses Thompson Sampling

To run:

  1. mvn clean install

  2. mvn exec:java -Dexec.mainClass="mab.Gambit"

The implementation of Gambit here is a proof of concept for basic real-time reinforcement learning with a multi-armed bandit, using Thompson Sampling as the explore/exploit policy. Running Gambit prints a series of steps and decisions to the console. Gambit begins by trying different actions at random; as it gains confidence about the performance of a particular action, it selects that action more frequently, balancing learning how each action performs against exploiting the highest-performing action.
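To make the explore/exploit behavior described above concrete, here is a minimal sketch of Thompson Sampling. It is not taken from the Gambit source: the class name `ThompsonSketch`, its methods, the Bernoulli (success/failure) reward model, and the uniform Beta(1, 1) prior are all assumptions made for illustration. Each arm keeps success and failure counts, a plausible success rate is drawn from each arm's Beta posterior, and the arm with the highest draw is played.

```java
import java.util.Random;

// Hypothetical sketch: names and reward model are illustrative assumptions,
// not the actual Gambit classes.
class ThompsonSketch {
    private final int[] successes;
    private final int[] failures;
    private final Random rng;

    ThompsonSketch(int arms, long seed) {
        successes = new int[arms];
        failures = new int[arms];
        rng = new Random(seed);
    }

    // Gamma(shape, 1) for an integer shape >= 1, as a sum of Exp(1) draws.
    private double sampleGamma(int shape) {
        double sum = 0.0;
        for (int i = 0; i < shape; i++) {
            // 1 - U lies in (0, 1], so the logarithm is always finite.
            sum -= Math.log(1.0 - rng.nextDouble());
        }
        return sum;
    }

    // Beta(a, b) draw via X / (X + Y) with X ~ Gamma(a), Y ~ Gamma(b).
    double sampleBeta(int a, int b) {
        double x = sampleGamma(a);
        double y = sampleGamma(b);
        return x / (x + y);
    }

    // Draw one sample per arm from Beta(1 + successes, 1 + failures)
    // and return the index of the arm with the largest sample.
    int selectArm() {
        int bestArm = 0;
        double best = -1.0;
        for (int arm = 0; arm < successes.length; arm++) {
            double draw = sampleBeta(1 + successes[arm], 1 + failures[arm]);
            if (draw > best) {
                best = draw;
                bestArm = arm;
            }
        }
        return bestArm;
    }

    // Record the observed outcome for the arm that was played.
    void update(int arm, boolean reward) {
        if (reward) {
            successes[arm]++;
        } else {
            failures[arm]++;
        }
    }

    int pulls(int arm) {
        return successes[arm] + failures[arm];
    }

    public static void main(String[] args) {
        double[] trueRates = {0.2, 0.5, 0.8}; // made-up rates for the simulation
        ThompsonSketch bandit = new ThompsonSketch(trueRates.length, 1L);
        Random environment = new Random(2L);
        for (int step = 0; step < 1000; step++) {
            int arm = bandit.selectArm();
            bandit.update(arm, environment.nextDouble() < trueRates[arm]);
        }
        for (int arm = 0; arm < trueRates.length; arm++) {
            System.out.println("arm " + arm + " pulled " + bandit.pulls(arm) + " times");
        }
    }
}
```

Early on, every arm's posterior is wide, so the draws (and therefore the choices) vary a lot. As counts accumulate, the posterior of the best arm concentrates near its true rate and that arm wins most draws. The integer-shape Gamma sampler keeps the sketch free of dependencies; a statistics library would be a more practical choice for large counts.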

Contact: Brandon O'Brien @hakczar
