Skip to content

Latest commit

 

History

History
30 lines (18 loc) · 1.89 KB

README.md

File metadata and controls

30 lines (18 loc) · 1.89 KB

This is the course project for CS394R Reinforcement Learning -- Theory and Practice

Team Member: Xiang Gu, Alexander Matthew Keller

In this project, we explored the possibility of using reinforcement learning techniques to directional drilling in the oil&gas industry. Briefly, this is what we did:

  • We formulated the (simplified) directional drilling process as a Markov Decision Process (MDP).

  • We tried to apply several classical RL algorithms to solve this problem including Tabular n-step Sarsa, n-step Sarsa with Tile Coding, One-step Actor-Critic, Q-learning with Tile Coding, Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO). We also implemented a conventional solver for this problem using Model Predictive Control (MPC) and compared the performance with those RL algorithm. (Experiment 1)

  • We studied the robustness/generalization of the policy learned from RL algorithms. Namely, we looked at the generalization performance when there are stochasticity in the test environment and what is the best practical way to train a robust policy. (Experiment 2). In addition, we also investigated the generalization performance w.r.t. different goal position during test time using different state representation. (Experiment 3)

To run the learning algorithms to get data (already done): 1). (Tabular n-step Sarsa): python3 n_step_sarsa_tabular/main_code.py 2). (n-step Sarsa with Tile Coding): python3 n_step_sarsa_fa/main_code.py 3). (One-step Actor-Critic): python3 ActorCritic.py 4). (Q-learning with Tile Coding): python3 QLearning.py 5). (VPG, TRPO, PPO): python3 VPG_TRPO_PPO.py # change one line of code to vpg, trpo, or ppo to run different algorithms

To plot figures of Experiment 1: python3 Experiment1Plot.py

To plot figures of Experiment 2: python3 Experiment2Plot.py

To plot figures of Experiment 3: python3 Experiment3Plot.py