Commit 5723366: update docs

wenzhangliu committed Dec 25, 2023
1 parent 5cd93dd commit 5723366
Showing 89 changed files with 936 additions and 675 deletions.
212 changes: 4 additions & 208 deletions docs/source/documents/api/agents.rst
@@ -6,212 +6,8 @@ These agents can deal with both continuous and discrete actions. The MARL algori
All of the agents in XuanCe are implemented under a unified framework that is supported by PyTorch, TensorFlow, and MindSpore.
The classes of agents are listed as follows.
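
As a quick orientation, the sketch below shows the typical high-level entry point for any of these agents. ``get_runner`` and its keyword arguments are taken from the project's README and are assumed here rather than guaranteed by this page:

.. code-block:: python

    import xuance

    # Pick an agent by name; 'dqn' selects the DQN_Agent listed below.
    runner = xuance.get_runner(method='dqn',
                               env='classic_control',
                               env_id='CartPole-v1',
                               is_test=False)
    runner.run()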

.. toctree::
:hidden:
:maxdepth: 1

Agent <agents/drl/basic_drl_class>
MARLAgents <agents/marl/basic_marl_class>
DQN_Agent <agents/drl/dqn>
C51_Agent <agents/drl/c51>
DDQN_Agent <agents/drl/ddqn>
DuelDQN_Agent <agents/drl/dueldqn>
NoisyDQN_Agent <agents/drl/noisydqn>
PerDQN_Agent <agents/drl/perdqn>
QRDQN_Agent <agents/drl/qrdqn>
PG_Agent <agents/drl/pg>
PPG_Agent <agents/drl/ppg>
PPOCLIP_Agent <agents/drl/ppo_clip>
PPOKL_Agent <agents/drl/ppo_kl>
PDQN_Agent <agents/drl/pdqn>
SPDQN_Agent <agents/drl/spdqn>
MPDQN_Agent <agents/drl/mpdqn>
A2C_Agent <agents/drl/a2c>
SAC_Agent <agents/drl/sac>
SACDIS_Agent <agents/drl/sac_dis>
DDPG_Agent <agents/drl/ddpg>
TD3_Agent <agents/drl/td3>

IQL_Agents <agents/marl/iql>
VDN_Agents <agents/marl/vdn>
QMIX_Agents <agents/marl/qmix>
WQMIX_Agents <agents/marl/wqmix>
QTRAN_Agents <agents/marl/qtran>
DCG_Agents <agents/marl/dcg>
IDDPG_Agents <agents/marl/iddpg>
MADDPG_Agents <agents/marl/maddpg>
ISAC_Agents <agents/marl/isac>
MASAC_Agents <agents/marl/masac>
IPPO_Agents <agents/marl/ippo>
MAPPO_Agents <agents/marl/mappo>
MATD3_Agents <agents/marl/matd3>
VDAC_Agents <agents/marl/vdac>
COMA_Agents <agents/marl/coma>
MFQ_Agents <agents/marl/mfq>
MFAC_Agents <agents/marl/mfac>

.. raw:: html

<br><hr>



.. list-table::
:header-rows: 1

* - Agent
- PyTorch
- TensorFlow
- MindSpore
* - :doc:`DQN <agents/drl/dqn>`: Deep Q-Networks
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`C51DQN <agents/drl/c51>`: Distributional Reinforcement Learning
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`Double DQN <agents/drl/ddqn>`: DQN with Double Q-learning
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`Dueling DQN <agents/drl/dueldqn>`: DQN with Dueling network
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`Noisy DQN <agents/drl/noisydqn>`: DQN with Parameter Space Noise
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`PERDQN <agents/drl/perdqn>`: DQN with Prioritized Experience Replay
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`QRDQN <agents/drl/qrdqn>`: DQN with Quantile Regression
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`VPG <agents/drl/pg>`: Vanilla Policy Gradient
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`PPG <agents/drl/ppg>`: Phasic Policy Gradient
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`PPO <agents/drl/ppo_clip>`: Proximal Policy Optimization
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`PDQN <agents/drl/pdqn>`: Parameterised DQN
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`SPDQN <agents/drl/spdqn>`: Split PDQN
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`MPDQN <agents/drl/mpdqn>`: Multi-pass PDQN
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`A2C <agents/drl/a2c>`: Advantage Actor Critic
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`SAC <agents/drl/sac>`: Soft Actor-Critic
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`SAC-Dis <agents/drl/sac_dis>`: SAC for Discrete Actions
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`DDPG <agents/drl/ddpg>`: Deep Deterministic Policy Gradient
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`TD3 <agents/drl/td3>`: Twin Delayed DDPG
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`


.. list-table::
:header-rows: 1

* - Multi-Agent
- PyTorch
- TensorFlow
- MindSpore
* - :doc:`IQL <agents/marl/iql>`: Independent Q-Learning
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`VDN <agents/marl/vdn>`: Value-Decomposition Networks
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`QMIX <agents/marl/qmix>`: VDN with Q-Mixer
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`WQMIX <agents/marl/wqmix>`: Weighted QMIX
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`QTRAN <agents/marl/qtran>`: Q-Transformation
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`DCG <agents/marl/dcg>`: Deep Coordination Graph
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`IDDPG <agents/marl/iddpg>`: Independent DDPG
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`MADDPG <agents/marl/maddpg>`: Multi-Agent DDPG
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`ISAC <agents/marl/isac>`: Independent SAC
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`MASAC <agents/marl/masac>`: Multi-Agent SAC
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`IPPO <agents/marl/ippo>`: Independent PPO
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`MAPPO <agents/marl/mappo>`: Multi-Agent PPO
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`MATD3 <agents/marl/matd3>`: Multi-Agent TD3
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`VDAC <agents/marl/vdac>`: Value-Decomposition Actor-Critic
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`COMA <agents/marl/coma>`: Counterfactual Multi-Agent PG
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`MFQ <agents/marl/mfq>`: Mean-Field Q-Learning
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
* - :doc:`MFAC <agents/marl/mfac>`: Mean-Field Actor-Critic
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`
- .. centered:: :math:`\checkmark`

.. raw:: html

<br><hr>
agents/drl_agents
agents/marl_agents
9 changes: 6 additions & 3 deletions docs/source/documents/api/agents/drl/a2c.rst
@@ -5,7 +5,8 @@ A2C_Agent

<br><hr>

**PyTorch:**
PyTorch
------------------------------------------

.. py:class::
xuance.torch.agent.policy_gradient.a2c_agent.A2C_Agent(config, envs, policy, optimizer, scheduler, device)
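
For reference, A2C updates the actor with the standard advantage actor-critic policy gradient, where the advantage is estimated from the learned value function:

.. math::

    \nabla_{\theta} J(\theta) = \mathbb{E}\left[ \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, A(s_t, a_t) \right],
    \qquad A(s_t, a_t) \approx R_t - V_{\phi}(s_t)
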
@@ -56,7 +57,8 @@ A2C_Agent

<br><hr>

**TensorFlow:**
TensorFlow
------------------------------------------

.. py:class::
xuance.tensorflow.agent.policy_gradient.a2c_agent.A2C_Agent(config, envs, policy, optimizer, device)
@@ -105,7 +107,8 @@ A2C_Agent

<br><hr>

**MindSpore:**
MindSpore
------------------------------------------

.. py:class::
xuance.mindspore.agents.policy_gradient.a2c_agent.A2C_Agent(config, envs, policy, optimizer, scheduler)
9 changes: 6 additions & 3 deletions docs/source/documents/api/agents/drl/basic_drl_class.rst
@@ -3,7 +3,8 @@ Agent

To create a new Agent, you should build a class that inherits from ``xuance.torch.agents.agent.Agent``, ``xuance.tensorflow.agents.agent.Agent``, or ``xuance.mindspore.agents.agent.Agent``.

**PyTorch:**
PyTorch
------------------------------------------

.. py:class::
xuance.torch.agents.agent.Agent(config, envs, policy, memory, learner, device, log_dir, model_dir)
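
A minimal sketch of such a subclass is shown below, assuming the PyTorch constructor signature documented above; everything beyond the ``__init__`` delegation is illustrative rather than required by the base class:

.. code-block:: python

    from xuance.torch.agents.agent import Agent

    class MyAgent(Agent):
        """A custom agent that reuses XuanCe's common bookkeeping."""

        def __init__(self, config, envs, policy, memory, learner,
                     device, log_dir, model_dir):
            # The base class wires up logging, model saving, and the device.
            super().__init__(config, envs, policy, memory, learner,
                             device, log_dir, model_dir)
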
@@ -110,7 +111,8 @@ To create a new Agent, you should build a class inherit from ``xuance.torch.agen

<br><hr>

**TensorFlow:**
TensorFlow
------------------------------------------

.. py:class::
xuance.tensorflow.agents.agent.Agent(config, envs, policy, memory, learner, device, log_dir, model_dir)
@@ -137,7 +139,8 @@ To create a new Agent, you should build a class inherit from ``xuance.torch.agen

<br><hr>

**MindSpore:**
MindSpore
------------------------------------------

.. py:class::
xuance.mindspore.agents.agent.Agent(envs, policy, memory, learner, device, log_dir, model_dir)
9 changes: 6 additions & 3 deletions docs/source/documents/api/agents/drl/c51.rst
@@ -5,7 +5,8 @@ C51_Agent

<br><hr>

**PyTorch:**
PyTorch
------------------------------------------

.. py:class::
xuance.torch.agent.qlearning_family.c51_agent.C51_Agent(config, envs, policy, optimizer, scheduler, device)
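
For reference, C51 models the return distribution on a fixed support of :math:`N` atoms :math:`z_1, \dots, z_N` and recovers Q-values as the expectation under the learned atom probabilities :math:`p_i(s, a)`:

.. math::

    Q(s, a) = \sum_{i=1}^{N} z_i\, p_i(s, a)
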
@@ -59,7 +60,8 @@ C51_Agent

<br><hr>

**TensorFlow:**
TensorFlow
------------------------------------------

.. py:class::
xuance.tensorflow.agent.qlearning_family.c51_agent.C51_Agent(config, envs, policy, optimizer, device)
@@ -110,7 +112,8 @@ C51_Agent

<br><hr>

**MindSpore:**
MindSpore
------------------------------------------

.. py:class::
xuance.mindspore.agents.qlearning_family.c51_agent.C51_Agent(config, envs, policy, optimizer, scheduler)
9 changes: 6 additions & 3 deletions docs/source/documents/api/agents/drl/ddpg.rst
@@ -5,7 +5,8 @@ DDPG_Agent

<br><hr>

**PyTorch:**
PyTorch
------------------------------------------

.. py:class::
xuance.torch.agent.policy_gradient.ddpg_agent.DDPG_Agent(config, envs, policy, optimizer, scheduler, device)
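
For reference, DDPG updates the actor by ascending the deterministic policy gradient, backpropagating through the learned critic :math:`Q_{\phi}`:

.. math::

    \nabla_{\theta} J(\theta) \approx \mathbb{E}_{s}\left[ \left. \nabla_{a} Q_{\phi}(s, a) \right|_{a=\mu_{\theta}(s)} \nabla_{\theta} \mu_{\theta}(s) \right]
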
@@ -58,7 +59,8 @@ DDPG_Agent

<br><hr>

**TensorFlow:**
TensorFlow
------------------------------------------

.. py:class::
xuance.tensorflow.agent.policy_gradient.ddpg_agent.DDPG_Agent(config, envs, policy, optimizer, device)
@@ -109,7 +111,8 @@ DDPG_Agent

<br><hr>

**MindSpore:**
MindSpore
------------------------------------------

.. py:class::
xuance.mindspore.agents.policy_gradient.ddpg_agent.DDPG_Agent(config, envs, policy, optimizer, scheduler)
9 changes: 6 additions & 3 deletions docs/source/documents/api/agents/drl/ddqn.rst
@@ -7,7 +7,8 @@ DQN with double q-learning trick.

<br><hr>

**PyTorch:**
PyTorch
------------------------------------------

.. py:class::
xuance.torch.agent.qlearning_family.ddqn_agent.DDQN_Agent(config, envs, policy, optimizer, scheduler, device)
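
For reference, the double Q-learning target selects the greedy action with the online parameters :math:`\theta` and evaluates it with the target parameters :math:`\theta^{-}`, which reduces the overestimation bias of vanilla DQN:

.. math::

    y_t = r_t + \gamma\, Q\left(s_{t+1}, \arg\max_{a'} Q(s_{t+1}, a'; \theta);\ \theta^{-}\right)
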
@@ -60,7 +61,8 @@ DQN with double q-learning trick.

<br><hr>

**TensorFlow:**
TensorFlow
------------------------------------------

.. py:class::
xuance.tensorflow.agent.qlearning_family.ddqn_agent.DDQN_Agent(config, envs, policy, optimizer, device)
@@ -111,7 +113,8 @@ DQN with double q-learning trick.

<br><hr>

**MindSpore:**
MindSpore
------------------------------------------

.. py:class::
xuance.mindspore.agents.qlearning_family.ddqn_agent.DDQN_Agent(config, envs, policy, optimizer, scheduler)
9 changes: 6 additions & 3 deletions docs/source/documents/api/agents/drl/dqn.rst
@@ -5,7 +5,8 @@ DQN_Agent

<br><hr>

**PyTorch:**
PyTorch
------------------------------------------

.. py:class::
xuance.torch.agent.qlearning_family.dqn_agent.DQN_Agent(config, envs, policy, optimizer, scheduler, device)
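
A rough construction sketch follows, using the module path documented above; ``config``, ``envs``, and ``policy`` are assumed to be prepared elsewhere (e.g. from a YAML config and a vectorized environment), and names such as ``config.learning_rate`` and the final ``train`` call are assumptions rather than confirmed API:

.. code-block:: python

    import torch
    from xuance.torch.agent.qlearning_family.dqn_agent import DQN_Agent

    # config, envs, and policy are assumed to be built beforehand.
    optimizer = torch.optim.Adam(policy.parameters(), lr=config.learning_rate)
    scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=1.0,
                                                  end_factor=0.25)
    agent = DQN_Agent(config, envs, policy, optimizer, scheduler,
                      device='cuda:0')
    agent.train(config.running_steps)  # assumed training entry point
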
@@ -58,7 +59,8 @@ DQN_Agent

<br><hr>

**TensorFlow:**
TensorFlow
------------------------------------------

.. py:class::
xuance.tensorflow.agent.qlearning_family.dqn_agent.DQN_Agent(config, envs, policy, optimizer, device)
@@ -109,7 +111,8 @@ DQN_Agent

<br><hr>

**MindSpore:**
MindSpore
------------------------------------------

.. py:class::
xuance.mindspore.agents.qlearning_family.dqn_agent.DQN_Agent(config, envs, policy, optimizer, scheduler)
