diff --git a/docs/source/documents/api/agents.rst b/docs/source/documents/api/agents.rst index b34f01501..67044237e 100644 --- a/docs/source/documents/api/agents.rst +++ b/docs/source/documents/api/agents.rst @@ -6,212 +6,8 @@ These agents can deal with both continuous and discrete actions. All of the agents in XuanCe are implemented under a unified framework and are supported by PyTorch, TensorFlow, and MindSpore. The agent classes are listed as follows. -.. toctree:: - :hidden: +.. toctree:: + :maxdepth: 1 - Agent - MARLAgents - DQN_Agent - C51_Agent - DDQN_Agent - DuelDQN_Agent - NoisyDQN_Agent - PerDQN_Agent - QRDQN_Agent - PG_Agent - PPG_Agent - PPOCLIP_Agent - PPOCKL_Agent - PDQN_Agent - SPDQN_Agent - MPDQN_Agent - A2C_Agent - SAC_Agent - SACDIS_Agent - DDPG_Agent - TD3_Agent - - IQL_Agents - VDN_Agents - QMIX_Agents - WQMIX_Agents - QTRAN_Agents - DCG_Agents - IDDPG_Agents - MADDPG_Agents - ISAC_Agents - MASAC_Agents - IPPO_Agents - MAPPO_Agents - MATD3_Agents - VDAC_Agents - COMA_Agents - MFQ_Agents - MFAC_Agents - -.. raw:: html - -

- - - -.. list-table:: - :header-rows: 1 - - * - Agent - - PyTorch - - TensorFlow - - MindSpore - * - :doc:`DQN `: Deep Q-Networks - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`C51DQN `: Distributional Reinforcement Learning - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`Double DQN `: DQN with Double Q-learning - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`Dueling DQN `: DQN with Dueling network - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`Noisy DQN `: DQN with Parameter Space Noise - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`PERDQN `: DQN with Prioritized Experience Replay - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`QRDQN `: DQN with Quantile Regression - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`VPG `: Vanilla Policy Gradient - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`PPG `: Phasic Policy Gradient - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`PPO `: Proximal Policy Optimization - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`PDQN `: Parameterised DQN - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`SPDQN `: Split PDQN - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MPDQN `: Multi-pass PDQN - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`A2C `: Advantage Actor Critic - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`SAC `: Soft Actor-Critic - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`SAC-Dis `: SAC for Discrete Actions - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`DDPG `: Deep Deterministic Policy Gradient - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`TD3 `: Twin Delayed DDPG - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - -.. list-table:: - :header-rows: 1 - - * - Multi-Agent - - PyTorch - - TensorFlow - - MindSpore - * - :doc:`IQL `: Independent Q-Learning - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`VDN `: Value-Decomposition Networks - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`QMIX `: VDN with Q-Mixer - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`WQMIX `: Weighted QMIX - - .. 
centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`QTRAN `: Q-Transformation - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`DCG `: Deep Coordination Graph - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`IDDPG `: Independent DDPG - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MADDPG `: Multi-Agent DDPG - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`ISAC `: Independent SAC - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MASAC `: Multi-Agent SAC - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`IPPO `: Independent PPO - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MAPPO `: Multi-Agent PPO - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MATD3 `: Multi-Agent TD3 - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`VDAC `: Value-Decomposition Actor-Critic - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`COMA `: Counterfacutal Multi-Agent PG - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MFQ `: Mean-Field Q-Learning - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MFAC `: Mean-Field Actor-Critic - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - -.. raw:: html - -

+ agents/drl_agents + agents/marl_agents diff --git a/docs/source/documents/api/agents/drl/a2c.rst b/docs/source/documents/api/agents/drl/a2c.rst index b27671471..21ec42bce 100644 --- a/docs/source/documents/api/agents/drl/a2c.rst +++ b/docs/source/documents/api/agents/drl/a2c.rst @@ -5,7 +5,8 @@ A2C_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.a2c_agent.A2C_Agent(config, envs, policy, optimizer, scheduler, device) @@ -56,7 +57,8 @@ A2C_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.a2c_agent.A2C_Agent(config, envs, policy, optimizer, device) @@ -105,7 +107,8 @@ A2C_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.a2c_agent.A2C_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/basic_drl_class.rst b/docs/source/documents/api/agents/drl/basic_drl_class.rst index 16aa254ff..2d75aac59 100644 --- a/docs/source/documents/api/agents/drl/basic_drl_class.rst +++ b/docs/source/documents/api/agents/drl/basic_drl_class.rst @@ -3,7 +3,8 @@ Agent To create a new Agent, you should build a class that inherits from ``xuance.torch.agents.agent.Agent``, ``xuance.tensorflow.agents.agent.Agent``, or ``xuance.mindspore.agents.agent.Agent``. -**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agents.agent.Agent(config, envs, policy, memory, learner, device, log_dir, model_dir) @@ -110,7 +111,8 @@ To create a new Agent, you should build a class that inherits from ``xuance.torch.agen

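A minimal sketch of the subclassing pattern described above. Only the constructor signature comes from this page; the method names and bodies are illustrative placeholders, not XuanCe's actual implementation:

.. code-block:: python

    from xuance.torch.agents.agent import Agent

    class CustomAgent(Agent):
        """Hypothetical custom agent built on the documented base class."""
        def __init__(self, config, envs, policy, memory, learner,
                     device, log_dir, model_dir):
            super(CustomAgent, self).__init__(config, envs, policy, memory,
                                              learner, device, log_dir, model_dir)

        def _action(self, obs):
            # Map a batch of observations to actions with the wrapped policy.
            raise NotImplementedError

        def train(self, train_steps):
            # Collect transitions from the envs, store them in the replay
            # memory, and update the policy through the learner.
            raise NotImplementedError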
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agents.agent.Agent(config, envs, policy, memory, learner, device, log_dir, model_dir) @@ -137,7 +139,8 @@ To create a new Agent, you should build a class that inherits from ``xuance.torch.agen

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.agent.Agent(envs, policy, memory, learner, device, log_dir, model_dir) diff --git a/docs/source/documents/api/agents/drl/c51.rst b/docs/source/documents/api/agents/drl/c51.rst index eb852f338..0e90ab1ac 100644 --- a/docs/source/documents/api/agents/drl/c51.rst +++ b/docs/source/documents/api/agents/drl/c51.rst @@ -5,7 +5,8 @@ C51_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.qlearning_family.c51_agent.C51_Agent(config, envs, policy, optimizer, scheduler, device) @@ -59,7 +60,8 @@ C51_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.qlearning_family.c51_agent.C51_Agent(config, envs, policy, optimizer, device) @@ -110,7 +112,8 @@ C51_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.qlearning_family.c51_agent.C51_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/ddpg.rst b/docs/source/documents/api/agents/drl/ddpg.rst index 90c327e10..3ee7f7882 100644 --- a/docs/source/documents/api/agents/drl/ddpg.rst +++ b/docs/source/documents/api/agents/drl/ddpg.rst @@ -5,7 +5,8 @@ DDPG_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.ddpg_agent.DDPG_Agent(config, envs, policy, optimizer, scheduler, device) @@ -58,7 +59,8 @@ DDPG_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.ddpg_agent.DDPG_Agent(config, envs, policy, optimizer, device) @@ -109,7 +111,8 @@ DDPG_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.ddpg_agent.DDPG_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/ddqn.rst b/docs/source/documents/api/agents/drl/ddqn.rst index f3cabb3cf..57183098c 100644 --- a/docs/source/documents/api/agents/drl/ddqn.rst +++ b/docs/source/documents/api/agents/drl/ddqn.rst @@ -7,7 +7,8 @@ DQN with double q-learning trick.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.qlearning_family.ddqn_agent.DDQN_Agent(config, envs, policy, optimizer, scheduler, device) @@ -60,7 +61,8 @@ DQN with double q-learning trick.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.qlearning_family.ddqn_agent.DDQN_Agent(config, envs, policy, optimizer, device) @@ -111,7 +113,8 @@ DQN with double q-learning trick.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.qlearning_family.ddqn_agent.DDQN_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/dqn.rst b/docs/source/documents/api/agents/drl/dqn.rst index 85e3cfd55..3b749c60e 100644 --- a/docs/source/documents/api/agents/drl/dqn.rst +++ b/docs/source/documents/api/agents/drl/dqn.rst @@ -5,7 +5,8 @@ DQN_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.qlearning_family.dqn_agent.DQN_Agent(config, envs, policy, optimizer, scheduler, device) @@ -58,7 +59,8 @@ DQN_Agent

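In practice the DQN_Agent above is rarely constructed by hand; the high-level runner wires the config, vectorized environments, policy, and agent together. A minimal sketch, assuming the top-level ``get_runner`` helper and the bundled classic-control configs:

.. code-block:: python

    import xuance

    # Assemble envs, policy, DQN_Agent, and learner from the YAML config.
    runner = xuance.get_runner(method='dqn',
                               env='classic_control',
                               env_id='CartPole-v1',
                               is_test=False)
    runner.run()  # pass is_test=True above to evaluate a saved model instead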
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.qlearning_family.dqn_agent.DQN_Agent(config, envs, policy, optimizer, device) @@ -109,7 +111,8 @@ DQN_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.qlearning_family.dqn_agent.DQN_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/dueldqn.rst b/docs/source/documents/api/agents/drl/dueldqn.rst index 815b2602f..03753e346 100644 --- a/docs/source/documents/api/agents/drl/dueldqn.rst +++ b/docs/source/documents/api/agents/drl/dueldqn.rst @@ -5,7 +5,8 @@ DuelDQN_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.qlearning_family.dueldqn_agent.DuelDQN_Agent(config, envs, policy, optimizer, scheduler, device) @@ -58,7 +59,8 @@ DuelDQN_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.qlearning_family.dueldqn_agent.DuelDQN_Agent(config, envs, policy, optimizer, device) @@ -109,7 +111,8 @@ DuelDQN_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.qlearning_family.dueldqn_agent.DuelDQN_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/mpdqn.rst b/docs/source/documents/api/agents/drl/mpdqn.rst index d8e942823..a7b17374b 100644 --- a/docs/source/documents/api/agents/drl/mpdqn.rst +++ b/docs/source/documents/api/agents/drl/mpdqn.rst @@ -5,7 +5,8 @@ MPDQN_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.mpdqn_agent.MPDQN_Agent(config, envs, policy, optimizer, scheduler, device) @@ -68,7 +69,8 @@ MPDQN_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.mpdqn_agent.MPDQN_Agent(config, envs, policy, optimizer, device) @@ -127,7 +129,8 @@ MPDQN_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.mpdqn_agent.MPDQN_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/noisydqn.rst b/docs/source/documents/api/agents/drl/noisydqn.rst index ce71ce612..05c4365b2 100644 --- a/docs/source/documents/api/agents/drl/noisydqn.rst +++ b/docs/source/documents/api/agents/drl/noisydqn.rst @@ -5,7 +5,8 @@ NoisyDQN_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.qlearning_family.noisydqn_agent.NoisyDQN_Agent(config, envs, policy, optimizer, scheduler, device) @@ -56,7 +57,8 @@ NoisyDQN_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.qlearning_family.noisydqn_agent.NoisyDQN_Agent(config, envs, policy, optimizer, device) @@ -105,7 +107,8 @@ NoisyDQN_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.qlearning_family.noisydqn_agent.NoisyDQN_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/pdqn.rst b/docs/source/documents/api/agents/drl/pdqn.rst index 6e25b63a7..e1b3b6ed1 100644 --- a/docs/source/documents/api/agents/drl/pdqn.rst +++ b/docs/source/documents/api/agents/drl/pdqn.rst @@ -5,7 +5,8 @@ PDQN_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.pdqn_agent.PDQN_Agent(config, envs, policy, optimizer, scheduler, device) @@ -68,7 +69,8 @@ PDQN_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.pdqn_agent.PDQN_Agent(config, envs, policy, optimizer, device) @@ -129,7 +131,8 @@ PDQN_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.pdqn_agent.PDQN_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/perdqn.rst b/docs/source/documents/api/agents/drl/perdqn.rst index cbb393eea..7cf6e7543 100644 --- a/docs/source/documents/api/agents/drl/perdqn.rst +++ b/docs/source/documents/api/agents/drl/perdqn.rst @@ -5,7 +5,8 @@ PerDQN_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.qlearning_family.perdqn_agent.PerDQN_Agent(config, envs, policy, optimizer, scheduler, device) @@ -58,7 +59,8 @@ PerDQN_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.qlearning_family.perdqn_agent.PerDQN_Agent(config, envs, policy, optimizer, device) @@ -109,7 +111,8 @@ PerDQN_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.qlearning_family.perdqn_agent.PerDQN_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/pg.rst b/docs/source/documents/api/agents/drl/pg.rst index 60124684f..ae4b0ee00 100644 --- a/docs/source/documents/api/agents/drl/pg.rst +++ b/docs/source/documents/api/agents/drl/pg.rst @@ -5,7 +5,8 @@ PG_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.pg_agent.PG_Agent(config, envs, policy, optimizer, scheduler, device) @@ -56,7 +57,8 @@ PG_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.pg_agent.PG_Agent(config, envs, policy, optimizer, device) @@ -105,7 +107,8 @@ PG_Agent

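The ``test`` routine documented for PG_Agent returns the accumulated score of each evaluation episode. A hedged usage sketch, assuming ``agent`` and ``env_fn`` already exist:

.. code-block:: python

    # Hypothetical evaluation snippet using the documented test() signature.
    scores = agent.test(env_fn, test_episodes=10)  # list of per-episode returns
    print(f"Mean return over 10 episodes: {sum(scores) / len(scores):.2f}")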
-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.pg_agent.PG_Agent(config, envs, policy, optimizer, scheduler) @@ -141,7 +144,7 @@ PG_Agent :param env_fn: The function of making environments. :param test_episodes: The number of testing episodes. :type test_episodes: int - :return: **scores** - The accumulated scores of these episodes. + :return: The accumulated scores of these episodes. :rtype: list .. raw:: html diff --git a/docs/source/documents/api/agents/drl/ppg.rst b/docs/source/documents/api/agents/drl/ppg.rst index 28a201cbc..edb3255e0 100644 --- a/docs/source/documents/api/agents/drl/ppg.rst +++ b/docs/source/documents/api/agents/drl/ppg.rst @@ -5,7 +5,8 @@ PPG_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.ppg_agent.PPG_Agent(config, envs, policy, optimizer, scheduler, device) @@ -56,7 +57,8 @@ PPG_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.ppg_agent.PPG_Agent(config, envs, policy, optimizer, device) @@ -105,7 +107,8 @@ PPG_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.ppg_agent.PPG_Agent(config, envs, policy, optimizer, scheduler) @@ -136,12 +139,12 @@ PPG_Agent :type train_steps: int .. py:function:: - xuance.mindspore.agents.policy_gradient.pg_agent.PG_Agent.test(env_fn,test_episodes) + xuance.mindspore.agents.policy_gradient.ppg_agent.PPG_Agent.test(env_fn,test_episodes) :param env_fn: The function of making environments. :param test_episodes: The number of testing episodes. :type test_episodes: int - :return: **scores** - The accumulated scores of these episodes. + :return: The accumulated scores of these episodes. :rtype: list

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.ppoclip_agent.PPOCLIP_Agent(config, envs, policy, optimizer, scheduler, device) @@ -56,7 +57,8 @@ PPOCLIP_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.ppoclip_agent.PPOCLIP_Agent(config, envs, policy, optimizer, device) @@ -105,7 +107,8 @@ PPOCLIP_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.ppoclip_agent.PPOCLIP_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/ppo_kl.rst b/docs/source/documents/api/agents/drl/ppo_kl.rst index ac2858037..1f7b85610 100644 --- a/docs/source/documents/api/agents/drl/ppo_kl.rst +++ b/docs/source/documents/api/agents/drl/ppo_kl.rst @@ -5,7 +5,8 @@ PPOKL_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.ppokl_agent.PPOKL_Agent(config, envs, policy, optimizer, scheduler, device) @@ -56,7 +57,8 @@ PPOKL_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.ppokl_agent.PPOKL_Agent(config, envs, policy, optimizer, device) @@ -106,7 +108,8 @@ PPOKL_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.ppokl_agent.PPOKL_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/qrdqn.rst b/docs/source/documents/api/agents/drl/qrdqn.rst index 69f70854b..ed0c1790a 100644 --- a/docs/source/documents/api/agents/drl/qrdqn.rst +++ b/docs/source/documents/api/agents/drl/qrdqn.rst @@ -5,7 +5,8 @@ QRDQN_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.qlearning_family.qrdqn_agent.QRDQN_Agent(config, envs, policy, optimizer, scheduler, device) @@ -58,7 +59,8 @@ QRDQN_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.qlearning_family.qrdqn_agent.QRDQN_Agent(config, envs, policy, optimizer, device) @@ -109,7 +111,8 @@ QRDQN_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.qlearning_family.qrdqn_agent.QRDQN_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/sac.rst b/docs/source/documents/api/agents/drl/sac.rst index a614475cc..0a6cf089b 100644 --- a/docs/source/documents/api/agents/drl/sac.rst +++ b/docs/source/documents/api/agents/drl/sac.rst @@ -5,7 +5,8 @@ SAC_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.sac_agent.SAC_Agent(config, envs, policy, optimizer, scheduler, device) @@ -56,7 +57,8 @@ SAC_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.sac_agent.SAC_Agent(config, envs, policy, optimizer, device) @@ -105,7 +107,8 @@ SAC_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.sac_agent.SAC_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/sac_dis.rst b/docs/source/documents/api/agents/drl/sac_dis.rst index 046c44b18..d46c4bd60 100644 --- a/docs/source/documents/api/agents/drl/sac_dis.rst +++ b/docs/source/documents/api/agents/drl/sac_dis.rst @@ -5,7 +5,8 @@ SACDIS_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.sacdis_agent.SACDIS_Agent(config, envs, policy, optimizer, scheduler, device) @@ -56,7 +57,8 @@ SACDIS_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.sacdis_agent.SACDIS_Agent(config, envs, policy, optimizer, device) @@ -105,7 +107,8 @@ SACDIS_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.sacdis_agent.SACDIS_Agent(config, envs, policy, optimizer, scheduler) @@ -122,7 +125,7 @@ SACDIS_Agent :type scheduler: torch.optim.lr_scheduler._LRScheduler .. py:function:: - xuance.mindspore.agents.policy_gradient.sacdis_agent.SACDIS_Agent(obs) + xuance.mindspore.agents.policy_gradient.sacdis_agent.SACDIS_Agent._action(obs) :param obs: The observation variables. :type obs: np.ndarray @@ -130,13 +133,13 @@ SACDIS_Agent :rtype: np.ndarray .. py:function:: - xuance.mindspore.agents.policy_gradient.sacdis_agent.SACDIS_Agent(train_steps) + xuance.mindspore.agents.policy_gradient.sacdis_agent.SACDIS_Agent.train(train_steps) :param train_steps: The number of steps for training. :type train_steps: int .. py:function:: - xuance.mindspore.agents.policy_gradient.sacdis_agent.SACDIS_Agent(env_fn,test_episodes) + xuance.mindspore.agents.policy_gradient.sacdis_agent.SACDIS_Agent.test(env_fn,test_episodes) :param env_fn: The function of making environments. :param test_episodes: The number of testing episodes. diff --git a/docs/source/documents/api/agents/drl/spdqn.rst b/docs/source/documents/api/agents/drl/spdqn.rst index 7dd6b581a..8e1075519 100644 --- a/docs/source/documents/api/agents/drl/spdqn.rst +++ b/docs/source/documents/api/agents/drl/spdqn.rst @@ -5,7 +5,8 @@ SPDQN_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.spdqn_agent.SPDQN_Agent(config, envs, policy, optimizer, scheduler, device) @@ -68,7 +69,8 @@ SPDQN_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.spdqn_agent.SPDQN_Agent(config, envs, policy, optimizer, device) @@ -127,7 +129,8 @@ SPDQN_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.spdqn_agent.SPDQN_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl/td3.rst b/docs/source/documents/api/agents/drl/td3.rst index 499346bdc..411ae19dd 100644 --- a/docs/source/documents/api/agents/drl/td3.rst +++ b/docs/source/documents/api/agents/drl/td3.rst @@ -5,7 +5,8 @@ TD3_Agent

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.policy_gradient.td3_agent.TD3_Agent(config, envs, policy, optimizer, scheduler, device) @@ -58,7 +59,8 @@ TD3_Agent

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.policy_gradient.td3_agent.TD3_Agent(config, envs, policy, optimizer, scheduler, device) @@ -111,7 +113,8 @@ TD3_Agent

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.policy_gradient.td3_agent.TD3_Agent(config, envs, policy, optimizer, scheduler) diff --git a/docs/source/documents/api/agents/drl_agents.rst b/docs/source/documents/api/agents/drl_agents.rst new file mode 100644 index 000000000..ed3dfa889 --- /dev/null +++ b/docs/source/documents/api/agents/drl_agents.rst @@ -0,0 +1,108 @@ +DRL Agents +=============================== + + +.. toctree:: + :hidden: + + Agent + DQN_Agent + C51_Agent + DDQN_Agent + DuelDQN_Agent + NoisyDQN_Agent + PerDQN_Agent + QRDQN_Agent + PG_Agent + PPG_Agent + PPOCLIP_Agent + PPOKL_Agent + PDQN_Agent + SPDQN_Agent + MPDQN_Agent + A2C_Agent + SAC_Agent + SACDIS_Agent + DDPG_Agent + TD3_Agent + + +.. list-table:: + :header-rows: 1 + + * - Agent + - PyTorch + - TensorFlow + - MindSpore + * - :doc:`DQN `: Deep Q-Networks + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`C51DQN `: Distributional Reinforcement Learning + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`Double DQN `: DQN with Double Q-learning + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`Dueling DQN `: DQN with Dueling network + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`Noisy DQN `: DQN with Parameter Space Noise + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`PERDQN `: DQN with Prioritized Experience Replay + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`QRDQN `: DQN with Quantile Regression + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`VPG `: Vanilla Policy Gradient + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`PPG `: Phasic Policy Gradient + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`PPO `: Proximal Policy Optimization + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`PDQN `: Parameterised DQN + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`SPDQN `: Split PDQN + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MPDQN `: Multi-pass PDQN + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`A2C `: Advantage Actor Critic + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`SAC `: Soft Actor-Critic + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`SAC-Dis `: SAC for Discrete Actions + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`DDPG `: Deep Deterministic Policy Gradient + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. 
centered:: :math:`\checkmark` + * - :doc:`TD3 `: Twin Delayed DDPG + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` \ No newline at end of file diff --git a/docs/source/documents/api/agents/marl/basic_marl_class.rst b/docs/source/documents/api/agents/marl/basic_marl_class.rst index 31b3817eb..44553efc8 100644 --- a/docs/source/documents/api/agents/marl/basic_marl_class.rst +++ b/docs/source/documents/api/agents/marl/basic_marl_class.rst @@ -3,7 +3,8 @@ MARLAgent To create new MARL agents, you should build a class that inherits from ``xuance.torch.agents.agents_marl.MARLAgent``, ``xuance.tensorflow.agents.agents_marl.MARLAgent``, or ``xuance.mindspore.agents.agents_marl.MARLAgent``. -**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agents.agents_marl.MARLAgents(config, envs, policy, memory, learner, device, log_dir, model_dir) @@ -96,7 +97,8 @@ To create new MARL agents, you should build a class that inherits from ``xuance.torch.

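A minimal sketch of the MARL subclassing pattern described above. Only the constructor signature comes from this page; the method names and bodies are illustrative placeholders:

.. code-block:: python

    from xuance.torch.agents.agents_marl import MARLAgents

    class CustomMARLAgents(MARLAgents):
        """Hypothetical multi-agent class built on the documented base class."""
        def __init__(self, config, envs, policy, memory, learner,
                     device, log_dir, model_dir):
            super(CustomMARLAgents, self).__init__(config, envs, policy, memory,
                                                   learner, device, log_dir, model_dir)

        def act(self, obs_n, test_mode=False):
            # Produce one action per agent from the joint observations.
            raise NotImplementedError

        def train(self, i_step):
            # Sample a batch from memory and update the joint policy.
            raise NotImplementedError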
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agents.agents_marl.MARLAgents(config, envs, policy, memory, learner, device, log_dir, model_dir) @@ -189,7 +191,8 @@ To create new MARL agents, you should build a class inherit from ``xuance.torch.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.agents_marl.MARLAgent(envs, policy, memory, learner, device, log_dir, model_dir) diff --git a/docs/source/documents/api/agents/marl/coma.rst b/docs/source/documents/api/agents/marl/coma.rst index bd327987b..b0957f0c3 100644 --- a/docs/source/documents/api/agents/marl/coma.rst +++ b/docs/source/documents/api/agents/marl/coma.rst @@ -5,7 +5,8 @@ COMA_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.coma_agents.COMA_Agents(config, envs, device) @@ -49,7 +50,8 @@ COMA_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.coma_agents.COMA_Agents(config, envs, device) @@ -94,7 +96,8 @@ COMA_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agent.mutli_agent_rl.coma_agents.COMA_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/dcg.rst b/docs/source/documents/api/agents/marl/dcg.rst index 6a2b127c0..f6d1bf282 100644 --- a/docs/source/documents/api/agents/marl/dcg.rst +++ b/docs/source/documents/api/agents/marl/dcg.rst @@ -5,7 +5,8 @@ DCG_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.dcg_agents.DCG_Agents(config, envs, device) @@ -47,7 +48,8 @@ DCG_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.dcg_agents.DCG_Agents(config, envs, device) @@ -91,7 +93,8 @@ DCG_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.mutli_agent_rl.dcg_agents.DCG_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/iddpg.rst b/docs/source/documents/api/agents/marl/iddpg.rst index c89b6e50e..d4d366ef9 100644 --- a/docs/source/documents/api/agents/marl/iddpg.rst +++ b/docs/source/documents/api/agents/marl/iddpg.rst @@ -5,7 +5,8 @@ IDDPG_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.iddpg_agents.IDDPG_Agents(config, envs, device) @@ -43,7 +44,8 @@ IDDPG_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.iddpg_agents.IDDPG_Agents(config, envs, device) @@ -81,7 +83,8 @@ IDDPG_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.mutli_agent_rl.iddpg_agents.IDDPG_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/ippo.rst b/docs/source/documents/api/agents/marl/ippo.rst index 72aa0154e..1ff0387c8 100644 --- a/docs/source/documents/api/agents/marl/ippo.rst +++ b/docs/source/documents/api/agents/marl/ippo.rst @@ -5,7 +5,8 @@ IPPO_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.ippo_agents.IPPO_Agents(config, envs, device) @@ -65,7 +66,8 @@ IPPO_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.ippo_agents.IPPO_Agents(config, envs, device) @@ -126,7 +128,8 @@ IPPO_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agent.mutli_agent_rl.ippo_agents.IPPO_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/iql.rst b/docs/source/documents/api/agents/marl/iql.rst index a48b34744..094aea504 100644 --- a/docs/source/documents/api/agents/marl/iql.rst +++ b/docs/source/documents/api/agents/marl/iql.rst @@ -5,7 +5,8 @@ IQL_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.iql_agents.IQL_Agents(config, envs, device) @@ -47,7 +48,8 @@ IQL_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.iql_agents.IQL_Agents(config, envs, device) @@ -91,7 +93,8 @@ IQL_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.mutli_agent_rl.iql_agents.IQL_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/isac.rst b/docs/source/documents/api/agents/marl/isac.rst index d1bf42db3..29730eb06 100644 --- a/docs/source/documents/api/agents/marl/isac.rst +++ b/docs/source/documents/api/agents/marl/isac.rst @@ -5,7 +5,8 @@ ISAC_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.isac_agents.ISAC_Agents(config, envs, device) @@ -43,7 +44,8 @@ ISAC_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.isac_agents.ISAC_Agents(config, envs, device) @@ -81,7 +83,8 @@ ISAC_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.mutli_agent_rl.isac_agents.ISAC_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/maddpg.rst b/docs/source/documents/api/agents/marl/maddpg.rst index 04a29ee9f..5edea5e24 100644 --- a/docs/source/documents/api/agents/marl/maddpg.rst +++ b/docs/source/documents/api/agents/marl/maddpg.rst @@ -5,7 +5,8 @@ MADDPG_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.maddpg_agents.MADDPG_Agents(config, envs, device) @@ -43,7 +44,8 @@ MADDPG_Agents

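As with the single-agent algorithms, the quickest way to try MADDPG_Agents is through the high-level runner. A sketch assuming the bundled MPE configs; the ``simple_spread_v3`` id follows PettingZoo naming and may differ across versions:

.. code-block:: python

    import xuance

    runner = xuance.get_runner(method='maddpg',
                               env='mpe',                  # multi-agent particle envs
                               env_id='simple_spread_v3',  # assumed PettingZoo task id
                               is_test=False)
    runner.run()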
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.maddpg_agents.MADDPG_Agents(config, envs, device) @@ -81,7 +83,8 @@ MADDPG_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.mutli_agent_rl.maddpg_agents.MADDPG_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/mappo.rst b/docs/source/documents/api/agents/marl/mappo.rst index 2ed619527..335448043 100644 --- a/docs/source/documents/api/agents/marl/mappo.rst +++ b/docs/source/documents/api/agents/marl/mappo.rst @@ -5,7 +5,8 @@ MAPPO_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.mappo_agents.MAPPO_Agents(config, envs, device) @@ -49,7 +50,8 @@ MAPPO_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.mappo_agents.MAPPO_Agents(config, envs, device) @@ -95,7 +97,8 @@ MAPPO_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agent.mutli_agent_rl.mappo_agents.MAPPO_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/masac.rst b/docs/source/documents/api/agents/marl/masac.rst index ebbdebf62..15e445f46 100644 --- a/docs/source/documents/api/agents/marl/masac.rst +++ b/docs/source/documents/api/agents/marl/masac.rst @@ -5,7 +5,8 @@ MASAC_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.masac_agents.MASAC_Agents(config, envs, device) @@ -43,7 +44,8 @@ MASAC_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.masac_agents.MASAC_Agents(config, envs, device) @@ -81,7 +83,8 @@ MASAC_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agent.mutli_agent_rl.masac_agents.MASAC_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/matd3.rst b/docs/source/documents/api/agents/marl/matd3.rst index 70d9176bf..fe06c5a12 100644 --- a/docs/source/documents/api/agents/marl/matd3.rst +++ b/docs/source/documents/api/agents/marl/matd3.rst @@ -5,7 +5,8 @@ MATD3_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.matd3_agents.MATD3_Agents(config, envs, device) @@ -43,7 +44,8 @@ MATD3_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.matd3_agents.MATD3_Agents(config, envs, device) @@ -81,7 +83,8 @@ MATD3_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agent.mutli_agent_rl.matd3_agents.MATD3_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/mfac.rst b/docs/source/documents/api/agents/marl/mfac.rst index 44634f966..b8c6ab8a5 100644 --- a/docs/source/documents/api/agents/marl/mfac.rst +++ b/docs/source/documents/api/agents/marl/mfac.rst @@ -5,7 +5,8 @@ MFAC

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.mfac_agents.MFAC_Agents(config, envs, device) @@ -61,7 +62,8 @@ MFAC

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.mfac_agents.MFAC_Agents(config, envs, device) @@ -117,7 +119,8 @@ MFAC

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agent.mutli_agent_rl.mfac_agents.MFAC_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/mfq.rst b/docs/source/documents/api/agents/marl/mfq.rst index 999a82211..cfc2fb259 100644 --- a/docs/source/documents/api/agents/marl/mfq.rst +++ b/docs/source/documents/api/agents/marl/mfq.rst @@ -5,7 +5,8 @@ MFQ_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.mfq_agents.MFQ_Agents(config, envs, device) @@ -49,7 +50,8 @@ MFQ_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.mfq_agents.MFQ_Agents(config, envs, device) @@ -95,7 +97,8 @@ MFQ_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agent.mutli_agent_rl.mfq_agents.MFQ_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/qmix.rst b/docs/source/documents/api/agents/marl/qmix.rst index e1956bf52..c94f3fcd8 100644 --- a/docs/source/documents/api/agents/marl/qmix.rst +++ b/docs/source/documents/api/agents/marl/qmix.rst @@ -5,7 +5,8 @@ QMIX_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.qmix_agents.QMIX_Agents(config, envs, device) @@ -47,7 +48,8 @@ QMIX_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.qmix_agents.QMIX_Agents(config, envs, device) @@ -91,7 +93,8 @@ QMIX_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.mutli_agent_rl.qmix_agents.QMIX_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/qtran.rst b/docs/source/documents/api/agents/marl/qtran.rst index 59c328d41..0816416f8 100644 --- a/docs/source/documents/api/agents/marl/qtran.rst +++ b/docs/source/documents/api/agents/marl/qtran.rst @@ -5,7 +5,8 @@ QTRAN_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.qtran_agents.QTRAN_Agents(config, envs, device) @@ -31,7 +32,8 @@ QTRAN_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.qtran_agents.QTRAN_Agents(config, envs, device) @@ -75,7 +77,8 @@ QTRAN_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.mutli_agent_rl.qtran_agents.QTRAN_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/vdac.rst b/docs/source/documents/api/agents/marl/vdac.rst index d8d5ff382..584658577 100644 --- a/docs/source/documents/api/agents/marl/vdac.rst +++ b/docs/source/documents/api/agents/marl/vdac.rst @@ -5,7 +5,8 @@ VDAC

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.vdac_agents.VDAC_Agents(config, envs) @@ -49,7 +50,8 @@ VDAC

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.vdac_agents.VDAC_Agents(config, envs) @@ -93,20 +95,21 @@ VDAC

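The ``act`` interface documented for the VDAC agents threads recurrent hidden states through successive calls. A hedged one-step rollout sketch, assuming ``agents``, ``obs_n``, ``state``, and the previous step's ``rnn_hidden`` already exist:

.. code-block:: python

    # Hypothetical rollout step using the documented act() signature.
    rnn_hidden, actions_n, onehot_actions = agents.act(obs_n, *rnn_hidden,
                                                       avail_actions=None,
                                                       state=state,
                                                       test_mode=True)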
-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: - xuance.torch.agent.mutli_agent_rl.vdac_agents.VDAC_Agents(config, envs, device) + xuance.mindspore.agent.mutli_agent_rl.vdac_agents.VDAC_Agents(config, envs, device) :param config: Provides hyper parameters. :type config: Namespace :param envs: The vectorized environments. :type envs: xuance.environments.vector_envs.vector_env.VecEnv :param device: Choose CPU or GPU to train the model. - :type device: str, int, torch.device + :type device: str .. py:function:: - xuance.torch.agent.mutli_agent_rl.vdac_agents.VDAC_Agents.act(obs_n, *rnn_hidden, avail_actions=None, state=None, test_mode=False) + xuance.mindspore.agent.mutli_agent_rl.vdac_agents.VDAC_Agents.act(obs_n, *rnn_hidden, avail_actions=None, state=None, test_mode=False) Calculate joint actions for N agents according to the joint observations. @@ -120,11 +123,11 @@ VDAC :type state: np.ndarray :param test_mode: is True for selecting greedy actions, is False for selecting epsilon-greedy actions. :type test_mode: bool - :return: **hidden_state**, **actions_n**, **onehot_actions** - The next hidden states of RNN, the joint actions, and the onehot actions. - :rtype: tuple(np.ndarray, np.ndarray), np.ndarray, np.ndarray + :return: A tuple that includes the next hidden states of RNN, the joint actions, and the onehot actions. + :rtype: tuple .. py:function:: - xuance.torch.agent.mutli_agent_rl.vdac_agents.VDAC_Agents.train(i_step, kwargs) + xuance.mindspore.agent.mutli_agent_rl.vdac_agents.VDAC_Agents.train(i_step, kwargs) Train the multi-agent reinforcement learning model. @@ -132,7 +135,7 @@ VDAC :type i_step: int :param kwargs: The other arguments. :type kwargs: dict - :return: **info_train** - the information of the training process. + :return: The information of the training process. :rtype: dict .. raw:: html diff --git a/docs/source/documents/api/agents/marl/vdn.rst b/docs/source/documents/api/agents/marl/vdn.rst index 044310b13..88bf501eb 100644 --- a/docs/source/documents/api/agents/marl/vdn.rst +++ b/docs/source/documents/api/agents/marl/vdn.rst @@ -5,7 +5,8 @@ VDN_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.vdn_agents.VDN_Agents(config, envs, device) @@ -47,7 +48,8 @@ VDN_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.vdn_agents.VDN_Agents(config, envs, device) @@ -76,7 +78,7 @@ VDN_Agents :rtype: tuple(np.ndarray, np.ndarray), np.ndarray .. py:function:: - xuance.torch.agent.mutli_agent_rl.vdn_agents.VDN_Agents.train(i_step, n_epoch) + xuance.tensorflow.agent.mutli_agent_rl.vdn_agents.VDN_Agents.train(i_step, n_epoch) Train the multi-agent reinforcement learning model. @@ -84,14 +86,15 @@ VDN_Agents :type i_step: int :param n_epoch: Number of training epochs. :type n_epoch: int - :return: **info_train** - the information of the training process. + :return: The information of the training process. :rtype: dict .. raw:: html

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.mutli_agent_rl.vdn_agents.VDN_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl/wqmix.rst b/docs/source/documents/api/agents/marl/wqmix.rst index d87d6b115..c2294b9d5 100644 --- a/docs/source/documents/api/agents/marl/wqmix.rst +++ b/docs/source/documents/api/agents/marl/wqmix.rst @@ -5,7 +5,8 @@ WQMIX_Agents

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.agent.mutli_agent_rl.wqmix_agents.WQMIX_Agents(config, envs, device) @@ -47,7 +48,8 @@ WQMIX_Agents

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.agent.mutli_agent_rl.wqmix_agents.WQMIX_Agents(config, envs, device) @@ -91,7 +93,8 @@ WQMIX_Agents

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.agents.mutli_agent_rl.wqmix_agents.WQMIX_Agents(config, envs) diff --git a/docs/source/documents/api/agents/marl_agents.rst b/docs/source/documents/api/agents/marl_agents.rst new file mode 100644 index 000000000..90bafb711 --- /dev/null +++ b/docs/source/documents/api/agents/marl_agents.rst @@ -0,0 +1,102 @@ +MARL Agents +=================================== + + +.. toctree:: + :hidden: + + MARLAgents + IQL_Agents + VDN_Agents + QMIX_Agents + WQMIX_Agents + QTRAN_Agents + DCG_Agents + IDDPG_Agents + MADDPG_Agents + ISAC_Agents + MASAC_Agents + IPPO_Agents + MAPPO_Agents + MATD3_Agents + VDAC_Agents + COMA_Agents + MFQ_Agents + MFAC_Agents + + +.. list-table:: + :header-rows: 1 + + * - Multi-Agent + - PyTorch + - TensorFlow + - MindSpore + * - :doc:`IQL `: Independent Q-Learning + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`VDN `: Value-Decomposition Networks + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`QMIX `: VDN with Q-Mixer + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`WQMIX `: Weighted QMIX + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`QTRAN `: Q-Transformation + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`DCG `: Deep Coordination Graph + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`IDDPG `: Independent DDPG + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MADDPG `: Multi-Agent DDPG + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`ISAC `: Independent SAC + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MASAC `: Multi-Agent SAC + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`IPPO `: Independent PPO + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MAPPO `: Multi-Agent PPO + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MATD3 `: Multi-Agent TD3 + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`VDAC `: Value-Decomposition Actor-Critic + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`COMA `: Counterfactual Multi-Agent PG + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MFQ `: Mean-Field Q-Learning + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MFAC `: Mean-Field Actor-Critic + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. 
centered:: :math:`\checkmark` \ No newline at end of file diff --git a/docs/source/documents/api/learners.rst b/docs/source/documents/api/learners.rst index 5a13a1b80..5e3051770 100644 --- a/docs/source/documents/api/learners.rst +++ b/docs/source/documents/api/learners.rst @@ -1,202 +1,11 @@ Learners ====================== -.. toctree:: - :hidden: +.. toctree:: + :maxdepth: 1 - Learner - DQN_Learner - C51_Learner - DDQN_Learner - DuelDQN_Learner - NoisyDQN_Learner - PerDQN_Learner - QRDQN_Learner - PG_Learner - PPG_Learner - PPOCLIP_Learner - PPOCKL_Learner - PDQN_Learner - SPDQN_Learner - MPDQN_Learner - A2C_Learner - SAC_Learner - SACDIS_Learner - DDPG_Learner - TD3_Learner + learners/learner + learners/drl_learner + learners/marl_learner - IQL_Learner - VDN_Learner - QMIX_Learner - WQMIX_Learner - QTRAN_Learner - DCG_Learner - IDDPG_Learner - MADDPG_Learner - ISAC_Learner - MASAC_Learner - IPPO_Learner - MAPPO_Learner - MATD3_Learner - VDAC_Learner - COMA_Learner - MFQ_Learner - MFAC_Learner - -.. list-table:: - :header-rows: 1 - - * - Learner - - PyTorch - - TensorFlow - - MindSpore - * - :doc:`DQN `: Deep Q-Networks - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`C51DQN `: Distributional Reinforcement Learning - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`Double DQN `: DQN with Double Q-learning - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`Dueling DQN `: DQN with Dueling network - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`Noisy DQN `: DQN with Parameter Space Noise - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`PERDQN `: DQN with Prioritized Experience Replay - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`QRDQN `: DQN with Quantile Regression - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`VPG `: Vanilla Policy Gradient - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`PPG `: Phasic Policy Gradient - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`PPO `: Proximal Policy Optimization - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`PDQN `: Parameterised DQN - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`SPDQN `: Split PDQN - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MPDQN `: Multi-pass PDQN - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`A2C `: Advantage Actor Critic - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`SAC `: Soft Actor-Critic - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`SAC-Dis `: SAC for Discrete Actions - - .. centered:: :math:`\checkmark` - - .. 
centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`DDPG `: Deep Deterministic Policy Gradient - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`TD3 `: Twin Delayed DDPG - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - -.. list-table:: - :header-rows: 1 - - * - Multi-Agent Learner - - PyTorch - - TensorFlow - - MindSpore - * - :doc:`IQL `: Independent Q-Learning - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`VDN `: Value-Decomposition Networks - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`QMIX `: VDN with Q-Mixer - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`WQMIX `: Weighted QMIX - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`QTRAN `: Q-Transformation - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`DCG `: Deep Coordination Graph - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`IDDPG `: Independent DDPG - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MADDPG `: Multi-Agent DDPG - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`ISAC `: Independent SAC - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MASAC `: Multi-Agent SAC - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`IPPO `: Independent PPO - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MAPPO `: Multi-Agent PPO - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MATD3 `: Multi-Agent TD3 - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`VDAC `: Value-Decomposition Actor-Critic - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`COMA `: Counterfacutal Multi-Agent PG - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MFQ `: Mean-Field Q-Learning - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - * - :doc:`MFAC `: Mean-Field Actor-Critic - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` - - .. centered:: :math:`\checkmark` \ No newline at end of file diff --git a/docs/source/documents/api/learners/drl/a2c.rst b/docs/source/documents/api/learners/drl/a2c.rst index 00f97dcd7..ea6b7ebe0 100644 --- a/docs/source/documents/api/learners/drl/a2c.rst +++ b/docs/source/documents/api/learners/drl/a2c.rst @@ -5,7 +5,8 @@ A2C_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.a2c_learner.A2C_Learner(policy, optimizer, scheduler, device, model_dir, vf_coef, ent_coef, clip_grad) @@ -45,7 +46,8 @@ A2C_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.a2c_learner.A2C_Learner(policy, optimizer, device, model_dir, vf_coef, ent_coef, clip_grad) @@ -83,7 +85,8 @@ A2C_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.a2c_learner.A2C_Learner(policy, optimizer, scheduler, model_dir, vf_coef, ent_coef, clip_grad, clip_type) diff --git a/docs/source/documents/api/learners/drl/c51.rst b/docs/source/documents/api/learners/drl/c51.rst index c994ae9b9..668cd3672 100644 --- a/docs/source/documents/api/learners/drl/c51.rst +++ b/docs/source/documents/api/learners/drl/c51.rst @@ -5,7 +5,8 @@ C51_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.qlearning_family.c51_learner.C51_Learner(policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -45,7 +46,8 @@ C51_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.qlearning_family.c51_learner.C51_Learner(policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -83,7 +85,8 @@ C51_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.qlearning_family.c51_learner.C51_Learner(policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/drl/ddpg.rst b/docs/source/documents/api/learners/drl/ddpg.rst index f28880b02..2f05f0bdb 100644 --- a/docs/source/documents/api/learners/drl/ddpg.rst +++ b/docs/source/documents/api/learners/drl/ddpg.rst @@ -5,7 +5,8 @@ DDPG_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.ddpg_learner.DDPG_Learner(policy, optimizer, scheduler, device, model_dir, gamma, tau) @@ -45,7 +46,8 @@ DDPG_Learner

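+As a purely illustrative sketch of what the ``tau`` argument controls (this helper is not part of the XuanCe source), the target networks in DDPG are typically refreshed by Polyak averaging:
+
+.. code-block:: python
+
+    import torch
+
+    @torch.no_grad()
+    def soft_update(target_net, source_net, tau):
+        # theta_target <- tau * theta_source + (1 - tau) * theta_target
+        for tp, sp in zip(target_net.parameters(), source_net.parameters()):
+            tp.data.mul_(1.0 - tau).add_(tau * sp.data)
+
+A small ``tau`` (e.g. 0.005) makes the target networks track the online networks slowly, which stabilises the bootstrapped critic targets.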
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.ddpg_learner.DDPG_Learner(policy, optimizer, device, model_dir, gamma, tau) @@ -83,7 +85,8 @@ DDPG_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.ddpg_learner.DDPG_Learner(policy, optimizer, scheduler, model_dir, gamma, tau) diff --git a/docs/source/documents/api/learners/drl/ddqn.rst b/docs/source/documents/api/learners/drl/ddqn.rst index 45ea81bbb..172b87b49 100644 --- a/docs/source/documents/api/learners/drl/ddqn.rst +++ b/docs/source/documents/api/learners/drl/ddqn.rst @@ -5,7 +5,8 @@ DDQN_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.qlearning_family.ddqn_learner.DDQN_Learner(policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -45,7 +46,8 @@ DDQN_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.qlearning_family.ddqn_learner.DDQN_Learner(policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -83,7 +85,8 @@ DDQN_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.qlearning_family.ddqn_learner.DDQN_Learner(policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/drl/dqn.rst b/docs/source/documents/api/learners/drl/dqn.rst index 9795f1e74..3a9359825 100644 --- a/docs/source/documents/api/learners/drl/dqn.rst +++ b/docs/source/documents/api/learners/drl/dqn.rst @@ -5,7 +5,8 @@ DQN_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.qlearning_family.dqn_learner.DQN_Learner(policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -45,7 +46,8 @@ DQN_Learner

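+For intuition about the ``gamma`` and ``sync_frequency`` arguments, the following self-contained sketch (hypothetical helper, not the XuanCe implementation) shows one DQN update with a periodically synchronised target network:
+
+.. code-block:: python
+
+    import torch
+    import torch.nn.functional as F
+
+    def dqn_update(q_net, target_net, optimizer, batch, step,
+                   gamma=0.99, sync_frequency=100):
+        obs, act, rew, next_obs, done = batch  # tensors sampled from a replay buffer
+        with torch.no_grad():
+            # Bootstrapped target: r + gamma * max_a' Q_target(s', a')
+            q_next = target_net(next_obs).max(dim=1).values
+            target = rew + gamma * (1.0 - done) * q_next
+        q = q_net(obs).gather(1, act.long().unsqueeze(1)).squeeze(1)
+        loss = F.mse_loss(q, target)
+        optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+        if step % sync_frequency == 0:  # hard copy into the target network
+            target_net.load_state_dict(q_net.state_dict())
+        return loss.item()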
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.qlearning_family.dqn_learner.DQN_Learner(policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -83,7 +85,8 @@ DQN_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.qlearning_family.dqn_learner.DQN_Learner(policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/drl/dueldqn.rst b/docs/source/documents/api/learners/drl/dueldqn.rst index 61dbbd082..b5dcd45ec 100644 --- a/docs/source/documents/api/learners/drl/dueldqn.rst +++ b/docs/source/documents/api/learners/drl/dueldqn.rst @@ -5,7 +5,8 @@ DuelDQN_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.qlearning_family.dueldqn_learner.DuelDQN_Learner(policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -45,7 +46,8 @@ DuelDQN_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.qlearning_family.dueldqn_learner.DuelDQN_Learner(policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -85,7 +87,8 @@ DuelDQN_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.qlearning_family.dueldqn_learner.DuelDQN_Learner(policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/drl/mpdqn.rst b/docs/source/documents/api/learners/drl/mpdqn.rst index 6332f4751..5f6140b3b 100644 --- a/docs/source/documents/api/learners/drl/mpdqn.rst +++ b/docs/source/documents/api/learners/drl/mpdqn.rst @@ -5,7 +5,8 @@ MPDQN_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.mpdqn_learner.MPDQN_Learner(policy, optimizer, scheduler, device, model_dir, gamma, tau) @@ -45,7 +46,8 @@ MPDQN_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.mpdqn_learner.MPDQN_Learner(policy, optimizer, device, model_dir, gamma, tau) @@ -83,7 +85,8 @@ MPDQN_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.mpdqn_learner.MPDQN_Learner(policy, optimizer, scheduler, model_dir, gamma, tau) diff --git a/docs/source/documents/api/learners/drl/noisydqn.rst b/docs/source/documents/api/learners/drl/noisydqn.rst index 297ac462c..3de3cee23 100644 --- a/docs/source/documents/api/learners/drl/noisydqn.rst +++ b/docs/source/documents/api/learners/drl/noisydqn.rst @@ -5,7 +5,8 @@ NoisyDQN_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.qlearning_family.noisydqn_learner.NoisyDQN_Learner(policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -45,13 +46,15 @@ NoisyDQN_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. raw:: html

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.qlearning_family.noisydqn_learner.NoisyDQN_Learner(policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/drl/pdqn.rst b/docs/source/documents/api/learners/drl/pdqn.rst index 60eaf50aa..90f93f1b8 100644 --- a/docs/source/documents/api/learners/drl/pdqn.rst +++ b/docs/source/documents/api/learners/drl/pdqn.rst @@ -5,7 +5,8 @@ PDQN_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.pdqn_learner.PDQN_Learner(policy, optimizer, scheduler, device, model_dir, gamma, tau) @@ -45,7 +46,8 @@ PDQN_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.pdqn_learner.PDQN_Learner(policy, optimizer, device, model_dir, gamma, tau) @@ -83,7 +85,8 @@ PDQN_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.pdqn_learner.PDQN_Learner(policy, optimizer, scheduler, model_dir, gamma, tau) diff --git a/docs/source/documents/api/learners/drl/perdqn.rst b/docs/source/documents/api/learners/drl/perdqn.rst index 6b040f22a..1575d472b 100644 --- a/docs/source/documents/api/learners/drl/perdqn.rst +++ b/docs/source/documents/api/learners/drl/perdqn.rst @@ -5,7 +5,8 @@ PerDQN_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.qlearning_family.perdqn_learner.PerDQN_Learner(policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -45,7 +46,8 @@ PerDQN_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.qlearning_family.perdqn_learner.PerDQN_Learner(policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -83,7 +85,8 @@ PerDQN_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.qlearning_family.perdqn_learner.PerDQN_Learner(policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/drl/pg.rst b/docs/source/documents/api/learners/drl/pg.rst index 4790a7be3..226c5d727 100644 --- a/docs/source/documents/api/learners/drl/pg.rst +++ b/docs/source/documents/api/learners/drl/pg.rst @@ -5,7 +5,8 @@ PG_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.pg_learner.PG_Learner(policy, optimizer, scheduler, device, model_dir, ent_coef, clip_grad) @@ -41,7 +42,8 @@ PG_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.pg_learner.PG_Learner(policy, optimizer, device, model_dir, ent_coef, clip_grad) @@ -75,7 +77,8 @@ PG_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.pg_learner.PG_Learner(policy, optimizer, scheduler, model_dir, ent_coef, clip_grad, clip_type) diff --git a/docs/source/documents/api/learners/drl/ppg.rst b/docs/source/documents/api/learners/drl/ppg.rst index ee212d322..ac4438fce 100644 --- a/docs/source/documents/api/learners/drl/ppg.rst +++ b/docs/source/documents/api/learners/drl/ppg.rst @@ -5,7 +5,8 @@ PPG_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.ppg_learner.PPG_Learner(policy, optimizer, scheduler, device, model_dir, ent_coef, clip_range, kl_beta) @@ -79,7 +80,8 @@ PPG_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.ppg_learner.PPG_Learner(policy, optimizer, device, model_dir, ent_coef, clip_range, kl_beta) @@ -151,7 +153,8 @@ PPG_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.ppg_learner.PPG_Learner(policy, optimizer, scheduler, model_dir, ent_coef, clip_range, kl_beta) diff --git a/docs/source/documents/api/learners/drl/ppo_clip.rst b/docs/source/documents/api/learners/drl/ppo_clip.rst index 63c84628a..0845a4bb1 100644 --- a/docs/source/documents/api/learners/drl/ppo_clip.rst +++ b/docs/source/documents/api/learners/drl/ppo_clip.rst @@ -5,7 +5,8 @@ PPOCLIP_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.ppoclip_learner.PPOCLIP_Learner(policy, optimizer, scheduler, device, model_dir, vf_coef, ent_coef, clip_range, clip_grad_norm, use_grad_clip) @@ -53,7 +54,8 @@ PPOCLIP_Learner

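+The ``clip_range`` argument bounds the policy ratio in the clipped surrogate objective. A minimal sketch of that loss (standard PPO math, not the XuanCe code itself):
+
+.. code-block:: python
+
+    import torch
+
+    def ppo_clip_loss(log_prob, old_log_prob, advantage, clip_range=0.2):
+        ratio = torch.exp(log_prob - old_log_prob)  # pi(a|s) / pi_old(a|s)
+        surr1 = ratio * advantage
+        surr2 = torch.clamp(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantage
+        return -torch.min(surr1, surr2).mean()      # maximise the clipped surrogate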
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.ppoclip_learner.PPOCLIP_Learner(policy, optimizer, device, model_dir, vf_coef, ent_coef, clip_range) @@ -95,7 +97,8 @@ PPOCLIP_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.ppoclip_learner.PPOCLIP_Learner(policy, optimizer, scheduler, model_dir, vf_coef, ent_coef, clip_range) diff --git a/docs/source/documents/api/learners/drl/ppo_kl.rst b/docs/source/documents/api/learners/drl/ppo_kl.rst index 4c9ecbece..e35f18fac 100644 --- a/docs/source/documents/api/learners/drl/ppo_kl.rst +++ b/docs/source/documents/api/learners/drl/ppo_kl.rst @@ -5,7 +5,8 @@ PPOKL_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.ppokl_learner.PPOKL_Learner(policy, optimizer, scheduler, device, model_dir, vf_coef, ent_coef, target_kl) @@ -47,7 +48,8 @@ PPOKL_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.ppokl_learner.PPOKL_Learner(policy, optimizer, device, model_dir, vf_coef, ent_coef, target_kl) @@ -89,7 +91,8 @@ PPOKL_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.ppokl_learner.PPOKL_Learner(policy, optimizer, scheduler, summary_writer, model_dir, vf_coef, ent_coef, clip_range) diff --git a/docs/source/documents/api/learners/drl/qrdqn.rst b/docs/source/documents/api/learners/drl/qrdqn.rst index 8bed2cc5c..a5f882f87 100644 --- a/docs/source/documents/api/learners/drl/qrdqn.rst +++ b/docs/source/documents/api/learners/drl/qrdqn.rst @@ -5,7 +5,8 @@ QRDQN_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.qlearning_family.qrdqn_learner.QRDQN_Learner(policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -43,7 +44,8 @@ QRDQN_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.qlearning_family.qrdqn_learner.QRDQN_Learner(policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -81,7 +83,8 @@ QRDQN_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.qlearning_family.qrdqn_learner.QRDQN_Learner(policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/drl/sac.rst b/docs/source/documents/api/learners/drl/sac.rst index d71059725..9629ce66e 100644 --- a/docs/source/documents/api/learners/drl/sac.rst +++ b/docs/source/documents/api/learners/drl/sac.rst @@ -5,7 +5,8 @@ SAC_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.sac_learner.SAC_Learner(policy, optimizer, scheduler, device, model_dir, gamma, tau) @@ -45,7 +46,8 @@ SAC_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.sac_learner.SAC_Learner(policy, optimizer, device, model_dir, gamma, tau) @@ -64,7 +66,7 @@ SAC_Learner :type tau: float .. py:function:: - xuance.torch.learners.policy_gradient.sac_learner.SAC_Learner.update(obs_batch, act_batch, rew_batch, next_batch, terminal_batch) + xuance.tensorflow.learners.policy_gradient.sac_learner.SAC_Learner.update(obs_batch, act_batch, rew_batch, next_batch, terminal_batch) :param obs_batch: A batch of observations sampled from experience replay buffer. :type obs_batch: np.ndarray @@ -83,7 +85,8 @@ SAC_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.sac_learner.SAC_Learner(policy, optimizers, schedulers, model_dir, gamma, tau) diff --git a/docs/source/documents/api/learners/drl/sac_dis.rst b/docs/source/documents/api/learners/drl/sac_dis.rst index ae00eb7db..f938ef525 100644 --- a/docs/source/documents/api/learners/drl/sac_dis.rst +++ b/docs/source/documents/api/learners/drl/sac_dis.rst @@ -5,7 +5,8 @@ SACDIS_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.sacdis_learner.SACDIS_Learner(policy, optimizer, scheduler, device, model_dir, gamma, tau) @@ -45,7 +46,8 @@ SACDIS_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.sacdis_learner.SACDIS_Learner(policy, optimizer, device, model_dir, gamma, tau) @@ -83,7 +85,8 @@ SACDIS_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.sacdis_learner.SACDIS_Learner(policy, optimizers, schedulers, model_dir, gamma, tau) @@ -102,7 +105,7 @@ SACDIS_Learner :type tau: float .. py:function:: - xuance.torch.learners.policy_gradient.sacdis_learner.SACDIS_Learner.update(obs_batch, act_batch, rew_batch, next_batch, terminal_batch) + xuance.mindspore.learners.policy_gradient.sacdis_learner.SACDIS_Learner.update(obs_batch, act_batch, rew_batch, next_batch, terminal_batch) :param obs_batch: A batch of observations sampled from experience replay buffer. :type obs_batch: np.ndarray diff --git a/docs/source/documents/api/learners/drl/spdqn.rst b/docs/source/documents/api/learners/drl/spdqn.rst index 1ea41420d..66209bab4 100644 --- a/docs/source/documents/api/learners/drl/spdqn.rst +++ b/docs/source/documents/api/learners/drl/spdqn.rst @@ -5,7 +5,8 @@ SPDQN_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.spdqn_learner.SPDQN_Learner(policy, optimizer, scheduler, device, model_dir, gamma, tau) @@ -45,7 +46,8 @@ SPDQN_Learner

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.spdqn_learner.SPDQN_Learner(policy, optimizer, device, model_dir, gamma, tau) @@ -85,7 +87,8 @@ SPDQN_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.spdqn_learner.SPDQN_Learner(policy, optimizer, scheduler, model_dir, gamma, tau) diff --git a/docs/source/documents/api/learners/drl/td3.rst b/docs/source/documents/api/learners/drl/td3.rst index c46504185..09db111e4 100644 --- a/docs/source/documents/api/learners/drl/td3.rst +++ b/docs/source/documents/api/learners/drl/td3.rst @@ -5,7 +5,8 @@ TD3_Learner

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.policy_gradient.td3_learner.TD3_Learner(policy, optimizer, scheduler, device, model_dir, gamma, tau, delay) @@ -47,7 +48,8 @@ TD3_Learner

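+The ``delay`` argument implements TD3's delayed policy updates, and the twin critics are combined with a minimum when forming targets. A self-contained sketch under those standard TD3 conventions (helper names are hypothetical):
+
+.. code-block:: python
+
+    import torch
+
+    @torch.no_grad()
+    def td3_target(rew, done, next_obs, actor_target, critic1_t, critic2_t,
+                   gamma=0.99, noise_std=0.2, noise_clip=0.5):
+        next_act = actor_target(next_obs)
+        noise = (torch.randn_like(next_act) * noise_std).clamp(-noise_clip, noise_clip)
+        next_act = (next_act + noise).clamp(-1.0, 1.0)  # target policy smoothing
+        q_next = torch.min(critic1_t(next_obs, next_act),
+                           critic2_t(next_obs, next_act))
+        return rew + gamma * (1.0 - done) * q_next
+
+The actor and the target networks are then updated only once every ``delay`` critic updates.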
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.policy_gradient.td3_learner.TD3_Learner(policy, optimizer, device, model_dir, gamma, tau, delay) @@ -87,7 +89,8 @@ TD3_Learner

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.policy_gradient.td3_learner.TD3_Learner(policy, optimizer, scheduler, model_dir, gamma, tau, delay) diff --git a/docs/source/documents/api/learners/drl_learner.rst b/docs/source/documents/api/learners/drl_learner.rst new file mode 100644 index 000000000..2a2305a29 --- /dev/null +++ b/docs/source/documents/api/learners/drl_learner.rst @@ -0,0 +1,106 @@ +Single-Agent Learner +============================================ + +.. toctree:: + :hidden: + + DQN_Learner + C51_Learner + DDQN_Learner + DuelDQN_Learner + NoisyDQN_Learner + PerDQN_Learner + QRDQN_Learner + PG_Learner + PPG_Learner + PPOCLIP_Learner + PPOCKL_Learner + PDQN_Learner + SPDQN_Learner + MPDQN_Learner + A2C_Learner + SAC_Learner + SACDIS_Learner + DDPG_Learner + TD3_Learner + + +.. list-table:: + :header-rows: 1 + + * - Learner + - PyTorch + - TensorFlow + - MindSpore + * - :doc:`DQN `: Deep Q-Networks + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`C51DQN `: Distributional Reinforcement Learning + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`Double DQN `: DQN with Double Q-learning + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`Dueling DQN `: DQN with Dueling network + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`Noisy DQN `: DQN with Parameter Space Noise + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`PERDQN `: DQN with Prioritized Experience Replay + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`QRDQN `: DQN with Quantile Regression + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`VPG `: Vanilla Policy Gradient + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`PPG `: Phasic Policy Gradient + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`PPO `: Proximal Policy Optimization + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`PDQN `: Parameterised DQN + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`SPDQN `: Split PDQN + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MPDQN `: Multi-pass PDQN + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`A2C `: Advantage Actor Critic + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`SAC `: Soft Actor-Critic + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`SAC-Dis `: SAC for Discrete Actions + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`DDPG `: Deep Deterministic Policy Gradient + - .. 
centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`TD3 `: Twin Delayed DDPG + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` \ No newline at end of file diff --git a/docs/source/documents/api/learners/learner.rst b/docs/source/documents/api/learners/learner.rst index 9fdb9e0b3..c0316b172 100644 --- a/docs/source/documents/api/learners/learner.rst +++ b/docs/source/documents/api/learners/learner.rst @@ -1,9 +1,10 @@ -Learner +Basic Learner ======================= To create new learner, you should build a class inherit from ``xuance.torch.learners.learner.Learner`` , ``xuance.tensorflow.learners.learner.Learner``, or ``xuance.mindspore.learners.learner.Learner``. -**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.learner.Learner(policy, optimizer, scheduler=None, device=None, model_dir="./") @@ -45,7 +46,8 @@ To create new learner, you should build a class inherit from ``xuance.torch.lear

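+As a hypothetical sketch of the subclassing pattern described above (the ``MyLearner`` name, its toy loss, and the assumption that the base class exposes its constructor arguments as ``self.policy`` and ``self.optimizer`` are for illustration only):
+
+.. code-block:: python
+
+    import torch
+    from xuance.torch.learners.learner import Learner
+
+    class MyLearner(Learner):
+        def __init__(self, policy, optimizer, scheduler=None,
+                     device=None, model_dir="./"):
+            super(MyLearner, self).__init__(policy, optimizer, scheduler,
+                                            device, model_dir)
+
+        def update(self, obs_batch, act_batch, ret_batch):
+            # Toy REINFORCE-style loss; assumes the policy maps observations
+            # to action logits (an assumption for illustration only).
+            logits = self.policy(torch.as_tensor(obs_batch))
+            dist = torch.distributions.Categorical(logits=logits)
+            log_prob = dist.log_prob(torch.as_tensor(act_batch))
+            loss = -(log_prob * torch.as_tensor(ret_batch)).mean()
+            self.optimizer.zero_grad()
+            loss.backward()
+            self.optimizer.step()
+            return {"loss": loss.item()}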
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.learner.Learner(policy, optimizer, device=None, model_dir="./") @@ -85,7 +87,8 @@ To create new learner, you should build a class inherit from ``xuance.torch.lear

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.learner.Learner(policy, optimizer, scheduler=None, model_dir="./") diff --git a/docs/source/documents/api/learners/marl/coma.rst b/docs/source/documents/api/learners/marl/coma.rst index 5a5c366bf..bdc2fb270 100644 --- a/docs/source/documents/api/learners/marl/coma.rst +++ b/docs/source/documents/api/learners/marl/coma.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.coma_learner.COMA_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -57,7 +58,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.coma_learner.COMA_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -93,7 +95,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.coma_learner.COMA_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/marl/dcg.rst b/docs/source/documents/api/learners/marl/dcg.rst index f1ea41027..95fb3660c 100644 --- a/docs/source/documents/api/learners/marl/dcg.rst +++ b/docs/source/documents/api/learners/marl/dcg.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.dcg_learner.DCG_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -107,7 +108,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.dcg_learner.DCG_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -195,7 +197,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.dcg_learner.DCG_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/marl/iddpg.rst b/docs/source/documents/api/learners/marl/iddpg.rst index 51beb251e..d574bc605 100644 --- a/docs/source/documents/api/learners/marl/iddpg.rst +++ b/docs/source/documents/api/learners/marl/iddpg.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.iddpg_learner.IDDPG_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -43,7 +44,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.iddpg_learner.IDDPG_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -79,7 +81,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.iddpg_learner.IDDPG_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/marl/ippo.rst b/docs/source/documents/api/learners/marl/ippo.rst index 5ebdfd93a..9acc3314f 100644 --- a/docs/source/documents/api/learners/marl/ippo.rst +++ b/docs/source/documents/api/learners/marl/ippo.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.ippo_learner.IPPO_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma) @@ -61,7 +62,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.ippo_learner.IPPO_Learner(config, policy, optimizer, device, model_dir, gamma) @@ -103,7 +105,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.ippo_learner.IPPO_Learner(config, policy, optimizer, scheduler, model_dir, gamma) diff --git a/docs/source/documents/api/learners/marl/iql.rst b/docs/source/documents/api/learners/marl/iql.rst index 36823a264..2a17ecc91 100644 --- a/docs/source/documents/api/learners/marl/iql.rst +++ b/docs/source/documents/api/learners/marl/iql.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.iql_learner.IQL_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -53,7 +54,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.iql_learner.IQL_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -87,7 +89,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.iql_learner.IQL_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/marl/isac.rst b/docs/source/documents/api/learners/marl/isac.rst index 1899117dc..5af32b8a3 100644 --- a/docs/source/documents/api/learners/marl/isac.rst +++ b/docs/source/documents/api/learners/marl/isac.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.isac_learner.ISAC_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -43,7 +44,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.isac_learner.ISAC_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -77,7 +79,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.isac_learner.ISAC_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/marl/maddpg.rst b/docs/source/documents/api/learners/marl/maddpg.rst index bf8196810..52a78cc52 100644 --- a/docs/source/documents/api/learners/marl/maddpg.rst +++ b/docs/source/documents/api/learners/marl/maddpg.rst @@ -1,13 +1,17 @@ MADDPG_Learner ===================================== -xxxxxx. +Multi-Agent Deep Deterministic Policy Gradient + +Paper link: +https://proceedings.neurips.cc/paper/2017/file/68a9750337a418a86fe06c1991a1d64c-Paper.pdf .. raw:: html

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.maddpg_learner.MADDPG_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -30,12 +34,12 @@ xxxxxx. :type sync_frequency: int .. py:function:: - xuance.torch.learners.multi_agent_rl.isac_learner.ISAC_Learner.update(sample) + xuance.torch.learners.multi_agent_rl.maddpg_learner.MADDPG_Learner.update(sample) - xxxxxx. + Updates the policy parameters via backpropagation. - :param sample: xxxxxx. - :type sample: xxxxxx + :param sample: The sampled data. + :type sample: dict :return: The infomation of the training. :rtype: dict @@ -43,7 +47,8 @@ xxxxxx.

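+MADDPG trains one decentralised actor per agent against a centralised critic that sees the joint observations and actions. A conceptual sketch of the critic target for agent ``i`` (names are illustrative, not the XuanCe internals):
+
+.. code-block:: python
+
+    import torch
+
+    @torch.no_grad()
+    def maddpg_critic_target(next_obs_list, target_actors, target_critic,
+                             rew_i, done_i, gamma=0.95):
+        # Each target actor acts on its own next observation; the centralised
+        # critic then scores the joint observation-action pair.
+        joint_act = torch.cat([pi(o) for pi, o in zip(target_actors, next_obs_list)],
+                              dim=-1)
+        q_next = target_critic(torch.cat(next_obs_list, dim=-1), joint_act)
+        return rew_i + gamma * (1.0 - done_i) * q_next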
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.maddpg_learner.MADDPG_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -64,7 +69,7 @@ xxxxxx. :type sync_frequency: int .. py:function:: - xuance.tensorflow.learners.multi_agent_rl.isac_learner.ISAC_Learner.update(sample) + xuance.tensorflow.learners.multi_agent_rl.maddpg_learner.MADDPG_Learner.update(sample) xxxxxx. @@ -77,7 +82,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.maddpg_learner.MADDPG_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) @@ -98,7 +104,7 @@ xxxxxx. :type sync_frequency: int .. py:function:: - xuance.mindspore.learners.multi_agent_rl.isac_learner.ISAC_Learner.update(sample) + xuance.mindspore.learners.multi_agent_rl.maddpg_learner.MADDPG_Learner.update(sample) xxxxxx. diff --git a/docs/source/documents/api/learners/marl/mappo.rst b/docs/source/documents/api/learners/marl/mappo.rst index 5864b6ead..aa6eff0ce 100644 --- a/docs/source/documents/api/learners/marl/mappo.rst +++ b/docs/source/documents/api/learners/marl/mappo.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.mappo_learner.MAPPO_Clip_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma) @@ -61,7 +62,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.mappo_learner.MAPPO_Clip_Learner(config, policy, optimizer, device, model_dir, gamma) @@ -103,7 +105,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.mappo_learner.MAPPO_Clip_Learner(config, policy, optimizer, scheduler, model_dir, gamma) @@ -124,20 +127,20 @@ xxxxxx. .. py:function:: xuance.mindspore.learners.multi_agent_rl.mappo_learner.MAPPO_Clip_Learner.lr_decay(i_step) - xxxxxx. + Decay the learning rate according to the current training step. - :param i_step: xxxxxx. - :type i_step: xxxxxx + :param i_step: The current training step. + :type i_step: int :return: Current learning rate. :rtype: float .. py:function:: - xuance.torch.learners.multi_agent_rl.mappo_learner.MAPPO_Clip_Learner.update(sample) + xuance.mindspore.learners.multi_agent_rl.mappo_learner.MAPPO_Clip_Learner.update(sample) - xxxxxx. + Update the parameters of the model via backpropagation. - :param sample: xxxxxx. - :type sample: xxxxxx + :param sample: The sampled data. + :type sample: dict :return: The infomation of the training. :rtype: dict diff --git a/docs/source/documents/api/learners/marl/masac.rst b/docs/source/documents/api/learners/marl/masac.rst index 0b2ee5e5f..d08e6a9d8 100644 --- a/docs/source/documents/api/learners/marl/masac.rst +++ b/docs/source/documents/api/learners/marl/masac.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.masac_learner.MASAC_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -43,7 +44,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.masac_learner.MASAC_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -77,7 +79,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.masac_learner.MASAC_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/marl/matd3.rst b/docs/source/documents/api/learners/marl/matd3.rst index ee72bd6da..88ca83ea8 100644 --- a/docs/source/documents/api/learners/marl/matd3.rst +++ b/docs/source/documents/api/learners/marl/matd3.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.matd3_learner.MATD3_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency, delay) @@ -45,7 +46,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.matd3_learner.MATD3_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency, delay) @@ -81,7 +83,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.matd3_learner.MATD3_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency, delay) diff --git a/docs/source/documents/api/learners/marl/mfac.rst b/docs/source/documents/api/learners/marl/mfac.rst index 7fe81beb3..325d2fd65 100644 --- a/docs/source/documents/api/learners/marl/mfac.rst +++ b/docs/source/documents/api/learners/marl/mfac.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.mfac_learner.MFAC_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma) @@ -41,7 +42,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.mfac_learner.MFAC_Learner(config, policy, optimizer, device, model_dir, gamma) @@ -73,7 +75,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.mfac_learner.MFAC_Learner(config, policy, optimizer, scheduler, model_dir, gamma) diff --git a/docs/source/documents/api/learners/marl/mfq.rst b/docs/source/documents/api/learners/marl/mfq.rst index 75a325fc6..6755dda6f 100644 --- a/docs/source/documents/api/learners/marl/mfq.rst +++ b/docs/source/documents/api/learners/marl/mfq.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.mfq_learner.MFQ_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -52,7 +53,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.mfq_learner.MFQ_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -96,7 +98,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.mfq_learner.MFQ_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/marl/qmix.rst b/docs/source/documents/api/learners/marl/qmix.rst index 39ebe2055..dbaad23fb 100644 --- a/docs/source/documents/api/learners/marl/qmix.rst +++ b/docs/source/documents/api/learners/marl/qmix.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.qmix_learner.QMIX_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -53,7 +54,8 @@ xxxxxx.

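+QMIX factorises the joint action value: per-agent Q-values are fed through a monotonic mixing network conditioned on the global state. A schematic TD target under that factorisation (hypothetical helper, not the library code):
+
+.. code-block:: python
+
+    import torch
+
+    @torch.no_grad()
+    def qmix_td_target(agent_q_next_max, mixer_target, next_state,
+                       team_reward, done, gamma=0.99):
+        # agent_q_next_max: [batch, n_agents] greedy per-agent Q-values;
+        # the target mixer combines them into one joint value Q_tot.
+        q_tot_next = mixer_target(agent_q_next_max, next_state)
+        return team_reward + gamma * (1.0 - done) * q_tot_next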
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.qmix_learner.QMIX_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -87,7 +89,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.qmix_learner.QMIX_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/marl/qtran.rst b/docs/source/documents/api/learners/marl/qtran.rst index 1475e2924..fcb032af2 100644 --- a/docs/source/documents/api/learners/marl/qtran.rst +++ b/docs/source/documents/api/learners/marl/qtran.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.qtran_learner.QTRAN_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -43,7 +44,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.qtran_learner.QTRAN_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -77,7 +79,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.qtran_learner.QTRAN_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/marl/vdac.rst b/docs/source/documents/api/learners/marl/vdac.rst index 2cad3b166..5938109c1 100644 --- a/docs/source/documents/api/learners/marl/vdac.rst +++ b/docs/source/documents/api/learners/marl/vdac.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.vdac_learner.VDAC_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma) @@ -61,7 +62,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.vdac_learner.VDAC_Learner(config, policy, optimizer, device, model_dir, gamma) @@ -103,7 +105,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.vdac_learner.VDAC_Learner(config, policy, optimizer, scheduler, model_dir, gamma) diff --git a/docs/source/documents/api/learners/marl/vdn.rst b/docs/source/documents/api/learners/marl/vdn.rst index 980bb3ba8..2f6c69ecb 100644 --- a/docs/source/documents/api/learners/marl/vdn.rst +++ b/docs/source/documents/api/learners/marl/vdn.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.vdn_learner.VDN_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -53,7 +54,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.vdn_learner.VDN_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -87,7 +89,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.vdn_learner.VDN_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/marl/wqmix.rst b/docs/source/documents/api/learners/marl/wqmix.rst index 7c7d42386..faf43eb9d 100644 --- a/docs/source/documents/api/learners/marl/wqmix.rst +++ b/docs/source/documents/api/learners/marl/wqmix.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.learners.multi_agent_rl.wqmix_learner.WQMIX_Learner(config, policy, optimizer, scheduler, device, model_dir, gamma, sync_frequency) @@ -53,7 +54,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.learners.multi_agent_rl.wqmix_learner.WQMIX_Learner(config, policy, optimizer, device, model_dir, gamma, sync_frequency) @@ -87,7 +89,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.learners.multi_agent_rl.wqmix_learner.WQMIX_Learner(config, policy, optimizer, scheduler, model_dir, gamma, sync_frequency) diff --git a/docs/source/documents/api/learners/marl_learner.rst b/docs/source/documents/api/learners/marl_learner.rst new file mode 100644 index 000000000..03ce10a3b --- /dev/null +++ b/docs/source/documents/api/learners/marl_learner.rst @@ -0,0 +1,100 @@ +Multi-Agent Learner +============================================= + +.. toctree:: + :hidden: + + IQL_Learner + VDN_Learner + QMIX_Learner + WQMIX_Learner + QTRAN_Learner + DCG_Learner + IDDPG_Learner + MADDPG_Learner + ISAC_Learner + MASAC_Learner + IPPO_Learner + MAPPO_Learner + MATD3_Learner + VDAC_Learner + COMA_Learner + MFQ_Learner + MFAC_Learner + + +.. list-table:: + :header-rows: 1 + + * - Multi-Agent Learner + - PyTorch + - TensorFlow + - MindSpore + * - :doc:`IQL `: Independent Q-Learning + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`VDN `: Value-Decomposition Networks + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`QMIX `: VDN with Q-Mixer + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`WQMIX `: Weighted QMIX + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`QTRAN `: Q-Transformation + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`DCG `: Deep Coordination Graph + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`IDDPG `: Independent DDPG + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MADDPG `: Multi-Agent DDPG + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`ISAC `: Independent SAC + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MASAC `: Multi-Agent SAC + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`IPPO `: Independent PPO + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MAPPO `: Multi-Agent PPO + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MATD3 `: Multi-Agent TD3 + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`VDAC `: Value-Decomposition Actor-Critic + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`COMA `: Counterfactual Multi-Agent PG + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MFQ `: Mean-Field Q-Learning + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + * - :doc:`MFAC `: Mean-Field Actor-Critic + - .. centered:: :math:`\checkmark` + - .. centered:: :math:`\checkmark` + - ..
centered:: :math:`\checkmark` \ No newline at end of file diff --git a/docs/source/documents/api/policies/deterministic.rst b/docs/source/documents/api/policies/deterministic.rst index e7b5597ad..09686ea10 100644 --- a/docs/source/documents/api/policies/deterministic.rst +++ b/docs/source/documents/api/policies/deterministic.rst @@ -2351,7 +2351,7 @@ MindSpore Get trainable parameters. .. py:function:: - xuance.mindspore.policies.deterministic.DuelQnetwork.copy_target(observation) + xuance.mindspore.policies.deterministic.QRDQN_Network.copy_target(observation) Copies the parameters from the evaluation representation, target representation, evaluation Q-head, and target Q-head. diff --git a/docs/source/documents/api/policies/deterministic_marl.rst b/docs/source/documents/api/policies/deterministic_marl.rst index 0beb19106..37bbc9c4c 100644 --- a/docs/source/documents/api/policies/deterministic_marl.rst +++ b/docs/source/documents/api/policies/deterministic_marl.rst @@ -1746,7 +1746,7 @@ MindSpore :rtype: tuple .. py:function:: - xuance.mindspore.policies.deterministic_marl.Weighted_MixingQnetwork.copy_target() + xuance.mindspore.policies.deterministic_marl.Qtran_MixingQnetwork.copy_target() Synchronize the target networks. diff --git a/docs/source/documents/api/runners/runner_basic.rst b/docs/source/documents/api/runners/runner_basic.rst index addef5a82..17a38c86c 100644 --- a/docs/source/documents/api/runners/runner_basic.rst +++ b/docs/source/documents/api/runners/runner_basic.rst @@ -9,7 +9,8 @@ Additionally, there's a run method that can be customzied by users.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.runners.runner_basic.Runner_Base(args) @@ -26,7 +27,8 @@ Additionally, there's a run method that can be customzied by users.

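+A hypothetical sketch of customising the ``run`` method by subclassing; the body is left as a placeholder because the runner's internal member names are not shown on this page:
+
+.. code-block:: python
+
+    from xuance.torch.runners.runner_basic import Runner_Base
+
+    class MyRunner(Runner_Base):  # invented name, for illustration only
+        def run(self):
+            # Insert a custom train/evaluate pipeline here, e.g. alternating
+            # training epochs with periodic evaluation episodes.
+            ...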
-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.runners.runner_basic.Runner_Base(args) @@ -56,7 +58,8 @@ Additionally, there's a run method that can be customzied by users.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.runners.runner_basic.Runner_Base(args) diff --git a/docs/source/documents/api/runners/runner_drl.rst b/docs/source/documents/api/runners/runner_drl.rst index c70814c27..528814142 100644 --- a/docs/source/documents/api/runners/runner_drl.rst +++ b/docs/source/documents/api/runners/runner_drl.rst @@ -7,7 +7,8 @@ xxxxxx.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.runners.runner_drl.Runner_DRL(args) @@ -31,7 +32,8 @@ xxxxxx.

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.runners.runner_drl.Runner_DRL(args) @@ -56,7 +58,8 @@ xxxxxx.

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.runners.runner_drl.Runner_DRL(args) diff --git a/docs/source/documents/api/runners/runner_football.rst b/docs/source/documents/api/runners/runner_football.rst index cb9c67f28..f97014ebe 100644 --- a/docs/source/documents/api/runners/runner_football.rst +++ b/docs/source/documents/api/runners/runner_football.rst @@ -7,7 +7,8 @@ A generic framework for training and testing reinforcement learning in the Footb

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.runners.runner_football.Football_Runner(args) diff --git a/docs/source/documents/api/runners/runner_magent.rst b/docs/source/documents/api/runners/runner_magent.rst index 686697bdd..7dbf492c3 100644 --- a/docs/source/documents/api/runners/runner_magent.rst +++ b/docs/source/documents/api/runners/runner_magent.rst @@ -8,7 +8,8 @@ MAgent_Runner makes some extensions to it.

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.runners.runner_magent.MAgent_Runner(args) diff --git a/docs/source/documents/api/runners/runner_pettingzoo.rst b/docs/source/documents/api/runners/runner_pettingzoo.rst index 51dcd8d8d..287a0d394 100644 --- a/docs/source/documents/api/runners/runner_pettingzoo.rst +++ b/docs/source/documents/api/runners/runner_pettingzoo.rst @@ -8,7 +8,8 @@ This script define a training and testing pipeline for multi-agent reinforcement

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.runners.runner_pettingzoo.Pettingzoo_Runner(args) @@ -129,7 +130,8 @@ This script define a training and testing pipeline for multi-agent reinforcement

-**TensorFlow:** +TensorFlow +------------------------------------------ .. py:class:: xuance.tensorflow.runners.runner_pettingzoo.Pettingzoo_Runner(args) @@ -250,7 +252,8 @@ This script define a training and testing pipeline for multi-agent reinforcement

-**MindSpore:** +MindSpore +------------------------------------------ .. py:class:: xuance.mindspore.runners.runner_pettingzoo.Pettingzoo_Runner(args) diff --git a/docs/source/documents/api/runners/runner_sc2.rst b/docs/source/documents/api/runners/runner_sc2.rst index 1f1422865..8880ce4fd 100644 --- a/docs/source/documents/api/runners/runner_sc2.rst +++ b/docs/source/documents/api/runners/runner_sc2.rst @@ -8,7 +8,8 @@ This part constructs a reinforcement learning framework for training and testing

-**PyTorch:** +PyTorch +------------------------------------------ .. py:class:: xuance.torch.runners.runner_sc2.SC2_Runner(args)