'coma_learner_x(#2)'
baijinqiu committed Dec 25, 2023
1 parent 5723366 commit 4cd9bc4
Showing 1 changed file with 31 additions and 25 deletions.
56 changes: 31 additions & 25 deletions docs/source/documents/api/learners/marl/coma.rst
@@ -1,7 +1,8 @@
COMA_Learner
=====================================

xxxxxx.
The implementation of a COMA (Counterfactual Multi-Agent Policy Gradients) learner.
This algorithm is used for training cooperative multi-agent systems.
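
As background for the description above, the "counterfactual" in COMA refers to a per-agent baseline: the critic's value for the action an agent actually took is compared with the policy-weighted average over that agent's alternative actions, with the other agents' actions held fixed. The following is a minimal pure-Python sketch of that advantage for a single agent. The function name and the toy numbers are hypothetical and are not taken from the XuanCe code.

```python
def counterfactual_advantage(q_values, policy_probs, action):
    """Advantage of `action` for one agent.

    q_values[i]     : critic estimate when this agent takes action i and
                      all other agents keep their actual actions.
    policy_probs[i] : probability the agent's policy assigns to action i.
    """
    # Counterfactual baseline: expected Q under the agent's own policy.
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[action] - baseline


# Toy example (hypothetical values): three candidate actions.
q = [1.0, 2.0, 4.0]
pi = [0.5, 0.25, 0.25]
adv = counterfactual_advantage(q, pi, action=2)
```

A positive value means the chosen action did better than the agent's average behaviour in that joint state, which is the signal the policy gradient is weighted by.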

.. raw:: html

@@ -33,25 +34,28 @@ PyTorch
.. py:function::
xuance.torch.learners.multi_agent_rl.coma_learner.COMA_Learner.update(sample, epsilon)

xxxxxx.
Update the COMA learner based on the provided sample.

:param sample: xxxxxx.
:type sample: xxxxxx
:param epsilon: xxxxxx.
:type epsilon: xxxxxx
:return: The infomation of the training.
:param sample: A dictionary containing the states, observations, actions, one-hot encoded actions,
returns obtained from the environment, and a binary mask indicating active agents.
:type sample: dict
:param epsilon: Exploration parameter for the policy.
:type epsilon: float
:return: The information of the training.
:rtype: dict
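
To make the ``sample`` description above more concrete, here is a rough sketch of how such a dictionary could be assembled. The key names (``state``, ``obs``, ``actions``, ``actions_onehot``, ``returns``, ``agent_mask``), the helper functions, and the shapes are assumptions chosen for illustration; check the learner's source for the keys it actually reads.

```python
def one_hot(index, n_actions):
    """Return a one-hot list of length n_actions with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(n_actions)]


def build_sample(state, obs, actions, returns, agent_mask, n_actions):
    """Assemble a sample dictionary (hypothetical key names)."""
    return {
        "state": state,            # global state per transition
        "obs": obs,                # per-agent observations
        "actions": actions,        # per-agent discrete action indices
        "actions_onehot": [[one_hot(a, n_actions) for a in step] for step in actions],
        "returns": returns,        # per-agent returns from the environment
        "agent_mask": agent_mask,  # 1 for active agents, 0 otherwise
    }


# Toy batch: 2 transitions, 2 agents, 3 discrete actions.
sample = build_sample(
    state=[[0.0, 1.0], [1.0, 0.0]],
    obs=[[[0.0], [1.0]], [[1.0], [0.0]]],
    actions=[[0, 2], [1, 1]],
    returns=[[0.5, 0.5], [1.0, 1.0]],
    agent_mask=[[1, 1], [1, 0]],
    n_actions=3,
)
```

In practice these fields would normally be arrays drawn from a replay buffer rather than nested lists; the lists here only show the nesting (batch, agent, feature).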

.. py:function::
xuance.torch.learners.multi_agent_rl.coma_learner.COMA_Learner.update_recurrent(sample, epsilon)

xxxxxx.
Update the COMA learner using a recurrent version of the algorithm.

:param sample: xxxxxx.
:type sample: xxxxxx
:param epsilon: xxxxxx.
:type epsilon: xxxxxx
:return: The infomation of the training.
:param sample: A dictionary containing the states, observations, actions, one-hot encoded actions,
returns obtained from the environment, available actions for each agent,
and a binary mask indicating filled time steps.
:type sample: dict
:param epsilon: Exploration parameter for the policy.
:type epsilon: float
:return: The information of the training.
:rtype: dict
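
The ``epsilon`` argument above is described only as an exploration parameter. One common way COMA-style policies use such a value is to mix the softmax action distribution with a uniform distribution, so every action keeps at least ``epsilon / n_actions`` probability. The sketch below shows that mixing in pure Python. It assumes this is how ``epsilon`` is consumed, which the text above does not confirm, and the function name is hypothetical.

```python
import math


def epsilon_smoothed_probs(logits, epsilon):
    """Mix softmax(logits) with a uniform distribution over actions."""
    m = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    n = len(logits)
    return [(1.0 - epsilon) * e / total + epsilon / n for e in exps]


# With very peaked logits, the smoothing keeps a floor under rare actions.
probs = epsilon_smoothed_probs([10.0, 0.0, 0.0, 0.0], epsilon=0.2)
```

Annealing ``epsilon`` toward zero over training would then shift the policy from exploratory to nearly pure softmax.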

.. raw:: html
@@ -82,13 +86,14 @@ TensorFlow
.. py:function::
xuance.tensorflow.learners.multi_agent_rl.coma_learner.COMA_Learner.update(sample, epsilon)

xxxxxx.
Update the COMA learner using the provided sample.

:param sample: xxxxxx.
:type sample: xxxxxx
:param epsilon: xxxxxx.
:type epsilon: xxxxxx
:return: The infomation of the training.
:param sample: A dictionary containing the states, observations, actions, one-hot encoded actions,
returns obtained from the environment, and a binary mask indicating filled time steps.
:type sample: dict
:param epsilon: Exploration parameter for the policy.
:type epsilon: float
:return: The information of the training.
:rtype: dict

.. raw:: html
@@ -119,13 +124,14 @@ MindSpore
.. py:function::
xuance.mindspore.learners.multi_agent_rl.coma_learner.COMA_Learner.update(sample, epsilon)

xxxxxx.
Update the COMA learner using the provided sample.

:param sample: xxxxxx.
:type sample: xxxxxx
:param epsilon: xxxxxx.
:type epsilon: xxxxxx
:return: The infomation of the training.
:param sample: A dictionary containing the states, observations, actions, one-hot encoded actions,
returns obtained from the environment, and a binary mask indicating filled time steps.
:type sample: dict
:param epsilon: Exploration parameter for the policy.
:type epsilon: float
:return: The information of the training.
:rtype: dict

.. raw:: html