Batch ensemble ddpg #1633
base: pytorch
Conversation
Force-pushed from 8930ae0 to ad5f40e.
@@ -139,6 +148,17 @@ def __init__(self,
                gradient dqda element-wise between ``[-dqda_clipping, dqda_clipping]``.
                Does not perform clipping if ``dqda_clipping == 0``.
            action_l2 (float): weight of squared action l2-norm on actor loss.
            use_batch_ensemble (bool): whether to use BatchEnsemble FC and Conv2D
Ideally, we should make these batch-ensemble related parameters transparent to ddpg_algorithm; in the ideal case, ddpg_algorithm would not use any batch_ensemble related parameters at all.
That's a good point. Currently ddpg needs use_batch_ensemble to do some post-processing when forwarding the critic networks during training. Let me think over whether there is an alternative way to work around this.
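For concreteness, here is a minimal sketch of the kind of ensemble post-processing being discussed. It is not the code in this PR; the helper name, the flat `[ensemble_size * B]` output layout, and the min/mean reduction are all assumptions made for illustration.

```python
import torch

def reduce_ensemble_q(q_values: torch.Tensor,
                      ensemble_size: int,
                      reduction: str = "min") -> torch.Tensor:
    # Hypothetical helper: with BatchEnsemble layers the critic evaluates the
    # batch once per ensemble member, so a batch of size B may come back as
    # ensemble_size * B Q-values. Fold out the ensemble dimension and reduce
    # it before the TD target / actor loss is computed.
    q = q_values.reshape(ensemble_size, -1)   # [ensemble_size, B] (assumed layout)
    if reduction == "min":                    # pessimistic ensemble target
        return q.min(dim=0).values            # [B]
    return q.mean(dim=0)                      # [B]
```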
@@ -281,14 +318,39 @@ def _update_random_action(spec, noisy_action):
        if self._rollout_random_action > 0:
            nest.map_structure(_update_random_action, self._action_spec,
                               pred_step.output)
        return pred_step

        if self.need_full_rollout_state():
We want the algorithm to use the same ensemble_id during an entire episode. This means it should store the ensemble_id in the state and use that same ensemble_id when calling actor_network.
Oh yes, good point. I think that is the reason why I had to tweak ddpg_algorithm_test to pass the toy unittest. Updated.
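A minimal sketch of the suggestion above, assuming a TF-Agents-style step_type where FIRST == 0; the function name and shapes are hypothetical, not the code that was actually added.

```python
import torch

def sample_or_carry_ensemble_id(step_type: torch.Tensor,
                                prev_ensemble_id: torch.Tensor,
                                ensemble_size: int) -> torch.Tensor:
    # Draw a fresh ensemble_id only at the first step of an episode and carry
    # the previous one otherwise, so actor_network is called with the same
    # ensemble member for the whole episode. The id would live in the rollout
    # state so that it is propagated from step to step.
    is_first = (step_type == 0)  # assuming StepType.FIRST == 0
    new_id = torch.randint(ensemble_size, prev_ensemble_id.shape,
                           device=prev_ensemble_id.device)
    return torch.where(is_first, new_id, prev_ensemble_id)
```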
Update `actor_network`, `critic_network`, and `ddpg_algorithm` to work with batch_ensemble layers.
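For readers unfamiliar with BatchEnsemble (Wen et al., 2020): each layer keeps a single shared weight and gives ensemble member i only rank-1 factors r_i, s_i, so member i's effective weight is W * r_i s_i^T. The sketch below illustrates the idea for an FC layer; the class name, initialization, and forward signature are illustrative assumptions, not the layers used by this PR.

```python
import torch
import torch.nn as nn

class BatchEnsembleLinear(nn.Module):
    """Illustrative BatchEnsemble FC layer (not the ALF implementation)."""

    def __init__(self, in_features, out_features, ensemble_size):
        super().__init__()
        # One weight matrix shared by all ensemble members.
        self.shared = nn.Linear(in_features, out_features, bias=False)
        # Per-member rank-1 factors (initialized near 1) and per-member biases.
        self.r = nn.Parameter(1.0 + 0.1 * torch.randn(ensemble_size, out_features))
        self.s = nn.Parameter(1.0 + 0.1 * torch.randn(ensemble_size, in_features))
        self.bias = nn.Parameter(torch.zeros(ensemble_size, out_features))

    def forward(self, x, ensemble_id):
        # x: [B, in_features]; ensemble_id: [B] integer member index per row.
        s = self.s[ensemble_id]      # [B, in_features]
        r = self.r[ensemble_id]      # [B, out_features]
        b = self.bias[ensemble_id]   # [B, out_features]
        # (x * s) @ W^T * r + b  ==  x @ (W * r s^T)^T + b  for each row.
        return self.shared(x * s) * r + b
```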