Replies: 2 comments
Weights are saturating

As training proceeds in boa, VSPatch weights for winning units increasingly become saturated (all 1), and the corresponding sensitivity to different states goes away. In principle, when VSPatch learns to exactly predict DA, the weights stop changing, but:

- Try adding the RLRate sigmoid factor as a starting point, to prevent saturation of the most active units.

Not engaging other units

Previously we were getting all neurons learning in an undifferentiated fashion. Now, with the transition to the CaSpkD-based recv signal, it is too kWTA. Go back to GeIntNorm and use TrgAvg to initialize GiBase, so there are different set points and we get progressive engagement of neurons across the population.
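Two minimal sketches of these fixes, with illustrative names and parameters (the actual axon RLRate and inhibition parameters may differ):

```go
package main

import "fmt"

// rlRateSig is a sigmoid-derivative learning-rate factor, 4*act*(1-act):
// it is 1 at mid-range activity and falls toward 0 as a unit saturates,
// so the most active (winning) units stop pushing their weights toward 1.
// The floor keeps saturated units minimally plastic.
func rlRateSig(act, floor float32) float32 {
	rate := 4 * act * (1 - act)
	if rate < floor {
		rate = floor
	}
	return rate
}

// giBaseFmTrgAvg initializes a unit's baseline inhibition (GiBase) from
// its target average activity (TrgAvg): units with higher targets get
// lower baseline inhibition, giving a spread of set points so the
// population engages progressively instead of winner-take-all.
func giBaseFmTrgAvg(trgAvg, scale float32) float32 {
	return scale * (1 - trgAvg)
}

func main() {
	fmt.Println(rlRateSig(0.95, 0.05))    // near-saturated unit: small rate
	fmt.Println(giBaseFmTrgAvg(0.2, 0.5)) // low-target unit: higher GiBase
}
```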
This is all working much better now: see #302 for updates.
VSPatch learning has never worked very well in PVLV (#305). This discussion is to record the issues and potential fixes, in the context of the broader logic of what VSPatch is doing.
VSPatch is the "PVi" (primary value inhibition) layer from the original version of PVLV: it inhibits DA firing at the time of the US (PV) when that US is expected. It receives input from OFC / ACC areas that should contain CS-activated information, including dynamically updated timing states, enabling prediction of when, and at what magnitude, the US will occur.
It is the "P" in RPE (reward prediction error), and is the essential element in the classical Rescorla-Wagner and TD (temporal differences) learning rules, that makes DA firing proportional to "surprise" in reward outcomes, so that DA-driven learning is sensitive to changes and not stable expected outcomes.
Biologically, it corresponds to striosome (patch) neurons in the ventral striatum that directly shunt DA firing (Joel & Weiner) and project to the LHb to drive DA dips.
Timing
Biologically, VSPatch needs to be active just prior to the onset of US-driven inputs, so that it is in time to suppress their effects. In TD terms, it is the prediction from t-1.
From a code perspective, having it active in the trial prior to US onset makes the entire PV computation possible at the start of the trial when the US occurs. If we didn't do this, VSPatch activity would presumably take a while to ramp up, meaning we'd have to delay the PV computation.
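A minimal sketch of this timing logic, with purely illustrative names (not the actual axon API): because the prediction is saved from the prior trial, the net PV dopamine can be computed immediately at trial start, with no ramp-up delay.

```go
package main

import "fmt"

// pvDA computes the net PV dopamine at the start of the US trial,
// using the VSPatch activity saved from the previous trial as the
// prediction. A well-predicted US is shunted toward 0; over-prediction
// goes negative, corresponding to an LHb-driven DA dip.
// All names here are hypothetical.
func pvDA(usValue, vsPatchPrev float32) float32 {
	return usValue - vsPatchPrev
}

func main() {
	fmt.Println(pvDA(1.0, 0.0)) // unexpected US: full DA burst
	fmt.Println(pvDA(1.0, 0.9)) // expected US: mostly shunted
}
```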
Empirically, it seems that the OFC / ACC PTp layers do have sufficient information to drive VSPatch predictions (we can get reasonable learning of US trial t vs. t-1). But they are using delayed information, like CT: if we push the VSPatch computation to t-1, we could also use super layer inputs to provide more current state information.
These inputs also have implications for whether VSPatch activity is contingent on CS gating: PTp is contingent, super is not. Probably having some of each is useful.
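A sketch of what mixing the two input sources might look like, just to make the contingency point concrete (hypothetical names and weights):

```go
package main

import "fmt"

// vsPatchGe mixes CS-gated PTp drive (delayed, contingent on gating)
// with super-layer drive (current state, not contingent). wPTp and
// wSuper are hypothetical mixing weights.
func vsPatchGe(ptp, super, wPTp, wSuper float32, gated bool) float32 {
	ge := wSuper * super
	if gated {
		ge += wPTp * ptp // PTp only contributes after CS gating
	}
	return ge
}

func main() {
	fmt.Println(vsPatchGe(0.8, 0.5, 0.7, 0.3, false)) // pre-gating: super only
	fmt.Println(vsPatchGe(0.8, 0.5, 0.7, 0.3, true))  // post-gating: both
}
```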
T-1 learning
To do t-1 learning properly, we need to have everything based on t-1 states, which introduces some complications in terms of keeping that state info around. We can just use SpkPrv, and have that use the GeIntNorm value for recv. The current implementation only did t-1 on the sender, but that doesn't work. Use a basic 3-factor learning rule, as sketched below:
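A plausible form, with all activity factors taken from the prior trial (the exact factors here are an assumption, not a settled implementation):

$$\Delta w_{ij} = \epsilon \,\mathrm{DA}_t \; s_j(t{-}1) \; r_i(t{-}1)$$

where $s_j(t{-}1)$ is the sender's SpkPrv value, $r_i(t{-}1)$ is the receiver's GeIntNorm-based value, and $\mathrm{DA}_t$ is the dopamine signal at US time.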
The adaptive threshold also uses the same rule (except for the Sact sender factor), and must also be based on t-1 values.
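Under the same assumption, dropping the sender factor gives the threshold update:

$$\Delta \theta_i = \epsilon_\theta \,\mathrm{DA}_t \; r_i(t{-}1)$$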
Extinction
todo.