Replies: 2 comments
Weights are saturating

As training proceeds in boa, VSPatch weights for winning units increasingly become saturated (all 1), and the corresponding sensitivity to different states goes away. In principle, when VSPatch learns to exactly predict DA, the weights stop changing, but:

- Try adding the RLRate sigmoid factor as a starting point, to prevent saturation of the most active units.

Not engaging other units

Previously we were getting all neurons learning in an undifferentiated fashion. Now, with the transition to the CaSpkD-based recv signal, it is too kWTA. Go back to GeIntNorm and use TrgAvg to initialize GiBase, so there are different set points and we get progressive engagement of neurons across the population.
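Two minimal sketches of these fixes, with illustrative names and parameters (the actual axon RLRate and inhibition parameters may differ):

```go
package main

import "fmt"

// rlRateSig is a sigmoid-derivative learning-rate factor, 4*act*(1-act):
// it is 1 at mid-range activity and falls toward 0 as a unit saturates,
// so the most active (winning) units stop pushing their weights toward 1.
// The floor keeps saturated units minimally plastic.
func rlRateSig(act, floor float32) float32 {
	rate := 4 * act * (1 - act)
	if rate < floor {
		rate = floor
	}
	return rate
}

// giBaseFmTrgAvg initializes a unit's baseline inhibition (GiBase) from
// its target average activity (TrgAvg): units with higher targets get
// lower baseline inhibition, giving a spread of set points so the
// population engages progressively instead of winner-take-all.
func giBaseFmTrgAvg(trgAvg, scale float32) float32 {
	return scale * (1 - trgAvg)
}

func main() {
	fmt.Println(rlRateSig(0.95, 0.05))    // near-saturated unit: small rate
	fmt.Println(giBaseFmTrgAvg(0.2, 0.5)) // low-target unit: higher GiBase
}
```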
This is all working much better now: see #302 for updates.
VSPatch learning has never worked very well in PVLV (#305). This discussion is to record the issues and potential fixes, in the context of the broader logic of what VSPatch is doing.
VSPatch is the "PVi" (primary value inhibition) layer from the original version of PVLV: it inhibits DA firing at the time of the US (PV) when that US is expected. It receives input from OFC / ACC areas that should contain CS-activated information, including dynamically updated timing states, enabling prediction of when, and at what magnitude, the US will occur.
It is the "P" in RPE (reward prediction error), and is the essential element in the classical Rescorla-Wagner and TD (temporal differences) learning rules, that makes DA firing proportional to "surprise" in reward outcomes, so that DA-driven learning is sensitive to changes and not stable expected outcomes.
Biologically, it corresponds to striosome (patch) neurons in the ventral striatum that directly shunt DA firing (Joel & Weiner) and project to the LHb to drive DA dips.
Timing
Biologically, VSPatch needs to be active just prior to the onset of US-driven inputs, so that it is in time to suppress their effects. In TD terms, it is the prediction from t-1.
From a code perspective, having it active in the trial prior to US onset makes the entire PV computation possible at the start of the trial when the US occurs. If we didn't do this, VSPatch activity would presumably take a while to ramp up, meaning we'd have to delay the PV computation.
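A minimal sketch of this timing logic, with purely illustrative names (not the actual axon API): because the prediction is saved from the prior trial, the net PV dopamine can be computed immediately at trial start, with no ramp-up delay.

```go
package main

import "fmt"

// pvDA computes the net PV dopamine at the start of the US trial,
// using the VSPatch activity saved from the previous trial as the
// prediction. A well-predicted US is shunted toward 0; over-prediction
// goes negative, corresponding to an LHb-driven DA dip.
// All names here are hypothetical.
func pvDA(usValue, vsPatchPrev float32) float32 {
	return usValue - vsPatchPrev
}

func main() {
	fmt.Println(pvDA(1.0, 0.0)) // unexpected US: full DA burst
	fmt.Println(pvDA(1.0, 0.9)) // expected US: mostly shunted
}
```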
Empirically, it seems that the OFC / ACC PTp layers do have sufficient information to drive VSPatch predictions (we can get reasonable learning of US trial t vs. t-1). But they are using delayed information, like CT: if we push the VSPatch computation to t-1, we could also use super layer inputs to provide more current state information.
These inputs also have implications for whether VSPatch activity is contingent on CS gating: PTp is contingent, super is not. Probably having some of each is useful.
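A sketch of what mixing the two input sources might look like, just to make the contingency point concrete (hypothetical names and weights):

```go
package main

import "fmt"

// vsPatchGe mixes CS-gated PTp drive (delayed, contingent on gating)
// with super-layer drive (current state, not contingent). wPTp and
// wSuper are hypothetical mixing weights.
func vsPatchGe(ptp, super, wPTp, wSuper float32, gated bool) float32 {
	ge := wSuper * super
	if gated {
		ge += wPTp * ptp // PTp only contributes after CS gating
	}
	return ge
}

func main() {
	fmt.Println(vsPatchGe(0.8, 0.5, 0.7, 0.3, false)) // pre-gating: super only
	fmt.Println(vsPatchGe(0.8, 0.5, 0.7, 0.3, true))  // post-gating: both
}
```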
T-1 learning
To do t-1 learning properly, we need to have everything based on t-1 states, which introduces some complications in terms of keeping that state info around. We can just use SpkPrv, and have that use the GeIntNorm value for recv. The current implementation only did t-1 on the sender, but that doesn't work. Use a basic 3-factor learning rule, as sketched below:
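A plausible form, with all activity factors taken from the prior trial (the exact factors here are an assumption, not a settled implementation):

$$\Delta w_{ij} = \epsilon \,\mathrm{DA}_t \; s_j(t{-}1) \; r_i(t{-}1)$$

where $s_j(t{-}1)$ is the sender's SpkPrv value, $r_i(t{-}1)$ is the receiver's GeIntNorm-based value, and $\mathrm{DA}_t$ is the dopamine signal at US time.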
The adaptive threshold also uses the same rule (except for the Sact sender factor), and must also be based on t-1 values.
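Under the same assumption, dropping the sender factor gives the threshold update:

$$\Delta \theta_i = \epsilon_\theta \,\mathrm{DA}_t \; r_i(t{-}1)$$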
Extinction
todo.