More realistic inhibition mechanisms #64

rcoreilly · 2022-10-12T07:49:40Z · Maintainer

There is now a pretty solid consensus about the differences between the three major classes of inhibitory interneurons and their functional properties (e.g., Cardin18):

PV: fast-spiking basket cells that target the cell bodies of excitatory neurons and coexpress the calcium-binding protein parvalbumin (PV). These are the "first responders" and are also rapidly depressing: they provide quick control of activity, responding to new FF input and to FB drive, so that the pyramidal neurons that spike first can quickly shut off their competitors.
SST: low-threshold spiking cells that target the distal dendrites of excitatory neurons and coexpress the peptide somatostatin (SST). These require repetitive, facilitating, afferent input to be activated, and may regulate the dendritic integration of synaptic inputs over a longer timescale. The current dependence of FB inhib on the slower integrated Act variable, which only comes on after the first spike (in order to compute the ISI), may reflect the SST dynamics.
VIP: sparse dendrite-targeting cells that synapse onto SST interneurons and the dendrites of pyramidal neurons, and coexpress vasoactive intestinal peptide (VIP). VIP interneurons are a subset of the larger 5HT3aR-expressing interneuron class. These can provide disinhibition of SST inhibition.
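To make the depressing vs. facilitating distinction concrete, here is a minimal Go sketch. It uses a simplified Tsodyks-Markram-style short-term plasticity model, which is a standard textbook model swapped in for illustration and not anything taken from axon. The type `Synapse`, the functions `NewSynapse`, `Step`, and `Drive`, and all parameter values are hypothetical. A PV-like synapse with high release probability and slow recovery depresses under a 100 Hz train. An SST-like synapse with low release probability and slowly decaying facilitation instead grows stronger with each spike.

```go
package main

import "fmt"

// Synapse is a simplified Tsodyks-Markram-style short-term plasticity model.
// It is an illustrative assumption, not code from axon.
type Synapse struct {
	U      float64 // baseline release probability
	TauRec float64 // recovery time constant of synaptic resources (msec)
	TauFac float64 // decay time constant of facilitation (msec); 0 = none
	x      float64 // fraction of resources available (1 = fully recovered)
	u      float64 // current release probability
}

// NewSynapse returns a fully recovered synapse at baseline release probability.
func NewSynapse(U, tauRec, tauFac float64) *Synapse {
	return &Synapse{U: U, TauRec: tauRec, TauFac: tauFac, x: 1, u: U}
}

// Step advances the synapse by 1 msec and returns the efficacy transmitted
// on this step (0 when there is no presynaptic spike).
func (s *Synapse) Step(spike bool) float64 {
	s.x += (1 - s.x) / s.TauRec // resources recover toward 1
	if s.TauFac > 0 {
		s.u += (s.U - s.u) / s.TauFac // facilitation decays toward baseline
	}
	if !spike {
		return 0
	}
	if s.TauFac > 0 {
		s.u += s.U * (1 - s.u) // each spike raises release probability
	}
	out := s.u * s.x
	s.x -= out // each spike consumes resources (depression)
	return out
}

// Drive delivers one spike every isi msec for dur msec and returns the
// efficacy of each spike.
func Drive(s *Synapse, isi, dur int) []float64 {
	var outs []float64
	for t := 0; t < dur; t++ {
		spike := t%isi == 0
		if out := s.Step(spike); spike {
			outs = append(outs, out)
		}
	}
	return outs
}

func main() {
	pv := Drive(NewSynapse(0.6, 300, 0), 10, 100)    // PV-like: depressing
	sst := Drive(NewSynapse(0.05, 20, 500), 10, 100) // SST-like: facilitating
	fmt.Printf("PV:  first %.3f  last %.3f\n", pv[0], pv[len(pv)-1])
	fmt.Printf("SST: first %.3f  last %.3f\n", sst[0], sst[len(sst)-1])
}
```

In this sketch the two cell types differ only in their parameters. The same update rule produces either first-responder (PV-like) or slowly recruited (SST-like) behavior.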

So the current FFFB inhibition captures some of these dynamics, but it would be good to revisit this space and explore a more explicit implementation of the PV / SST dynamics. Using Act is not a good idea, as it is really just for display purposes; it is much better to use spiking directly.

rcoreilly · 2022-10-23T06:18:49Z · Maintainer Author

We probably want to remove Inhib.Self. It is based only on Act and is less relevant for spiking than for rate-code models. It also functions much like AHP, which we now have working much more robustly than what was in Leabra. It was used in BG, but it looks like FF inhibition is more relevant there.


rcoreilly · 2022-11-06T07:57:44Z · Maintainer Author

PV is FF, SST is FB

Cardin18 does not state this key point, but it is pretty clear from El-BoustaniSur14:

On the basis of the strong response similarity between pyramidal and PV+ neurons and the short delay of PV+ responses to small stimuli [33], we modelled both these cell types as receiving direct feedforward inputs from a parallel two-dimensional layer depicting thalamorecipient layer 4 neurons (Fig. 7a) to model retinotopic receptive fields

In contrast, the weak and delayed responses of SOM+ neurons to sparse noise, as well as their lack of response during ChR2-stimulation of layer 4 excitatory cells in vitro [34], suggested that SOM+ neurons do not receive feedforward sensory input but exclusively receive their excitatory drive from local excitatory neurons.

This fits very well with the behavior of the current FFFB code in axon, where FF is fast and based on Ge, while FB is slow and based on Act:

OFBi is the old FB component based on Act, and OFFi is the FF component based on Ge. Note how strongly lagged the Act response is relative to the Layer1_Spikes that drive FB -- the slowly facilitating SST (SOM) dynamics should produce something like this.
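For reference, the original FF / FB split can be sketched as follows. The sketch assumes the Leabra-style FFFB form Gi = Gi * (FF * max(avgGe - FF0, 0) + fbi), where fbi is integrated toward FB * avgAct. The type and function names and all parameter values are illustrative assumptions. Axon's actual code may differ in details, such as how average and max Ge are combined. The point is that FF tracks Ge instantly, while FB lags behind the slowly integrated Act.

```go
package main

import "fmt"

// FFFB holds parameters for a Leabra-style feedforward (FF) / feedback (FB)
// inhibition function. The values used in main are illustrative assumptions.
type FFFB struct {
	Gi    float64 // overall inhibition gain
	FF    float64 // gain on FF inhibition computed from average Ge
	FF0   float64 // average Ge below which FF inhibition is zero
	FB    float64 // gain on FB inhibition computed from average Act
	FBTau float64 // time constant (cycles) for integrating FB inhibition
}

// FFInhib responds immediately to the layer's average excitatory conductance.
func (p *FFFB) FFInhib(avgGe float64) float64 {
	if avgGe <= p.FF0 {
		return 0
	}
	return p.FF * (avgGe - p.FF0)
}

// FBUpdate moves FB inhibition toward FB * avgAct, so it lags activity.
func (p *FFFB) FBUpdate(fbi, avgAct float64) float64 {
	return fbi + (p.FB*avgAct-fbi)/p.FBTau
}

// Total combines the two components into the overall Gi.
func (p *FFFB) Total(ffi, fbi float64) float64 {
	return p.Gi * (ffi + fbi)
}

// Simulate applies a step of input (constant avgGe) while average Act rises
// slowly toward actTarget, returning FF and FB inhibition on every cycle.
func Simulate(p *FFFB, avgGe, actTarget, actTau float64, ncyc int) (ff, fb []float64) {
	act, fbi := 0.0, 0.0
	for c := 0; c < ncyc; c++ {
		act += (actTarget - act) / actTau // Act is a slowly integrated variable
		fbi = p.FBUpdate(fbi, act)
		ff = append(ff, p.FFInhib(avgGe))
		fb = append(fb, fbi)
	}
	return ff, fb
}

func main() {
	p := &FFFB{Gi: 1, FF: 1, FF0: 0.1, FB: 1, FBTau: 1.4}
	ff, fb := Simulate(p, 0.5, 0.3, 20, 50)
	for _, c := range []int{0, 10, 49} {
		fmt.Printf("cycle %2d: FF %.3f  FB %.3f  Gi %.3f\n", c, ff[c], fb[c], p.Total(ff[c], fb[c]))
	}
}
```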


rcoreilly Nov 6, 2022
Maintainer Author

Original FFFB in Axon is mostly about FF

For example, ra25 can learn successfully with FB = 0.

As shown in the figure above, FF is larger; FB is very slow and mainly just shuts things down in the latter portion.

It is key to have an FF element in the input layer based on the static Ext input! That is missing in a purely spike-based aggregation.

qemqemqem Nov 7, 2022
Maintainer

What do FF and FB stand for?

rcoreilly Nov 7, 2022
Maintainer Author

FF = feedforward, FB = feedback

rcoreilly · 2022-11-07T12:11:13Z · Maintainer Author

Converging on good params that largely match OG values

It is remarkable: despite using a purely spike-based computation with very different equations for computing the inhibition, and after flailing around in parameter space for a couple of days with almost nothing I tried working very well at all, I ended up converging on parameters that actually work on ra25 and objrec (with still a bit more to go on objrec). These parameters end up producing values that closely match the inhibition computed by the original Gi values from FFFB (OGi in the figures)!

This is the cycle-by-cycle plot of the new spiking inhibitory conductance (SGi) and the OGi for the axon/examples/inhib test case:

Layer 1, which gets bottom-up and top-down:

Layer 2, which gets from Layer 1 and projects back to it:

The lessons I take from this are:

The precise nature of the inhibition dynamics in these spiking nets is incredibly important for learning to work at all. Most basic parameterizations of FF and FB spikes, based on the principles available in the existing literature, did not work at all. The patterns of spiking often looked reasonable to the naked eye, but they just didn't work for learning. And the fact that two "independent" attempts to find what works converged on the same place suggests that there may be just this one small region of parameter space that works. Admittedly, there are many confounds here, and it is hard to know whether someone else might come up with something different.
I got incredibly lucky with the OG(i) FFFB equations, based on the Leabra model, which happened to land in this small window of parameter space that worked. A major motivation for taking on this whole exploration was that I had never actually explored this space much at all, since FFFB had basically just worked from the start. So going into this, it wasn't clear to me whether there would be a wide range of params that work well (or, hopefully, even better than FFFB).
The fact that OG FFFB works so well is, in retrospect, now understandable. Actually, it wasn't really luck: FFFB works because it uses a powerful "cheat". As noted above, most of the signal comes from the FF component of FFFB, which is computed directly from the Ge net input that drives the excitatory neurons. This Ge reflects NMDA currents in addition to direct glutamate from incoming spikes, and it is integrated over time within each neuron, so it is not "public" information. However, it is exactly what drives the Vm increase that causes spiking, and the rate of spiking is roughly linear in Ge: it is the key "hidden" variable needed to control spiking rates.
The new FS FFFB mechanism incorporates feedback (same-pool / layer spikes) into the PV fast spiking signal (this was one key move that started to get things working -- purely FF doesn't work), and this allows the FS signal to look a lot more like Ge, by capturing the effects of the NMDA currents on spiking. For reasons I don't yet understand, integrating PV FS with a tau of 6 msec does a better job of reflecting the OG Ge-based computation, even though Ge is integrated with a tau of 5 msec, and it works better in practice. Also, the facilitating dynamics (documented in the biology) used for computing the slow SS factor, with the best-working params, end up matching the average Act activation value used for the OG FB term, for reasons that are not at all obvious from just looking at the equations, which seem quite different.
So, in summary, the new FS FFFB mechanism provides a more computationally efficient, purely spike-based, fully "legal" mechanism for computing inhibition (in a way that actual PV and SST neurons could very plausibly do), replacing the OG FFFB mechanism and driving strikingly similar overall Gi values. Interestingly, the new FS mechanism has more immediate and longer-term FB dynamics in terms of how the equations work, which could potentially have some implications (the values are not identical after all).
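Here is a minimal sketch of how a spike-based FS / SS inhibition of this kind could be structured. It rests on three assumptions:

- a fast PV-like integrator is driven by FF spikes plus FB-weighted same-pool spikes, and it decays with the 6 msec tau mentioned above;
- a static Ext term is added for clamped input layers;
- a slow SST-like term uses a facilitation variable, built up by repeated FB spikes, to gate a slowly integrated inhibition.

Only the 6 msec FS tau comes from the discussion. The type `FSFFFB`, its fields, the update equations, and all other values are hypothetical and should not be read as the axon implementation.

```go
package main

import "fmt"

// FSFFFB is a hypothetical spike-based inhibition state for one layer or pool.
// Only the 6 msec FS time constant comes from the discussion; all other
// names, values, and equations are illustrative assumptions.
type FSFFFB struct {
	Gi     float64 // overall inhibition gain
	FB     float64 // weight on same-pool (feedback) spikes in the fast PV signal
	FSTau  float64 // decay time constant (msec) of the fast PV integrator
	FS0    float64 // threshold subtracted from the fast PV signal
	SS     float64 // gain on the slow SST signal
	SSfTau float64 // decay time constant (msec) of SST facilitation
	SSiTau float64 // integration time constant (msec) of SST inhibition

	fsi, ssf, ssi float64 // internal state
}

// Step advances one msec. ffs and fbs are the fractions of FF input neurons
// and same-pool neurons that spiked; gext is static external input (e.g., a
// clamped input layer), added directly to the fast term.
// It returns the fast (PV) and slow (SST) inhibitory conductances.
func (p *FSFFFB) Step(ffs, fbs, gext float64) (fsGi, ssGi float64) {
	// fast PV: integrates FF + FB spikes and decays with FSTau
	p.fsi += ffs + p.FB*fbs - p.fsi/p.FSTau
	fs := p.fsi - p.FS0
	if fs < 0 {
		fs = 0
	}
	// slow SST: facilitation builds with repeated FB spikes, decays slowly,
	// and drives a slowly integrated inhibition
	p.ssf += fbs*(1-p.ssf) - p.ssf/p.SSfTau
	p.ssi += (p.ssf*fbs - p.ssi) / p.SSiTau
	return p.Gi * (fs + gext), p.Gi * p.SS * p.ssi
}

// Run drives the layer with constant spike fractions for n msec.
func Run(p *FSFFFB, ffs, fbs float64, n int) (fs, ss []float64) {
	for t := 0; t < n; t++ {
		f, s := p.Step(ffs, fbs, 0)
		fs = append(fs, f)
		ss = append(ss, s)
	}
	return fs, ss
}

// newParams returns illustrative (assumed) parameter values.
func newParams() *FSFFFB {
	return &FSFFFB{Gi: 1, FB: 1, FSTau: 6, FS0: 0.1, SS: 30, SSfTau: 20, SSiTau: 50}
}

func main() {
	fs, ss := Run(newParams(), 0.05, 0.05, 200)
	for _, t := range []int{5, 20, 199} {
		fmt.Printf("t=%3d msec: FS Gi %.3f  SS Gi %.3f\n", t, fs[t], ss[t])
	}
}
```

Driven with constant spiking, the fast component settles within a few tens of msec while the slow component is still building up. This mirrors the lagged, facilitating profile described for SST.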


qemqemqem Nov 7, 2022
Maintainer

What was the metric you used for evaluating hyperparameters? How many sets of hyperparameter values did you evaluate?

rcoreilly Nov 7, 2022
Maintainer Author

ra25 learning, then objrec learning. I interactively just tried a ton of params, monitoring layer activity and other stats. Most things just failed to learn, so it wasn't really about "optimizing" but just getting off the floor. Now that it is off the floor, I'm optimizing.

qemqemqem Nov 7, 2022
Maintainer

Were you targeting time to convergence for learning, or accuracy after some number of trials?

rcoreilly Nov 7, 2022
Maintainer Author

Yeah: firstzero and lastzero on ra25; for objrec, it is final accuracy after 50 epochs.
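For readers unfamiliar with these stats, here is a small sketch of how firstzero and lastzero could be computed from per-epoch training error counts. The definitions are assumptions on our part, not confirmed in the thread:

- firstzero is the first epoch with zero errors;
- lastzero is the epoch at which a run reaches `nZeroStop` consecutive zero-error epochs, which is where training would stop.

`ZeroStats` and `nZeroStop` are hypothetical names. Under these definitions, lower values for both indicate faster learning.

```go
package main

import "fmt"

// ZeroStats returns hypothetical firstzero / lastzero stats from per-epoch
// training error counts. Assumed definitions: firstzero is the first epoch
// with zero errors; lastzero is the epoch at which nZeroStop consecutive
// zero-error epochs have been reached (where a run would stop). -1 = never.
func ZeroStats(errs []int, nZeroStop int) (firstZero, lastZero int) {
	firstZero, lastZero = -1, -1
	nZero := 0
	for epc, e := range errs {
		if e != 0 {
			nZero = 0 // any error resets the consecutive-zero count
			continue
		}
		if firstZero < 0 {
			firstZero = epc
		}
		nZero++
		if nZero >= nZeroStop {
			lastZero = epc
			return firstZero, lastZero
		}
	}
	return firstZero, lastZero
}

func main() {
	errs := []int{12, 5, 0, 2, 0, 0, 0, 0}
	first, last := ZeroStats(errs, 3)
	fmt.Println("firstzero:", first, "lastzero:", last) // prints "firstzero: 2 lastzero: 6"
}
```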

rcoreilly · 2022-11-08T13:30:49Z · Maintainer Author

Parameter updates for new FS-FFFB

Input Act.Clamp.Ge = 1.5 instead of 1
Output Act.Clamp.Ge = 0.8 instead of 0.6
Gi values may need to be adjusted: in ra25, Hidden Gi had to be increased from 1.0 to 1.05 (1.1 also works), while Output, which was at 0.9, needed to be decreased to 0.75. In general, Gi values may need to be adjusted more precisely.
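The ra25 updates above can be written down as a list of old and new values. This sketch uses a plain nested map keyed by layer and parameter name, as written above. `Change`, `ra25Changes`, and `Apply` are hypothetical and are not axon's params API. `Apply` checks every old value before writing any new one. That way a change set cannot be half-applied, or applied twice to a set that was already retuned.

```go
package main

import "fmt"

// Change records one parameter update described for the new FS-FFFB
// inhibition. Layer and Param names follow the discussion text; the map
// representation is a hypothetical stand-in, not axon's params API.
type Change struct {
	Layer, Param string
	Old, New     float64
}

// ra25Changes lists the updates reported for ra25.
var ra25Changes = []Change{
	{"Input", "Act.Clamp.Ge", 1, 1.5},
	{"Output", "Act.Clamp.Ge", 0.6, 0.8},
	{"Hidden", "Gi", 1.0, 1.05}, // 1.1 was also reported to work
	{"Output", "Gi", 0.9, 0.75},
}

// Apply updates params in place after confirming every expected old value,
// so nothing is written if any value has already been changed.
func Apply(params map[string]map[string]float64, changes []Change) error {
	// first pass: check every expected old value before changing anything
	for _, c := range changes {
		cur, ok := params[c.Layer][c.Param]
		if !ok || cur != c.Old {
			return fmt.Errorf("%s %s: expected %g, found %g (present: %v)", c.Layer, c.Param, c.Old, cur, ok)
		}
	}
	// second pass: apply the new values
	for _, c := range changes {
		params[c.Layer][c.Param] = c.New
	}
	return nil
}

func main() {
	params := map[string]map[string]float64{
		"Input":  {"Act.Clamp.Ge": 1},
		"Hidden": {"Gi": 1.0},
		"Output": {"Act.Clamp.Ge": 0.6, "Gi": 0.9},
	}
	if err := Apply(params, ra25Changes); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(params["Hidden"]["Gi"], params["Output"]["Gi"]) // prints "1.05 0.75"
}
```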


rcoreilly Nov 9, 2022
Maintainer Author

CT / PT (higher NMDA) requires significantly higher Gi

CT and PT layers have more NMDA, which used to be automatically included in Gi but now needs to be explicitly accounted for. The CtxtGe must also be compensated for. Basically, increase Gi to about 2.0 for copying, and up to 2.8 for the stronger NMDA used in long integration.

deep_move (copy, NMDA .15 Tau 100): CT.GeGain = 1, Gi=2.0 instead of 1.4.
deep_fsa (integration, NMDA .25 Tau 200): the same CT.GeGain of 0.8 was best, but with Gi = 2.2 instead of 1.4.
deep_music (long integration, NMDA .3 Tau 300): CT.GeGain = 1, Gi=2.8 instead of 1.4.

Also, in general it is looking like FB = 1 is best, and ra25 is an outlier in doing better with 0.5, so I updated the default.

rcoreilly Nov 14, 2022
Maintainer Author

Larger networks require FB > 1

For objrec (smallish, but bigger than ra25), FB = 2 seems to work well, with V4 Gi = 1.0 (same as before) and IT Gi = 1.1.

For LVis (much larger), FB=4 or 6 (still testing) is needed.

rcoreilly · 2022-11-09T07:07:51Z · Maintainer Author

Differential contributions of FF (PV) and SS to Vm (soma) vs. VmDend -- not useful

I just tested this idea in ra25, and it doesn't work at all, even with small deviations between soma and dendrite. Also, neurons are reasonably strongly electrically coupled across the dendrite and soma compartments involved, so even though there may be temporary differences, they are likely to homogenize over time.

Here's the implementation, in case we want to come back to it: 361d659

It is more likely important that the SST+ population can be selectively targeted by VIP cells. Perhaps this has a transient effect of toggling NMDA maintenance in dendrites, but this is something we need to capture separately rather than baking it into the core Vm updates.


rcoreilly · 2022-11-28T01:41:59Z · Maintainer Author

NMDA gets stronger over course of learning, needs some kind of negative feedback

As the weights get stronger and more specifically tuned, VmDend and NMDA get stronger. This also leads to a positive feedback loop: NMDA builds slowly over the theta cycle and, if not otherwise checked, tends to drive a positive delta (later more active than earlier). In the prior FFFB, NMDA was included in Ge, so the inhibition automatically compensated for it.

The most logical mechanism is SST directly affecting VmDend as FB activity gets stronger. Unlike the prior attempt above, however, the idea here would be to add only a tiny bit of extra SS inhibition to VmDend, with a threshold.


rcoreilly Nov 28, 2022
Maintainer Author

Adding a simple multiplicative factor of SSGi inhibition to VmDend works well to control this problem, and is a great fit to the biology!

This is the objrec model comparing 3 * SSGi (black & blue) vs. without (red & green): the Gnmda current is controlled and performance is sustained over time. Performance still deteriorates slightly, so there is a bit more work to be done here, but it is much better.

Here's ra25 without SSGi (each line is a run; 10 runs):

and with 1 * SSGi:
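To show where the extra dendritic inhibition enters, here is a minimal sketch of a conductance-based VmDend update. In it, the regular Gi is augmented by a multiplicative factor on SSGi (3 in the objrec runs, 1 in ra25). The normalized reversal potentials, leak conductance, integration rate, and the names `VmDendStep` and `Settle` are illustrative assumptions, not axon's values or API. In the real model, `ge` would include the growing NMDA current.

```go
package main

import "fmt"

// Normalized membrane constants: illustrative assumptions, not axon defaults.
const (
	erevE = 1.0 // excitatory reversal potential
	erevL = 0.3 // leak reversal potential
	erevI = 0.1 // inhibitory reversal potential
	gL    = 0.2 // leak conductance
	dt    = 0.1 // integration rate per step
)

// VmDendStep advances dendritic Vm by one step. ge would include NMDA;
// ssGiDend scales extra SST (SSGi) inhibition applied only to the dendrite,
// on top of the regular gi shared with the soma.
func VmDendStep(vm, ge, gi, ssGi, ssGiDend float64) float64 {
	giDend := gi + ssGiDend*ssGi
	inet := ge*(erevE-vm) + gL*(erevL-vm) + giDend*(erevI-vm)
	return vm + dt*inet
}

// Settle runs the update for n steps starting from the leak potential.
func Settle(ge, gi, ssGi, ssGiDend float64, n int) float64 {
	vm := erevL
	for i := 0; i < n; i++ {
		vm = VmDendStep(vm, ge, gi, ssGi, ssGiDend)
	}
	return vm
}

func main() {
	without := Settle(0.5, 0.3, 0.1, 0, 200)
	with := Settle(0.5, 0.3, 0.1, 3, 200)
	fmt.Printf("VmDend without SSGi: %.3f, with 3 * SSGi: %.3f\n", without, with)
}
```

Because the extra term is proportional to SSGi, it grows only as FB activity grows. Dendritic depolarization is therefore checked more strongly exactly when NMDA-driven activity builds up.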

rcoreilly Nov 30, 2022
Maintainer Author

With zero-sum DWts (Prjn.Learn.Trace.SubMean = 1), there is now no further deterioration; instead, the model continues to learn slowly over time, as shown in the figure:

This is comparing against previous best runs, with 861 in black being with the old FFFB inhibition, showing some deterioration back then as well, and 1061 was without the SSGi or any form of zero-sum SubMean. The final performance is about .075 PctErr for the latest best model (1140) with SSGi = 3 and SubMean = 1 throughout, vs about .1 best case for the others, so this does actually represent roughly a 25% reduction in error in the end.

Interestingly, starting out with SubMean = 0 and then turning it to 1 as early as epoch = 10 results in faster initial learning but significantly worse final performance -- the initial "corruption" of the weights happens quickly and has lasting effects:
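The zero-sum SubMean idea can be sketched as subtracting a fraction of the mean weight change from each synapse's DWt. This sketch assumes the mean is taken per receiving neuron, across its incoming synapses in a projection. That is our reading rather than something stated in the thread. `SubMeanDWt` is a hypothetical name. With SubMean = 1, any increase in some of a neuron's weights is balanced by decreases in its others.

```go
package main

import "fmt"

// SubMeanDWt subtracts subMean times the mean weight change from each of a
// receiving neuron's incoming dWts. With subMean = 1 the changes sum to zero.
func SubMeanDWt(dwts []float64, subMean float64) {
	if len(dwts) == 0 || subMean == 0 {
		return
	}
	sum := 0.0
	for _, d := range dwts {
		sum += d
	}
	m := subMean * sum / float64(len(dwts))
	for i := range dwts {
		dwts[i] -= m
	}
}

func main() {
	dwts := []float64{0.3, -0.1, 0.4}
	SubMeanDWt(dwts, 1)
	fmt.Printf("%.2f\n", dwts) // prints [0.10 -0.30 0.20]
}
```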

qemqemqem · 2022-11-30T18:32:06Z · Maintainer

Oh this is great! If I understand correctly, this fixes the problem where weights are unlearned in ra25? I'm not sure what these parameters do exactly. Could you give a quick explanation of why the unlearning was happening and why this change fixes it? Andrew


