Save and load SP,TM is not the same #874

Qubitza · 2020-08-25T12:38:43Z

Hi together,

I think there is a bug in the saving and loading process of the TM.

After saving and loading a TM using the writeToString and loadFromString methods:

loaded_tm = TemporalMemory()
loaded_tm.loadFromString( current_tm.writeToString() )

I recognized that the loaded TM behaves differently (in processing further data) compared to the not saved one.

I checked the binding tests and found that the method testNupicTemporalMemorySavingToString in temporal_memory_test.py is marked with a skip annotation.

However, I only get the mentioned rapidjson assertion when saving a TM before calling tm.reset(). After resetting the TM's prediction, the assertion is not raised.

Unfortunately, the loaded TM seems to be different, as the following call returns False:

str(current_tm) == str(loaded_tm) # => False

Same thing happens with saving the TM to a file:

current_tm.saveToFile('tm.tmp')

loaded_tm = TemporalMemory()
loaded_tm.loadFromFile('tm.tmp')

str(current_tm) == str(loaded_tm) # => False

Can you confirm the issue?
Is there a workaround?

Thank you in advance.

I'm currently on commit 4818064, and thus 4 commits behind the master (which seem not to address this issue). Using Python 3.7.7 64-bit on Windows.

The text was updated successfully, but these errors were encountered:

dkeeney · 2020-08-25T15:11:34Z

recognized that the loaded TM behaves differently (in processing further data) compared to the not saved one.

Oh, good catch. I thought we had that all cleaned up but I guess we did not. I will take a look at that today.
In the mean time, take a look at the "ExecuteCommand" operation and see if that could be a better way to capture your Connections as a JSON string.

dkeeney · 2020-08-25T15:20:19Z

Oh, sorry, that last sentence was for a different issue. But I will look at this today.

Qubitza · 2020-08-26T12:47:00Z

Thanks for looking into it.

Manually comparing the string of both TMs, one can only see a difference in Synapses pruned.
However, this counter seems not to be relevant.

> str(current_tm)
Temporal Memory Connections:
    Inputs (3185) ~> Outputs (32768) via Segments (5899)
    Segments on Cell Min/Mean/Max 0 / 0.180023 / 2
    Potential Synapses on Segment Min/Mean/Max 20 / 33.6776 / 64
    Connected Synapses on Segment Min/Mean/Max 11 / 32.733 / 64
    Synapses Dead (0%) Saturated (0.774937%)
    Synapses pruned (9.91192%) Segments pruned (0%)

> str(loaded_tm)
Temporal Memory Connections:
    Inputs (5015) ~> Outputs (32768) via Segments (5899)
    Segments on Cell Min/Mean/Max 0 / 0.180023 / 2
    Potential Synapses on Segment Min/Mean/Max 20 / 33.6776 / 64
    Connected Synapses on Segment Min/Mean/Max 11 / 32.733 / 64
    Synapses Dead (0%) Saturated (0.774937%)
    Synapses pruned (0%) Segments pruned (0%)

But a simple string comparison is not useful anyway. As the == operator seems not to be defined in the wrapper file, and I was not sure wether pybind11 automatically binds these operators, I added the following lines:

py_HTM.def("__eq__", [](TemporalMemory &self, TemporalMemory &other){ return self == other; });
py_HTM.def("__ne__", [](TemporalMemory &self, TemporalMemory &other){ return self != other; });

Unfortunately, the comparison current_tm == loaded_tm still returns false. Seems like any internal field or component is not properly (de)serialized.

dkeeney · 2020-08-26T17:35:35Z

The good news is that there are C++ tests for TM serialization and they do pass.
The setup is a little different between the two but basically initializing a TM, running a few cycles so there are connections, save, load, and then compare.

So I suspect the problem is in the equals operator when called from Python. I will concentrate on that area. Note that 'pruned' is not compared in the C++ == operator. So it must be something else.

Zbysekz · 2020-08-27T09:00:08Z

Just note: see "Inputs" count, besides "synapses pruned" this is also different

Temporal Memory Connections:
    Inputs (3185) ~> Outputs (32768) via Segments (5899)

Temporal Memory Connections:
    Inputs (5015) ~> Outputs (32768) via Segments (5899)

Edit: Maybe you just mixed things?

Zbysekz · 2020-08-27T11:04:03Z

If found out, that
RuntimeError: rapidjson internal assertion failure: type == kStringType is raised when calling tm.writeToString() only when inputs are > approx. 200.
This function uses the rapidjson. So this is one issue.

The second is comparing tm objects on python side.

dkeeney · 2020-08-27T15:19:00Z

This is going to be hard to find. I chased a problem with that exception in rapidjson once before. It was a different scenario but I do remember that it was very hard to find. If you can get it to occur in C++ (without python) you can put it in the debugger and walk through the function calls and see what it is that causes it.

dkeeney · 2020-08-27T15:59:25Z

My current thinking is that this is a bug in the Cereal package. This uses RapidJSON to generate the JSON version of the output but Cereal might not be calling it correctly. Have not yet found the problem.

Thanh-Binh · 2020-08-27T18:14:26Z

@dkeeney maybe very logic, because We had the same problem by testing MNIST with SP some months ago! @breznak said he solved problem but I do know how he did!

pmenn36 · 2020-09-02T05:35:46Z

I'm seeing the same problem with the spatial pooler:

tmpfile = '/tmp/sp.bin'

current_sp.saveToFile(tmpfile)

loaded_sp = SpatialPooler()
loaded_sp.loadFromFile(tmpfile)

str(current_sp) == str(loaded_sp)  # => False

Also telling: if I run an HTM model with all the same components on all the same data, but on run 1 I keep everything in memory and on run 2 I serialize/deserialize the spatial pooler every 1000 data points, the two runs will have different results.

pmenn36 · 2020-09-02T08:19:24Z

I took a look at SpatialPooler.hpp and the following fields aren't serialized:

minActiveDutyCycles_
boostedOverlaps_
version_

Looking at TemporalMemory.hpp, it does seem that all the fields are serialized. It's worth noting that if I perform the 2-run experiment detailed above with the temporal memory (instead of the spatial pooler), I get the same results from both HTMs. I'm only analyzing anomaly output though, so that doesn't necessarily prove that the serialization/deserialization process doesn't change the TM, just that it seems to have less of an adverse effect than serializing/deserializing the SP.

breznak · 2020-09-02T15:45:47Z

Thank you @N3rv0uS for a detailed bug description! 👍

My current thinking is that this is a bug in the Cereal package. This uses RapidJSON to generate the JSON version of the output but Cereal might not be calling it correctly.

So do you guys think this is "only" for py (bindings) versions? @N3rv0uS do you replicate the same issue for cpp tm,sp?

We had the same problem by testing MNIST with SP some months ago! @breznak said he solved problem but I do know how he did!

my recent serialization, comparison changes were merged in #827 which addressed Connections (more detailed op==(), more fields serialized).
But I still think I had a weird problem with SP serialization (it might have been somewhere around Synapses too) in #799 , which is not covered in the current tests. And I haven't resolved that correctly yet.
Btw, can you check with seed=1 (=fixed), and seed=0 (random)?

#874 (comment) nice spot @pmenn36 👍 those fields are only internal/re-computed, so we have to double-check if deserialization does actually recompute them correctly? (Or of it's easier, safer just to binary dump everything)

breznak · 2020-09-02T16:03:06Z

I took a look at SpatialPooler.hpp and the following fields aren't serialized:
minActiveDutyCycles_
boostedOverlaps_
version_

Those fields are only internal/re-computed, so we have to double-check if deserialization does actually recompute them correctly?

aaand.. actually not!
boostedOverlaps_.resize(numColumns_);
https://github.com/htm-community/htm.core/blob/master/src/htm/algorithms/SpatialPooler.hpp#L346

~~Guess those fields are not so empheral and should be serialized~~ ..the boostedOverlaps_ is recomputed on each compute() call. So it depends what you do after the serialization I guess. If you do one more compute(), all good. But if you directly do, say equals(), you get this err.

Seems to me the only solution is to dump the field to the file as well (?)

Thanh-Binh · 2020-09-02T18:36:08Z

.. additionally. I found that the parameter „seed“ is not saved...

breznak · 2020-09-02T19:42:50Z

It's worth noting that if I perform the 2-run experiment detailed above with the temporal memory (instead of the spatial pooler), I get the same results from both HTMs. I'm only analyzing anomaly output though, so that doesn't necessarily prove that the serialization/deserialization process doesn't change the TM

SP,TM compare now should also check Connections with detailed ==, which is essentially everything. So the TM seems it might be ok. The HTMs would differ from the bottom layer (SP) if used.

breznak · 2020-09-02T20:14:17Z

.. additionally. I found that the parameter „seed“ is not saved...

seed is only used in the constructor, for rng_, we then save completely the whole rng.

@N3rv0uS

version_ boostedOverlaps_ minActiveDutyCycles_ thanks @N3rv0uS, @pmenn36 and everyone who investigated the issue. For issue #874

breznak · 2020-09-02T20:54:05Z

Guys, can you retest, please? The SP part fixes were just merged in. PR for extended tests always welcome!

pmenn36 · 2020-09-02T22:04:16Z

Guys, can you retest, please? The SP part fixes were just merged in. PR for extended tests always welcome!

Yep! Currently reinstalling from source so I can test this

pmenn36 · 2020-09-02T22:39:48Z

@breznak both the string-equality check and my two-run test setup pass now! Thanks for fixing this

breznak · 2020-09-03T09:10:04Z

Glad to hear! Thanks to everybody investigating the issue 👍

string-equality check and my two-run test setup

a possible TODO to add to our tests, I like the example with restarted run from deserialized state.

Following #821 also affects serialization for the "whole HTM" (ScalarEncoder) and #799 .
Closing this for now, reopen if needed.

breznak added bug Something isn't working serializable TM labels Sep 2, 2020

breznak added the high label Sep 2, 2020

breznak changed the title ~~Save and load a TM~~ Save and load SP,TM is not reproducible Sep 2, 2020

breznak added a commit that referenced this issue Sep 2, 2020

SP serialization fix: added missing fields

bf9c40e

version_ boostedOverlaps_ minActiveDutyCycles_ thanks @N3rv0uS, @pmenn36 and everyone who investigated the issue. For issue #874

breznak mentioned this issue Sep 2, 2020

SP serialization fix: added missing fields #882

Merged

breznak changed the title ~~Save and load SP,TM is not reproducible~~ Save and load SP,TM is not the same Sep 2, 2020

breznak closed this as completed Sep 3, 2020

breznak mentioned this issue Sep 3, 2020

Enable pickle tests #821

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Save and load SP,TM is not the same #874

Save and load SP,TM is not the same #874

Qubitza commented Aug 25, 2020 •

edited

Loading

dkeeney commented Aug 25, 2020

dkeeney commented Aug 25, 2020

Qubitza commented Aug 26, 2020 •

edited

Loading

dkeeney commented Aug 26, 2020

Zbysekz commented Aug 27, 2020 •

edited

Loading

Zbysekz commented Aug 27, 2020

dkeeney commented Aug 27, 2020

dkeeney commented Aug 27, 2020

Thanh-Binh commented Aug 27, 2020

pmenn36 commented Sep 2, 2020

pmenn36 commented Sep 2, 2020

breznak commented Sep 2, 2020

breznak commented Sep 2, 2020

Thanh-Binh commented Sep 2, 2020

breznak commented Sep 2, 2020

breznak commented Sep 2, 2020

breznak commented Sep 2, 2020

pmenn36 commented Sep 2, 2020

pmenn36 commented Sep 2, 2020

breznak commented Sep 3, 2020

Save and load SP,TM is not the same #874

Save and load SP,TM is not the same #874

Comments

Qubitza commented Aug 25, 2020 • edited Loading

dkeeney commented Aug 25, 2020

dkeeney commented Aug 25, 2020

Qubitza commented Aug 26, 2020 • edited Loading

dkeeney commented Aug 26, 2020

Zbysekz commented Aug 27, 2020 • edited Loading

Zbysekz commented Aug 27, 2020

dkeeney commented Aug 27, 2020

dkeeney commented Aug 27, 2020

Thanh-Binh commented Aug 27, 2020

pmenn36 commented Sep 2, 2020

pmenn36 commented Sep 2, 2020

breznak commented Sep 2, 2020

breznak commented Sep 2, 2020

Thanh-Binh commented Sep 2, 2020

breznak commented Sep 2, 2020

breznak commented Sep 2, 2020

breznak commented Sep 2, 2020

pmenn36 commented Sep 2, 2020

pmenn36 commented Sep 2, 2020

breznak commented Sep 3, 2020

Qubitza commented Aug 25, 2020 •

edited

Loading

Qubitza commented Aug 26, 2020 •

edited

Loading

Zbysekz commented Aug 27, 2020 •

edited

Loading