Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save and load SP,TM is not the same #874

Closed
Qubitza opened this issue Aug 25, 2020 · 20 comments
Closed

Save and load SP,TM is not the same #874

Qubitza opened this issue Aug 25, 2020 · 20 comments
Labels
bug Something isn't working high serializable TM

Comments

@Qubitza
Copy link

Qubitza commented Aug 25, 2020

Hi together,

I think there is a bug in the saving and loading process of the TM.

After saving and loading a TM using the writeToString and loadFromString methods:

loaded_tm = TemporalMemory()
loaded_tm.loadFromString( current_tm.writeToString() )

I recognized that the loaded TM behaves differently (in processing further data) compared to the not saved one.

I checked the binding tests and found that the method testNupicTemporalMemorySavingToString in temporal_memory_test.py is marked with a skip annotation.

However, I only get the mentioned rapidjson assertion when saving a TM before calling tm.reset(). After resetting the TM's prediction, the assertion is not raised.

Unfortunately, the loaded TM seems to be different, as the following call returns False:

str(current_tm) == str(loaded_tm) # => False

Same thing happens with saving the TM to a file:

current_tm.saveToFile('tm.tmp')

loaded_tm = TemporalMemory()
loaded_tm.loadFromFile('tm.tmp')

str(current_tm) == str(loaded_tm) # => False

Can you confirm the issue?
Is there a workaround?

Thank you in advance.

I'm currently on commit 4818064, and thus 4 commits behind the master (which seem not to address this issue). Using Python 3.7.7 64-bit on Windows.

@dkeeney
Copy link

dkeeney commented Aug 25, 2020

recognized that the loaded TM behaves differently (in processing further data) compared to the not saved one.

Oh, good catch. I thought we had that all cleaned up but I guess we did not. I will take a look at that today.
In the mean time, take a look at the "ExecuteCommand" operation and see if that could be a better way to capture your Connections as a JSON string.

@dkeeney
Copy link

dkeeney commented Aug 25, 2020

Oh, sorry, that last sentence was for a different issue. But I will look at this today.

@Qubitza
Copy link
Author

Qubitza commented Aug 26, 2020

Thanks for looking into it.

Manually comparing the string of both TMs, one can only see a difference in Synapses pruned.
However, this counter seems not to be relevant.

> str(current_tm)
Temporal Memory Connections:
    Inputs (3185) ~> Outputs (32768) via Segments (5899)
    Segments on Cell Min/Mean/Max 0 / 0.180023 / 2
    Potential Synapses on Segment Min/Mean/Max 20 / 33.6776 / 64
    Connected Synapses on Segment Min/Mean/Max 11 / 32.733 / 64
    Synapses Dead (0%) Saturated (0.774937%)
    Synapses pruned (9.91192%) Segments pruned (0%)
> str(loaded_tm)
Temporal Memory Connections:
    Inputs (5015) ~> Outputs (32768) via Segments (5899)
    Segments on Cell Min/Mean/Max 0 / 0.180023 / 2
    Potential Synapses on Segment Min/Mean/Max 20 / 33.6776 / 64
    Connected Synapses on Segment Min/Mean/Max 11 / 32.733 / 64
    Synapses Dead (0%) Saturated (0.774937%)
    Synapses pruned (0%) Segments pruned (0%)

But a simple string comparison is not useful anyway. As the == operator seems not to be defined in the wrapper file, and I was not sure wether pybind11 automatically binds these operators, I added the following lines:

py_HTM.def("__eq__", [](TemporalMemory &self, TemporalMemory &other){ return self == other; });
py_HTM.def("__ne__", [](TemporalMemory &self, TemporalMemory &other){ return self != other; });

Unfortunately, the comparison current_tm == loaded_tm still returns false. Seems like any internal field or component is not properly (de)serialized.

@dkeeney
Copy link

dkeeney commented Aug 26, 2020

The good news is that there are C++ tests for TM serialization and they do pass.
The setup is a little different between the two but basically initializing a TM, running a few cycles so there are connections, save, load, and then compare.

So I suspect the problem is in the equals operator when called from Python. I will concentrate on that area. Note that 'pruned' is not compared in the C++ == operator. So it must be something else.

@Zbysekz
Copy link

Zbysekz commented Aug 27, 2020

Just note: see "Inputs" count, besides "synapses pruned" this is also different

Temporal Memory Connections:
    Inputs (3185) ~> Outputs (32768) via Segments (5899)
Temporal Memory Connections:
    Inputs (5015) ~> Outputs (32768) via Segments (5899)

Edit: Maybe you just mixed things?

@Zbysekz
Copy link

Zbysekz commented Aug 27, 2020

If found out, that
RuntimeError: rapidjson internal assertion failure: type == kStringType is raised when calling tm.writeToString() only when inputs are > approx. 200.
This function uses the rapidjson. So this is one issue.

The second is comparing tm objects on python side.

@dkeeney
Copy link

dkeeney commented Aug 27, 2020

This is going to be hard to find. I chased a problem with that exception in rapidjson once before. It was a different scenario but I do remember that it was very hard to find. If you can get it to occur in C++ (without python) you can put it in the debugger and walk through the function calls and see what it is that causes it.

@dkeeney
Copy link

dkeeney commented Aug 27, 2020

My current thinking is that this is a bug in the Cereal package. This uses RapidJSON to generate the JSON version of the output but Cereal might not be calling it correctly. Have not yet found the problem.

@Thanh-Binh
Copy link

@dkeeney maybe very logic, because We had the same problem by testing MNIST with SP some months ago! @breznak said he solved problem but I do know how he did!

@pmenn36
Copy link

pmenn36 commented Sep 2, 2020

I'm seeing the same problem with the spatial pooler:

tmpfile = '/tmp/sp.bin'

current_sp.saveToFile(tmpfile)

loaded_sp = SpatialPooler()
loaded_sp.loadFromFile(tmpfile)

str(current_sp) == str(loaded_sp)  # => False

Also telling: if I run an HTM model with all the same components on all the same data, but on run 1 I keep everything in memory and on run 2 I serialize/deserialize the spatial pooler every 1000 data points, the two runs will have different results.

@pmenn36
Copy link

pmenn36 commented Sep 2, 2020

I took a look at SpatialPooler.hpp and the following fields aren't serialized:

  • minActiveDutyCycles_
  • boostedOverlaps_
  • version_

Looking at TemporalMemory.hpp, it does seem that all the fields are serialized. It's worth noting that if I perform the 2-run experiment detailed above with the temporal memory (instead of the spatial pooler), I get the same results from both HTMs. I'm only analyzing anomaly output though, so that doesn't necessarily prove that the serialization/deserialization process doesn't change the TM, just that it seems to have less of an adverse effect than serializing/deserializing the SP.

@breznak breznak added bug Something isn't working serializable TM labels Sep 2, 2020
@breznak
Copy link
Member

breznak commented Sep 2, 2020

Thank you @N3rv0uS for a detailed bug description! 👍

My current thinking is that this is a bug in the Cereal package. This uses RapidJSON to generate the JSON version of the output but Cereal might not be calling it correctly.

So do you guys think this is "only" for py (bindings) versions? @N3rv0uS do you replicate the same issue for cpp tm,sp?

We had the same problem by testing MNIST with SP some months ago! @breznak said he solved problem but I do know how he did!

my recent serialization, comparison changes were merged in #827 which addressed Connections (more detailed op==(), more fields serialized).
But I still think I had a weird problem with SP serialization (it might have been somewhere around Synapses too) in #799 , which is not covered in the current tests. And I haven't resolved that correctly yet.
Btw, can you check with seed=1 (=fixed), and seed=0 (random)?

#874 (comment) nice spot @pmenn36 👍 those fields are only internal/re-computed, so we have to double-check if deserialization does actually recompute them correctly? (Or of it's easier, safer just to binary dump everything)

@breznak breznak added the high label Sep 2, 2020
@breznak
Copy link
Member

breznak commented Sep 2, 2020

I took a look at SpatialPooler.hpp and the following fields aren't serialized:
minActiveDutyCycles_
boostedOverlaps_
version_

Those fields are only internal/re-computed, so we have to double-check if deserialization does actually recompute them correctly?

aaand.. actually not!
boostedOverlaps_.resize(numColumns_);
https://github.com/htm-community/htm.core/blob/master/src/htm/algorithms/SpatialPooler.hpp#L346

Guess those fields are not so empheral and should be serialized ..the boostedOverlaps_ is recomputed on each compute() call. So it depends what you do after the serialization I guess. If you do one more compute(), all good. But if you directly do, say equals(), you get this err.

Seems to me the only solution is to dump the field to the file as well (?)

@Thanh-Binh
Copy link

.. additionally. I found that the parameter „seed“ is not saved...

@breznak breznak changed the title Save and load a TM Save and load SP,TM is not reproducible Sep 2, 2020
@breznak
Copy link
Member

breznak commented Sep 2, 2020

It's worth noting that if I perform the 2-run experiment detailed above with the temporal memory (instead of the spatial pooler), I get the same results from both HTMs. I'm only analyzing anomaly output though, so that doesn't necessarily prove that the serialization/deserialization process doesn't change the TM

SP,TM compare now should also check Connections with detailed ==, which is essentially everything. So the TM seems it might be ok. The HTMs would differ from the bottom layer (SP) if used.

@breznak
Copy link
Member

breznak commented Sep 2, 2020

.. additionally. I found that the parameter „seed“ is not saved...

seed is only used in the constructor, for rng_, we then save completely the whole rng.

breznak added a commit that referenced this issue Sep 2, 2020
version_
boostedOverlaps_
minActiveDutyCycles_

thanks @N3rv0uS, @pmenn36 and everyone who investigated the issue.
For issue #874
@breznak breznak changed the title Save and load SP,TM is not reproducible Save and load SP,TM is not the same Sep 2, 2020
@breznak
Copy link
Member

breznak commented Sep 2, 2020

Guys, can you retest, please? The SP part fixes were just merged in. PR for extended tests always welcome!

@pmenn36
Copy link

pmenn36 commented Sep 2, 2020

Guys, can you retest, please? The SP part fixes were just merged in. PR for extended tests always welcome!

Yep! Currently reinstalling from source so I can test this

@pmenn36
Copy link

pmenn36 commented Sep 2, 2020

@breznak both the string-equality check and my two-run test setup pass now! Thanks for fixing this

@breznak
Copy link
Member

breznak commented Sep 3, 2020

Glad to hear! Thanks to everybody investigating the issue 👍

string-equality check and my two-run test setup

a possible TODO to add to our tests, I like the example with restarted run from deserialized state.

Following #821 also affects serialization for the "whole HTM" (ScalarEncoder) and #799 .
Closing this for now, reopen if needed.

@breznak breznak closed this as completed Sep 3, 2020
@breznak breznak mentioned this issue Sep 3, 2020
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working high serializable TM
Projects
None yet
Development

No branches or pull requests

6 participants