Add POMDP example and change HMM example to work with `DiscreteTransition` and `DirichletCollection` #12

wouterwln · 2025-02-06T12:17:04Z

@FraserP117 will check the POMDP tutorial, but I think it is in a mergeable state already

FraserP117 · 2025-02-07T09:19:48Z

This is really great stuff! I've run the example and played around a bit. I'll lay out my thoughts and queries regarding your explanations below. Note: I have only looked at the POMDP example thus far.

Environment Setup: I found the environment setup with RxEnvironments perfectly transparent. The link to the docs on WindyGridWorld was very helpful.

Model Setup:

2.1. You say: "We will use the DiscreteTransition node in RxInfer to define the state transition model." However, I could not find the `DiscreteTransition` node definition or its (dev/stable) docs in RxInfer or ReactiveMP. My guess is that this is what the `Transition` node used to be called? Even though you consistently use `DiscreteTransition`, everything works fine, so I feel like I'm misunderstanding something.

2.2. I would really love to see the docs on the `Transition`/`DiscreteTransition` node. Regarding your explanation of the `Transition` node, I'm not sure what an "interface" is. I take it that `in` and `out` in `out ~ Transition(in, parameters, additional_interfaces...)` are tensors of some kind, with `in` being, for example, the prior on the transition model `B` and `out` being the posterior for the transition model? If so, should an "interface" just be taken to mean any argument to the node?

2.3. Regarding the model definition, I can basically work out what everything is, though it would be nice to add comments that explicitly label each variable.

Variational Constraints: I found these straightforward, and your explanation was sufficient. My only issue is that I don't understand why we have:

init = @initialization begin
    q(A) = DirichletCollection(diageye(25) .+ 0.1)
    q(B) = DirichletCollection(ones(25, 25, 4))
end

instead of

init = @initialization begin
    q(A) = DirichletCollection(diageye(36) .+ 0.1)
    q(B) = DirichletCollection(ones(36, 36, 4))
end

Given that the WindyGridWorld is a (6, 6) grid.

Priors on Model Params: This makes sense and I think you explained it well. Again, I question why `p_A`, `p_B` and the three methods defined here use 25 instead of 36. Perhaps the goal is to be found within a smaller radius than the full (6, 6) grid, so we don't need to model the whole grid? Perhaps you originally made the grid (5, 5) and then changed your mind? More likely, I'm just misunderstanding something!

Main Loop: I think this is fairly straightforward to understand, but I have some thoughts. I like the explanation of the chosen order of operations: we first take an action, then observe, and then update our beliefs. Regarding the actual call to `infer()`, I do not understand why we have:

...
m_A = mean(p_A),
m_B = mean(p_B)
...

I get that you say: "The real reason we did this is because we do not want messages from the future to influence the model parameters, instead only learning the model parameters from past data." Fair enough, and thank you for that context; otherwise I would have no idea what was going on here. Still, I don't see why future messages would influence inference over the model parameters if this step were omitted, nor why that would be such a bad thing in principle. Perhaps the experimental setup requires it in this specific case?

There are one or two trivial spelling mistakes, but that's a real nit-pick. What would help me most now is to see the docs on `Transition`/`DiscreteTransition`, in addition to digesting the source code, which I can do at any time.

Many thanks again! I'm keen to help out wherever on this.

FraserP117 · 2025-02-07T09:21:49Z