Standardizing APIs for RxInfer-based Agents #416

albertpod · 2025-01-31T13:01:31Z

albertpod
Jan 31, 2025
Maintainer

We're looking to establish standardized APIs for implementing Bayesian (AIF) agents using RxInfer. While there's ongoing research about the implementation details of planning functions, having a clear interface would help developers start building agents with our toolbox.

Current State

Currently, we have scattered examples like the mountain car and drone simulations that demonstrate agent capabilities, but they lack a developer-friendly API structure. The implementations mix concerns and require deep understanding of the internals.

Proposed API Structure

Here's a proposed high-level API structure for discussion:

Core Agent Interface

abstract type BayesianAgent end

struct RxInferAgent <: BayesianAgent
    model
    constraints
    state
    config
end
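
A minimal sketch of how the struct above could be made concrete, assuming keyword construction is wanted and that `state` must be replaceable in place. The type parameters, the defaults, and the use of `Base.@kwdef` are assumptions for illustration and are not part of the proposal.

```julia
abstract type BayesianAgent end

# Hypothetical concrete agent: the fields mirror the proposal above.
# `Base.@kwdef` generates a keyword constructor; `mutable` lets `state` be reassigned.
Base.@kwdef mutable struct RxInferAgent{M,C,S,K} <: BayesianAgent
    model::M
    constraints::C = nothing      # assumed default: no constraints
    state::S
    config::K = NamedTuple()      # assumed default: empty configuration
end

# Placeholder values stand in for a real model and belief state.
agent = RxInferAgent(model = :my_model, state = [0.5, 0.5])
agent.state = [0.9, 0.1]          # replace the belief in place
```

Parametric fields keep the struct concretely typed, which usually matters for performance in Julia; untyped fields as in the proposal would also work.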

Key Operations

I propose standardizing these core operations:

Plan: Compute optimal actions based on current beliefs and goals

plan(agent::BayesianAgent, horizon::Int) -> Vector{Action}

Act: Execute a single action from the plan

act!(agent::BayesianAgent) -> Action

Learn: Update beliefs based on observations

learn!(agent::BayesianAgent, observation::Observation) -> Nothing

Slide: Prepare for the next timestep

slide!(agent::BayesianAgent) -> Nothing
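
To make the four operations concrete, here is a toy sketch with no inference in it at all: an agent walking on the integer line toward a goal. `LineAgent`, its fields, and `run_line_demo` are hypothetical names. The sketch also assumes that `plan` caches its result on the agent so that `act!` can take the next action without arguments.

```julia
abstract type BayesianAgent end

# Hypothetical toy agent on the integer line; all names are illustrative only.
mutable struct LineAgent <: BayesianAgent
    position::Int        # current belief about the agent's position
    goal::Int            # desired position
    queue::Vector{Int}   # cached plan; actions are -1, 0 or +1
    t::Int               # current time step
end
LineAgent(position, goal) = LineAgent(position, goal, Int[], 1)

# Plan: greedy steps toward the goal, cached on the agent and also returned.
function plan(agent::LineAgent, horizon::Int)
    pos = agent.position
    empty!(agent.queue)
    for _ in 1:horizon
        a = sign(agent.goal - pos)
        push!(agent.queue, a)
        pos += a
    end
    return copy(agent.queue)
end

# Act: take the first action of the cached plan.
act!(agent::LineAgent) = popfirst!(agent.queue)

# Learn: the observation is the true position here, so the belief is overwritten.
learn!(agent::LineAgent, observation::Int) = (agent.position = observation; nothing)

# Slide: advance the time index.
slide!(agent::LineAgent) = (agent.t += 1; nothing)

function run_line_demo(n_steps::Int)
    agent = LineAgent(0, 3)
    true_position = 0                 # stand-in for an environment
    for _ in 1:n_steps
        plan(agent, 10)
        action = act!(agent)
        true_position += action
        learn!(agent, true_position)
        slide!(agent)
    end
    return agent
end
```

In a real agent, `plan` and `learn!` would presumably each wrap a call to RxInfer's `infer`; the toy versions only show how the calls fit together.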

Example Usage

Note: The following examples use an imperative loop structure for clarity. Our end goal is to provide a fully reactive implementation. These examples serve as a conceptual starting point to illustrate the core operations.

# Initialize agent
agent = RxInferAgent(
    model = my_model,
    constraints = my_constraints,
    state = initial_state,
    config = agent_config
)

# Main agent loop
for t in 1:n_steps
    # Plan future actions
    plan(agent, horizon=10)
    
    # Execute next action
    action = act!(agent)
    
    # Get environment feedback
    observation = environment.step(action)
    
    # Update agent's beliefs
    learn!(agent, observation)
    
    # Prepare for next timestep
    slide!(agent)
end

Open Questions

Configuration Options: What configuration options should be available?
Multi-Agent Support: How should we extend this API for multi-agent scenarios?

Request for Comments

We'd love to hear your thoughts on:

The overall API structure
Additional operations that should be included
Alternative approaches you've found effective
Use cases we should consider
Integration with RxEnvironments.jl

Please share your experiences and suggestions below!

Note: This is an initial proposal to start the discussion. All interfaces are subject to change based on community feedback and practical implementation experience.

FraserP117 · 2025-02-01T13:23:46Z

FraserP117
Feb 1, 2025
Collaborator

Thanks @albertpod. This is very exciting, I would like to help out with this. Standardizing the agent creation API will be an immensely valuable contribution!

I've begun some very tentative explorations in this direction by investigating how best to hook up an RxInfer model within an RxEnvironments agent/environment dyad. @wouterwln has been invaluable counsel to that end. You can access my repo of these explorations here. TL;DR on my musings is: yes, a standard imperative main loop is probably best for initial implementation and no, I have not managed RxInfer model integration yet.

I'll enumerate my thoughts and responses to your specifics as per their level of abstraction.

Global Considerations

The most abstract consideration that I would like to raise is where best to implement this agent API. Within RxInfer.jl (proper) or say RxEnvironments.jl? Should this even constitute a totally new package: "RxAgents.jl"; using RxInfer.jl and RxEnvironments.jl as dependencies?

Personally, I think that it might be wisest to implement this in a new "RxAgents.jl" package. An "RxAgents.jl" agent could then fairly seamlessly instantiate an RxEnvironments.jl RxEntity. With the subsequent specification of the environment "Entity" and the definition of the interface functions from RxEnvironments, the full dyad would be complete - regardless of the commitment to an imperative/reactive paradigm.

Core Agent Interface

I like this proposed structure. I'm afraid I don't have any other thoughts on this right now.

Key Operations

Plan: I like this, and I think this makes sense. I suspect the intent is to return a sequence of temporally contiguous actions, the first of which is the action predicted for the next time step: $[a_{t+1}, a_{t+2}, \ldots, a_{T}]$?
Both Act and Learn seem like excellent ideas. No fruitful thoughts on these just yet.
Slide: I assume that slide will simply implement the same procedure as laid out by @ThijsvdLaar and @bertdv in this very helpful paper.

Open Questions:

Configuration Options: I don't have anything useful to say on this just yet. I'm not actually sure what this could refer to, other than functional-form and factorisation constraints - in (potential) addition to whether the agent is supposed to operate in the imperative/reactive context.
Multi-Agent Support: Perhaps we can avoid this problem (here) by simply letting RxEnvironments.jl handle it? As far as I understand, RxEnvironments.jl already has the ability to handle "Entities" reactively.

Final Comments:

Regarding the overall structure, I have no enlightening comments. I do wonder how/if specific methods will have to change to deal with reactivity. I think it is best to focus on an imperative implementation first. I can't think of any additional operations that might be nice to include, other than perhaps optionally recording the agent's history of states in addition to its current state. I haven't found/used any alternative approaches, though @kobus78 may well have done so in the course of his extensive implementations. Regarding integration with RxEnvironments, I am very much in favour of this. I think that the agent API could neatly constitute an RxEntity, and it seems to me that the question of multiple-agent support could be partitioned off to RxEnvironments.
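
As a rough illustration of the optional state history mentioned above, here is a small sketch. `RecordedState` and `update!` are hypothetical names and are not part of RxInfer or RxEnvironments.

```julia
# Hypothetical container that keeps past states next to the current one.
mutable struct RecordedState{S}
    current::S
    history::Vector{S}
    record::Bool          # switch recording on or off
end
RecordedState(initial::S; record::Bool = true) where {S} =
    RecordedState{S}(initial, S[], record)

# Replace the current state, archiving the old one if recording is enabled.
function update!(rs::RecordedState{S}, new_state::S) where {S}
    rs.record && push!(rs.history, rs.current)
    rs.current = new_state
    return rs
end

rs = RecordedState(0.0)
update!(rs, 1.0)
update!(rs, 2.0)
# rs.current is now 2.0 and rs.history holds the two earlier states
```

An agent's `state` field could hold such a container, so that recording stays optional and the core operations do not need to know about it.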

I am very keen to assist with any aspect of this endeavour, going forward. I hope these thoughts are somewhat useful.


bvdmitri · 2025-02-03T08:38:37Z

bvdmitri
Feb 3, 2025
Maintainer

Thanks for bringing this up, @albertpod! I also mostly agree with @FraserP117, but I would still make some practical changes to the for loop, as it looks quite odd to me.

Why is the return value of the plan function not used, even though it is supposed to return Vector{Action} as described in the API? Why does an agent act! on itself, and why does it return an action? What is slide! supposed to mean to a newcomer? Why is environment.step used while everything else is a pure function? And why does preparation happen at the end?

Of course, I know the answers to these questions, but it would be better if they didn’t arise in the first place.

Here’s my proposed revision, which addresses these concerns:

agent = Agent()
environment = Environment()

# Main agent loop
for t in 1:n_steps
    # Prepare for the next time-step, can do nothing on the first iteration 
    # and "slide" on the next ones or do whatever it wants essentially
    prepare!(agent)

    # IMO `plan` should return "something", namely a "plan" or a series of actions
    actions = plan!(agent, horizon=10)
    
    # Execute next action (either first, or some other "picking" mechanism)
    act!(environment, agent, first(actions)) 
    
    # Get environment feedback, different agents observe different stuff
    observation = observe!(environment, agent)
    
    # Update agent's beliefs about the internal state of the world to be able to `act!` later on
    learn!(agent, observation)
end

which translates to the following core API:

Prepare: Do whatever it needs to prepare an agent, be it a slide function or something else

prepare!(agent::BayesianAgent) -> Nothing

Plan: Compute optimal actions based on current beliefs and goals

plan!(agent::BayesianAgent, horizon::Int) -> Vector{Action}

Act: Execute a single action

act!(environment::Environment, agent::BayesianAgent, action::Action) -> Nothing (or Success/Fail status?)

Observe: Observe the current snapshot of the environment

observe!(environment::Environment, agent::BayesianAgent) -> Observation

Learn: Update beliefs based on observations

learn!(agent::BayesianAgent, observation::Observation) -> Nothing
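
A toy sketch of how these five operations could fit together, with no inference involved. `LineWorld`, `Walker`, and `run!` are hypothetical names that do not come from RxInfer or RxEnvironments, and `horizon` is taken as a keyword argument to match the call in the loop above.

```julia
abstract type BayesianAgent end
abstract type Environment end

# Hypothetical toy types, for illustration only.
mutable struct LineWorld <: Environment
    position::Int
end

mutable struct Walker <: BayesianAgent
    belief::Int      # believed position
    goal::Int
    steps::Int       # number of iterations started
end

# Prepare: placeholder for "slide" logic; here it only counts iterations.
prepare!(agent::Walker) = (agent.steps += 1; nothing)

# Plan: returns the action sequence instead of hiding it inside the agent.
function plan!(agent::Walker; horizon::Int)
    pos, actions = agent.belief, Int[]
    for _ in 1:horizon
        a = sign(agent.goal - pos)
        push!(actions, a)
        pos += a
    end
    return actions
end

# Act, Observe, Learn: the environment owns the true position.
act!(env::LineWorld, ::Walker, action::Int) = (env.position += action; nothing)
observe!(env::LineWorld, ::Walker) = env.position
learn!(agent::Walker, observation::Int) = (agent.belief = observation; nothing)

function run!(env::LineWorld, agent::Walker, n_steps::Int)
    for _ in 1:n_steps
        prepare!(agent)
        actions = plan!(agent, horizon = 10)
        act!(env, agent, first(actions))
        learn!(agent, observe!(env, agent))
    end
    return agent
end
```

Passing the environment explicitly to `act!` and `observe!` keeps the agent free of any reference to the world it lives in, which is one possible reading of the signatures above.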

I don't have a strong opinion on whether to use ! at the end or not. We can remove them if most of us don't like them.

We also need some feedback from @wouterwln since he designed the API for RxEnvironments.jl and perhaps my proposal does not perfectly align with the API in RxEnvironments.jl.


wouterwln · 2025-02-03T10:52:45Z

wouterwln
Feb 3, 2025
Maintainer

I'll try to structure my feedback as well as I can. I like the idea of the API, and I think indeed, as Fraser said, we should incorporate it in a separate package RxAgents.jl as much as we can. The idea of RxEnvironments.jl is that you wouldn't have to call observe! explicitly, and that RxEnvironments.jl does this for you. This does mean that you will get an additional get_latest_observation line or something, which is maybe exactly what you envision with observe. The API would look something like this:

last_action = 0.0
for t in 1:n_steps
    last_observation = get_latest_observation(agent)
    learn!(agent, last_observation, last_action)
    actions = plan(agent, n_steps - t)
    # Remember the executed action so the next `learn!` call can use it
    last_action = first(actions)
    act!(agent, last_action)
end

Now we have to think about some stuff. In my current POMDP implementation, I have the following structure:

@model function planning(p_A, p_B, y_current, y_future, T, p_s, u_current, goal_state)
    A ~ p_A
    B ~ p_B
    prev_state ~ p_s
    
    # Parameter inference step
    current_state ~ Transition(prev_state, B, u_current)
    y_current ~ Transition(current_state, A)
    previous_state = current_state

    # Planning step
    for i in 1:T
        u[i] ~ Categorical([0.2, 0.2, 0.2, 0.2, 0.2])
        s[i] ~ Transition(previous_state, B_future[i], u[i]) 
        y_future[i] ~ Transition(s[i], A_future[i])
        previous_state = s[i]
    end
    s[end] ~ goal_state
end

(Ignoring backwards messages from the planning stage towards A and B) this does parameter inference over A and B and planning in the same inference procedure (so learn! and plan simultaneously). We could incorporate this in our API as well, but I don't know how to name this. I think we can hide a lot of the boilerplate code though. Now, my inner loop looks a bit like this:


# Run inference with current A and B
result = infer(
    model = planning(
        p_A = A, 
        p_B = B,
        T = T - t,
        p_s = p_s,
        goal_state = Categorical([0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0]),
    ),
    data = (
        u_current = UnfactorizedData(prev_u),
        y_current = UnfactorizedData(current_observation),
        y_future = UnfactorizedData(fill(missing, T - t))
    ),
    constraints = constraints,
    initialization = init,
    iterations = 10
)

# Update beliefs about environment dynamics
A = last(result.posteriors[:A])
B = last(result.posteriors[:B])

# Get next state belief and action
p_s = last(result.posteriors[:current_state])
a = mode(first(last(result.posteriors[:u])))

And most of this is stuff we can hide. Let's discuss what we can hide and what we cannot.

The API for RxEnvironments is quite easy I would say. There is send!, which sends data from one place (the agent) to some other place (the environment) and which could trigger a response. So act!(agent, action) for me would be:

act!(agent, action) = send!(get_environment(agent), agent, action)

or something similar, at least a direct alias. This will trigger the environment to send an observation back to the agent which can be accessed either explicitly, or we can subscribe to it. In the end we can also do something like this:

function do_inference_and_planning(agent, observation)
    learn!(agent, observation)
    new_plan = plan(agent, goal)
    act!(agent, first(new_plan))
end

RxEnvironments.subscribe_to_observations!(agent, x -> do_inference_and_planning(agent, x))

And do some smart stuff with receding time horizons and such. RxEnvironments keeps these options open for you and I think we can write boilerplate on top of RxInfer and RxEnvironments which marries these interfaces. Let me know what you guys think.
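
To illustrate the subscription idea without depending on the real package, here is a plain-Julia sketch of a callback-driven loop. `ToyEnvironment`, `subscribe!`, and `send_action!` are hypothetical stand-ins and are not the RxEnvironments API.

```julia
# Hypothetical stand-in for a reactive environment; NOT the RxEnvironments API.
mutable struct ToyEnvironment
    position::Int
    subscribers::Vector{Function}
end
ToyEnvironment(position::Int) = ToyEnvironment(position, Function[])

subscribe!(env::ToyEnvironment, callback) = (push!(env.subscribers, callback); nothing)

# Receiving an action updates the environment and pushes the new observation out.
function send_action!(env::ToyEnvironment, action::Int)
    env.position += action
    foreach(cb -> cb(env.position), env.subscribers)
    return nothing
end

function run_reactive_demo(goal::Int)
    env = ToyEnvironment(0)
    seen = Int[]
    subscribe!(env, obs -> begin
        push!(seen, obs)                                     # "learn": record the observation
        obs == goal || send_action!(env, sign(goal - obs))   # "plan" and "act"
    end)
    send_action!(env, 0)    # a no-op action kicks off the loop
    return seen
end

run_reactive_demo(3)        # observations seen on the way to the goal
```

Each observation triggers the next action, so there is no explicit for loop. In this synchronous toy the callbacks nest recursively, which a real reactive implementation would presumably avoid.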

1 reply

bvdmitri Feb 4, 2025
Maintainer

After reading this I do think we should indeed use the reactive API from RxEnvironments.jl more

apashea · 2025-02-04T03:52:39Z

apashea
Feb 4, 2025

Thank you for your work on this and for accepting feedback. I am in much agreement with the prior points, especially Albert's loop logic/ordering (the # Initialize agent cell). Clear differences between state inference, policy/planning, and learning, in that order, are great. A few key points on topics not yet raised, just so you get some feedback on other areas:

For loops, the notion is that the agent will impact the environment with its action, thus influencing the outcome of the next observation. I think everyone has this in mind, but since I've been giving feedback to Dmitry and others on the discrete state-space development for some time, I'm assuming that being very, very plain at this point might be helpful. We want to avoid staying in what could otherwise virtually be an HMM scenario with extra internal inference steps (unless a particular user opts for that); dynamic interaction with an environment is the attractor.
In MATLAB and pymdp, which both still use the VFE/EFE formulation, state inference occurs by minimizing VFE, while policy inference scores each policy by its EFE, i.e. its information-gain and pragmatic-value terms. (A policy, denoted pi, is the term for a sequence of actions in the textbook we all read and in most of the published ActInf literature.) The posterior over policies then follows q(pi) = softmax(ln(E) - F - gamma*G): each policy's posterior probability is a softmax of its EFE G weighted by the policy precision parameter gamma (which may itself be modulated), its VFE F, and its prior E. You will probably find interest from many people in ActInf if you implement some of these core aspects of what they actually study and use in their research, or else make a good case for using GFE, or, even better (no pressure), let users specify their own scheme. Bear in mind that many researchers out there are doing their PhDs, post-docs, etc. in computational neuroscience, cognitive science, robotics, psychiatry/mental health, machine learning, and so on. Many are applying reinforcement learning to problems and finding ActInf to be the framework they want to use, so having some kind of environment.step() function will be rather clear to that subsection of folks, more so than the act!() sort of nomenclature (of course it's Julia and your own library; I'm indifferent on this particular point but am letting you know). The same goes for wrapping environments so tightly that a person programming, say, an artificial behavioral experiment can't easily tweak and incrementally test their environment without reading a chapter's worth of information.
These sorts of folks I'm referring to are indeed scientific and can learn the math; they are not trained nor ready necessarily for making a course out of learning what is ultimately a new, very particular, in-development tool without it being a bit easier and more familiar to get into it. This is the case, I believe, for those still in education and training, already in industry and looking to ActInf for real-world applications, and elsewhere. You all should of course develop what you want to develop. I just want to provide some additional perspective, as someone having led the textbook study at the Institute, done large group workshops and worked on projects with others in pymdp, messages I receive regarding how to do ____ in the library, etc.
These sorts of topline functions/APIs for getting right into it are great. Meanwhile, it's the flexibility of RxInfer that will attract those who want to move beyond MATLAB/pymdp as they learn and study common notions like hierarchical, hybrid (discrete/continuous), multi-timescale, and other interesting architectures, which MATLAB/pymdp don't directly support without a significant amount of rigging, if at all.
On the multi-agent question, I've personally just used lists, dicts, and other iterable objects to store multiple agents, and then either iterated through them or used loop conditions under which a particular agent receives an observation or reacts based on some attribute that it and/or the environment has at time t. These are rather simple. Some of the pymdp devs and I have used networkx as an interesting way to link up agents in definitive networks, though that's a bit of a stretch further than necessary. Multi-agent systems are still kind of "on the rise", so if you have something particularly clever or that meshes well with your framework, you might be able to set a trend on this front. Nonetheless, looking at recently published papers on ActInf (and RL, generally) multi-agent schemes and tasks will give interesting food for thought.
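
The policy posterior mentioned in the list above can be sketched in a few lines. This assumes the common form in which the log prior enters with a positive sign, q(pi) = softmax(ln E - F - gamma*G); the function names and all numbers are made up for illustration.

```julia
# Numerically stable softmax over a vector of scores.
function softmax(x::AbstractVector{<:Real})
    e = exp.(x .- maximum(x))
    return e ./ sum(e)
end

# q(pi) = softmax(ln E - F - gamma * G), with one entry per policy.
policy_posterior(G, F, E; gamma = 1.0) = softmax(log.(E) .- F .- gamma .* G)

G = [2.0, 1.0, 3.0]      # expected free energy per policy (made-up numbers)
F = [0.5, 0.5, 0.5]      # variational free energy per policy (made-up numbers)
E = [1/3, 1/3, 1/3]      # uniform prior over policies
q = policy_posterior(G, F, E; gamma = 2.0)
best = argmax(q)         # index of the most probable policy
```

With F and E identical across policies, only G distinguishes them, so the policy with the lowest expected free energy gets the highest posterior probability, and a larger gamma sharpens that preference.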

I've given a fair amount of feedback on this so, just wanted to really share my thoughts on these other ends while I'm still available. Again, great work, will watch developments.


LearnableLoopAI · 2025-02-06T12:47:41Z

LearnableLoopAI
Feb 6, 2025
Collaborator

I would love to see this standardization effort embrace an even wider scope. As I see it:

- The AI field's main concern is Machine Learning.
- Machine Learning's main concerns are the 3 pillars:
  - Supervised Learning (Regression, Classification),
  - Unsupervised Learning (Dimensionality reduction, Clustering), and
  - Sequential/Series Learning (Reinforcement Learning, Active Inference).

As 'traditional' machine learning seems to be growing towards the Bayesian approach, I would love to see our thinking also embrace the above, so that the API will be generic enough, in both semantics and syntax, to cover all three pillars. Indeed, there are examples available already that showcase the applicability of RxInfer to the non-sequential areas. I think RxInfer could really become the tool for everyone, especially now that a Python implementation is also planned. Here is an example of what I mean. To consider the next data point in the case of:

Supervised Learning: the next state starts with provision: acquisition of another state/datapoint
Unsupervised Learning: the next state starts with provision: acquisition of another state/datapoint
Sequential/Series Learning: the next state starts with provision: transition from previous state/datapoint

To get a fuller picture of my thinking, please have a look at the following 2 posts, in particular the section 'Symbols/Nomenclature/Notation (KUF)' as well as the section 'MODELING' where the implementation happens:

https://learnableloop.com/posts/LitterModel_PORT.html
https://learnableloop.com/posts/ManagerEngagements_PORT.html

Best wishes with your valuable work!

