Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: filter - enhancements in subscription management #3198

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Ivansete-status
Copy link
Collaborator

@Ivansete-status Ivansete-status commented Dec 5, 2024

Description

Railgun notified about filter issues. This PR tackled that.
Railgun informed that after 4 days working properly, the nodes stopped sending messages. There were also issues that caused a max of 100 messages being sent over one connection and the reason for that was that we were dialPeering on every single message the filter server sent to the filter subscribers.

Changes

  • waku_filter_v2: idiomatic way run periodic subscription manager
  • filter subscriptions: add more debug logs
  • filter: make sure the custom start and stop procs are called
  • make sure filter protocol is started if it is mounted
  • filter: dial push connection onsubscribe only
  • waku/waku_filter_v2/client.nim: make the client to behave as a server for the FilterPush messages sent by filter server

How to test

  1. Start a nwaku with wss enabled
  2. lab.waku.org -> examples/experimental/railgun-repro -> npm run dev
  3. Press F12 and you should see messages comming without interruption

Issue

closes #3191
closes #3192

* waku_filter_v2: idiomatic way run periodic subscription manager
* filter subscriptions: add more debug logs
* filter: make sure the custom start and stop procs are called
* make sure filter protocol is started if it is mounted
* filter: dial push connection onsubscribe only
@Ivansete-status Ivansete-status changed the title filter: enhancements in subscription management fix: filter - enhancements in subscription management Dec 5, 2024
Copy link

github-actions bot commented Dec 5, 2024

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:3198

Built from 9064bb8

Copy link
Contributor

@gabrielmer gabrielmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woooow amazing PR! 😍

Thanks so much!

waku_px_errors.inc(labelValues = [exc.msg])
callResult = (
status_code: PeerExchangeResponseStatusCode.SERVICE_UNAVAILABLE,
status_desc: some($exc.msg),
)
finally:
debug "JJJJ"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to remove this, although I liked more when we used jamon for random debug logs 😆

@@ -272,6 +286,7 @@ proc initProtocolHandler(wpx: WakuPeerExchange) =
).isOkOr:
error "Failed to respond with TOO_MANY_REQUESTS:", error = $error
# close, no data is expected
debug "JJJJ"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here lol

Copy link
Contributor

@NagyZoltanPeter NagyZoltanPeter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree on re-thinking how we manage message-push and its requirements.
Can you add some explanation how this solution exactly solves the issue appeared for Railgun? I see some hole in the changed connection management as is now.
Can we brainstorm on this?


if not wf.subscriptions.isSubscribed(peerId):
debug "pinging peer has no subscriptions", peerId = peerId
error "pinging peer has no subscriptions", peerId = peerId
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not pretty sure if it is a real error here. I think it can be a real scenario when a client first pings than start subscribe in case of negative answer. Like edge nodes go sleep and when come back it starts a ping to check if it needs to renew the subs or not.

# TODO: check if this condition is valid???
if pubsubTopic.isNone() or contentTopics.len == 0:
error "pubsubTopic and contentTopics must be specified", peerId = peerId
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, similarly here, this is an error response here but it is not an error (maybe a warn that the service might be configured with larger pools).

return err(
FilterSubscribeError.badRequest("pubsubTopic and contentTopics must be specified")
)

if contentTopics.len > MaxContentTopicsPerRequest:
error "exceeds maximum content topics", peerId = peerId
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@@ -236,7 +250,7 @@ proc handleMessage*(
let subscribedPeers =
wf.subscriptions.findSubscribedPeers(pubsubTopic, message.contentTopic)
if subscribedPeers.len == 0:
trace "no subscribed peers found",
error "no subscribed peers found",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is an error if there is noone that subscribed to that message. Even further in general it is an annoying log as trace. Rather we log only positives in such cases.

return err("node has reached maximum number of subscriptions")
return err("node has reached maximum number of subscriptions: " & $(s.maxPeers))

let connRes = await peerManager.dialPeer(peerId, WakuFilterPushCodec)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the intent to avoid unnecessary dials over message push cycles, although opening a stream at subscribe seems to me a over optimization + an attack vector where a harmful actor may subscribe to filters will never match and eat up our connections pool.
Also it can be that a subscription may not result in push in a close future without any intent to harm the service.

# Protocol specifies no response for now
return
## Notice that the client component is acting as a server of WakuFilterPushCodec messages
while not conn.atEof():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How exactly this helps on being a server like? I though it is libp2p duty to maintain it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants