Client side routing implementation #205

madchicken · 2024-12-16T10:51:24Z

This PR is a draft implementation of the client-side routing for new4j clusters.
The PR implements the ROUTE message for both the stable bolt implementation and the unstable one. Tests are being performed on the stable version of the protocol since the new one is still giving me problems in deserialization.
What the PR is still missing:

Testing: I am currently testing the driver on a real cluster but I have no guarantee of the performance and stability of the routing algorithm
~~- [ ] Bookmarks are not supported at the moment (the array is always sent empty)~~
The maximum supported version of the protocol is now 4.4 but we miss some of the features the protocol should expose from this version (the imp_user should be added to the transaction begin message)
~~- [ ] Only the round-robin algorithm is currently implemented as a routing strategy. Maybe we want to implement a LeastUsed algo and make it configurable.~~
~~- [ ] There is no handling of the initial routing context: it is always empty.~~
The retry mechanism should probably be reviewed with routing: it does not make sense to retry an operation on a failing connection, we should select another server from the cluster instead.

Additional Notes:

The lock mechanism for refreshing the routing table is sub-optimal (to use an euphemism). We should try to be smarter and use a different method to check for a new routing table.
~~- Extra fields in the route message are not implemented yet (part of v4.4)~~ Protocol version is now 4.4

knutwalker · 2024-12-18T17:07:57Z

Hey, thanks for this PR, this helps a lot!

I went over the changes and don't have anything to add, but I also did look in detail at the actual routing protocol and algorithms.

I'm ok with merging something before dealing with all limitations. In particular, I think it's fine to not support the routing context, bookmarks, only have partial 4.3 support, only having one routing strategy at the beginning. We don't need to have everything figured out upfront and can do this in steps/multiple PRs.
I would also be ok with only adding it to the new protocol implementation.

I've been thinking about the testing a bit (or a lot) lately; I don't like the amount of integration tests compared to what the unit tests do. Especially when testing bolt protocol details of a non-happy path, it can be either impossible or very difficult to setup.
The current setup is fine to actually run something against a one-and-done database, but it doesn't scale as well as I would like it to be.

I have half an idea to rewrite the core parts with the connection IO and pooling being abstracted out. I would like to be able to do more simulation testing where exact IO operations can be simulated in the tests without having to have an actual db to setup. I'd also like to move async out from the core to some outer layer. Maybe this goes in the sans-io direction, maybe it'll look a bit different.

madchicken · 2024-12-19T08:53:34Z

Thanks for the reply @knutwalker, I will remove the implementation of the client-side routing for the stable protocol and leave it only when the unstable-bolt-protocol-impl-v2 feature is active. Now that I have the workaround for the dates deserialization bug I can do some more testing. One thing I need to understand is this: when selecting the server in the routing table, should I use the bolt protocol or stick with neo4j for the final connection? I think this is very important for the server, isn't it?

…-v2 feature Formatting

knutwalker · 2024-12-19T15:28:47Z

I will remove the implementation of the client-side routing for the stable protocol and leave it only when the unstable-bolt-protocol-impl-v2 feature is active.

great!

One thing I need to understand is this: when selecting the server in the routing table, should I use the bolt protocol or stick with neo4j for the final connection? I think this is very important for the server, isn't it?

There shouldn't be a need for any scheme. The first connection uses the scheme to determine the initial routing and the encryption to use. When connecting to a server from the routing table, you use the same encryption settings (and, when supported, routing context) and directly connect to that socket address, without going through the bolt/neo4j scheme parsing again. That is what the product drivers do, at least. That looks like this requires some changes to the pool creation, essentially using the same ConnectionInfo, but with only the host/port replaced.

madchicken · 2024-12-19T15:41:47Z

There shouldn't be a need for any scheme. The first connection uses the scheme to determine the initial routing and the encryption to use. When connecting to a server from the routing table, you use the same encryption settings (and, when supported, routing context) and directly connect to that socket address, without going through the bolt/neo4j scheme parsing again. That is what the product drivers do, at least. That looks like this requires some changes to the pool creation, essentially using the same ConnectionInfo, but with only the host/port replaced.

Yes, sorry...I realized that too.

madchicken · 2024-12-19T18:36:27Z

@knutwalker please take a look at my last commit where I had to modify the main Graph interface. I had essentially to expose the Operation::Read/Write parameter only for the new protocol implementation. I hope it makes sense.
I also added a note in the description of the PR about the retry mechanism.

…ion to perform (read or write)

…and use slices to select the right server

…ey have multiple addresses) - When creating the pool, use the common config but change the uri with the resolved server one

First basic implementation of client side routing

a8904a9

madchicken force-pushed the client-side-routing branch from 2184948 to a8904a9 Compare December 19, 2024 11:47

New client-side routing enabled only with unstable-bolt-protocol-impl…

79722b5

…-v2 feature Formatting

madchicken force-pushed the client-side-routing branch from 5eacd78 to abf4dd8 Compare December 19, 2024 18:34

Modify the graph interface to make possible select the type of operat…

98bc0ce

…ion to perform (read or write)

madchicken force-pushed the client-side-routing branch from abf4dd8 to 98bc0ce Compare December 19, 2024 18:36

madchicken added 7 commits December 29, 2024 18:05

Remove some declarations left in the stable implementation

2c5a282

Optimization: store readers, writers and routers inside the registry …

5d13a5e

…and use slices to select the right server

Fix serialization of route message

cf1f349

- Resolve each server in the routing table to a vector of servers (th…

d0289a2

…ey have multiple addresses) - When creating the pool, use the common config but change the uri with the resolved server one

Implement v4_4 to better support route message

0c561a4

Simplify Graph code

aa213a6

Consider a failing connection from pool a permanent error

73b9591

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Client side routing implementation #205

Client side routing implementation #205

madchicken commented Dec 16, 2024 •

edited

Loading

knutwalker commented Dec 18, 2024

madchicken commented Dec 19, 2024

knutwalker commented Dec 19, 2024

madchicken commented Dec 19, 2024

madchicken commented Dec 19, 2024

Client side routing implementation #205

Are you sure you want to change the base?

Client side routing implementation #205

Conversation

madchicken commented Dec 16, 2024 • edited Loading

knutwalker commented Dec 18, 2024

madchicken commented Dec 19, 2024

knutwalker commented Dec 19, 2024

madchicken commented Dec 19, 2024

madchicken commented Dec 19, 2024

madchicken commented Dec 16, 2024 •

edited

Loading