-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Hamming Distance KNN similarity metric for long property #193
base: master
Are you sure you want to change the base?
Conversation
Hmm looks like CI needs approval from first-time contributors. I'll boot up my linux box later to work on this! |
public static double longMetric(long left, long right) { | ||
return Long.bitcount(left ^ right); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: normalize to 0.0-1.0 range.
@htmlboss what problems are you having? Several people working on this project use M1s. There were some problems with double precision but I think those were fixed--I'll ask. Feel free to open an issue for whatever problems you're seeing. |
For the record, I'm working with an M1 Mac and I'm using the Zulu distribution of the JDK, currently version However, it is possible to use JDK17 as well, and we routinely test for that also. There I would recommend the Temurin distribution, |
Thanks for the suggestions! I guess my IntelliJ Idea configuration isn't setup properly 😝 @Mats-SX I'll give your versions a shot when I get a chance! |
0d3e307
to
d297bdb
Compare
d297bdb
to
3a6e285
Compare
3a6e285
to
23a0a4c
Compare
Hey, Would love to know if there was any further blocker, if we can help you or if you just ended up with a different solution :) |
This is a draft PR so I can leverage the github action to validate my tests, since the gradle setup for this project does not work on Apple silicon (M1). Not a Java expert so I can't quickly figure out a fix 🤷 Feel free to take a look, but I will mark this as ready for review when my changes are complete!
My end goal is to enable neo4j to build a KNN graph for finding top K similar images by leveraging perceptual hashes (pHash).
A pHash is simply a 64-bit unsigned integer (i.e.
long
in Java). I will not be adding any pHash-specific code to this PR. Any two pHashes can be compared for similarity by calculating the Hamming Distance between them.Computing the Hamming Distance isn't specific to pHashes, and is useful in genetic computing, error detection/correction, text analysis, where both inputs are of equal length, and where graph structures are common. In my humble (and noob) opinion, it makes sense to offer this feature to the wider community.
From reading your docs, I believe the most natural area to extend with this metric is the
LongPropertySimilarityComputer
(https://neo4j.com/docs/graph-data-science/current/algorithms/knn/#_scalar_numbers). I intend for the existing default behaviour to remain unchanged, and to allow a new configuration of the KNN algorithm such thatTODO: expand summary of changes + questions.
Changes summary:
HAMMING_DISTANCE
, which will compute a 0-1 normalizeddouble
representing the Hamming Distance between two long properties.LONG_PROPERTY_METRIC
toNORMALIZED_ABSOLUTE_DIFFERENCE
. Not really satisfied with the new name but I'm open to suggestions!Before submitting this PR, please make sure: