Spark version and jaccard_at_thresholds distance #673
Hello guys, first, thanks to the developers for this great linkage package. I have been using Splink for linkage with DuckDB; now I am trying Spark, and I have some questions and problems:

1 - Is there a specific version of Spark to use? Which one, 3 or 2?

2 - I took the same script that worked with pandas and changed it to the Spark backend. When it reaches the distance comparison, it fails with:

AnalysisException: Undefined function: 'JACCARD'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 28 pos 11

Example of fields:

#import splink.duckdb.duckdb_comparison_library as cl

Thanks in advance.
I think you need to register the Jaccard UDF. The UDF jar for Spark is in splink/files/spark_jars.
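Here is a minimal sketch of what registering the UDF could look like in PySpark. It is not taken from the Splink docs, so treat the details as assumptions: the helper names `find_udf_jars` and `build_spark_with_jaccard` are made up for illustration, the Java class name `uk.gov.moj.dash.linkage.JaccardSimilarity` is what I believe the similarity jar exposes, and the UDF name `jaccard` is chosen only to match the function named in the error. Check all three against the jar and Splink version you actually have.

```python
from pathlib import Path


def find_udf_jars(jar_dir):
    """Return the .jar files found directly inside jar_dir, sorted by path.

    Hypothetical helper: point it at your local copy of splink/files/spark_jars.
    """
    return sorted(str(p) for p in Path(jar_dir).glob("*.jar"))


def build_spark_with_jaccard(
    jar_path,
    udf_name="jaccard",  # assumption: matches the name in the AnalysisException
    java_class="uk.gov.moj.dash.linkage.JaccardSimilarity",  # assumption
):
    """Start (or reuse) a SparkSession with the jar attached and register the UDF."""
    # Imported lazily so that find_udf_jars works without pyspark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import DoubleType

    # spark.jars only takes effect if it is set before the session is created;
    # if a session already exists, getOrCreate() returns it unchanged.
    spark = (
        SparkSession.builder
        .config("spark.jars", jar_path)
        .getOrCreate()
    )

    # Expose the Java class to Spark SQL under udf_name, returning a double.
    spark.udf.registerJavaFunction(udf_name, java_class, DoubleType())
    return spark
```

You would call `find_udf_jars` on the jar folder, pick the jar built for your Spark version, pass it to `build_spark_with_jaccard`, and then give the returned session to the Spark linker. Spark resolves function names case-insensitively, so a UDF registered as `jaccard` should satisfy a call to `JACCARD`.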