Error in Neo4jDynamicDocumentRetriever Initialization with Haystack Pipeline in Python #4

Open
NechbaMohammed opened this issue Nov 2, 2024 · 5 comments
@NechbaMohammed


```python
from typing import List

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder

from neo4j_haystack import Neo4jClientConfig, Neo4jDocumentStore, Neo4jDynamicDocumentRetriever

client_config = Neo4jClientConfig(
    url="bolt://localhost:7687",
    username="neo4j",
    password="new_password",
    database="neo4j",
)

documents = [
    Document(content="My name is Morgan and I live in Paris.", meta={"num_of_years": 3}),
    Document(content="I am Susan and I live in Berlin.", meta={"num_of_years": 7}),
]

# Same model is used for both query and Document embeddings
model_name = "sentence-transformers/all-MiniLM-L6-v2"

document_embedder = SentenceTransformersDocumentEmbedder(model=model_name)
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)

document_store = Neo4jDocumentStore(client_config=client_config, embedding_dim=384, index="document-embeddings3")
document_store.write_documents(documents_with_embeddings.get("documents"))

# The index name, top_k and the filter value are supplied through "parameters" at run time
cypher_query = """
            CALL db.index.vector.queryNodes($index, $top_k, $query_embedding)
            YIELD node as doc, score
            MATCH (doc) WHERE doc.num_of_years = $num_of_years
            RETURN doc{.*, score}, score
            ORDER BY score DESC LIMIT $top_k
        """

embedder = SentenceTransformersTextEmbedder(model=model_name)
retriever = Neo4jDynamicDocumentRetriever(
    client_config=client_config, runtime_parameters=["query_embedding"], doc_node_name="doc"
)

pipeline = Pipeline()
pipeline.add_component("text_embedder", embedder)
pipeline.add_component("retriever", retriever)
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = pipeline.run(
    {
        "text_embedder": {"text": "What cities do people live in?"},
        "retriever": {
            "query": cypher_query,
            "parameters": {"index": "document-embeddings3", "top_k": 5, "num_of_years": 3},
        },
    }
)

documents: List[Document] = result["retriever"]["documents"]
```

Error:
```
(myenv) root@44083ef058c2:/home/pps-tool/test# python newtest.py
/home/pps-tool/test/myenv/lib/python3.10/site-packages/haystack/core/errors.py:34: DeprecationWarning: PipelineMaxLoops is deprecated and will be remove in version '2.7.0'; use PipelineMaxComponentRuns instead.
  warnings.warn(
Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.28it/s]
Traceback (most recent call last):
  File "/home/pps-tool/test/newtest.py", line 43, in <module>
    retriever = Neo4jDynamicDocumentRetriever(
  File "/home/pps-tool/test/myenv/lib/python3.10/site-packages/haystack/core/component/component.py", line 288, in __call__
    ComponentMeta._parse_and_set_input_sockets(cls, instance)
  File "/home/pps-tool/test/myenv/lib/python3.10/site-packages/haystack/core/component/component.py", line 244, in _parse_and_set_input_sockets
    inner(getattr(component_cls, "run"), instance.__haystack_input__)
  File "/home/pps-tool/test/myenv/lib/python3.10/site-packages/haystack/core/component/component.py", line 232, in inner
    raise ComponentError(
haystack.core.errors.ComponentError: set_input_types()/set_input_type() cannot override the parameters of the 'run' method
```

@Ben-D-Nelson

I think this is a version incompatibility with the Haystack core library, beginning in Haystack version 2.6.0:

> Add constraints to component.set_input_type and component.set_input_types to prevent undefined behaviour when the run method does not contain a variadic keyword argument.

Neo4j Haystack is attempting to set "query" and "parameters" as input types during init, without regard to the existing signature, and the Component class is rejecting them because they don't match the signature of the "run" method.
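
To make the conflict concrete, here is a minimal sketch of this kind of check using only the standard library. It is a simplified illustration, not Haystack's actual implementation: `SocketConflictError`, `check_dynamic_inputs`, and `RetrieverSketch` are hypothetical names made up for the example, and the real check in `haystack.core.component` may differ in its details.

```python
import inspect


class SocketConflictError(Exception):
    """Hypothetical stand-in for haystack.core.errors.ComponentError."""


def check_dynamic_inputs(run_method, dynamic_inputs):
    """Reject dynamically declared inputs that collide with run() parameters.

    Simplified illustration only; this is not Haystack's real code.
    """
    run_params = {
        name
        for name, param in inspect.signature(run_method).parameters.items()
        if name != "self" and param.kind is not inspect.Parameter.VAR_KEYWORD
    }
    overlap = sorted(run_params & set(dynamic_inputs))
    if overlap:
        raise SocketConflictError(f"dynamic inputs overlap with run() parameters: {overlap}")


class RetrieverSketch:
    """Toy component whose run() already declares `query` and `parameters`."""

    def run(self, query, parameters=None, **kwargs):
        return {"documents": []}


# Declaring only the extra runtime parameter passes the check.
check_dynamic_inputs(RetrieverSketch.run, ["query_embedding"])

# Re-declaring `query` and `parameters` collides with the signature of run().
conflict_message = None
try:
    check_dynamic_inputs(RetrieverSketch.run, ["query", "parameters", "query_embedding"])
except SocketConflictError as exc:
    conflict_message = str(exc)
```

In this sketch the second call fails because two of the dynamically declared names are already parameters of `run`, which mirrors the situation described in the error message above.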

I'm going to see if downgrading my version of Haystack to 2.5.1 is an effective workaround.

@prosto
Owner

prosto commented Dec 2, 2024

hi @Ben-D-Nelson

I will look into it tomorrow (daytime) and let you know if a fix is on its way. Thanks for reporting.

@Ben-D-Nelson

Ben-D-Nelson commented Dec 2, 2024

Thanks @prosto

Downgrading Haystack to 2.5.1 seems to have worked. It certainly got me past the Neo4j retriever component in my pipeline.

@prosto
Owner

prosto commented Dec 3, 2024

hi,

The new version 2.2.0 of the package contains the fix. Indeed, beginning with Haystack 2.6.0, input sockets defined dynamically in a component's constructor must not overlap with those defined by its run method.

Also, version 2.2.0 of neo4j-haystack now depends on Haystack >=2.6.0 to avoid confusion.
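
For anyone unsure which combination they have installed, the rule from this thread can be sketched as a small check. This is only an illustration under stated assumptions: the distribution names `haystack-ai` and `neo4j-haystack` are assumed to be the ones installed with pip, and `parse_version`, `is_compatible`, and `check_environment` are hypothetical helpers, not part of either package.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a string like '2.6.0' or '2.6.0rc1' into a (major, minor, patch) tuple."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def is_compatible(haystack_version, neo4j_haystack_version):
    """Apply the rule reported in this thread to two version tuples.

    neo4j-haystack 2.2.0 and later needs Haystack 2.6.0 or later;
    earlier neo4j-haystack releases hit the error on Haystack 2.6.0 or later.
    """
    if neo4j_haystack_version >= (2, 2, 0):
        return haystack_version >= (2, 6, 0)
    return haystack_version < (2, 6, 0)


def check_environment():
    """Return True/False for the installed pair, or None if either package is missing."""
    try:
        # Assumed distribution names; adjust if your environment differs.
        haystack_version = parse_version(version("haystack-ai"))
        neo4j_haystack_version = parse_version(version("neo4j-haystack"))
    except PackageNotFoundError:
        return None
    return is_compatible(haystack_version, neo4j_haystack_version)


if __name__ == "__main__":
    print(check_environment())
```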

Thanks, and let me know if that helps with your issue.

@prosto prosto self-assigned this Dec 3, 2024
@Ben-D-Nelson

@prosto Thanks, version 2.2.0 fixed the issue. Sorry it took so long to verify; some other things in my working environment decided to break this afternoon, and I just got everything whipped back into shape.
