[Bug]ChatQnA dataprep not connect to embedding service #1482

Open
2 of 8 tasks
gavinlichn opened this issue Jan 29, 2025 · 5 comments · May be fixed by #1483
Assignees
Labels
bug Something isn't working

Comments

@gavinlichn
Contributor

Priority

P1-Stopper

OS type

Ubuntu

Hardware type

Xeon-GNR

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source
  • Other

Deploy method

  • Docker
  • Docker Compose
  • Kubernetes Helm Charts
  • Kubernetes GMC
  • Other

Running nodes

Single Node

What's the version?

v1.2

Description

After changing the embedding model, dataprep always falls back to local embedding and uses the default embedding model.

I checked the settings: dataprep always reads the environment variable "TEI_EMBEDDING_ENDPOINT" to find the external embedding service, but the Docker Compose file sets "TEI_ENDPOINT" instead.
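
A minimal sketch of the mismatch described above, assuming the service looks up only `TEI_EMBEDDING_ENDPOINT` and treats a missing value as "use local embedding". The helper names `resolve_embedding_endpoint` and `describe_mode` and the URL `http://tei:80` are hypothetical and used for illustration only; this is not the actual dataprep code.

```python
from typing import Mapping, Optional

# Variable name that dataprep reads, per the issue description.
EXPECTED_VAR = "TEI_EMBEDDING_ENDPOINT"
# Variable name that the compose file sets, per the issue description.
COMPOSE_VAR = "TEI_ENDPOINT"


def resolve_embedding_endpoint(env: Mapping[str, str]) -> Optional[str]:
    """Return the remote endpoint, or None to signal local fallback.

    Hypothetical helper: it only mimics the lookup behaviour described
    in the issue, not the real implementation.
    """
    value = env.get(EXPECTED_VAR, "").strip()
    return value or None


def describe_mode(env: Mapping[str, str]) -> str:
    """Explain which embedding mode the given environment would select."""
    endpoint = resolve_embedding_endpoint(env)
    if endpoint is not None:
        return f"remote embedding at {endpoint}"
    if env.get(COMPOSE_VAR):
        # The endpoint is configured, but under a name nobody reads.
        return f"local embedding ({COMPOSE_VAR} is set but {EXPECTED_VAR} is not)"
    return "local embedding (no endpoint configured)"


if __name__ == "__main__":
    # Placeholder URL; the real value comes from the deployment.
    print(describe_mode({"TEI_ENDPOINT": "http://tei:80"}))
    print(describe_mode({"TEI_EMBEDDING_ENDPOINT": "http://tei:80"}))
```

Passing the environment as a mapping (for example `os.environ`) keeps the lookup easy to test without touching the real process environment.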

Reproduce steps

  1. Change the embedding model
  2. Start up ChatQnA with Docker Compose
  3. Ingest a file with curl
  4. Monitor the dataprep log; it shows the default embedding model being downloaded

Raw log

Attachments

No response

@gavinlichn gavinlichn added the bug Something isn't working label Jan 29, 2025
gavinlichn added a commit to gavinlichn/GenAIExamples that referenced this issue Jan 29, 2025
GenAIComps dataprep-redis reads the environment variable "TEI_EMBEDDING_ENDPOINT".
Change the compose file to align with it.

Fixes: opea-project#1482

Signed-off-by: Li Gang <[email protected]>
@gavinlichn gavinlichn linked a pull request Jan 29, 2025 that will close this issue
4 tasks
@xiguiw
Collaborator

xiguiw commented Feb 5, 2025

Fixed in #1483 by @lvliang-intel.

@lianhao
Collaborator

lianhao commented Feb 5, 2025

@Ruoyu-y has just found that, with PR #1483, files ingested into the Redis vector DB through dataprep cannot be retrieved by the retriever at all. This issue only affects the latest images of opea/dataprep and opea/retriever; the 1.2 versions of those two images do not have this issue.

@lvliang-intel
Collaborator

@letonghan, please check this issue.

@letonghan
Collaborator

> @letonghan, please check this issue.

Will check it soon.

letonghan added a commit to letonghan/GenAIComps that referenced this issue Feb 7, 2025
Trace:
1. The update of `langchain_huggingface.HuggingFaceEndpointEmbeddings` caused embedding vectors of the wrong size.
2. The wrong-size vectors are saved into the Redis database as type
   `byte`, and the indices are not created correctly.
3. The retriever cannot retrieve data from Redis using the index for the
   reasons above.
4. RAG then appears not to work, because the uploaded file cannot be
   retrieved from the database.

Solution:
Replace all uses of `langchain_huggingface.HuggingFaceEndpointEmbeddings`
with `langchain_community.embeddings.HuggingFaceInferenceAPIEmbeddings`,
and modify the related READMEs and scripts.

Related issue: opea-project/GenAIExamples#1482

Signed-off-by: letonghan <[email protected]>
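
The trace in the commit message above attributes the retrieval failure to embedding vectors of the wrong size being stored. One way to surface that kind of problem early is to check vector lengths before writing to the database. The following is only a sketch under that assumption: `check_embedding_dims` is a hypothetical helper, not part of GenAIComps or LangChain, and the expected dimension has to come from the deployed embedding model.

```python
from typing import Sequence


def check_embedding_dims(
    vectors: Sequence[Sequence[float]], expected_dim: int
) -> None:
    """Raise ValueError if any vector's length differs from expected_dim.

    Hypothetical guard: call it on the embedding output before the
    vectors are stored, so a size mismatch fails loudly at ingest time.
    """
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {expected_dim}"
            )


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings are much longer.
    check_embedding_dims([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], expected_dim=3)
    try:
        check_embedding_dims([[0.1, 0.2]], expected_dim=3)
    except ValueError as err:
        print(f"rejected: {err}")
```

Failing at ingest time keeps bad vectors out of the index, instead of the problem showing up later as empty retrieval results.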
@letonghan
Collaborator

Hi @gavinlichn, the root cause of the dataprep problem has been located and fixed in this PR.
Please see the Trace in the PR description, and try again after it is merged.


5 participants