Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyzing Images #794

Closed
ausangshukla opened this issue Sep 27, 2024 · 3 comments · Fixed by #803
Closed

Analyzing Images #794

ausangshukla opened this issue Sep 27, 2024 · 3 comments · Fixed by #803
Labels
enhancement New feature or request

Comments

@ausangshukla
Copy link

ausangshukla commented Sep 27, 2024

Is your feature request related to a problem? Please describe.
I have a bunch of images such as passports, licenses, tax docs etc. I need to extract and validate the data that they have by asking the LLM questions such as is the Passport expired? Is the tax doc of the year 2024. These questions will be adhoc and input by the users, so cant use off the shelf OCR for it.

Describe the solution you'd like

  1. Upload the image (ex Tax documents)
  2. Ask the question is it valid for 2024?
  3. What it the total tax paid?

Describe alternatives you've considered
I know this can be done from the UI of chat gpt-4, but I dont have any other options at the moment

Additional context
The questions are adhoc, but generally centered around validating and extracting facts from the image. And the documents are all images. It may already be doable with the assistants api, but an working example is required, as Im not able to make it work.

@ausangshukla ausangshukla added the enhancement New feature or request label Sep 27, 2024
@andreibondarev
Copy link
Collaborator

@ausangshukla Yep, you'll be able to do that after this PR is merged.

@andreibondarev
Copy link
Collaborator

@ausangshukla Right now the Langchain::Assistant, when using OpenAI or MistralAI, supports sending image_url. Take a look at this example: https://gist.github.com/andreibondarev/b6f444194d0ee7ab7302a4d83184e53e. I'm imagining if you're uploading the same types of documents, you could define your own tool, like a PassportDataExtractor that would extract certain values, like { full_name:, expiration_date:, issue_date: }. What do you think?

@andreibondarev
Copy link
Collaborator

Closing this issue as it's duplicate with #416.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants