Is your feature request related to a problem? Please describe.
I have a bunch of images such as passports, licenses, and tax documents. I need to extract and validate the data they contain by asking the LLM questions such as "Is the passport expired?" or "Is the tax document for the year 2024?" These questions will be ad hoc and entered by users, so I can't use off-the-shelf OCR for this.
Describe the solution you'd like
Upload the image (e.g. a tax document).
Ask a question such as "Is it valid for 2024?"
Or "What is the total tax paid?"
Describe alternatives you've considered
I know this can be done from the ChatGPT UI with GPT-4, but I don't have any other options at the moment.
Additional context
The questions are ad hoc, but generally centered on validating and extracting facts from the image. The documents are all images. It may already be doable with the assistants API, but a working example is required, as I'm not able to make it work.
@ausangshukla Right now the Langchain::Assistant, when using OpenAI or MistralAI, supports sending image_url. Take a look at this example: https://gist.github.com/andreibondarev/b6f444194d0ee7ab7302a4d83184e53e. I'm imagining if you're uploading the same types of documents, you could define your own tool, like a PassportDataExtractor that would extract certain values, like { full_name:, expiration_date:, issue_date: }. What do you think?
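To make the tool idea a bit more concrete, here is a minimal plain-Ruby sketch of the validation half only. It assumes the extraction step (the model call, not shown here) has already returned a hash shaped like { full_name:, expiration_date:, issue_date: } with ISO 8601 date strings. The PassportData struct, its from_extracted method, and the sample values are all hypothetical and are not part of Langchain.rb.

```ruby
require "date"

# Hypothetical value object for the fields suggested above.
# Not a Langchain.rb class; it only illustrates the validation step.
PassportData = Struct.new(:full_name, :expiration_date, :issue_date, keyword_init: true) do
  # Build from a hash of extracted fields. Assumes dates arrive as
  # ISO 8601 strings (e.g. "2023-05-31"); raises if a key is missing
  # or a date cannot be parsed.
  def self.from_extracted(fields)
    new(
      full_name: fields.fetch(:full_name),
      expiration_date: Date.iso8601(fields.fetch(:expiration_date)),
      issue_date: Date.iso8601(fields.fetch(:issue_date))
    )
  end

  # A passport counts as expired once the reference date is past
  # the expiration date (it is still valid on the expiration day).
  def expired?(on: Date.today)
    expiration_date < on
  end
end

# Made-up sample data standing in for what the model might return.
extracted = { full_name: "Jane Doe", expiration_date: "2023-05-31", issue_date: "2013-06-01" }
passport = PassportData.from_extracted(extracted)
puts passport.expired?(on: Date.new(2024, 1, 1))
```

Doing the date comparison in Ruby rather than asking the model "is it expired?" keeps that part deterministic. The ad hoc user questions would still go to the LLM along with the image.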