Add ability to send images to the Assistant #416

Open · 4 of 6 tasks
andreibondarev opened this issue Dec 9, 2023 · 3 comments · Fixed by #799

Labels: assistants (Related to Langchain::Assistant class)

Comments

andreibondarev (Collaborator) commented Dec 9, 2023:

You should be able to provide an image_url to the Assistant for the supported multi-modal LLMs.

Note

Some LLMs do not accept an image_url; instead they expect a Base64-encoded payload (Anthropic) or a URI of a file uploaded to the cloud (Google Gemini). We need to figure out how to handle this.
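A rough sketch of the envisioned usage (assuming an image_url keyword on Assistant#add_message, which is the proposal here rather than an existing method signature):

require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
assistant = Langchain::Assistant.new(llm: llm)

# Proposed: attach an image to a user message alongside the text.
assistant.add_message(
  content: "What's in this image?",
  image_url: "https://example.com/photo.jpg"
)
assistant.run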

dghirardo (Contributor) commented:

Hi @andreibondarev, I noticed that the current version already supports sending images to LLMs.

You just need to include the image within the messages parameter. For example, when using OpenAI models, you can include images using the image_url content type. Here's how:

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

llm.chat(
  messages: [
    {
      role: "user",
      # Content is an array mixing text and image parts, per the OpenAI chat format.
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ],
  model: "gpt-4o" # A vision-capable model is required.
).completion

Other LLMs only support sending the image as Base64-encoded data, but this is still done within the messages parameter.
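For instance, Anthropic expects the image embedded as Base64 data inside the message content. A minimal sketch following Anthropic's messages format (assuming Langchain::LLM::Anthropic#chat passes the messages array through to the API unchanged):

require "base64"

llm = Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"])

# Read a local image and Base64-encode it.
image_data = Base64.strict_encode64(File.binread("nature_boardwalk.jpg"))

llm.chat(
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: {
            type: "base64",
            media_type: "image/jpeg",
            data: image_data
          }
        },
        { type: "text", text: "What's in this image?" }
      ]
    }
  ]
)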

andreibondarev (Collaborator, Author) commented Sep 30, 2024:

Support for OpenAI was added with #799.

@andreibondarev andreibondarev reopened this Oct 1, 2024
@andreibondarev andreibondarev changed the title Add ability to send images to LLMs Add ability to send images to the Assistant Oct 1, 2024
@andreibondarev andreibondarev added the assistants Related to Langchain::Assistant class label Oct 16, 2024
mattlindsey (Contributor) commented:

You have probably thought about this already, but it seems there are many cases to support. One solution is to have all of these different parameters on the Assistant:

- image_url_data: the image is fetched from the URL into Base64 first and sent to the LLM that way (or an image_urls_data array)
- image_url: just the URL is sent to the LLM (or an image_urls array)
- image: a single Base64 payload is sent (or an images array)
- image_filename: the file is read into memory and sent (or an image_filenames array)

Sorry if I'm making this too confusing/complicated. There's also image_uri, I suppose.
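For illustration, here is a rough sketch of how such parameters could be normalized internally before being handed to a provider adapter. The helper name and return shape are hypothetical:

require "base64"
require "net/http"
require "uri"

# Hypothetical: reduce every accepted input form to either Base64 data
# (for providers like Anthropic) or a pass-through URL (for OpenAI).
def normalize_image(image_url: nil, image_url_data: nil, image: nil, image_filename: nil)
  if image_url_data
    # Fetch the image over HTTP and Base64-encode the response body.
    body = Net::HTTP.get(URI(image_url_data))
    { type: :base64, data: Base64.strict_encode64(body) }
  elsif image_filename
    # Read the file from disk and Base64-encode it.
    { type: :base64, data: Base64.strict_encode64(File.binread(image_filename)) }
  elsif image
    # Caller already supplied Base64 data; pass it through.
    { type: :base64, data: image }
  elsif image_url
    # Send the URL as-is for providers that accept remote URLs.
    { type: :url, url: image_url }
  end
end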
