Add ability to send images to the Assistant #416

Open · 4 of 6 tasks
andreibondarev opened this issue Dec 9, 2023 · 3 comments · Fixed by #799

Labels: assistants (Related to Langchain::Assistant class)

Comments

andreibondarev (Collaborator) commented Dec 9, 2023:

You should be able to provide an image_url to the Assistant for the supported multi-modal LLMs.

Note

Some LLMs do not accept an image_url; instead they expect a Base64-encoded payload (Anthropic) or a URI of a file uploaded to the cloud (Google Gemini). We need to figure out how to handle this.
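A rough sketch of the envisioned usage (assuming an image_url keyword on Assistant#add_message, which is the proposal here rather than an existing method signature):

require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
assistant = Langchain::Assistant.new(llm: llm)

# Proposed: attach an image to a user message alongside the text.
assistant.add_message(
  content: "What's in this image?",
  image_url: "https://example.com/photo.jpg"
)
assistant.run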

dghirardo (Contributor) commented:

Hi @andreibondarev, I noticed that the current version already supports sending images to LLMs.

You just need to include the image within the messages parameter. For example, when using OpenAI models, you can include images using the image_url content type. Here's how:

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

llm.chat(
  messages: [
    {
      role: "user",
      # Content is an array mixing text and image parts, per the OpenAI chat format.
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ],
  model: "gpt-4o" # A vision-capable model is required.
).completion

Other LLMs only support sending the image as Base64-encoded data, but this is still done within the messages parameter.
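For instance, Anthropic expects the image embedded as Base64 data inside the message content. A minimal sketch following Anthropic's messages format (assuming Langchain::LLM::Anthropic#chat passes the messages array through to the API unchanged):

require "base64"

llm = Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"])

# Read a local image and Base64-encode it.
image_data = Base64.strict_encode64(File.binread("nature_boardwalk.jpg"))

llm.chat(
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: {
            type: "base64",
            media_type: "image/jpeg",
            data: image_data
          }
        },
        { type: "text", text: "What's in this image?" }
      ]
    }
  ]
)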

andreibondarev (Collaborator, Author) commented Sep 30, 2024:

Support for OpenAI was added with #799.

@andreibondarev andreibondarev reopened this Oct 1, 2024
@andreibondarev andreibondarev changed the title Add ability to send images to LLMs Add ability to send images to the Assistant Oct 1, 2024
@andreibondarev andreibondarev added the assistants Related to Langchain::Assistant class label Oct 16, 2024
mattlindsey (Contributor) commented:

You have probably thought about this already, but it seems there are many cases to support. One solution is to have all of these different parameters on the Assistant:

- image_url_data: the image is fetched from the URL into Base64 first and sent to the LLM that way (or an image_urls_data array)
- image_url: just the URL is sent to the LLM (or an image_urls array)
- image: a single Base64 payload is sent (or an images array)
- image_filename: the file is read into memory and sent (or an image_filenames array)

Sorry if I'm making this too confusing/complicated. There's also image_uri, I suppose.
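For illustration, here is a rough sketch of how such parameters could be normalized internally before being handed to a provider adapter. The helper name and return shape are hypothetical:

require "base64"
require "net/http"
require "uri"

# Hypothetical: reduce every accepted input form to either Base64 data
# (for providers like Anthropic) or a pass-through URL (for OpenAI).
def normalize_image(image_url: nil, image_url_data: nil, image: nil, image_filename: nil)
  if image_url_data
    # Fetch the image over HTTP and Base64-encode the response body.
    body = Net::HTTP.get(URI(image_url_data))
    { type: :base64, data: Base64.strict_encode64(body) }
  elsif image_filename
    # Read the file from disk and Base64-encode it.
    { type: :base64, data: Base64.strict_encode64(File.binread(image_filename)) }
  elsif image
    # Caller already supplied Base64 data; pass it through.
    { type: :base64, data: image }
  elsif image_url
    # Send the URL as-is for providers that accept remote URLs.
    { type: :url, url: image_url }
  end
end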
