[FEATURE REQUEST] Support upload of images to a multimodal LLM #361

stephanj · 2024-12-10T11:29:48Z

It would be nice to also support image (screenshot) uploads for multimodal-enabled LLMs.

stephanj · 2024-12-12T17:34:07Z

Google Gemini Exp 1206 suggestions:

Strategy to implement an "Add Image" button next to the "Add File" button in your DevoxxGenieIDEAPlugin, along with the considerations for handling multimodal LLMs:

1. Update the ActionButtonsPanel

Create the "Add Image" Button:

// In ActionButtonsPanel.java, next to the other button factory methods:
private @NotNull JButton createAddImageButton() {
    JButton button = new JHoverButton(AddImageIcon, true); // Assuming you have an AddImageIcon
    button.setToolTipText("Add image to prompt context");
    button.addActionListener(this::selectImageForPromptContext);
    button.setMinimumSize(minSize);
    button.setMaximumSize(maxSize);
    return button;
}

Add the Button to the Panel:

// In ActionButtonsPanel.java, inside the createButtonPanel() method:
private @NotNull JPanel createButtonPanel() {
    JPanel buttonPanel = new JPanel(new GridLayout(1, 5, 5, 0)); // Increase columns for new button
    buttonPanel.add(submitBtn);
    buttonPanel.add(calcTokenCostBtn);
    buttonPanel.add(addProjectBtn);
    buttonPanel.add(addFileBtn);
    buttonPanel.add(addImageBtn); // Add the new button here
    return buttonPanel;
}

Declare the addImageBtn:

// In ActionButtonsPanel.java, add as a class member:
private JButton addImageBtn;

Initialize the Button:

// In ActionButtonsPanel.java, inside the createButtons() method:
addImageBtn = createAddImageButton();

2. Implement Image Selection Logic

Create selectImageForPromptContext method:

// In ActionButtonsPanel.java
private void selectImageForPromptContext(ActionEvent e) {
    JFileChooser fileChooser = new JFileChooser();
    fileChooser.setFileFilter(new FileNameExtensionFilter("Image files", "png", "jpg", "jpeg", "gif"));
    int result = fileChooser.showOpenDialog(this);
    if (result == JFileChooser.APPROVE_OPTION) {
        File selectedFile = fileChooser.getSelectedFile();
        VirtualFile virtualFile = LocalFileSystem.getInstance().findFileByIoFile(selectedFile);
        if (virtualFile != null) {
            // Add the image to your context (similar to how you handle files)
            // ... (Implementation to handle the image file) ...
            NotificationUtil.sendNotification(project, "Image added to prompt context: " + virtualFile.getName());
        } else {
            NotificationUtil.sendNotification(project, "Could not add image to prompt context");
        }
    }
}

3. Handle Image Data in ChatMessageContext

Modify ChatMessageContext: You'll need a way to store the selected image data in your ChatMessageContext. You could add a field like:
```
// In ChatMessageContext.java
@Getter
@Setter
private VirtualFile imageFile;
```

4. Adapt MessageCreationService

Modify createUserPromptWithContextAsync:

// In MessageCreationService.java
public @NotNull CompletableFuture<String> createUserPromptWithContextAsync(Project project, String userPrompt, @Nullable VirtualFile imageFile, List<VirtualFile> files) {
    return CompletableFuture.supplyAsync(() -> {
        StringBuilder userPromptContext = new StringBuilder();

        // ... (Existing code to handle files) ...

        // Add image handling
        if (imageFile != null) {
            try {
                byte[] imageData = imageFile.contentsToByteArray();
                String base64Image = Base64.getEncoder().encodeToString(imageData);
                userPromptContext.append("Image: ").append(imageFile.getName()).append("\n");
                userPromptContext.append("ImageData: data:image/png;base64,").append(base64Image).append("\n"); // Assuming PNG format for simplicity
            } catch (IOException e) {
                NotificationUtil.sendNotification(project, "Error reading image file: " + imageFile.getName());
            }
        }

        userPromptContext.append(userPrompt);
        return userPromptContext.toString();
    });
}
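
The snippet above hard-codes image/png even though the file chooser also accepts JPEG and GIF. A minimal sketch of deriving the MIME type from the file extension before building the data URI is shown here. `ImageDataUri` is a hypothetical helper, not an existing plugin class, and it only maps the four extensions from the file filter:

```java
import java.util.Base64;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of DevoxxGenie): builds a data URI using the
// MIME type implied by the file extension instead of assuming PNG.
final class ImageDataUri {

    // Only the extensions accepted by the file chooser filter are mapped.
    private static final Map<String, String> MIME_BY_EXTENSION = Map.of(
            "png", "image/png",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "gif", "image/gif");

    private ImageDataUri() {
    }

    // Returns the MIME type for a file name, or throws if the extension is unknown.
    static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String mimeType = MIME_BY_EXTENSION.get(extension);
        if (mimeType == null) {
            throw new IllegalArgumentException("Unsupported image type: " + fileName);
        }
        return mimeType;
    }

    // Encodes the bytes as base64 and prefixes them with the matching data URI header.
    static String toDataUri(String fileName, byte[] imageData) {
        String base64 = Base64.getEncoder().encodeToString(imageData);
        return "data:" + mimeTypeFor(fileName) + ";base64," + base64;
    }
}
```

In the method above, the call would presumably look like `ImageDataUri.toDataUri(imageFile.getName(), imageData)`. Extension-based detection is only a heuristic, since a renamed file would be labelled with the wrong type.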

5. Update ChatMessageContextUtil

Modify createContext:

// In ChatMessageContextUtil.java
public static @NotNull ChatMessageContext createContext(Project project,
                                                        String userPromptText,
                                                        LanguageModel languageModel,
                                                        ChatModelProvider chatModelProvider,
                                                        @NotNull String actionCommand,
                                                        EditorFileButtonManager editorFileButtonManager,
                                                        String projectContext,
                                                        boolean isProjectContextAdded,
                                                        VirtualFile imageFile) { // Add imageFile parameter

    // ... (Existing code) ...

    context.setImageFile(imageFile); // Set the image file in the context

    // ... (Rest of the existing code) ...
}

6. Update ActionButtonsPanelController

Modify handlePromptSubmission:

// In ActionButtonsPanelController.java
public boolean handlePromptSubmission(String actionCommand,
                                      boolean isProjectContextAdded,
                                      String projectContext,
                                      VirtualFile imageFile) { // Add imageFile parameter

    // ... (Existing code) ...

    ChatMessageContext currentChatMessageContext =
            ChatMessageContextUtil.createContext(project,
                    userPromptText,
                    getSelectedLanguageModel(),
                    chatModelProvider,
                    actionCommand,
                    editorFileButtonManager,
                    projectContext,
                    isProjectContextAdded,
                    imageFile); // Pass imageFile to createContext

    // ... (Rest of the existing code) ...
}

7. Update PromptExecutionController

Modify handlePromptSubmission:

// In PromptExecutionController.java
public boolean handlePromptSubmission(ActionEvent actionEvent) {
    // ... (Existing code) ...

    VirtualFile imageFile = actionButtonsPanel.getImageFile(); // Get the image file from ActionButtonsPanel

    boolean response = promptExecutionController.handlePromptSubmission(actionEvent.getActionCommand(),
            projectContextController.isProjectContextAdded(),
            projectContextController.getProjectContext(),
            imageFile); // Pass imageFile

    // ... (Rest of the existing code) ...
}

8. Considerations for Multimodal LLMs

Model Selection: Ensure that the selected LLM in llmProviderPanel supports image input. You might need to add a way to filter or indicate multimodal capabilities in your LanguageModel and ModelProvider enums.
API-Specific Handling: The way you send image data to the LLM will depend on the specific API you're using (OpenAI, Google Gemini, etc.). You'll need to consult their documentation for the correct format (e.g., base64 encoded image, URL to an image).
Error Handling: Implement robust error handling for cases where the image cannot be loaded, is in an unsupported format, or if the LLM provider does not support multimodal input.
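
One way the capability flag mentioned under Model Selection could look is sketched below. This is an assumption about the design, not the plugin's actual LanguageModel or ModelProvider code. The enum `ExampleModel` and its model names are placeholders, and the flags would have to be maintained by hand for each model:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model catalogue: the entries and flags are illustrative only.
enum ExampleModel {
    EXAMPLE_TEXT_MODEL("example-text-model", false),
    EXAMPLE_VISION_MODEL("example-vision-model", true);

    private final String modelName;
    private final boolean supportsImages;

    ExampleModel(String modelName, boolean supportsImages) {
        this.modelName = modelName;
        this.supportsImages = supportsImages;
    }

    String modelName() {
        return modelName;
    }

    boolean supportsImages() {
        return supportsImages;
    }

    // Models that could be offered while an image is attached to the prompt.
    static List<ExampleModel> imageCapable() {
        return Arrays.stream(values())
                .filter(ExampleModel::supportsImages)
                .collect(Collectors.toList());
    }
}
```

The "Add Image" button could then be enabled only when the selected model reports `supportsImages()`, or the model dropdown could be filtered with `imageCapable()`.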

9. UI/UX Enhancements

Image Preview: Consider adding a small preview of the selected image in the ActionButtonsPanel or somewhere in the UI to provide visual feedback to the user.
Clear Image: Add a way to clear the selected image (e.g., a small "x" button next to the preview).
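
The preview and clear behaviour can share one small piece of state. Below is a minimal sketch, assuming a hypothetical `ImageSelection` holder that uses `java.nio.file.Path` instead of the plugin's VirtualFile so it stays self-contained. The listener is where a preview component would be refreshed or hidden:

```java
import java.nio.file.Path;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical holder for the currently attached image.
final class ImageSelection {
    private final Consumer<Optional<Path>> onChange;
    private Path selected;

    ImageSelection(Consumer<Optional<Path>> onChange) {
        this.onChange = Objects.requireNonNull(onChange);
    }

    // Called after the user picks a file; notifies the listener with the new image.
    void select(Path image) {
        selected = Objects.requireNonNull(image);
        onChange.accept(current());
    }

    // Called by the "x" button; notifies the listener with an empty Optional.
    void clear() {
        selected = null;
        onChange.accept(current());
    }

    Optional<Path> current() {
        return Optional.ofNullable(selected);
    }
}
```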

Example using OpenAI's Vision API (Conceptual)

If you were using OpenAI's Vision API, your MessageCreationService might need to construct a UserMessage like this (refer to their documentation for the exact format):

// ... inside MessageCreationService
List<Content> contents = new ArrayList<>();
contents.add(TextContent.from(userPrompt));

if (chatMessageContext.getImageFile() != null) {
    // contentsToByteArray() throws IOException; handle it in real code
    byte[] imageData = chatMessageContext.getImageFile().contentsToByteArray();
    String base64Image = Base64.getEncoder().encodeToString(imageData);
    // LangChain4j takes the raw base64 payload plus a MIME type (PNG assumed here)
    contents.add(ImageContent.from(base64Image, "image/png"));
}

UserMessage userMessage = UserMessage.from(contents);

Important Notes:

Error Handling: I've added some basic error handling, but you'll need to expand this to cover various scenarios (file not found, invalid image format, etc.).
API-Specific Code: The code for sending the image data to the LLM will be highly dependent on the specific API you are using. I've provided a conceptual example using base64 encoding, but you'll need to adapt it.
Testing: Thoroughly test the image upload, prompt construction, and LLM interaction with different image types and sizes.
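
As a sketch of the extra checks mentioned under Error Handling, the bytes can be validated before they are attached to the prompt. `ImageValidator` is a hypothetical helper, and the 5 MB limit is an assumed value, not a documented provider limit:

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Optional;

// Hypothetical pre-flight check for an image that is about to be sent to an LLM.
final class ImageValidator {

    // Assumed upper bound; real limits differ per provider.
    static final int MAX_BYTES = 5 * 1024 * 1024;

    private ImageValidator() {
    }

    // Returns an error message, or an empty Optional when the image looks usable.
    static Optional<String> validate(byte[] data) {
        if (data == null || data.length == 0) {
            return Optional.of("Image file is empty");
        }
        if (data.length > MAX_BYTES) {
            return Optional.of("Image is larger than " + MAX_BYTES + " bytes");
        }
        try {
            // ImageIO.read returns null when no registered reader understands the data.
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(data));
            if (image == null) {
                return Optional.of("File is not a readable image");
            }
        } catch (IOException e) {
            return Optional.of("Could not decode image: " + e.getMessage());
        }
        return Optional.empty();
    }
}
```

The returned message could be passed straight to NotificationUtil.sendNotification instead of adding the file to the context.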

This comprehensive strategy should provide a solid foundation for implementing the "Add Image" functionality in your DevoxxGenieIDEAPlugin. Remember to consult the documentation for your chosen LLM provider to ensure you're using their API correctly.

Gemini Flash 2.0 EXP review after it was told how the LangChain4J API supports images...

Okay, let's integrate the Langchain4j image handling into your DevoxxGenieIDEAPlugin, building upon the previous response. We'll focus on adapting the MessageCreationService and ChatMessageContext to support ImageContent.

1. Update ChatMessageContext

Add imageFile field: We already added this in the previous response, but let's make sure it's there and has the correct type:
```
// In ChatMessageContext.java
@Getter
@Setter
private VirtualFile imageFile;
```

2. Adapt MessageCreationService

Modify createUserMessage: This is where we'll construct the UserMessage with TextContent and ImageContent if an image is present.

// In MessageCreationService.java
import dev.langchain4j.data.message.Content;
import dev.langchain4j.data.message.ImageContent;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.data.message.UserMessage;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// ... (Existing code) ...

@NotNull
public UserMessage createUserMessage(@NotNull ChatMessageContext chatMessageContext) {
    List<Content> contents = new ArrayList<>();

    // Add system prompt for OpenAI o1 models
    if (ChatMessageContextUtil.isOpenAIo1Model(chatMessageContext.getLanguageModel())) {
        String systemPrompt = DevoxxGenieStateService.getInstance().getSystemPrompt();
        contents.add(TextContent.from("<SystemPrompt>" + systemPrompt + "</SystemPrompt>\n\n"));
    }

    // If git diff is enabled, add special instructions
    if (Boolean.TRUE.equals(DevoxxGenieStateService.getInstance().getGitDiffActivated())) {
        contents.add(TextContent.from("<DiffInstructions>" + GIT_DIFF_INSTRUCTIONS + "</DiffInstructions>\n\n"));
    } else if (Boolean.TRUE.equals(DevoxxGenieStateService.getInstance().getRagActivated())) {
        // Semantic search is enabled, add search results
        String semanticContext = addSemanticSearchResults(chatMessageContext);
        if (!semanticContext.isEmpty()) {
            contents.add(TextContent.from("<SemanticContext>\n" + semanticContext + "\n</SemanticContext>"));
        }
    }

    // Add the user's prompt
    contents.add(TextContent.from("<UserPrompt>" + chatMessageContext.getUserPrompt() + "</UserPrompt>\n\n"));

    // Add editor content or selected text
    String editorContent = getEditorContentOrSelectedText(chatMessageContext);
    if (!editorContent.isEmpty()) {
        contents.add(TextContent.from("<EditorContext>\n" + editorContent + "\n</EditorContext>\n\n"));
    }

    // Add image content if present
    if (chatMessageContext.getImageFile() != null) {
        try {
            byte[] imageData = chatMessageContext.getImageFile().contentsToByteArray();
            String base64Image = Base64.getEncoder().encodeToString(imageData);
            contents.add(ImageContent.from(base64Image, "image/png")); // Raw base64 plus MIME type; assuming PNG format
        } catch (IOException e) {
            LOG.warning("Error reading image file: " + chatMessageContext.getImageFile().getName());
            // Handle the exception, maybe add a placeholder or log the error
        }
    }

    UserMessage userMessage = new UserMessage(contents);
    chatMessageContext.setUserMessage(userMessage);
    return userMessage;
}

3. Update ChatMessageContextUtil

Modify createContext: Add the imageFile parameter to the method signature and pass it to the ChatMessageContext builder.

// In ChatMessageContextUtil.java
public static @NotNull ChatMessageContext createContext(Project project,
                                                        String userPromptText,
                                                        LanguageModel languageModel,
                                                        ChatModelProvider chatModelProvider,
                                                        @NotNull String actionCommand,
                                                        EditorFileButtonManager editorFileButtonManager,
                                                        String projectContext,
                                                        boolean isProjectContextAdded,
                                                        VirtualFile imageFile) { // Add imageFile parameter

    // ... (Existing code) ...

    ChatMessageContext context = ChatMessageContext.builder()
        // ... (Existing code) ...
        .imageFile(imageFile) // Set the image file in the context
        // ... (Rest of the existing code) ...
        .build();

    // ... (Rest of the existing code) ...
}

4. Update ActionButtonsPanelController

Modify handlePromptSubmission: Pass the imageFile to the ChatMessageContextUtil.createContext method.

// In ActionButtonsPanelController.java
public boolean handlePromptSubmission(String actionCommand,
                                      boolean isProjectContextAdded,
                                      String projectContext,
                                      VirtualFile imageFile) { // Add imageFile parameter

    // ... (Existing code) ...

    ChatMessageContext currentChatMessageContext =
            ChatMessageContextUtil.createContext(project,
                    userPromptText,
                    getSelectedLanguageModel(),
                    chatModelProvider,
                    actionCommand,
                    editorFileButtonManager,
                    projectContext,
                    isProjectContextAdded,
                    imageFile); // Pass imageFile to createContext

    // ... (Rest of the existing code) ...
}

5. Update ActionButtonsPanel

Add imageFile getter:

// In ActionButtonsPanel.java
@Getter
private VirtualFile imageFile;

Modify selectImageForPromptContext:

// In ActionButtonsPanel.java
private void selectImageForPromptContext(ActionEvent e) {
    JFileChooser fileChooser = new JFileChooser();
    fileChooser.setFileFilter(new FileNameExtensionFilter("Image files", "png", "jpg", "jpeg", "gif"));
    int result = fileChooser.showOpenDialog(this);
    if (result == JFileChooser.APPROVE_OPTION) {
        File selectedFile = fileChooser.getSelectedFile();
        VirtualFile virtualFile = LocalFileSystem.getInstance().findFileByIoFile(selectedFile);
        if (virtualFile != null) {
            this.imageFile = virtualFile;
            NotificationUtil.sendNotification(project, "Image added to prompt context: " + virtualFile.getName());
        } else {
            NotificationUtil.sendNotification(project, "Could not add image to prompt context");
        }
    }
}

Explanation of Changes:

ChatMessageContext: We added a VirtualFile imageFile field to store the selected image.
MessageCreationService:
- We now construct a list of content objects to hold both text and image content.
- If an imageFile is present in the ChatMessageContext, we read the image data, encode it as a base64 string, and create an ImageContent object.
- We then add both the TextContent and ImageContent to the UserMessage.
ChatMessageContextUtil: We added the imageFile parameter to the createContext method and pass it to the ChatMessageContext builder.
ActionButtonsPanelController: We pass the imageFile to the ChatMessageContextUtil.createContext method.
ActionButtonsPanel: We added a getter for the imageFile and set the imageFile when an image is selected.

Key Points:

Base64 Encoding: We're using base64 encoding for the image data, which is a common way to embed images directly into text-based formats like JSON.
Image Format: We're assuming the image is in PNG format for simplicity. You might need to add logic to handle other formats.
Error Handling: The try-catch block in MessageCreationService handles potential IOExceptions when reading the image file.
Multimodal LLM Support: This setup allows you to send both text and image data to a multimodal LLM. The LLM will then be able to process both types of content.

Next Steps:

Test with a Multimodal LLM: Ensure that your chosen LLM provider (e.g., OpenAI's GPT-4 Vision Preview) is correctly receiving and processing the image data.
Add UI Feedback: Implement a preview of the selected image in the UI.
Add Clear Image Option: Add a button to remove the selected image.
Handle Different Image Formats: Add support for other image formats (JPEG, GIF, etc.) if needed.
Optimize Image Handling: For large images, consider resizing them before sending them to the LLM to reduce token usage and processing time.
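
For the resizing step, a minimal sketch using only `java.awt` is shown below. `ImageScaler` and the idea of a single maximum dimension are assumptions. The right target size depends on the provider and is not specified in this thread:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper: shrinks an image so its longest side fits a given limit.
final class ImageScaler {

    private ImageScaler() {
    }

    // Returns the source unchanged when it already fits, otherwise a scaled copy
    // that keeps the aspect ratio.
    static BufferedImage fitWithin(BufferedImage source, int maxDimension) {
        int width = source.getWidth();
        int height = source.getHeight();
        int longest = Math.max(width, height);
        if (longest <= maxDimension) {
            return source;
        }
        double scale = (double) maxDimension / longest;
        int newWidth = Math.max(1, (int) Math.round(width * scale));
        int newHeight = Math.max(1, (int) Math.round(height * scale));

        // ARGB keeps transparency; convert to RGB first if the result is written as JPEG.
        BufferedImage scaled = new BufferedImage(newWidth, newHeight, BufferedImage.TYPE_INT_ARGB);
        Graphics2D graphics = scaled.createGraphics();
        try {
            graphics.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            graphics.drawImage(source, 0, 0, newWidth, newHeight, null);
        } finally {
            graphics.dispose();
        }
        return scaled;
    }
}
```

The scaled image would then be re-encoded (for example with `ImageIO.write`) before base64 encoding, so the smaller payload is what reaches the LLM.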

This detailed integration should enable you to send images along with your text prompts to multimodal LLMs in your DevoxxGenieIDEAPlugin. Remember to consult the documentation for your chosen LLM provider for the exact format they expect for image data.

stephanj · 2024-12-12T17:45:06Z

The add images

stephanj · 2024-12-16T12:53:28Z

I've made a POC which tests Google Gemini, Claude and OpenAI @ https://github.com/stephanj/DevoxxGenieLangchain4JImagePOC

The plan is to support images in DevoxxGenie, so developers can include screenshots or images in their questions to get more accurate answers

samkerr4coding · 2024-12-16T14:26:35Z

This is a great idea. The trickiest part is that there is no standard or programmatic (API-based) way to know whether a given LLM supports multimodal querying.

stephanj added the enhancement New feature or request label Dec 10, 2024

stephanj changed the title ~~Support upload of images to a multimodal LLM~~ [FEATURE REQUEST] Support upload of images to a multimodal LLM Dec 10, 2024

stephanj self-assigned this Dec 14, 2024
