[FEATURE REQUEST] Support upload of images to a multimodal LLM #361

Open
stephanj opened this issue Dec 10, 2024 · 4 comments
Labels
enhancement New feature or request


@stephanj
Contributor

It would be nice to also support image (screenshot) uploads for multimodal LLMs.

@stephanj stephanj added the enhancement New feature or request label Dec 10, 2024
@stephanj stephanj changed the title Support upload of images to a multimodal LLM [FEATURE REQUEST] Support upload of images to a multimodal LLM Dec 10, 2024
@stephanj
Contributor Author

Google Gemini Exp 1206 suggestions:

Here is a strategy to implement an "Add Image" button next to the "Add File" button in your DevoxxGenieIDEAPlugin, along with considerations for handling multimodal LLMs:

1. Update the ActionButtonsPanel

  • Create the "Add Image" Button:

    // In ActionButtonsPanel.java, as a helper method called from createButtons():
    private @NotNull JButton createAddImageButton() {
        JButton button = new JHoverButton(AddImageIcon, true); // Assuming you have an AddImageIcon
        button.setToolTipText("Add image to prompt context");
        button.addActionListener(this::selectImageForPromptContext);
        button.setMinimumSize(minSize);
        button.setMaximumSize(maxSize);
        return button;
    }
  • Add the Button to the Panel:

    // In ActionButtonsPanel.java, inside the createButtonPanel() method:
    private @NotNull JPanel createButtonPanel() {
        JPanel buttonPanel = new JPanel(new GridLayout(1, 5, 5, 0)); // Increase columns for new button
        buttonPanel.add(submitBtn);
        buttonPanel.add(calcTokenCostBtn);
        buttonPanel.add(addProjectBtn);
        buttonPanel.add(addFileBtn);
        buttonPanel.add(addImageBtn); // Add the new button here
        return buttonPanel;
    }
  • Declare the addImageBtn:

    // In ActionButtonsPanel.java, add as a class member:
    private JButton addImageBtn;
  • Initialize the Button:

    // In ActionButtonsPanel.java, inside the createButtons() method:
    addImageBtn = createAddImageButton();
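For a quick check of the layout change outside the IDE, here is a minimal self-contained Swing sketch. It assumes plain `JButton`s with text labels in place of the plugin's `JHoverButton` and icons, and `ButtonRowSketch` is a hypothetical name. Deriving the `GridLayout` column count from the number of buttons means the hard-coded `5` does not need bumping each time a button is added.

```java
import javax.swing.JButton;
import javax.swing.JPanel;
import java.awt.GridLayout;

// Hypothetical stand-in for the button row in ActionButtonsPanel: the real class
// uses JHoverButton and icons; plain JButtons with text labels are used here.
final class ButtonRowSketch {
    static JPanel createButtonPanel(JButton... buttons) {
        // One row, one column per button, 5px horizontal gap (as in the snippet above).
        JPanel panel = new JPanel(new GridLayout(1, buttons.length, 5, 0));
        for (JButton button : buttons) {
            panel.add(button);
        }
        return panel;
    }

    static JButton createAddImageButton() {
        JButton button = new JButton("Add Image");
        button.setToolTipText("Add image to prompt context");
        return button;
    }
}
```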

2. Implement Image Selection Logic

  • Create selectImageForPromptContext method:
    // In ActionButtonsPanel.java
    private void selectImageForPromptContext(ActionEvent e) {
        JFileChooser fileChooser = new JFileChooser();
        fileChooser.setFileFilter(new FileNameExtensionFilter("Image files", "png", "jpg", "jpeg", "gif"));
        int result = fileChooser.showOpenDialog(this);
        if (result == JFileChooser.APPROVE_OPTION) {
            File selectedFile = fileChooser.getSelectedFile();
            // refresh so files created outside the IDE (e.g. a new screenshot) are found in the VFS
            VirtualFile virtualFile = LocalFileSystem.getInstance().refreshAndFindFileByIoFile(selectedFile);
            if (virtualFile != null) {
                // Add the image to your context (similar to how you handle files)
                // ... (Implementation to handle the image file) ...
                NotificationUtil.sendNotification(project, "Image added to prompt context: " + virtualFile.getName());
            } else {
                NotificationUtil.sendNotification(project, "Could not add image to prompt context");
            }
        }
    }
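`setFileFilter` only adds a filter to the dialog; the default "All Files" filter stays selectable unless `setAcceptAllFileFilterUsed(false)` is called, so the selected file can still be something other than an image. A minimal sketch of a post-selection check, using a hypothetical `ImageFileCheck` helper and the same extension list as the filter above:

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: re-checks the extension after the dialog closes, because the
// "All Files" filter remains available in the chooser by default.
final class ImageFileCheck {
    // Same extensions as the FileNameExtensionFilter in the snippet above.
    private static final Set<String> IMAGE_EXTENSIONS = Set.of("png", "jpg", "jpeg", "gif");

    static boolean isSupportedImage(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension, or the name ends with a dot
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return IMAGE_EXTENSIONS.contains(extension);
    }
}
```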

3. Handle Image Data in ChatMessageContext

  • Modify ChatMessageContext: You'll need a way to store the selected image data in your ChatMessageContext. You could add a field like:
    // In ChatMessageContext.java
    @Getter
    @Setter
    private VirtualFile imageFile;

4. Adapt MessageCreationService

  • Modify createUserPromptWithContextAsync:
    // In MessageCreationService.java
    public @NotNull CompletableFuture<String> createUserPromptWithContextAsync(Project project, String userPrompt, @Nullable VirtualFile imageFile, List<VirtualFile> files) {
        return CompletableFuture.supplyAsync(() -> {
            StringBuilder userPromptContext = new StringBuilder();
    
            // ... (Existing code to handle files) ...
    
            // Add image handling
            if (imageFile != null) {
                try {
                    byte[] imageData = imageFile.contentsToByteArray();
                    String base64Image = Base64.getEncoder().encodeToString(imageData);
                    userPromptContext.append("Image: ").append(imageFile.getName()).append("\n");
                    userPromptContext.append("ImageData: data:image/png;base64,").append(base64Image).append("\n"); // Assuming PNG format for simplicity
                } catch (IOException e) {
                    NotificationUtil.sendNotification(project, "Error reading image file: " + imageFile.getName());
                }
            }
    
            userPromptContext.append(userPrompt);
            return userPromptContext.toString();
        });
    }
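The snippet above hard-codes `image/png` even though the chooser also accepts JPEG and GIF. A minimal sketch, assuming only those three formats need to be recognised, that detects the MIME type from the file's leading signature bytes and builds a data URL (`ImageMime` is a hypothetical helper, not an existing plugin class):

```java
import java.util.Base64;

// Hypothetical helper: detect the image type from its leading "magic" bytes instead of
// hard-coding image/png. Only PNG, JPEG and GIF (the chooser's formats) are recognised.
final class ImageMime {
    static String detect(byte[] data) {
        // PNG signature: 89 50 4E 47 0D 0A 1A 0A
        if (data.length >= 8
                && (data[0] & 0xFF) == 0x89 && data[1] == 'P' && data[2] == 'N' && data[3] == 'G'
                && data[4] == 0x0D && data[5] == 0x0A && data[6] == 0x1A && data[7] == 0x0A) {
            return "image/png";
        }
        // JPEG: FF D8 FF
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xD8 && (data[2] & 0xFF) == 0xFF) {
            return "image/jpeg";
        }
        // GIF: ASCII "GIF87a" or "GIF89a"
        if (data.length >= 6 && data[0] == 'G' && data[1] == 'I' && data[2] == 'F'
                && data[3] == '8' && (data[4] == '7' || data[4] == '9') && data[5] == 'a') {
            return "image/gif";
        }
        return null; // unknown or unsupported format
    }

    static String toDataUrl(byte[] data) {
        String mime = detect(data);
        if (mime == null) {
            throw new IllegalArgumentException("Unsupported image format");
        }
        return "data:" + mime + ";base64," + Base64.getEncoder().encodeToString(data);
    }
}
```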

5. Update ChatMessageContextUtil

  • Modify createContext:
    // In ChatMessageContextUtil.java
    public static @NotNull ChatMessageContext createContext(Project project,
                                                            String userPromptText,
                                                            LanguageModel languageModel,
                                                            ChatModelProvider chatModelProvider,
                                                            @NotNull String actionCommand,
                                                            EditorFileButtonManager editorFileButtonManager,
                                                            String projectContext,
                                                            boolean isProjectContextAdded,
                                                            VirtualFile imageFile) { // Add imageFile parameter
    
        // ... (Existing code) ...
    
        context.setImageFile(imageFile); // Set the image file in the context
    
        // ... (Rest of the existing code) ...
    }

6. Update ActionButtonsPanelController

  • Modify handlePromptSubmission:
    // In ActionButtonsPanelController.java
    public boolean handlePromptSubmission(String actionCommand,
                                          boolean isProjectContextAdded,
                                          String projectContext,
                                          VirtualFile imageFile) { // Add imageFile parameter
    
        // ... (Existing code) ...
    
        ChatMessageContext currentChatMessageContext =
                ChatMessageContextUtil.createContext(project,
                        userPromptText,
                        getSelectedLanguageModel(),
                        chatModelProvider,
                        actionCommand,
                        editorFileButtonManager,
                        projectContext,
                        isProjectContextAdded,
                        imageFile); // Pass imageFile to createContext
    
        // ... (Rest of the existing code) ...
    }

7. Update PromptExecutionController

  • Modify handlePromptSubmission:
    // In PromptExecutionController.java
    public boolean handlePromptSubmission(ActionEvent actionEvent) {
        // ... (Existing code) ...
    
        VirtualFile imageFile = actionButtonsPanel.getImageFile(); // Get the image file from ActionButtonsPanel
    
        boolean response = promptExecutionController.handlePromptSubmission(actionEvent.getActionCommand(),
                projectContextController.isProjectContextAdded(),
                projectContextController.getProjectContext(),
                imageFile); // Pass imageFile
    
        // ... (Rest of the existing code) ...
    }

8. Considerations for Multimodal LLMs

  • Model Selection: Ensure that the selected LLM in llmProviderPanel supports image input. You might need to add a way to filter or indicate multimodal capabilities in your LanguageModel and ModelProvider enums.
  • API-Specific Handling: The way you send image data to the LLM will depend on the specific API you're using (OpenAI, Google Gemini, etc.). You'll need to consult their documentation for the correct format (e.g., base64 encoded image, URL to an image).
  • Error Handling: Implement robust error handling for cases where the image cannot be loaded, is in an unsupported format, or if the LLM provider does not support multimodal input.
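For the model-selection point, one possible approach, given that providers expose no common capability query, is a hand-maintained allow-list. The sketch below is hypothetical: `MultimodalSupport` is not a plugin class, and the model-name prefixes are illustrative examples that would need to be checked against each provider's documentation, not an authoritative list.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical capability check based on model-name prefixes.
// The prefixes are illustrative only; keep such a list in sync with provider docs.
final class MultimodalSupport {
    private static final Set<String> VISION_MODEL_PREFIXES = Set.of(
            "gpt-4o", "claude-3-5-sonnet", "gemini-1.5", "gemini-2.0");

    static boolean supportsImages(String modelName) {
        String name = modelName.toLowerCase(Locale.ROOT);
        for (String prefix : VISION_MODEL_PREFIXES) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false; // unknown models are treated as text-only
    }
}
```

Treating unknown models as text-only is a conservative default: the "Add Image" button could be disabled for them rather than risking a rejected request.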

9. UI/UX Enhancements

  • Image Preview: Consider adding a small preview of the selected image in the ActionButtonsPanel or somewhere in the UI to provide visual feedback to the user.
  • Clear Image: Add a way to clear the selected image (e.g., a small "x" button next to the preview).

Example using OpenAI's Vision API (Conceptual)

If you were using OpenAI's Vision API, your MessageCreationService might need to construct a UserMessage like this (refer to their documentation for the exact format):

// ... inside MessageCreationService
List<Content> contents = new ArrayList<>();
contents.add(TextContent.from(userPrompt));

if (chatMessageContext.getImageFile() != null) {
    byte[] imageData = chatMessageContext.getImageFile().contentsToByteArray(); // throws IOException
    String base64Image = Base64.getEncoder().encodeToString(imageData);
    contents.add(ImageContent.from(base64Image, "image/png")); // base64 data + MIME type
}

UserMessage userMessage = new UserMessage(contents);
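To make "the exact format" concrete for OpenAI: the Chat Completions API represents a mixed text-and-image user message as an array of content parts of type `text` and `image_url`. The sketch below (hypothetical `OpenAiVisionPayload` class, hand-rolled JSON for illustration only) shows roughly what ends up on the wire. With LangChain4j, the OpenAI integration is expected to produce this from `TextContent` and `ImageContent`, so a plugin would not normally build it by hand.

```java
// Hypothetical sketch of the JSON shape for a user message that mixes text and an image
// (content parts of type "text" and "image_url"). Illustration only, not a JSON library.
final class OpenAiVisionPayload {
    static String userMessage(String prompt, String imageDataUrl) {
        return "{\"role\":\"user\",\"content\":["
                + "{\"type\":\"text\",\"text\":\"" + escape(prompt) + "\"},"
                + "{\"type\":\"image_url\",\"image_url\":{\"url\":\"" + escape(imageDataUrl) + "\"}}"
                + "]}";
    }

    // Minimal JSON string escaping: backslash, quote, and control characters.
    private static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\' -> out.append("\\\\");
                case '"' -> out.append("\\\"");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                default -> {
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
                }
            }
        }
        return out.toString();
    }
}
```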

Important Notes:

  • Error Handling: I've added some basic error handling, but you'll need to expand this to cover various scenarios (file not found, invalid image format, etc.).
  • API-Specific Code: The code for sending the image data to the LLM will be highly dependent on the specific API you are using. I've provided a conceptual example using base64 encoding, but you'll need to adapt it.
  • Testing: Thoroughly test the image upload, prompt construction, and LLM interaction with different image types and sizes.

This comprehensive strategy should provide a solid foundation for implementing the "Add Image" functionality in your DevoxxGenieIDEAPlugin. Remember to consult the documentation for your chosen LLM provider to ensure you're using their API correctly.


Gemini Flash 2.0 EXP review after it was told how the LangChain4J API supports images...

Okay, let's integrate the Langchain4j image handling into your DevoxxGenieIDEAPlugin, building upon the previous response. We'll focus on adapting the MessageCreationService and ChatMessageContext to support ImageContent.

1. Update ChatMessageContext

  • Add imageFile field: We already added this in the previous response, but let's make sure it's there and has the correct type:
    // In ChatMessageContext.java
    @Getter
    @Setter
    private VirtualFile imageFile;

2. Adapt MessageCreationService

  • Modify createUserMessage: This is where we'll construct the UserMessage with TextContent and ImageContent if an image is present.
    // In MessageCreationService.java
    import dev.langchain4j.data.message.Content;
    import dev.langchain4j.data.message.ImageContent;
    import dev.langchain4j.data.message.TextContent;
    import dev.langchain4j.data.message.UserMessage;
    import java.util.ArrayList;
    import java.util.Base64;
    import java.util.List;
    import java.io.IOException;
    
    // ... (Existing code) ...
    
    @NotNull
    public UserMessage createUserMessage(@NotNull ChatMessageContext chatMessageContext) {
        List<Content> contents = new ArrayList<>();
    
        // Add system prompt for OpenAI o1 models
        if (ChatMessageContextUtil.isOpenAIo1Model(chatMessageContext.getLanguageModel())) {
            String systemPrompt = DevoxxGenieStateService.getInstance().getSystemPrompt();
            contents.add(TextContent.from("<SystemPrompt>" + systemPrompt + "</SystemPrompt>\n\n"));
        }
    
        // If git diff is enabled, add special instructions
        if (Boolean.TRUE.equals(DevoxxGenieStateService.getInstance().getGitDiffActivated())) {
            contents.add(TextContent.from("<DiffInstructions>" + GIT_DIFF_INSTRUCTIONS + "</DiffInstructions>\n\n"));
        } else if (Boolean.TRUE.equals(DevoxxGenieStateService.getInstance().getRagActivated())) {
            // Semantic search is enabled, add search results
            String semanticContext = addSemanticSearchResults(chatMessageContext);
            if (!semanticContext.isEmpty()) {
                contents.add(TextContent.from("<SemanticContext>\n" + semanticContext + "\n</SemanticContext>"));
            }
        }
    
        // Add the user's prompt
        contents.add(TextContent.from("<UserPrompt>" + chatMessageContext.getUserPrompt() + "</UserPrompt>\n\n"));
    
        // Add editor content or selected text
        String editorContent = getEditorContentOrSelectedText(chatMessageContext);
        if (!editorContent.isEmpty()) {
            contents.add(TextContent.from("<EditorContext>\n" + editorContent + "\n</EditorContext>\n\n"));
        }
    
        // Add image content if present
        if (chatMessageContext.getImageFile() != null) {
            try {
                byte[] imageData = chatMessageContext.getImageFile().contentsToByteArray();
                String base64Image = Base64.getEncoder().encodeToString(imageData);
                contents.add(ImageContent.from(base64Image, "image/png")); // base64 data + MIME type; assumes PNG
            } catch (IOException e) {
                LOG.warning("Error reading image file: " + chatMessageContext.getImageFile().getName());
                // Handle the exception, maybe add a placeholder or log the error
            }
        }
    
        UserMessage userMessage = new UserMessage(contents);
        chatMessageContext.setUserMessage(userMessage);
        return userMessage;
    }

3. Update ChatMessageContextUtil

  • Modify createContext: Add the imageFile parameter to the method signature and pass it to the ChatMessageContext builder.
    // In ChatMessageContextUtil.java
    public static @NotNull ChatMessageContext createContext(Project project,
                                                            String userPromptText,
                                                            LanguageModel languageModel,
                                                            ChatModelProvider chatModelProvider,
                                                            @NotNull String actionCommand,
                                                            EditorFileButtonManager editorFileButtonManager,
                                                            String projectContext,
                                                            boolean isProjectContextAdded,
                                                            VirtualFile imageFile) { // Add imageFile parameter
    
        // ... (Existing code) ...
    
        ChatMessageContext context = ChatMessageContext.builder()
            // ... (Existing code) ...
            .imageFile(imageFile) // Set the image file in the context
            // ... (Rest of the existing code) ...
            .build();
    
        // ... (Rest of the existing code) ...
    }

4. Update ActionButtonsPanelController

  • Modify handlePromptSubmission: Pass the imageFile to the ChatMessageContextUtil.createContext method.
    // In ActionButtonsPanelController.java
    public boolean handlePromptSubmission(String actionCommand,
                                          boolean isProjectContextAdded,
                                          String projectContext,
                                          VirtualFile imageFile) { // Add imageFile parameter
    
        // ... (Existing code) ...
    
        ChatMessageContext currentChatMessageContext =
                ChatMessageContextUtil.createContext(project,
                        userPromptText,
                        getSelectedLanguageModel(),
                        chatModelProvider,
                        actionCommand,
                        editorFileButtonManager,
                        projectContext,
                        isProjectContextAdded,
                        imageFile); // Pass imageFile to createContext
    
        // ... (Rest of the existing code) ...
    }

5. Update ActionButtonsPanel

  • Add imageFile getter:

    // In ActionButtonsPanel.java
    @Getter
    private VirtualFile imageFile;
  • Modify selectImageForPromptContext:

    // In ActionButtonsPanel.java
    private void selectImageForPromptContext(ActionEvent e) {
        JFileChooser fileChooser = new JFileChooser();
        fileChooser.setFileFilter(new FileNameExtensionFilter("Image files", "png", "jpg", "jpeg", "gif"));
        int result = fileChooser.showOpenDialog(this);
        if (result == JFileChooser.APPROVE_OPTION) {
            File selectedFile = fileChooser.getSelectedFile();
            // refresh so files created outside the IDE (e.g. a new screenshot) are found in the VFS
            VirtualFile virtualFile = LocalFileSystem.getInstance().refreshAndFindFileByIoFile(selectedFile);
            if (virtualFile != null) {
                this.imageFile = virtualFile;
                NotificationUtil.sendNotification(project, "Image added to prompt context: " + virtualFile.getName());
            } else {
                NotificationUtil.sendNotification(project, "Could not add image to prompt context");
            }
        }
    }

Explanation of Changes:

  • ChatMessageContext: We added a VirtualFile imageFile field to store the selected image.
  • MessageCreationService:
    • We now construct a list of LangChain4j content parts (TextContent and ImageContent) so the message can hold both text and image content.
    • If an imageFile is present in the ChatMessageContext, we read the image data, encode it as a base64 string, and create an ImageContent object.
    • We then add both the TextContent and ImageContent to the UserMessage.
  • ChatMessageContextUtil: We added the imageFile parameter to the createContext method and pass it to the ChatMessageContext builder.
  • ActionButtonsPanelController: We pass the imageFile to the ChatMessageContextUtil.createContext method.
  • ActionButtonsPanel: We added a getter for the imageFile and set the imageFile when an image is selected.

Key Points:

  • Base64 Encoding: We're using base64 encoding for the image data, which is a common way to embed images directly into text-based formats like JSON.
  • Image Format: We're assuming the image is in PNG format for simplicity. You might need to add logic to handle other formats.
  • Error Handling: The try-catch block in MessageCreationService handles potential IOExceptions when reading the image file.
  • Multimodal LLM Support: This setup allows you to send both text and image data to a multimodal LLM, which can then process both types of content. Models without image support may reject the request or ignore the image.
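A worked example of the base64 overhead: every 3 input bytes become 4 output characters, padded to a multiple of 4, so the encoded size is 4 * ceil(n / 3). A 3,000,000-byte screenshot therefore becomes 4,000,000 characters of base64. The sketch below (hypothetical `ImagePayloadSize` helper) computes that length and checks it against a limit; the limit is left as a parameter because each provider documents its own.

```java
// Hypothetical pre-flight size check for a base64-encoded image payload.
final class ImagePayloadSize {
    // Encoded length of n raw bytes: 4 * ceil(n / 3), including '=' padding.
    static long base64Length(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    // True if the encoded payload stays within the provider-specific limit.
    static boolean fitsLimit(long rawBytes, long maxEncodedBytes) {
        return base64Length(rawBytes) <= maxEncodedBytes;
    }
}
```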

Next Steps:

  1. Test with a Multimodal LLM: Ensure that your chosen LLM provider (e.g., OpenAI's GPT-4 Vision Preview) is correctly receiving and processing the image data.
  2. Add UI Feedback: Implement a preview of the selected image in the UI.
  3. Add Clear Image Option: Add a button to remove the selected image.
  4. Handle Different Image Formats: Add support for other image formats (JPEG, GIF, etc.) if needed.
  5. Optimize Image Handling: For large images, consider resizing them before sending them to the LLM to reduce token usage and processing time.
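For the resizing step, a minimal sketch using only `java.awt` and `javax.imageio`: it scales the image so its longest side is at most `maxSide` pixels, preserving the aspect ratio, and re-encodes it as PNG. `ImageDownscaler`, the bilinear interpolation, and the choice of PNG output are assumptions, not part of the plugin.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: shrink an image so its longest side is at most maxSide pixels.
final class ImageDownscaler {
    static BufferedImage fit(BufferedImage source, int maxSide) {
        int width = source.getWidth();
        int height = source.getHeight();
        int longest = Math.max(width, height);
        if (longest <= maxSide) {
            return source; // already small enough, return unchanged
        }
        double scale = (double) maxSide / longest;
        int newWidth = Math.max(1, (int) Math.round(width * scale));
        int newHeight = Math.max(1, (int) Math.round(height * scale));

        BufferedImage target = new BufferedImage(newWidth, newHeight, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = target.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, newWidth, newHeight, null);
        } finally {
            g.dispose();
        }
        return target;
    }

    // Re-encode as PNG, which keeps the alpha channel of the ARGB target image.
    static byte[] toPng(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);
        return out.toByteArray();
    }
}
```

The resized bytes can then go through the same base64 encoding as before; a JPEG variant would need an RGB (no alpha) target image instead of `TYPE_INT_ARGB`.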

This detailed integration should enable you to send images along with your text prompts to multimodal LLMs in your DevoxxGenieIDEAPlugin. Remember to consult the documentation for your chosen LLM provider for the exact format they expect for image data.

@stephanj
Contributor Author

The "Add Image" button icons: [image: addImage_dark] [image: addImage]

@stephanj stephanj self-assigned this Dec 14, 2024
@stephanj
Contributor Author

I've made a POC which tests Google Gemini, Claude and OpenAI @ https://github.com/stephanj/DevoxxGenieLangchain4JImagePOC

The plan is to support images in DevoxxGenie so developers can include screenshots or images in their questions and get more accurate answers.

@samkerr4coding
Contributor

This is a great idea. The trickiest part is that there is no standard, programmatic (API-based) way to know whether an arbitrary LLM supports multimodal queries.
