[FEATURE REQUEST] Support upload of images to a multimodal LLM #361
Google Gemini Exp 1206 suggestions:

Strategy to implement an "Add Image" button next to the "Add File" button in your DevoxxGenieIDEAPlugin, along with the considerations for handling multimodal LLMs:

1. Update the
2. Implement Image Selection Logic
3. Handle Image Data in
4. Adapt
5. Update
6. Update
7. Update
8. Considerations for Multimodal LLMs
9. UI/UX Enhancements
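For step 2 (image selection logic), here is a minimal sketch of the filtering the file chooser could apply before an image is attached. The class `ImageFileSupport` and its methods are hypothetical and not part of DevoxxGenie. The accepted formats (PNG, JPEG, WebP) are also an assumption, so check them against each provider's documentation:

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper for an "Add Image" file chooser: maps a file name to an
 * image MIME type, or returns empty if the extension is not accepted.
 */
public class ImageFileSupport {

    // Assumed whitelist; verify against the formats each provider accepts.
    private static final Map<String, String> MIME_BY_EXTENSION = Map.of(
            "png", "image/png",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "webp", "image/webp");

    /** Returns the MIME type for a supported image file name, or Optional.empty(). */
    public static Optional<String> mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no usable extension
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(MIME_BY_EXTENSION.get(ext));
    }

    /** True if the file name has one of the accepted image extensions. */
    public static boolean isSupportedImage(String fileName) {
        return mimeTypeFor(fileName).isPresent();
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeFor("screenshot.PNG").orElse("unsupported")); // image/png
        System.out.println(mimeTypeFor("diagram.svg").orElse("unsupported"));    // unsupported
    }
}
```

If the MIME type is kept alongside the selected file, it can be passed on when the image is encoded. The plugin then doesn't have to assume that every image is a PNG.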
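Example using OpenAI's Vision API (Conceptual)

If you were using OpenAI's Vision API, your

import dev.langchain4j.data.message.Content;
import dev.langchain4j.data.message.ImageContent;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.data.message.UserMessage;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// ... inside MessageCreationService: text prompt plus optional image as one user message
List<Content> contents = new ArrayList<>();
contents.add(TextContent.from(userPrompt));
if (chatMessageContext.getImageFile() != null) {
    // contentsToByteArray() reads the raw image bytes and may throw IOException
    byte[] imageData = chatMessageContext.getImageFile().contentsToByteArray();
    String base64Image = Base64.getEncoder().encodeToString(imageData);
    // LangChain4j takes base64 data plus a MIME type (PNG assumed here)
    contents.add(ImageContent.from(base64Image, "image/png"));
}
UserMessage userMessage = UserMessage.from(contents);

Important Notes: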
This comprehensive strategy should provide a solid foundation for implementing the "Add Image" functionality in your DevoxxGenieIDEAPlugin. Remember to consult the documentation for your chosen LLM provider to make sure you're using its API correctly.

Gemini Flash 2.0 EXP review, after it was told how the LangChain4j API supports images:

Okay, let's integrate the LangChain4j image handling into your DevoxxGenieIDEAPlugin, building on the previous response. We'll focus on adapting the

1. Update
2. Adapt
3. Update
4. Update
5. Update
Explanation of Changes:
Key Points:
Next Steps:
This detailed integration should enable you to send images along with your text prompts to multimodal LLMs in your DevoxxGenieIDEAPlugin. Remember to consult the documentation for your chosen LLM provider for the exact format it expects for image data.
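Providers differ in the exact image format they expect. Two shapes come up often: raw base64 with a separate MIME type, and a single RFC 2397 data URL. Below is a minimal plain-JDK sketch that produces both. `ImagePayload` and its `maxBytes` guard are hypothetical. The real size limit has to come from the provider's documentation, not from this example:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical helper that prepares image bytes in two shapes commonly seen in
 * multimodal chat APIs: raw base64 plus MIME type, or a single data URL.
 */
public final class ImagePayload {

    private final String base64Data;
    private final String mimeType;

    private ImagePayload(String base64Data, String mimeType) {
        this.base64Data = base64Data;
        this.mimeType = mimeType;
    }

    /** Encodes the image; rejects empty input or images larger than maxBytes (an assumed limit). */
    public static ImagePayload of(byte[] imageBytes, String mimeType, int maxBytes) {
        if (imageBytes == null || imageBytes.length == 0) {
            throw new IllegalArgumentException("Image is empty");
        }
        if (imageBytes.length > maxBytes) {
            throw new IllegalArgumentException(
                    "Image is " + imageBytes.length + " bytes; limit is " + maxBytes);
        }
        return new ImagePayload(Base64.getEncoder().encodeToString(imageBytes), mimeType);
    }

    public String base64Data() {
        return base64Data;
    }

    public String mimeType() {
        return mimeType;
    }

    /** Data URL of the form "data:<mime>;base64,<data>". */
    public String toDataUrl() {
        return "data:" + mimeType + ";base64," + base64Data;
    }

    public static void main(String[] args) {
        byte[] fakeImage = "hi".getBytes(StandardCharsets.US_ASCII); // stand-in bytes, not a real image
        ImagePayload payload = ImagePayload.of(fakeImage, "image/png", 1024);
        System.out.println(payload.toDataUrl()); // data:image/png;base64,aGk=
    }
}
```

With LangChain4j, the first shape maps onto `ImageContent.from(payload.base64Data(), payload.mimeType())`. The data URL form suits APIs that take an image URL field.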
I've made a POC that tests Google Gemini, Claude and OpenAI: https://github.com/stephanj/DevoxxGenieLangchain4JImagePOC

The plan is to support images in DevoxxGenie so developers can include screenshots or images in their questions and get more accurate answers.
This is a great idea. The trickiest part is that there is no standard, programmatic (API-based) way to know whether a given LLM supports multimodal queries.
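Because no standard API reports whether a model accepts images, one pragmatic workaround is a hand-maintained list of vision-capable model names combined with a user override in settings. Here is a minimal sketch. `VisionCapability` and its prefix list are illustrative assumptions. They are not something DevoxxGenie ships, and the list is not a complete catalogue of multimodal models:

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical capability lookup: with no standard way to ask a model whether
 * it accepts images, match the model name against a hand-maintained list and
 * let an explicit user setting take precedence.
 */
public final class VisionCapability {

    // Illustrative prefixes only; must be kept up to date by hand per provider.
    private static final List<String> VISION_MODEL_PREFIXES = List.of(
            "gpt-4o", "claude-3-5-sonnet", "gemini-1.5", "gemini-2.0", "llava");

    /**
     * @param modelName    the selected model's name (may be null)
     * @param userOverride null to use the built-in list; TRUE/FALSE for an explicit user setting
     */
    public static boolean supportsImages(String modelName, Boolean userOverride) {
        if (userOverride != null) {
            return userOverride; // the user's explicit choice wins
        }
        if (modelName == null) {
            return false;
        }
        String normalized = modelName.toLowerCase(Locale.ROOT);
        return VISION_MODEL_PREFIXES.stream().anyMatch(normalized::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(supportsImages("GPT-4o-mini", null));    // true
        System.out.println(supportsImages("llama3.2", null));       // false
        System.out.println(supportsImages("my-local-model", true)); // true (user override)
    }
}
```

The plugin could then hide or disable the image button whenever `supportsImages(...)` returns false. The override lets users enable images for models the list doesn't know about yet.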
It would be nice to also support image (screenshot) uploads for multimodal-enabled LLMs.