vision_server

Vision server for Ditto Assistant clients. Holds models for image captioning and image Q/A. This server is mainly used by the Image RAG LLM Agent in nlp_server.

Installation and Running Locally

  1. Navigate into the project directory and run `pip install -r requirements.txt` to install the dependencies.
  2. Run `python main.py` to start the vision server. The server listens on port 22032.
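
Once the server is up, clients talk to it over HTTP. Below is a minimal sketch of querying it from Python with the requests library; the `/caption` endpoint name and payload shape are assumptions for illustration, so check main.py for the actual routes.

```python
import requests

VISION_SERVER = "http://localhost:22032"  # default port from main.py

# Hypothetical endpoint and payload; the real routes live in main.py.
with open("example.jpg", "rb") as f:
    response = requests.post(f"{VISION_SERVER}/caption", files={"image": f})

print(response.json())  # e.g. {"caption": "..."}
```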

Image Captioning and Q/A Models

The image captioning and Q/A models are invoked when assistant clients are asked to describe an image: the captioning model generates a caption for the image, and the Q/A model answers questions about it. Both models are hosted on Hugging Face's model hub and are downloaded and cached locally when the vision server starts.
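
As a rough sketch of how such models can be loaded (and cached on first use) with Hugging Face's transformers library — the specific BLIP and ViLT checkpoints below are assumptions for illustration, not necessarily the ones this server uses:

```python
from PIL import Image
from transformers import pipeline

# Placeholder checkpoints; the server's actual models may differ.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

image = Image.open("example.jpg")

caption = captioner(image)[0]["generated_text"]
answer = vqa(image=image, question="What time of day is it?")[0]["answer"]

print(caption, "|", answer)
```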

LLM Agent over an Image (Demonstration)

Below is a demonstration of how the assistant can use the nlp_server's Image Retrieval-Augmented Generation (RAG) Agent to carry out a conversation about an image. The RAG Agent is a few-shot prompted model that has access to the captioning and image Q/A models when generating responses. It is used when the user asks the assistant something that requires context from the assistant's environment.

Example Image

[Image: a vector-art landscape illustration with a river and a tree]

Conversation

The trace below shows the LLM Agent's thought process when answering a user's query about the image. The agent first generates a caption for the image, then poses its own follow-up questions to the Q/A model, and finally combines the caption and Q/A answers however it sees fit into a response to the user's query. Anything surrounded by angle brackets (<>) is a decision made by the LLM Agent.

User's Query: Can you describe this image? I want to know where it is, what time of day it is, and what the weather is like.

Caption Model: landscape with a river and a tree vector art illustration

LLM Agent's Decisions:
<QA> Where is the image taken?
<QA Response> outside
<QA> What is the weather like?
<QA Response> sunny
<QA> What time of day is it?
<QA Response> evening
<DONE> I see a landscape with a river and a tree. The image was taken outside during the evening. The weather is sunny.

Final Response: I see a landscape with a river and a tree. The image was taken outside during the evening. The weather is sunny.
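
The decision loop above can be sketched in Python as follows. This is an illustrative reconstruction based only on the trace, not the actual nlp_server code; `llm_generate` and `vqa_answer` are hypothetical callables standing in for the LLM and the image Q/A model.

```python
def run_image_agent(query: str, caption: str, llm_generate, vqa_answer) -> str:
    """Drive the agent loop illustrated in the trace above.

    llm_generate(prompt) -> str and vqa_answer(question) -> str are
    hypothetical callables for the LLM and the image Q/A model.
    """
    prompt = f"Caption: {caption}\nUser's Query: {query}\n"
    for _ in range(8):  # cap the number of Q/A rounds
        step = llm_generate(prompt).strip()
        if step.startswith("<DONE>"):
            # Enough context gathered; the rest of the line is the answer.
            return step.removeprefix("<DONE>").strip()
        if step.startswith("<QA>"):
            question = step.removeprefix("<QA>").strip()
            answer = vqa_answer(question)
            # Feed the Q/A exchange back into the prompt for the next step.
            prompt += f"<QA> {question}\n<QA Response> {answer}\n"
        else:
            return step  # fall back to whatever the model produced
    return "I wasn't able to gather enough context about the image."
```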
