Images are not visible in the console and are invalid in the ChatML / string representation. #1077
If this helps, I've implemented a request hook that expands the `<img>` URLs, as a workaround for vLLM / OpenAI-hosted VLMs:

```python
import httpx
import json
import re

def hook(request):
    j = json.loads(request.content)

    def process_content(input_str):
        # Regular expression to split text and <img> tags, capturing the image URL.
        parts = re.split(r'(<img\s+[^>]*src=[\'"]([hfd].*?)[\'"][^>]*>)', input_str)
        content, image_url = [], None
        for i in range(len(parts)):
            if i % 3 == 0:  # Text part (regular text before or after <img> tags)
                text_part = parts[i].strip()
                if text_part:
                    content.append({"type": "text", "text": text_part})
            elif i % 3 == 2:  # Image URL part (captured inside the <img> tag)
                image_url = parts[i].strip()
                content.append({"type": "image_url", "image_url": {"url": image_url}})
        if not image_url:
            return input_str
        return content

    # Split the content of each message into text and image_url parts.
    for message in j['messages']:
        if isinstance(message["content"], str):
            message["content"] = process_content(message["content"])

    modified_content = json.dumps(j).encode('utf-8')
    request.stream = httpx.ByteStream(modified_content)
    request.headers['Content-Length'] = str(len(modified_content))
```

It can be used with:

```python
...
import httpx, json

llm = models.OpenAI(
    ...,
    http_client=httpx.Client(event_hooks={'request': [hook]}),
)
llm = llm + "Is there a dog on this image? " + "<img src=' ... '>" + "?"
...
```
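For illustration, here is a standalone sketch of the splitting step the hook performs, run on a made-up prompt string (the example URL is mine, not from the thread). Because the pattern has two capture groups, `re.split` interleaves plain text, the full `<img>` tag, and the captured URL, which is why the loop indexes modulo 3:

```python
import re

# Group 1 is the whole <img> tag, group 2 its src URL (starting with h, f, or d
# to match http(s)://, file://, or data: URIs).
PATTERN = r'(<img\s+[^>]*src=[\'"]([hfd].*?)[\'"][^>]*>)'

def process_content(input_str):
    parts = re.split(PATTERN, input_str)
    content, image_url = [], None
    for i, part in enumerate(parts):
        if i % 3 == 0:  # plain text before/after the <img> tags
            text = part.strip()
            if text:
                content.append({"type": "text", "text": text})
        elif i % 3 == 2:  # the captured src URL
            image_url = part.strip()
            content.append({"type": "image_url", "image_url": {"url": image_url}})
    # Strings without any <img> tag are passed through unchanged.
    return content if image_url else input_str

result = process_content('Is there a dog? <img src="https://example.com/dog.jpg"> Answer yes or no.')
# result is a list of OpenAI-style content parts:
# [{'type': 'text', 'text': 'Is there a dog?'},
#  {'type': 'image_url', 'image_url': {'url': 'https://example.com/dog.jpg'}},
#  {'type': 'text', 'text': 'Answer yes or no.'}]
```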
Thank you for the feedback. We are actively working on a full-stack rework of multimodal support in Guidance. Part of this rework involves reformatting the way that prompt data is represented internally, which should give us more flexibility in how we present the data to users. Can you share more info on how you are trying to use the output of str(lm)?
I expect str(lm) to result in a valid ChatML / Markdown string, like it was in the text-only case. ChatML seems to have been designed to work well with Markdown (the de-facto standard for LLM output formatting), so I was assuming that Guidance also used that convention.

I apply a small bugfix to the str(lm) output, re-arranging the spacing around the tags so that they render the same in a Markdown viewer as in the ChatML documentation
<https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chat-markup-language#working-with-chat-markup-language-chatml>.
For LaTeX, formatting, images, or videos I use Markdown or HTML tags. This allows the use of a rich ecosystem of Markdown parsers, renderers, and editors. For images, I prefer the <img> tag, as it is supported in all Markdown implementations and allows a richer src URI syntax, including local files, base64 data, or URLs.
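To illustrate the "base64 data" case: a minimal sketch (the function name is mine, not from Guidance) that embeds a local image file as a self-contained `data:` URI inside an `<img>` tag, so the resulting ChatML log renders in any Markdown/HTML viewer without the runtime:

```python
import base64
import mimetypes

def image_to_img_tag(path):
    # Guess the MIME type from the file extension; fall back to JPEG.
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    # The tag is valid HTML and survives the end of the runtime session.
    return f'<img src="data:{mime};base64,{b64}">'
```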
```
<|im_start|>system
Provide some context and/or instructions to the model.
<|im_end|>
<|im_start|>user
The user’s message goes here
<|im_end|>
<|im_start|>assistant
```
The bug

While images are visible in Jupyter, the ChatML/string representation becomes invalid once the runtime stops, and the images are not visible in the console.

The current image representation is a custom ChatML tag, `<|_image:94590504830032|>`, where the number is valid only during runtime and references a local runtime object. As a result, the ChatML log produced by str(lm) is effectively invalid. Note also that `<|_image:xxx|>` is not valid Markdown or HTML, so regular Markdown viewers that can display the rest of the ChatML exchange can't display the images. It would be good to normalize the representation and use valid Markdown tags:

<img src="https://...">
or <img src="file://">
or <img src="data:image/jpeg

instead of the newly introduced `<|_image: |>` tag.

To Reproduce
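The normalization suggested above could be sketched as follows. This is my illustration, not a Guidance API: the `images` mapping from runtime ID to a durable URI (`https://`, `file://`, or `data:`) is hypothetical and would need to be exposed by the library.

```python
import re

def normalize_image_tokens(chatml, images):
    # Replace each runtime-only <|_image:ID|> token with a portable <img> tag.
    # `images` maps the runtime ID string to a durable URI; unknown IDs are
    # left untouched rather than replaced with a broken tag.
    def repl(m):
        uri = images.get(m.group(1))
        return f'<img src="{uri}">' if uri else m.group(0)
    return re.sub(r'<\|_image:(\d+)\|>', repl, chatml)
```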
System info (please complete the following information):