# visual-question-answering

Here are 190 public repositories matching this topic...

salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

image-captioning visual-reasoning visual-question-answering vision-language vision-language-transformer image-text-retrieval vision-and-language-pre-training

Updated Aug 5, 2024
Jupyter Notebook

OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

prompt chinese image-captioning pretrained-models visual-question-answering multimodal text-to-image-synthesis vision-language pretraining referring-expression-comprehension prompt-tuning

Updated Apr 24, 2024
Python

peteanderson80 / bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

caffe vqa faster-rcnn image-captioning captioning-images mscoco mscoco-dataset visual-question-answering

Updated Feb 3, 2023
Jupyter Notebook

lucidrains / flamingo-pytorch

Implementation of 🦩 Flamingo, the state-of-the-art few-shot visual question answering attention net from DeepMind, in PyTorch

deep-learning transformers artificial-intelligence attention-mechanism visual-question-answering

Updated Oct 18, 2022
Python

YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

image-captioning video-captioning visual-question-answering vision-and-language cross-modal-retrieval pretraining tden

Updated Feb 27, 2023
Python

richard-peng-xia / awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

medical-imaging multimodal-learning visual-question-answering multimodal-deep-learning large-language-models medical-report-generation multimodal-large-language-models large-multimodal-models

Updated Nov 11, 2024

jnhwkim / ban-vqa

Bilinear attention networks for visual question answering

attention visual-question-answering bilinear-pooling pytorch-implmention

Updated Oct 30, 2023
Python

MILVLG / mcan-vqa

Deep Modular Co-Attention Networks for Visual Question Answering

attention visual-reasoning visual-question-answering

Updated Dec 16, 2020
Python

MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

machine-learning natural-language-processing deep-neural-networks computer-vision deep-learning evaluation question-answering stem multimodality multimodal-learning visual-question-answering multimodal multimodal-deep-learning foundation-models large-language-models llm llms large-multimodal-models

Updated Dec 29, 2024
Python

zjukg / KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

information-extraction survey knowledge-graph awsome image-classification image-generation surveys entity-linking knowledge-graph-embeddings visual-question-answering entity-alignment paper-list awsome-list cross-modal-retrieval multi-modal-learning multi-modal-fusion large-language-models multi-modal-knowledge-graph

Updated Dec 10, 2024

davidmascharka / tbd-nets

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

visualization machine-learning deep-learning pytorch neural-networks vqa visual-question-answering

Updated Dec 7, 2021
Jupyter Notebook

MILVLG / openvqa

A lightweight, scalable, and general framework for visual question answering research

benchmark deep-learning pytorch vqa visual-question-answering

Updated Sep 3, 2021
Python

MILVLG / prophet

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

pytorch visual-question-answering multimodal-deep-learning gpt-3 prompt-engineering okvqa a-okvqa

Updated May 23, 2023
Python

lupantech / MathVista

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

science machine-learning mathematics visual-question-answering mathqa large-language-models ai4math large-multimadality-models

Updated Nov 29, 2024
Jupyter Notebook

Cyanogenoid / pytorch-vqa

Strong baseline for visual question answering

pytorch vqa baseline visual-question-answering

Updated Mar 13, 2023
Python

HanXinzi-AI / awesome-computer-vision-resources

A collection of computer vision projects and tools.

Updated Jun 3, 2024

qiantianwen / NuScenes-QA

[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.

autonomous-driving visual-question-answering vision-language

Updated Nov 1, 2024
Python

markdtw / vqa-winner-cvprw-2017

PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

pytorch visual-question-answering

Updated Feb 8, 2019
Python

MMStar-Benchmark / MMStar

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

evaluation multimodality multimodal-learning visual-question-answering multimodal large-language-models llm llms large-vision-language-model large-vision-language-models large-multimodal-models lvlms lvlm

Updated Sep 26, 2024
Python

antoyang / FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

vqa video-understanding weakly-supervised-learning multimodal-learning visual-question-answering vision-and-language videoqa pre-training video-question-answering large-language-models

Updated Dec 9, 2024
Python
