Merge pull request #2 from yashkant/update-py3
Add VisDial Code, Add VisDial-Captioning Worker, Remove Legacy Code
Showing 58 changed files with 3,458 additions and 2,134 deletions.
.gitmodules:

```diff
@@ -1,3 +1,9 @@
-[submodule "neuraltalk2"]
-	path = neuraltalk2
-	url = https://github.com/karpathy/neuraltalk2.git
+[submodule "viscap/captioning/vqa-maskrcnn-benchmark"]
+	path = viscap/captioning/vqa-maskrcnn-benchmark
+	url = https://gitlab.com/yashkant/vqa-maskrcnn-benchmark/
+[submodule "viscap/captioning/fastText"]
+	path = viscap/captioning/fastText
+	url = https://github.com/facebookresearch/fastText
+[submodule "viscap/captioning/pythia"]
+	path = viscap/captioning/pythia
+	url = https://github.com/facebookresearch/pythia/
```
README.md:

@@ -1,120 +1,133 @@
Visual Chatbot
============
Demo for the paper (**Now upgraded to PyTorch; for the Lua-Torch version see [tag]()**).

**[Visual Dialog][1]** (CVPR 2017 [Spotlight][4]) <br/>
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra

Arxiv Link: [arxiv.org/abs/1611.08669][1]
Live demo: http://visualchatbot.cloudcv.org

[![Visual Chatbot](chat/static/images/screenshot.png)](http://www.youtube.com/watch?v=SztC8VOWwRQ&t=13s "Visual Chatbot")

Introduction
---------------
**Visual Dialog** requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, dialog history, and a follow-up question about the image, the AI agent has to answer the question. Putting it all together, we demonstrate the first ‘visual chatbot’!
What has changed since the last version?
---------------------------------------------------
The model-building code has been completely shifted to PyTorch; we have put in a much improved [Bottom-Up Top-Down][12] captioning model from [Pythia][10] and a Mask-RCNN feature extractor from [maskrcnn-benchmark][13]. The VisDial model is borrowed from the [visdial-challenge-starter][14] code.

Setup and Dependencies
------------------------------
Please follow the instructions below to get the demo running on your local machine. For the previous version of this repository, which supports Torch-Lua based models, see [tag]().

Start with installing the Build Essentials, [Redis Server][5] and [RabbitMQ Server][6].
```shell
sudo apt-get update

# download and install build essentials
sudo apt-get install -y git python-pip python-dev
sudo apt-get install -y autoconf automake libtool curl make g++ unzip
sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
```
### Install RabbitMQ and Redis Server

```shell
# download and install redis-server and rabbitmq-server
sudo apt-get install -y redis-server rabbitmq-server
sudo rabbitmq-plugins enable rabbitmq_management
sudo service rabbitmq-server restart
sudo service redis-server restart
```
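Before moving on, it can help to confirm both services came up. A minimal sketch using only the Python standard library; the default localhost ports (6379 for Redis, 5672 for RabbitMQ) are assumptions based on stock installs:

```python
# quick reachability check for redis-server and rabbitmq-server
# (assumes default localhost ports; adjust if you changed the configs)
import socket

for name, port in [("redis", 6379), ("rabbitmq", 5672)]:
    with socket.socket() as s:
        s.settimeout(2)
        reachable = s.connect_ex(("127.0.0.1", port)) == 0
    print(f"{name}: {'up' if reachable else 'down'} on port {port}")
```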
#### Environment Setup

You can use Anaconda or Miniconda to set up this codebase. Download and install the Anaconda or Miniconda distribution based on Python 3+ from their [downloads page][17] and proceed below.
```sh
# clone and download submodules
git clone --recursive https://www.github.com/yashkant/visual-chatbot.git

# create and activate new environment
conda create -n vischat python=3.6.8
conda activate vischat

# install the requirements of chatbot and visdial-starter code
cd visual-chatbot/
pip install -r requirements.txt
```
#### Downloads
Download the BUTD, Mask-RCNN and VisDial model checkpoints and their configuration files.
```sh
sh viscap/download_models.sh
```
#### Install Submodules
Install Pythia to use the BUTD captioning model and maskrcnn-benchmark for feature extraction.
```sh
# install fastText (dependency of pythia)
cd viscap/captioning/fastText
pip install -e .

# install pythia for using butd model
cd ../pythia/
sed -i '/torch/d' requirements.txt
pip install -e .

# install maskrcnn-benchmark for feature extraction
cd ../vqa-maskrcnn-benchmark/
python setup.py build
python setup.py develop
cd ../../../
```
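To sanity-check the editable installs, you can try importing each package from the `vischat` environment. This is only a sketch; the module names below are assumptions (in particular, the fastText binding has shipped under both `fastText` and `fasttext` depending on version):

```python
# verify the submodule installs are importable
import importlib

for mod in ("fastText", "pythia", "maskrcnn_benchmark"):
    try:
        importlib.import_module(mod)
        print(f"{mod}: ok")
    except ImportError as err:
        print(f"{mod}: missing ({err})")
```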
#### Cuda Installation

Note: CUDA and cuDNN are only required if you are going to use a GPU. Download and install them from the [nvidia website][18].

#### NLTK
We use `PunktSentenceTokenizer` from nltk; download it if you haven't already.
```sh
python -c "import nltk; nltk.download('punkt')"
```
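As a quick check that the punkt model downloaded correctly, you can split a sample caption with `sent_tokenize`, which uses `PunktSentenceTokenizer` under the hood (the sample text here is made up):

```python
# punkt-backed sentence splitting; raises LookupError if punkt is missing
from nltk.tokenize import sent_tokenize

caption = "A man is riding a horse. A dog is running beside them."
print(sent_tokenize(caption))
# ['A man is riding a horse.', 'A dog is running beside them.']
```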
## Let's run this now!
#### Set up the database
```sh
# create the database
python manage.py makemigrations chat
python manage.py migrate
```
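If you want to confirm the migrations actually created the tables, here is a small sketch using the standard library, assuming the project uses Django's default SQLite backend (a `db.sqlite3` file; check `DATABASES` in the settings module if it is elsewhere):

```python
# list the tables created by `manage.py migrate`
import sqlite3

con = sqlite3.connect("db.sqlite3")  # note: connect() creates the file if it does not exist
tables = con.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
print([name for (name,) in tables])
con.close()
```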
#### Run server and worker
Launch two separate terminals and run the worker and server code.
```sh
# run rabbitmq worker on first terminal
# warning: on the first run a GloVe file (~860 MB) is downloaded; this is a one-time thing
python worker_viscap.py

# run development server on second terminal
python manage.py runserver
```
You are all set now. Visit http://127.0.0.1:8000 and you will have your demo running successfully.
## Issues
If you run into incompatibility issues, please take a look [here][7] and [here][8].

## Model Checkpoint and Features Used
Performance on `v1.0 test-std` (trained on `v1.0` train + val):
Model | R@1 | R@5 | R@10 | MeanR | MRR | NDCG
------- | ------ | ------ | ------ | ------ | ------ | ------
[lf-gen-mask-rcnn-x101-demo][20] | 0.3930 | 0.5757 | 0.6404 | 18.4950 | 0.4863 | 0.5967
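If you want to poke at the downloaded checkpoint before wiring it into the demo, `torch.load` is enough. A sketch under assumptions: the filename matches the download link above, and the checkpoint layout follows the visdial-challenge-starter convention of a dict of state dicts (not verified here):

```python
# inspect the VisDial checkpoint on CPU without instantiating the model
import torch

ckpt = torch.load("lf_gen_mask_rcnn_x101_train_demo.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. model / optimizer state dicts, if the convention holds
else:
    print(type(ckpt))
```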
Extracted features from `VisDial v1.0` used to train the above model are here:

- [features_mask_rcnn_x101_train.h5][21]: Mask-RCNN features with 100 proposals per image, train split.
- [features_mask_rcnn_x101_val.h5][22]: Mask-RCNN features with 100 proposals per image, val split.
- [features_mask_rcnn_x101_test.h5][23]: Mask-RCNN features with 100 proposals per image, test split.

*Note*: The above features have the key `image_id` (from earlier versions) renamed to `image_ids`.
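To verify a downloaded feature file and the renamed key, a short `h5py` sketch (only the `image_ids` key is confirmed by the note above; any other dataset names in the file are not assumed here):

```python
# peek inside one of the Mask-RCNN feature files
import h5py

with h5py.File("features_mask_rcnn_x101_val.h5", "r") as f:
    print(list(f.keys()))      # should include 'image_ids' (renamed from 'image_id')
    print(f["image_ids"][:5])  # first few image ids
```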
## Cite this work

If you find this code useful, consider citing our work:

@@ -131,24 +144,43 @@
## Contributors

* [Yash Kant][19] ([email protected])
* [Deshraj Yadav][2] ([email protected])
* [Abhishek Das][3] ([email protected])

## License

BSD
## Credits and Acknowledgements

- Visual Chatbot Image: "[Robot-clip-art-book-covers-feJCV3-clipart](https://commons.wikimedia.org/wiki/File:Robot-clip-art-book-covers-feJCV3-clipart.png)" by [Wikimedia Commons](https://commons.wikimedia.org) is licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
- The beam-search implementation was borrowed as-is from [AllenNLP][15].
- The vqa-maskrcnn-benchmark code used was forked from @meetshah1995's [fork][16] of the original repository.
- The VisDial model is borrowed from [visdial-challenge-starter][14].
- The BUTD captioning model comes from the awesome [Pythia][10] repository.
[1]: https://arxiv.org/abs/1611.08669
[2]: http://deshraj.github.io
[3]: https://abhishekdas.com
[4]: http://cvpr2017.thecvf.com/
[5]: https://redis.io/
[6]: https://www.rabbitmq.com/
[7]: https://github.com/unbit/uwsgi/issues/1770
[8]: https://stackoverflow.com/questions/41335478/importerror-no-module-named-asgiref-base-layer
[9]: https://gitlab.com/yashkant/vqa-maskrcnn-benchmark
[10]: https://github.com/facebookresearch/pythia/
[11]: https://github.com/facebookresearch/fastText/
[12]: https://arxiv.org/abs/1707.07998
[13]: https://github.com/facebookresearch/maskrcnn-benchmark
[14]: https://github.com/batra-mlp-lab/visdial-challenge-starter-pytorch/
[15]: https://www.github.com/allenai/allennlp
[16]: https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark/
[17]: https://conda.io/docs/user-guide/install/download.html
[18]: https://developer.nvidia.com/cuda-downloads
[19]: https://github.com/yashkant
[20]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/lf_gen_mask_rcnn_x101_train_demo.pth
[21]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_train.h5
[22]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_val.h5
[23]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_test.h5