Merge pull request #2 from yashkant/update-py3
Add VisDial Code, Add VisDial-Captioning Worker, Remove Legacy Code
yashkant authored Aug 21, 2019
2 parents ad95136 + e5d26d5 commit 73af9f7
Showing 58 changed files with 3,458 additions and 2,134 deletions.
7 changes: 6 additions & 1 deletion .gitignore
@@ -1,5 +1,10 @@
data/
# Demo
/data/
media/
viscap/captioning/detectron/
viscap/captioning/model_data/
viscap/checkpoints/
viscap/data/

*.pyc
db.sqlite3
12 changes: 9 additions & 3 deletions .gitmodules
@@ -1,3 +1,9 @@
[submodule "neuraltalk2"]
path = neuraltalk2
url = https://github.com/karpathy/neuraltalk2.git
[submodule "viscap/captioning/vqa-maskrcnn-benchmark"]
path = viscap/captioning/vqa-maskrcnn-benchmark
url = https://gitlab.com/yashkant/vqa-maskrcnn-benchmark/
[submodule "viscap/captioning/fastText"]
path = viscap/captioning/fastText
url = https://github.com/facebookresearch/fastText
[submodule "viscap/captioning/pythia"]
path = viscap/captioning/pythia
url = https://github.com/facebookresearch/pythia/
198 changes: 115 additions & 83 deletions README.md
@@ -1,120 +1,133 @@
# Visual Chatbot

## Introduction
Visual Chatbot
============
Demo for the paper (**Now upgraded to PyTorch; for the Lua-Torch version see [tag]()**).

Demo for the paper

**[Visual Dialog][1]**
**[Visual Dialog][1]** (CVPR 2017 [Spotlight][4]) <br>
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
[arxiv.org/abs/1611.08669][1]
[CVPR 2017][4] (Spotlight)

Arxiv Link: [arxiv.org/abs/1611.08669][1]
Live demo: http://visualchatbot.cloudcv.org

[![Visual Chatbot](chat/static/images/screenshot.png)](http://www.youtube.com/watch?v=SztC8VOWwRQ&t=13s "Visual Chatbot")

Introduction
---------------
**Visual Dialog** requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, dialog history, and a follow-up question about the image, the AI agent has to answer the question. Putting it all together, we demonstrate the first ‘visual chatbot’!

[![Visual Chatbot](chat/static/images/screenshot.png)](http://www.youtube.com/watch?v=SztC8VOWwRQ&t=13s "Visual Chatbot")
What has changed since the last version?
---------------------------------------------------
The model-building code has moved entirely to PyTorch. We have added a much-improved [Bottom Up Top Down][12] captioning model from [Pythia][10] and a Mask-RCNN feature extractor from [maskrcnn-benchmark][13]. The VisDial model is borrowed from the [visdial-challenge-starter][14] code.

## Installation Instructions
Please follow the instructions below to get the demo running on your local machine. For the previous version of this repository, which supports Torch-Lua based models, see [tag]().

### Installing the Essential requirements
Setup and Dependencies
------------------------------
Start by installing the build essentials, [Redis Server][5] and [RabbitMQ Server][6].
```sh
sudo apt-get update

```shell
# download and install build essentials
sudo apt-get install -y git python-pip python-dev
sudo apt-get install -y python-dev
sudo apt-get install -y autoconf automake libtool curl make g++ unzip
sudo apt-get install -y autoconf automake libtool
sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
```

### Install Torch

```shell
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh
source ~/.bashrc
```

### Install PyTorch(Python Lua Wrapper)

```shell
git clone https://github.com/hughperkins/pytorch.git
cd pytorch
source ~/torch/install/bin/torch-activate
./build.sh
```
sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler

### Install RabbitMQ and Redis Server

```shell
# download and install redis-server and rabbitmq-server
sudo apt-get install -y redis-server rabbitmq-server
sudo rabbitmq-plugins enable rabbitmq_management
sudo service rabbitmq-server restart
sudo service redis-server restart
```
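
Before starting the worker, it can be worth confirming that both services accept connections. The following is a minimal sketch, assuming Redis and RabbitMQ run on the local machine with their default ports (6379 and 5672). The `SERVICES` mapping and the `port_open` helper are illustrative names and are not part of this repository.

```python
import socket

# Assumption: both services run locally with their stock default ports.
SERVICES = {"redis-server": 6379, "rabbitmq-server": 5672}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "reachable" if port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} (port {port}): {status}")
```

If either service is reported as not reachable, restarting it with the `service` commands above is a reasonable first step.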

### Lua dependencies

```shell
luarocks install loadcaffe
```

The two dependencies below are only required if you are going to use a GPU.
#### Environment Setup

```shell
luarocks install cudnn
luarocks install cunn
```

### Cuda Installation
You can use Anaconda or Miniconda to set up this code base. Download and install the Anaconda or Miniconda distribution based on Python 3+ from their [downloads page][17] and proceed below.

Note: CUDA and cuDNN are only required if you are going to use a GPU.

Download and install CUDA and cuDNN from the [NVIDIA website](https://developer.nvidia.com/cuda-downloads)
```sh
# clone and download submodules
git clone --recursive https://www.github.com/yashkant/visual-chatbot.git

### Install dependencies
# create and activate new environment
conda create -n vischat python=3.6.8
conda activate vischat

```shell
git clone https://github.com/Cloud-CV/visual-chatbot.git
cd visual-chatbot
git submodule init && git submodule update
sh models/download_models.sh
# install the requirements of chatbot and visdial-starter code
cd visual-chatbot/
pip install -r requirements.txt
```

If you have not used nltk before, you will need to download a tokenization model.
#### Downloads
Download the BUTD, Mask-RCNN and VisDial model checkpoints and their configuration files.
```sh
sh viscap/download_models.sh
```

```shell
python -m nltk.downloader punkt
#### Install Submodules
Install Pythia to use the BUTD captioning model, and maskrcnn-benchmark for feature extraction.
```sh
# install fastText (dependency of pythia)
cd viscap/captioning/fastText
pip install -e .

# install pythia for using butd model
cd ../pythia/
sed -i '/torch/d' requirements.txt
pip install -e .

# install maskrcnn-benchmark for feature extraction
cd ../vqa-maskrcnn-benchmark/
python setup.py build
python setup.py develop
cd ../../../
```
#### Cuda Installation

Change lines 2-4 of `neuraltalk2/misc/LanguageModel.lua` to the following:
Note: CUDA and cuDNN are only required if you are going to use a GPU. Download and install CUDA and cuDNN from the [NVIDIA website][18].

```shell
local utils = require 'neuraltalk2.misc.utils'
local net_utils = require 'neuraltalk2.misc.net_utils'
local LSTM = require 'neuraltalk2.misc.LSTM'
#### NLTK
We use `PunktSentenceTokenizer` from nltk; download it if you haven't already.
```sh
python -c "import nltk; nltk.download('punkt')"
```

### Create the database

```shell
## Let's run this now!
#### Setup the database
```
# create the database
python manage.py makemigrations chat
python manage.py migrate
```
#### Run server and worker
Open two separate terminals and run the worker in one and the server in the other.
```sh
# run rabbitmq worker on first terminal
# warning: the first run downloads the GloVe file (~860 MB); this happens only once
python worker_viscap.py

# run development server on second terminal
python manage.py runserver
```
You are all set now. Visit http://127.0.0.1:8000 and you will have your demo running successfully.

### Running the RabbitMQ workers and Development Server
## Issues
If you run into incompatibility issues, please take a look [here][7] and [here][8].

Open 3 different terminal sessions and run the following commands:
## Model Checkpoint and Features Used
Performance on `v1.0 test-std` (trained on `v1.0` train + val):

```shell
python worker.py
python worker_captioning.py
python manage.py runserver
```
| Model | R@1 | R@5 | R@10 | MeanR | MRR | NDCG |
| ------- | ------ | ------ | ------ | ------ | ------ | ------ |
| [lf-gen-mask-rcnn-x101-demo][20] | 0.3930 | 0.5757 | 0.6404 | 18.4950 | 0.4863 | 0.5967 |
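
For readers unfamiliar with the retrieval metrics in the table, the sketch below shows how R@k, MeanR and MRR are conventionally computed from the 1-based rank of the ground-truth answer for each question. The function name and the sample ranks are made up for illustration, and this is not the evaluation code that produced the numbers above. NDCG is omitted because it additionally needs per-candidate relevance annotations.

```python
from typing import Dict, Sequence


def retrieval_metrics(gt_ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """Compute R@k, mean rank and mean reciprocal rank.

    gt_ranks holds, for each question, the 1-based position of the
    ground-truth answer in the model's ranking of candidate answers.
    """
    if not gt_ranks:
        raise ValueError("gt_ranks must not be empty")
    n = len(gt_ranks)
    # R@k: fraction of questions whose ground-truth answer is ranked in the top k.
    metrics = {f"R@{k}": sum(r <= k for r in gt_ranks) / n for k in ks}
    # MeanR: average rank of the ground-truth answer (lower is better).
    metrics["MeanR"] = sum(gt_ranks) / n
    # MRR: average of 1 / rank (higher is better).
    metrics["MRR"] = sum(1.0 / r for r in gt_ranks) / n
    return metrics


if __name__ == "__main__":
    # Hypothetical ranks for four questions.
    print(retrieval_metrics([1, 2, 10, 20]))
```

R@k and MRR improve as they grow, while MeanR improves as it shrinks, which is why the table reports them side by side.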

You are all set now. Visit http://127.0.0.1:8000 and you will have your demo running successfully.
Extracted features from `VisDial v1.0` used to train the above model are here:

- [features_mask_rcnn_x101_train.h5][21]: Mask-RCNN features with 100 proposals per image (train split).
- [features_mask_rcnn_x101_val.h5][22]: Mask-RCNN features with 100 proposals per image (val split).
- [features_mask_rcnn_x101_test.h5][23]: Mask-RCNN features with 100 proposals per image (test split).

*Note*: In the features above, the key `image_id` (used in earlier versions) has been renamed to `image_ids`.
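
Code that must read both older and newer feature files can look up whichever key is present. The helper below is a sketch under the assumption that the file object behaves like a mapping from key names to sequences of ids (as an opened HDF5 file typically does). `read_image_ids` is a hypothetical name and not part of this repository.

```python
from typing import Any, List, Mapping


def read_image_ids(features: Mapping[str, Any]) -> List[int]:
    """Return image ids from a features mapping, accepting either key name.

    Prefers the newer `image_ids` key and falls back to the older `image_id`.
    """
    for key in ("image_ids", "image_id"):
        if key in features:
            return [int(i) for i in features[key]]
    raise KeyError("neither 'image_ids' nor 'image_id' found in features")


if __name__ == "__main__":
    # Plain dicts stand in for opened feature files in this illustration.
    print(read_image_ids({"image_ids": [3, 7]}))
    print(read_image_ids({"image_id": [11]}))
```

With h5py installed, the same function could presumably be called on an open file, for example `read_image_ids(h5py.File(path, "r"))`, where `path` points to one of the feature files listed above.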

## Cite this work

@@ -131,24 +144,43 @@ If you find this code useful, consider citing our work:
```

## Contributors

* [Yash Kant][19] ([email protected])
* [Deshraj Yadav][2] ([email protected])
* [Abhishek Das][3] ([email protected])

## License

BSD

## Helpful Issues
Problems installing uwsgi: https://github.com/unbit/uwsgi/issues/1770

Problems with asgiref: https://stackoverflow.com/questions/41335478/importerror-no-module-named-asgiref-base-layer
## Credits
## Credits and Acknowledgements

- Visual Chatbot Image: "[Robot-clip-art-book-covers-feJCV3-clipart](https://commons.wikimedia.org/wiki/File:Robot-clip-art-book-covers-feJCV3-clipart.png)" by [Wikimedia Commons](https://commons.wikimedia.org) is licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en)

- The beam-search implementation was borrowed as-is from [AllenNLP][15].
- The vqa-maskrcnn-benchmark code used was forked from @meetshah1995's [fork][16] of the original repository.
- The VisDial model is borrowed from [visdial-challenge-starter][14].
- The BUTD captioning model comes from the awesome [Pythia][10] repository.

[1]: https://arxiv.org/abs/1611.08669
[2]: http://deshraj.github.io
[3]: https://abhishekdas.com
[4]: http://cvpr2017.thecvf.com/
[5]: https://redis.io/
[6]: https://www.rabbitmq.com/
[7]: https://github.com/unbit/uwsgi/issues/1770
[8]: https://stackoverflow.com/questions/41335478/importerror-no-module-named-asgiref-base-layer
[9]: https://gitlab.com/yashkant/vqa-maskrcnn-benchmark
[10]: https://github.com/facebookresearch/pythia/
[11]: https://github.com/facebookresearch/fastText/
[12]: https://arxiv.org/abs/1707.07998
[13]: https://github.com/facebookresearch/maskrcnn-benchmark
[14]: https://github.com/batra-mlp-lab/visdial-challenge-starter-pytorch/
[15]: https://www.github.com/allenai/allennlp
[16]: https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark/
[17]: https://conda.io/docs/user-guide/install/download.html
[18]: https://developer.nvidia.com/cuda-downloads
[19]: https://github.com/yashkant
[20]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/lf_gen_mask_rcnn_x101_train_demo.pth
[21]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_train.h5
[22]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_val.h5
[23]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_test.h5

100 changes: 0 additions & 100 deletions captioning.lua

This file was deleted.
