Commit b495a25

update recent version

igules committed Dec 26, 2022
1 parent 67dfc82 commit b495a25

Showing 327 changed files with 59,612 additions and 14 deletions.
29 changes: 16 additions & 13 deletions .gitignore
@@ -3,35 +3,38 @@ __pycache__
_ext
*.pyc
*.so
maskrcnn_benchmark.egg-info/
build/
dist/
SGG_from_NLS/maskrcnn_benchmark.egg-info/
SGG_from_NLS/build/
SGG_from_NLS/dist/

# ipython/jupyter notebooks
#*.ipynb
**/.ipynb_checkpoints/

# checkpoint
checkpoint/
checkpoints/*/
visualization/*/
visualization/*.npy
visualization/*.json
SGG_from_NLS/checkpoint/
SGG_from_NLS/checkpoints/*/
SGG_from_NLS/visualization/*/
SGG_from_NLS/visualization/*.npy
SGG_from_NLS/visualization/*.json

# dataset
#datasets/vg/
datasets/vg/*.h5
datasets/vg/VG_100K
datasets/vg/cc_detection_results_oid
datasets/vg/COCO_detection_results_oid
datasets/vg/VG_detection_results_oid_original
SGG_from_NLS/datasets/vg/*.h5
SGG_from_NLS/datasets/vg/VG_100K
SGG_from_NLS/datasets/vg/cc_detection_results_oid
SGG_from_NLS/datasets/vg/COCO_detection_results_oid
SGG_from_NLS/datasets/vg/VG_detection_results_oid
SGG_from_NLS/datasets/vg/VG_detection_results_oid_original
SGG_from_NLS/preprocess/faster_rcnn_inception_resnet_v2_atrous_oid_v4_2018_12_12
*.h5
*.pt
*.png
*.jpg
*.jpeg
*.npy
*.npz
*.zip
glove/
!demo/*.png

1 change: 0 additions & 1 deletion SGG_from_NLS
Submodule SGG_from_NLS deleted from ef8146
8 changes: 8 additions & 0 deletions SGG_from_NLS/.flake8
@@ -0,0 +1,8 @@
# This is an example .flake8 config, used when developing *Black* itself.
# Keep in sync with setup.cfg which is used for source packages.

[flake8]
ignore = E203, E266, E501, W503
max-line-length = 80
max-complexity = 18
select = B,C,E,F,W,T4,B9
2 changes: 2 additions & 0 deletions SGG_from_NLS/.gitattributes
@@ -0,0 +1,2 @@
# Auto detect text files and perform LF normalization
* text=auto
50 changes: 50 additions & 0 deletions SGG_from_NLS/.gitignore
@@ -0,0 +1,50 @@
# compilation and distribution
__pycache__
_ext
*.pyc
*.so
maskrcnn_benchmark.egg-info/
build/
dist/

# ipython/jupyter notebooks
#*.ipynb
**/.ipynb_checkpoints/

# checkpoint
checkpoint/
checkpoints/*/
visualization/*/
visualization/*.npy
visualization/*.json

# dataset
#datasets/vg/
datasets/vg/*.h5
datasets/vg/VG_100K
datasets/vg/cc_detection_results_oid
datasets/vg/COCO_detection_results_oid
datasets/vg/VG_detection_results_oid
*.h5
*.pt
*.png
*.jpg
*.jpeg
glove/
!demo/*.png

# Editor temporaries
*.swn
*.swo
*.swp
*~

# Pycharm editor settings
.idea

# vscode editor settings
.vscode

# MacOS
.DS_Store

65 changes: 65 additions & 0 deletions SGG_from_NLS/ABSTRACTIONS.md
@@ -0,0 +1,65 @@
## Abstractions
The main abstractions introduced by `maskrcnn_benchmark` that are useful to
have in mind are the following:

### ImageList
In PyTorch, the first dimension of the input to the network generally represents
the batch dimension, and thus all elements of the same batch have the same
height / width.
In order to support images with different sizes and aspect ratios in the same
batch, we created the `ImageList` class, which internally holds a batch of
images (of possibly different sizes). The images are padded with zeros such that
they have the same final size and batched over the first dimension. The original
sizes of the images before padding are stored in the `image_sizes` attribute,
and the batched tensor in `tensors`.
We provide a convenience function `to_image_list` that accepts a few different
input types, including a list of tensors, and returns an `ImageList` object.

```python
import torch
from maskrcnn_benchmark.structures.image_list import to_image_list

images = [torch.rand(3, 100, 200), torch.rand(3, 150, 170)]
batched_images = to_image_list(images)

# it is also possible to make the final batched image size a multiple of a given number
batched_images_32 = to_image_list(images, size_divisible=32)
```
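
Continuing the snippet above, the padded tensor and the pre-padding sizes can be inspected directly (a minimal sketch using the attribute names described in this section):

```python
# the padded, batched tensor: both images are zero-padded to the same size
print(batched_images.tensors.shape)   # e.g. torch.Size([2, 3, 150, 200])

# original (height, width) of each image before padding
print(batched_images.image_sizes)     # e.g. [(100, 200), (150, 170)]
```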

### BoxList
The `BoxList` class holds a set of bounding boxes (represented as a `Nx4` tensor) for
a specific image, as well as the size of the image as a `(width, height)` tuple.
It also provides a set of methods for performing geometric transformations
on the bounding boxes (such as cropping, scaling and flipping).
The class accepts bounding boxes from two different input formats:
- `xyxy`, where each box is encoded by its `x1`, `y1`, `x2` and `y2` coordinates, and
- `xywh`, where each box is encoded as `x1`, `y1`, `w` and `h`.

Additionally, each `BoxList` instance can also hold arbitrary additional information
for each bounding box, such as labels, visibility, probability scores etc.

Here is an example of how to create a `BoxList` from a list of coordinates:
```python
import torch
from maskrcnn_benchmark.structures.bounding_box import BoxList, FLIP_LEFT_RIGHT

width = 100
height = 200
boxes = [
[0, 10, 50, 50],
[50, 20, 90, 60],
[10, 10, 50, 50]
]
# create a BoxList with 3 boxes
bbox = BoxList(boxes, image_size=(width, height), mode='xyxy')

# perform some box transformations; the API is similar to PIL.Image's
bbox_scaled = bbox.resize((width * 2, height * 3))
bbox_flipped = bbox.transpose(FLIP_LEFT_RIGHT)

# add labels for each bbox
labels = torch.tensor([0, 10, 1])
bbox.add_field('labels', labels)

# bbox also supports a few operations, like indexing
# here, selects boxes 0 and 2
bbox_subset = bbox[[0, 2]]
```
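
The two encodings listed above can be converted into one another; a minimal sketch, assuming the `convert` method of `BoxList` and continuing from the example above:

```python
# convert the same boxes from corner format ('xyxy') to corner-plus-size format ('xywh')
bbox_xywh = bbox.convert('xywh')
print(bbox_xywh.mode)   # 'xywh'
print(bbox_xywh.bbox)   # each row is now [x1, y1, w, h]
```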
5 changes: 5 additions & 0 deletions SGG_from_NLS/CODE_OF_CONDUCT.md
@@ -0,0 +1,5 @@
# Code of Conduct

Facebook has adopted a Code of Conduct that we expect project participants to adhere to.
Please read the [full text](https://code.fb.com/codeofconduct/)
so that you can understand what actions will and will not be tolerated.
23 changes: 23 additions & 0 deletions SGG_from_NLS/DATASET.md
@@ -0,0 +1,23 @@
## DATASET
The following is adapted from [Scene-Graph-Benchmark](https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/blob/master/DATASET.md), [Danfei Xu](https://github.com/danfeiX/scene-graph-TF-release/blob/master/data_tools/README.md) and [neural-motifs](https://github.com/rowanz/neural-motifs).

### Download:
1. Download the VG images [part1 (9 GB)](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip) and [part2 (5 GB)](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip). Extract these images into the folder `datasets/vg/VG_100K`. If you want to use a different directory, please point `DATASETS['VG_stanford_filtered']['img_dir']` in `maskrcnn_benchmark/config/paths_catalog.py` to it.
2. Download the [scene graph labels](https://1drv.ms/u/s!AmRLLNf6bzcir8xf9oC3eNWlVMTRDw?e=63t7Ed) and extract them to `datasets/vg/VG-SGG-with-attri.h5`, or edit the path in `DATASETS['VG_stanford_filtered_with_attribute']['roidb_file']` of `maskrcnn_benchmark/config/paths_catalog.py`.
3. Download the [detection results](https://drive.google.com/drive/folders/1SdMXwXpdTZdxYOZl0OcPqqGd2p4DhePt?usp=sharing) for 3 datasets: Conceptual Caption, COCO Caption and Visual Genome. After downloading, run `cat cc_detection_results.zip.part* > cc_detection_results.zip` to merge the partitions into one zip file, then unzip it into the folder `datasets/vg/`.

### Folder structure:
After downloading the above files, you should have the following hierarchy in the folder `datasets/vg/`:

```
├── VG_100K
├── cc_detection_results_oid
├── COCO_detection_results_oid
├── VG_detection_results_oid
└── VG-SGG-with-attri.h5
```
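
As a quick sanity check (not part of the original instructions; the path and file names follow the layout above), the expected entries can be verified with a few lines of Python:

```python
import os

VG_DIR = 'datasets/vg'  # adjust if you linked a different directory
expected = [
    'VG_100K',
    'cc_detection_results_oid',
    'COCO_detection_results_oid',
    'VG_detection_results_oid',
    'VG-SGG-with-attri.h5',
]

# report anything still missing after the download/extraction steps
missing = [name for name in expected if not os.path.exists(os.path.join(VG_DIR, name))]
print('all dataset files in place' if not missing else f'missing: {missing}')
```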

### Preprocessing scripts

We provide scripts for data preprocessing, such as extracting the detection results from images and creating pseudo labels based on detection results and parsed concepts from image captions. More details can be found in the [preprocess folder](https://github.com/YiwuZhong/SGG_from_NLS/tree/main/preprocess).

64 changes: 64 additions & 0 deletions SGG_from_NLS/INSTALL.md
@@ -0,0 +1,64 @@
## Installation

Most of the requirements of this project are exactly the same as [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark). If you have any problem with your environment, you should check their [issues page](https://github.com/facebookresearch/maskrcnn-benchmark/issues) first; hopefully you will find the answer there.

### Requirements:
- PyTorch >= 1.2 (mine: 1.6.0, CUDA 10.1)
- torchvision >= 0.4 (mine: 0.7.0, CUDA 10.1)
- cocoapi
- yacs
- matplotlib
- GCC >= 4.9
- OpenCV


### Step-by-step installation

```bash
# first, make sure that your conda is set up properly with the right environment
# for that, check that `which conda`, `which pip` and `which python` point to the
# right paths. From a clean conda env, this is what you need to do

conda create --name scene_graph_benchmark
conda activate scene_graph_benchmark

# this installs the right pip and dependencies for the fresh python
conda install ipython
conda install scipy
conda install h5py

# scene_graph_benchmark and coco api dependencies
pip install ninja yacs cython matplotlib tqdm opencv-python overrides

# follow PyTorch installation in https://pytorch.org/get-started/locally/
# we give the instructions for CUDA 10.1
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

export INSTALL_DIR=$PWD

# install pycocotools
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install

# install apex
cd $INSTALL_DIR
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

# install PyTorch Detection
cd $INSTALL_DIR
git clone https://github.com/YiwuZhong/SGG_from_NLS.git
cd SGG_from_NLS

# the following will install the lib with
# symbolic links, so that you can modify
# the files if you want and won't need to
# re-build it
python setup.py build develop


unset INSTALL_DIR
```
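
After `python setup.py build develop` finishes, a quick import check can confirm the build; this is only a hedged sketch, assuming the compiled extension is exposed as `maskrcnn_benchmark._C` as in upstream maskrcnn-benchmark:

```python
import torch

# importing the compiled C++/CUDA extension; an ImportError here usually means
# the build step above did not complete successfully
from maskrcnn_benchmark import _C  # noqa: F401

print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())
```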

23 changes: 23 additions & 0 deletions SGG_from_NLS/LICENSE
@@ -0,0 +1,23 @@
MIT License

Copyright (c) 2021 Yiwu Zhong

This repository was built based on Scene-Graph-Benchmark(https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch) for scene graph generation and UNITER(https://github.com/ChenRocks/UNITER) for image-text representation learning.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
35 changes: 35 additions & 0 deletions SGG_from_NLS/METRICS.md
@@ -0,0 +1,35 @@
# Explanation of our metrics
### Recall@K (R@K)
The earliest and most widely accepted metric in scene graph generation, first adopted by [Visual relationship detection with language priors](https://arxiv.org/abs/1608.00187). Since the ground-truth annotations of relationships are incomplete, it is improper to use simple accuracy as the metric. Therefore, Lu et al. reformulated it as a retrieval problem: relationships are not only required to be correctly classified, but also to have as high a score as possible, so they can be retrieved from the many 'none' relationship pairs.
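
To make the retrieval formulation concrete, the sketch below (an illustration only, not the evaluation code used by this repository) ranks predicted triplets by score and checks how many ground-truth triplets appear among the top K:

```python
def recall_at_k(pred_triplets, pred_scores, gt_triplets, k=50):
    """Fraction of ground-truth (subject, predicate, object) triplets recovered
    among the K highest-scoring predictions for a single image."""
    # rank predictions by confidence and keep the top K
    order = sorted(range(len(pred_scores)), key=lambda i: -pred_scores[i])[:k]
    top_k = {pred_triplets[i] for i in order}
    hits = sum(1 for gt in gt_triplets if gt in top_k)
    return hits / max(len(gt_triplets), 1)

# toy example: only 'man riding horse' makes it into the top 2 predictions
preds  = [('man', 'on', 'horse'), ('man', 'riding', 'horse'), ('horse', 'on', 'grass')]
scores = [0.9, 0.8, 0.3]
gt     = [('man', 'riding', 'horse'), ('horse', 'on', 'grass'), ('man', 'wearing', 'hat')]
print(recall_at_k(preds, scores, gt, k=2))  # 1/3 of the ground truth is recalled
```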

### No Graph Constraint Recall@K (ng-R@K)
First used by [Pixel2Graph](https://arxiv.org/abs/1706.07365) and named by [Neural-MOTIFS](https://arxiv.org/abs/1711.06640). The former paper significantly improves the R@K results by allowing each pair to have multiple predicates, which means that for each subject-object pair, all 50 predicates are involved in the recall ranking, not just the one with the highest score. Since predicates are not exclusive, 'on' and 'riding' can both be correct. This setting significantly improves R@K. To compare fairly with other methods, [Neural-MOTIFS](https://arxiv.org/abs/1711.06640) named it No Graph Constraint Recall@K (ng-R@K).

### Mean Recall@K (mR@K)
It was proposed at the same time (CVPR 2019) by our work [VCTree](https://arxiv.org/abs/1812.01880) and Chen et al.'s [KERN](https://arxiv.org/abs/1903.03326), although we didn't make it our main contribution and only listed the full results in the [supplementary material](https://zpascal.net/cvpr2019/Tang_Learning_to_Compose_CVPR_2019_supplemental.pdf). However, we also acknowledge the contribution of [KERN](https://arxiv.org/abs/1903.03326), as they reported more mR@K results for previous methods. The main motivation of Mean Recall@K (mR@K) is that the VisualGenome dataset is biased towards dominant predicates. If the 10 most frequent predicates are correctly classified, the accuracy would reach 90% even if the remaining 40 predicate categories are all wrong. This is definitely not what we want. Therefore, Mean Recall@K (mR@K) calculates Recall@K for each predicate category independently and then reports their mean.
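
The difference from plain Recall@K can be seen in a few lines: recall is computed for each predicate category first and averaged afterwards, so frequent predicates such as 'on' cannot dominate the score (again an illustrative sketch, not the evaluation code used by this repository):

```python
from collections import defaultdict

def mean_recall_at_k(per_image_results):
    """per_image_results: list of (hit_triplets, gt_triplets) pairs per image, where
    each triplet is (subject, predicate, object) and hit_triplets are the ground-truth
    triplets recovered within the top K predictions."""
    hit_per_pred, gt_per_pred = defaultdict(int), defaultdict(int)
    for hits, gts in per_image_results:
        for _, p, _ in gts:
            gt_per_pred[p] += 1
        for _, p, _ in hits:
            hit_per_pred[p] += 1
    # recall per predicate category, then the unweighted mean over categories
    recalls = [hit_per_pred[p] / gt_per_pred[p] for p in gt_per_pred]
    return sum(recalls) / len(recalls)

# toy example: all 9 'on' relations are recovered, the single 'riding' one is missed
hits = [('cup', 'on', 'table')] * 9
gts = hits + [('man', 'riding', 'horse')]
print(mean_recall_at_k([(hits, gts)]))  # 0.5, whereas plain recall would be 9/10 = 0.9
```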

### No Graph Constraint Mean Recall@K (ng-mR@K)
The same Mean Recall metric, but for each pair of objects, all possible predicates are valid candidates (the original Mean Recall@K only considers the predicate with the maximum score for each pair as the valid candidate when calculating Recall).

### Zero Shot Recall@K (zR@K)
It was first used by [Visual relationship detection with language priors](https://arxiv.org/abs/1608.00187) for the VRD dataset, and first reported by [Unbiased Scene Graph Generation from Biased Training](https://arxiv.org/abs/2002.11949) for the VisualGenome dataset. In short, it only calculates Recall@K for those subject-predicate-object combinations that do not occur in the training set.

### No Graph Constraint Zero Shot Recall@K (ng-zR@K)
The same zero-shot Recall metric, but for each pair of objects, all possible predicates are valid candidates (the original zero-shot Recall@K only considers the predicate with the maximum score for each pair as the valid candidate when calculating Recall).

### Top@K Accuracy (A@K)
This metric actually arises from a misunderstanding of the PredCls and SGCls protocols. [Contrastive Losses](https://arxiv.org/abs/1903.02728) reported Recall@K for PredCls and SGCls while providing not just the ground-truth bounding boxes, but also the ground-truth subject-object pairs, so no ranking is involved. The results can only be considered Top@K Accuracy (A@K) for the given K ground-truth subject-object pairs.

### Sentence-to-Graph Retrieval (S2G)
S2G was proposed by [Unbiased Scene Graph Generation from Biased Training](https://arxiv.org/abs/2002.11949) as an ideal downstream task that relies only on the quality of SGs, since the existing VQA and captioning tasks are too complicated and affected by their own biases. It takes human descriptions as queries and searches for matching scene graphs (images), where SGs are treated as symbolic representations of images. More details are explained in [S2G-RETRIEVAL.md](maskrcnn_benchmark/image_retrieval/S2G-RETRIEVAL.md).

# Two Common Misunderstandings in SGG Metrics
When you read or follow an SGG paper and find that its performance is abnormally high for no obvious reason, the authors may have mixed up some metrics.

1. Not differentiating Graph Constraint Recall@K from No Graph Constraint Recall@K. The With/Without Graph Constraint setting was introduced by [Neural-MOTIFS](https://arxiv.org/abs/1711.06640). However, some early work and a few recent researchers don't differentiate these two settings, using No Graph Constraint results to compare with previous work evaluated With Graph Constraint. TYPICAL SYMPTOMS: 1) Recall@100 of PredCls is larger than 75%, 2) With/Without Graph Constraint is not mentioned in the original paper. TYPICAL PAPER: [Pixel2Graph](https://arxiv.org/abs/1706.07365) (since this paper was published before MOTIFS, they didn't mean to take this advantage; they are actually the originators of the No Graph Constraint setting, while MOTIFS is the one that named it).

2. Some researchers misunderstand the protocols of PredCls and SGCls. These two protocols only give ground-truth bounding boxes, NOT ground-truth subject-object pairs. Some works only predict relationships for ground-truth subject-object pairs in PredCls and SGCls, so their PredCls and SGCls results will be extremely high. Note that Recall@K is a ranking metric, and using ground-truth subject-object pairs can be considered as giving the perfect ranking. To separate it from normal PredCls and SGCls, I name this kind of setting Top@K Accuracy, which is only applicable to PredCls and SGCls. TYPICAL SYMPTOMS: 1) the results of PredCls and SGCls are extremely high while the results of SGGen are normal, 2) Recall@50 and Recall@100 of PredCls and SGCls are exactly the same, since the ranking is perfect (Recall@20 is lower, because some images have more than 20 ground-truth relationships). TYPICAL PAPER: [Contrastive Losses](https://arxiv.org/abs/1903.02728).

# Output Format of Our Code

![alt text](demo/output_format.png "from 'screenshot'")