Commit b495a25

update recent version

igules committed Dec 26, 2022
1 parent 67dfc82 commit b495a25

Showing 327 changed files with 59,612 additions and 14 deletions.
29 changes: 16 additions & 13 deletions .gitignore
@@ -3,35 +3,38 @@ __pycache__
_ext
*.pyc
*.so
maskrcnn_benchmark.egg-info/
build/
dist/
SGG_from_NLS/maskrcnn_benchmark.egg-info/
SGG_from_NLS/build/
SGG_from_NLS/dist/

# ipython/jupyter notebooks
#*.ipynb
**/.ipynb_checkpoints/

# checkpoint
checkpoint/
checkpoints/*/
visualization/*/
visualization/*.npy
visualization/*.json
SGG_from_NLS/checkpoint/
SGG_from_NLS/checkpoints/*/
SGG_from_NLS/visualization/*/
SGG_from_NLS/visualization/*.npy
SGG_from_NLS/visualization/*.json

# dataset
#datasets/vg/
datasets/vg/*.h5
datasets/vg/VG_100K
datasets/vg/cc_detection_results_oid
datasets/vg/COCO_detection_results_oid
datasets/vg/VG_detection_results_oid_original
SGG_from_NLS/datasets/vg/*.h5
SGG_from_NLS/datasets/vg/VG_100K
SGG_from_NLS/datasets/vg/cc_detection_results_oid
SGG_from_NLS/datasets/vg/COCO_detection_results_oid
SGG_from_NLS/datasets/vg/VG_detection_results_oid
SGG_from_NLS/datasets/vg/VG_detection_results_oid_original
SGG_from_NLS/preprocess/faster_rcnn_inception_resnet_v2_atrous_oid_v4_2018_12_12
*.h5
*.pt
*.png
*.jpg
*.jpeg
*.npy
*.npz
*.zip
glove/
!demo/*.png

1 change: 0 additions & 1 deletion SGG_from_NLS
Submodule SGG_from_NLS deleted from ef8146
8 changes: 8 additions & 0 deletions SGG_from_NLS/.flake8
@@ -0,0 +1,8 @@
# This is an example .flake8 config, used when developing *Black* itself.
# Keep in sync with setup.cfg which is used for source packages.

[flake8]
ignore = E203, E266, E501, W503
max-line-length = 80
max-complexity = 18
select = B,C,E,F,W,T4,B9
2 changes: 2 additions & 0 deletions SGG_from_NLS/.gitattributes
@@ -0,0 +1,2 @@
# Auto detect text files and perform LF normalization
* text=auto
50 changes: 50 additions & 0 deletions SGG_from_NLS/.gitignore
@@ -0,0 +1,50 @@
# compilation and distribution
__pycache__
_ext
*.pyc
*.so
maskrcnn_benchmark.egg-info/
build/
dist/

# ipython/jupyter notebooks
#*.ipynb
**/.ipynb_checkpoints/

# checkpoint
checkpoint/
checkpoints/*/
visualization/*/
visualization/*.npy
visualization/*.json

# dataset
#datasets/vg/
datasets/vg/*.h5
datasets/vg/VG_100K
datasets/vg/cc_detection_results_oid
datasets/vg/COCO_detection_results_oid
datasets/vg/VG_detection_results_oid
*.h5
*.pt
*.png
*.jpg
*.jpeg
glove/
!demo/*.png

# Editor temporaries
*.swn
*.swo
*.swp
*~

# Pycharm editor settings
.idea

# vscode editor settings
.vscode

# MacOS
.DS_Store

65 changes: 65 additions & 0 deletions SGG_from_NLS/ABSTRACTIONS.md
@@ -0,0 +1,65 @@
## Abstractions
The main abstractions introduced by `maskrcnn_benchmark` that are useful to
have in mind are the following:

### ImageList
In PyTorch, the first dimension of the input to the network generally represents
the batch dimension, and thus all elements of the same batch have the same
height / width.
In order to support images with different sizes and aspect ratios in the same
batch, we created the `ImageList` class, which internally holds a batch of
images (of possibly different sizes). The images are padded with zeros such that
they have the same final size and batched over the first dimension. The original
sizes of the images before padding are stored in the `image_sizes` attribute,
and the batched tensor in `tensors`.
We provide a convenience function `to_image_list` that accepts a few different
input types, including a list of tensors, and returns an `ImageList` object.

```python
import torch
from maskrcnn_benchmark.structures.image_list import to_image_list

images = [torch.rand(3, 100, 200), torch.rand(3, 150, 170)]
batched_images = to_image_list(images)

# it is also possible to make the final batched image size a multiple of a given number
batched_images_32 = to_image_list(images, size_divisible=32)
```
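
Continuing the snippet above, the padded tensor and the pre-padding sizes can be inspected directly (a minimal sketch using the attribute names described in this section):

```python
# the padded, batched tensor: both images are zero-padded to the same size
print(batched_images.tensors.shape)   # e.g. torch.Size([2, 3, 150, 200])

# original (height, width) of each image before padding
print(batched_images.image_sizes)     # e.g. [(100, 200), (150, 170)]
```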

### BoxList
The `BoxList` class holds a set of bounding boxes (represented as a `Nx4` tensor) for
a specific image, as well as the size of the image as a `(width, height)` tuple.
It also provides a set of methods for performing geometric transformations
on the bounding boxes (such as cropping, scaling and flipping).
The class accepts bounding boxes from two different input formats:
- `xyxy`, where each box is encoded by its `x1`, `y1`, `x2` and `y2` coordinates, and
- `xywh`, where each box is encoded as `x1`, `y1`, `w` and `h`.

Additionally, each `BoxList` instance can also hold arbitrary additional information
for each bounding box, such as labels, visibility, probability scores etc.

Here is an example of how to create a `BoxList` from a list of coordinates:
```python
import torch
from maskrcnn_benchmark.structures.bounding_box import BoxList, FLIP_LEFT_RIGHT

width = 100
height = 200
boxes = [
[0, 10, 50, 50],
[50, 20, 90, 60],
[10, 10, 50, 50]
]
# create a BoxList with 3 boxes
bbox = BoxList(boxes, image_size=(width, height), mode='xyxy')

# perform some box transformations; the API is similar to PIL.Image's
bbox_scaled = bbox.resize((width * 2, height * 3))
bbox_flipped = bbox.transpose(FLIP_LEFT_RIGHT)

# add labels for each bbox
labels = torch.tensor([0, 10, 1])
bbox.add_field('labels', labels)

# bbox also supports a few operations, like indexing
# here, selects boxes 0 and 2
bbox_subset = bbox[[0, 2]]
```
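
The two encodings listed above can be converted into one another; a minimal sketch, assuming the `convert` method of `BoxList` and continuing from the example above:

```python
# convert the same boxes from corner format ('xyxy') to corner-plus-size format ('xywh')
bbox_xywh = bbox.convert('xywh')
print(bbox_xywh.mode)   # 'xywh'
print(bbox_xywh.bbox)   # each row is now [x1, y1, w, h]
```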
5 changes: 5 additions & 0 deletions SGG_from_NLS/CODE_OF_CONDUCT.md
@@ -0,0 +1,5 @@
# Code of Conduct

Facebook has adopted a Code of Conduct that we expect project participants to adhere to.
Please read the [full text](https://code.fb.com/codeofconduct/)
so that you can understand what actions will and will not be tolerated.
23 changes: 23 additions & 0 deletions SGG_from_NLS/DATASET.md
@@ -0,0 +1,23 @@
## DATASET
The following is adapted from [Scene-Graph-Benchmark](https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/blob/master/DATASET.md), [Danfei Xu](https://github.com/danfeiX/scene-graph-TF-release/blob/master/data_tools/README.md) and [neural-motifs](https://github.com/rowanz/neural-motifs).

### Download:
1. Download the VG images [part1 (9 GB)](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip) and [part2 (5 GB)](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip). Extract these images into the folder `datasets/vg/VG_100K`. If you want to use a different directory, please point `DATASETS['VG_stanford_filtered']['img_dir']` in `maskrcnn_benchmark/config/paths_catalog.py` to it.
2. Download the [scene graph labels](https://1drv.ms/u/s!AmRLLNf6bzcir8xf9oC3eNWlVMTRDw?e=63t7Ed) and extract them to `datasets/vg/VG-SGG-with-attri.h5`, or edit the path in `DATASETS['VG_stanford_filtered_with_attribute']['roidb_file']` of `maskrcnn_benchmark/config/paths_catalog.py`.
3. Download the [detection results](https://drive.google.com/drive/folders/1SdMXwXpdTZdxYOZl0OcPqqGd2p4DhePt?usp=sharing) for 3 datasets: Conceptual Caption, COCO Caption and Visual Genome. After downloading, run `cat cc_detection_results.zip.part* > cc_detection_results.zip` to merge the partitions into one zip file, then unzip it into the folder `datasets/vg/`.

### Folder structure:
After downloading the above files, you should have the following hierarchy in the folder `datasets/vg/`:

```
├── VG_100K
├── cc_detection_results_oid
├── COCO_detection_results_oid
├── VG_detection_results_oid
└── VG-SGG-with-attri.h5
```
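
As a quick sanity check (not part of the original instructions; the path and file names follow the layout above), the expected entries can be verified with a few lines of Python:

```python
import os

VG_DIR = 'datasets/vg'  # adjust if you linked a different directory
expected = [
    'VG_100K',
    'cc_detection_results_oid',
    'COCO_detection_results_oid',
    'VG_detection_results_oid',
    'VG-SGG-with-attri.h5',
]

# report anything still missing after the download/extraction steps
missing = [name for name in expected if not os.path.exists(os.path.join(VG_DIR, name))]
print('all dataset files in place' if not missing else f'missing: {missing}')
```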

### Preprocessing scripts

We provide scripts for data preprocessing, such as extracting the detection results from images and creating pseudo labels based on detection results and parsed concepts from image captions. More details can be found in the [preprocess folder](https://github.com/YiwuZhong/SGG_from_NLS/tree/main/preprocess).

64 changes: 64 additions & 0 deletions SGG_from_NLS/INSTALL.md
@@ -0,0 +1,64 @@
## Installation

Most of the requirements of this project are exactly the same as [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark). If you have any problem with your environment, you should check their [issues page](https://github.com/facebookresearch/maskrcnn-benchmark/issues) first; hopefully you will find the answer there.

### Requirements:
- PyTorch >= 1.2 (mine: 1.6.0, CUDA 10.1)
- torchvision >= 0.4 (mine: 0.7.0, CUDA 10.1)
- cocoapi
- yacs
- matplotlib
- GCC >= 4.9
- OpenCV


### Step-by-step installation

```bash
# first, make sure that your conda is set up properly with the right environment
# for that, check that `which conda`, `which pip` and `which python` point to the
# right paths. From a clean conda env, this is what you need to do

conda create --name scene_graph_benchmark
conda activate scene_graph_benchmark

# this installs the right pip and dependencies for the fresh python
conda install ipython
conda install scipy
conda install h5py

# scene_graph_benchmark and coco api dependencies
pip install ninja yacs cython matplotlib tqdm opencv-python overrides

# follow PyTorch installation in https://pytorch.org/get-started/locally/
# we give the instructions for CUDA 10.1
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

export INSTALL_DIR=$PWD

# install pycocotools
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install

# install apex
cd $INSTALL_DIR
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

# install PyTorch Detection
cd $INSTALL_DIR
git clone https://github.com/YiwuZhong/SGG_from_NLS.git
cd SGG_from_NLS

# the following will install the lib with
# symbolic links, so that you can modify
# the files if you want and won't need to
# re-build it
python setup.py build develop


unset INSTALL_DIR
```
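
After `python setup.py build develop` finishes, a quick import check can confirm the build; this is only a hedged sketch, assuming the compiled extension is exposed as `maskrcnn_benchmark._C` as in upstream maskrcnn-benchmark:

```python
import torch

# importing the compiled C++/CUDA extension; an ImportError here usually means
# the build step above did not complete successfully
from maskrcnn_benchmark import _C  # noqa: F401

print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())
```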

23 changes: 23 additions & 0 deletions SGG_from_NLS/LICENSE
@@ -0,0 +1,23 @@
MIT License

Copyright (c) 2021 Yiwu Zhong

This repository was built based on Scene-Graph-Benchmark(https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch) for scene graph generation and UNITER(https://github.com/ChenRocks/UNITER) for image-text representation learning.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
35 changes: 35 additions & 0 deletions SGG_from_NLS/METRICS.md
@@ -0,0 +1,35 @@
# Explanation of our metrics
### Recall@K (R@K)
The earliest and most widely accepted metric in scene graph generation, first adopted by [Visual relationship detection with language priors](https://arxiv.org/abs/1608.00187). Since the ground-truth annotations of relationships are incomplete, it is improper to use simple accuracy as the metric. Therefore, Lu et al. reformulated it as a retrieval problem: relationships are not only required to be correctly classified, but also to have as high a score as possible, so they can be retrieved from the many 'none' relationship pairs.
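
To make the retrieval formulation concrete, the sketch below (an illustration only, not the evaluation code used by this repository) ranks predicted triplets by score and checks how many ground-truth triplets appear among the top K:

```python
def recall_at_k(pred_triplets, pred_scores, gt_triplets, k=50):
    """Fraction of ground-truth (subject, predicate, object) triplets recovered
    among the K highest-scoring predictions for a single image."""
    # rank predictions by confidence and keep the top K
    order = sorted(range(len(pred_scores)), key=lambda i: -pred_scores[i])[:k]
    top_k = {pred_triplets[i] for i in order}
    hits = sum(1 for gt in gt_triplets if gt in top_k)
    return hits / max(len(gt_triplets), 1)

# toy example: only 'man riding horse' makes it into the top 2 predictions
preds  = [('man', 'on', 'horse'), ('man', 'riding', 'horse'), ('horse', 'on', 'grass')]
scores = [0.9, 0.8, 0.3]
gt     = [('man', 'riding', 'horse'), ('horse', 'on', 'grass'), ('man', 'wearing', 'hat')]
print(recall_at_k(preds, scores, gt, k=2))  # 1/3 of the ground truth is recalled
```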

### No Graph Constraint Recall@K (ng-R@K)
First used by [Pixel2Graph](https://arxiv.org/abs/1706.07365) and named by [Neural-MOTIFS](https://arxiv.org/abs/1711.06640). The former paper significantly improves the R@K results by allowing each pair to have multiple predicates, which means that for each subject-object pair, all 50 predicates are involved in the recall ranking, not just the one with the highest score. Since predicates are not exclusive, 'on' and 'riding' can both be correct. This setting significantly improves R@K. To compare fairly with other methods, [Neural-MOTIFS](https://arxiv.org/abs/1711.06640) named it No Graph Constraint Recall@K (ng-R@K).

### Mean Recall@K (mR@K)
It was proposed at the same time (CVPR 2019) by our work [VCTree](https://arxiv.org/abs/1812.01880) and Chen et al.'s [KERN](https://arxiv.org/abs/1903.03326), although we didn't make it our main contribution and only listed the full results in the [supplementary material](https://zpascal.net/cvpr2019/Tang_Learning_to_Compose_CVPR_2019_supplemental.pdf). However, we also acknowledge the contribution of [KERN](https://arxiv.org/abs/1903.03326), as they reported more mR@K results for previous methods. The main motivation of Mean Recall@K (mR@K) is that the VisualGenome dataset is biased towards dominant predicates. If the 10 most frequent predicates are correctly classified, the accuracy would reach 90% even if the remaining 40 predicate categories are all wrong. This is definitely not what we want. Therefore, Mean Recall@K (mR@K) calculates Recall@K for each predicate category independently and then reports their mean.
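
The difference from plain Recall@K can be seen in a few lines: recall is computed for each predicate category first and averaged afterwards, so frequent predicates such as 'on' cannot dominate the score (again an illustrative sketch, not the evaluation code used by this repository):

```python
from collections import defaultdict

def mean_recall_at_k(per_image_results):
    """per_image_results: list of (hit_triplets, gt_triplets) pairs per image, where
    each triplet is (subject, predicate, object) and hit_triplets are the ground-truth
    triplets recovered within the top K predictions."""
    hit_per_pred, gt_per_pred = defaultdict(int), defaultdict(int)
    for hits, gts in per_image_results:
        for _, p, _ in gts:
            gt_per_pred[p] += 1
        for _, p, _ in hits:
            hit_per_pred[p] += 1
    # recall per predicate category, then the unweighted mean over categories
    recalls = [hit_per_pred[p] / gt_per_pred[p] for p in gt_per_pred]
    return sum(recalls) / len(recalls)

# toy example: all 9 'on' relations are recovered, the single 'riding' one is missed
hits = [('cup', 'on', 'table')] * 9
gts = hits + [('man', 'riding', 'horse')]
print(mean_recall_at_k([(hits, gts)]))  # 0.5, whereas plain recall would be 9/10 = 0.9
```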

### No Graph Constraint Mean Recall@K (ng-mR@K)
The same Mean Recall metric, but for each pair of objects, all possible predicates are valid candidates (the original Mean Recall@K only considers the predicate with the maximum score for each pair as the valid candidate when calculating Recall).

### Zero Shot Recall@K (zR@K)
It was first used by [Visual relationship detection with language priors](https://arxiv.org/abs/1608.00187) for the VRD dataset, and first reported by [Unbiased Scene Graph Generation from Biased Training](https://arxiv.org/abs/2002.11949) for the VisualGenome dataset. In short, it only calculates Recall@K for those subject-predicate-object combinations that do not occur in the training set.

### No Graph Constraint Zero Shot Recall@K (ng-zR@K)
The same zero-shot Recall metric, but for each pair of objects, all possible predicates are valid candidates (the original zero-shot Recall@K only considers the predicate with the maximum score for each pair as the valid candidate when calculating Recall).

### Top@K Accuracy (A@K)
This metric actually arises from a misunderstanding of the PredCls and SGCls protocols. [Contrastive Losses](https://arxiv.org/abs/1903.02728) reported Recall@K for PredCls and SGCls while providing not just the ground-truth bounding boxes, but also the ground-truth subject-object pairs, so no ranking is involved. The results can only be considered Top@K Accuracy (A@K) for the given K ground-truth subject-object pairs.

### Sentence-to-Graph Retrieval (S2G)
S2G was proposed by [Unbiased Scene Graph Generation from Biased Training](https://arxiv.org/abs/2002.11949) as an ideal downstream task that relies only on the quality of SGs, since the existing VQA and captioning tasks are too complicated and affected by their own biases. It takes human descriptions as queries and searches for matching scene graphs (images), where SGs are treated as symbolic representations of images. More details are explained in [S2G-RETRIEVAL.md](maskrcnn_benchmark/image_retrieval/S2G-RETRIEVAL.md).

# Two Common Misunderstandings in SGG Metrics
When you read or follow an SGG paper and find that its performance is abnormally high for no obvious reason, the authors may have mixed up some metrics.

1. Not differentiating Graph Constraint Recall@K from No Graph Constraint Recall@K. The With/Without Graph Constraint setting was introduced by [Neural-MOTIFS](https://arxiv.org/abs/1711.06640). However, some early work and a few recent researchers don't differentiate these two settings, using No Graph Constraint results to compare with previous work evaluated With Graph Constraint. TYPICAL SYMPTOMS: 1) Recall@100 of PredCls is larger than 75%, 2) With/Without Graph Constraint is not mentioned in the original paper. TYPICAL PAPER: [Pixel2Graph](https://arxiv.org/abs/1706.07365) (since this paper was published before MOTIFS, they didn't mean to take this advantage; they are actually the originators of the No Graph Constraint setting, while MOTIFS is the one that named it).

2. Some researchers misunderstand the protocols of PredCls and SGCls. These two protocols only give ground-truth bounding boxes, NOT ground-truth subject-object pairs. Some works only predict relationships for ground-truth subject-object pairs in PredCls and SGCls, so their PredCls and SGCls results will be extremely high. Note that Recall@K is a ranking metric, and using ground-truth subject-object pairs can be considered as giving the perfect ranking. To separate it from normal PredCls and SGCls, I name this kind of setting Top@K Accuracy, which is only applicable to PredCls and SGCls. TYPICAL SYMPTOMS: 1) the results of PredCls and SGCls are extremely high while the results of SGGen are normal, 2) Recall@50 and Recall@100 of PredCls and SGCls are exactly the same, since the ranking is perfect (Recall@20 is lower, because some images have more than 20 ground-truth relationships). TYPICAL PAPER: [Contrastive Losses](https://arxiv.org/abs/1903.02728).

# Output Format of Our Code

![alt text](demo/output_format.png "from 'screenshot'")