This project explores the impact of occlusion on Pix2Vox, a multi-view 3D reconstruction model, and possible approaches for improving the 3D shapes it generates. Erasing data augmentation techniques are used to generate occluded versions of the ShapeNet dataset, and StableDiffusion is used to inpaint the occluded images before they are fed to Pix2Vox. Results show that this approach improves the performance of the baseline by 51% under severe occlusion. Interestingly, fine-tuning the Pix2Vox model on images occluded by 30% to 40% further improves the performance by 97% under severe occlusion. Additionally, the approach improves generalisation and performance on the Pix3D dataset by 16%.
Fine-tuned model | Link | Size |
---|---|---|
Erasing_10_20 | https://drive.google.com/file/d/1uY-8hL4IyEawBANTp96bHdRS2mcVQPwX/view?usp=drive_link | 1.12 GB |
Erasing_20_30 | https://drive.google.com/file/d/1kchDamWWg1qXQCJPJtkqT2R3wP37J3g8/view?usp=drive_link | 1.11 GB |
Erasing_30_40 | https://drive.google.com/file/d/1OfWwCTJDL7DgFDYk_3zIVzddPkt8gEMG/view?usp=drive_link | 1.12 GB |
Erasing_10_40 | https://drive.google.com/file/d/1mndFOrewWJJJCx23ZzPd-fkQUjRN2T-8/view?usp=drive_link | 1.12 GB |
Dataset | Link | Size |
---|---|---|
ORE_Random_(Occluded/Inpainted/Combined)_(15/25/50) | https://drive.google.com/file/d/1vIdCUMJMNSn2Sb-HlorfCY0emoMWA_kJ/view?usp=drive_link | 660.7 GB |
ShapeNet rendering images | http://cvgl.stanford.edu/data2/ShapeNetRendering.tgz | 11.5 GB |
ShapeNet voxelized models | http://cvgl.stanford.edu/data2/ShapeNetVox32.tgz | 21.2 MB |
Pix3D images & voxelized models | http://pix3d.csail.mit.edu/data/pix3d.zip | 3.5 GB |
Instructions on how to run Pix2Vox can be found here. The code in this repository is a modified version of Pix2Vox that supports erasing as an augmentation of the input images and can save the voxels of specified samples as images.
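To illustrate what erasing augmentation does to an input image, here is a minimal sketch. It is not the code used in this repository: the function name `erase_random_rectangle`, its parameters, the aspect-ratio range and the constant fill value are all assumptions made for the example. It overwrites one random rectangle covering a chosen fraction of the image area and also returns the occlusion mask.

```python
import numpy as np


def erase_random_rectangle(image, min_frac, max_frac, rng, fill=0):
    """Return (occluded_copy, mask) with one random rectangle erased.

    Hypothetical helper: the rectangle covers roughly `min_frac`..`max_frac`
    of the image area and is overwritten with the constant `fill`.
    """
    height, width = image.shape[:2]
    # Sample the occluded area and a height/width ratio (range is an assumption).
    target_area = rng.uniform(min_frac, max_frac) * height * width
    aspect = rng.uniform(0.5, 2.0)
    rect_h = int(round(np.sqrt(target_area * aspect)))
    rect_w = int(round(np.sqrt(target_area / aspect)))
    # Keep the rectangle inside the image.
    rect_h = min(max(rect_h, 1), height)
    rect_w = min(max(rect_w, 1), width)
    top = rng.integers(0, height - rect_h + 1)
    left = rng.integers(0, width - rect_w + 1)

    occluded = image.copy()
    occluded[top:top + rect_h, left:left + rect_w] = fill
    mask = np.zeros((height, width), dtype=bool)
    mask[top:top + rect_h, left:left + rect_w] = True
    return occluded, mask
```

With `min_frac=0.3` and `max_frac=0.4` this would correspond to an occlusion of 30% to 40% of the image area. The returned mask is the kind of input an inpainting step typically needs.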
Additional files are:
- FineTuning
  - Purpose: fine-tune a given checkpoint
  - What to change
    - out_path -> output path for the generated checkpoints, training and validation files (.csv), and TensorBoard logs
    - taxonomy -> path to the ShapeNet taxonomy file
    - weights -> path to the model to fine-tune
    - Fine-tuning hyperparameters
- Evaluation
  - Purpose: test a given list of models on a given list of datasets
  - What to change
    - taxonomy -> path to the ShapeNet taxonomy file
    - rendering_paths -> list of paths to the evaluation datasets
    - ckpts -> list of paths to the models to evaluate
    - out_path -> output path for the generated evaluation files (.csv), voxel and ground-truth images of the selected samples, and TensorBoard logs
- CompareFineTuned
  - Purpose: compare and plot the impact of the fine-tuning hyperparameters and their evaluation results
  - What to change
    - pandas_path -> path to the training and validation .csv files generated by "FineTuning", or to the evaluation .csv files generated by "Evaluation"
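As a rough illustration of what an evaluation of this kind measures, the sketch below computes the voxel-wise intersection over union (IoU) between a predicted occupancy grid and its ground truth. It assumes IoU at a fixed binarisation threshold is the metric of interest, as is common for Pix2Vox. The function name `voxel_iou` and the default threshold are assumptions for the example, not values taken from this repository.

```python
import numpy as np


def voxel_iou(predicted, ground_truth, threshold=0.3):
    """Voxel IoU between predicted occupancy probabilities and a binary grid.

    Hypothetical helper: `threshold` binarises the prediction and its
    default value is an assumption.
    """
    pred_occupied = np.asarray(predicted) > threshold
    gt_occupied = np.asarray(ground_truth).astype(bool)
    intersection = np.logical_and(pred_occupied, gt_occupied).sum()
    union = np.logical_or(pred_occupied, gt_occupied).sum()
    # Convention chosen here: two empty grids count as a perfect match.
    if union == 0:
        return 1.0
    return float(intersection / union)
```

Averaging this value over the samples of each evaluation dataset gives one number per model and dataset, which is the kind of figure a per-checkpoint comparison can be built on.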
The code and instructions for running StableDiffusion as an API are provided here.
The StableDiffusion checkpoint used for inpainting can be downloaded from here.
Once StableDiffusion is running as an API in the background, the "Inpainting" file can be used to inpaint a specified dataset.
- Inpainting
  - Purpose: inpaint an occluded variant of ShapeNet, or combine the original dataset with the inpainted version
  - What to change
    - taxonomy_file -> path to the ShapeNet taxonomy file
    - output_path -> path in which to save the generated dataset
    - occluded_path -> path to the dataset to inpaint
    - inpainting_path -> path of the newly inpainted dataset
    - url -> API URL of StableDiffusion
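For orientation, this is a hedged sketch of what a single inpainting request to such an API could look like. It assumes the API is the one exposed by the AUTOMATIC1111 web UI when started with `--api`, whose `img2img` endpoint accepts base64-encoded `init_images` and a `mask`. The helper names, the prompt and the parameter values are hypothetical, and the "Inpainting" file in this repository may build its requests differently.

```python
import base64
import json
import urllib.request


def build_inpaint_payload(image_bytes, mask_bytes, prompt="a chair",
                          steps=20, denoising_strength=0.75):
    """Build the JSON body for one inpainting request (assumed API schema)."""
    return {
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "mask": base64.b64encode(mask_bytes).decode("ascii"),
        "prompt": prompt,  # hypothetical prompt, not taken from the repository
        "steps": steps,
        "denoising_strength": denoising_strength,
    }


def inpaint(url, payload, timeout=120):
    """POST the payload to `url` and return the first result as PNG bytes."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # The assumed API returns a list of base64-encoded images.
    return base64.b64decode(body["images"][0])
```

Under these assumptions, `url` would be something like `http://127.0.0.1:7860/sdapi/v1/img2img`. The mask marks the occluded region to regenerate.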
The following files are used to generate and analyse the occluded variants of ShapeNet and to pre-process Pix3D.
- OccludeShapeNet-Chair
  - Purpose: generate the occluded variants of the ShapeNet dataset
  - What to change
    - taxonomy_file -> path to the ShapeNet taxonomy file
    - renders_path -> path to the original ShapeNet renders
    - output_path -> path in which to save the generated dataset
- Pix3D_preprocessing
  - Purpose: apply the given masks to the input images and convert the .obj models to binvox
  - What to change
    - pix3d_path -> path to the Pix3D dataset
    - chair_dir -> path to the Pix3D model chair folder
    - The mesh voxelizer must be downloaded and placed in the same directory as this file
- AnalyseDatasets
  - Purpose: calculate and plot the information loss of an occluded variant of ShapeNet
  - What to change
    - datasets_path -> path to the occluded dataset
    - gt_path -> path to the original dataset
    - save_path -> output path for the generated .csv files
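The exact definition of information loss used by "AnalyseDatasets" is not spelled out here, so the sketch below shows only one plausible reading: the fraction of object pixels in the original render that were changed by the occlusion. The function name `information_loss`, the background value and this definition itself are assumptions made for illustration.

```python
import numpy as np


def information_loss(original, occluded, background=0):
    """Fraction of object pixels in `original` that differ in `occluded`.

    Hypothetical definition: a pixel belongs to the object when any of its
    channels differs from `background` in the original image.
    """
    original = np.asarray(original)
    occluded = np.asarray(occluded)
    foreground = np.any(original != background, axis=-1)
    changed = np.any(original != occluded, axis=-1)
    n_foreground = foreground.sum()
    if n_foreground == 0:
        return 0.0  # nothing to lose when the render contains no object
    return float(np.logical_and(foreground, changed).sum() / n_foreground)
```

Measured this way, a rectangle that falls entirely on the background yields a loss of 0 even if it covers a large part of the image, which is why a per-image loss can differ noticeably from the nominal occlusion percentage.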
(Just in case someone needs it! :D)
@misc{Greco2023exploring,
title = {Exploring 3D reconstruction from RGB images of partially occluded objects},
author = {Davide Greco},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/DavideGrecoGit/3D-reconstruction-under-occlusion}},
commit = {a5a1e55}
}