Before preparing your own data, you should check out the dataset module carefully.
Overall, you need to organize your data in the following structure and create a corresponding config file.
```
manhattan_sdf
├───data
│   ├───$scene_name
│   │   ├───intrinsic.txt
│   │   ├───images
│   │   │   ├───0.png
│   │   │   ├───1.png
│   │   │   └───...
│   │   ├───pose
│   │   │   ├───0.txt
│   │   │   ├───1.txt
│   │   │   └───...
│   │   ├───depth_colmap
│   │   │   ├───0.npy
│   │   │   ├───1.npy
│   │   │   └───...
│   │   └───semantic_deeplab
│   │       ├───0.png
│   │       ├───1.png
│   │       └───...
│   └───...
├───configs
│   ├───$scene_name.yaml
│   └───...
└───...
```
You should place RGB images in the `data/$scene_name/images` folder. The filenames can be arbitrary, but make sure they are consistent with the files in the other folders under `data/$scene_name`.
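The consistency requirement is easy to get wrong, so a quick check can save debugging time. Below is a minimal sketch of a hypothetical helper (not part of this repo) that compares file stems across the four folders; `data/your_scene` is a placeholder for your actual scene directory.

```python
from pathlib import Path

def check_consistency(scene_dir):
    # collect the file stems (names without extensions) found in each folder
    scene_dir = Path(scene_dir)
    stems = {
        "images": {p.stem for p in (scene_dir / "images").glob("*.png")},
        "pose": {p.stem for p in (scene_dir / "pose").glob("*.txt")},
        "depth_colmap": {p.stem for p in (scene_dir / "depth_colmap").glob("*.npy")},
        "semantic_deeplab": {p.stem for p in (scene_dir / "semantic_deeplab").glob("*.png")},
    }
    reference = stems["images"]
    for folder, names in stems.items():
        missing = reference - names
        extra = names - reference
        if missing or extra:
            print(f"{folder}: missing {sorted(missing)}, extra {sorted(extra)}")

check_consistency("data/your_scene")  # placeholder scene name
```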
Save the 4x4 intrinsic matrix in `data/$scene_name/intrinsic.txt`.
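If your calibration gives a standard 3x3 pinhole matrix, you can pad it to 4x4 before saving. The snippet below is a minimal sketch; the intrinsic values are placeholders and `your_scene` stands for your actual scene name.

```python
import numpy as np

fx, fy, cx, cy = 577.87, 577.87, 319.5, 239.5  # placeholder intrinsics, use your own
K = np.array([
    [fx, 0.0, cx],
    [0.0, fy, cy],
    [0.0, 0.0, 1.0],
])
K44 = np.eye(4)          # embed the 3x3 matrix into a 4x4 homogeneous matrix
K44[:3, :3] = K
np.savetxt("data/your_scene/intrinsic.txt", K44)
```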
You can solve camera poses with COLMAP or any other tool you like. Then you need to normalize the camera poses and modify some configs to ensure that:
- The scene is inside the bounding radius; note that the bounding radius should be smaller than π so that positional encoding can work well.
- The center of the scene is near the origin and the geometric initialization radius is appropriate (the surface of the initialized sphere and the target geometry should not be too far apart).
- The sampling range (near, far) covers the whole scene.
To achieve these, the simplest way is to normalize the camera poses so that they lie inside a unit sphere, which is similar to the implementation of VolSDF and neurecon. Then you can set the geometric initialization radius to 1.0 and the bounding radius to 2.0 (note that indoor scenes are scanned from the inside, which is different from object datasets such as DTU, so you need to set the geometric initialization radius and bounding radius larger than the range of camera poses).
This works well if the images are captured by walking around the scene (namely, the camera trajectory is relatively close to the boundary of the scene), since the scale of the scene can be regarded as slightly larger than the range of camera poses. If this cannot be guaranteed, you need to rescale the camera poses to fit inside a smaller sphere or adjust the hyperparameters heuristically. A more general way is to first run sparse reconstruction and define a region of interest manually; please refer to NeuS.
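As a rough illustration of the normalization described above (not the exact implementation of VolSDF/neurecon or the reference script mentioned later), the following sketch shifts the mean camera center to the origin and rescales the poses so that all camera centers fall inside a unit sphere. It assumes camera-to-world 4x4 matrices.

```python
import numpy as np

def normalize_poses(c2w_list, target_radius=1.0):
    c2w = np.stack(c2w_list)          # (N, 4, 4) camera-to-world matrices
    centers = c2w[:, :3, 3]           # camera centers in world coordinates
    offset = centers.mean(axis=0)     # translate the scene center to the origin
    scale = target_radius / np.linalg.norm(centers - offset, axis=1).max()
    normalized = c2w.copy()
    normalized[:, :3, 3] = (centers - offset) * scale
    return normalized, offset, scale  # keep offset/scale to undo the transform later
```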
Save the normalized poses as 4x4 matrices in txt format under the `data/$scene_name/pose` folder. Remember to record the scale and offset used for normalization so that you can transform back to the original coordinates if you want to extract a mesh and compare it with the ground-truth mesh.
You need to run COLMAP sparse and dense reconstruction first. Please refer to this instruction if you want to use known camera poses.
After dense reconstruction, you can obtain a depth prediction for each view. However, the depth predictions can be noisy, so we recommend running fusion to filter out most of the noise. Since the original COLMAP does not produce a fusion mask for each view, you need to compile and run this customized version, which is a submodule of NerfingMVS.
We provide a Python script here for reference, which includes camera pose normalization and COLMAP depth map generation.
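As a rough standalone sketch of the depth map conversion, the reader below is adapted from COLMAP's `scripts/python/read_dense.py` and turns a COLMAP dense depth map into the expected `.npy` file. The paths are illustrative, and filtering with the fusion masks produced by the customized COLMAP is not shown here.

```python
import numpy as np

def read_array(path):
    """Read a COLMAP dense depth map (.bin); adapted from scripts/python/read_dense.py."""
    with open(path, "rb") as fid:
        width, height, channels = np.genfromtxt(
            fid, delimiter="&", max_rows=1, usecols=(0, 1, 2), dtype=int
        )
        fid.seek(0)
        # skip the "width&height&channels&" text header before the raw float data
        num_delimiter = 0
        byte = fid.read(1)
        while True:
            if byte == b"&":
                num_delimiter += 1
                if num_delimiter >= 3:
                    break
            byte = fid.read(1)
        array = np.fromfile(fid, np.float32)
    array = array.reshape((width, height, channels), order="F")
    return np.transpose(array, (1, 0, 2)).squeeze()

# illustrative paths: COLMAP names each depth map after its source image
depth = read_array("dense/stereo/depth_maps/0.png.geometric.bin")
np.save("data/your_scene/depth_colmap/0.npy", depth)
```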
You need to run 2D semantic segmentation to generate semantic predictions from the images. We provide our trained model and inference code here.
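Whatever segmentation network you use, each image needs a corresponding label PNG under `data/$scene_name/semantic_deeplab`. The sketch below only shows the output step and assumes a hypothetical `predict` function returning a per-pixel label map; make sure the label convention matches what the provided model and the dataset loader expect.

```python
import numpy as np
from PIL import Image

def save_semantic(labels, out_path):
    # labels: (H, W) integer array of per-pixel predicted classes
    Image.fromarray(labels.astype(np.uint8)).save(out_path)

# labels = predict(image)  # hypothetical inference call from your segmentation model
# save_semantic(labels, "data/your_scene/semantic_deeplab/0.png")
```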