Paper --> Dynamic Contrastive Learning with Pretrained Deep Language Model Enhances Metagenome Binning for Contigs
DeeperBin is a binner that clusters contigs using dynamic contrastive learning and a pretrained deep language model.
Create DeeperBin's conda environment by using this command:
conda env create -n DeeperBin -f deeperbin-conda-env.yml
Then download PyTorch v2.1.0 -cu*** (or a higher version) from http://pytorch.org/ if you want to use GPUs (we highly recommend using GPUs). For example:
conda activate DeeperBin
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
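Before running DeeperBin, it can be useful to confirm that the installed PyTorch build actually sees a GPU. The following is a minimal sketch; `pick_device` is a hypothetical helper and not part of DeeperBin. It falls back to `cpu` when PyTorch is missing or CUDA is unavailable.

```python
def pick_device() -> str:
    """Return "cuda:0" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # The returned string has the same form as the values accepted by --device.
    print(pick_device())
```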
After preparing the environment, DeeperBin can be installed with pip:
conda activate DeeperBin
pip install DeeperBin==1.0.1
The installation takes around 10 minutes.
Download the pretrained weights and other files (DeeperBin-DB.zip) needed to run DeeperBin from this LINK.
- Unzip the downloaded file (DeeperBin-DB.zip) and set an environment variable called "DeeperBin_DB" by adding the following line to the end of your .bashrc file (path: ~/.bashrc):
export DeeperBin_DB=/path/of/this/DeeperBin-DB/
For example: 'export DeeperBin_DB=/home/csbhzou/software/DeeperBin-DB/'.
- Save the .bashrc file, and then execute:
source ~/.bashrc
- Alternatively, set the '-db' flag on the command line to the path of the 'DeeperBin-DB' folder if you do not want to set the environment variable.
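The two ways of locating the database folder described above can be sketched as follows. `resolve_db_path` is a hypothetical helper, and giving an explicit path priority over the `DeeperBin_DB` variable is an assumption about the intended behaviour, not a description of DeeperBin's own code.

```python
import os
from typing import Optional


def resolve_db_path(cli_db_path: Optional[str] = None) -> str:
    """Use an explicit path (as with '-db') if given, else the DeeperBin_DB variable."""
    db_path = cli_db_path or os.environ.get("DeeperBin_DB")
    if not db_path:
        raise ValueError("Set DeeperBin_DB or pass the DeeperBin-DB folder explicitly.")
    if not os.path.isdir(db_path):
        raise FileNotFoundError(f"DeeperBin-DB folder not found: {db_path}")
    return db_path
```

Running such a check first turns a missing or mistyped path into an immediate, readable error.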
1. You can run DeeperBin in 'clean' mode with the following command:
usage: deeperbin [-h] -c CONTIG_PATH -b SORTED_BAMS_PATHS [SORTED_BAMS_PATHS ...] -o OUTPUT_PATH -temp TEMP_FILE_PATH [-db DB_FILES_PATH]
[--device DEVICE] [--n_views N_VIEWS] [--min_contig_length MIN_CONTIG_LENGTH] [--batch_size BATCH_SIZE] [--epoch_base EPOCH_BASE]
[--num_workers NUM_WORKERS]
optional arguments:
-h, --help show this help message and exit
-c CONTIG_PATH, --contig_path CONTIG_PATH
Contig fasta file path.
-b SORTED_BAMS_PATHS [SORTED_BAMS_PATHS ...], --sorted_bams_paths SORTED_BAMS_PATHS [SORTED_BAMS_PATHS ...]
The sorted bam files path. You can set one bam file for single-sample binning and multiple bam files for multi-sample binning.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
The folder to store final MAGs.
-temp TEMP_FILE_PATH, --temp_file_path TEMP_FILE_PATH
The folder to store temporary files.
-db DB_FILES_PATH, --db_files_path DB_FILES_PATH
The folder that contains the DeeperBin-DB files. You can ignore it if you set the 'DeeperBin_DB' environment variable.
--device DEVICE The device for training. Default is cuda:0. We highly recommend using a GPU rather than the CPU. You can adjust the 'batch_size' parameter to fit your
GPU's memory. The default settings need 24GB of GPU memory. You can use the CPU by setting this parameter to 'cpu'.
--n_views N_VIEWS Number of views to generate for each contig during training. Defaults to 6.
--min_contig_length MIN_CONTIG_LENGTH
The minimum length of contigs for binning. Defaults to 750.
--batch_size BATCH_SIZE
The batch size. Defaults to 1024.
--epoch_base EPOCH_BASE
Number of basic training epochs. Defaults to 35.
--num_workers NUM_WORKERS
Number of CPUs for clustering contigs. Defaults to None, in which case 1/3 of the total CPUs is used.
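As an illustration of how the flags above fit together, the sketch below assembles a `deeperbin` command line from Python. `build_deeperbin_command` is a hypothetical helper, the file and folder names are placeholders, and only flags listed in the usage text are used.

```python
import shlex
from typing import List, Optional


def build_deeperbin_command(
    contig_path: str,
    sorted_bam_paths: List[str],
    output_path: str,
    temp_file_path: str,
    db_files_path: Optional[str] = None,
    device: str = "cuda:0",
    batch_size: int = 1024,
) -> List[str]:
    """Build the argument list for the deeperbin CLI from documented flags."""
    cmd = [
        "deeperbin",
        "-c", contig_path,
        "-b", *sorted_bam_paths,  # one BAM for single-sample, several for multi-sample
        "-o", output_path,
        "-temp", temp_file_path,
        "--device", device,
        "--batch_size", str(batch_size),
    ]
    if db_files_path is not None:
        # Only needed when the DeeperBin_DB environment variable is not set.
        cmd += ["-db", db_files_path]
    return cmd


if __name__ == "__main__":
    cmd = build_deeperbin_command(
        "contigs.fasta", ["sample1.bam", "sample2.bam"], "bins_out", "tmp_files"
    )
    print(shlex.join(cmd))
    # To actually launch the run: subprocess.run(cmd, check=True)
```

Lowering `batch_size` here is the documented way to fit a GPU with less memory than the default settings require.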
2. You can run DeeperBin through the binning_with_all_steps function in Python.
from DeeperBin.Binning import binning_with_all_steps

if __name__ == "__main__":
    # Input contigs and the sorted BAM files (one file per sample).
    contig_path = "contigs.fasta"
    bam_list = ["bam_file1", "bam_file2"]
    # Folders for temporary files and for the final MAGs.
    temp_path = "/temp_folder/"
    bin_output_folder = "/bin_output_folder/"
    binning_with_all_steps(
        contig_file_path=contig_path,
        sorted_bam_file_list=bam_list,
        temp_file_folder_path=temp_path,
        bin_output_folder_path=bin_output_folder,
        db_folder_path="./DeeperBin-DB",  # path to the unzipped DeeperBin-DB folder
        training_device="cuda:0",
    )
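For multi-sample binning, the list of BAM files passed as `sorted_bam_file_list` can be gathered from a folder instead of being typed by hand. This is a minimal sketch: `collect_bam_paths` is a hypothetical helper, and the `*.bam` naming pattern is an assumption about how the files are stored.

```python
from pathlib import Path
from typing import List


def collect_bam_paths(bam_folder: str, pattern: str = "*.bam") -> List[str]:
    """Return the BAM file paths in a folder, ordered by file name.

    This only lists files; it does not check that each BAM is coordinate-sorted.
    """
    bam_files = sorted(str(p) for p in Path(bam_folder).glob(pattern))
    if not bam_files:
        raise FileNotFoundError(f"No files matching {pattern} in {bam_folder}")
    return bam_files
```

The returned list can then be used in place of `bam_list` in the call above.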
This file contains the following columns:
- MAG name (first column),
- completeness of MAG (second column),
- contamination of MAG (third column),
- MAG quality (fourth column).
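The four columns listed above can be read into Python records along the following lines. This is a sketch under stated assumptions: the file is taken to be tab-separated, rows whose numeric columns do not parse (such as a possible header) are skipped, and `read_mag_quality` and the sample values are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Union


def read_mag_quality(handle: TextIO) -> List[Dict[str, Union[str, float]]]:
    """Parse rows of: MAG name, completeness, contamination, MAG quality."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):  # assumption: tab-separated
        if len(row) < 4:
            continue  # skip blank or malformed lines
        name, completeness, contamination, quality = row[:4]
        try:
            record = {
                "mag": name,
                "completeness": float(completeness),
                "contamination": float(contamination),
                "quality": quality,
            }
        except ValueError:
            continue  # non-numeric values, e.g. a header row
        records.append(record)
    return records


if __name__ == "__main__":
    # Invented example values, only to show the expected shape of a row.
    sample = io.StringIO("bin_1\t95.2\t1.3\tHigh\n")
    print(read_mag_quality(sample))
```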
- System: Linux
- CPU: No restriction.
- RAM: >= 64 GB
- GPU: The GPU memory must be equal to or greater than 6GB.
- System: NVIDIA DGX Server Version 5.5.1 (GNU/Linux 5.4.0-131-generic x86_64)
- CPU: AMD EPYC 7742 64-Core Processor (2 Sockets)
- RAM: 1TB
- GPU: 8 GPUs (A100-40GB)
- DeeperBin-DB: The model weights and other necessary files for running DeeperBin.
- DeeperBin: The main Python code of DeeperBin.