Merge branch 'master' into 'public'
Merge for version 1.4

See merge request icbi-lab/pipelines/rnaseq-nf!41
riederd committed Jul 28, 2023
2 parents 8ac296d + 5c570e0 commit 71046cf
Showing 14 changed files with 544 additions and 375 deletions.
35 changes: 26 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
@@ -66,26 +66,29 @@ curl -s https://get.nextflow.io | bash

The pipeline will install almost all required tools via Singularity images or conda environments. If preferred, one can also use local installations of all tools (not recommended; please see `Manual installation` at the end of this document).

The software that needs to be present on the system is **Java** (minimum version 8), **Nextflow** (see above), **Singularity**, **Conda** (optional).
The software that needs to be present on the system is **Java** (minimum version 8; if running with conda, Java version 17 or higher is needed), **Nextflow** (see above), **Singularity**, and **Conda** (optional).

If you intend to run the pipeline with the `conda` profile instead of singularity, we recommend installing `mamba` (<https://github.com/mamba-org/mamba>)
to speed up the creation of conda environments. If you cannot install `mamba`, please set `conda.useMamba = false` for the `conda` profile in `conf/profiles.config`.
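As a sketch, the relevant stanza in `conf/profiles.config` might look like this (the exact structure of the shipped file may differ; only the `conda.useMamba` setting is taken from the text above):

```
profiles {
    conda {
        conda.useMamba = false   // set to false only if mamba is unavailable
    }
}
```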

**Optional but recommended:**
Due to license restrictions you may also need to download and install **HLA-HD** on your own, and set the installation path in ```conf/params.config```. _If HLA-HD is not available, Class II neoepitopes will NOT be predicted._
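For illustration, setting the HLA-HD path in `conf/params.config` might look like the following sketch (the parameter name `HLAHD_DIR` is the one documented in section 2 of this README; the path itself is a placeholder):

```
params {
    // placeholder path to your local HLA-HD installation
    HLAHD_DIR = "/opt/hlahd"
}
```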

### 1.2 References
### 1.3 References

The pipeline requires different reference files, indexes and databases:

please see ```conf/resources.config```

For each nextNEOpi version we prepared a bundle with all needed references, indexes and databases which can be obtained from:

`https://apps-01.i-med.ac.at/resources/nextneopi/`
<https://apps-01.i-med.ac.at/resources/nextneopi/>

The bundle is named to match the release version: `nextNEOpi_<version>_resources.tar.gz`

e.g.:

<https://apps-01.i-med.ac.at/resources/nextneopi/nextNEOpi_1.3_resources.tar.gz>
<https://apps-01.i-med.ac.at/resources/nextneopi/nextNEOpi_1.4_resources.tar.gz>

Download and extract the contents of the archive into the directory you specified for ```resourcesBaseDir``` in the ```conf/params.config``` file.
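As a minimal sketch, fetching and unpacking the 1.4 bundle could look like this (the target directory is a placeholder, and the actual download/extract commands are left commented so you can adapt them to your setup first):

```shell
# Build the bundle URL from the release version (1.4 at the time of this commit).
VERSION="1.4"
BUNDLE="nextNEOpi_${VERSION}_resources.tar.gz"
URL="https://apps-01.i-med.ac.at/resources/nextneopi/${BUNDLE}"
echo "${URL}"

# Then download and extract into your configured resourcesBaseDir, e.g.:
# curl -fSL -O "${URL}"
# tar -xzf "${BUNDLE}" -C /path/to/resourcesBaseDir
```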

@@ -117,6 +120,15 @@ Refs:
* <https://gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files>
* <https://www.gencodegenes.org/human/>

### 1.4 Testdata

If you want to test the pipeline using a working minimal test dataset, you may download one from

<https://apps-01.i-med.ac.at/resources/nextneopi/nextNEOpi_testdata.tar.gz>

Please note that due to the limited read coverage, `CNVkit` will not run successfully on this test dataset. Please run the
pipeline with the parameter `--CNVkit false` when testing with this dataset.
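A test run could then be invoked roughly as follows (the batch file name is an assumption; adjust it to whatever the extracted test data actually contains, and see section 2 for the full usage):

```shell
# Fetch and unpack the test data first (run these manually):
# curl -fSL -O https://apps-01.i-med.ac.at/resources/nextneopi/nextNEOpi_testdata.tar.gz
# tar -xzf nextNEOpi_testdata.tar.gz

# Disable CNVkit, since the test data's read coverage is too low for it.
CMD="nextflow run nextNEOpi.nf --batchFile batchFile_FASTQ.csv --CNVkit false -profile singularity -config conf/params.config"
echo "NXF_VER=22.10.8 ${CMD}"
```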

## 2. Usage

Before running the pipeline, the config files in the ```conf/``` directory may need to be edited. In the
@@ -127,8 +139,11 @@ the number of CPUs assigned for each process and adjust according to your system
Most pipeline parameters can be edited in the ```params.config``` file or changed at run time with command-line options, using ```--NameOfTheParameter``` as given in ```params.config```.
References and databases should be edited in the ```resources.config``` file.

**Note**: nextNEOpi is currently written in Nextflow DSL 1, which is only supported up to Nextflow version 22.10.8. This means you need to pin the Nextflow
version by setting the environment variable `NXF_VER=22.10.8` in case you have installed a newer Nextflow version.

```
nextflow run nextNEOpi.nf --batchFile <batchFile_FASTQ.csv | batchFile_BAM.csv> -profile singularity|conda,[cluster] [-resume] -config conf/params.config
NXF_VER=22.10.8 nextflow run nextNEOpi.nf --batchFile <batchFile_FASTQ.csv | batchFile_BAM.csv> -profile singularity|conda,[cluster] [-resume] -config conf/params.config
```

**Profiles:** conda or singularity
@@ -269,6 +284,8 @@ nextflow run nextNEOpi.nf \
```--TCR``` Run mixcr for TCR prediction
Default: true

```--CNVkit``` Run CNVkit for detecting CNAs. Default: true

```--HLAHD_DIR``` Specify the path to your HLA-HD installation. Needed if Class II neoantigens should be predicted.

```--HLA_force_RNA``` Use only RNAseq for HLA typing. Default: false
@@ -350,11 +367,11 @@ If you prefer local installation of the analysis tools please install the follow
* BWA (Version >= 0.7.17)
* SAMTOOLS (Version >= 1.9)
* GATK3 (Version 3.8-0)
* GATK4 (Version >= 4.2.5.0)
* VARSCAN (Version 2.4.3)
* GATK4 (Version >= 4.4.0.0)
* VARSCAN (Version 2.4.6)
* MUTECT1 (Version 1.1.7) ---- optional
* BAMREADCOUNT (Version 0.8.0)
* VEP (Version v105)
* VEP (Version v110)
* BGZIP
* TABIX
* BCFTOOLS
@@ -368,7 +385,7 @@ If you prefer local installation of the analysis tools please install the follow
* YARA
* HLA-HD
* ALLELECOUNT
* RSCRIPT (R > 3.6.1)
* RSCRIPT (R > 3.6.2)
* SEQUENZA (3.0)
* CNVkit

2 changes: 1 addition & 1 deletion assets/.mambarc
@@ -3,4 +3,4 @@ channels:
- bioconda
- defaults
always_yes: true

channel_priority: flexible
2 changes: 1 addition & 1 deletion assets/email_template.html
@@ -44,7 +44,7 @@ <h3>Pipeline Configuration:</h3>
</table>

<p>icbi/nextNEOpi</p>
<p><a href="https://gitlab.i-med.ac.at/icbi-lab/pipelines/nextNEOpi">nextNEOpi</a></p>
<p><a href="https://github.com/icbi-lab/nextNEOpi">nextNEOpi</a></p>

</div>

Binary file removed assets/gatkPythonPackageArchive.zip
Binary file not shown.
32 changes: 23 additions & 9 deletions assets/nextNEOpi.def
@@ -1,30 +1,44 @@
Bootstrap: docker
From: mambaorg/micromamba
From: mambaorg/micromamba:0.24.0

%files
nextNEOpi.yml /nextNEOpi.yml
./.mambarc /root/.mambarc

%post
apt-get update && apt-get install -y \
procps \
curl \
unzip
apt-get --allow-releaseinfo-change update && apt-get install -y \
procps \
curl \
unzip \
libgomp1 \
openjdk-17-jdk

# set jdk-17 as default
update-java-alternatives -s java-1.17.0-openjdk-amd64

export LANG=C.UTF-8 LC_ALL=C.UTF-8
export PATH=/opt/conda/bin:$PATH

curl -L -o gatk-4.2.6.1.zip https://github.com/broadinstitute/gatk/releases/download/4.2.6.1/gatk-4.2.6.1.zip
unzip -j gatk-4.2.6.1.zip gatk-4.2.6.1/gatkPythonPackageArchive.zip -d ./
mkdir -p /opt/gatk
mkdir -p /opt/conda/bin

curl -L -o gatk-4.4.0.0.zip https://github.com/broadinstitute/gatk/releases/download/4.4.0.0/gatk-4.4.0.0.zip
unzip -j gatk-4.4.0.0.zip gatk-4.4.0.0/gatkPythonPackageArchive.zip -d ./
unzip -j gatk-4.4.0.0.zip gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar -d ./opt/gatk/
unzip -j gatk-4.4.0.0.zip gatk-4.4.0.0/gatk -d ./opt/gatk/

chmod +x /opt/gatk/gatk
ln -s /opt/gatk/gatk /opt/conda/bin/gatk

micromamba install --yes --name base --file /nextNEOpi.yml

rm -f /nextNEOpi.yml
rm -f gatk-4.2.6.1.zip
rm -f gatk-4.4.0.0.zip
rm -f gatkPythonPackageArchive.zip

apt-get clean
micromamba clean --all --yes

%environment
export LANG=C.UTF-8 LC_ALL=C.UTF-8
export PATH=/opt/conda/bin:$PATH
export PATH=/usr/lib/jvm/java-17-openjdk-amd64/bin/:/opt/conda/bin:$PATH
13 changes: 6 additions & 7 deletions assets/nextNEOpi.yml
@@ -8,23 +8,19 @@ channels:
dependencies:
- bwa
- samtools
- gatk4=4.2.6.1
- fastp
- fastqc
- multiqc
- sambamba
- bcftools
- varscan
- bam-readcount
- yara
- optitype

# core python dependencies for GATK4 (4.2.6.1)
# core python dependencies for GATK4 (4.4.0.0)
- conda-forge::python=3.6.10 # do not update
- pip=20.0.2 # specifying channel may cause a warning to be emitted by conda
- conda-forge::mkl=2019.5 # MKL typically provides dramatic performance increases for theano, tensorflow, and other key dependencies
- conda-forge::mkl-service=2.3.0
- conda-forge::numpy=1.17.5 # do not update, this will break scipy=0.19.1
- conda-forge::numpy=1.17.5 # do not update, this will break scipy=1.0.0
# verify that numpy is compiled against MKL (e.g., by checking *_mkl_info using numpy.show_config())
# and that it is used in tensorflow, theano, and other key dependencies
- conda-forge::theano=1.0.4 # it is unlikely that new versions of theano will be released
@@ -42,9 +38,12 @@ dependencies:
- conda-forge::scikit-learn=0.23.1
- conda-forge::matplotlib=3.2.1
- conda-forge::pandas=1.0.3
- conda-forge::typing_extensions=4.1.1 # see https://github.com/broadinstitute/gatk/issues/7800 and linked PRs
- conda-forge::dill=0.3.4 # used for pickling lambdas in TrainVariantAnnotationsModel
- conda-forge::joblib=1.1.1

# core R dependencies; these should only be used for plotting and do not take precedence over core python dependencies!
- r-base=3.6.2
- r-base>=3.6.2
- r-data.table
- r-dplyr=0.8.5
- r-getopt=1.20.3
6 changes: 1 addition & 5 deletions assets/pVACtools_icbi.def
@@ -83,11 +83,7 @@ From: continuumio/miniconda3:4.9.2
pip install protobuf==3.20.1
pip install tensorflow>=2.2.2

# need to pin pandas version see: https://github.com/griffithlab/pVACtools/issues/779
# will not be required in pVACtools versions > 3.0.0
# pip install pandas==0.25.2

pip install pvactools==3.0.2
pip install pvactools==4.0.1

cd /opt
mkdir tmp_src
3 changes: 2 additions & 1 deletion assets/rigscore.def
@@ -6,7 +6,8 @@ From: mambaorg/micromamba

%post
apt-get update && apt-get install -y \
procps
procps \
patch

export LANG=C.UTF-8 LC_ALL=C.UTF-8
export PATH=/opt/conda/bin:$PATH
6 changes: 3 additions & 3 deletions bin/CSiN.py
@@ -43,8 +43,8 @@ def convert_to_df(pvacseq_1_tsv, pvacseq_2_tsv):
# Rename columns in order for the merge to work properly
pvacseq_2_df.rename(
columns={
"NetMHCIIpan WT Score": "NetMHCpan WT Score",
"NetMHCIIpan MT Score": "NetMHCpan MT Score",
"NetMHCIIpan WT IC50 Score": "NetMHCpan WT IC50 Score",
"NetMHCIIpan MT IC50 Score": "NetMHCpan MT IC50 Score",
"NetMHCIIpan WT Percentile": "NetMHCpan WT Percentile",
"NetMHCIIpan MT Percentile": "NetMHCpan MT Percentile",
},
@@ -63,7 +63,7 @@ def sub_csin(c, IC50_cutoff, xp_cutoff, filtered_df):
# Filter dataframe
filtered_df_tmp = filtered_df
filtered_df_tmp = filtered_df_tmp[filtered_df_tmp["NetMHCpan MT Percentile"] < rank]
filtered_df_tmp = filtered_df_tmp[filtered_df_tmp["Best MT Score"] < IC50_cutoff]
filtered_df_tmp = filtered_df_tmp[filtered_df_tmp["Best MT IC50 Score"] < IC50_cutoff]
filtered_df_tmp = filtered_df_tmp[filtered_df_tmp["Gene Expression"] > xp_cutoff]
# Get the VAF and mean VAF, then normilize
vaf_mean = filtered_df_tmp["Tumor DNA VAF"].mean()
37 changes: 18 additions & 19 deletions bin/get_epitopes.py
@@ -12,26 +12,26 @@
import sys
import csv

def filter_tsv(sample, input_file, output_file):
# Open the input TSV file and create a CSV reader
with open(input_file, 'r', newline='') as infile:
reader = csv.DictReader(infile, delimiter='\t')

def parse_mhcI(inFile, epitopes=[]):
with open(inFile) as in_file:
csv_reader = csv.reader(in_file, delimiter="\t")
in_file.readline()
for line in csv_reader:
if line[19] == "NA" or line[18] == "NA":
pass
else:
# print("%s\t%s\t%s" % (line[18], line[19], line[17]))
epitopes.append("%s\t%s\t%s" % (line[18], line[19], line[17]))
return epitopes
# Open the output TSV file and create a CSV writer
with open(output_file, 'w', newline='') as outfile:
fieldnames = ['Sample_ID', 'mut_peptide', 'Reference', 'peptide_variant_position']
writer = csv.DictWriter(outfile, fieldnames=fieldnames, delimiter='\t')

# Write the header
writer.writeheader()

def write_output(outFile, sample_id, epitopes=[]):
with open(outFile, "w") as out_file:
out_file.write("Sample_ID\tmut_peptide\tReference\tpeptide_variant_position\n")
for epitope in epitopes:
out_file.write("%s\t%s\n" % (sample_id, epitope))
return outFile
# Filter rows and write selected columns to the output TSV file
for row in reader:
if row['WT Epitope Seq'] and row['MT Epitope Seq']:
writer.writerow({'Sample_ID': sample,
'mut_peptide': row['MT Epitope Seq'],
'Reference': row['WT Epitope Seq'],
'peptide_variant_position': row['Mutation Position']})


if __name__ == "__main__":
@@ -45,5 +45,4 @@ def write_output(outFile, sample_id, epitopes=[]):
sample = args.sample_id
epitope_array = []

parse_mhcI(infile, epitope_array)
write_output(outfile, sample, epitope_array)
filter_tsv(sample, infile, outfile)
22 changes: 14 additions & 8 deletions conf/params.config
@@ -49,6 +49,9 @@ params {
// run controlFREEC
controlFREEC = false

// run CNVkit
CNVkit = true

// Panel of normals (see: https://gatk.broadinstitute.org/hc/en-us/articles/360040510131-CreateSomaticPanelOfNormals-BETA-)
mutect2ponFile = 'NO_FILE'

@@ -66,8 +69,7 @@ params {

// Directories (need to be in quotes)
tmpDir = "/tmp/$USER/nextNEOpi/" // Please make sure that there is enough free space (~ 50G)
workDir = "$PWD"
outputDir = "${workDir}/RESULTS"
outputDir = "${PWD}/results"

// Result publishing method
publishDirMode = "auto" // Choose between:
@@ -110,7 +112,7 @@ params {
HLA_HD_genome_version = "hg38"

// URL to the installation package of MiXCR, will be installed automatically.
MIXCR_url = "https://github.com/milaboratory/mixcr/releases/download/v4.0.0/mixcr-4.0.0.zip"
MIXCR_url = "https://github.com/milaboratory/mixcr/releases/download/v4.4.1/mixcr-4.4.1.zip"
MIXCR_lic = "" // path to MiXCR license file
MIXCR = "" // Optional: specify path to mixcr directory if already installed, will be installed automatically otherwise
// analyze TCRs using mixcr
@@ -127,8 +129,8 @@
IGS = "" // optional path to IGS

// IEDB tools urls for MHCI and MHCII. These will be used for IEDB installation into resources.databases.IEDB_dir
IEDB_MHCI_url = "https://downloads.iedb.org/tools/mhci/3.1.2/IEDB_MHC_I-3.1.2.tar.gz"
IEDB_MHCII_url = "https://downloads.iedb.org/tools/mhcii/3.1.6/IEDB_MHC_II-3.1.6.tar.gz"
IEDB_MHCI_url = "https://downloads.iedb.org/tools/mhci/3.1.4/IEDB_MHC_I-3.1.4.tar.gz"
IEDB_MHCII_url = "https://downloads.iedb.org/tools/mhcii/3.1.8/IEDB_MHC_II-3.1.8.tar.gz"


// Java settings: please adjust to your memory available
@@ -176,9 +178,9 @@ params {


// VEP
vep_version = "106.1"
vep_version = "110.0"
vep_assembly = "GRCh38"
vep_cache_version = "106"
vep_cache_version = "110"
vep_species = "homo_sapiens"
vep_options = "--everything" // "--af --af_1kg --af_gnomad --appris --biotype --check_existing --distance 5000 --failed 1 --merged --numbers --polyphen b --protein --pubmed --regulatory --sift b --symbol --xref_refseq --tsl --gene_phenotype"

@@ -204,7 +206,7 @@ params {
// pVACseq settings
mhci_epitope_len = "8,9,10,11"
mhcii_epitope_len = "15,16,17,18,19,20,21,22,23,24,25" // minimum length has to be at least 15 (see pVACtools /opt/iedb/mhc_ii/mhc_II_binding.py line 246)
epitope_prediction_tools = "NetMHCpan MHCflurry NetMHCIIpan"
epitope_prediction_tools = "NetMHCpan NetMHCpanEL MHCflurry MHCflurryEL NetMHCIIpan NetMHCIIpanEL"
use_NetChop = false
use_NetMHCstab = true

@@ -231,18 +233,22 @@ includeConfig './profiles.config'

timeline {
enabled = true
overwrite = true
file = "${params.tracedir}/icbi/nextNEOpi_timeline.html"
}
report {
enabled = true
overwrite = true
file = "${params.tracedir}/icbi/nextNEOpi_report.html"
}
trace {
enabled = true
overwrite = true
file = "${params.tracedir}/icbi/nextNEOpi_trace.txt"
}
dag {
enabled = true
overwrite = true
file = "${params.tracedir}/icbi/nextNEOpi_dag.svg"
}
