Create community pipelines directory

rrwick · Jan 22, 2025 · 728efcc
1 parent a61e6c3
Showing 3 changed files with 130 additions and 0 deletions.
diff --git a/pipelines/Automated_Autocycler_Bash_script_by_Ryan_Wick/README.md b/pipelines/Automated_Autocycler_Bash_script_by_Ryan_Wick/README.md
@@ -0,0 +1,40 @@
+# Automated Autocycler Bash script (by Ryan Wick)
+
+This is a simple Bash script that automates running a full Autocycler assembly workflow. The script is deliberately minimal, with few frills, and it assumes that the input reads are ready for assembly.
+
+I wrote this script in January 2025 for Autocycler v0.2.1. I'll do my best to update it if future Autocycler versions break compatibility, but I can't make any promises.
+
+
+
+## Key Features
+
+* Does not perform any quality control on the input reads before starting the assembly.
+* Uses Raven to [estimate the genome size](https://github.com/rrwick/Autocycler/wiki/Genome-size-estimation).
+* Hard-codes the use of four read subsets for the assembly process.
+* Uses [GNU Parallel](https://github.com/rrwick/Autocycler/wiki/Parallelising-input-assemblies#gnu-parallel) to run multiple assembly jobs at once.
+* Runs assemblies with `nice -n 19` to give them lower priority with the operating system.
+* Uses seven different assemblers (in this order): Raven, miniasm, Flye, MetaMDBG, NECAT, NextDenovo and Canu. This order was chosen to put the faster assemblers first and slower assemblers last, in case you want to check a run in progress to see if the assemblies look okay.
+* 4 read subsets × 7 assemblers = 28 total input assemblies (assuming all are successful).
+
+
+
+## Usage
+
+The script takes the following three arguments:
+1. **Read filename**: Path to the input FASTQ file (can be gzipped).
+2. **Thread count**: Number of threads per assembly.
+3. **Job count**: Number of simultaneous assemblies to run. Note: Each assembly will use up to the given thread count, so the maximum total threads in use will be threads × jobs.
+
+
+**Example command:** `./autocycler_full.sh reads.fastq.gz 16 4`
+
+
+
+## Output
+
+When run, the script will create the following directories and files in the working directory:
+
+* **`subsampled_reads/`**: Directory containing the read subsets for assembly. The actual FASTQ files will be deleted after the assemblies are complete (to save disk space), but the directory and its YAML file will remain.
+* **`assemblies/`**: Directory containing the input assemblies for Autocycler. The logs for each assembler can be found in the `assemblies/logs` directory.
+* **`autocycler_out/`**: Autocycler output directory, which will include the final combined assembly as `consensus_assembly.gfa` and `consensus_assembly.fasta`.
+* **`autocycler.stderr`**: File containing all `stderr` output from Autocycler across all steps.
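
The arithmetic described in the README above (threads × jobs, and read subsets × assemblers) can be sketched as follows. This is only an illustration and not part of the commit: the function names `max_total_threads` and `list_input_assemblies` are hypothetical, and the `<assembler>_<subset>` naming is assumed from the output prefixes used in `autocycler_full.sh`.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; these helper functions are hypothetical and are
# not part of autocycler_full.sh.

# Peak thread usage: each job may use up to <threads>, and <jobs> run at once.
max_total_threads() {
    local threads=$1 jobs=$2
    echo $(( threads * jobs ))
}

# One input assembly per (assembler, read subset) pair, using the same
# assembler order and subset numbering as the script.
list_input_assemblies() {
    local assembler i
    for assembler in raven miniasm flye metamdbg necat nextdenovo canu; do
        for i in 01 02 03 04; do
            echo "${assembler}_$i"
        done
    done
}

echo "Peak threads for 16 threads x 4 jobs: $(max_total_threads 16 4)"
echo "Input assemblies: $(list_input_assemblies | wc -l | tr -d ' ')"
```

For the example command (`16 4`), this reports a peak of 64 threads and 28 input assemblies, so it is worth checking that the machine has enough cores before choosing the two values.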
diff --git a/pipelines/Automated_Autocycler_Bash_script_by_Ryan_Wick/autocycler_full.sh b/pipelines/Automated_Autocycler_Bash_script_by_Ryan_Wick/autocycler_full.sh
@@ -0,0 +1,64 @@
+#!/usr/bin/env bash
+
+# This script is a wrapper for running a fully-automated Autocycler assembly.
+
+# Usage:
+#   autocycler_full.sh <read_fastq> <threads> <jobs>
+
+# Copyright 2025 Ryan Wick ([email protected])
+# Licensed under the GNU General Public License v3.
+# See https://www.gnu.org/licenses/gpl-3.0.html.
+
+# Ensure script exits on error.
+set -e
+
+# Get arguments.
+reads=$1    # input reads FASTQ
+threads=$2  # threads per job
+jobs=$3     # number of simultaneous jobs
+
+# Validate input parameters.
+if [[ -z "$reads" || -z "$threads" || -z "$jobs" ]]; then
+    >&2 echo "Usage: $0 <read_fastq> <threads> <jobs>"
+    exit 1
+fi
+if [[ ! -f "$reads" ]]; then
+    >&2 echo "Error: Input file '$reads' does not exist."
+    exit 1
+fi
+
+genome_size=$(genome_size_raven.sh "$reads" "$threads")  # estimate genome size with Raven
+
+# Step 1: subsample the long-read set into multiple files
+autocycler subsample --reads "$reads" --out_dir subsampled_reads --genome_size "$genome_size" 2>> autocycler.stderr
+
+# Step 2: assemble each subsampled file
+mkdir -p assemblies
+rm -f assemblies/jobs.txt
+for assembler in raven miniasm flye metamdbg necat nextdenovo canu; do
+    for i in 01 02 03 04; do
+        echo "nice -n 19 $assembler.sh subsampled_reads/sample_$i.fastq assemblies/${assembler}_$i $threads $genome_size" >> assemblies/jobs.txt
+    done
+done
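+set +e  # let the script continue even if some assemblies fail
+parallel --jobs "$jobs" --joblog assemblies/joblog.txt --results assemblies/logs < assemblies/jobs.txt
+set -e  # restore exit-on-error
+find assemblies/ -maxdepth 1 -type f -name "*.fasta" -empty -delete  # delete empty FASTA files (e.g. from failed assemblies)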
+
+# Optional step: remove the subsampled reads to save space
+rm subsampled_reads/*.fastq
+
+# Step 3: compress the input assemblies into a unitig graph
+autocycler compress -i assemblies -a autocycler_out 2>> autocycler.stderr
+
+# Step 4: cluster the input contigs into putative genomic sequences
+autocycler cluster -a autocycler_out 2>> autocycler.stderr
+
+# Steps 5 and 6: trim and resolve each QC-pass cluster
+for c in autocycler_out/clustering/qc_pass/cluster_*; do
+    autocycler trim -c "$c" 2>> autocycler.stderr
+    autocycler resolve -c "$c" 2>> autocycler.stderr
+done
+
+# Step 7: combine resolved clusters into a final assembly
+autocycler combine -a autocycler_out -i autocycler_out/clustering/qc_pass/cluster_*/5_final.gfa 2>> autocycler.stderr
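
Because the script disables `set -e` around the `parallel` call, failed assemblies are tolerated silently. One possible way to see how many jobs failed is to read the GNU Parallel job log afterwards. The sketch below is not part of the commit: `count_failed_jobs` is a hypothetical helper, and it assumes the standard tab-separated joblog layout in which column 7 is `Exitval` and column 8 is `Signal`.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; count_failed_jobs is a hypothetical helper and is
# not part of autocycler_full.sh.

# Count jobs in a GNU Parallel joblog that exited non-zero or were killed by
# a signal. The joblog is tab-separated with one header line; column 7 is
# Exitval and column 8 is Signal.
count_failed_jobs() {
    awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { n++ } END { print n + 0 }' "$1"
}

# Path written by the script's --joblog option.
joblog=assemblies/joblog.txt
if [[ -f "$joblog" ]]; then
    failed=$(count_failed_jobs "$joblog")
    >&2 echo "$failed input assembly job(s) failed; see assemblies/logs for details."
fi
```

Run from the working directory after the assembly step, this prints a one-line summary to `stderr`. Autocycler can still proceed with fewer than 28 input assemblies, so a non-zero count is a prompt to inspect the logs rather than an error in itself.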
diff --git a/pipelines/README.md b/pipelines/README.md
@@ -0,0 +1,26 @@
+# Full Autocycler pipelines
+
+This directory contains scripts, wrappers and pipelines contributed by the community for running full Autocycler assemblies. Since everyone's needs differ, there is no one-size-fits-all way to automate Autocycler. These contributions aim to provide examples and inspiration for automating workflows.
+
+
+
+## Usage
+
+Feel free to use one of these pipelines as-is, modify it for your needs or write your own pipeline from scratch. If you create a pipeline that you think others might find useful, you are welcome to contribute it to this directory by submitting a [pull request](https://github.com/rrwick/Autocycler/pulls).
+
+
+
+## Contributing
+
+If you'd like to add your pipeline to this directory, please follow these basic submission guidelines:
+
+* Place your pipeline in its own directory. Include all required files within this directory.
+* Provide some documentation. This can be in the form of a `README.md` file within the pipeline's directory or inline documentation in the file(s).
+* Include a license with your pipeline (either as a license file or in the comments at the top of the file) to specify how it can be used, modified or redistributed.
+* If you make changes to your pipeline in the future, feel free to submit a PR to update it in the repository.
+
+
+
+## Disclaimer
+
+These pipelines are contributed by the community and are not officially maintained or supported. They may become outdated or non-functional with future versions of Autocycler. Use them at your own risk!