Create community pipelines directory
rrwick committed Jan 22, 2025
1 parent a61e6c3 commit 728efcc
Showing 3 changed files with 130 additions and 0 deletions.
40 changes: 40 additions & 0 deletions pipelines/Automated_Autocycler_Bash_script_by_Ryan_Wick/README.md
@@ -0,0 +1,40 @@
# Automated Autocycler Bash script (by Ryan Wick)

This is a simple Bash script that automates a full Autocycler assembly workflow. The script is deliberately minimal, without many frills, and it assumes that the input reads are ready for assembly.

I wrote this script in January 2025 for Autocycler v0.2.1. I'll do my best to update it if future Autocycler versions break compatibility, but I can't make any promises.



## Key Features

* Does not perform any quality control on the input reads before starting the assembly.
* Uses Raven to [estimate the genome size](https://github.com/rrwick/Autocycler/wiki/Genome-size-estimation).
* Hard-codes the use of four read subsets for the assembly process.
* Uses [GNU Parallel](https://github.com/rrwick/Autocycler/wiki/Parallelising-input-assemblies#gnu-parallel) to run multiple assembly jobs at once.
* Runs assemblies with `nice -n 19` to give them lower priority with the operating system.
* Uses seven different assemblers (in this order): Raven, miniasm, Flye, metaMDBG, NECAT, NextDenovo and Canu. This order puts the faster assemblers first and the slower assemblers last, so you can check a run in progress to see whether the assemblies look okay.
* 4 read subsets × 7 assemblers = 28 input assemblies in total (assuming all of them succeed).
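
Because some assemblers may fail on a given read subset, the actual number of input assemblies can be lower than 28. As a rough check after a run, you could count the non-empty FASTA files in `assemblies/`. The `count_input_assemblies` function below is a hypothetical helper, not part of the script; it relies on the script deleting empty FASTA files from `assemblies/` after the assembly step.

```shell
# Hypothetical helper (not part of autocycler_full.sh): count the non-empty
# FASTA files directly inside an assemblies directory. Because the script
# deletes empty FASTA files after assembly, this approximates the number of
# successful input assemblies (at most 4 subsets x 7 assemblers = 28).
count_input_assemblies() {
    local dir=${1:-assemblies}
    # -maxdepth 1 skips the per-job log directories under assemblies/logs
    find "$dir" -maxdepth 1 -type f -name '*.fasta' ! -empty | wc -l | tr -d ' '
}
```

For example, from the working directory: `echo "$(count_input_assemblies) of 28 input assemblies succeeded"`.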



## Usage

The script takes the following three arguments:
1. **Read filename**: Path to the input FASTQ file (can be gzipped).
2. **Thread count**: Number of threads per assembly.
3. **Job count**: Number of simultaneous assemblies to run. Note: each assembly will use up to the given thread count, so up to threads × jobs threads may be in use at once.


**Example command:** `./autocycler_full.sh reads.fastq.gz 16 4`
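
If you are unsure what job count to use, one approach is to divide the available CPU cores by the per-assembly thread count, so that threads × jobs stays within the machine's capacity. The sketch below is a hypothetical helper (`jobs_for_threads` is not part of the script) and assumes GNU coreutils' `nproc` is available when no core count is given.

```shell
# Hypothetical helper (not part of autocycler_full.sh): choose a job count so
# that threads x jobs does not exceed the available cores. The core count
# defaults to `nproc` (GNU coreutils) but can be passed as a second argument.
jobs_for_threads() {
    local threads=$1
    local cores=${2:-$(nproc)}
    local jobs=$(( cores / threads ))   # integer division rounds down
    if (( jobs < 1 )); then
        jobs=1                          # always run at least one assembly
    fi
    echo "$jobs"
}
```

On a 64-core machine, `jobs_for_threads 16` gives 4, which matches the example command: `./autocycler_full.sh reads.fastq.gz 16 "$(jobs_for_threads 16)"`.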



## Output

When run, the script will create the following directories and files in the working directory:

* **`subsampled_reads/`**: Directory containing the read subsets for assembly. The actual FASTQ files will be deleted after the assemblies are complete (to save disk space), but the directory and its YAML file will remain.
* **`assemblies/`**: Directory containing the input assemblies for Autocycler. The logs for each assembler can be found in the `assemblies/logs` directory.
* **`autocycler_out/`**: Autocycler output directory, which will include the final combined assembly as `consensus_assembly.gfa` and `consensus_assembly.fasta`.
* **`autocycler.stderr`**: File containing all `stderr` output from Autocycler across all steps.
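
A quick sanity check after a run is to confirm that `consensus_assembly.fasta` exists and count its sequences. The function below is a hypothetical sketch, not part of the script; it uses only `grep` and points you to `autocycler.stderr` if the final FASTA is missing or empty.

```shell
# Hypothetical post-run check (not part of autocycler_full.sh): confirm the
# final assembly exists and report how many sequences it contains.
summarise_autocycler_run() {
    local out_dir=${1:-autocycler_out}
    local fasta="$out_dir/consensus_assembly.fasta"
    if [[ ! -s "$fasta" ]]; then
        >&2 echo "No final assembly found at $fasta (check autocycler.stderr)"
        return 1
    fi
    # Each FASTA record starts with a '>' header line
    echo "$(grep -c '^>' "$fasta") sequence(s) in $fasta"
}
```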
@@ -0,0 +1,64 @@
#!/usr/bin/env bash

# This script is a wrapper for running a fully-automated Autocycler assembly.

# Usage:
# autocycler_full.sh <read_fastq> <threads> <jobs>

# Copyright 2025 Ryan Wick ([email protected])
# Licensed under the GNU General Public License v3.
# See https://www.gnu.org/licenses/gpl-3.0.html.

# Ensure script exits on error.
set -e

# Get arguments.
reads=$1 # input reads FASTQ
threads=$2 # threads per job
jobs=$3 # number of simultaneous jobs

# Validate input parameters.
if [[ -z "$reads" || -z "$threads" || -z "$jobs" ]]; then
    >&2 echo "Usage: $0 <read_fastq> <threads> <jobs>"
    exit 1
fi
if [[ ! -f "$reads" ]]; then
    >&2 echo "Error: Input file '$reads' does not exist."
    exit 1
fi

genome_size=$(genome_size_raven.sh "$reads" "$threads")

# Step 1: subsample the long-read set into multiple files
autocycler subsample --reads "$reads" --out_dir subsampled_reads --genome_size "$genome_size" 2>> autocycler.stderr

# Step 2: assemble each subsampled file
mkdir -p assemblies
rm -f assemblies/jobs.txt
for assembler in raven miniasm flye metamdbg necat nextdenovo canu; do
    for i in 01 02 03 04; do
        echo "nice -n 19 $assembler.sh subsampled_reads/sample_$i.fastq assemblies/${assembler}_$i $threads $genome_size" >> assemblies/jobs.txt
    done
done
set +e
parallel --jobs "$jobs" --joblog assemblies/joblog.txt --results assemblies/logs < assemblies/jobs.txt
set -e
find assemblies/ -maxdepth 1 -type f -name "*.fasta" -empty -delete

# Optional step: remove the subsampled reads to save space
rm subsampled_reads/*.fastq

# Step 3: compress the input assemblies into a unitig graph
autocycler compress -i assemblies -a autocycler_out 2>> autocycler.stderr

# Step 4: cluster the input contigs into putative genomic sequences
autocycler cluster -a autocycler_out 2>> autocycler.stderr

# Steps 5 and 6: trim and resolve each QC-pass cluster
for c in autocycler_out/clustering/qc_pass/cluster_*; do
    autocycler trim -c "$c" 2>> autocycler.stderr
    autocycler resolve -c "$c" 2>> autocycler.stderr
done

# Step 7: combine resolved clusters into a final assembly
autocycler combine -a autocycler_out -i autocycler_out/clustering/qc_pass/cluster_*/5_final.gfa 2>> autocycler.stderr
26 changes: 26 additions & 0 deletions pipelines/README.md
@@ -0,0 +1,26 @@
# Full Autocycler pipelines

This directory contains scripts, wrappers and pipelines contributed by the community for running full Autocycler assemblies. Since everyone's needs differ, there is no one-size-fits-all way to automate Autocycler. These contributions aim to provide examples and inspiration for automating workflows.



## Usage

Feel free to use one of these pipelines as-is, modify it for your needs or write your own pipeline from scratch. If you create a pipeline that you think others might find useful, you are welcome to contribute it to this directory by submitting a [pull request](https://github.com/rrwick/Autocycler/pulls).



## Contributing

If you'd like to add your pipeline to this directory, please follow these basic submission guidelines:

* Place your pipeline in its own directory. Include all required files within this directory.
* Provide some documentation. This can be in the form of a `README.md` file within the pipeline's directory or inline documentation in the file(s).
* Include a license with your pipeline (either as a license file or in the comments at the top of the file) to specify how it can be used, modified or redistributed.
* If you make changes to your pipeline in the future, feel free to submit a PR to update it in the repository.



## Disclaimer

These pipelines are contributed by the community and are not officially maintained or supported. They may become outdated or non-functional with future versions of Autocycler. Use them at your own risk!
