
Autocycler dotplot

Ryan Wick edited this page Dec 19, 2024 · 13 revisions

Basics

Autocycler dotplot generates pairwise dotplots for input sequences. This can help to illuminate the differences between sequences and potential structural errors.

Within an Autocycler assembly, the most common way to run Autocycler dotplot is before/after trimming a cluster's sequences with Autocycler trim, allowing the user to verify that trimming has removed any duplicated or erroneous regions. However, Autocycler dotplot can also be run outside of an Autocycler assembly – just give it a FASTA file or directory of FASTA files as input, and it will produce all pairwise dotplots.

Example commands

Create pairwise dotplots for all sequences in an Autocycler cluster, before and after trimming:

autocycler dotplot -i qc_pass/cluster_002/1_untrimmed.gfa -o qc_pass/cluster_002/1_untrimmed.png
autocycler dotplot -i qc_pass/cluster_002/2_trimmed.gfa -o qc_pass/cluster_002/2_trimmed.png
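
To produce these plots for every cluster rather than one at a time, the two commands above could be generated by a small script. The following is a minimal Python sketch, not part of Autocycler: the function name `dotplot_commands` is hypothetical, and it assumes clusters live in `cluster_*` directories containing `1_untrimmed.gfa` and `2_trimmed.gfa`, as in the example paths above.

```python
from pathlib import Path


def dotplot_commands(qc_pass_dir):
    """Build one 'autocycler dotplot' command per GFA file found in each cluster.

    Assumes the layout used in the example commands:
    <qc_pass_dir>/cluster_*/1_untrimmed.gfa and .../2_trimmed.gfa
    """
    commands = []
    for cluster in sorted(Path(qc_pass_dir).glob("cluster_*")):
        for stem in ("1_untrimmed", "2_trimmed"):
            gfa = cluster / f"{stem}.gfa"
            if gfa.is_file():  # skip files that do not exist (e.g. not yet trimmed)
                png = cluster / f"{stem}.png"
                commands.append(
                    ["autocycler", "dotplot", "-i", str(gfa), "-o", str(png)]
                )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, or joined with spaces and printed to review the commands before running them.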

Create pairwise dotplots for all sequences in a FASTA file:

autocycler dotplot -i sequences.fasta -o dotplots.png

Full usage

Usage: autocycler dotplot [OPTIONS] --input <INPUT> --out_png <OUT_PNG>

Options:
  -i, --input <INPUT>      Input Autocycler GFA file, FASTA file or directory (required)
  -o, --out_png <OUT_PNG>  File path where dotplot PNG will be saved (required)
      --res <RES>          Size (in pixels) of dotplot image [default: 2000]
      --kmer <KMER>        K-mer size to use in dotplot [default: 32]
  -h, --help               Print help
  -V, --version            Print version

Notes

  • Same-strand matches are drawn in blue and opposite-strand matches in red.
  • Run time scales linearly with sequence length (n) and quadratically with the number of sequences (n²), since every pair of sequences is compared.
  • While Autocycler dotplot can be run on large sequences (e.g. a bacterial chromosome), it can be slow, especially if there are many sequences.
  • Text size is scaled to fit in the margins. This means that having too many sequences or very short sequences can be a problem, as it will cause the text size to be very small.
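
The colour convention in the first note can be illustrated with a toy k-mer matcher. This is a sketch of the general dotplot idea, not Autocycler's actual implementation: the function names are hypothetical, and the example uses a tiny k for readability where `--kmer` defaults to 32. Each (i, j) pair in the first list would be a blue dot (same strand) and each pair in the second a red dot (opposite strand).

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def kmer_matches(seq_a, seq_b, k):
    """Return (forward, reverse) lists of (i, j) k-mer match positions.

    forward: the k-mer at seq_a[i] equals the k-mer at seq_b[j]
    reverse: the reverse complement of the k-mer at seq_a[i] equals
             the k-mer at seq_b[j]
    """
    # Index every k-mer start position in seq_b.
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)

    forward, reverse = [], []
    for i in range(len(seq_a) - k + 1):
        kmer = seq_a[i:i + k]
        forward.extend((i, j) for j in index.get(kmer, []))
        reverse.extend((i, j) for j in index.get(reverse_complement(kmer), []))
    return forward, reverse
```

Comparing a sequence with itself gives forward matches along the main diagonal, while comparing it with its own reverse complement gives reverse matches along the anti-diagonal.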

Example 1: circular plasmid

This example shows the Autocycler dotplot images for a small plasmid, before and after running Autocycler trim. The circular nature of the sequence can be seen by the fact that the sequences are all in good agreement with each other, but with different starting positions. Before trimming, two sequences (from assembly_05.fasta and assembly_09.fasta) are twice the length of the others – these contigs contain an erroneously doubled version of the plasmid sequence. After trimming, all sequences are the same length (no doubled sequences) and there are fewer sequences because Autocycler trim discards sequences with outlier lengths.
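
Both patterns described above, shifted start positions and doubled contigs, can be reproduced on a toy sequence. The sketch below is illustrative only (hypothetical function name, made-up 10 bp "plasmid", same-strand matches only) and is not how Autocycler computes its plots. It reports the diagonal offsets (j - i) at which k-mers match: a rotated copy of a circular sequence gives two diagonals whose offsets differ by the sequence length, and a doubled copy gives two full diagonals one sequence length apart.

```python
def match_offsets(seq_a, seq_b, k):
    """Return the sorted set of offsets (j - i) where the k-mer at seq_a[i]
    equals the k-mer at seq_b[j]. Each offset is one diagonal in a dotplot."""
    positions = {}
    for j in range(len(seq_b) - k + 1):
        positions.setdefault(seq_b[j:j + k], []).append(j)
    offsets = set()
    for i in range(len(seq_a) - k + 1):
        for j in positions.get(seq_a[i:i + k], []):
            offsets.add(j - i)
    return sorted(offsets)


plasmid = "ACGGTCATTG"                 # toy sequence, all 3-mers unique
rotated = plasmid[4:] + plasmid[:4]    # same circle, different start position
doubled = plasmid + plasmid            # erroneously doubled contig

print(match_offsets(plasmid, plasmid, 3))  # [0]
print(match_offsets(plasmid, rotated, 3))  # [-4, 6]
print(match_offsets(plasmid, doubled, 3))  # [0, 10]
```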

[Images: Autocycler dotplot example 1, before Autocycler trim (untrimmed) and after Autocycler trim (trimmed)]

Example 2: linear plasmid

This example shows the Autocycler dotplot images for a linear plasmid, before and after running Autocycler trim. This plasmid contains one blunt end and one hairpin end, the latter causing sequences to extend past the end of the plasmid on the opposite strand. This hairpin artefact can be seen as a cross structure in the untrimmed dotplots, but after trimming all sequences are the same length and no hairpin artefact is present.

[Images: Autocycler dotplot example 2, before Autocycler trim (untrimmed) and after Autocycler trim (trimmed)]

Example 3: plasmid with repeats

Since dotplots show all matching k-mers, repeats in the sequence will also be visible. This example shows a plasmid that contains six copies of the same insertion sequence.

[Images: Autocycler dotplot example 3, before Autocycler trim (untrimmed) and after Autocycler trim (trimmed)]