Autocycler dotplot
Autocycler dotplot generates pairwise dotplots for input sequences. This can help to illuminate the differences between sequences and potential structural errors.
Within an Autocycler assembly, the most common way to run Autocycler dotplot is before/after trimming a cluster's sequences with Autocycler trim, allowing the user to verify that trimming has removed any duplicated or erroneous regions. However, Autocycler dotplot can also be run outside of an Autocycler assembly – just give it a FASTA file or directory of FASTA files as input, and it will produce all pairwise dotplots.
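To make the input handling concrete, here is a minimal Python sketch that collects sequences from either a single FASTA file or a directory of FASTA files, then lists every pair to plot. It is an illustration, not Autocycler's code. The accepted file extensions (`FASTA_SUFFIXES`) and the n × n grid layout (every sequence against every sequence, including itself) are assumptions, and all function names are hypothetical.

```python
from itertools import product
from pathlib import Path

# Assumption: which file extensions count as FASTA inside an input directory.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def load_sequences(path):
    """Load one FASTA file, or every FASTA file in a directory (sorted by name)."""
    path = Path(path)
    if path.is_dir():
        files = sorted(p for p in path.iterdir() if p.suffix in FASTA_SUFFIXES)
    else:
        files = [path]
    records = []
    for fasta in files:
        records.extend(parse_fasta(fasta.read_text()))
    return records


def all_pairs(records):
    """Every (row, column) pair for an assumed n x n grid, self-pairs included."""
    return list(product(records, repeat=2))
```

With three input sequences, `all_pairs` returns nine pairs, so the number of panels grows with the square of the number of sequences.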
Create pairwise dotplots for all sequences in an Autocycler cluster, before and after trimming:
```bash
autocycler dotplot -i qc_pass/cluster_002/1_untrimmed.gfa -o qc_pass/cluster_002/1_untrimmed.png
autocycler dotplot -i qc_pass/cluster_002/2_trimmed.gfa -o qc_pass/cluster_002/2_trimmed.png
```
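If an assembly produced many clusters, the two commands above can be repeated for each one. The Python sketch below builds and runs those commands with `subprocess`. It assumes the `qc_pass/cluster_*/1_untrimmed.gfa` and `2_trimmed.gfa` layout shown in the example and that `autocycler` is on your `PATH`; the helper names `dotplot_commands` and `run_all` are hypothetical, and only the documented `-i` and `-o` options are used.

```python
import subprocess
from pathlib import Path


def dotplot_commands(qc_pass_dir):
    """Build one `autocycler dotplot` command per untrimmed/trimmed GFA found in
    each cluster_* directory (assumed layout), writing the PNG beside its GFA."""
    commands = []
    for cluster in sorted(Path(qc_pass_dir).glob("cluster_*")):
        for stem in ("1_untrimmed", "2_trimmed"):
            gfa = cluster / f"{stem}.gfa"
            if gfa.is_file():
                png = gfa.with_suffix(".png")
                commands.append(
                    ["autocycler", "dotplot", "-i", str(gfa), "-o", str(png)]
                )
    return commands


def run_all(qc_pass_dir):
    """Run every command in turn; check=True raises CalledProcessError on failure."""
    for cmd in dotplot_commands(qc_pass_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_all("qc_pass")` from the assembly's output directory would then produce a before/after image pair for every cluster that has both GFA files.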
Create pairwise dotplots for all sequences in a FASTA file:
```bash
autocycler dotplot -i sequences.fasta -o dotplots.png
```
```
Usage: autocycler dotplot [OPTIONS] --input <INPUT> --out_png <OUT_PNG>

Options:
  -i, --input <INPUT>      Input Autocycler GFA file, FASTA file or directory (required)
  -o, --out_png <OUT_PNG>  File path where dotplot PNG will be saved (required)
      --res <RES>          Size (in pixels) of dotplot image [default: 2000]
      --kmer <KMER>        K-mer size to use in dotplot [default: 32]
  -h, --help               Print help
  -V, --version            Print version
```
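- Dots are coloured blue for same-strand matches and red for opposite-strand matches.
- Performance scales linearly with sequence length (n) and quadratically with the number of sequences (n²), because every pair of sequences gets its own dotplot.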
- While Autocycler dotplot can be run on large sequences (e.g. a bacterial chromosome), it can be slow, especially if there are many sequences.
- Text size is scaled to fit in the margins, so having too many sequences, or very short sequences, can make the text very small.
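The colour rule and the scaling note can be illustrated with a minimal k-mer dotplot in Python. This is a sketch of the general technique, not Autocycler's implementation: it indexes every k-mer of one sequence, then looks up each k-mer of the other sequence both as-is (a same-strand, blue dot) and reverse-complemented (an opposite-strand, red dot). The names `kmer_positions` and `dotplot_points` are hypothetical, and `k` plays the role of `--kmer`.

```python
from collections import defaultdict

# Complement table for uppercase DNA; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmer_positions(seq, k):
    """Map each k-mer in seq to every start position where it occurs."""
    index = defaultdict(list)
    for j in range(len(seq) - k + 1):
        index[seq[j:j + k]].append(j)
    return index


def dotplot_points(a, b, k):
    """Return (same_strand, opposite_strand) dots as (i, j) position pairs.

    A dot at (i, j) means a[i:i+k] occurs in b at j, either as-is (same strand,
    drawn blue) or reverse-complemented (opposite strand, drawn red).
    Indexing b and scanning a each take time roughly linear in length, plus one
    step per dot, so highly repetitive sequences produce extra work.
    """
    index = kmer_positions(b, k)
    same, opposite = [], []
    for i in range(len(a) - k + 1):
        kmer = a[i:i + k]
        same.extend((i, j) for j in index.get(kmer, ()))
        opposite.extend((i, j) for j in index.get(reverse_complement(kmer), ()))
    return same, opposite
```

Comparing a sequence with itself puts a same-strand dot at every `(i, i)`, which draws the main diagonal; comparing it with its own reverse complement gives an opposite-strand anti-diagonal instead. Repeating this for every pair of inputs is where the quadratic cost in sequence number comes from.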
This example shows the Autocycler dotplot images for a small plasmid, before and after running Autocycler trim. The circular nature of the sequence can be seen by the fact that the sequences all agree well with each other but have different starting positions. Before trimming, two sequences (from `assembly_05.fasta` and `assembly_09.fasta`) are twice the length of the others; these contigs contain an erroneously doubled version of the plasmid sequence. After trimming, all sequences are the same length (no doubled sequences), and there are fewer sequences because Autocycler trim discards sequences with outlier lengths.
[Figure: small plasmid dotplots, before Autocycler trim (left) and after Autocycler trim (right)]
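Both observations in this example can be checked directly on the sequences. In the Python sketch below (hypothetical helper names, same-strand comparison only), a circular sequence assembled from a different start position is a rotation of the reference, so it occurs inside the reference concatenated with itself. A doubled contig is a rotation of two back-to-back reference copies, so it occurs inside three copies. This only illustrates why the dotplots look the way they do; it is not a description of Autocycler trim's algorithm.

```python
def rotation_offset(reference, query):
    """Return offset such that query == reference[offset:] + reference[:offset],
    or None if query is not a rotation of reference (same strand only)."""
    if len(reference) != len(query):
        return None
    pos = (reference + reference).find(query)
    return None if pos == -1 else pos


def is_doubled(reference, query):
    """True if query is a rotation of two back-to-back copies of reference,
    like the doubled contigs in this example (same strand only)."""
    if len(query) != 2 * len(reference):
        return False
    # Every rotation of reference * 2 is a substring of reference * 3.
    return query in reference * 3
```

In a dotplot, a nonzero rotation offset shows up as the diagonal being split into two segments, and a doubled contig shows up as two parallel diagonals along its longer axis.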
This example shows the Autocycler dotplot images for a linear plasmid, before and after running Autocycler trim. This plasmid has one blunt end and one hairpin end; the hairpin causes sequences to extend past the end of the plasmid on the opposite strand. Before trimming, this hairpin artefact appears as a cross in the dotplots: an opposite-strand line crossing the main diagonal at the hairpin end. After trimming, all sequences are the same length and no hairpin artefact is present.
[Figure: linear plasmid dotplots, before Autocycler trim (left) and after Autocycler trim (right)]
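The cross shape follows from how the read-through is built. The Python sketch below assumes, for illustration only, that an assembly running past a hairpin at the right-hand end continues back along the opposite strand, so the final `overhang` bases reappear reverse-complemented. Those extra bases match the true end of the plasmid only on the opposite strand, which in a dotplot is a red line crossing the blue diagonal. The function names are hypothetical.

```python
# Complement table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def hairpin_readthrough(seq, overhang):
    """Simulate running past a hairpin at the right-hand end of a linear
    sequence: the last `overhang` bases are appended reverse-complemented."""
    return seq + reverse_complement(seq[len(seq) - overhang:])


def opposite_strand_hits(a, b, k):
    """Start positions in `a` whose k-mer occurs reverse-complemented in `b`."""
    rc_kmers = {reverse_complement(b[j:j + k]) for j in range(len(b) - k + 1)}
    return [i for i in range(len(a) - k + 1) if a[i:i + k] in rc_kmers]
```

A clean sequence with no inverted repeats has no opposite-strand hits against itself, while the extended version gains hits in exactly the overhang region; trimming that overhang removes the cross.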
Since dotplots show all matching k-mers, repeats within a sequence are also visible, as extra lines off the main diagonal. This example shows a plasmid that contains six copies of the same insertion sequence.
[Figure: dotplots of a plasmid with six copies of an insertion sequence, before Autocycler trim (left) and after Autocycler trim (right)]
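Why repeats appear can be sketched in a few lines of Python. When a sequence is compared with itself, every pair of distinct positions that share a k-mer becomes a dot off the main diagonal, so each pair of repeat copies adds a short line parallel to that diagonal. The helpers below are hypothetical and only track same-strand repeats; a copy inserted in the opposite orientation would produce opposite-strand (red) dots instead.

```python
from collections import defaultdict


def repeated_kmers(seq, k):
    """Return {k-mer: [start positions]} for k-mers occurring more than once."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def off_diagonal_dots(seq, k):
    """Same-strand self-dotplot dots (i, j) with i != j, one per shared k-mer pair."""
    return [(i, j)
            for pos in repeated_kmers(seq, k).values()
            for i in pos for j in pos if i != j]
```

For a sequence carrying two copies of the same element, the shared k-mers place dots at both `(i, j)` and `(j, i)`, which is why repeat lines appear symmetrically on either side of the diagonal.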