You can now click the "Open tree" link in the dataset info box to open the reference tree of this dataset on nextstrain.org. This allows you to browse the current trees for each dataset without running a Nextclade analysis. If a dataset does not provide a reference tree, the link is disabled.
The "Load example" links are now correctly disabled, not hidden, for the datasets which do not provide example sequence data.
This version addresses an issue where the clade (or clade-like attribute, such as lineage) of a placed query node might not match the clade of its parent.
The query node placement is adjusted during the greedy tree building, and sometimes the branch needs to be split and a new auxiliary internal node inserted to accommodate the new node. Previously, Nextclade would copy the clade of this internal node from the attachment target node. However, this is not always correct and can lead to a mismatch between the clade of the query node and that of the new internal node.
In this version we added a voting mechanism, which calculates the mode (the most frequent value) of the clades involved: those of the parent, target and query nodes. The same procedure is repeated for each clade-like attribute. After that, in some cases, the positions of branch labels also need to be adjusted.
This should not change the clade assignment for query samples, but only the clades of the inserted auxiliary internal nodes, to make sure that the tree is consistent.
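The voting step can be sketched as follows. This is a minimal illustration, not Nextclade's actual implementation: the function name `vote_clade` is hypothetical, and the tie-breaking rule (falling back to the parent's clade when all three labels differ) is an assumption, since the description above does not specify one.

```shell
#!/bin/sh
# Hypothetical helper: print the mode (most frequent value) of three clade
# labels, given in the order: parent, target, query.
vote_clade() {
  parent=$1
  target=$2
  query=$3
  if [ "$target" = "$query" ]; then
    # Target and query agree, so their label has at least 2 of 3 votes.
    printf '%s\n' "$target"
  else
    # Either the parent agrees with one of the others (2 votes), or all
    # three differ. Falling back to the parent in a three-way tie is an
    # assumption made for this sketch.
    printf '%s\n' "$parent"
  fi
}

vote_clade 21J 21K 21K   # target and query agree
vote_clade 21J 21J 21K   # parent and target agree
```

With only three voters, comparing target and query first is enough: if they differ, the parent either matches one of them or casts the deciding fallback vote.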
Nextclade CLI: Obtain CA certificates from platform trust store; add NEXTCLADE_EXTRA_CA_CERTS / --extra-ca-certs
Nextclade CLI users have previously reported issues with CA certificates when fetching datasets from an organization's network (e.g. in a university or in a company).
Starting with this version, Nextclade CLI respects the OS-level trust store configuration, including private CAs and self-signed certificates. The change preserves backward compatibility and keeps Nextclade working across different platforms, including those that lack a native trust store or have an outdated one.
We introduced a NEXTCLADE_EXTRA_CA_CERTS environment variable and --extra-ca-certs option which allow adding additional CA certificates to the trust store specifically for Nextclade, for when modifying the system's trust store isn't desirable or possible. See #1536 for more details.
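A usage sketch for both variants is shown below. It assumes that the value is a path to a PEM file with the extra CA certificate(s) and that the option can be given after the subcommand. The file path is hypothetical. Check `nextclade --help` for the exact form in your version.

```shell
#!/bin/sh
# Hypothetical path to a PEM file with your organization's CA certificate(s).
ca_file="${CA_FILE:-$HOME/certs/corp-root-ca.pem}"

if command -v nextclade >/dev/null 2>&1 && [ -f "$ca_file" ]; then
  # Variant 1: environment variable
  NEXTCLADE_EXTRA_CA_CERTS="$ca_file" nextclade dataset list --only-names
  # Variant 2: command-line option
  nextclade dataset list --only-names --extra-ca-certs "$ca_file"
else
  echo "nextclade or $ca_file not found; skipping" >&2
fi
```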
Auspice tree visualization package has been updated from 2.56.0 to 2.58.0. See Auspice changelog here.
Sometimes Nextclade Web would detect an incorrect number of available CPU threads and create too many processing threads. This could cause additional overhead and slow down the runs. We observed this behavior in non-Chromium-based browsers, such as Firefox and Safari. This has now been fixed. The number of threads is clamped to 3 by default. You can modify this in the "Settings" dialog.
Since 3.8.0, Nextclade could crash when particular combinations of CSV/TSV columns were selected in the "Column config" tab on the "Export" page in Nextclade Web, or with the --output-columns-selection argument in Nextclade CLI. This has been resolved.
Remove extra spaces in the text of entries in the "Relative to" dropdown selector in Nextclade Web.
Nextclade now calls mutations relative to multiple targets. In addition to the previously available mutations relative to the reference and mutations relative to the parent tree node (private mutations), Nextclade now calls mutations relative to clade founder tree nodes, and relative to custom nodes of interest if defined in the dataset (e.g. vaccine strains).
Nextclade Web now has an additional dropdown selector for the target of mutation calling. Output files have new columns/fields for mutations relative to clade founders (founderMuts) as well as for mutations relative to custom nodes (relativeMutations).
See documentation for more details.
Auspice tree visualization package has been updated from 2.55.0 to 2.56.0. See Auspice changelog here.
This definitively resolves the crash due to missing JavaScript polyfills, which occurred in Nextclade Web 3.7.2.
Temporarily downgrade Auspice from 2.55.0 to 2.54.3 to prevent the tree page in Nextclade Web from crashing. The definitive fix will follow.
When multiple query samples were to be placed onto the same node of the reference tree, sometimes multiple auxiliary nodes with the same name could be created. Node names are expected to be unique for Auspice visualization to work correctly, so when visualizing the tree, Auspice renamed these nodes and emitted warnings into the browser's dev console.
In this version we pick unique names for the auxiliary nodes during placement, so that there are no more warnings. Users may observe changes in some of the node names when inspecting the output Auspice JSON file. However, this is unlikely to affect most users' work.
Since 3.7.0, Nextclade Web was not showing the "updated at" date for any datasets. This has been fixed.
Most markers can be toggled on or off on the sequence views in "Settings" page in Nextclade Web, however frame shifts and insertions could not be. We added the missing toggles.
The text in the details/summary ("collapse", "spoiler") component (e.g. the list of SC2 lineages) was overflowing and producing garbled text in dataset readmes and changelogs. This has been fixed.
Auspice tree visualization package has been updated from 2.53.0 to 2.55.0. See Auspice changelog here.
The deprecated feature-policy header was removed entirely and the interest-cohort entry was removed from the permission-policy header. Latest versions of web browsers should no longer emit warnings into the console.
In addition to the previously tested distributions, we now test Nextclade CLI on the following newer Linux distributions:
- Amazon Linux 2.0.2024
- Debian 12
- Fedora 41
- Oracle Linux 8.9
- Ubuntu 24.04
When both a standalone reference sequence and an Auspice tree containing .root.sequence.nuc are present, Nextclade will check that these are the same sequence. If not, a warning is emitted to stderr for Nextclade CLI and to the browser's dev console for Nextclade Web. This is mostly useful for dataset authors, for debugging.
Nextclade sometimes displayed an error in the peptide view when switching CDS by clicking on annotation visualization. This has been fixed now.
Nextclade can now optionally use Auspice datasets (in Auspice v2 JSON format) not only as reference trees, but also as self-contained full Nextclade datasets. Nextclade will take pathogen info, genome annotation, reference sequence, and, of course, reference tree from the Auspice JSON. No other files are needed. This allows using almost any Auspice dataset (e.g. from nextstrain.org) as a Nextclade dataset.
- In Nextclade CLI, the --input-dataset argument now also accepts a path to an Auspice JSON file (in addition to accepting the usual paths to a dataset directory and zip archive)
- Nextclade Web now has a new URL parameter dataset-json-url, which accepts a URL to an Auspice JSON file or even to a dataset URL on nextstrain.org
This feature is currently in experimental stage. For details and discussion see PR #1455.
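A sketch of both entry points is shown below. The file names are hypothetical, and the Nextclade Web host name and the example dataset URL are assumptions made for illustration. The nextclade command only runs if the executable and the input files are present.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own.
auspice_json="auspice_tree.json"
sequences="input.fasta"

if command -v nextclade >/dev/null 2>&1 && [ -f "$auspice_json" ] && [ -f "$sequences" ]; then
  # The Auspice JSON takes the place of a dataset directory or zip archive.
  nextclade run --input-dataset "$auspice_json" --output-all "out/" "$sequences"
fi

# Nextclade Web: pass the JSON location via the dataset-json-url parameter.
# The host name and the example dataset URL are assumptions.
web_url="https://clades.nextstrain.org?dataset-json-url=https://nextstrain.org/zika"
printf '%s\n' "$web_url"
```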
Nextclade now accepts Auspice JSONs without .branch_attrs on tree nodes.
Previously, Nextclade treated output CSV/TSV columns index and seqName as mandatory and they were always present in the output files. In this release they are made configurable. One can:
- in CLI: add or omit index and seqName values when using the --output-columns-selection argument
- in Web: tick or untick checkboxes for index and seqName in the "Column config" tab of the "Export" page
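For the CLI case, a minimal sketch could look like this. It assumes that --output-columns-selection takes a comma-separated list of column names. The paths are hypothetical and "clade" is used only as an example of another column.

```shell
#!/bin/sh
# Columns to keep in the CSV/TSV output. Listing seqName but not index
# omits the index column. "clade" is only an example column name.
columns="seqName,clade"

if command -v nextclade >/dev/null 2>&1 && [ -d "dataset/" ] && [ -f "input.fasta" ]; then
  nextclade run \
    --input-dataset "dataset/" \
    --output-tsv "out/nextclade.tsv" \
    --output-columns-selection "$columns" \
    "input.fasta"
fi
```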
The table in the nextclade dataset list command now displays an additional column "capabilities", which lists dataset capabilities, i.e. whether the dataset contains information allowing clade assignment, QC, etc. The same information is available in JSON format (unstable) if you pass the --json flag.
Previously Nextclade required clade information to be always present in the input reference tree in the form of the .node_attrs.clade_membership field on each tree node. However, for certain datasets we might not have or need clade information. Making such datasets required workarounds, such as adding an empty string to the clade_membership field.
In this version we make the clade_membership field optional. This makes it possible to create datasets without clades. This is useful when working with organisms for which clades don't make sense or for which the nomenclature is not sufficiently established. This is also useful for dataset authors, who can now bootstrap simple datasets without clades first and then add clades and other features gradually later.
With this change, if clade_membership is not present in the dataset's reference tree nodes, then:
- Clade assignment will not run
- Any clade-related functionality will not work
- Output JSON/NDJSON result entries will not contain clade field
- Clade column in output CSV/TSV will be empty
- Clade column in Nextclade Web will be empty
This change does not affect any other parts of the application. Notably, clade-like attributes (from .meta.extensions.nextclade.clade_node_attrs), if present, are still assigned and written to the output as before.
Nextclade sometimes failed to detect a motif loss if that motif was the only one in its category. This is now fixed, and users may observe changes in detected lost motifs. This affects datasets using the aaMotifs property in their pathogen.json file, notably the flu datasets.
When the dataset-url URL parameter was provided, Nextclade Web would not update the dataset's pathogen.json file when the remote dataset changed without changing its version. This is now fixed. It only affected users providing custom datasets using the dataset-url URL parameter.
The Auspice tree rendering package has been updated from version 2.52.1 to version 2.53.0. See the list of changes here
In dataset selector, sometimes there were extra scrollbars displayed to the right of the dataset names. This has been fixed.
When suggestion is triggered manually, using "Suggest" button on main page, Nextclade will now automatically select the best dataset as the current dataset. Previously this could only be done by clearing the current dataset first and then clicking "Suggest". When suggestion algorithm is triggered automatically, the behavior is unchanged - the dataset will not be selected.
Nextclade CLI will no longer read tree.json and genome_annotation.gff3 from the dataset, unless they are declared in the pathogen.json. These are optional files and we cannot assume their presence or filenames.
Nextclade CLI will warn users when an input dataset contains extra files which are not declared in the dataset's pathogen.json, or if there are extra declarations of files in the pathogen.json but the files are not actually present in the dataset. This is mostly useful to dataset authors for debugging issues in their datasets.
We added one more build variant to the Bioconda distribution channel: for the Linux operating system on 64-bit ARM hardware architecture. It uses the nextclade-aarch64-unknown-linux-gnu executable underneath. This can be useful if you prefer to manage your Nextclade CLI installation on a Linux ARM machine or in a Docker ARM container with the Conda package manager. However, because Nextclade CLI is a self-contained single-file executable, we still recommend direct downloads from GitHub Releases rather than Conda or other installation methods.
Nextclade was crashing with an internal error when the --verbosity option was present. This has been fixed.
Nextclade reports WebWorker-related errors when analysis is started on Safari browser. The minimum working version of Safari we were able to successfully test Nextclade on is 16.5. We still recommend using Chrome or Firefox for the best experience.
Previously Nextclade would report an error "Expected character '>' at record start" when input FASTA file contained trailing newline or when it was empty. This was fixed.
Dataset suggestion will no longer run each time "Datasets" page is opened
See changelog here
This fixes a bug in the dataset filtering logic causing a "Dataset not found" error even when a correct name and tag were requested using nextclade dataset get with the --tag argument.
Minimizer search algorithm used in dataset auto-suggestion in Nextclade Web as well as in the sort command of Nextclade CLI.
The default value for the minimum match score (--min-score) has been reduced from 0.3 to 0.1. The default value for the minimum number of hits (--min-hits) required for a detection has been reduced from 10 to 5. This should allow better handling of more diverse viruses.
If there is a sufficiently large gap between dataset scores, the algorithm will now only consider the group of datasets before the gap. The gap size can be configured using the --max-score-gap argument in Nextclade CLI. The default value is 0.2.
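The gap rule can be illustrated with a small sketch. It is not Nextclade's actual implementation: `filter_by_score_gap` is a hypothetical helper, the dataset names and scores are made up, and the strict "greater than" comparison between neighboring scores is an assumption.

```shell
#!/bin/sh
# Sketch: keep only the datasets that come before the first large score gap.
# Input: "name<TAB>score" lines, already sorted by descending score.
filter_by_score_gap() {
  max_gap=$1
  awk -F '\t' -v max_gap="$max_gap" '
    NR > 1 && (prev - $2) > max_gap { exit }  # gap found: drop the rest
    { print $1; prev = $2 }
  '
}

printf 'flu_h3n2_ha\t0.90\nflu_h1n1pdm_ha\t0.85\nflu_vic_ha\t0.50\n' \
  | filter_by_score_gap 0.2
```

Here the first two datasets are kept, because the drop from 0.85 to 0.50 exceeds the gap of 0.2.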
Additionally, in the Nextclade CLI sort command the algorithm now chooses only the best matching dataset. In order to select all matching datasets, the --all-matches flag has been added.
The TSV output of the sort command (requested with --output-results-tsv) now contains an additional column: index. The cells under this column contain the index of the corresponding input sequence in the FASTA file. These indices can be used in downstream processing to reliably map input sequences to the output results. Sequence names alone can be unreliable because they are arbitrary strings which are not guaranteed to be unique.
Due to popular demand, we are bringing back the --input-pcr-primers argument for Nextclade CLI, which accepts a path to a primers.csv file. The feature works just like it did prior to the release of Nextclade v3, except that primers.csv is never read from a dataset - you always need to provide it separately. At the same time, we removed support for the primers field from pathogen.json, because it was too difficult to make a correct JSON object and it would conflict with the primers provided with --input-pcr-primers.
Results table stripes are always alternating now, regardless of sorting and filtering applied. This is only a visual change and does not affect any functionality.
- Fixed a bug introduced in v3.0.0 which caused the default path for translations to be incorrect. This affected only users who used --output-all without passing a custom path template via --output-translations. The new default path is nextclade.cds_translation.{cds}.fasta where {cds} gets replaced with the name of the CDS, e.g. nextclade.cds_translation.S.fasta for SARS-CoV-2's spike protein.
- Fixed a bug where the nextclade dataset get command fails to download a dataset if a dataset has more than one version released.
- Added a section to the v3 migration guide about the renamed default path for translations, a breaking change. The new default output path for translations is nextclade.cds_translation.{cds}.fasta. Before v3, the default path was nextclade_gene_{gene}.translation.fasta. You can emulate the old (default) behavior by passing --output-translations="nextclade_gene_{cds}.translation.fasta" to nextclade3.
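To see what file names a path template produces, the {cds} substitution can be mimicked with a small sketch. `expand_template` is a hypothetical helper, not part of Nextclade, and it assumes CDS names without characters that are special to sed (such as / or &).

```shell
#!/bin/sh
# Sketch: expand a {cds} path template into a per-CDS file name.
expand_template() {
  printf '%s\n' "$1" | sed "s/{cds}/$2/g"
}

expand_template "nextclade.cds_translation.{cds}.fasta" "S"    # new default
expand_template "nextclade_gene_{cds}.translation.fasta" "S"   # old-style name
```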
Fixed links on navigation bar: "Docs" and "CLI"
We are happy to present a major release of Nextclade, containing new features and bug fixes.
⚠️ This release contains breaking changes which may require your attention.
Useful links:
- Nextclade Web v3
- Nextclade Web v2 - if you need the old version, e.g. if you have custom v2 datasets
- Nextclade CLI releases - all versions
- Nextclade user documentation - for detailed instructions on how to use Nextclade Web and Nextclade CLI
- Nextclade dataset curation guide - if you have a custom Nextclade dataset or want to create one
- Nextclade source code repository - for contributors to Nextclade software (code, bug reports, feature requests etc.)
- Nextclade data repository - for contributors to Nextclade datasets (add new datasets, update existing ones, report bugs, etc.)
- Nextclade software issues - to report bugs and ask questions about Nextclade software
- Nextclade data issues - to report bugs and ask questions about Nextclade datasets
- Nextstrain discussion forum - for general discussion and questions about Nextstrain
This section briefly lists breaking changes in Nextclade v3 compared to Nextclade v2. Please see Nextclade v3 migration guide (alternative link) for a detailed description of each breaking change and of possible migration paths.
- Nextalign CLI is removed, because Nextclade CLI can now do everything that Nextalign v2 did
- Potentially different alignment and translation output due to changes in the seed alignment algorithm. Some of the alignment parameters are removed. Default parameters of new parameters might need to be adjusted.
- Potentially different tree output due to a new tree builder algorithm.
- Dataset file format and dataset names have changed.
- Some CLI arguments for individual input files are removed.
- Some output files are removed
- Genome annotation CLI argument is renamed
- URL parameters in Nextclade Web have changed
- CDS instead of genes
The sections below list all changes - breaking and non-breaking. The breaking changes are denoted with the word [BREAKING].
If you encounter problems during migration, or breaking changes not mentioned in this document, please report them to the developers by opening a new GitHub issue.
The seed matching algorithm was rewritten to be more robust and handle sequences with higher diversity. For example, RSV-A can now be aligned against RSV-B.
Parameters minSeeds, seedLength, seedSpacing, minMatchRate, mismatchesAllowed, maxIndel no longer have any effect and are removed.
New parameters kmerLength, kmerDistance, minMatchLength, allowedMismatches, windowSize are added.
Default values should work for sequences with a diversity of up to X%. For sequences with higher diversity, the parameters may need to be adjusted.
For short sequences, the threshold length to use full-matrix alignment is now determined based on kmerLength instead of the removed seedLength. The coefficient is adjusted to roughly match the old final value.
Nextclade now treats genes only as containers for CDSes ("CDS" is coding sequence). CDSes are the main unit of translation and a basis for AA mutations now. A gene can contain multiple CDSes, but they are handled independently.
A CDS can consist of multiple fragments. These fragments are extracted from the full nucleotide genome independently and joined together (in the order provided in the genome annotation) to form the nucleotide sequence of the CDS. The CDS is then translated and the resulting polypeptides are analyzed (mutations are detected etc.). This implementation makes it possible to handle slippage (e.g. ORF1ab in coronaviruses) and splicing (e.g. tat and rev in HIV-1).
If the genome annotation describes a CDS fragment as circular (wrapping around the origin), Nextclade splits it into multiple linear (non-wrapping) fragments. The translation and analysis are then performed as if it was a linear genome.
Nextclade follows the GFF3 specification. Please refer to it for how to describe circular features.
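The splitting of a wrapping fragment can be sketched as follows. This is an illustration only, not Nextclade's code: `split_wrapping` is a hypothetical helper, the numbers are made up, and the sketch assumes the feature wraps at most once. Coordinates are 1-based and inclusive, as in GFF3, where a feature crossing the origin has an end coordinate larger than the genome length.

```shell
#!/bin/sh
# Sketch: split a feature that wraps around the origin of a circular genome
# into linear fragments. Arguments: start, end, genome length.
split_wrapping() {
  start=$1
  end=$2
  genome_len=$3
  if [ "$end" -le "$genome_len" ]; then
    echo "$start-$end"                  # no wrap: a single fragment
  else
    echo "$start-$genome_len"           # part before the origin
    echo "1-$((end - genome_len))"      # part after the origin
  fi
}

split_wrapping 2800 3400 3200
```

For a genome of length 3200, the feature 2800..3400 becomes two linear fragments, 2800-3200 and 1-200, which together cover the same 601 positions.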
The GFF3 file parser has been augmented to support all the types of genetic features necessary for Nextclade to operate. There are still feature types which Nextclade ignores. We can consider supporting more types as scientific need arises.
Nextclade v3 now has the ability to phylogenetically resolve relationships between input sequences, where v2 would only attach each query sequence independently to the reference tree. Nextclade v3 thus may produce trees that are different from the trees produced in Nextclade v2.
Please read the Phylogenetic placement section in the documentation for more details.
We no longer treat mutations to ambiguous nucleotides as reversions. Previously, if the attachment node had a mutation with respect to the reference and the query sequence was ambiguous, we counted this as a reversion. This change only affects the “private mutation” QC score and the classification of private mutations into “reversion substitution” and “unlabeled substitution”.
Nextclade Web can now optionally suggest the most appropriate dataset(s) for user-provided input sequences. Drop your sequences and click "Suggest" to try out this feature.
Following changes in genome annotation handling, the genome annotations widget in Nextclade Web now shows CDS fragments instead of genes.
The gene selector dropdown in Nextclade Web's results table has been transformed into a more general genetic feature selector. It shows the hierarchy of genetic features if there are nested features. Otherwise, the list is flat, to save screen space. It shows the type of each genetic feature (gene, CDS or protein) as a colorful badge. The menu is searchable, which is useful for mpox and other large viruses with many genes. Only CDSes can be selected currently, but we may extend this in the future to more feature types.
Nucleotide sequence views (in the results table) now also show colored markers for ambiguous nucleotides (non-ACTGN).
The row of buttons, containing "Back", "Tree" and other buttons is removed. Instead, different sections of the web application are always accessible via the main navigation bar.
The "Export" ("Download") and "Settings" sections are moved to dedicated pages.
Due to changes in the dataset format and input files, the URL parameters have the following changes:
- input-root-seq renamed to input-ref
- input-gene-map renamed to input-annotation
- input-pathogen-json added
- input-qc-config removed
- input-pcr-primers removed
- input-virus-properties removed
- dataset-reference removed
The nextclade.errors.csv and nextclade.insertions.csv files are removed and no longer appear in the "Export" dialog, nor are they included in the nextclade.zip archive of all outputs.
Errors and insertions are now included in the nextclade.csv and nextclade.tsv files.
The Auspice tree viewer component is updated from version 2.45.2 to 2.51.0. See the Auspice releases or changelog.
Nextalign CLI is no longer provided as a standalone application along with Nextclade CLI v3 because Nextclade now has all the features that distinguished Nextalign. This means there's only one set of command line arguments to remember. Nextclade CLI runs the same algorithms, accepts the same inputs and provides the same outputs as v2 Nextalign, plus some more. For most use-cases, the CLI interface and the input and output files should be the same or very similar.
Due to changes in the seed alignment algorithm, the following parameters are no longer used and the corresponding CLI arguments and JSON fields under alignmentParams in pathogen.json (previously virus_properties.json) were removed:
--seed-length
--seed-spacing
--max-indel
--min-match-rate
--min-seeds
--mismatches-allowed
The following new alignment parameters were added:
--allowed-mismatches
--kmer-distance
--kmer-length
--min-match-length
--min-seed-cover
--max-alignment-attempts
--max-band-area
--window-size
Due to changes in the dataset format the following CLI arguments were removed:
--input-virus-properties
--input-qc-config
--input-pcr-primers
in favor of --input-pathogen-json.
The arguments --output-errors and --output-insertions have been removed. Their information is now included in --output-csv and --output-tsv.
The argument --input-gene-map is renamed to --input-annotation. The short form -m remains unchanged.
The argument --genes is renamed to --cds-selection. The short form -g remains unchanged.
Nextclade can now also export the tree in Newick format via the --output-tree-nwk argument.
Most input files and files inside datasets are now optional. This simplifies dataset creation and maintenance and allows for step-by-step, incremental extension of them. You can start only with a reference sequence, which will only allow for alignment and very basic mutation calling in Nextclade, and later you can add more functionality. Optional input files also enable the removal of Nextalign CLI.
If you maintain a custom dataset or want to try creating one - refer to our Dataset curation guide. Community contributed datasets are welcome!
The old phylogenetic tree placement behavior can be restored by adding the --without-greedy-tree-builder flag.
The new argument --only-names prints a concise list of dataset names:
nextclade dataset list --only-names
The new argument --search searches datasets using a substring match against the dataset name, dataset friendly name, reference name or reference accession:
nextclade dataset list --search=flu
The argument --json outputs a JSON object instead of the table. You can write it into a file or postprocess it:
nextclade dataset list --json > "dataset_list.json"
nextclade dataset list --json | jq '.[] | select(.path | startswith("nextstrain/sars-cov-2")) | .attributes'
The sort subcommand takes your sequences in FASTA format and outputs sequences grouped by dataset in the form of a directory tree. Each subdirectory corresponds to a dataset and contains an output FASTA file with only sequences that are detected to be similar to the reference sequence in this dataset.
Example usage:
nextclade sort --output-dir="out/sort/" --output-results-tsv="out/sort.tsv" "input.fasta"
This can be useful for splitting FASTA files containing sequences which belong to different pathogens, strains or segments, for example for separating flu HA and NA segments.
The read-annotation subcommand takes a GFF3 file and displays how features are arranged hierarchically as viewed by Nextclade. This is useful for Nextclade developers and dataset creators to verify (and debug) how Nextclade understands genetic features from a particular GFF3 file.
Example usage:
nextclade read-annotation genome_annotation.gff3
Type nextclade read-annotation --help for a description of the arguments.
Nextclade Web now uses multithreading more effectively. This results in faster processing of large FASTA files on computers with more than one processor. The speedup is around twofold for 1000 SARS-CoV-2 sequences on a multi-core machine.
The new features caused changes in major internal data structures and made them more complex. We now generate JSON schema and TypeScript typings from Rust code. This helps to find mismatches between parts written in different languages, and to avoid bugs related to data types.
The change in genome annotation handling had significant consequences for the coordinate spaces Nextclade uses internally (e.g. alignment space vs reference space, nuc space vs aa space, global nuc space vs nuc space local to a CDS). In order to make coordinate transforms safer, we introduced new Position and Range types, different for each space. This prevents mixing up coordinates in different spaces.
For changes in older versions, see docs/changes/CHANGELOG.old.md