This page explains the parameters offered by each of our Python components. Each parameter is listed with its Run Config key and its corresponding commandline argument. When running the tinyRNA pipeline (`tiny run`, `tiny recount`, or `tiny replot`), you'll use the Run Config key to specify your preferences. The commandline argument is used when you run a tool as an individual, standalone step (`tiny-collapse`, `tiny-count`, or `tiny-plot`).
Run Config Key | Commandline Argument |
---|---|
threshold: | --threshold THRESHOLD |
You can specify a minimum count threshold to determine which sequences are reported after unique sequence identification and counting is complete. Sequences with a count less than or equal to the threshold value are placed in a separate low-count FASTA file (`{prefix}_collapsed_lowcounts.fa`) and are not used in downstream analyses.
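As a rough illustration of the threshold rule, the sketch below partitions a table of collapsed counts into kept and low-count sets. It is not tiny-collapse's own code; the function name `split_by_threshold` is hypothetical.

```python
from collections import Counter


def split_by_threshold(counts: dict[str, int], threshold: int):
    """Partition sequence counts as described for --threshold:
    counts <= threshold go to the low-count set, the rest are kept."""
    kept = {seq: n for seq, n in counts.items() if n > threshold}
    low = {seq: n for seq, n in counts.items() if n <= threshold}
    return kept, low


counts = Counter(["ACGT", "ACGT", "ACGT", "TTGA", "GGCC", "GGCC"])
kept, low = split_by_threshold(counts, threshold=2)
print(kept)  # {'ACGT': 3}
print(low)   # {'TTGA': 1, 'GGCC': 2}
```

Note that a sequence whose count equals the threshold lands in the low-count set, matching the "less than or equal to" wording above.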
Run Config Key | Commandline Argument |
---|---|
5p_trim: | --5p-trim LENGTH |
3p_trim: | --3p-trim LENGTH |
Bases can be trimmed from the 5' and/or 3' end of each sequence before it is evaluated for uniqueness and counted. This is useful for trimming randomized bases and UMIs (note: formal UMI deduplication is not offered at this time).
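The sketch below shows why trimming happens before uniqueness is evaluated: two reads that differ only in randomized end bases become identical after trimming and are counted together. The helper `trim_read` is hypothetical, and its handling of reads shorter than the combined trim length (returning an empty string) is an assumption, not documented tiny-collapse behavior.

```python
from collections import Counter


def trim_read(seq: str, five_p: int = 0, three_p: int = 0) -> str:
    """Remove five_p bases from the 5' end and three_p bases from the 3' end.

    Assumption: reads shorter than five_p + three_p become empty strings.
    """
    end = len(seq) - three_p
    return seq[five_p:end] if end > five_p else ""


# Two reads that differ only in a 4 nt randomized 5' prefix and a 2 nt 3' tail
reads = ["GATCTGACCTGAAT", "CCGATGACCTGACG"]
trimmed = Counter(trim_read(r, five_p=4, three_p=2) for r in reads)
print(trimmed)  # Counter({'TGACCTGA': 2})
```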
Run Config Key | Commandline Argument |
---|---|
compress: | --compress |
tiny-collapse outputs are often very large. You can save space by switching this option "on" so that outputs are gzipped before being written to disk.
```
tiny-collapse -i FASTQFILE -o OUTPREFIX [-h] [-t THRESHOLD] [-c]
              [--5p-trim LENGTH] [--3p-trim LENGTH]

Collapse sequences from a fastq file to a fasta file. Headers in the output
fasta file will contain the number of times each sequence occurred in the
input fastq file, and an ID which indicates the relative order in which each
sequence was first encountered. Gzipped files are automatically supported for
fastq inputs, and compressed fasta outputs are available by request.

Required arguments:
  -i FASTQFILE, --input-file FASTQFILE
                        The input fastq(.gz) file to collapse
  -o OUTPREFIX, --out-prefix OUTPREFIX
                        The prefix for output files {prefix}_collapsed.fa and,
                        if counts fall below threshold,
                        {prefix}_collapsed_lowcounts.fa

Optional arguments:
  -h, --help            show this help message and exit
  -t THRESHOLD, --threshold THRESHOLD
                        Sequences <= THRESHOLD will be omitted from
                        {prefix}_collapsed.fa and will instead be placed in
                        {prefix}_collapsed_lowcounts.fa
  -c, --compress        Use gzip compression when writing fasta outputs
  --5p-trim LENGTH      Trim LENGTH bases from the 5' end of each sequence
  --3p-trim LENGTH      Trim LENGTH bases from the 3' end of each sequence
```
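The help text above describes FASTA headers that carry each sequence's count and an ID reflecting first-encounter order. A minimal sketch of that idea follows, relying on the fact that `collections.Counter` preserves insertion order. The header layout `>{id}_count={n}` and the helper names `collapse_to_fasta` and `fasta_bytes` are hypothetical placeholders, not tiny-collapse's actual format or API.

```python
import gzip
from collections import Counter


def collapse_to_fasta(reads) -> str:
    """Collapse reads into FASTA text.

    IDs follow first-encounter order because Counter preserves insertion
    order. The header layout ">{id}_count={n}" is a hypothetical placeholder.
    """
    counts = Counter(reads)
    return "".join(
        f">{idx}_count={n}\n{seq}\n" for idx, (seq, n) in enumerate(counts.items())
    )


def fasta_bytes(text: str, compress: bool = False) -> bytes:
    """Encode FASTA text, gzipping it when compress is True (like --compress)."""
    data = text.encode("ascii")
    return gzip.compress(data) if compress else data


fasta = collapse_to_fasta(["TGACCTGA", "ACGTACGT", "TGACCTGA"])
print(fasta, end="")
# >0_count=2
# TGACCTGA
# >1_count=1
# ACGTACGT
```

Keeping only one record per unique sequence is what makes collapsed files smaller than the raw FASTQ; gzip then shrinks them further at the cost of compression time.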
Run Config Key | Commandline Argument |
---|---|
| | --get-templates |
Copies the template configuration files required by tiny-count into the current directory. This argument can't be combined with `--paths-file`. All other arguments are ignored when it is provided, and tiny-count exits once the templates have been copied.
Run Config Key | Commandline Argument |
---|---|
counter_normalize_by_genomic_hits: | --normalize-by-genomic-hits T/F |
By default, tiny-count will increment feature counts by a normalized amount to avoid overcounting. Each unique sequence's read count is determined by tiny-collapse (or a compatible collapsing utility) and is preserved through the alignment process. For sequences with multiple alignments, a portion of the sequence's original count is allocated to each of its alignments to be assigned to features that pass selection at the locus. This portion is the original count divided by the number of alignments, or genomic hits. By disabling this normalization step, each of the sequence's alignments will be allocated the full original read count rather than the normalized portion.
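The genomic-hit normalization described above is simple division. This sketch (the function name `per_alignment_count` is hypothetical) shows a sequence with an original count of 12 that aligns to 3 loci, with and without normalization.

```python
def per_alignment_count(original_count: int, genomic_hits: int,
                        normalize: bool = True) -> float:
    """Count allocated to each alignment of a sequence.

    With normalization, the original count is split evenly across all
    alignments (genomic hits); without it, every alignment gets the full count.
    """
    return original_count / genomic_hits if normalize else float(original_count)


count, hits = 12, 3
normalized = [per_alignment_count(count, hits) for _ in range(hits)]
raw = [per_alignment_count(count, hits, normalize=False) for _ in range(hits)]
print(normalized, sum(normalized))  # [4.0, 4.0, 4.0] 12.0
print(raw, sum(raw))                # [12.0, 12.0, 12.0] 36.0
```

The sums show why normalization avoids overcounting: normalized allocations add back up to the original 12 reads, while unnormalized allocations total 36.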
Run Config Key | Commandline Argument |
---|---|
counter_normalize_by_feature_hits: | --normalize-by-feature-hits T/F |
By default, tiny-count will increment feature counts by a normalized amount to avoid overcounting. Each sequence alignment locus is allocated a portion of the sequence's original read count (depending on `counter_normalize_by_genomic_hits`), and once selection is complete the allocated count is divided by the number of selected features, or feature hits, at the alignment. The resulting value is added to the totals for each matching feature. By disabling this normalization step, each selected feature will receive the full amount allocated to the locus rather than the normalized portion.
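Continuing with arithmetic only, this sketch splits the count allocated to one locus among the features selected there. The function name `per_feature_increment` and the feature IDs are hypothetical; the locus share of 4.0 corresponds to a sequence with 12 reads and 3 genomic hits.

```python
def per_feature_increment(locus_count: float, selected_features: list[str],
                          normalize: bool = True) -> dict[str, float]:
    """Amount added to each selected feature's total for one alignment.

    With normalization, the locus allocation is divided by the number of
    selected features (feature hits); without it, each feature gets all of it.
    """
    if not selected_features:
        return {}  # nothing passed selection at this locus
    share = locus_count / len(selected_features) if normalize else locus_count
    return {feature: share for feature in selected_features}


locus_share = 12 / 3  # 12 reads split across 3 genomic hits
print(per_feature_increment(locus_share, ["featA", "featB"]))
# {'featA': 2.0, 'featB': 2.0}
print(per_feature_increment(locus_share, ["featA", "featB"], normalize=False))
# {'featA': 4.0, 'featB': 4.0}
```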
Run Config Key | Commandline Argument |
---|---|
counter_decollapse: | --decollapse |
The SAM files produced by the tinyRNA pipeline are collapsed by default; alignments sharing a SEQ field are strictly multi-alignments and do not reflect original sequence counts. If this option is switched "on", tiny-count will produce a decollapsed copy of each input SAM file, in which each alignment is repeated as many times as the sequence's original count. This is useful for browsing in IGV. The count is stripped from each alignment's QNAME field so that these files remain compatible with tiny-count. If non-collapsed inputs are provided to tiny-count in standalone mode, this option is ignored.
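The sketch below decollapses a single SAM line. It assumes a hypothetical QNAME layout of `<id>_count=<n>`; the real collapsed QNAME format may differ, and `decollapse_line` is not part of tiny-count. Header lines and records without a recognizable count pass through unchanged.

```python
import re

# Hypothetical collapsed QNAME layout "<id>_count=<n>"; the real format may differ.
QNAME_COUNT = re.compile(r"^(?P<id>.+)_count=(?P<n>\d+)$")


def decollapse_line(sam_line: str) -> list[str]:
    """Repeat an alignment line once per original read, stripping the count."""
    if sam_line.startswith("@"):
        return [sam_line]  # header lines pass through unchanged
    fields = sam_line.rstrip("\n").split("\t")
    match = QNAME_COUNT.match(fields[0])
    if match is None:
        return [sam_line]  # not a collapsed record; leave as-is
    fields[0] = match.group("id")  # strip the count from QNAME
    line = "\t".join(fields) + "\n"
    return [line] * int(match.group("n"))


record = "0_count=3\t0\tchr1\t100\t255\t8M\t*\t0\t0\tTGACCTGA\t*\n"
for out in decollapse_line(record):
    print(out, end="")  # three identical lines with QNAME "0"
```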
Run Config Key | Commandline Argument |
---|---|
counter_stepvector: | --stepvector {Cython,HTSeq} |
A custom Cython implementation of HTSeq's StepVector is used for finding features that overlap each alignment interval. While the core C++ component of the StepVector is the same, we have found that our Cython implementation can result in runtimes up to 50% faster than HTSeq's implementation. This parameter allows you to use HTSeq's StepVector if you wish.
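To show the idea behind a step vector, here is a deliberately simple pure-Python toy (the class `ToyStepVector` is hypothetical). It stores sorted step boundaries, each carrying the set of features covering that stretch, and answers overlap queries by scanning the steps an interval touches. It is neither HTSeq's StepVector nor tinyRNA's Cython implementation, and it assumes non-negative, 0-based, half-open intervals.

```python
import bisect


class ToyStepVector:
    """Piecewise-constant map from genomic position to a set of feature IDs."""

    def __init__(self):
        self.starts = [0]             # sorted step boundaries
        self.values = [frozenset()]   # value of each step from its boundary on

    def _split(self, pos: int) -> int:
        """Ensure a step boundary exists at pos and return its index."""
        i = bisect.bisect_right(self.starts, pos) - 1
        if self.starts[i] != pos:
            self.starts.insert(i + 1, pos)
            self.values.insert(i + 1, self.values[i])
            i += 1
        return i

    def add_feature(self, start: int, end: int, feature: str) -> None:
        """Add feature to every position in [start, end)."""
        i = self._split(start)
        j = self._split(end)
        for k in range(i, j):
            self.values[k] = self.values[k] | {feature}

    def overlapping(self, start: int, end: int) -> set:
        """Return features overlapping the interval [start, end)."""
        i = bisect.bisect_right(self.starts, start) - 1
        found = set()
        while i < len(self.starts) and self.starts[i] < end:
            found |= self.values[i]
            i += 1
        return found


sv = ToyStepVector()
sv.add_feature(100, 200, "geneA")
sv.add_feature(150, 300, "geneB")
print(sorted(sv.overlapping(120, 160)))  # ['geneA', 'geneB']
print(sorted(sv.overlapping(250, 400)))  # ['geneB']
```

Because each step already holds the full set of covering features, a query only visits the steps the alignment spans rather than testing every feature, which is the property that makes step vectors attractive for this lookup.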
Run Config Key | Commandline Argument |
---|---|
| | --in-pipeline |
This commandline argument tells tiny-count that it is running as a workflow step rather than a standalone/manual run. Under these conditions tiny-count reads all input files from, and writes all outputs to, the current working directory, regardless of the paths defined in the Samples Sheet and Features Sheet.
Run Config Key | Commandline Argument |
---|---|
counter_diags: | --report-diags |
Diagnostic information will include intermediate alignment files for each library and an additional stats table with information about counts that were not assigned to a feature. See the description of these outputs for details.
```
tiny-count (-pf FILE | --get-templates) [-o PREFIX] [-ng T/F] [-nf T/F]
           [-vs T/F] [-dc] [-sv {Cython,HTSeq}] [-p] [-d]

tiny-count is a precision counting tool for hierarchical classification and
quantification of small RNA-seq reads

Required arguments:
  You must either provide a Paths File or request templates for detailing
  your configuration.

  -pf FILE, --paths-file FILE
                        your Paths File (default: None)
  --get-templates       Copies the template configuration files required by
                        tiny-count into the current directory. (default:
                        False)

Optional arguments:
  These options can be used in conjunction with the Paths File (-pf)
  argument mentioned above.

  -o PREFIX, --out-prefix PREFIX
                        The output prefix to use for file names. (default:
                        None)
  -ng T/F, --normalize-by-genomic-hits T/F
                        Normalize counts by genomic hits. (default: T)
  -nf T/F, --normalize-by-feature-hits T/F
                        Normalize counts by feature hits. (default: T)
  -vs T/F, --verify-stats T/F
                        Verify that all reported stats are internally
                        consistent. (default: T)
  -dc, --decollapse     Create a decollapsed SAM copy of all files listed in
                        your Samples Sheet. This option is ignored for non-
                        collapsed inputs. (default: False)
  -sv {Cython,HTSeq}, --stepvector {Cython,HTSeq}
                        Select which StepVector implementation is used to find
                        features overlapping an interval. (default: Cython)
  -p, --in-pipeline     All file inputs and outputs will be read from and
                        written to the working directory regardless of the
                        exact paths listed in configuration files. This is
                        convenient when working with workflow runners.
                        (default: False)
  -d, --report-diags    Produce diagnostic information about
                        uncounted/eliminated selection elements. (default:
                        False)
```
Run Config Key | Commandline Argument |
---|---|
dge_pca_plot: | --pca |
DESeq2 can produce a PCA plot for your samples if your experiment contains biological replicates. If this option is switched "on", the PCA plot will be produced and placed in the plots subdirectory for each run.
Run Config Key | Commandline Argument |
---|---|
dge_drop_zero: | --drop-zero |
Features with zero counts across all libraries will be dropped before DGE analysis if this option is switched "on".
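The drop-zero rule amounts to keeping any feature with a nonzero count in at least one library. tiny-deseq.r is an R script, so the Python sketch below only mirrors the described rule; `drop_zero_features` and the feature IDs are hypothetical.

```python
def drop_zero_features(table: dict[str, list[float]]) -> dict[str, list[float]]:
    """Keep only features with a nonzero count in at least one library."""
    return {
        feature: counts
        for feature, counts in table.items()
        if any(c != 0 for c in counts)
    }


counts = {
    "featA": [0, 0, 0],  # zero in every library: dropped
    "featB": [5, 0, 2],
    "featC": [0, 1, 0],
}
print(drop_zero_features(counts))  # {'featB': [5, 0, 2], 'featC': [0, 1, 0]}
```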
```
tiny-deseq.r --input-file COUNTFILE --outfile-prefix PREFIX [--control CONDITION] [--pca] [--drop-zero]

Required arguments:
  --input-file <count_file>
      A text file containing a table of features x samples of the run to
      process by DESeq2. The [...]feature_counts.csv output of tiny-count is expected here.
  --outfile-prefix <outfile>
      Name of the output files to write. These will be created:
        1. Normalized count table of all samples
        2. Differential gene expression table per comparison
        3. A PCA plot per comparison, if --pca is also provided.

Optional arguments:
  --control <control_condition>
      If the control condition is specified, comparisons will
      only be made between the control and experimental conditions.
  --pca
      This will produce principal component analysis plots
      using the DESeq2 library. Output files are PDF format.
  --drop-zero
      Prior to performing analysis, this will drop all
      rows/features which have a zero count in all samples.
```
Run Config Key | Commandline Argument |
---|---|
plot_requests: | --plots PLOT PLOT PLOT ... |
tiny-plot produces only the plots listed in this parameter.
Run Config Key | Commandline Argument |
---|---|
plot_pval: | --p-value VALUE |
Feature expression levels are considered significant if their P value is less than this value (default: 0.05). Non-differentially expressed features are plotted as gray points, and in `sample_avg_scatter_by_dge_class` these points are not colored by feature class.
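A minimal sketch of the significance split, assuming P values are available per feature. The function name `split_by_pvalue` and the feature IDs are hypothetical. A P value equal to the cutoff is not significant because the comparison is strictly "less than"; in this sketch, NaN P values also fall into the gray group since any comparison with NaN is false, which is an assumption about how tiny-plot treats missing P values.

```python
import math


def split_by_pvalue(pvalues: dict[str, float], cutoff: float = 0.05):
    """Return (significant, not_significant) feature sets.

    Significant means strictly below the cutoff; NaN compares False,
    so such features land in the non-significant (gray) group.
    """
    significant = {f for f, p in pvalues.items() if p < cutoff}
    return significant, set(pvalues) - significant


sig, gray = split_by_pvalue({"featA": 0.001, "featB": 0.05, "featC": math.nan})
print(sorted(sig), sorted(gray))  # ['featA'] ['featB', 'featC']
```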
Run Config Key | Paths File Key | Commandline Argument |
---|---|---|
| | plot_style_sheet: | --style-sheet MPLSTYLE |
The plot style sheet can be used to override the default Matplotlib styles used by tiny-plot. Unlike the other parameters, this option is found in the Paths File. The expected value for this parameter is the path to your modified style sheet. See the Plot Style Sheet documentation for more information.
Run Config Key | Commandline Argument |
---|---|
plot_vector_points: | --vector-scatter |
The scatter plots produced by tiny-plot have rasterized points by default. This allows for faster plot generation, smaller file sizes, and files that are more easily handled by PDF readers. Plots are produced at 300 dpi by default, so the rasterization is rarely noticeable at normal zoom levels. Switching this option "on" vectorizes the points, allowing zooming without pixelation.
Note: only scatter points are rasterized with this option switched "off"; all other elements are vectorized in every plot type.
Run Config Key | Commandline Argument |
---|---|
plot_len_dist_min: | --len-dist-min VALUE |
plot_len_dist_max: | --len-dist-max VALUE |
The min and/or max bounds for plotted lengths can be set with this option. See tiny-plot's documentation for more information about how these values are determined if they aren't set.
Run Config Key | Commandline Argument |
---|---|
plot_dge_scatter_min: | --dge-min VALUE |
plot_dge_scatter_max: | --dge-max VALUE |
The min and/or max bounds for DGE scatter plots can be set with this option. The value you provide should be a log2 count value and can be whole or fractional. For example, `--dge-min 1.9` would produce a plot whose first tick mark is labeled 2 and would include points for feature counts as low as about 3.73 (2^1.9 ≈ 3.73). Unspecified bounds are automatically calculated to fit the data, and will include the margin specified by the `axes.[x/y]margin` key in the Plot Style Sheet.
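The conversion between a log2 bound and a raw count is a single exponentiation. The helper names below are hypothetical; they only reproduce the arithmetic from the `--dge-min 1.9` example.

```python
import math


def log2_bound_to_count(bound: float) -> float:
    """Convert a --dge-min/--dge-max log2 value to a raw count."""
    return 2 ** bound


def count_to_log2(count: float) -> float:
    """Position of a positive count on the log2 axis."""
    return math.log2(count)


print(round(log2_bound_to_count(1.9), 2))  # 3.73
# A count of 3.5 sits below a 1.9 lower bound; a count of 4 sits above it
print(count_to_log2(3.5) < 1.9 < count_to_log2(4.0))  # True
```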
Run Config Key | Commandline Argument |
---|---|
plot_unknown_class: | --unknown-class LABEL |
plot_unassigned_class: | --unassigned-class LABEL |
These options set the labels used for special groups in `class_charts` and `sample_avg_scatter_by_dge_class` plots. The "unknown" class group represents counts that were assigned by a Features Sheet rule lacking a "Classify as..." label. The "unassigned" class group represents counts that weren't assigned to any feature.
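The grouping logic described above can be sketched as follows. The function `class_group` is hypothetical, and its default labels `"unknown"` and `"unassigned"` are placeholders rather than tiny-plot's actual defaults.

```python
from typing import Optional


def class_group(assigned: bool, rule_class: Optional[str],
                unknown_label: str = "unknown",
                unassigned_label: str = "unassigned") -> str:
    """Pick the class group used to tally a count in class-related plots."""
    if not assigned:
        return unassigned_label  # count was not assigned to any feature
    # Assigned by a rule; fall back to the unknown label if it had no class
    return rule_class if rule_class else unknown_label


print(class_group(True, "miRNA"))  # miRNA
print(class_group(True, None))     # unknown
print(class_group(False, None))    # unassigned
```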
Run Config Key | Commandline Argument |
---|---|
plot_class_scatter_filter: | --classes-include or --classes-exclude |
If an inclusive filter is used, only the listed classes, if present, are shown. If an exclusive filter is used, the listed classes are omitted from the plot. The filter applies regardless of P value, so it also affects features that are not differentially expressed. In the Run Config, the filter type can be set with the `style:` sub-key, and the desired list of classes can be provided between the brackets of the `classes:` sub-key.
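A sketch of the include/exclude semantics, assuming a list of classes present in the data. The function `filter_classes` and the class names are hypothetical. Listed classes that are absent from the data are simply ignored, and, as in tiny-plot's help text where `-ic` and `-ec` are alternatives, the two filter types are mutually exclusive.

```python
from typing import Iterable, Optional


def filter_classes(present: Iterable[str],
                   include: Optional[Iterable[str]] = None,
                   exclude: Optional[Iterable[str]] = None) -> list[str]:
    """Return the classes to plot, preserving their original order."""
    if include is not None and exclude is not None:
        raise ValueError("use either an inclusive or an exclusive filter, not both")
    if include is not None:
        wanted = set(include)
        return [c for c in present if c in wanted]  # absent classes are ignored
    if exclude is not None:
        unwanted = set(exclude)
        return [c for c in present if c not in unwanted]
    return list(present)


classes = ["miRNA", "piRNA", "tRNA"]
print(filter_classes(classes, include=["miRNA", "snoRNA"]))  # ['miRNA']
print(filter_classes(classes, exclude=["tRNA"]))             # ['miRNA', 'piRNA']
```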
```
tiny-plot [-rc RAW_COUNTS] [-nc NORM_COUNTS] [-uc RULE_COUNTS]
          [-ss STAT] [-dge COMPARISON [COMPARISON ...]]
          [-len 5P_LEN [5P_LEN ...]] [-o PREFIX] [-pv VALUE]
          [-s MPLSTYLE] [-v] [-ldi VALUE] [-lda VALUE] [-dgi VALUE]
          [-dga VALUE] [-una LABEL] [-unk LABEL] [-ic CLASS [CLASS ...]
          | -ec CLASS [CLASS ...]] -p PLOT [PLOT ...]

This script produces basic static plots for publication as part of the tinyRNA
workflow.

Input file requirements vary by plot type and you are free to supply only the
files necessary for your plot selections. If you are sourcing all of your
input files from the same run directory, you may find it easier to instead run
`tiny replot` within that run directory.

Required arguments:
  -p PLOT [PLOT ...], --plots PLOT [PLOT ...]
                        List of plots to create. Options:
                        • len_dist: A stacked barchart showing size & 5'
                        nucleotide distribution.
                        • rule_charts: A barchart showing percentages of
                        counts by matched rule.
                        • class_charts: A barchart showing percentages of
                        counts per classification.
                        • replicate_scatter: A scatter plot comparing
                        replicates for all count files given.
                        • sample_avg_scatter_by_dge: A scatter plot comparing
                        all sample groups, with differentially expressed
                        small RNAs highlighted based on P value cutoff.
                        • sample_avg_scatter_by_dge_class: A scatter plot
                        comparing all sample groups, with classes
                        highlighted for differentially expressed small RNAs
                        based on P value cutoff.

Input files produced by tiny-count:
  -rc RAW_COUNTS, --raw-counts RAW_COUNTS
                        The ...feature_counts.csv file
  -uc RULE_COUNTS, --rule-counts RULE_COUNTS
                        The ...counts-by-rule.csv file
  -ss STAT, --summary-stats STAT
                        The ...summary_stats.csv file
  -len 5P_LEN [5P_LEN ...], --len-dist 5P_LEN [5P_LEN ...]
                        The ...nt_len_dist.csv files

Input files produced by tiny-deseq.r:
  -nc NORM_COUNTS, --norm-counts NORM_COUNTS
                        The ...norm_counts.csv file
  -dge COMPARISON [COMPARISON ...], --dge-tables COMPARISON [COMPARISON ...]
                        The ...cond1...cond2...deseq.csv files

Optional arguments:
  -o PREFIX, --out-prefix PREFIX
                        Prefix to use for output filenames.
  -pv VALUE, --p-value VALUE
                        P value to use in DGE scatter plots.
  -s MPLSTYLE, --style-sheet MPLSTYLE
                        Optional matplotlib style sheet to use for plots.
  -v, --vector-scatter  Produce scatter plots with vectorized points (slower).
                        Note: only the points on scatter plots will be raster
                        if this option is not provided.
  -ldi VALUE, --len-dist-min VALUE
                        len_dist plots will start at this value
  -lda VALUE, --len-dist-max VALUE
                        len_dist plots will end at this value
  -dgi VALUE, --dge-min VALUE
                        scatter_by_dge plots will start at this log2 value
  -dga VALUE, --dge-max VALUE
                        scatter_by_dge plots will end at this log2 value
  -una LABEL, --unassigned-class LABEL
                        Use this label in class-related plots for unassigned
                        counts
  -unk LABEL, --unknown-class LABEL
                        Use this label in class-related plots for counts which
                        were assigned by rules lacking a "Classify as..."
                        value
  -ic CLASS [CLASS ...], --classes-include CLASS [CLASS ...]
                        Only include these classes, if present, in class
                        scatter plots (applies regardless of P value)
  -ec CLASS [CLASS ...], --classes-exclude CLASS [CLASS ...]
                        Omit these classes, if present, from class scatter
                        plots (applies regardless of P value)
```