Merge pull request #64 from ShawHahnLab/release-0.5.1
Version 0.5.1
ressy authored Apr 5, 2023
2 parents 30ae20e + e678bbf commit 809130d
Showing 1,289 changed files with 228 additions and 70 deletions.
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,29 @@
# Changelog

## 0.5.1 - 2023-03-31

### Changed

* `convert` now handles edge cases for sequence input and tabular output by
always including a sequence description column in the output ([#63])

### Fixed

* `msa` will now bypass calling MUSCLE when called with just a single input
sequence, avoiding a crash ([#62])
* `convert` will now obey a custom sequence description column name if one is
given with `--col-seq-desc` ([#60])
* `getreads` command is now compatible with the latest available version of
bcl2fastq, v2.20.0.422 ([#58])
* `tree` command can now handle assigning a color code when exactly one
sequence set is defined ([#57])

[#63]: https://github.com/ShawHahnLab/igseq/pull/63
[#62]: https://github.com/ShawHahnLab/igseq/pull/62
[#60]: https://github.com/ShawHahnLab/igseq/pull/60
[#58]: https://github.com/ShawHahnLab/igseq/pull/58
[#57]: https://github.com/ShawHahnLab/igseq/pull/57

## 0.5.0 - 2023-01-04

### Added
2 changes: 1 addition & 1 deletion conda/meta.yaml
@@ -1,5 +1,5 @@
# https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html
{% set version = "0.5.0" %}
{% set version = "0.5.1" %}
{% set build = "0" %}

package:
8 changes: 8 additions & 0 deletions igseq/convert.py
@@ -20,4 +20,12 @@ def convert(path_in, path_out, fmt_in=None, fmt_out=None, colmap=None, dummyqual
with RecordReader(path_in, fmt_in, colmap, dry_run=dry_run) as reader, \
RecordWriter(path_out, fmt_out, colmap, dummyqual=dummyqual, dry_run=dry_run) as writer:
for record in reader:
# special case for descriptions: they may or may not exist on any
# particular record for seq input, but for tabular output, we have
# to have consistent columns. So in that case make sure to include
# a description column by forcing one for the first record to be
# written.
if not writer.writer and writer.tabular and not reader.tabular:
key = reader.colmap["sequence_description"]
record[key] = record.get(key, "")
writer.write(record)
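The new special case exists because a tabular writer fixes its columns from the first record it writes, so a description that first appears on a later record would have no column to go into. A minimal sketch of the idea using the standard library's `csv.DictWriter`, assuming (hypothetically) that records are plain dicts keyed by column name and that the writer is created lazily on the first record, as the `not writer.writer` check suggests; `write_tabular` is an illustrative helper, not igseq's API:

```python
import csv
import io


def write_tabular(records, desc_key="sequence_description"):
    """Write dict records as CSV text, guaranteeing a description column.

    Hypothetical helper illustrating the convert.py special case.
    """
    out = io.StringIO()
    writer = None
    for record in records:
        if writer is None:
            # Columns are fixed by the first record, so force the
            # description key onto it (empty if the record has none).
            record = dict(record)
            record.setdefault(desc_key, "")
            writer = csv.DictWriter(out, fieldnames=list(record), lineterminator="\n")
            writer.writeheader()
        # Later records missing the key get DictWriter's default restval ("")
        writer.writerow(record)
    return out.getvalue()


rows = [
    {"sequence_id": "seqid", "sequence": "ACTGACTGACTGACTG"},
    {"sequence_id": "seqid2", "sequence": "TCTGACTCACTGAGTA",
     "sequence_description": "desc"},
]
print(write_tabular(rows), end="")
```

This prints a header of `sequence_id,sequence,sequence_description` followed by `seqid,ACTGACTGACTGACTG,` and `seqid2,TCTGACTCACTGAGTA,desc`, the same shape as the updated test data. Without the `setdefault` call the header would be `sequence_id,sequence`, and `DictWriter` (whose default `extrasaction` is `"raise"`) would raise `ValueError` on the second record.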
4 changes: 2 additions & 2 deletions igseq/data/examples/outputs/convert/unwrapped.csv
@@ -1,2 +1,2 @@
sequence_id,sequence
seqid,ACTGACTGACTGACTG
sequence_id,sequence,sequence_description
seqid,ACTGACTGACTGACTG,
9 changes: 4 additions & 5 deletions igseq/getreads.py
@@ -15,6 +15,7 @@
those options.
"""

import sys
import logging
import subprocess
from tempfile import NamedTemporaryFile
@@ -98,10 +99,6 @@ def getreads(path_input, dir_out, path_counts="", extra_args=None, threads_load=
"--sample-sheet", sample_sheet.name,
# parallel processing during loading can help a bit in my tests
"--loading-threads", threads_load,
# parallel processing does *not* help during the bcl2fastq
# demultiplexing step, go figure, when we don't have any
# demultiplexing to perform here
"--demultiplexing-threads", 1,
# parallel processing in the processing step helps quite a bit
"--processing-threads", threads_proc,
# help text says "this must not be higher than number of
@@ -165,4 +162,6 @@ def _run_bcl2fastq(args, extra_args=None):
raise util.IgSeqError(f"bcl2fastq arg collision from extra arguments: {shared}")
args += extra_args
LOGGER.info("bcl2fastq command: %s", args)
subprocess.run(args, check=True)
proc = subprocess.run(args, check=True, capture_output=True, text=True)
sys.stdout.write(proc.stdout)
sys.stderr.write(proc.stderr)
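The `_run_bcl2fastq` change swaps a pass-through `subprocess.run` for one that captures bcl2fastq's output as text and then replays it through Python's own `sys.stdout` and `sys.stderr`. A plausible motivation (not stated in the diff) is that a child process writes straight to the inherited file descriptors, so anything that replaces `sys.stdout` inside Python, such as a test harness, never sees that output. The same pattern in isolation, with `run_and_echo` as a hypothetical helper name:

```python
import subprocess
import sys


def run_and_echo(args):
    """Run a command, capture its output as text, then replay it on our streams."""
    # capture_output=True collects stdout/stderr; text=True decodes them to str
    proc = subprocess.run(args, check=True, capture_output=True, text=True)
    # Re-emit through Python-level streams so in-process redirection sees it
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    return proc


# Usage: any command works; here a tiny Python child process
run_and_echo([sys.executable, "-c", "print('hello')"])
```

One trade-off: with `check=True`, a failing command raises `subprocess.CalledProcessError` before the two `write` calls run, so the captured text is then only available on the exception's `stdout` and `stderr` attributes. Output is also held in memory until the command exits rather than streamed.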
2 changes: 1 addition & 1 deletion igseq/identity.py
@@ -11,7 +11,7 @@
The scoring is based on a simple global pairwise alignment, with matches scored
as 1, mismatches and gaps 0. Any existing gaps are removed before comparing
sequences, and differeces in case (lower/upper) are disregarded.
sequences, and differences in case (lower/upper) are disregarded.
"""

import logging
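The docstring's scoring scheme (global pairwise alignment, matches scored 1, mismatches and gaps 0) has a convenient consequence: the best achievable score equals the length of the longest common subsequence of the two sequences. A minimal dynamic-programming sketch of that raw score, assuming `-` is the only gap character to strip; `identity_score` is a hypothetical name. How igseq turns the score into an identity value (for example, dividing by some length) is not shown in this diff, so the sketch stops at the score.

```python
def identity_score(seq1, seq2):
    """Global alignment score with match=1, mismatch=0, gap=0 (hypothetical helper)."""
    # Remove existing gaps and ignore case, as the docstring describes
    a = seq1.replace("-", "").upper()
    b = seq2.replace("-", "").upper()
    # With these scores the optimum is the longest common subsequence length;
    # keep only one previous DP row to use O(len(b)) memory.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)  # diagonal move scores a match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # gap in one sequence
        prev = curr
    return prev[-1]


print(identity_score("ACGT", "acg-t"))  # 4
print(identity_score("ACGT", "AGT"))    # 3
```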
2 changes: 1 addition & 1 deletion igseq/igblast.py
@@ -10,7 +10,7 @@
command, so you can configure things like the output format and file path. See
igblastn -help for those options. Any igblastn argument can be given with two
dashes if needed to force igseq to handle it correctly (for example,
-num_alignments_V will be interprted as -n um_alignments_V, but
-num_alignments_V will be interpreted as -n um_alignments_V, but
--num_alignments_V will work).
"""

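The docstring's `-num_alignments_V` example matches how Python's `argparse` treats a single-dash argument that begins with a known one-letter option: the rest of the string becomes that option's value. Assuming igseq parses its own options with `argparse` and forwards unrecognized ones to igblastn (an assumption; the parser setup is not part of this diff), the behavior can be reproduced with a hypothetical `-n` option that takes a value:

```python
import argparse

# Hypothetical parser: one short option that takes a value, standing in for
# whatever single-letter options igseq actually defines.
parser = argparse.ArgumentParser()
parser.add_argument("-n")

# Single dash: "-n" matches as a prefix and "um_alignments_V" becomes its value.
args, extras = parser.parse_known_args(["-num_alignments_V"])
print(args.n, extras)  # um_alignments_V []

# Double dash: not a known option, so it is left over for pass-through.
args, extras = parser.parse_known_args(["--num_alignments_V", "5"])
print(args.n, extras)
```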
5 changes: 3 additions & 2 deletions igseq/msa.py
@@ -37,8 +37,9 @@ def msa(path_in, path_out, fmt_in=None, fmt_out=None, colmap=None, dry_run=False
def run_muscle(records):
"""Align a set of records with MUSCLE."""
# muscle crashes with empty input, so we'll just do a noop for that case
if not records:
LOGGER.warning("no records provided to align; skipping MUSCLE")
if len(records) < 2:
detail = "only one record" if len(records) else "no records"
LOGGER.warning("%s provided to align; skipping MUSCLE", detail)
return records
args = ["muscle", "-align", "/dev/stdin", "-output", "/dev/stdout"]
with Popen(args, stdin=PIPE, stdout=PIPE, stderr=PIPE, text=True) as proc:
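The `run_muscle` guard now short-circuits for fewer than two records, since a single sequence is already trivially aligned with itself and MUSCLE crashed on that input (#62 in the changelog). The same guard, pulled out into a hypothetical `align_or_passthrough` helper with the aligner passed in as a function so the sketch runs without MUSCLE installed:

```python
import logging

LOGGER = logging.getLogger(__name__)


def align_or_passthrough(records, aligner):
    """Call aligner(records) only when there are at least two records."""
    if len(records) < 2:
        detail = "only one record" if records else "no records"
        LOGGER.warning("%s provided to align; skipping aligner", detail)
        return records  # nothing to align: return input unchanged
    return aligner(records)


def fake_aligner(records):
    # Stand-in for MUSCLE: just tags each record
    return [f"aligned:{r}" for r in records]


print(align_or_passthrough(["seqA"], fake_aligner))          # ['seqA']
print(align_or_passthrough(["seqA", "seqB"], fake_aligner))  # ['aligned:seqA', 'aligned:seqB']
```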
10 changes: 7 additions & 3 deletions igseq/record.py
@@ -94,6 +94,10 @@ def infer_fmt(self, fmt=None):
fmt = fmt_inferred
return fmt

@property
def tabular(self):
return self.fmt in ["csv", "tsv", "csvgz", "tsvgz"]

@staticmethod
def _infer_fmt(path):
try:
@@ -141,7 +145,7 @@ def decode_record(self, obj):
if quals:
record[self.colmap["sequence_quality"]] = self.encode_phred(quals)
if seq_desc is not None:
record["sequence_description"] = seq_desc
record[self.colmap["sequence_description"]] = seq_desc
else:
record = obj
return record
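Two record.py changes work together here: a `tabular` property that classifies formats by name, and `decode_record` now storing the description under the column name from `colmap` rather than a fixed `"sequence_description"` key, which is what lets `convert` honor `--col-seq-desc` (#60). A standalone sketch, assuming `colmap` maps igseq's standard field names to the column names actually in use; `is_tabular` and `decode_seq_entry` are illustrative names, not igseq functions:

```python
# Format names treated as tabular, copied from the diff
TABULAR_FMTS = {"csv", "tsv", "csvgz", "tsvgz"}


def is_tabular(fmt):
    """True for table-style formats, False for sequence formats."""
    return fmt in TABULAR_FMTS


def decode_seq_entry(seq_id, seq, seq_desc, colmap):
    """Build a record dict from one FASTA/FASTQ entry using custom column names."""
    record = {colmap["sequence_id"]: seq_id, colmap["sequence"]: seq}
    if seq_desc is not None:
        # Keyed through colmap, so a custom description column is respected
        record[colmap["sequence_description"]] = seq_desc
    return record


colmap = {"sequence_id": "SeqID", "sequence": "Seq", "sequence_description": "SeqDesc"}
print(decode_seq_entry("seqid", "ACTGACTGACTGACTG", "desc", colmap))
# {'SeqID': 'seqid', 'Seq': 'ACTGACTGACTGACTG', 'SeqDesc': 'desc'}
print(is_tabular("csvgz"), is_tabular("fa"))  # True False
```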
@@ -267,7 +271,7 @@ def _write_fa(self, record):
seq = record[self.colmap["sequence"]]
defline = record[self.colmap["sequence_id"]]
desc = record.get(self.colmap["sequence_description"])
if desc is not None:
if desc:
defline += f" {desc}"
if not self.dry_run:
self.handle.write(f">{defline}\n{seq}\n")
@@ -283,7 +287,7 @@ def _write_fq(self, record):
"No quality scores available, using default dummy value: %s",
DEFAULT_DUMMY_QUAL)
quals = "".join(DEFAULT_DUMMY_QUAL * len(seq))
if desc is not None:
if desc:
defline += f" {desc}"
if not self.dry_run:
self.handle.write(f"@{defline}\n{seq}\n+\n{quals}\n")
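The FASTA and FASTQ writers now test the description for truthiness instead of `is not None`. Since `convert` can now produce empty description fields, and an empty cell read back from CSV arrives as `""` rather than `None`, the old check would presumably have emitted a defline with a dangling trailing space. A minimal sketch of the new rule, with `fasta_entry` as a hypothetical helper:

```python
def fasta_entry(seq_id, seq, desc=None):
    """Format one FASTA entry; a None or empty description is omitted."""
    defline = seq_id
    if desc:  # skips both None and ""
        defline += f" {desc}"
    return f">{defline}\n{seq}\n"


print(repr(fasta_entry("seqid", "ACTG", "")))      # '>seqid\nACTG\n'
print(repr(fasta_entry("seqid", "ACTG", "desc")))  # '>seqid desc\nACTG\n'
```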
2 changes: 1 addition & 1 deletion igseq/tree.py
@@ -314,7 +314,7 @@ def make_seq_set_colors(seq_sets):
# adapted from SONAR
# this stretches across COLORS in even increments for as many as we need here
num = len(colors.COLORS)
subset = [int( a * (num-1) / (len(seq_sets)-1) ) for a in range(num)]
subset = [int( a * (num-1) / max(1, (len(seq_sets)-1)) ) for a in range(num)]
try:
seq_set_colors[set_name] = colors.color_str_to_trio(colors.COLORS[subset[idx]])
except IndexError:
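The tree.py fix guards a division: the palette index for each set is scaled by `len(seq_sets) - 1`, which is zero when exactly one sequence set is defined (#57). Clamping the denominator with `max(1, ...)` makes the single-set case pick the first color. The index calculation in isolation, with a hypothetical `spread_indices` helper and a made-up palette size of 5:

```python
def spread_indices(num_colors, num_sets):
    """Evenly spaced palette indices, as in the patched list comprehension."""
    # max(1, ...) avoids ZeroDivisionError when num_sets == 1
    denom = max(1, num_sets - 1)
    return [int(a * (num_colors - 1) / denom) for a in range(num_colors)]


print(spread_indices(5, 1)[:1])  # [0]
print(spread_indices(5, 3)[:3])  # [0, 2, 4]
print(spread_indices(5, 5))      # [0, 1, 2, 3, 4]
```

Only the first `num_sets` entries are used; with more sets than colors, the lookup runs off the end of the list, which is the case the existing `except IndexError` branch handles.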
2 changes: 1 addition & 1 deletion igseq/trim.py
@@ -4,7 +4,7 @@
By default this will remove any instances of the R2 adapter found toward the
end of R1 and the R1 adapter found toward the end of R2. It will also insist
that the 5' RACE Anchor be found at the start of R1, discarding read pairs that
are mising the anchor. The adapter sequences will be determined from the
are missing the anchor. The adapter sequences will be determined from the
barcodes used for each sample and the selected species.
"""

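The trim.py docstring describes a per-pair rule: require the 5' RACE anchor at the start of R1, and cut the opposite read's adapter from the 3' end of each read. A deliberately simplified sketch using exact string matching, with the anchor and adapter sequences passed in directly rather than derived from barcodes and species; `trim_pair` and the example sequences are hypothetical. A real trimmer would also handle adapters only partly present at the read end and tolerate sequencing errors, which this sketch does not.

```python
def trim_pair(r1, r2, anchor, r1_adapter, r2_adapter):
    """Exact-match sketch: require anchor at R1 start, cut adapters at 3' ends.

    Returns a trimmed (r1, r2) tuple, or None if the pair should be discarded.
    """
    if not r1.startswith(anchor):
        return None  # pair is missing the anchor: discard it
    # R1 may read through into the R2 adapter; cut from its first occurrence
    idx = r1.find(r2_adapter)
    if idx != -1:
        r1 = r1[:idx]
    # Likewise R2 may read through into the R1 adapter
    idx = r2.find(r1_adapter)
    if idx != -1:
        r2 = r2[:idx]
    return r1, r2


print(trim_pair("AAACGGGTTCC", "CCCGGAGA", "AAAC", "GAGA", "TTCC"))  # ('AAACGGG', 'CCCG')
print(trim_pair("TTTTGGG", "CCCG", "AAAC", "GAGA", "TTCC"))          # None
```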
2 changes: 1 addition & 1 deletion igseq/version.py
@@ -4,4 +4,4 @@
See https://www.python.org/dev/peps/pep-0396/
"""

__version__ = "0.5.0"
__version__ = "0.5.1"
4 changes: 2 additions & 2 deletions test_igseq/data/test_convert/TestConvert/unwrapped.csv
@@ -1,2 +1,2 @@
sequence_id,sequence
seqid,ACTGACTGACTGACTG
sequence_id,sequence,sequence_description
seqid,ACTGACTGACTGACTG,
4 changes: 2 additions & 2 deletions test_igseq/data/test_convert/TestConvert/unwrapped.tsv
@@ -1,2 +1,2 @@
sequence_id sequence
seqid ACTGACTGACTGACTG
sequence_id sequence sequence_description
seqid ACTGACTGACTGACTG
4 changes: 2 additions & 2 deletions test_igseq/data/test_convert/TestConvert/unwrapped_quals.csv
@@ -1,2 +1,2 @@
sequence_id,sequence,sequence_quality
seqid,ACTGACTGACTGACTG,IIIIIIIIIIIIIIII
sequence_id,sequence,sequence_quality,sequence_description
seqid,ACTGACTGACTGACTG,IIIIIIIIIIIIIIII,
4 changes: 2 additions & 2 deletions test_igseq/data/test_convert/TestConvert/unwrapped_quals.tsv
@@ -1,2 +1,2 @@
sequence_id sequence sequence_quality
seqid ACTGACTGACTGACTG IIIIIIIIIIIIIIII
sequence_id sequence sequence_quality sequence_description
seqid ACTGACTGACTGACTG IIIIIIIIIIIIIIII
@@ -1,2 +1,2 @@
SeqID,sequence
seqid,ACTGACTGACTGACTG
SeqID,sequence,sequence_description
seqid,ACTGACTGACTGACTG,
@@ -1,2 +1,2 @@
sequence_id,sequence
seqid,ACTGACTGACTGACTG
sequence_id,sequence,sequence_description
seqid,ACTGACTGACTGACTG,
@@ -1,2 +1,2 @@
SeqID,Seq
seqid,ACTGACTGACTGACTG
SeqID,Seq,sequence_description
seqid,ACTGACTGACTGACTG,
@@ -0,0 +1,2 @@
SeqID,Seq,SeqDesc
seqid,ACTGACTGACTGACTG,desc
@@ -0,0 +1,3 @@
SeqID,Seq,SeqDesc
seqid,ACTGACTGACTGACTG,
seqid2,TCTGACTCACTGAGTA,desc
@@ -1,2 +1,2 @@
SeqID,Seq,SeqQual
seqid,ACTGACTGACTGACTG,IIIIIIIIIIIIIIII
SeqID,Seq,SeqQual,sequence_description
seqid,ACTGACTGACTGACTG,IIIIIIIIIIIIIIII,
@@ -0,0 +1,3 @@
>seqid desc
ACTGACTG
ACTGACTG
@@ -0,0 +1,6 @@
>seqid
ACTGACTG
ACTGACTG
>seqid2 desc
TCTGACTC
ACTGAGTA
Binary file not shown.