Dear developer,
Sorry to disturb you. I found what looks like a bug in starcode, and it really confused me.
I ran this command: "starcode -d 1 -t 4 --input b1.fa --print-clusters > b1.cluster"
but the output looks like this. It seems that the distance setting did not work?
CATACCAA 2310939 AAAACCAA,AAAACCAC,AAAACCAG,AAAACCAT,AAAACCCA,AAAACCCG,......
...
...
Well, this is not a bug; this is how starcode is supposed to work. What starcode reports in its output is the result of clustering the sequences after matching them at the specified distance. To keep things simple, there are two steps:
In the first step, starcode finds all the pairs of sequences that match each other at distance 1.
In the second step, it builds a network from the matching sequences and clusters them following the specified algorithm (message passing by default).
That means that even if you set -d to 1, some clusters may contain sequences that are more than 1 mismatch away from each other, especially when the input data set is very dense (almost all combinations of nucleotides are present).
To prevent this you can set a more restrictive clustering ratio, but that really depends on the nature of the biological data you are trying to cluster.
I can try to help you further but I would need more information about the sequences you are feeding to starcode.
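To illustrate the point about dense inputs, here is a minimal Python sketch. It is not starcode's code: it assumes plain substitution (Hamming) distance on equal-length sequences and simply follows chains of distance-1 matches, which is a simplification of message passing. The helper names (`neighbors`, `linked_groups`) and the 4-letter toy input are made up for the illustration.

```python
from itertools import product

ALPHABET = "ACGT"

def neighbors(seq):
    """Yield every sequence that differs from seq by exactly one substitution."""
    for i, base in enumerate(seq):
        for other in ALPHABET:
            if other != base:
                yield seq[:i] + other + seq[i + 1:]

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def linked_groups(seqs):
    """Group sequences that are connected by chains of distance-1 matches."""
    present = set(seqs)
    seen = set()
    groups = []
    for start in seqs:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            cur = stack.pop()
            group.append(cur)
            for nb in neighbors(cur):
                if nb in present and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(group)
    return groups

# Toy "dense" input: every possible sequence of length 4.
dense = ["".join(p) for p in product(ALPHABET, repeat=4)]
groups = linked_groups(dense)

# Largest pairwise distance inside the first (and only) group.
widest = max(hamming(a, b) for a in groups[0] for b in groups[0])
print(len(dense), len(groups), widest)  # prints: 256 1 4
```

In this toy case every sequence ends up linked into one group, even though some members (for example AAAA and CCCC) differ at all four positions. Real starcode output also depends on read counts and the clustering ratio, which this sketch ignores.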
Here is an example of what is going on.
Say that you have an input with three sequences:
1. ATTTGAC
2. ATTCGAC
3. ATTCCAC
We set starcode to find matches at distance 1, and it finds the following matches:
1 matches 2 at distance 1
2 matches 3 at distance 1
So the network is:
ATTTGAC <-> ATTCGAC <-> ATTCCAC
This results in a single cluster that contains all three sequences, with ATTCGAC as the centroid, even though ATTTGAC and ATTCCAC are at distance 2 from each other.
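The three-sequence example can be checked with a few lines of Python. This is only a sketch: it assumes substitution (Hamming) distance, and it picks the most-connected sequence as a stand-in for the centroid, which is not necessarily the rule starcode applies.

```python
from itertools import combinations

seqs = ["ATTTGAC", "ATTCGAC", "ATTCCAC"]

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

# Step 1: all pairs that match at distance 1 or less.
matches = [(a, b) for a, b in combinations(seqs, 2) if hamming(a, b) <= 1]

# Step 2 (simplified): count how many matches each sequence takes part in.
degree = {s: 0 for s in seqs}
for a, b in matches:
    degree[a] += 1
    degree[b] += 1

# Stand-in for the centroid: the sequence with the most matches.
hub = max(seqs, key=degree.get)

print(matches)                     # the two distance-1 pairs
print(hub)                         # ATTCGAC
print(hamming(seqs[0], seqs[2]))   # 2: the two ends of the chain
```

The two end sequences never match each other directly, yet both are linked to ATTCGAC, so they land in the same cluster.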