Replies: 6 comments
-
Thanks Henry for your suggestion.
In fact, this is what simper does; see the examples provided with the function (4 groups, yielding 6 pairwise comparisons). We could add a short sentence to the description of the groups parameter to explain that, with more than 2 levels, each pairwise comparison is computed. If you want to compare only two groups, you can just supply a subset of two groups. This is just convenience functionality for users, who usually do pairwise comparisons of multiple groups.
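A minimal sketch of both usages, assuming the `dune` example data that ships with vegan. The choice of the "BF" and "HF" management levels is only for illustration, and the object names are hypothetical:

```r
library(vegan)
data(dune)
data(dune.env)

# Management has 4 levels, so simper() returns choose(4, 2) = 6 comparisons
sim_all <- simper(dune, dune.env$Management)
names(sim_all)

# To compare only two groups, subset the samples before calling simper()
keep <- dune.env$Management %in% c("BF", "HF")
sim_two <- simper(dune[keep, ], droplevels(dune.env$Management[keep]))
summary(sim_two)
```

Subsetting keeps the call itself unchanged, so no extra argument is needed to restrict the comparison.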
-
I confirm that |
-
@jarioksa thanks for your comments; yeah, I did the rough numbers and it really is a lot of calculations. I ended up just running the task on my high performance computing cluster. Good to know it is all “working as intended”. At first I had thought perhaps the function was doing 700 x 700 x 700 with the three groups, or maybe that it just got stuck somewhere because it was trying to make an impossible number of comparisons. Pairwise comparisons for each pair of groups make a lot more sense.
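The rough numbers can be sketched in base R. This assumes the sizes quoted in the original post (3 groups, 700 samples per group, 450 species) and that every between-group pair of samples is evaluated once per species; any permutation test would multiply the total further. The variable names are illustrative only:

```r
n_groups  <- 3    # groups being compared
n_samples <- 700  # samples per group
n_species <- 450  # species (columns of the community matrix)

group_pairs  <- choose(n_groups, 2)        # pairwise group comparisons
sample_pairs <- group_pairs * n_samples^2  # between-group sample pairs
terms        <- sample_pairs * n_species   # per-species contributions

# The "700 x 700 x 700" reading, for contrast
naive_triples <- n_samples^n_groups

format(c(group_pairs, sample_pairs, terms, naive_triples), big.mark = ",")
```

Under these assumptions there are 3 group pairs, 1,470,000 sample pairs and 661,500,000 per-species terms, against 343,000,000 sample triples for the three-way reading.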
-
In case you're interested: I allocated a node with 32 cores and 123 GB of memory to the task. It finished in a little over 2 hours (47 hours of CPU time).
-
@hlydecker, which version of vegan did you use? I noticed that the GitHub version of We haven't implemented parallel processing in this new function, though. That's the reason why it has not been ported to the release versions.
-
@jarioksa I'll check out the GitHub version next time I need to run this locally! As you guessed, I was using the CRAN release. 10x faster is a pretty massive performance increase; good job! Personally I'll stick with the CRAN build so that I can use parallel processing. I think any time I need to do a SIMPER with large sample sizes I'll just use my HPC cluster, where parallel processing gives huge gains. Interestingly, there was some strange resource utilisation: on a local machine the function was CPU-bottlenecked and never ended up using much RAM, whereas on my HPC cluster CPU utilisation was poor and RAM consumption was excessive.
-
Currently vegan::simper() allows the user to run the function with 2 or more groups.
While SIMPER can work with multiple groups, there are a couple of potential issues with more than 2 groups.
First, the computational needs for the calculations increase dramatically. I attempted several comparisons between three groups (3 groups x 700 samples/group x 450 species); in all cases this led to 100% CPU utilisation. I attempted this on a very powerful MacBook Pro (2.3 GHz 8-core i9, 64 GB RAM) using the parallel options; I had to kill the process after ~20 minutes of max CPU load. Interestingly, RAM consumption was minimal.
Secondly, and perhaps more fundamentally, SIMPER is a comparison between two groups using Bray-Curtis dissimilarity. This dissimilarity metric is for making comparisons between two groups. While I suppose we can do multiple Bray-Curtis dissimilarity distance measures between each pair of groups, this seems to be beyond what I would expect from a standard SIMPER.
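For readers unfamiliar with the mechanics, here is a rough base R sketch of the idea for a single pair of groups. It assumes the usual SIMPER definition, in which Bray-Curtis dissimilarity is computed between pairs of samples (one from each group) and each species' share is averaged over all such pairs. `simper_sketch` and the toy data are hypothetical and are not vegan's implementation:

```r
# Average per-species contribution to between-group Bray-Curtis dissimilarity
simper_sketch <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  contr <- matrix(0, nrow = nrow(a) * nrow(b), ncol = ncol(a))
  row <- 0
  for (j in seq_len(nrow(a))) {
    for (k in seq_len(nrow(b))) {
      row <- row + 1
      # Species-wise terms of the Bray-Curtis dissimilarity of samples j and k
      contr[row, ] <- abs(a[j, ] - b[k, ]) / sum(a[j, ] + b[k, ])
    }
  }
  colMeans(contr)
}

# Toy data: rows are samples, columns are species
a <- matrix(c(1, 0,
              2, 1), nrow = 2, byrow = TRUE)
b <- matrix(c(0, 1), nrow = 1)

contrib <- simper_sketch(a, b)
contrib       # average contribution of each species
sum(contrib)  # mean between-group Bray-Curtis dissimilarity
```

With more than two groups, the same calculation would simply be repeated for each pair of groups.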
If it is still desirable to allow people to compare between > 2 groups, I propose adding in some communication to the user if the number of groups being compared is > 2. Something to warn the user that this comparison may be computationally expensive, and that it may be difficult to interpret the results (and a normal SIMPER is already hard to interpret!).
Otherwise, it might be a good idea to only allow 2 groups for the standard simper() and maybe add in a simper_n() for playing with groups >2.
P.S. I am not a mathematician, so if someone more familiar with the actual mechanics underlying SIMPER and Bray-Curtis can chime in with some more explanation, that would be great. I'm just used to these metrics only being calculated between 2 groups.