Commit
Showing 8 changed files with 111 additions and 71 deletions.
@@ -3,15 +3,15 @@
setup(
name='Topyfic', # the name of your package
packages=['Topyfic'], # same as above
version='v0.4.13', # version number
version='v0.4.15', # version number
license='MIT', # license type
description='Topyfic is a Python package designed to identify reproducible latent dirichlet allocation (LDA) '
'using leiden clustering and harmony for single cell epigenomics data',
# short description
author='Narges Rezaie', # your name
author_email='[email protected]', # your email
url='https://github.com/mortazavilab/Topyfic', # url to your git repo
download_url='https://github.com/mortazavilab/Topyfic/archive/refs/tags/v0.4.13.tar.gz', # link to the tar.gz file associated with this release
download_url='https://github.com/mortazavilab/Topyfic/archive/refs/tags/v0.4.15.tar.gz', # link to the tar.gz file associated with this release
keywords=['Cellular Programs', 'Latent Dirichlet allocation', 'single-cell multiome', 'single-cell RNA-seq',
'gene regulatory network', 'Topic Modeling', 'single-nucleus RNA-seq'], #
python_requires='>=3.9',

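This commit bumps the release in two places, the `version` field and the tag inside `download_url`, and the tarball link only resolves if the two agree. A minimal sketch of a pre-release check, assuming the GitHub `archive/refs/tags/<tag>.tar.gz` URL layout used in `setup()`; `check_release_metadata` is a hypothetical helper, not part of Topyfic:

```python
import re


def check_release_metadata(version: str, download_url: str) -> bool:
    """Return True if download_url points at the tag named by version.

    Hypothetical helper; assumes the URL ends in
    /archive/refs/tags/<tag>.tar.gz as in setup().
    """
    match = re.search(r"/archive/refs/tags/([^/]+)\.tar\.gz$", download_url)
    return match is not None and match.group(1) == version


# Values taken from the updated setup() call, plus a stale URL for contrast
ok = check_release_metadata(
    "v0.4.15",
    "https://github.com/mortazavilab/Topyfic/archive/refs/tags/v0.4.15.tar.gz",
)
stale = check_release_metadata(
    "v0.4.15",
    "https://github.com/mortazavilab/Topyfic/archive/refs/tags/v0.4.13.tar.gz",
)
print(ok, stale)  # True False
```

Running such a check before tagging would have flagged a `version` bump that forgot the matching `download_url` edit.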
@@ -2,10 +2,11 @@

This directory contains a Snakemake pipeline for running the Topyfic automatically.

The snakemake will run training and building model (topModel).
The snakemake will run training (Train) and building model (topModel, Analysis).

**Note**: Please make sure to install necessary packages and set up your Snakemake appropriately.
**Note**: pipeline is tested for >= 8.*

**Note**: pipeline is tested for Snakemake >= 8.X ([more info](https://snakemake.readthedocs.io/en/stable/index.html))

## Getting started

@@ -31,55 +32,64 @@ Modify the [config file](config/config.yaml) or create a new one with the same s

3. **n_topics**
- Contains list of number of initial topics you wish to train model base on them
- list of int: [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
- list of int: `[5, 10, 15, 20, 25, 30, 35, 40, 45, 50]`

4. **organism**
- Indicate spices which will be used for downstream analysis
- Example: human or mouse

5. **train**
5. **workdir**
- Directory to put the outputs
- Make sure to have write access.
- It will create one folder per dataset.

6. **train**
- most of the item is an input of `train_model()`
- n_runs: number of run to define rLDA model (default: 100)
- random_states: list of random state, we used to run LDA models (default: range(n_runs))
- workdir: directory of train outputs. make sure to have write access.

6. **top_model**
- workdir: directory of topModel outputs. make sure to have write access.
7. **top_model**
- n_top_genes (int): Number of highly-variable genes to keep (default: 50)
- resolution (int): A parameter value controlling the coarseness of the clustering. Higher values lead to more clusters. (default: 1)
- max_iter_harmony (int): Number of iteration for running harmony (default: 10)
- min_cell_participation (float): Minimum cell participation across for each topic to keep them, when is `None`, it will keep topics with cell participation more than 1% of #cells (#cells / 100)

7. **merging**
- workdir: directory of merged outputs. make sure to have write access.
- only if you have multiple adata input
8. **merge**
- Indicate if you want to also get a model for all data together.


### 3. Run snakemake

First run it with -n to make sure the steps that it plans to run are reasonable.
After it finishes, run the same command without the -n option.
First run it with `-n` to make sure the steps that it plans to run are reasonable.
After it finishes, run the same command without the `-n` option.

`snakemake -n`

for slurm:
For SLURM:

```
snakemake \
-j 200 \
--latency-wait 120 \
-j 1000 \
--latency-wait 300 \
--use-conda \
--rerun-triggers mtime \
--executor cluster-generic \
--cluster-generic-submit-cmd \
"sbatch -A model-ad_lab \
--partition=standard \
--partition=highmem \
--cpus-per-task 16 \
[email protected] \
--mail-type=START,END,FAIL \
--time=72:00:00" \
-n
-n \
-p \
--verbose
```
highmem
standard

Development hints: If you ran to any error `-p --verbose` would give you more detail about each run and will help you to debug your code.


### 4. Further downstream analysis

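The numbered config items above, together with the keys the rules read (`names`, `count_adata`, `n_topics`, `workdir`, `train`), can be sketched as a plain Python dict with the README's documented defaults filled in. This is an illustration under assumptions: `fill_config_defaults` and `min_cell_participation` are hypothetical helpers, the dataset name and `.h5ad` path are made up, and treating `merge` as a boolean is a guess from its description.

```python
from typing import Any, Optional


def fill_config_defaults(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with the README's documented defaults filled in.

    Hypothetical helper for illustration; the pipeline itself may
    handle defaults differently.
    """
    cfg = dict(config)

    train = dict(cfg.get("train") or {})
    train.setdefault("n_runs", 100)
    # README: random_states defaults to range(n_runs)
    train.setdefault("random_states", list(range(train["n_runs"])))
    cfg["train"] = train

    top = dict(cfg.get("top_model") or {})
    top.setdefault("n_top_genes", 50)
    top.setdefault("resolution", 1)
    top.setdefault("max_iter_harmony", 10)
    top.setdefault("min_cell_participation", None)
    cfg["top_model"] = top
    return cfg


def min_cell_participation(value: Optional[float], n_cells: int) -> float:
    """None means: keep topics used by more than 1% of cells (#cells / 100)."""
    return n_cells / 100 if value is None else value


config = fill_config_defaults({
    "names": ["sample1"],                             # made-up dataset name
    "count_adata": {"sample1": "data/sample1.h5ad"},  # made-up path
    "n_topics": [5, 10],
    "organism": "human",
    "workdir": "results",                             # one subfolder per dataset
    "train": {"n_runs": 3},
    "merge": False,                                   # assumed to be a boolean
})
print(config["train"]["random_states"])    # [0, 1, 2]
print(min_cell_participation(None, 5000))  # 50.0
```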
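The SLURM example passes the entire `sbatch` line as a single quoted string to `--cluster-generic-submit-cmd`, an option that in Snakemake 8 comes from the separately installed `snakemake-executor-plugin-cluster-generic` package. A sketch of assembling the same argument list in Python, so the partition (`highmem` or `standard`) or time limit can be changed in one place; `build_snakemake_cmd` is hypothetical, the `--mail-*` options are left out, and the account and defaults are copied from the README example:

```python
import shlex


def build_snakemake_cmd(jobs: int = 1000, latency_wait: int = 300,
                        partition: str = "highmem", cpus: int = 16,
                        time: str = "72:00:00", account: str = "model-ad_lab",
                        dry_run: bool = True, debug: bool = True) -> list[str]:
    """Hypothetical builder mirroring the README's SLURM invocation."""
    # The whole sbatch line travels as ONE argument, like the quoted shell string.
    sbatch = shlex.join([
        "sbatch", "-A", account,
        f"--partition={partition}",
        "--cpus-per-task", str(cpus),
        f"--time={time}",
    ])
    cmd = [
        "snakemake",
        "-j", str(jobs),
        "--latency-wait", str(latency_wait),
        "--use-conda",
        "--rerun-triggers", "mtime",
        "--executor", "cluster-generic",
        "--cluster-generic-submit-cmd", sbatch,
    ]
    if dry_run:
        cmd.append("-n")            # plan only, run nothing
    if debug:
        cmd += ["-p", "--verbose"]  # the debugging flags from the hint
    return cmd


cmd = build_snakemake_cmd()
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would start the same dry run as the shell version; `dry_run=False` corresponds to dropping `-n`.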
@@ -1,36 +1,38 @@
from scripts import make_train

configfile: 'config/config.yaml'

# Rule to run make_single_train_model
rule run_single_train:
output:
f"{config['train']['workdir']}/train_{{name}}_{{n_topic}}_{{random_state}}.p",
f"{config['workdir']}/{{name}}/{{n_topic}}/train/train_{{name}}_{{n_topic}}_{{random_state}}.p",
params:
name=lambda wildcards: config['names'],
n_topic=lambda wildcards: config['n_topics'],
random_state=lambda wildcards: config["train"]["random_states"]
name=lambda wildcards: wildcards.name,
n_topic=lambda wildcards: wildcards.n_topic,
random_state=lambda wildcards: wildcards.random_state
run:
make_train.make_single_train_model(name=params.name,
adata_path=config['count_adata'][params.name],
k=params.n_topic,
random_state=params.random_state,
train_output=config['train']['workdir'])
k=int(params.n_topic),
random_state=int(params.random_state),
train_output=f"{config['workdir']}/{params.name}/{params.n_topic}/train/")

# Rule to run make_train_model
rule run_train_model:
input:
expand(f"{config['train']['workdir']}/train_{{name}}_{{n_topic}}_{{random_state}}.p",
expand(f"{config['workdir']}/{{name}}/{{n_topic}}/train/train_{{name}}_{{n_topic}}_{{random_state}}.p",
name=config["names"],
n_topic=config["n_topics"],
random_state=config["train"]["random_states"])
output:
f"{config['train']['workdir']}/train_{{name}}_{{n_topic}}.p",
f"{config['workdir']}/{{name}}/{{n_topic}}/train/train_{{name}}_{{n_topic}}.p",
params:
name=lambda wildcards: config['names'],
n_topic=lambda wildcards: config['n_topics'],
name=lambda wildcards: wildcards.name,
n_topic=lambda wildcards: wildcards.n_topic,
run:
make_train.make_train_model(name=params.name,
adata_path=config['count_adata'][params.name],
k=params.n_topic,
k=int(params.n_topic),
n_runs=config['train']['n_runs'],
random_state=config['train']['random_states'],
train_output=config['train']['workdir'])
train_output=f"{config['workdir']}/{params.name}/{params.n_topic}/train/")
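In `run_train_model`, `expand()` fills the `input:` pattern with every combination of `names`, `n_topics`, and `train.random_states` (Snakemake's default combinator is a Cartesian product). The sketch below reproduces that expansion with `itertools.product` rather than Snakemake itself, to show the per-dataset paths the new `{workdir}/{name}/{n_topic}/train/` layout yields; `expand_train_paths` is hypothetical and the config values are invented for illustration:

```python
from itertools import product


def expand_train_paths(workdir, names, n_topics, random_states):
    """Mimic Snakemake's default expand(): one path per combination."""
    pattern = ("{workdir}/{name}/{n_topic}/train/"
               "train_{name}_{n_topic}_{random_state}.p")
    return [
        pattern.format(workdir=workdir, name=n, n_topic=k, random_state=r)
        for n, k, r in product(names, n_topics, random_states)
    ]


paths = expand_train_paths("results", ["sample1"], [5, 10], range(2))
for p in paths:
    print(p)
```

As written, that `input:` expands over all configured names and topic counts instead of the rule's own `{name}` and `{n_topic}` wildcards, so every aggregate `train_{name}_{n_topic}.p` target waits for all single runs, not only its own.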
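Arguably the key fix in the rules file is the `params` change: the old lambdas returned the whole configured list (`config['names']`, `config['n_topics']`, ...) for every job, while `wildcards.name` and the other `wildcards.*` attributes give the value for the job at hand. Wildcard values are cut out of the matched file path as strings, which is why `k` and `random_state` are now wrapped in `int(...)`. A sketch of that parsing step, using a regular expression as a simplified stand-in for Snakemake's own wildcard matching; `parse_wildcards` and the example path are hypothetical:

```python
import re

# Simplified stand-in for matching an output path against the rule's
# pattern; each named group plays the role of one wildcard.
PATTERN = re.compile(
    r"^(?P<workdir>.+)/(?P<name>[^/]+)/(?P<n_topic>[^/]+)/train/"
    r"train_(?P=name)_(?P=n_topic)_(?P<random_state>[^/_]+)\.p$"
)


def parse_wildcards(path: str) -> dict[str, str]:
    """Return the wildcard values (all strings) encoded in a train path."""
    m = PATTERN.match(path)
    if m is None:
        raise ValueError(f"path does not match the train pattern: {path}")
    return m.groupdict()


wc = parse_wildcards("results/sample1/10/train/train_sample1_10_3.p")
print(wc)  # every value is a str, including n_topic and random_state
# Mirrors k=int(params.n_topic) and random_state=int(params.random_state)
k, random_state = int(wc["n_topic"]), int(wc["random_state"])
```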