Skip to content

Commit

Permalink
init
Browse files Browse the repository at this point in the history
  • Loading branch information
chasemc committed Jul 18, 2024
1 parent 98d34ec commit 3586274
Show file tree
Hide file tree
Showing 3 changed files with 395 additions and 302 deletions.
33 changes: 33 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,36 @@
Interactive version of the atlas is available at:

https://bgcatlas.pages.dev



## Search Parameters

SocialGene search parameters used:

```python
search_bgc(
input=gbk_path,
hmm_dir=hmm_dir,
outpath_clinker=outpath_clinker,
use_neo4j_precalc=True,
assemblies_must_have_x_matches=0.4,
nucleotide_sequences_must_have_x_matches=0.4,
gene_clusters_must_have_x_matches=0.4,
break_bgc_on_gap_of=10000,
target_bgc_padding=10000,
max_domains_per_protein=3,
max_outdegree=300000,
max_query_proteins=10,
scatter=True,
locus_tag_bypass_list=None,
protein_id_bypass_list=None,
only_culture_collection=True,
frac=0.75,
run_async=True,
analyze_with="blastp",
blast_speed='ultra-sensitive',
)
```

Plots with multiple contigs do not display well in clustermap without modifying locus coordinates. The `./only_display_best_hit_per_assembly.py` script was used to reprocess the search results to create clustermap plots where only the best single search result per assembly is shown. The script also limits the total number of genomes to 100 or less for performance reasons.
Loading

0 comments on commit 3586274

Please sign in to comment.