Scripting access of django databases #276

Merged
merged 40 commits into from
Aug 1, 2019
Changes from 35 commits
f7b6cd3
addblast.py compoleted
invalid-email-address Apr 17, 2019
b3fe6df
addorganism.py completed
invalid-email-address Apr 17, 2019
49f9768
split addblast to 3 steps and makeblastdb.pu completed
invalid-email-address Apr 17, 2019
7729f38
populate sequence functionality completed and fix addorganism script
invalid-email-address Apr 17, 2019
cd55449
addhmmer.py completed
invalid-email-address Apr 17, 2019
8051e72
fix some blank space after else:
invalid-email-address Apr 17, 2019
27f4765
fix the security vulnerabilities for django 1.11.15
invalid-email-address Apr 18, 2019
64f5dcc
fix the missing argument in handle() and add the functionality to hav…
invalid-email-address Apr 18, 2019
3696bc9
fix trailing white space and unuse module
invalid-email-address Apr 18, 2019
a341a7e
intergrate the similar funtion to add_func.py
invalid-email-address May 6, 2019
277ce81
upgrade django to 1.11.20
invalid-email-address May 7, 2019
509ccc0
fix trailing space and unused module
invalid-email-address May 7, 2019
83667db
fix blast trailing space
invalid-email-address May 7, 2019
b049fd3
fix the identical code in blast models.py and makeblastdb
invalid-email-address May 7, 2019
a2a1a01
remove identical code from populatesequence and makeblastdb
invalid-email-address May 7, 2019
80d4ed2
fix the issue of display_name function
invalid-email-address May 7, 2019
d278457
fix the similar code for get_path function
invalid-email-address May 7, 2019
c45b9a6
fix codacy/PR quality review
invalid-email-address May 7, 2019
bcb66ce
put get_type function into add_func.py and fix rest of trailing space…
invalid-email-address May 8, 2019
a6e97f8
move add_func.py to misc folder and move maek display_name func much …
invalid-email-address May 8, 2019
3133600
put all the function in addorganism to add_func and import in addhmmer
invalid-email-address May 8, 2019
dad1742
split get_type to three function
invalid-email-address May 8, 2019
d5b594f
cost down cognitive complexity
invalid-email-address May 8, 2019
d269d65
deal with the cognitive complexity of get_dataset
invalid-email-address May 9, 2019
7829b39
change the code in def get_type
invalid-email-address May 9, 2019
d6f096a
fix code in get_type
invalid-email-address May 9, 2019
05096f2
modify if else condition to expect case in get_type
invalid-email-address May 9, 2019
c8bffd6
combine makeblastdb and populatesequence
invalid-email-address May 9, 2019
732fa4c
complete the delete function
invalid-email-address May 14, 2019
ec0a42f
fix some trailing issue and refactor the structure of delete function
invalid-email-address May 15, 2019
4f2eddc
complete all of the functionality
invalid-email-address May 16, 2019
9af9af5
modify get_type() to decrease complexity
invalid-email-address May 20, 2019
c3868ea
modify addblast is_shown default setting and add is_shonw command
invalid-email-address May 20, 2019
c7d207f
split is_shown feature from blast_utility
invalid-email-address Jun 11, 2019
c54a6a7
Merge branch 'master' into addorg2
deming7h777 Jun 11, 2019
1750dd4
add __init__.py
invalid-email-address Jul 25, 2019
f92c8a7
remove useless line in add_func.py
invalid-email-address Jul 25, 2019
a70ff41
update django 1.11.22
invalid-email-address Jul 25, 2019
adac21e
add more __init__.py
invalid-email-address Jul 25, 2019
555ca93
comment unused module
invalid-email-address Jul 25, 2019
29 changes: 29 additions & 0 deletions app/management/commands/addorganism.py
@@ -0,0 +1,29 @@
from app.models import Organism
from django.core.management.base import BaseCommand
import django
from add_func import display_name, short_name, get_description, get_taxid

id_baseurl = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&retmode=json&term='
wiki_url1 = 'https://en.wikipedia.org/w/api.php?action=query&list=search&srprop=snippet&srlimit=1&format=json&srsearch='
wiki_url2 = 'https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=true&titles='

class Command(BaseCommand):

    def add_arguments(self,parser):
        parser.add_argument('Genus_Species',nargs='+',type=str)
        #parser.add_argument('Species',nargs='*',type=str)
        #parser.add_argument('Species2',nargs='?',type=str)

    def handle(self,*args,**options):

        name = display_name(options)
        shortname = short_name(name)
        url1 = wiki_url1 + name
        description = get_description(url1,wiki_url2)
        tax_id = get_taxid(id_baseurl,name)
        new_org = Organism(display_name=name, short_name=shortname, description=description, tax_id=tax_id)

        try:
            new_org.save()
            print("Successfully added to database")
        except django.db.utils.IntegrityError:
            print("adding to database failed, check if this organism is already in the database and try again")
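Note on the URL handling in `handle()`: the request URLs are built by plain string concatenation (`wiki_url1 + name`), and `name` contains a space. A minimal sketch of making the encoding explicit, assuming the same style of base URL ending in `&term=`. The helper name `esearch_url` is hypothetical and not part of this PR.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

# Same endpoint as id_baseurl in addorganism.py
ID_BASEURL = ('https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'
              '?db=taxonomy&retmode=json&term=')


def esearch_url(name, base=ID_BASEURL):
    """Append a percent-encoded organism name to a base URL ending in '='."""
    # quote() escapes spaces and reserved characters such as '&',
    # so the name cannot leak into a second query parameter.
    return base + quote(name)
```

With this sketch, `esearch_url('Apis mellifera')` ends in `term=Apis%20mellifera`.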
32 changes: 32 additions & 0 deletions blast/management/commands/addblast.py
@@ -0,0 +1,32 @@
from blast.models import BlastDb
from django.core.management.base import BaseCommand
#from app.models import Organism
#sys.path.append('genomics-workspace/app/management/commands/add_func.py')
from add_func import get_organism, display_name, get_path, get_type, get_molecule, get_dataset

class Command(BaseCommand):

    def add_arguments(self,parser):
        parser.add_argument('Genus_Species',nargs='+',type=str)
        parser.add_argument('-t','--type',nargs='+',type=str,help='please enter nucleotide or peptide, followed by Genome Assembly, Protein or Transcript')
        parser.add_argument('-f','--filename',nargs=1,type=str)

    def handle(self,*args,**options):

        name=display_name(options)
        organism = get_organism(name)
        if organism:#check whether the organism exists
            molecule2,molecule_str = get_molecule(options)
            dataset,dataset_str = get_dataset(options)
            blast_type = get_type(dataset,molecule2,molecule_str,dataset_str)
            title = options['filename'][0]
            fasta_file_path = get_path('blast',title)
            new_db = BlastDb(organism = organism, type = blast_type, fasta_file = fasta_file_path, title = title, description = '', is_shown = False )
            new_db.save()
            print("you can move on to the makeblastdb and populate sequence steps")
            #except django.db.utils.IntegrityError:
            #print("This database already exists")
            #sys.exit(0)
        else :
            pass
            #TODO can use subprocess lib here to add new organism
26 changes: 26 additions & 0 deletions blast/management/commands/blast_shown.py
@@ -0,0 +1,26 @@
from blast.models import BlastDb
from django.core.management.base import BaseCommand
import sys

class Command(BaseCommand):

    def add_arguments(self,parser):
        parser.add_argument('BlastDb', nargs='+', type=str, help='enter the blastdb name')
        parser.add_argument('--shown', nargs='*', help= 'show or hide a blastdb, ex: python manage.py blast_shown [xxx.fa] [xxx.fa] --shown true/false')

    def handle(self,*args,**options):

        n = 0
        titles = options['BlastDb']
        for title in titles:
            blast2 = BlastDb.objects.filter(title = title)
            n += 1
            if options['shown'][0] == 'true':
                blast2.update(is_shown = True)
            elif options['shown'][0] == 'false':
                blast2.update(is_shown = False)
            else:
                print("please choose --shown true or --shown false")
                sys.exit(0)
            print("%d species finished "%n)
        print("all done")
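Note on `--shown`: the command compares `options['shown'][0]` against the strings `'true'` and `'false'` by hand, and indexing fails if the flag is omitted. A minimal sketch of letting argparse do the validation with `choices` and `required`, shown on a plain `argparse.ArgumentParser` so it runs without Django. The parser passed to `add_arguments` is an argparse subclass, so the same `add_argument` call should carry over. `build_parser` and `parse_shown` are hypothetical names.

```python
import argparse


def build_parser():
    """Stand-in for the parser Django passes to add_arguments()."""
    parser = argparse.ArgumentParser(prog='blast_shown')
    parser.add_argument('BlastDb', nargs='+', type=str,
                        help='enter the blastdb name')
    # argparse rejects any other value and reports a missing flag itself
    parser.add_argument('--shown', choices=['true', 'false'], required=True,
                        help='whether the listed databases are shown')
    return parser


def parse_shown(argv):
    """Return (list of titles, is_shown as a bool) for an argument list."""
    options = vars(build_parser().parse_args(argv))
    return options['BlastDb'], options['shown'] == 'true'
```

The handler would then only need something like `BlastDb.objects.filter(title__in=titles).update(is_shown=is_shown)`.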
29 changes: 29 additions & 0 deletions blast/management/commands/blast_utility.py
@@ -0,0 +1,29 @@
from blast.models import BlastDb
from django.core.management.base import BaseCommand
import sys

class Command(BaseCommand):

    def add_arguments(self,parser):
        parser.add_argument('BlastDb', nargs='+', type=str, help='enter the blastdb name')
        parser.add_argument('-m','--makeblastdb', nargs='*', help = 'execute makeblastdb command on specific blastdb, ex: python manage.py blast_utility [xxx.fa] [xxx.fa] -m')
        parser.add_argument('-p','--populatesequence', nargs='*', help = 'populate specific blastdb, ex: python manage.py blast_utility [xxx.fa] [xxx.fa] -p')
        #parser.add_argument('--shown', nargs='*', help= 'make blastdb show or not ex: python manage.py blast_utility [xxx.fa] [xxx.fa] --shown true/false')

    def handle(self,*args,**options):

        n = 0
        titles = options['BlastDb']
        for title in titles:
            blast = BlastDb.objects.get(title = title)
            #print blast
            n += 1
            if options['makeblastdb'] == []:
                blast.makeblastdb()
            elif options['populatesequence'] == []:
                blast.index_fasta()
            else:
                print("please choose -m for makeblastdb or -p for populate sequence")
                sys.exit(0)
            print("%d species finished "%n)
        print("all done")
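Note on `-m` / `-p`: both flags are declared with `nargs='*'` and then detected by comparing against `[]`. A minimal sketch of the same choice expressed as boolean flags in a mutually exclusive group, again on a plain `argparse.ArgumentParser` so it runs standalone. `build_parser` and `choose_action` are hypothetical names. The returned strings mirror the `makeblastdb()` and `index_fasta()` methods called in the handler.

```python
import argparse


def build_parser():
    """Stand-in for the parser Django passes to add_arguments()."""
    parser = argparse.ArgumentParser(prog='blast_utility')
    parser.add_argument('BlastDb', nargs='+', type=str)
    # Exactly one of -m / -p must be given; argparse enforces this.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('-m', '--makeblastdb', action='store_true')
    group.add_argument('-p', '--populatesequence', action='store_true')
    return parser


def choose_action(argv):
    """Return (titles, name of the BlastDb method to call)."""
    options = vars(build_parser().parse_args(argv))
    method = 'makeblastdb' if options['makeblastdb'] else 'index_fasta'
    return options['BlastDb'], method
```

Because the flags take no values, `-m` can appear before or after the database names without swallowing them.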
30 changes: 30 additions & 0 deletions hmmer/management/commands/addhmmer.py
@@ -0,0 +1,30 @@
from hmmer.models import HmmerDB
from django.core.management.base import BaseCommand
#from app.models import Organism
#import django.db
from add_func import get_organism, display_name, get_path

class Command(BaseCommand):

    def add_arguments(self,parser):
        parser.add_argument('Genus_Species',nargs='+',type=str)
        parser.add_argument('-f','--filename',nargs=1,type=str)

    def handle(self,*args,**options):

        name=display_name(options)
        organism = get_organism(name)
        #print options
        if organism:#check whether the organism exists

            title = options['filename'][0]
            fasta_file_path = get_path('hmmer',title)
            new_db = HmmerDB(organism = organism, fasta_file = fasta_file_path, title = title, description = '', is_shown = True )
            new_db.save()
            print("Success")
            #except django.db.utils.IntegrityError:
            #print("This database already exists")
            #sys.exit(0)
        else :
            pass
            #can use subprocess lib here to add new organism
160 changes: 160 additions & 0 deletions misc/add_func.py
@@ -0,0 +1,160 @@
from blast.models import BlastDb, SequenceType
#from django.core.management.base import BaseCommand, CommandError
Member

Remove unused import directly rather than commenting it out.

Contributor Author

got it

from app.models import Organism
import os
import sys
import requests
from hmmer.models import HmmerDB


def display_name(options):
    try:
        base_organism = options['Genus_Species'][0].lower().capitalize() + ' ' + options['Genus_Species'][1].lower()
    except TypeError:
        return 0
    if len(options['Genus_Species']) == 3:
        display_name = base_organism + ' '+ options['Genus_Species'][2].lower()
        return display_name

    else:
        display_name = base_organism
        return display_name
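
Note on `display_name`: it indexes `options['Genus_Species'][1]` directly, so a single-word name raises `IndexError`, which the `except TypeError` clause does not catch, and any words past the third are silently dropped. A minimal sketch of the same normalization as a pure function over the word list, assuming a name must have two or three words. `format_display_name` is a hypothetical name, not part of this PR.

```python
def format_display_name(words):
    """Normalize ['APIS', 'Mellifera'] style input to 'Apis mellifera'."""
    if not 2 <= len(words) <= 3:
        # Assumption: genus + species, optionally followed by a subspecies
        raise ValueError('expected 2 or 3 words, got %d' % len(words))
    genus = words[0].lower().capitalize()
    rest = [word.lower() for word in words[1:]]
    return ' '.join([genus] + rest)
```

A caller could pass `options['Genus_Species']` and turn the `ValueError` into a user-facing message.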

def get_organism(display_name):

    try:
        return Organism.objects.get(display_name = display_name)
    except Organism.DoesNotExist:
        #.get() raises DoesNotExist instead of returning a falsy value
        print("check your organism name again; if it still fails, check your organism database")
        sys.exit(0)

def get_path(app_name,title):
    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    if app_name == 'blast':
        path = os.path.join('blast/db',title)
    else:
        path = os.path.join('hmmer/db',title)

    a=os.path.join(base_dir,'media',path)
    check = os.path.isfile(a)
    if check:
        return path
    else:
        print("No fasta file in media/blast/db or media/hmmer/db")
        sys.exit(0)
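
Note on `get_path`: the lookup and the exit are tied together, which makes the function hard to exercise without a real `media` directory. A minimal sketch of the lookup alone, assuming the `media/<app>/db/<title>` layout used above. `find_fasta` and its `media_root` parameter are hypothetical.

```python
import os


def find_fasta(media_root, app_name, title):
    """Return the path of `title` relative to media_root, or None if absent."""
    relative = os.path.join(app_name, 'db', title)
    if os.path.isfile(os.path.join(media_root, relative)):
        return relative
    return None
```

The caller decides what to do with `None`, for example printing the message above and exiting.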

def short_name(name):
    short_name = name.split(' ')
    short_name1 = short_name[0][0:3]
    short_name2 = short_name[1][0:3]
    short_name = short_name1 + short_name2
    return short_name

def get_molecule(options):
    try:
        molecule = options['type'][0].lower() #get molecule_type from command line
        if molecule == 'peptide': #change the name to prot or nucl
            molecule2 = 'prot'
        elif molecule == 'nucleotide':
            molecule2 = 'nucl'
        else:
            print("please enter the correct molecule_type, must be nucleotide or peptide")
            sys.exit(0)
    except Exception :
        print("please supply both the '-t' and '-f' arguments")
        sys.exit(0)
    molecule_type = SequenceType.objects.filter(molecule_type = molecule2) #get the data from molecule_type field
    a = molecule_type[0]
    molecule_str = a.molecule_type
    return molecule2,molecule_str

def get_dataset(options):

    dataset = options['type'][1].lower().capitalize()
    if dataset =='Genome':
        dataset = dataset + ' ' + options['type'][2].lower().capitalize()
    elif dataset == 'Transcript':
        pass
    elif dataset == 'Protein':
        pass
    else:
        print('enter the correct dataset type')
        sys.exit(0)
    dataset_type = SequenceType.objects.filter(dataset_type = dataset)
    b = dataset_type[0]
    dataset_str = str(b.dataset_type)
    return dataset,dataset_str

def get_type(dataset,molecule2,molecule_str,dataset_str): #get the sequence type from the SequenceType table

    if molecule2 != molecule_str :
        print("something wrong with molecule")
    elif dataset != dataset_str :
        print("something wrong with dataset")
    else:
        try:
            dataset_type = SequenceType.objects.filter(molecule_type = molecule2, dataset_type = dataset)
            return dataset_type[0]
        except IndexError:
            print("there is no {molecule} - {dataset} combination in the database".format(molecule=molecule2.capitalize(),dataset=dataset_str))
            sys.exit(0)

def get_description(url1,wiki_url2):
    try:
        re1 = requests.get(url1)
        data1 = re1.json()
        try:
            title = data1['query']['search'][0]['title']
            url2 = wiki_url2 + title
            re2 = requests.get(url2)
            data2 = re2.json()
            key = data1['query']['search'][0]['pageid']
            key = str(key)
            #print type(key)
Member

remove debug purpose code will be great.

Contributor Author

got it

            description = data2['query']['pages'][key]['extract']
            #print description
            return description
        except IndexError:
            print("check your organism name again")
            sys.exit(0)
    except requests.exceptions.ConnectionError:
        print("check your internet connection")
        sys.exit(0)

def get_taxid(id_baseurl,name):
    try:
        url = id_baseurl+ name
        re = requests.get(url)
        data = re.json()
        tax_id = data['esearchresult']['idlist'][0]
        tax_id = int(tax_id)
        return tax_id
    except IndexError:
        print("make sure your name is completed and correct")
        sys.exit(0)
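
Note on `get_taxid`: the network call and the JSON lookup sit in one `try` block, so only an empty `idlist` is handled. A minimal sketch of the lookup as a separate pure function, assuming the `esearchresult` / `idlist` layout the code above already relies on. `first_taxid` is a hypothetical name, and the dictionary in the usage line is a hand-written sample rather than a recorded response.

```python
def first_taxid(data):
    """Return the first taxonomy ID in an esearch JSON payload, or None."""
    idlist = data.get('esearchresult', {}).get('idlist', [])
    if not idlist:
        return None
    # IDs arrive as strings in the JSON payload
    return int(idlist[0])
```

For example, `first_taxid({'esearchresult': {'idlist': ['7460']}})` returns `7460`.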

def delete_org(name):
    #if options["organism"]:
Member

Remove unused lines here.

Contributor Author

gottcha

    #for organism in options["organism"]:
        #organism = options["organism"][0].lower().capitalize() + " " + options["organism"][1].lower()
    Organism.objects.filter(display_name = name).delete()
    return ("remove %s in database"%name)
'''
Member

Don't include these commented out lines. It seems to me that these lines are for initial development, right ?

Contributor Author

ok, good point

def delete(db, dbname):
    tmp=[]
    if db[0]=='all':
        if dbname=='blast':
            BlastDb.objects.all().delete()
            print("remove all data in blast")
        else:
            HmmerDB.objects.all().delete()
            print("remove all data in hmmer")
    else:
        for name in db :
            if dbname=='blast':
                BlastDb.objects.filter(title = name).delete()
            else:
                HmmerDB.objects.filter(title = name).delete()
            tmp.append(name)
        return "remove %s "%tmp
'''