has human check #3284
Changes from all commits
d5f846c
8d2cf5e
920d171
03429ca
a8b92bd
fefb57b
ad459a3
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -43,6 +43,7 @@ class Artifact(qdb.base.QiitaObject):
     prep_template
     ebi_run_accession
     study
+    has_human

     Methods
     -------
@@ -1550,6 +1551,27 @@ def being_deleted_by(self):
             res = qdb.sql_connection.TRN.execute_fetchindex()
         return qdb.processing_job.ProcessingJob(res[0][0]) if res else None

+    @property
+    def has_human(self):
+        has_human = False
+        if self.artifact_type == 'per_sample_FASTQ':
+            st = self.study.sample_template
+            if 'env_package' in st.categories:

I worry about whether this is stringent enough. Data that users upload, or that we import from EBI, is not assured to have the [...]. One thing we could do is test multiple variables, such as also including [...].

Good question! As background, around 4 years ago we introduced restrictions on changing artifacts from sandbox to private/public, which means that all studies that have changed level since then should have env_package. That doesn't mean this is the case for older studies; however, all metagenomic data has been added in more recent years. Anyway, to make this more data-oriented, we currently have: Counter({'public': 695, 'private': 153, 'sandbox': 3254}), and checking study status against the existence of env_package & host_taxid, we have: [...] What do you think?

Are any of the public studies that lack env_package ones we should be concerned about? Are there other variables those studies have that could be used?

IMO yes, but I think we should add that column to the studies rather than trying to use another column. I'll send an email to the qiita.admins with the studies.

+                sql = f"""SELECT DISTINCT sample_values->>'env_package'
+                          FROM qiita.sample_{st.id} WHERE sample_id in (
+                              SELECT sample_id from qiita.preparation_artifact
+                              LEFT JOIN qiita.prep_template_sample USING (
+                                  prep_template_id)
+                              WHERE artifact_id = {self.id})"""
+                with qdb.sql_connection.TRN:
+                    qdb.sql_connection.TRN.add(sql)
+                    for v in qdb.sql_connection.TRN.execute_fetchflatten():
+                        if v.startswith('human-'):
+                            has_human = True
+                            break
+
+        return has_human
+
     def jobs(self, cmd=None, status=None, show_hidden=False):
         """Jobs that used this artifact as input

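The prefix test at the end of `has_human` can be looked at in isolation. The following is a minimal sketch, not the PR's code: the helper name `any_human_env_package` is hypothetical, and it assumes the query may yield `None` (SQL NULL) for samples that have no `env_package` value, in which case the sketch skips the entry instead of calling `startswith` on it.

```python
from typing import Iterable, Optional


def any_human_env_package(values: Iterable[Optional[str]]) -> bool:
    """Return True if any env_package value starts with 'human-'.

    Hypothetical helper: `values` stands in for the distinct env_package
    values fetched for one artifact. None entries (assumed here to be SQL
    NULLs) are skipped rather than raising AttributeError.
    """
    return any(v is not None and v.startswith('human-') for v in values)


if __name__ == '__main__':
    print(any_human_env_package(['soil', 'human-gut']))
    print(any_human_env_package(['soil', None]))
```

Whether NULL values can actually reach this loop depends on the sample tables, which this page does not show; the guard is cheap either way.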
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -235,7 +235,7 @@ def get(self, study_id):
             (self.current_user in study.shared_with)))

         for a in study.artifacts(artifact_type='BIOM'):
-            if full_access or a.visibility == 'public':
+            if full_access or (a.visibility == 'public' and not a.has_human):
                 to_download.extend(self._list_artifact_files_nginx(a))

         self._write_nginx_file_list(to_download)
@@ -289,7 +289,7 @@ def get(self, study_id):
         to_download = []
         for a in study.artifacts():
             if not a.parents:
-                if not is_owner and a.visibility != 'public':
+                if not is_owner and (a.visibility != 'public' or a.has_human):
                     continue
                 to_download.extend(self._list_artifact_files_nginx(a))

@@ -460,7 +460,7 @@ def get(self):
                         artifacts = study.artifacts(
                             dtype=data_type, artifact_type='BIOM')
                     for a in artifacts:
-                        if a.visibility != 'public':
+                        if a.visibility != 'public' or a.has_human:
                             continue
                         to_download.extend(self._list_artifact_files_nginx(a))

@@ -498,6 +498,10 @@ def get(self):
                     raise HTTPError(404, reason='Artifact is not public. If '
                                     'this is a mistake contact: '
                                     '[email protected]')
+                elif artifact.has_human:
+                    raise HTTPError(404, reason='Artifact has possible human '
+                                    'sequences. If this is a mistake contact: '
+                                    '[email protected]')
                 else:
                     to_download = self._list_artifact_files_nginx(artifact)
                     if not to_download:
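The four handler changes above apply the same rule in different spellings: a caller with elevated access (`full_access` or `is_owner` in the hunks) gets the files, and everyone else gets them only when the artifact is public and not flagged by `has_human`. A minimal sketch of that rule follows; the function name `is_downloadable` and the `privileged` parameter are hypothetical and only illustrate the boolean logic, they are not part of the handlers.

```python
def is_downloadable(visibility: str, has_human: bool,
                    privileged: bool = False) -> bool:
    """Decide whether an artifact's files may be listed for download.

    Hypothetical helper mirroring the conditions in the hunks above:
    - privileged callers always get the files;
    - otherwise the artifact must be public and not flagged as human.
    """
    if privileged:
        return True
    return visibility == 'public' and not has_human


def is_skipped(visibility: str, has_human: bool,
               privileged: bool = False) -> bool:
    """The 'continue' form used in two of the hunks (De Morgan dual)."""
    return not privileged and (visibility != 'public' or has_human)
```

The `continue` conditions in the second and third hunks are the negation of the positive test in the first one, so `is_skipped(...)` should always equal `not is_downloadable(...)`.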
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -102,7 +102,7 @@
       scripts=glob('scripts/*'),
       # making sure that numpy is installed before biom
       setup_requires=['numpy', 'cython'],
-      install_requires=['psycopg2', 'click', 'bcrypt', 'pandas',
+      install_requires=['psycopg2', 'click', 'bcrypt', 'pandas<2.0',

Why the pin?

Because pandas>2.0 breaks the tests. I saw that this was failing: https://github.com/qiita-spots/qiita/actions/runs/4940536571/jobs/8832562245 so I installed the latest pandas version on my local computer and got a lot of errors; then I installed 'pandas<2.0' and everything worked fine again.

Okay, makes sense to punt. Some of the types of changes needed are outlined here: scikit-bio/scikit-bio#1851

Added #3289 so we don't forget.

                        'biom-format', 'tornado<6.0', 'toredis', 'redis',
                        'scp', 'pyparsing', 'h5py', 'natsort', 'nose', 'pep8',
                        'networkx', 'humanize', 'wtforms<3.0.0', 'nltk',
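To make the effect of the `pandas<2.0` pin concrete, here is a simplified sketch of how a strict upper bound excludes the whole 2.x line, including 2.0 itself. The helper `satisfies_upper_bound` is hypothetical and handles only plain dotted numeric versions; real installers resolve specifiers according to PEP 440, which also covers pre-releases and other forms this sketch ignores.

```python
def satisfies_upper_bound(version: str, upper: str = '2.0') -> bool:
    """Return True if `version` is strictly below `upper`.

    Simplified stand-in for a '<upper' requirement: both arguments must be
    plain dotted integers such as '1.5.3'. Shorter versions are padded with
    zeros so that '2.0' and '2.0.0' compare as equal.
    """
    def parse(text):
        return tuple(int(part) for part in text.split('.'))

    v, u = parse(version), parse(upper)
    width = max(len(v), len(u))
    v += (0,) * (width - len(v))
    u += (0,) * (width - len(u))
    return v < u


if __name__ == '__main__':
    for candidate in ('1.5.3', '2.0', '2.0.1'):
        print(candidate, satisfies_upper_bound(candidate))
```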
While working on something else, I realized that this documentation was wrong.