Software
Development of tools and applications for Linked Open Data
- Quality assessment of SPARQL endpoints, RDF data and triple stores (cf. yummydata https://github.com/dbcls/bh14/wiki/Yummydata ) [Atsuko(interested)]
- Automation of utilization, documentation and visualization of RDF data [Atsuko(interested), Kouji(Interested), Yas]
- SPARQL Builder (Kouji, Yasunori, Atsuko) (see also YASGUI, yasgui.org)
- Improvement of the current version (see http://sparqlbuilder.org/) in collaboration with other groups (such as Automation of utilization, Quality assessment, Federated query)
- Discussion on relationships between SPARQL Builder metadata, ServiceDescription and VoID
- How to describe relationships of metadata between datasets in the same SPARQL endpoint
- target endpoints (requests)
- For wrap up: https://github.com/dbcls/bh15/wiki/SPARQLBuilder
- Schema Salad (Peter)
- Federated query via SPARQL endpoints (Hongyan, Atsuko, Jin-Dong, Kouji) (Kieron for Ensembl + PubMed)
- Text search with triple store
- Embedding ElasticSearch functions in RDF store to enhance query function and performance, such as autocompletion and so on (G. Fu)
- the BioVirtuoso Docker data containers (HiroMishima, https://github.com/misshie/bio-virtuoso )[T Nakazato(interested)]
- Bio2RDF - MichelD
- OrphaData, better RDFized HPO Annotation
- Common Workflow Language, portable workflows, container standards (Peter, Colin, Tazro)
- CWL tutorial
- wrapping tools and writing workflows using CWL
- containers, runtime configuration
- writing CWL implementations (JS, Ruby, Java?)
- tool & workflow registries, workflow metadata ontologies (Dublin core, EDAM, DOAP)
- visualisation
- software discovery
- RDF/SPARQL explorer and visualizer (Naoki)
- SMART API - semantic annotation of web APIs http://smartapi.info - (MichelD) [Nick interested]
- develop documentation
- annotate biohack15 APIs
Semantic Wetlab (Erick Antezana, Alexander Garcia, Tazro Ohta, Jean-Luc Perret)
- Ontologies for representing investigations [OBI?][SIO?][EDAM][ISA]
- experimental design
- rdf specification for workflows
- tools for designing, planning and running experiments (what a good tool should look like, use cases)
- Report BH 2015
Reproducible Software and data deployment (Pjotr Prins)
- GUIX for databases (Jerven Bolleman, Raoul)
- Software discovery (Pjotr, Raoul)
- Ruby biogem support (Pjotr, Raoul, Naohisa)
- GUIX for UniProt RDF releases (Jerven)
- GUIX UniProt virtuoso local builds
- With minimal auto tuning for memory
- Reproducible software and data with Dat, hyperos.io (Bruno, Tazro)
- Bionode pipeline inside, visualization with BioJS, nyaplot, D3
- Visualisation
- D3 visualisation work group (Toshiaki, Naoki, Pjotr, Peter, Bruno, MichelD)
- Open-Bio
- BioRuby (Naohisa Goto)
- Sequence metadata RDF conversion (consider using RDF.rb --Arto Bendiken)
- Revisit the Bio::Sequence internal data structure for better RDF conversion
- https://github.com/dbcls/rdfsummit/tree/master/insdc2ttl (Toshiaki Katayama, Takatomo Fujisawa)
- UniProt RDF module (alternative to SPTRParser)
- A JSON-LD based prototype, https://github.com/nakao/uniprot-rdf-module (Mitsu)
- Biogems
- biogem simplification
- Switch from jeweler to bundler
- bio-keggapi (for REST API, SOAP is deprecated now)
- Split independent functional components from bioruby core to biogems
- Components having external dependencies
- bioruby-phyloxml ?
- Based on https://github.com/csw/bioruby-phyloxml
- Adding support for Nokogiri (XML parser) based on https://github.com/csw/bioruby/tree/phyloxml-nokogiri
- bioruby-biosql ?
- Crick-chan - a question answering system (Kazuharu Arakawa, Kotone Itaya)
Participants: Hiro Mishima, Jeremy Nguyen Xuan, Tudor Grosa
See BioVirtuoso
Participants: ...
- CWL
- Tutorial given by Peter Amstutz. Attended by Benedict Paten, Bruno Vieira, Alex Garcia, Tazro Ohta, others++
- Discussion of how to combine CWL, Docker Hub, Elixir Tool Registry to provide a central repository for bioinformatics tools that can be directly downloaded and executed in workflows (no installation needed)
- Annotating CWL files with metadata
- Ways of running Docker when the IT staff doesn't want to run Docker (solution: run Docker inside VM)
- Rebasing CWL draft 3 on Schema Salad to support linked data annotations
Participants: Atsuko, Kouji, Yasunori ...
- Redesign of SPARQL Builder Metadata (SBM)
- Set up a SPARQL endpoint for SBM, etc.
Participants: Atsuko, Kouji, Yasunori ...
- A new version of the SPARQL Builder Metadata (SBM) specification was released: http://www.sparqlbuilder.org/doc/sbm_2015sep/
- Thanks to Arto, Dydra can automatically generate the metadata.
- Crawling of LSDB archive RDF
- Problems were found, so we asked the author of the crawler to fix them.
- Docker with Guix package manager (Bruno, Pjotr, Raoul)
- Started installing CWL with Guix (Bruno, Pjotr)
- CWL Streaming interface (Peter, Bruno)
- Started playing with Bionode JavaScript and CWL streaming (Bruno)
- Federated query service: http://fsearch.dbcls.jp/bio2RDF/
Participants: Atsuko, Kouji, Yasunori ...
- Because any Dydra user can generate SB metadata for their DB, we started to develop an interface for uploading SB metadata.
- alpha version of SPARQL Builder for LSDB archive
- Bio::PhyloXML is moved to https://github.com/bioruby/bioruby-phyloxml (Naohisa)
- Bio::PhyloXML is now removed from BioRuby core (https://github.com/bioruby/bioruby)
- bio-phyloxml gem will be released soon (currently on hold because of gem upload privileges)
- BioSQL support will be moved to https://github.com/bioruby/bioruby-biosql
- Discussing how to improve BioRuby's Bio::Sequence and https://github.com/dbcls/rdfsummit/tree/master/insdc2ttl by using RDF.rb
Participants: Fu Gang, Jeremy
Inspired by Aber-owl: embedded DL query into SPARQL (http://aber-owl.net/aber-owl/sparql/).
Elasticsearch allows:
- synonym search (doc: https://www.elastic.co/guide/en/elasticsearch/guide/current/using-synonyms.html)
- spell correction (doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-term.html)
- phrase match (doc: https://www.elastic.co/guide/en/elasticsearch/guide/current/phrase-matching.html)
Experiment setup:
- Elasticsearch index of PubMed titles (data file 2.4G, index file 1.7G)
- Search phrases: "drug treat disease" returned 415 230 records; "gene mutation cause disease" returned 598 057 records.
- Embed the results into a SPARQL query using the 'values' keyword. Challenge: the query cannot accommodate too many records.
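One way to work around the record limit noted above is to split the search hits into batches and issue one SPARQL query per batch. The following is a minimal sketch of that idea, not the code used in the experiment. The IRI prefix, the triple pattern and the batch size are all assumptions made for illustration; the notes do not say which vocabulary or limits were involved.

```python
from typing import Iterator, List, Sequence

# Hypothetical IRI prefix: the notes do not state how PubMed records are named.
PUBMED_PREFIX = "http://example.org/pubmed/"


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` holding at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def values_query(ids: Sequence[str]) -> str:
    """Build one SPARQL query that restricts ?article to the given IDs."""
    iris = " ".join(f"<{PUBMED_PREFIX}{i}>" for i in ids)
    return (
        "SELECT ?article ?p ?o WHERE {\n"
        f"  VALUES ?article {{ {iris} }}\n"
        "  ?article ?p ?o .\n"  # placeholder pattern, replace with the real one
        "}"
    )


def batched_queries(ids: Sequence[str], batch_size: int = 1000) -> List[str]:
    """Return one query per batch so that no VALUES clause grows too large."""
    return [values_query(batch) for batch in chunked(ids, batch_size)]
```

Each returned string can be sent to the endpoint separately and the result sets merged on the client. A suitable `batch_size` depends on the triple store and would have to be found by experiment.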
Typical query performed in our group:
START disease = node:node_auto_index(iri={disease_id}) MATCH path = (disease)<-[:subClassOf*0..]-(diseaseSubclass)-[:hasPhenotype]->(phenotype) RETURN distinct phenotype
This query is expensive to run on big or deep ontologies.
Idea: index the subclasses and superclasses of every node in Elasticsearch, and delegate the expensive part of the query to Elasticsearch instead of performing it all in Cypher.
Experiment setup:
- Subset of ontologies used in the Monarch Initiative (2.1G, 74k nodes)
- Query runtime was cut in half.
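The indexing idea can be illustrated by precomputing, for each ontology node, the full set of its subclasses, so that the variable-length `subClassOf*0..` traversal becomes a single document lookup. The sketch below is an assumption about how such documents could be prepared, not the group's implementation. The field names `iri` and `subclasses` are hypothetical, and sending the documents to Elasticsearch is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def subclass_closure(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map every node to itself plus all of its direct and indirect subclasses.

    `edges` holds (child, parent) pairs, i.e. child subClassOf parent.
    Including the node itself mirrors the zero-length case of `*0..`.
    """
    children: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    for child, parent in edges:
        children[parent].add(child)
        nodes.update((child, parent))

    closure: Dict[str, Set[str]] = {}
    for node in nodes:
        seen = {node}
        stack = [node]
        while stack:  # iterative depth-first walk; `seen` also guards against cycles
            current = stack.pop()
            for child in children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        closure[node] = seen
    return closure


def to_index_docs(closure: Dict[str, Set[str]]) -> List[dict]:
    """Turn the closure into one flat document per node, ready for indexing."""
    return [
        {"iri": node, "subclasses": sorted(descendants)}
        for node, descendants in sorted(closure.items())
    ]
```

With documents of this shape, the subclasses of a disease can be fetched by its IRI in one lookup, and the graph query only needs to follow the `hasPhenotype` edges from that fixed set. A superclass index can be built the same way by swapping each (child, parent) pair.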
- Use Bionode for streamable workflow (Bruno)
- Reason: Node.js events and Streams are very flexible for scalable pipelines and workflows
- Use Guix for package management
- Use CWL for tools integration
- Reason: Integration between several bioinformatics tools, JSON stdin and stdout (easy to integrate with Bionode).
- Streamable CWL (Peter, Bruno)
- CWL Guix package (Pjotr, Bruno)
- Run Docker tarball with Dat/Hyperos (Bruno)
- Reason: Run tarball in non-Docker environments (e.g., HPC)
- Discussion around standards and distribution of containers for bioinformatics (Benedict, Peter, Bruno)
- Implemented CWL support within Toil (pip install toil), a scalable workflow execution engine (Peter, Benedict)