Software

Development of tools and applications for Linked Open Data

Quality assessment of SPARQL endpoints, RDF data and triple stores (cf. yummydata https://github.com/dbcls/bh14/wiki/Yummydata ) [Atsuko(interested)]
- Automation of utilization, documentation and visualization of RDF data [Atsuko(interested), Kouji(Interested), Yas]
SPARQL Builder (Kouji, Yasunori, Atsuko) (see also YASGUI, yasgui.org)
- Improvement of current version (see http://sparqlbuilder.org/) collaborating with other groups (such as Automation of utilization, Quality assessment, Federated query)
  - Discussion on relationships between SPARQL Builder metadata, ServiceDescription and VoID
  - How to describe relationships of metadata between datasets in the same SPARQL endpoint
- target endpoints (requests)
- For wrap up: https://github.com/dbcls/bh15/wiki/SPARQLBuilder
Schema Salad (Peter)
Federated query via SPARQL endpoints(Hongyan, Atsuko, Jin-Dong, Kouji) (Kieron for Ensembl + PubMed)
Text search with triple store
- Embedding Elasticsearch functions in an RDF store to enhance query functionality and performance, e.g. autocompletion (G. Fu)
the BioVirtuoso Docker data containers (HiroMishima, https://github.com/misshie/bio-virtuoso )[T Nakazato(interested)]
- Bio2RDF - MichelD
- OrphaData, better RDFized HPO Annotation
Common Workflow Language, portable workflows, container standards (Peter, Colin, Tazro)
- CWL tutorial
- wrapping tools and writing workflows using CWL
- containers, runtime configuration
- writing CWL implementations (JS, Ruby, Java?)
- tool & workflow registries, workflow metadata ontologies (Dublin core, EDAM, DOAP)
- visualisation
- software discovery
- RDF/SPARQL explorer and visualizer (Naoki)
SMART API - semantic annotation of web APIs http://smartapi.info - (MichelD) [Nick interested]
- develop documentation
- annotate biohack15 APIs
Semantic Wetlab (Erick Antezana, Alexander Garcia, Tazro Ohta, Jean-Luc Perret)
- Ontologies for representing investigations [OBI?][SIO?][EDAM][ISA]
  - experimental design
  - rdf specification for workflows
  - tools for designing, planning and running experiments (what a good tool should look like, use cases)
- Report BH 2015
Reproducible Software and data deployment (Pjotr Prins)
- GUIX for databases (Jerven Bolleman, Raoul)
- Software discovery (Pjotr, Raoul)
- Ruby biogem support (Pjotr, Raoul, Naohisa)
- GUIX for UniProt RDF releases (Jerven)
- GUIX UniProt virtuoso local builds
  - With minimal auto tuning for memory
- Reproducible software and data with Dat, hyperos.io (Bruno, Tazro)
  - Bionode pipeline inside, visualization with BioJS, nyaplot, D3
Visualisation
- D3 visualisation work group (Toshiaki, Naoki, Pjotr, Peter, Bruno, MichelD)
Open-Bio
- BioRuby (Naohisa Goto)
  - Sequence metadata RDF conversion (consider using RDF.rb --Arto Bendiken)
    - Revisit the Bio::Sequence internal data structure to make RDF conversion easier
    - https://github.com/dbcls/rdfsummit/tree/master/insdc2ttl (Toshiaki Katayama, Takatomo Fujisawa)
  - UniProt RDF module (alternative to SPTRParser)
    - A JSON-LD based prototype, https://github.com/nakao/uniprot-rdf-module (Mitsu)
  - Biogems
    - biogem simplification
      - Switch from jeweler to bundler
  - bio-keggapi (for REST API, SOAP is deprecated now)
  - Split independent functional components from bioruby core to biogems
    - Components having external dependencies
      - bioruby-phyloxml ?
        
        Based on https://github.com/csw/bioruby-phyloxml
        
        Adding support for Nokogiri (XML parser) based on https://github.com/csw/bioruby/tree/phyloxml-nokogiri
      - bioruby-biosql ?
Crick-chan - a question answering system (Kazuharu Arakawa, Kotone Itaya)
- Crick-chan

Day 1

Bio-virtuoso

Participants: Hiro Mishima, Jeremy Nguyen Xuan, Tudor Grosa

See BioVirtuoso

Day 2

project

Participants: ...

CWL
- Tutorial given by Peter Amstutz. Attended by Benedict Paten, Bruno Vieira, Alex Garcia, Tazro Ohta, others++
- Discussion of how to combine CWL, Docker Hub, Elixir Tool Registry to provide a central repository for bioinformatics tools that can be directly downloaded and executed in workflows (no installation needed)
- Annotating CWL files with metadata
- Ways of running Docker when the IT staff doesn't want to run Docker (solution: run Docker inside VM)
- Rebasing CWL draft 3 on Salad schema to support linked data annotations

SPARQL Builder

Participants: Atsuko, Kouji, Yasunori ...

Redesign of SPARQL Builder Metadata (SBM)
Set up a SPARQL endpoint for SBM, etc.

Day 3

SPARQL Builder

Participants: Atsuko, Kouji, Yasunori ...

A new version of the SPARQL Builder Metadata (SBM) specification was released: http://www.sparqlbuilder.org/doc/sbm_2015sep/
Thanks to Arto, Dydra can automatically generate the metadata.
Crawling of LSDB archive RDF
Because problems were found, we asked the author of the crawler to fix them.

CWL

Docker with Guix package manager (Bruno, Pjotr, Raoul)
- https://hub.docker.com/r/bmpvieira/guix/
- https://github.com/bmpvieira/Dockerfiles/blob/master/guix/Dockerfile
Started installing CWL with Guix (Bruno, Pjotr)
CWL Streaming interface (Peter, Bruno)
Started playing with Bionode JavaScript and CWL streaming (Bruno)

RDF::VCF

https://github.com/ruby-rdf/rdf-vcf

Day 4

Federated query service: http://fsearch.dbcls.jp/bio2RDF/

SPARQL Builder

Participants: Atsuko, Kouji, Yasunori ...

Because any Dydra user can generate SB metadata for their DB, we started to develop an interface for uploading SB metadata.
Alpha version of SPARQL Builder for LSDB archive
- http://www.sparqlbuilder.org/dba/

Day 5

BioRuby

Bio::PhyloXML is moved to https://github.com/bioruby/bioruby-phyloxml (Naohisa)
- Bio::PhyloXML is now removed from BioRuby core (https://github.com/bioruby/bioruby)
- bio-phyloxml gem will be released soon (currently on hold pending gem upload privileges)
BioSQL support will be moved to https://github.com/bioruby/bioruby-biosql
Discussed improving BioRuby's Bio::Sequence and https://github.com/dbcls/rdfsummit/tree/master/insdc2ttl by using RDF.rb

Day 6

Embedding Elasticsearch into SPARQL and Cypher

Participants: Fu Gang, Jeremy

Inspired by Aber-owl, which embeds DL queries into SPARQL (http://aber-owl.net/aber-owl/sparql/). Elasticsearch allows synonym search (doc: https://www.elastic.co/guide/en/elasticsearch/guide/current/using-synonyms.html), spell correction (doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-term.html), and phrase match (doc: https://www.elastic.co/guide/en/elasticsearch/guide/current/phrase-matching.html).

Experiment setup:

Elasticsearch index of PubMed title (data file 2.4G, index file 1.7G)
Search phrases: "drug treat disease" returned 415 230 records; "gene mutation cause disease" returned 598 057 records.
Embed the results into a SPARQL query using the VALUES keyword. Challenge: a VALUES clause cannot hold too many records.
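
One way to work around the record limit is to split the Elasticsearch hits into batches and issue one SPARQL query per batch. The following is a minimal sketch of that idea, not the code used in the experiment. The `dcterms:identifier` predicate, the batch size, and the function names `chunked` and `values_query` are all assumptions made for illustration.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def values_query(pmids: List[str]) -> str:
    """Build a SPARQL query restricted to the given PubMed IDs via VALUES.

    The graph pattern is hypothetical: it assumes each article carries its
    PubMed ID as a plain literal on dcterms:identifier.
    """
    for pmid in pmids:
        # Only digits are accepted, so no quoting or escaping is needed.
        if not pmid.isdigit():
            raise ValueError(f"not a PubMed ID: {pmid!r}")
    values = " ".join(f'"{pmid}"' for pmid in pmids)
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?article ?pmid WHERE {\n"
        f"  VALUES ?pmid {{ {values} }}\n"
        "  ?article dcterms:identifier ?pmid .\n"
        "}"
    )


# Example: five made-up hit IDs, two per query.
hits = ["101", "102", "103", "104", "105"]
queries = [values_query(batch) for batch in chunked(hits, 2)]
```

Each string in `queries` can then be sent to the endpoint separately and the result sets merged on the client side. A suitable batch size depends on the triple store and has to be found by trial.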

Improve performance of a query

Typical query performed in our group:

START disease = node:node_auto_index(iri={disease_id}) MATCH path = (disease)<-[:subClassOf*0..]-(diseaseSubclass)-[:hasPhenotype]->(phenotype) RETURN distinct phenotype

This query is expensive to run on big or deep ontologies.

Idea: Index in ElasticSearch the subclasses and superclasses of all the nodes, and delegate the expensive part of the query to ElasticSearch instead of performing it all in Cypher.

Experiment setup:

Subset of ontologies used in the Monarch Initiative (2.1G, 74k nodes)
Query runtime was cut in half.
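
The precomputation behind this idea can be sketched as follows: walk the subClassOf edges once, record for every class the set of itself plus all transitive subclasses (matching the `*0..` in the Cypher pattern), and store one document per class. This is an illustrative sketch only. The document fields `iri` and `subclasses` and the function names are assumptions, since the notes do not describe the actual index layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def descendants_index(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each class to itself plus all of its transitive subclasses.

    `edges` holds (child, parent) pairs, one per subClassOf relation.
    """
    children: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    for child, parent in edges:
        children[parent].add(child)
        nodes.update((child, parent))

    index: Dict[str, Set[str]] = {}
    for node in nodes:
        seen = {node}          # zero-length path: the class itself
        stack = [node]
        while stack:           # iterative DFS, also safe if the graph has cycles
            current = stack.pop()
            for child in children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        index[node] = seen
    return index


def to_documents(index: Dict[str, Set[str]]) -> List[dict]:
    """Turn the closure into one JSON-ready document per class."""
    return [
        {"iri": iri, "subclasses": sorted(subs)}
        for iri, subs in sorted(index.items())
    ]


# Example with a tiny made-up hierarchy: B and D under A, C under B.
edges = [("B", "A"), ("C", "B"), ("D", "A")]
docs = to_documents(descendants_index(edges))
```

With such documents indexed, a lookup by `iri` returns all subclasses in one request, and the graph query only has to follow the `hasPhenotype` edges from that list.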

VCF to RDF Mapping

Reproducible and distributable software and data

Use Bionode for streamable workflow (Bruno)
- Reason: Node.js events and Streams are very flexible for scalable pipelines and workflows
Use Guix for package management
- Reason: Reproducible software installation, dependency management, toolchain independent from OS.
- Docker container with Guix (Bruno, Pjotr, Raoul)
Use CWL for tools integration
- Reason: Integration between several bioinformatics tools, JSON stdin and stdout (easy to integrate with Bionode).
- Streamable CWL (Peter, Bruno)
- CWL Guix package (Pjotr, Bruno)
Run Docker tarball with Dat/Hyperos (Bruno)
- Reason: Run tarball in non-Docker environments (e.g., HPC)
Discussion around standards and distribution of containers for bioinformatics (Benedict, Peter, Bruno)
Implemented CWL support within Toil (pip install toil), a scalable workflow execution engine (Peter, Benedict)
