VCF to RDF Mapping
Arto Bendiken edited this page Sep 18, 2015
The team worked on an ontology mapping and supporting software to expose Variant Call Format (VCF) files as linked data. The goals were to facilitate offline batch conversion of VCF into various RDF formats and to enable online SPARQL query access directly against compressed VCF files.
- Worked on by Raoul, Arto, and Kieron, with some support from Pjotr and Jerven.
- Built on previous work at BioHackathon 2014 and RDF Summit by Raoul, Francesco, Kieron, James, Will and Simon.
- https://github.com/ruby-rdf/rdf-vcf (RDF::VCF plugin for RDF.rb)
  - Released on RubyGems, installable with jruby -S gem install rdf-vcf (note: requires JRuby 9.0+).
  - Includes a vcf2rdf program to transform VCF files into RDF as a batch job.
  - Implements an RDF.rb reader for VCF and BCF files, also supporting bgzipped and tabix-indexed files.
- https://github.com/helios/bio-sparql-otf (OTF: On-the-Fly conversion)
  - Implements a proof-of-concept-quality SPARQL backend for VCF files, based on RDF.rb and RDF::VCF.
  - Continued from BioHackathon 2014 work: https://github.com/dbcls/bh14/wiki/On-The-Fly-RDF-converter
- https://github.com/JervenBolleman/sparql-vcf (Java+Sesame)
  - Implements a proof-of-concept-quality SPARQL backend for VCF files, based on Sesame.
There are currently two distinct mappings:
- Ensembl/FALDO. This originates from BioHackathon 2014 work and is presently described in the RDF::VCF source code.
- GFVO (Genomic Feature and Variation Ontology).
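To illustrate the general shape of a FALDO-based mapping, here is a minimal stdlib-only Ruby sketch that turns one VCF data line into N-Triples, mirroring the graph pattern used by the SPARQL query on this page. It is not the RDF::VCF implementation: the base URI http://example.org/vcf/, the URI layout, reading dc: as DC Terms, and the rule that a variant's region spans its REF allele are all assumptions for illustration. Only the FALDO terms (faldo:location, faldo:Region, faldo:reference, faldo:begin, faldo:end, faldo:ExactPosition, faldo:position) come from the published FALDO vocabulary.

```ruby
# Minimal sketch of a FALDO-shaped mapping for one VCF data line.
# NOT the RDF::VCF implementation: BASE and the URI layout are hypothetical.
BASE    = 'http://example.org/vcf/'
FALDO   = 'http://biohackathon.org/resource/faldo#'
RDF_NS  = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
DCTERMS = 'http://purl.org/dc/terms/' # assumption: what "dc:" denotes
XSD_INT = 'http://www.w3.org/2001/XMLSchema#integer'

def iri(uri)
  "<#{uri}>"
end

def triple(subject, predicate, object)
  "#{subject} #{predicate} #{object} ."
end

# Returns N-Triples lines built from the CHROM, POS and REF columns of one record.
def vcf_line_to_ntriples(line)
  chrom, pos, _id, ref = line.chomp.split("\t")
  first = Integer(pos)
  last  = first + ref.length - 1 # assumption: the region spans the REF allele
  variant    = iri("#{BASE}variant/#{chrom}/#{first}")
  region     = iri("#{BASE}region/#{chrom}/#{first}-#{last}")
  chromosome = iri("#{BASE}chromosome/#{chrom}")
  type       = iri("#{RDF_NS}type")

  out = [
    triple(variant, iri("#{FALDO}location"), region),
    triple(region, type, iri("#{FALDO}Region")),
    triple(region, iri("#{FALDO}reference"), chromosome),
    triple(chromosome, iri("#{DCTERMS}identifier"), "\"#{chrom}\"")
  ]
  # faldo:begin and faldo:end each point at an exact 1-based position node.
  { 'begin' => first, 'end' => last }.each do |edge, value|
    position = iri("#{BASE}position/#{chrom}/#{value}")
    out << triple(region, iri("#{FALDO}#{edge}"), position)
    out << triple(position, type, iri("#{FALDO}ExactPosition"))
    out << triple(position, iri("#{FALDO}position"), "\"#{value}\"^^#{iri(XSD_INT)}")
  end
  out.uniq # a single-base REF makes begin and end the same position node
end

puts vcf_line_to_ntriples("Y\t2749185\t.\tAG\tA\t50\tPASS\t.")
```

Sharing one position node when begin and end coincide keeps the output free of duplicate triples, which a real mapping may or may not do.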
The CLI utility called vcf2rdf transforms VCF files into RDF (currently outputting N-Triples):
vcf2rdf Homo_sapiens.1.vcf.gz Homo_sapiens.2.vcf.gz ...
The input files can be either plain-text VCF or compressed with bgzip (as in the above example).
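bgzip output (BGZF) is a series of concatenated gzip members, and Ruby's Zlib::GzipReader decodes only one member per reader. As a minimal stdlib-only sketch (not how RDF::VCF itself reads files), the hypothetical helper read_vcf_text accepts either plain or bgzipped input from a seekable IO and returns the decompressed text. It loads everything into memory and ignores tabix indexes, so it only suits small files.

```ruby
require 'zlib'
require 'stringio'

# The two magic bytes that start every gzip member (and hence every BGZF block).
GZIP_MAGIC = "\x1f\x8b".b

# Hypothetical helper: returns the text of a plain or bgzip-compressed VCF
# stream. Assumes `io` is seekable (a File opened with 'rb', or a StringIO).
def read_vcf_text(io)
  head = io.read(2)
  io.seek(0)
  return io.read unless head == GZIP_MAGIC # plain-text VCF

  parts = []
  until io.eof?
    # Each GzipReader decodes exactly one gzip member.
    gz = Zlib::GzipReader.new(io)
    parts << gz.read
    unused = gz.unused # bytes read past the end of this member, or nil
    gz.finish          # release the reader without closing io
    io.seek(-unused.bytesize, IO::SEEK_CUR) if unused
  end
  parts.join
end

# Usage: print the data lines, skipping ## meta and #CHROM header lines.
# File.open('Homo_sapiens.vcf.gz', 'rb') do |f|
#   read_vcf_text(f).each_line { |line| puts line unless line.start_with?('#') }
# end
```

Looping until the IO is exhausted, rather than stopping when unused is nil, also handles the case where a reader's buffer happens to end exactly on a member boundary.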
The RDF::VCF gem can be used like any other RDF.rb reader plugin:
# Load the RDF::VCF library:
require 'rdf/vcf'

# Open a VCF file for reading:
RDF::VCF::Reader.open('Homo_sapiens.vcf.gz') do |reader|
  # Loop over all generated RDF statements:
  reader.each_statement do |statement|
    # Print the RDF statement to the screen:
    $stdout.puts statement.inspect
  end
end
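An example SPARQL query against the Ensembl/FALDO mapping (prefix declarations for faldo:, dc: and vcf: omitted) selects the variants on chromosome Y whose location lies between positions 2749180 and 2755180, together with their quality scores:

SELECT ?s ?quality WHERE {
  ?s faldo:location ?location .
  ?location faldo:reference [ dc:identifier "Y" ] .
  ?location faldo:begin [ faldo:position ?begin ] .
  ?location faldo:end [ faldo:position ?end ] .
  ?s vcf:quality ?quality .
  FILTER(?begin >= 2749180)
  FILTER(?end <= 2755180)
}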
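The query's FILTERs keep only variants whose begin and end positions both fall inside the window on chromosome Y. As a rough cross-check of what such a query should return, here is a stdlib-only Ruby sketch that applies the same test directly to VCF data lines, without any RDF conversion. The Record struct and helper names are hypothetical, and deriving the end position as POS + length(REF) - 1 is an assumption about how a mapping would set faldo:end, not something this page specifies.

```ruby
# Hypothetical record type holding only the fields the region query needs.
Record = Struct.new(:chrom, :begin_pos, :end_pos, :quality)

# Parses one VCF data line. Assumption: the record spans its REF allele,
# so end = POS + length(REF) - 1 (1-based, inclusive). QUAL "." becomes nil.
def parse_record(line)
  chrom, pos, _id, ref, _alt, qual = line.chomp.split("\t")
  first = Integer(pos)
  Record.new(chrom, first, first + ref.length - 1, qual == '.' ? nil : Float(qual))
end

# Same test as the query's FILTERs: the chromosome matches, and the record's
# begin and end positions both fall inside [from, to].
def records_in_region(lines, chrom, from, to)
  lines.reject { |line| line.start_with?('#') }
       .map { |line| parse_record(line) }
       .select { |r| r.chrom == chrom && r.begin_pos >= from && r.end_pos <= to }
end

lines = [
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
  "Y\t2749185\t.\tAG\tA\t50\tPASS\t.\n", # inside the window
  "Y\t2755180\t.\tAG\tA\t20\tPASS\t.\n", # ends at 2755181, outside
  "X\t2749185\t.\tA\tG\t30\tPASS\t.\n"   # wrong chromosome
]
records_in_region(lines, 'Y', 2749180, 2755180).each do |r|
  puts "#{r.chrom}:#{r.begin_pos}-#{r.end_pos} quality=#{r.quality}"
end
# prints "Y:2749185-2749186 quality=50.0"
```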