VCF to RDF Mapping
The team worked to produce an ontology mapping and supporting software that expose Variant Call Format (VCF) files as linked data. The aims were to facilitate offline batch conversion of VCF to various RDF formats and to enable direct online SPARQL query access to compressed VCF files.
- Worked on by Raoul, Arto, and Kieron, with some support from Pjotr and Jerven.
- Built on previous work at BioHackathon 2014 and RDF Summit by Raoul, Francesco, Kieron, James, Will and Simon.
- https://github.com/ruby-rdf/rdf-vcf (RDF::VCF plugin for RDF.rb)
  - Released on RubyGems, installable with `jruby -S gem install rdf-vcf`. (Note: Requires JRuby 9.0+.)
  - Includes a `vcf2rdf` program to transform VCF files into RDF as a batch job.
  - Implements an RDF.rb reader for VCF and BCF files, also supporting bgzipped and tabix-indexed files.
- https://github.com/helios/bio-sparql-otf (OTF: On-the-Fly conversion)
  - Implements a proof-of-concept-quality SPARQL backend for VCF files, based on RDF.rb and RDF::VCF.
  - Continued from BioHackathon 2014 work: https://github.com/dbcls/bh14/wiki/On-The-Fly-RDF-converter
- https://github.com/JervenBolleman/sparql-vcf (Java+Sesame)
  - Implements a proof-of-concept-quality SPARQL backend for VCF files, based on Sesame.
There are currently two distinct mappings, both of which are planned to be supported in the RDF::VCF software:
- Ensembl/FALDO. This originates from BioHackathon 2014 work and is presently described in the RDF::VCF source code.
- GFVO (Genomic Feature and Variation Ontology) by Erick, Robert, Michel, et al. This is described in the paper and at BioInterchange/Ontologies. Supports GFF3, GTF, GVF, and VCF files.
The CLI utility called `vcf2rdf` transforms VCF files into RDF (currently outputting N-Triples):

    vcf2rdf Homo_sapiens.1.vcf.gz Homo_sapiens.2.vcf.gz ...
The input files can be either plain-text VCF or compressed with `bgzip` (as in the above example).
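To illustrate the difference between the two input forms, the following is a minimal sketch of how a tool could tell plain-text VCF from bgzip output by looking at the first bytes of a file. It is not taken from `vcf2rdf`; the method names `sniff_vcf_encoding` and `sniff_vcf_file` are hypothetical, and the check assumes the `BC` extra subfield is the first one in the gzip header, which is what `bgzip` normally writes.

```ruby
# Sketch only: distinguishes plain-text VCF from bgzip (BGZF) input by header bytes.
# Not taken from vcf2rdf; both method names are hypothetical.

# Returns :bgzf, :gzip, or :plain for the leading bytes of a file.
def sniff_vcf_encoding(head)
  bytes = head.bytes
  # Every gzip member starts with the magic bytes 0x1f 0x8b.
  return :plain unless bytes[0] == 0x1f && bytes[1] == 0x8b

  # BGZF blocks are gzip members with the FEXTRA flag (0x04) set and a
  # "BC" extra subfield, assumed here to sit at byte offsets 12 and 13.
  flags = bytes[3] || 0
  if (flags & 0x04) != 0 && bytes[12] == 'B'.ord && bytes[13] == 'C'.ord
    :bgzf
  else
    :gzip
  end
end

# Convenience wrapper that reads only the first 18 bytes of a file.
def sniff_vcf_file(path)
  File.open(path, 'rb') { |f| sniff_vcf_encoding(f.read(18) || '') }
end
```

A file produced by ordinary `gzip` is reported as `:gzip` rather than `:bgzf`; such files generally cannot be used with tabix-style random access.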
The `RDF::VCF` gem can be used like any other RDF.rb reader plugin:

    # Load the RDF::VCF library:
    require 'rdf/vcf'

    # Open a VCF file for reading:
    RDF::VCF::Reader.open('Homo_sapiens.vcf.gz') do |reader|
      # Loop over all generated RDF statements:
      reader.each_statement do |statement|
        # Print the RDF statement to the screen:
        $stdout.puts statement.inspect
      end
    end
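An example SPARQL query using FALDO terms, selecting variants on reference "Y" between positions 2749180 and 2755180 together with their quality values (prefix declarations omitted):

    SELECT ?s ?quality WHERE {
      ?s faldo:location ?location .
      ?location faldo:reference [ dc:identifier "Y" ] .
      ?location faldo:begin [ faldo:position ?begin ] .
      ?location faldo:end [ faldo:position ?end ] .
      ?s vcf:quality ?quality .
      FILTER(?begin >= 2749180)
      FILTER(?end <= 2755180)
    }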
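
The triple shape that the query above matches can be sketched in plain Ruby, without RDF.rb. This is an illustration only and not the mapping that RDF::VCF implements: the method name `vcf_line_to_triples` and the `ex:` subject names are hypothetical, the prefix bindings for `faldo:`, `dc:`, and `vcf:` are left undeclared as in the query, and the end position is assumed to be `POS + length(REF) - 1`.

```ruby
# Sketch only: emits Turtle-like triples for one VCF data line, in the shape
# the query above matches. The ex: names are placeholders; this is not the
# vocabulary or code that RDF::VCF actually uses.

def vcf_line_to_triples(line)
  # The first six tab-separated VCF columns: CHROM, POS, ID, REF, ALT, QUAL.
  chrom, pos, id, ref, _alt, qual = line.chomp.split("\t")

  start = Integer(pos)
  stop  = start + ref.length - 1 # assumption: REF spans POS..POS+len(REF)-1

  # Fall back to CHROM_POS when the ID column is missing (".").
  name = id.nil? || id == '.' ? "#{chrom}_#{pos}" : id
  subject  = "ex:#{name}"
  location = "ex:#{name}_location"

  triples = [
    "#{subject} faldo:location #{location} .",
    "#{location} faldo:reference [ dc:identifier \"#{chrom}\" ] .",
    "#{location} faldo:begin [ faldo:position #{start} ] .",
    "#{location} faldo:end [ faldo:position #{stop} ] ."
  ]

  # QUAL is "." when missing, so the quality triple is optional.
  triples << "#{subject} vcf:quality #{Float(qual)} ." unless qual.nil? || qual == '.'
  triples
end

# Usage: one made-up record on reference "Y" inside the queried range.
puts vcf_line_to_triples("Y\t2749200\trs123\tAT\tA\t50.0\tPASS\t.")
```

A record like the one in the usage line would satisfy both `FILTER` clauses of the query, since its begin and end positions fall between 2749180 and 2755180.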