
Some VIVO Things Blog


Musings on the community, software, data, use, and whatever else comes to mind.


Is VIVO FAIR?


NOTE: Apologies in advance. This post is longer than I would like and contains some unavoidable technical terms. I have tried to provide a citation for each term, recognizing that this further lengthens the reading for some. I felt it was better to address this topic in one post rather than split it in two. I hope that works for everyone.


The FAIR data principles (https://goo.gl/MFTfC6), developed by FORCE11 (https://www.force11.org), are increasingly popular and provide a means of assessing whether data is being shared in a manner useful to others.


VIVO sites produce data in the form of assertions about the connected graph of research and scholarship. How does VIVO stack up against the FAIR data principles?


Findable. VIVO data is quite findable. VIVO includes schema.org tags (http://schema.org) on its pages to help search engines index them. VIVO has a registry of sites (https://goo.gl/9Thaa8) with URLs for the sites. VIVO sites can participate in Direct2Experts (http://direct2experts.org/), another finding tool. VIVO site data is aggregated by CTSAsearch (https://goo.gl/Du3Fwn), yet another finding tool. OpenVIVO (http://openvivo.org) provides its data as constantly updated text files on the web. These files are very easy to find using a search engine (hint: search for "OpenVIVO data"). And with the addition of Triple Pattern Fragments (TPF) (https://goo.gl/k1BtFQ) in the next release of VIVO, I expect additional tools to be developed to find VIVO data. The future is bright for further improving the findability of VIVO data.


Accessible. If people can find your VIVO data, can they access it? The answer is yes. VIVO is designed to share its data. Every page in VIVO can be accessed as HTML, which browsers render for humans to read, and as RDF (https://www.w3.org/RDF/), a machine-readable data format for computers to read. This is one of VIVO's strongest features, and one of its best-kept secrets. Programmers can access VIVO's data starting from almost any page in VIVO, because VIVO provides a connected graph of scholarship and research. Starting at a person, one can find papers, which lead to co-authors. Starting at an organization, one can find the people who hold positions in it. Starting at a grant, one can find the funding agency, investigators, and so on. VIVO makes traversing the graph straightforward.
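
For the programmers in the audience, here is a minimal sketch of what "the same page as RDF" can look like in practice. It assumes the site honors the HTTP Accept header for its pages (standard linked data content negotiation); the profile URL and the function names are made up for illustration, so substitute a page from your own site.

```python
from urllib.request import Request, urlopen

# Hypothetical profile URL; replace with a real page from a VIVO site.
PAGE_URL = "https://vivo.example.edu/individual/n1234"


def rdf_request(page_url: str, media_type: str = "text/turtle") -> Request:
    """Build a GET request that asks for RDF instead of HTML for the same URL."""
    return Request(page_url, headers={"Accept": media_type})


def fetch_rdf(page_url: str, media_type: str = "text/turtle") -> str:
    """Fetch the RDF for a page. Needs network access to a live site."""
    with urlopen(rdf_request(page_url, media_type), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and inspects the request; call fetch_rdf() against a real site.
    request = rdf_request(PAGE_URL)
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```

The RDF that comes back contains links to other pages (papers, co-authors, organizations), and each of those can be requested the same way. That is all "traversing the graph" means here.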


Additionally, sites may export their data to files accessible on the Internet, as OpenVIVO does, or provide a SPARQL (https://en.wikipedia.org/wiki/SPARQL) endpoint. The TPF feature in the next release of VIVO will make VIVO data even easier to access.
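
To make the SPARQL option concrete, here is a rough sketch of how a query is sent to an endpoint that follows the standard SPARQL Protocol, where the query travels in a "query" URL parameter. The endpoint address is hypothetical, and whether a given site exposes a public endpoint (and at what address) is something to check with that site.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address; each site decides whether and where to offer one.
ENDPOINT = "https://vivo.example.edu/sparql"

# Ask for up to ten people and their names.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name
WHERE { ?person a foaf:Person ; rdfs:label ?name . }
LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return the GET URL for a query, as defined by the SPARQL Protocol."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(sparql_get_url(ENDPOINT, QUERY))
```

Fetching that URL with an Accept header such as application/sparql-results+json would return the matching rows, assuming the endpoint supports that result format.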


Interoperable. VIVO data, modeled using the VIVO ontology, is amazingly interoperable. Two sets of VIVO data can be combined simply by putting them in the same file. No other work is needed. All VIVO sites, and sites exporting VIVO data (there are many), are fully interoperable. They share the same data format (RDF) and the same representation and vocabulary (the VIVO ontology).
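
Here is a small sketch of "putting them in the same file," assuming both sites export N-Triples (one triple per line) and that the data contains no blank nodes, whose labels are only meaningful within a single file. The two sample exports and their URLs are invented for illustration.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
FOAF_PERSON = "<http://xmlns.com/foaf/0.1/Person>"

# Two made-up exports; the second repeats one triple found in the first.
SITE_A = (
    f'<https://a.example.edu/individual/n1> {RDFS_LABEL} "Ada Example" .\n'
    f"<https://a.example.edu/individual/n1> {RDF_TYPE} {FOAF_PERSON} .\n"
)
SITE_B = (
    f'<https://b.example.edu/individual/n9> {RDFS_LABEL} "Ben Example" .\n'
    f"<https://a.example.edu/individual/n1> {RDF_TYPE} {FOAF_PERSON} .\n"
)


def merge_ntriples(*documents: str) -> str:
    """Concatenate N-Triples documents, skipping blanks, comments, and repeats."""
    seen = set()
    merged = []
    for document in documents:
        for line in document.splitlines():
            triple = line.strip()
            if not triple or triple.startswith("#"):
                continue
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return "\n".join(merged) + "\n"


if __name__ == "__main__":
    print(merge_ntriples(SITE_A, SITE_B), end="")
```

Dropping the repeated line is a tidiness step only. An RDF graph is a set of triples, so plain concatenation would describe the same graph.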


Interoperability is lowered when sites do not use the same version of the VIVO ontology. While each version is a valid representation of scholarship, the ontology currently does not provide equivalence between versions. Any software that uses data from multiple versions must do that mapping itself. Future work may reduce the effort currently needed to use multiple ontology versions.
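
As a rough illustration of the kind of work that falls to software today, the sketch below rewrites predicates from an older vocabulary into a newer one using a hand-made lookup table. The namespaces and property names are entirely hypothetical and are not taken from any VIVO ontology version. Real version differences can be structural rather than simple renames, in which case a lookup table like this is not enough.

```python
# Hypothetical namespaces standing in for two versions of an ontology.
OLD = "http://example.org/ontology/v1#"
NEW = "http://example.org/ontology/v2#"

# Hand-maintained table of renamed predicates (illustrative only).
PREDICATE_MAP = {
    OLD + "hasAuthor": NEW + "author",
    OLD + "hasTitle": NEW + "title",
}


def upgrade(triples):
    """Return triples with known old predicates replaced by their new names."""
    return [(s, PREDICATE_MAP.get(p, p), o) for s, p, o in triples]


if __name__ == "__main__":
    old_data = [
        ("http://example.org/paper/1", OLD + "hasTitle", "A Paper"),
        ("http://example.org/paper/1", OLD + "hasAuthor", "http://example.org/person/7"),
    ]
    for triple in upgrade(old_data):
        print(triple)
```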


Interoperability can also be lowered when VIVO sites extend the ontology in custom ways, whether to represent additional elements in VIVO or to represent elements that should be common and part of the ontology. The VIVO community needs to work with sites to identify elements that belong in the common ontology so that such customizations become unnecessary.


Similarly, interoperability can be lowered when sites use custom vocabulary to represent research concepts. The VIVO community has more work to do in developing best practices for representing the concepts underlying the research areas of scholars and the subject areas of their works.


Reusable. VIVO data, modeled by the VIVO ontology, achieves the highest standards for reusability. VIVO data is "Five Star Linked Data" (https://goo.gl/GRN1RV), a term coined by Tim Berners-Lee (https://goo.gl/rrjzmZ). VIVO data is 1) on the web; 2) machine-readable structured data; 3) in a non-proprietary format; 4) published using open W3C standards; and 5) linked to other open data. Anyone on the Internet can reuse VIVO data.


And yet, there are things we can do to improve reusability. We can clarify the license under which sites provide VIVO data, and provide that information with the data. We can clarify where sites obtained their data and provide that information with the data. VIVO's current practice is to "inherit" provenance information from the source providing the information: if the data came from site x, we assume site x provided the data. We can go further and assert such facts explicitly in the VIVO data. We currently assume that each site provides its VIVO data in a manner that supports reuse with attribution. We can clarify this by providing a license assertion in the VIVO data.
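
One possible shape for such explicit assertions is sketched below, assuming a site chooses Dublin Core's license property and the W3C PROV attribution property. Neither is prescribed by VIVO. The dataset and site URLs are hypothetical, and the Creative Commons Attribution license is used only because it matches "reuse with attribution."

```python
DCTERMS_LICENSE = "http://purl.org/dc/terms/license"
PROV_ATTRIBUTED_TO = "http://www.w3.org/ns/prov#wasAttributedTo"

# Illustrative values; a site would substitute its own choices.
DATASET = "https://vivo.example.edu/dataset"
SITE = "https://vivo.example.edu/"
CC_BY = "https://creativecommons.org/licenses/by/4.0/"


def dataset_assertions(dataset_iri: str, license_iri: str, provider_iri: str) -> list:
    """Return N-Triples lines stating a dataset's license and who provided it."""
    return [
        f"<{dataset_iri}> <{DCTERMS_LICENSE}> <{license_iri}> .",
        f"<{dataset_iri}> <{PROV_ATTRIBUTED_TO}> <{provider_iri}> .",
    ]


if __name__ == "__main__":
    for line in dataset_assertions(DATASET, CC_BY, SITE):
        print(line)
```

Shipping two lines like these alongside the exported data would replace the assumption with a statement that both people and programs can read.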


Each VIVO site determines for itself how best to meet the FAIR data principles, if at all. Some sites share their data freely, while others rely on the delivered VIVO software to share their data. Still others have their data behind firewalls, preventing sharing. Unshared data cannot be FAIR.


Each of the FAIR data principles has sub-headings providing further guidance regarding what it means to be Findable, Accessible, Interoperable, and Reusable. I urge you to take a look at the principles and consider how VIVO can be improved and how your data practices can be improved to further the goal of VIVO data as FAIR data.


There is more that VIVO can do to improve VIVO's data as FAIR data. We are all learning how to be FAIR. I think VIVO is doing well and can do better.


So perhaps a short working answer to "Is VIVO FAIR?" is: 1) the VIVO project supports the FAIR data principles; 2) the VIVO ontology is a strong element of VIVO which supports the FAIR data principles; 3) the VIVO software provides features which support the FAIR data principles; and 4) VIVO sites provide VIVO data and each can share data according to the FAIR data principles.


If you are involved with a VIVO site and are non-technical, you may wish to discuss with your technical staff how your site is addressing FAIR data principles. If you are at a VIVO site and are technical, you may wish to speak with the non-technical members of the team regarding how your site should address FAIR data principles. Working together, sites should be able to align their practices with their institutional requirements and with the FAIR data principles.


What do you think? What more can the VIVO project do to promote data sharing using the FAIR data principles? What features could be added to the ontology or to the software to make sharing data even more natural?
