Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new class: VCFReaderFactory a factory for VCFReader #1551

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

lindenb
Copy link
Contributor

@lindenb lindenb commented Apr 23, 2021

Description

This PR adds a file VCFReaderFactory that open a VCFReader using a few methods.

VCFReader open(final File vcfFile,boolean requiredIndex);
VCFReader open(final Path vcfFile,boolean requiredIndex);
VCFReader open(final String vcfFile,boolean requiredIndex);

the aim is to provide a generic factory opening instance of VCFReaders instead of opening directltly a VCFReader. This gives a chance to decode variants files that are not VCF (database, multiple VCF, etc...). The default implementation of VCFReaderFactory returns an instance of VCFFileReader.

A new property CUSTOM_VCF_READER_FACTORY in Defaults is used to define the class of the default implementation.

I added a few tests in VCFReaderFactoryTest that are mostly a copy of VCFReaderTest

there are some problem with travis but I'm not sure it comes from my code

Things to think about before submitting:

  • Make sure your changes compile and new tests pass locally.
  • Add new tests or update existing ones:
    • A bug fix should include a test that previously would have failed and passes now.
    • New features should come with new tests that exercise and validate the new functionality.
  • Extended the README / documentation, if necessary
  • Check your code style.
  • Write a clear commit title and message
    • The commit message should describe what changed and is targeted at htsjdk developers
    • Breaking changes should be mentioned in the commit message.

@lbergelson
Copy link
Member

@lindenb This is a very reasonable thing to want to do. I think it has pretty strong overlap with what @cmnbroad is working on in the CZI grant work right now to enable codec discovery and support multiple input sources fo things like vcf.

I'm not really a fan of injecting a single custom factory class through an environment variable. I know we've done it before for Reads but it was kind of gross then as well. It's just not very flexible if you for instance want to load from something like mixed sources. Registering dynamically loadable codecs in the manifest and then picking the right one automatically is something we're actively working on right now.

I assume you have a particular alternate VariantContext record sources you want to use? Is there any way to identify what type of source it is by the URI you access it at? We're currently building a system to figure this out for dynamically loaded codecs and a real world vcf example could be really useful.

Maybe you could be a beta tester for the new codec api?

@lindenb
Copy link
Contributor Author

lindenb commented May 14, 2021

@cmnbroad , thanks for reviewing

I assume you have a particular alternate VariantContext record sources you want to use?

yes for now I've got a wrapper over bcftools (reading BCF) and I'll write another one reading plink files.

Is there any way to identify what type of source it is by the URI you access it at?

yes, I could use bcftools:///path/to/my/bcf for example

We're currently building a system to figure this out for dynamically loaded codecs and a real world vcf example could be really useful.

so I can wait for your implementation, no problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants