Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new argument to allow setting the google requester pays project #1938

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lbergelson
Copy link
Member

  • New argument REQUESTER_PAYS_PROJECT added to command line program. This allows configuring the project to use with requester pays buckets in GCS
  • This can also be configured by setting a new system property: picard.googleProjectForRequesterPays
  • Updated the initialization of GCS and HTTP filesystem providers

This is an updated version of #1825

@yfarjoun This is a bit late to the party, but does this work for you?
@rickymagner Could you test that this works in your use case?

@lbergelson
Copy link
Member Author

@yfarjoun I didn't remember why you needed to be able to set this through a system property. I renamed it to match the other picard system properties but on further thought maybe you wanted it to pick up from an environment variable? In which case we should rename it back to something without a . in it.

@rickymagner
Copy link
Contributor

Hi Louis, thanks for making this. I just tested it on Terra, first to check backwards compatibility in working with files that aren't requester-pays but to ensure the flag still works. Using the new jar without the RP flag works fine. Unfortunately it doesn't seem to work with the RP flag in Terra possibly because Terra permissions don't allow it. Here's the stacktrace I saw:

[Mon Feb 05 15:42:13 GMT 2024] Executing as root@1ca08604a5ee on Linux 6.1.58+ amd64; OpenJDK 64-Bit Server VM 17.0.6+10-Ubuntu-0ubuntu118.04.1; Deflater: Intel; Inflater: Intel; Requester Pays Project set by argument: terra-54184c68 Picard version: Version:3.1.1-6-geddbf9b98-SNAPSHOT
INFO 2024-02-05 15:42:14 CrosscheckFingerprints SECOND_INPUT is not empty, and CROSSCHECK_MODE==CHECK_ALL_OTHERS. Will compare fingerprints from INPUT against all the fingerprints in SECOND_INPUT.
[Mon Feb 05 15:42:15 GMT 2024] picard.fingerprint.CrosscheckFingerprints done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=59768832
To get help, see https://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" com.google.cloud.storage.StorageException: [email protected] does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist).
at com.google.cloud.storage.StorageException.translate(StorageException.java:165)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:298)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:489)
at com.google.cloud.storage.StorageImpl.lambda$internalBucketGet$65(StorageImpl.java:1590)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.storage.Retrying.run(Retrying.java:65)
at com.google.cloud.storage.StorageImpl.run(StorageImpl.java:1515)
at com.google.cloud.storage.StorageImpl.internalBucketGet(StorageImpl.java:1588)
at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:316)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.requesterPays(CloudStorageFileSystemProvider.java:1118)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystem.<init>(CloudStorageFileSystem.java:211)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.newFileSystem(CloudStorageFileSystemProvider.java:278)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.getFileSystem(CloudStorageFileSystemProvider.java:236)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.getPath(CloudStorageFileSystemProvider.java:286)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.getPath(CloudStorageFileSystemProvider.java:97)
at java.base/java.nio.file.Path.of(Path.java:208)
at java.base/java.nio.file.Paths.get(Paths.java:98)
at htsjdk.samtools.util.IOUtil.getPath(IOUtil.java:1253)
at htsjdk.samtools.util.IOUtil.lambda$getPaths$2(IOUtil.java:1266)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
at htsjdk.samtools.util.IOUtil.getPaths(IOUtil.java:1270)
at picard.fingerprint.CrosscheckFingerprints.doWork(CrosscheckFingerprints.java:506)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:105)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:115)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
GET https://storage.googleapis.com/storage/v1/b/dsde-palantir?projection=full
{
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "[email protected] does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist).",
"reason" : "forbidden"
} ],
"message" : "[email protected] does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist)."
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$3.interceptResponse(AbstractGoogleClientRequest.java:466)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:552)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:493)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:603)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:486)
... 30 more

At first I thought this might be a mistake in passing Terra permissions to the service account, but then remembered it works fine without the RP flag. The way I set the RP flag was with --REQUESTER_PAYS_PROJECT $(gcloud config get-value project). Maybe this isn't the right project value to use in the Terra environment? Do you know what Picard was doing in the case without the RP flag so that it didn't run into the permissions issue?

@lbergelson
Copy link
Member Author

@rickymagner Sorry, I was just re-reading this again, I seem to have forgotten the details. Are you saying that this broke something that was previously functional on terra? Or that adding the flag still doesn't work on terra because of the buckets.get issue? My understanding was that this should not break anything more than it already is broken, but that it will continue to not work on terra because of the other permissions issue here.

@rickymagner
Copy link
Contributor

If I remember correctly, it didn't fix the issue on Terra, but also didn't break anything. I think it came down to the Google storage.get API having a bug in it or something.

* New argument REQUESTER_PAYS_PROJECT added to command line program.  This allows
  configuring the project to use with requester pays buckets in GCS
* This can also be configured by setting a new system property: picard.googleProjectForRequesterPays
* Updated the initialization of GCS and HTTP filesystem providers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Streaming from requester-pays buckets for CrosscheckFingerprints doesn't work in Terra
2 participants