OBO Purl System: Add HTTPS support #705

Closed
srobb1 opened this issue Nov 10, 2020 · 40 comments

@srobb1
Contributor

srobb1 commented Nov 10, 2020

Hello.

I currently have images that I serve on OLS through a purl.obolibrary.org redirect to my ontology's GitHub repository. Chrome now requires HTTPS, and purl.obolibrary.org does not support HTTPS.

Are there any plans to support https in the future?

Thank you,
Sofia

@jamesaoverton
Member

I've been thinking about this since I was told about the original issue last week. Doing this properly would require significant changes to the system, and every change to the PURL system must be made very carefully.

If someone wants to volunteer to do that work, that would be great. Otherwise this will have to wait.

@matentzn
Contributor

matentzn commented Mar 3, 2021

It seems that Chrome is now blocking our PURLs, at least when using the right-click "save file as" method.

Try here:
https://github.com/matentzn/test/blob/main/README.md

image

@matentzn
Contributor

matentzn commented Mar 3, 2021

How many person hours would we need to enable https @jamesaoverton ?

@srobb1
Contributor Author

srobb1 commented Mar 3, 2021

On a personal website I just set up HTTPS in about 5 minutes. Here is some info: https://letsencrypt.org/. My server is on AWS and I use a Bitnami WordPress image; here is the Bitnami documentation for HTTPS: https://docs.bitnami.com/aws/how-to/understand-bncert/

Sofia

@srobb1
Contributor Author

srobb1 commented Mar 3, 2021

Oh! and it is free!!

@matentzn
Contributor

matentzn commented Mar 3, 2021

The problem is the PURLs themselves. We can't just change 2000 (or, if we need to change terms, 2 million) PURLs from an http scheme to an https scheme. Anyway, we will sleep on this a bit.

@jamesaoverton
Member

Yes, I've used Let's Encrypt for a large number of websites, and set them up to automatically redirect HTTP to HTTPS. It's great and we'll probably use Let's Encrypt for this too.

But the PURL system is not a website. It's a system to manage millions of permanent identifiers for hundreds of projects used by who-knows-how-many downstream users, tools, and databases. If we make a mistake we have to live with it forever. So every change has to be made very carefully.

So I'll set aside time for this, but it can't be rushed.

@srobb1
Contributor Author

srobb1 commented Mar 3, 2021

Ah! Sounds complicated :/

@jamesaoverton
Member

I believe that this Chromium Blog posting explains the issue, which is specifically about mixing HTTP content (images, downloads) in a page that is served by HTTPS:

https://blog.chromium.org/2020/02/protecting-users-from-insecure.html
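
As a rough illustration of the situation described above, the following Python sketch flags the basic case: a page served over HTTPS that references a resource over plain HTTP. It is a simplification based only on URL schemes, not a model of Chrome's actual blocking rules, and the function name `is_mixed_content` is made up for this example.

```python
from urllib.parse import urlsplit


def is_mixed_content(page_url: str, resource_url: str) -> bool:
    """Return True when an HTTPS page references a plain-HTTP resource.

    Simplified assumption: only the URL schemes are compared. Real browsers
    apply more nuanced rules (resource type, upgrades, user settings).
    """
    page_scheme = urlsplit(page_url).scheme.lower()
    resource_scheme = urlsplit(resource_url).scheme.lower()
    return page_scheme == "https" and resource_scheme == "http"


# An HTTPS page linking to an HTTP PURL is the problematic combination.
print(is_mixed_content(
    "https://github.com/matentzn/test/blob/main/README.md",
    "http://purl.obolibrary.org/obo/obi.owl",
))
```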

@matentzn
Contributor

I think at least for the ontology PURLs (not terms), we will have to come up with some kind of plan for addressing this issue. For example, on https://hpo.jax.org/app/download/annotation you cannot click on any of the OBO PURLs!

@jamesaoverton
Member

They work for me... (Safari on iPadOS 15.3.1)

The best option is probably to allow HTTP and HTTPS in parallel for all PURLs, but I'm still worried that people will cause themselves all sorts of problems once we allow this.

@matentzn
Contributor

I only tried Chrome on Mac. I guess Chrome is now generally blocking all http referrals from https sites!

@kltm
Contributor

kltm commented Apr 14, 2022

Chrome, "Version 100.0.4896.88 (Official Build) (64-bit)" on Linux works for me.
Usually, there is a prohibition on mixed schemes within a page (HTTP content embedded in an HTTPS page), to prevent data leaking, trackers, etc.; blocking links from https to http would be quite a big deal. Is it possible that you have plugins doing something in addition to the default behavior?

@matentzn
Contributor

Good point, I'm using Ghostery, Traffic Light and AdBlock. Deactivated them all, same behaviour though. Clicking on

image

The first purl there, a new tab opens briefly, a loading wheel shows up, then the tab gets closed and nothing happens. Anyways, might still be me.

@kltm
Contributor

kltm commented Apr 14, 2022

@matentzn Clicking on that works for me. It sounds like behavior I ran into with an HTTPS everywhere plugin I used to run. I would suggest trying with a temporary clean profile. As well, if you could report your version number, we could see if something recent has changed in Chrome.

@matentzn
Contributor

image

I tried in incognito mode.. Same! :) Also deactivated all plugins.

@kltm
Contributor

kltm commented Apr 14, 2022

Alas. We're on the same version and I can guarantee I'm on a completely unaltered installation, so I'd still put the weight on local issue. Possibly work through https://support.google.com/chrome/thread/16888999/links-won-t-open-in-chrome?hl=en
It might be better to work through this on another channel, rather than clutter the conversion issue.

@matentzn
Contributor

If I go to my settings there is the option to configure insecure content:

image

If I add hpo website like in the screenshot, it works.. So it must have something to do with the security levels in chrome. And afaics I am using standard settings..

@iimpulse

iimpulse commented Apr 6, 2023

They work for me... (Safari on iPadOS 15.3.1)

The best option is probably to allow HTTP and HTTPS in parallel for all PURLs, but I'm still worried that people will cause themselves all sorts of problems once we allow this.

Why would this be the case? Because the ontologies will then have links to both, or to only one in some cases? @jamesaoverton

@jamesaoverton
Member

Many of our current PURL configuration entries point to HTTP URLs, but redirecting HTTPS to HTTP raises security warnings in many clients (e.g. browsers).
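
To make the downgrade concern concrete, here is a small Python sketch that scans a redirect chain (given as a list of URLs) for the first hop that goes from HTTPS to HTTP. The function name `find_downgrade` and the `example.org` target are hypothetical and only serve the illustration; the real PURL configuration is not consulted.

```python
from typing import List, Optional
from urllib.parse import urlsplit


def find_downgrade(chain: List[str]) -> Optional[int]:
    """Return the index of the first hop that moves from https to http.

    `chain` is the ordered list of URLs visited while following redirects.
    Returns None when no hop downgrades the scheme.
    """
    schemes = [urlsplit(url).scheme.lower() for url in chain]
    for i in range(1, len(schemes)):
        if schemes[i - 1] == "https" and schemes[i] == "http":
            return i
    return None


# Hypothetical chain: an HTTPS PURL whose configured target is still HTTP.
chain = [
    "https://purl.obolibrary.org/obo/obi.owl",
    "http://example.org/ontologies/obi.owl",
]
print(find_downgrade(chain))
```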

If one resource has both HTTP and HTTPS URLs, clients may not recognize that the two refer to the same thing. This is a recognized problem for the RDF stack, and I haven't seen a solution with broad support.

@balhoff
Contributor

balhoff commented Apr 24, 2023

I notice that linking to an HTTP ontology IRI in an HTTPS page is blocked by Chrome, and that seems to be primarily because the target is a file download. If you link to an HTTP term IRI, which generally redirects to a viewable page, Chrome seems to be fine with that. Maybe we could pursue an initial solution of just migrating ontology IRIs to HTTPS (of which there are far fewer than term IRIs).
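
A sketch of how the two kinds of PURL might be told apart, in Python. The regular expression is an assumption: it treats paths shaped like `/obo/PREFIX_digits` as term IRIs and everything else under purl.obolibrary.org as non-term PURLs (ontology files, images, and so on). The function name `purl_kind` is made up for this example.

```python
import re
from urllib.parse import urlsplit

# Assumed shape of a term IRI path: /obo/<PREFIX>_<digits>
TERM_PATH = re.compile(r"^/obo/[A-Za-z][A-Za-z0-9]*_\d+$")


def purl_kind(iri: str) -> str:
    """Classify an IRI as 'term', 'non-term', or 'not-obo-purl'."""
    parts = urlsplit(iri)
    if (parts.hostname or "") != "purl.obolibrary.org":
        return "not-obo-purl"
    return "term" if TERM_PATH.match(parts.path) else "non-term"


for iri in (
    "http://purl.obolibrary.org/obo/OBI_0000070",
    "http://purl.obolibrary.org/obo/obi.owl",
    "http://purl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa",
):
    print(iri, "->", purl_kind(iri))
```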

@matentzn
Contributor

matentzn commented Apr 27, 2023

After tons of discussion about this, I would like to propose the same thing. I think despite the obvious problems with having http URIs for terms, we should not change that, as the URI is primarily an identifier and secondarily a URL, regardless of what people may think when they look at it.

Without making any statement about who will deal with this problem, I would like to propose this:

  1. OBO TWG (@jamesaoverton @balhoff @cmungall and @matentzn and whoever else has an opinion) reach an agreement that we should make a push for supporting https for ontology IRIs (or let's rather say: non-term PURLs, because they are not just used for ontologies). This is entirely independent of how much work it is - just to see if we are all on the same page here. If we don't get agreement we just stop until the situation becomes untenable.
  2. We figure out how much work this is, and ask PIs (Chris, Bjoern) to find resources to do this work.
  3. We propose the move to OBO Foundry Ops, get approval and do it (do it means that all ontologies on the OBO Dashboard will fail).

To get 1 out of the way:

Proposal: Add https support for the OBO purl server and migrate ontology PURLs to https

  • 👍 Let's go for it, I will help where needed (no commitment, just intention)
  • 👎 I am against it, because I am not (yet) convinced that the benefits (support of clickable links in browsers, sharing of purls in issues etc) are worth the churn this will create.
  • 🎉 I agree with the move, but I don't want to be involved in developing a solution

@matentzn matentzn changed the title HTTPS support OBO Purl System: Add HTTPS support Apr 27, 2023
@matentzn
Contributor

As if the gods heard my comment above:

image

@jamesaoverton
Member

http://obofoundry.org is handled by GitHub Pages, not the PURL server. That's a completely separate issue. Please don't make this PURL system issue more complicated than it already is.

The only solution that I see is for the PURL server to support both HTTP and HTTPS in parallel. We should not automatically redirect HTTP to HTTPS, or it will break redirects specified here as HTTP. Users will have to specify HTTP or HTTPS as appropriate, which will lead to all sorts of confusion.

Technically, this should just be a matter of getting a certificate and duplicating the Apache VirtualHost config for SSL in port 443: https://github.com/OBOFoundry/purl.obolibrary.org/blob/master/tools/etc_apache2_sites-available_site.j2

LBL is running the PURL infrastructure now, so changes will have to be coordinated with them: @kltm and @cmungall.

@balhoff
Contributor

balhoff commented May 2, 2023

Comment from OBO ops meeting: we should not turn on HTTPS for obofoundry.org until we support downloading ontology PURLs via HTTPS. The reason is that Chrome will not allow clicking an HTTP download from an HTTPS page, so any direct ontology downloads would be broken from an HTTPS obofoundry.org.

@cmungall
Contributor

Is this the action for @kltm?

Technically, this should just be a matter of getting a certificate and duplicating the Apache VirtualHost config for SSL in port 443: https://github.com/OBOFoundry/purl.obolibrary.org/blob/master/tools/etc_apache2_sites-available_site.j2

@jamesaoverton
Member

Yes, I'd appreciate @kltm's input on that.

@kltm
Contributor

kltm commented May 30, 2023

To clarify, what we're talking about here is:

  • purl.obolibrary.org will accept HTTP and HTTPS connections, but will not try and automatically upgrade HTTP to HTTPS; no other action

If so, we can confer with @abessiari about changing the image. We could also try and just toss Cloudflare in front of it (with the bonus of maybe speeding things up and saving a wee bit of money).

Edit: After some poking around I believe this could be done with Cloudflare only, but would likely require a little fiddling which might result in a few small outages. Using Cloudflare would marginally decrease costs, but increase the number of control planes. Unless it becomes complicated for some reason, I think an addition to the current system would probably be better for now, with an eye on cert renewal or longer spans.

@kltm
Contributor

kltm commented May 30, 2023

A set of concrete test URLs would also be useful.

@kltm
Contributor

kltm commented May 31, 2023

Talking to @cmungall, it might be nice to try the Cloudflare version of the solution first, then do it with the community infrastructure.

@jamesaoverton
Member

Talking to @cmungall, it might be nice to try the Cloudflare version of the solution first, then do it with the community infrastructure.

Fine by me.

Nico suggested these HTTP examples above:

Currently the HTTPS versions of those do not resolve, but after this change they should resolve to the same targets as their HTTP counterparts:

@kltm
Contributor

kltm commented Jun 1, 2023

We had a test earlier where we had trouble getting a Cloudflare cert; I've tried again using the subdomain purssl and it seems to be working:

http://purssl.obolibrary.org/obo/hp.obo
http://purssl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa
http://purssl.obolibrary.org/obo/hp/hpoa/genes_to_phenotype.txt
https://purssl.obolibrary.org/obo/hp.obo
https://purssl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa
https://purssl.obolibrary.org/obo/hp/hpoa/genes_to_phenotype.txt

Please try your favorite variant of http(s)://purssl.obolibrary.org (making note of upgrades); if all goes well, we can try the purl subdomain again.
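
One way to script this kind of check is sketched below in Python, using only the standard library. It follows redirects for the HTTP and HTTPS form of a PURL path and compares the final URLs. The helper names `final_url` and `same_target` are made up for this example, and the resolver is injectable so the comparison logic can be exercised without network access. Note the limits of the sketch: it only compares final targets, so it would not notice an HTTP request being silently upgraded to HTTPS along the way.

```python
from typing import Callable
from urllib.request import urlopen


def final_url(url: str) -> str:
    """Follow redirects and return the URL finally served (needs network)."""
    with urlopen(url, timeout=30) as response:
        return response.geturl()


def same_target(
    purl_path: str,
    host: str = "purssl.obolibrary.org",
    resolve: Callable[[str], str] = final_url,
) -> bool:
    """Return True when http:// and https:// forms end at the same URL."""
    http_target = resolve(f"http://{host}{purl_path}")
    https_target = resolve(f"https://{host}{purl_path}")
    return http_target == https_target


# Example call (performs real HTTP requests, so it is left commented out):
# for path in ("/obo/hp.obo", "/obo/hp/hpoa/phenotype.hpoa"):
#     print(path, same_target(path))
```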

@jamesaoverton
Member

Thanks @kltm! These work for me. I'll think a bit more about what other tests to run and get back to you tomorrow.

Let's call this solution "HTTP(S) in parallel". I've given it some more thought and I want to discuss it in depth before we commit to it:

HTTP(S) in parallel solves an immediate problem: People have an HTTPS webpage and they want to use PURLs to link to downloads or images that are served via HTTPS. If they use HTTP PURLs (which is the status quo) they get security warnings: HTTPS downgrade to HTTP PURL. With this solution they can use HTTPS PURLs and the whole chain of redirects uses HTTPS. Great!

People can still run into problems if their resources are served via HTTP. A partial solution to that would be to update the PURL configs to redirect to HTTPS resources where possible. Most of the PURLs redirect to GitHub or a few sites, so we could automate much of that update, and we could add automated testing.
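
A minimal Python sketch of what such an automated update could look like for a single redirect target. The allow-list of hosts is an assumption made for this example (hosts believed to serve the same content over HTTPS), the function name `upgrade_target` is made up, and the sample path is hypothetical. It does not read or write the actual PURL config files.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical allow-list: hosts assumed to serve identical content via HTTPS.
HTTPS_CAPABLE_HOSTS = {"github.com", "raw.githubusercontent.com"}


def upgrade_target(url: str) -> str:
    """Rewrite http:// to https:// for allow-listed hosts, else return as-is.

    URLs with an explicit port are left untouched, since the HTTPS port
    would differ and cannot be guessed.
    """
    parts = urlsplit(url)
    if (
        parts.scheme == "http"
        and parts.hostname in HTTPS_CAPABLE_HOSTS
        and parts.port is None
    ):
        return urlunsplit(parts._replace(scheme="https"))
    return url


# Hypothetical target path, for illustration only.
print(upgrade_target("http://raw.githubusercontent.com/example/repo/master/obi.owl"))
```

Keeping the rewrite limited to an explicit allow-list means unknown hosts are never changed without someone first confirming that they support HTTPS.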

HTTP(S) in parallel does not solve the problem of IRIs as identifiers in the RDF stack. RDF considers http://purl.obolibrary.org/obo/obi.owl and https://purl.obolibrary.org/obo/obi.owl to be two distinct identifiers. Even if we know that they are "the same thing", our tools won't know that. So if (when) people start mixing HTTP and HTTPS in their ontology IRIs, version IRIs, and term IRIs, they will get into all sorts of trouble, and it will be hard to see the extra "s" that's the cause.
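
The point can be shown with plain Python strings, since RDF compares IRIs by simple string equality. The sketch below treats IRIs as strings, shows that the HTTP and HTTPS forms count as separate identifiers, and groups IRIs that differ only by scheme so the extra "s" becomes visible. The helper names `strip_scheme` and `scheme_variants` are made up for this example.

```python
from collections import defaultdict


def strip_scheme(iri):
    """Drop a leading http:// or https:// so both variants share one key."""
    for prefix in ("https://", "http://"):
        if iri.startswith(prefix):
            return iri[len(prefix):]
    return iri


def scheme_variants(iris):
    """Return groups of IRIs that differ only by http vs https."""
    groups = defaultdict(set)
    for iri in iris:
        groups[strip_scheme(iri)].add(iri)
    return [sorted(group) for group in groups.values() if len(group) > 1]


subjects = {
    "http://purl.obolibrary.org/obo/obi.owl",
    "https://purl.obolibrary.org/obo/obi.owl",
    "http://purl.obolibrary.org/obo/OBI_0000070",
}
print(len(subjects))              # 3: the two obi.owl IRIs are distinct strings
print(scheme_variants(subjects))  # one group holding both obi.owl variants
```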

So the question is: Should we address the identifier problem by modifying the HTTP(S) in parallel approach to serve some IRI patterns over HTTP but not HTTPS? In other words, should we carve out patterns that we usually treat as identifiers, and refuse to support HTTPS for them?

My answer is no, I don't think that's viable. A key problem we're trying to solve is downloading ontology files, so excluding ontology IRIs and version IRIs from HTTPS support leaves that problem wide open. Term IRIs are a slightly different case, but I still don't think we want to make an exception for them.

So I still think HTTP(S) in parallel is the best option, without trying to get fancier by carving out certain patterns.

Instead I think we need to address the identifier problem at another level: add ROBOT report checks and other tests that will scream bloody murder when they see HTTPS PURLs used for ontology IRIs like https://purl.obolibrary.org/obo/obi.owl or for term IRIs like https://purl.obolibrary.org/obo/OBI_0000070.
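
As a rough idea of what such a check could catch, here is a Python sketch that scans ontology text for HTTPS OBO PURLs and reports the line numbers. This is a plain regular-expression stand-in written for this example, not an actual ROBOT report check; the function name `find_https_purls` and the sample text are assumptions.

```python
import re

# Matches HTTPS OBO PURLs up to the next whitespace, quote, or bracket.
HTTPS_PURL = re.compile(r"https://purl\.obolibrary\.org/obo/[^\s\"'<>)]+")


def find_https_purls(text):
    """Return (line_number, iri) pairs for every HTTPS OBO PURL in `text`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in HTTPS_PURL.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Illustrative Turtle-like snippet: line 2 uses https:// by mistake.
sample = (
    "<http://purl.obolibrary.org/obo/obi.owl> a owl:Ontology .\n"
    '<https://purl.obolibrary.org/obo/OBI_0000070> rdfs:label "example" .\n'
)
for lineno, iri in find_https_purls(sample):
    print(f"line {lineno}: HTTPS PURL used as identifier: {iri}")
```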

Sorry for the long post, but I think it's important to get this right.

@kltm
Contributor

kltm commented Jun 1, 2023

@jamesaoverton Okay, I want to clarify my action here: I'm going to wait until further notice before trying the "parallel" option again. Until then, the test URL set above will remain in place.

@jamesaoverton
Member

@kltm My previous comment got the support I was looking for. I can't think of any other tests to run for purssl. So please go ahead and add HTTPS support to the purl.obolibrary.org subdomain.

There's no particular rush. We'll do some more tests before we advertise the change.

Thank you!

@cmungall
Contributor

cmungall commented Jun 2, 2023

I agree with @jamesaoverton's points and chain of reasoning.

As additional context for people coming to this thread, the W3C has this post:
https://www.w3.org/blog/2016/05/https-and-the-semantic-weblinked-data/

Linked from this discussion:
https://lists.w3.org/Archives/Public/semantic-web/2023Apr/0042.html

Which mentions schema.org. I think they are in a bit of a mess, with some groups using https for identifiers, some using http. Let's not end up where they are. We need to keep hammering home the point that the identifiers for OBO are http, regardless of redirects and what the browser bar says. But this is a social and documentation issue, not a technical one, so once @kltm adds HTTPS support to the purl.obolibrary.org subdomain, we can close this issue and continue any further discussion on the main OBO tracker.

@kltm
Contributor

kltm commented Jun 2, 2023

@jamesaoverton Now ready for testing.

@matentzn
Contributor

matentzn commented Jun 3, 2023

I can't say enough how.... amazing. How perfect this is:

image

Chrome blocks the link on the left, and processes the link on the right.

I AM HAPPY.

@kltm
Contributor

kltm commented Jun 13, 2023

@jamesaoverton Is this closed?

@jamesaoverton
Member

Yes, I'll close this issue.

The next step is to update our configs to point to HTTPS targets when possible: #925


7 participants