Create torrents for bulk data #226
Comments
Sunlight used to host these on S3, but doesn't do that anymore. It is a pretty decent use case for torrents, though I don't know if any of the organizers here have (or are familiar with) torrent management software, or want to take on the maintenance.
@konklone, do you happen to know what hosting these on S3 cost Sunlight?
No, I don't remember anymore...not even the order of magnitude. If it was hugely expensive I'd probably remember, but we also didn't promote them very well -- they are just linked to on the wiki. And actually, they still are: And the Sunlight downloads...still work. They're just not updated anymore. And are delivered over plain HTTP (gross).
I don't recall the S3 costs associated with these but I'd be shocked if they were significant. Speaking as a former crazed BitTorrent evangelist, I kind of doubt you'll wind up with enough use to keep a healthy swarm going. Still, if you want to go this route, S3 offers torrent capability. In practice that will probably wind up with AWS as the single seed and no real difference in costs (it actually might be a bit higher since I think you wind up paying for more API ops for individual chunks, even as the bandwidth costs are the same -- still, we're probably talking about pocket change). What might make more sense is just configuring a requester-pays bucket. This will introduce some hassle for devs who aren't in the AWS ecosystem but is a pretty clean solution and protects against unexpected bills coming from devs who pull this data on an hourly cron. Unfortunately requester-pays buckets do not support BitTorrent.
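For anyone weighing the requester-pays option, here is a rough sketch of what it could look like from Python. It assumes a boto3 S3 client is passed in (for example `boto3.client("s3")`); the bucket name, object key, and function names are hypothetical and not anything this project has set up. Downloaders need their own AWS credentials, which is the hassle mentioned above.

```python
# Sketch only: `s3_client` is assumed to be a boto3 S3 client created by the
# caller, e.g. boto3.client("s3"). Bucket and key names are hypothetical.


def enable_requester_pays(s3_client, bucket):
    """Owner side: make downloaders, not the bucket owner, pay for transfer."""
    s3_client.put_bucket_request_payment(
        Bucket=bucket,
        RequestPaymentConfiguration={"Payer": "Requester"},
    )


def download_as_requester(s3_client, bucket, key, dest_path):
    """Downloader side: fetch one object, explicitly accepting the charges."""
    response = s3_client.get_object(
        Bucket=bucket,
        Key=key,
        RequestPayer="requester",  # without this, S3 rejects the request
    )
    with open(dest_path, "wb") as out:
        # Stream the body in chunks so a large archive never sits in memory.
        for chunk in response["Body"].iter_chunks():
            out.write(chunk)
```

A bucket owner would call `enable_requester_pays(client, "example-bulk-data")` once; each consumer would then call `download_as_requester(...)` with their own credentials.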
Right now, I'm using the fdsys script to scrape all bill texts for every Congress session that has data. This takes a long, long time, so having the data hosted somewhere makes sense. After all, bills from previous congressional sessions aren't going to be modified. However, it is about a gigabyte of data per session, so no host would make sense - on the other hand, this is a great use case for torrents. The main issue is that you would most likely end up being stuck with all the formats possible in one torrent, but that's okay for me. Thoughts on this?
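To give a sense of what "creating a torrent" involves for one per-session archive, here is a minimal sketch using only the Python standard library. It builds single-file metainfo in the standard BitTorrent layout: bencoded data with SHA-1 piece hashes. The archive name, tracker URL, and function names are made up for illustration; in practice an existing tool such as a torrent client would normally do this step.

```python
import hashlib
import io


def bencode(value):
    """Encode ints, strings/bytes, lists, and dicts in bencode format."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def build_metainfo(fileobj, name, announce, piece_length=262144):
    """Return (torrent_bytes, info_hash_hex) for a single-file torrent."""
    pieces = []
    total = 0
    while True:
        piece = fileobj.read(piece_length)
        if not piece:
            break
        total += len(piece)
        pieces.append(hashlib.sha1(piece).digest())  # 20 bytes per piece
    info = {
        "name": name,
        "length": total,
        "piece length": piece_length,
        "pieces": b"".join(pieces),
    }
    torrent = bencode({"announce": announce, "info": info})
    # Peers identify the torrent by the SHA-1 of the bencoded info dict.
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    return torrent, info_hash


if __name__ == "__main__":
    # Hypothetical stand-in for a real archive opened with open(path, "rb").
    fake_archive = io.BytesIO(b"example bill text " * 1000)
    torrent, info_hash = build_metainfo(
        fake_archive, "bills-example.zip", "http://tracker.example/announce"
    )
    print(info_hash, len(torrent))
```

Since past sessions never change, each archive's metainfo would only need to be generated once; the open question raised in this thread is who keeps seeding it.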