Don't depend on rethinkdb #159
Comments
I've been looking at Brozzler, the mitm warc proxy, and some of the other IA things. I'm interested in deploying these as well, but Rethink, and most importantly the configuration tied up within it, makes that difficult to do. If someone had some high-level documentation about what rethink is used for (other than just service discovery), I might be able to extract that code, put it into some ABC class, and allow the selection of multiple backend configuration modes. From a high level it looks like much of what rethink is being used for can be done with redis, and a lot of the service discovery it's doing could be managed (for most people) with config files. Was there a specific design decision that coupled all of these services together (job submission, job state tracking, warc serving, scraping) into rethinkdb? If there is a document that covers this, I'd be really happy to take a look at it to get a better picture of the motivations for these design choices.
I mean, honestly, rethinkdb used to be very easy to deploy, and was one DB that could handle all those different services nicely. It's a shame that development stalled like it did, and that no one was able to perform maintenance releases after they lost funding.
It is a shame that the rethinkdb company folded, and that the community hasn't really gotten the project on track at this point. But it's still a really solid piece of software.
Brozzler stores all the crawl state in rethinkdb: jobs, sites, and pages, whether finished, in progress, or queued. We chose rethinkdb primarily because it is truly distributed (implements raft consensus, thus has no single point of failure) and because it supports secondary indexes (unlike a key-value store). It's also very easy to deploy and cluster. Even now, I think it was a pretty good choice.
The parts of brozzler that interact with rethinkdb are mostly in
We're certainly open to pull requests adding support for another database backend, but have no plans to do this work ourselves. Since we're not planning to address the issue as written ("don't depend on rethinkdb"), I'm gonna close it. Feel free to continue discussing the topic here though.
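For anyone picking up the pluggable-backend idea raised at the top of the thread, here is a minimal sketch of what such an interface could look like for the crawl state described above. Everything in it is hypothetical: `CrawlStateBackend`, `InMemoryBackend`, `make_backend`, the method names, and the status strings are illustrative assumptions and not brozzler's actual code or schema. The in-memory class emulates a secondary index on (site, status) with a plain dict, which is the kind of lookup a bare key-value store would have to maintain by hand.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Optional


class CrawlStateBackend(ABC):
    """Hypothetical storage interface; not part of brozzler."""

    @abstractmethod
    def save_page(self, page_id: str, site_id: str, status: str) -> None:
        """Insert a page or update its status."""

    @abstractmethod
    def claim_next_page(self, site_id: str) -> Optional[str]:
        """Mark the oldest queued page of a site as in progress and return its id."""

    @abstractmethod
    def count_pages(self, site_id: str, status: str) -> int:
        """Count the pages of a site that have the given status."""


class InMemoryBackend(CrawlStateBackend):
    """Toy single-process backend, useful only for tests and illustration."""

    def __init__(self) -> None:
        self._pages = {}  # page_id -> (site_id, status)
        # Hand-maintained stand-in for a secondary index on (site_id, status).
        self._by_site_status = defaultdict(list)

    def save_page(self, page_id, site_id, status):
        old_key = self._pages.get(page_id)
        if old_key is not None:
            # Keep the index consistent when a page changes status.
            self._by_site_status[old_key].remove(page_id)
        self._pages[page_id] = (site_id, status)
        self._by_site_status[(site_id, status)].append(page_id)

    def claim_next_page(self, site_id):
        queued = self._by_site_status[(site_id, "queued")]
        if not queued:
            return None
        page_id = queued[0]
        self.save_page(page_id, site_id, "in_progress")
        return page_id

    def count_pages(self, site_id, status):
        return len(self._by_site_status[(site_id, status)])


# Backend selection by name, e.g. driven by a config file entry.
BACKENDS = {"memory": InMemoryBackend}


def make_backend(name: str) -> CrawlStateBackend:
    return BACKENDS[name]()
```

A rethinkdb or redis implementation would subclass the same interface and be registered in `BACKENDS`. Note that this sketch ignores what makes the real problem hard: atomic claims across multiple workers and the absence of a single point of failure.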
@nlevitt is there an IRC/Slack that the maintainers of this code use? I'd like to get a higher-level feel for how this, warcproxy, and some of the other tooling in this group fit together.
There's a channel on iipc.slack.com but unfortunately that's not completely open to the public. I just created a channel #brozzler on freenode and I'll hang out there.
Rethinkdb seems to be essentially dead in the water. There are some attempts to get it back on track, but right now it sadly doesn't seem to be maintained.