Communication

Lucas Parzianello edited this page Oct 11, 2019 · 1 revision

Communication between components

Crawlers x Internet

HTTP(S) via free public proxy servers.
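As a rough sketch of how a crawler might route requests through a proxy, the example below uses only the standard library. The function names and the proxy address format (`host:port`) are assumptions for illustration; the crawlers may well use a different HTTP client.

```python
import urllib.request


def make_proxy_opener(proxy_address):
    """Return an opener that sends HTTP and HTTPS requests through one proxy.

    proxy_address is a hypothetical "host:port" string taken from a proxy list.
    """
    proxy_url = "http://" + proxy_address
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(opener, url, timeout=10):
    """Download a page through the proxy and return the raw response bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Free public proxies tend to be unreliable, so a caller would probably wrap `fetch` in error handling and retry with another proxy on failure.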

Crawlers x Domain Balancer

Selected TCP sockets. Once connected, both components keep the link open for their entire lifetimes, since they exchange messages every few seconds at most.
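A long-lived TCP link needs some way to tell where one message ends and the next begins. The sketch below assumes a simple framing scheme (a 4-byte length prefix followed by a JSON payload); the message format and all function names are hypothetical and not taken from the project, and it does not cover multiplexing several connections.

```python
import json
import socket
import struct


def connect(host, port):
    """Open one connection that is meant to stay up for the component's lifetime."""
    sock = socket.create_connection((host, port))
    # Ask the OS to probe the link so a dead peer is eventually detected.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


def send_message(sock, message):
    """Send one JSON message prefixed with its length (4 bytes, network order)."""
    payload = json.dumps(message).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, size):
    """Read exactly `size` bytes, since recv() may return fewer than requested."""
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

Both sides would call `send_message` and `recv_message` on the same socket repeatedly instead of reconnecting for each exchange.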

Crawlers x Indexers

No direct communication; they exchange data only through S3 (see below).

Crawlers x S3 x Indexers

See Python API for AWS S3
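A minimal sketch of the hand-off through S3, assuming the Python API in question is boto3 and that crawlers store each page as one object. The bucket name, key layout, and function names are hypothetical.

```python
def make_s3_client():
    """Create an S3 client; boto3 is a third-party package and reads
    credentials from the environment or its configuration files."""
    import boto3

    return boto3.client("s3")


def upload_page(s3, bucket, key, html):
    """Crawler side: store one crawled page as an S3 object."""
    s3.put_object(Bucket=bucket, Key=key, Body=html.encode("utf-8"))


def download_page(s3, bucket, key):
    """Indexer side: read a stored page back as text."""
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read().decode("utf-8")
```

A crawler might call `upload_page(make_s3_client(), "crawled-pages", "example.com/index.html", html)`, and an indexer would later call `download_page` with the same bucket and key.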

Domain Balancer x URL Map

See Redis API for Python. Redis will be in another container accessible by a hostname and port pair.
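The snippet below sketches how the Domain Balancer might talk to Redis using the redis-py client. It assumes the URL Map stores one Redis set of URLs per domain; that layout, the default port 6379, and the function names are assumptions, not something this page specifies.

```python
def connect_url_map(hostname, port=6379):
    """Connect to the Redis container; redis is a third-party package."""
    import redis

    # decode_responses=True makes the client return str instead of bytes.
    return redis.Redis(host=hostname, port=port, decode_responses=True)


def add_url(url_map, domain, url):
    """Add a URL to the set kept for its domain.

    Returns the number of members actually added (0 if it was already there).
    """
    return url_map.sadd(domain, url)


def urls_for(url_map, domain):
    """Return the set of URLs recorded for a domain."""
    return url_map.smembers(domain)
```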

Indexers x Inverted Index

See below.

Search Service x Inverted Index

See MongoDB API for Python. MongoDB will be in another container accessible by a hostname and port pair: client = MongoClient(HOSTNAME, 27017).
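To illustrate both sides of the Inverted Index, here is a small sketch built on pymongo. The database name, collection name, and document shape (`{"term": ..., "urls": [...]}`) are assumptions made for the example; the project's actual schema is not described on this page.

```python
def connect_index(hostname, port=27017):
    """Connect to the MongoDB container and return the index collection.

    pymongo is a third-party package; "search" and "inverted_index" are
    hypothetical database and collection names.
    """
    from pymongo import MongoClient

    client = MongoClient(hostname, port)
    return client["search"]["inverted_index"]


def add_posting(collection, term, url):
    """Indexer side: record that `term` occurs on `url`.

    upsert=True creates the term's document on first use, and $addToSet
    keeps the URL list free of duplicates.
    """
    collection.update_one(
        {"term": term},
        {"$addToSet": {"urls": url}},
        upsert=True,
    )


def lookup(collection, term):
    """Search Service side: return the URLs recorded for a term."""
    document = collection.find_one({"term": term})
    return document["urls"] if document else []
```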