Merge pull request mementoweb#14 from tiborsimko/richer-docs-with-wiki
docs: richer documentation
hariharshankar committed May 17, 2016
2 parents 04f3cba + 7b072aa commit 6542f35
Showing 12 changed files with 690 additions and 109 deletions.
111 changes: 16 additions & 95 deletions README.rst
@@ -4,117 +4,38 @@ Memento TimeGate
.. image:: https://img.shields.io/travis/mementoweb/timegate.svg
   :target: https://travis-ci.org/mementoweb/timegate

About
-----

Make your web resources `Memento <http://www.mementoweb.org>`__ compliant in a
few easy steps.

The Memento framework enables datetime negotiation for web resources.
Knowing the URI of a Memento-compliant web resource, a user can select a
date and see what it was like around that time.

Introduction
------------

In order to support Memento, a web server must obviously have accessible
archives of its online resources. And it must also have a piece of
software that handles the datetime negotiation according to the Memento
protocol for those resources.

But in such a datetime negotiation server, only a small proportion of the
code is specific to the particular web resources it handles. The main
part of the logic will be very similar throughout many implementations.
TimeGate isolates the core components and functionality. With it,
there's no need to implement, or to re-implement the same logic and
algorithms over and over again. Its architecture is designed to accept
easy-to-code plugins to match any web resources.

From now on, this documentation will refer to the web server that hosts
the resources and archives as the **web server** and to the Memento
TimeGate datetime negotiation server as the **TimeGate**.

- Suppose you have a web resource accessible in a web server by some
URI. We call the resource the **Original Resource** and refer to its
URI as **URI-R**.
- Suppose a web server has a snapshot of what this URI-R looked like in
the past. We call such a snapshot a **Memento** and we refer to its
URI as **URI-M**. There could be many snapshots of URI-R, taken at
different moments in time, each Memento i with its distinct URI-Mi.
The Mementos do not necessarily need to be on the same web server as
the Original Resources.

Example
-------

.. figure:: https://raw.githubusercontent.com/mementoweb/timegate/master/docs/uris_example.png
   :alt: Image

There are only two steps to make such a resource Memento compliant.

Step 1: Setting up TimeGate
---------------------------

The first thing to do is to set up the TimeGate for the specific web
server.

* Run the TimeGate with your custom handler. The handler is the
piece of code that is specific to how the web server manages Original
Resources and Mementos. It needs to implement either one of the
following:

- Given a URI-R, return the list of URI-Ms along with their respective dates.
- Given a URI-R and a datetime, return one single URI-M along with its date.

Step 2: Providing the headers
-----------------------------

The second thing to do is to provide Memento's HTTP headers at the web
server.

* Add HTTP headers required by the Memento protocol to responses from the
Original Resource and its Mementos:

- For the Original Resource, add a "Link" header that points at its TimeGate
- For each Memento, add a "Link" header that points at the TimeGate
- For each Memento, add a "Link" header that points to the Original Resource
- For each Memento, add a Memento-Datetime header that conveys the snapshot datetime

Using the previous example, and supposing a TimeGate is running at
``http://example.com/timegate/``, Memento HTTP response headers for the
Original Resource and one Memento look as follows. |Image|

And that's it! With the TimeGate, datetime negotiation is now possible
for these resources.

How it works
Installation
------------

Read the `big
picture <https://github.com/mementoweb/timegate/wiki/The-Big-Picture>`__
to understand how it works and what are the requirements.

Getting Started
---------------

Start by `reading the
guide <https://github.com/mementoweb/timegate/wiki/Getting-Started>`__
for comprehensive information about how to use TimeGate for your own web
resources.
Memento TimeGate is on PyPI so all you need is: ::

Requirements
------------
pip install -e git+https://github.com/mementoweb/timegate.git#egg=TimeGate
uwsgi --http :9999 -s /tmp/mysock.sock --module timegate.application --callable application

- `Python <https://www.python.org>`__
- `uWSGI <http://uwsgi-docs.readthedocs.org/en/latest/>`__

Documentation
-------------

See the `wiki <https://github.com/mementoweb/timegate/wiki>`__.
The documentation is readable at http://timegate.readthedocs.io or can be built
using Sphinx: ::

pip install timegate[docs]
python setup.py build_sphinx


License
Testing
-------

See the
`LICENSE <https://github.com/mementoweb/timegate/blob/master/LICENSE>`__
file.
Running the test suite is as simple as: ::

.. |Image| image:: https://raw.githubusercontent.com/mementoweb/timegate/master/docs/headers_example.png
./run-tests.sh
54 changes: 54 additions & 0 deletions docs/advanced-features.rst
@@ -0,0 +1,54 @@
.. _advanced_features:

TimeMaps
========

The TimeGate can easily be used as a TimeMap server too.

Requirements
------------

For that there are two requirements:

- The Handler must implement the ``get_all_mementos(uri_r)`` function to return
  the entire history of an Original Resource.
- The ``conf/config.ini`` file must have the variable ``use_timemap = true``.
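
A minimal sketch of a handler meeting the first requirement, assuming that ``get_all_mementos`` returns a list of ``(URI-M, datetime)`` pairs. The class name ``ExampleHandler``, the in-memory ``SNAPSHOTS`` table and the archive URLs are hypothetical and only serve the illustration; a real handler would typically query the web server's API instead.

```python
from datetime import datetime, timezone

# Hypothetical archive index: URI-R -> list of (URI-M, snapshot datetime).
SNAPSHOTS = {
    "http://www.example.com/resourceA": [
        ("http://www.example.com/archive/2/resourceA",
         datetime(2016, 1, 1, tzinfo=timezone.utc)),
        ("http://www.example.com/archive/1/resourceA",
         datetime(2015, 1, 1, tzinfo=timezone.utc)),
    ],
}


class ExampleHandler:
    """Illustrative handler only; this is not the TimeGate base class."""

    def get_all_mementos(self, uri_r):
        # Return the entire known history of the Original Resource,
        # oldest first, as (URI-M, datetime) pairs.
        return sorted(SNAPSHOTS.get(uri_r, []), key=lambda pair: pair[1])
```

An unknown URI-R yields an empty list in this sketch; how a real handler should report that case is not covered here.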

Resulting links
---------------

Once this setup is in place, the TimeGate responses' ``Link`` header
will contain two new relations, for two different formats (MIME types):

- ``<HOST/timemap/link/URI-R>; rel="timemap"; type="application/link-format"``
  `Link TimeMaps <http://www.mementoweb.org/guide/rfc/#Pattern6>`_
- ``<HOST/timemap/json/URI-R>; rel="timemap"; type="application/json"``
  JSON TimeMaps

Where ``HOST`` is the base URI of the program and ``URI-R`` is the URI
of the Original Resource.

Example
-------

For example, suppose ``http://www.example.com/resourceA`` is the URI-R
of an Original Resource, and suppose the TimeGate/TimeMap server's
``host`` configuration is set to ``http://timegate.example.com``. Then,
HTTP responses from the TimeGate will include the following:

- ``<http://timegate.example.com/timemap/link/http://www.example.com/resourceA>; rel="timemap"; type="application/link-format"``
- ``<http://timegate.example.com/timemap/json/http://www.example.com/resourceA>; rel="timemap"; type="application/json"``

Now a user can send an HTTP ``GET`` request to one of those links, and
the server's response will have a ``200 OK`` status code with the
TimeMap as its body.
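
The pattern of the two relations can be sketched as a small helper. This is an illustration of the URI layout described in this section, not the TimeGate's own code; the function name ``timemap_links`` is hypothetical.

```python
def timemap_links(host, uri_r):
    """Build the two TimeMap relations for a given URI-R."""
    base = host.rstrip("/")  # avoid a double slash if host ends with "/"
    return [
        '<{0}/timemap/link/{1}>; rel="timemap"; '
        'type="application/link-format"'.format(base, uri_r),
        '<{0}/timemap/json/{1}>; rel="timemap"; '
        'type="application/json"'.format(base, uri_r),
    ]


links = timemap_links("http://timegate.example.com",
                      "http://www.example.com/resourceA")
# Multiple relations in one Link header are separated by commas.
link_header_fragment = ", ".join(links)
```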

HandlerErrors
=============

Custom error messages can be sent to the client using the custom
exception module: ``from errors.timegateerrors import HandlerError``.
For instance, a custom message with HTTP status ``400`` and body
``Custom error message`` can be sent using:
``raise HandlerError("Custom error message", status=400)``. Raising a
``HandlerError`` will stop the request and not return any Memento to the
client.
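
A minimal sketch of raising the error from a handler function. To keep the example runnable on its own, ``HandlerError`` is defined here as a stand-in with the same call shape as above; in a real handler it would be imported as shown in this section. The URI check is a hypothetical validation rule.

```python
class HandlerError(Exception):
    """Stand-in for illustration; a real handler would instead use
    ``from errors.timegateerrors import HandlerError``."""

    def __init__(self, message, status):
        super().__init__(message)
        self.status = status


def get_all_mementos(uri_r):
    # Hypothetical rule: this handler only knows resources of one site.
    if not uri_r.startswith("http://www.example.com/"):
        # Stops the request; no Memento is returned to the client.
        raise HandlerError("Custom error message", status=400)
    return []  # a real handler would return the URI-Ms and their dates
```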
65 changes: 65 additions & 0 deletions docs/big-picture.rst
@@ -0,0 +1,65 @@
.. _big_picture:

Big picture
===========

Definitions
-----------

From now on, this documentation will refer to the web server that hosts
the resources and archives as the **web server** and to the Memento
TimeGate datetime negotiation server as the **TimeGate**.

- Suppose you have a web resource accessible on a web server by some
  URI. We call the resource the **Original Resource** and refer to its
  URI as **URI-R**.
- Suppose a web server has a snapshot of what this URI-R looked like in
  the past. We call such a snapshot a **Memento** and we refer to its
  URI as **URI-M**. There could be many snapshots of URI-R, taken at
  different moments in time, each with its own distinct URI-M. The
  Mementos do not necessarily need to be on the same web server as the
  Original Resources.

Client, Server and TimeGate
---------------------------

This figure represents the current situation. Without datetime
negotiation, the client has to find the URIs of the previous versions of
a web resource by hand, if they exist:

|client_server.png|

To make these web resources Memento compliant, two things need to be
added. The new components of the system are the TimeGate and the Memento
HTTP headers at the web server's side:

|client_server_tg.png|

With these links, the client now gets the address of the TimeGate when
retrieving an Original Resource or a Memento. Then, the client can use
datetime negotiation with the TimeGate to get the URI of an archived
version (``URI-M2``) of the Original Resource at a specific point in
time (``T2``):

|sequence.png|
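
From the client's side, the datetime negotiation step amounts to sending the desired datetime in the ``Accept-Datetime`` request header defined by the Memento protocol (RFC 7089). The following sketch only builds that header; the helper name ``datetime_negotiation_headers`` and the chosen date are hypothetical, and sending the request to the TimeGate is left out.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def datetime_negotiation_headers(when):
    """Return request headers asking a TimeGate for the state around `when`."""
    # Accept-Datetime carries an HTTP-date, which is expressed in GMT.
    utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}


# T2 in the sequence above, here an arbitrary example value.
headers = datetime_negotiation_headers(
    datetime(2010, 4, 1, 12, 0, tzinfo=timezone.utc))
```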

Architecture
------------

The TimeGate will manage the framework's logic in a generic manner.
However, every web server has its specific way to store snapshots and to
construct URI-Ms. Thus, a specific plugin must be written for every web
server. Such a plugin is called a handler. A handler will typically talk
to an API to return the list of URI-Ms given a URI-R, but there are
several alternatives to this setup.

.. figure:: architecture.png
   :alt: architecture.png

   architecture.png

The system can be seen as three components.

- The Memento user who wishes to retrieve an older version of a
  resource.
- The web server where the active version (original URI) and revisions
  (Mementos) can be accessed. This entity must provide a way to access
  these versions, typically through an API.
- The TimeGate, which itself is composed of two main elements:

  - One API-specific handler
  - The generic TimeGate code

.. |client_server.png| image:: client_server.png
.. |client_server_tg.png| image:: client_server_tg.png
.. |sequence.png| image:: sequence.png
59 changes: 59 additions & 0 deletions docs/cache.rst
@@ -0,0 +1,59 @@
.. _cache:

Cache
=====

The TimeGate comes with a built-in cache that is activated by default. Change
this behavior by editing the configuration file. See :ref:`configuration`.

Populating the cache
--------------------

The cache stores TimeMaps only, that is, the return values of the handler
function ``get_all_mementos()``:

- If the Handler does not have ``get_all_mementos()`` implemented, the cache
  will never be filled.
- If the Handler has both the functions ``get_all_mementos()`` and
  ``get_memento()``, only TimeMap requests will fill the cache. All
  TimeGate requests will use ``get_memento()``, whose result will not be
  cached.

Cache HIT conditions
--------------------

- A cached TimeMap can be used to respond to a TimeMap request from
  a client if it is fresh enough. The tolerance for freshness can be
  defined in the configuration file.
- A cached TimeMap can also be used to respond to a TimeGate request
  from a client. In this case, it is not the request's time that must
  lie within the tolerance bounds, but the requested datetime.

Force Fresh value
-----------------

If the request contains the header ``Cache-Control: no-cache``, then the
TimeGate will not return anything from the cache.

Example
-------

Suppose you have a TimeMap that was cached at time ``T``. Suppose you
have a tolerance of ``d`` seconds. A TimeMap request arrives at time
``R1``. A TimeGate request arrives at time ``R2`` with requested
datetime ``j``. Neither request contains the header
``Cache-Control: no-cache``.

- A TimeMap request will be served from cache only if it arrives within
  the tolerance: ``R1 <= T+d``.
- A TimeGate request will be served from cache only if the requested
  datetime falls within the tolerance: ``j <= T+d``, no matter ``R2``.
  This means that even if a cached value is old, the cache can still
  respond to TimeGate requests for requested datetimes up to time
  ``T+d``.
- All other requests will be cache misses.
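
The rules of this example can be written as a small decision function. This is a sketch of the conditions as stated in this section, not the TimeGate's actual cache code; the function name ``served_from_cache`` and its parameters are hypothetical, and all times are plain numbers of seconds on a common clock.

```python
def served_from_cache(kind, cached_at, tolerance, request_time,
                      requested_datetime=None, no_cache=False):
    """Return True if a cached TimeMap may answer the request."""
    if no_cache:
        # The client asked for a fresh value.
        return False
    if kind == "timemap":
        # The request itself must arrive within the tolerance: R1 <= T+d.
        return request_time <= cached_at + tolerance
    if kind == "timegate":
        # Only the requested datetime matters, not the arrival time R2:
        # j <= T+d.
        return requested_datetime <= cached_at + tolerance
    return False


# T = 1000, d = 60: a late TimeGate request for an old datetime still hits.
hit = served_from_cache("timegate", cached_at=1000, tolerance=60,
                        request_time=5000, requested_datetime=900)
```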

Cache size
----------

There is no "maximum size" parameter. The reason for this is that the
cache size depends on the average size of the TimeMaps, which itself
depends on the length of the URI-Ms they contain and on their average
count. These variables depend on your system. The cache can be
managed using the ``cache_max_values`` parameter, which indirectly
affects its size.