Hifadhi is a distributed file server written in Python that runs in containers. In plain English, it is like AWS S3 that you can run on your own computer or computers to serve files with access control.
In this context, a container running Hifadhi is referred to as an instance, and an instance has the following capabilities:
- Store files.
- Serve stored files as a download.
- Stream files (meaning you could build a Netflix-style video service on top of Hifadhi).
- Generate presigned URLs as an access control measure.
- Grant access to a file to a certain IPv4 address (another access control measure).
- Join a shard to increase the storage capacity.
- Cache files from other instances thereby increasing the file's availability.
- Ask other instances to cache a certain file when it is receiving too much load.
With sharding, we increase the amount of data we can store by spreading it across multiple instances. Instances can obtain data from other instances and serve that data themselves.
With caching, a file is copied to other instances to increase its availability, since it no longer has to be fetched from the instance storing it every time a request is made. Caching happens whenever an instance receives a request for a file that lives on another instance: the file is downloaded from that instance and cached locally. Caching can also be triggered from the adapter.
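The caching behaviour described above can be sketched as a small read-through cache. This is only an illustration of the idea, not Hifadhi's actual code: the `ToyInstance` class, the `fetch_from_peer` helper, and the instance names `alpha` and `beta` are all hypothetical, and the `instance@filename` form is assumed from the query strings used later in this guide.

```python
from typing import Callable, Dict, Optional

# Hypothetical callback type: (owner_instance, filename) -> file bytes, or None if missing.
Fetcher = Callable[[str, str], Optional[bytes]]


class ToyInstance:
    """Illustrative stand-in for an instance; not Hifadhi's real classes."""

    def __init__(self, name: str, fetch_remote: Fetcher) -> None:
        self.name = name
        self.stored: Dict[str, bytes] = {}  # files uploaded directly to this instance
        self.cached: Dict[str, bytes] = {}  # copies pulled from other instances
        self._fetch_remote = fetch_remote

    def get(self, query_string: str) -> Optional[bytes]:
        # Assumed "instance@filename" form of the query string.
        owner, _, filename = query_string.partition("@")
        if owner == self.name:
            return self.stored.get(filename)        # this instance owns the file
        if query_string in self.cached:
            return self.cached[query_string]        # already cached locally
        data = self._fetch_remote(owner, filename)  # ask the owning instance
        if data is not None:
            self.cached[query_string] = data        # keep a copy for later requests
        return data


instances: Dict[str, ToyInstance] = {}


def fetch_from_peer(owner: str, filename: str) -> Optional[bytes]:
    """In-memory replacement for the network call between instances."""
    peer = instances.get(owner)
    return peer.stored.get(filename) if peer else None


instances["alpha"] = ToyInstance("alpha", fetch_from_peer)
instances["beta"] = ToyInstance("beta", fetch_from_peer)
instances["alpha"].stored["video.mp4"] = b"demo-bytes"

first = instances["beta"].get("alpha@video.mp4")   # fetched from alpha, then cached
second = instances["beta"].get("alpha@video.mp4")  # served from beta's own cache
print(first == second, list(instances["beta"].cached))
```

The second request never reaches `alpha`, which is the availability gain the caching mechanism is after.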
It is assumed you have Docker installed on your machine. If not, follow the installation guide on the Docker website. Once you have Docker set up, go ahead and pull the Hifadhi image from Docker Hub.
docker pull rbryanben/hifadhi:test
Once that is done go ahead and run your first instance on port 7111 using the command
docker run --name instance-kalahari -d -e INSTANCE_NAME=Kalahari -e SHARD_KEY=BasicPassword -p 7111:80 rbryanben/hifadhi:test
And just like that we have our first instance running. Now let's check that the instance is healthy by using curl to get the instance health. If you do not have curl installed on your machine, you can open the same URL in your browser instead.
curl http://localhost:7111/api/v1/health
You should get a response like this, showing the status of the instance.
{"status": "healthy", "uptime": 110.42238450050354, "instance": "Kalahari", "known_instances": null}
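If you would rather script the health check than run curl by hand, a minimal sketch using only the Python standard library might look like this. The function names `parse_health` and `fetch_health` are made up for this example; the endpoint and the JSON fields are the ones shown above.

```python
import json
import urllib.request


def parse_health(body: str) -> dict:
    """Parse the health JSON and add a boolean `ok` flag for convenience."""
    data = json.loads(body)
    data["ok"] = data.get("status") == "healthy"
    return data


def fetch_health(base_url: str = "http://localhost:7111", timeout: float = 5.0) -> dict:
    """GET /api/v1/health and parse the result. Needs a running instance."""
    with urllib.request.urlopen(f"{base_url}/api/v1/health", timeout=timeout) as resp:
        return parse_health(resp.read().decode("utf-8"))


# Offline demonstration using a response shaped like the one above.
sample = '{"status": "healthy", "uptime": 110.42, "instance": "Kalahari", "known_instances": null}'
report = parse_health(sample)
print(report["instance"], report["ok"])
```

With an instance running, `fetch_health()` returns the same dictionary built from the live response.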
It is assumed you have an API client such as Postman installed on your machine. What we want to do is upload a file to the instance so that we can retrieve it later. To do this, send a POST request to the instance address http://localhost:7111/api/v1/store with the following headers and body.
- Request: POST
- Headers:
  - SHARD-KEY: The key we defined earlier as an environment variable
- Body (multipart form):
  - file: The file itself
  - filename: The name to save the file as
Using this template, upload any video of your choice with the filename video.mp4 and the SHARD-KEY set to BasicPassword, as we defined when creating the container. As a curl request it should look like this:
curl --request POST \
--url http://localhost:7111/api/v1/store \
--header 'Content-Type: multipart/form-data' \
--header 'SHARD-KEY: BasicPassword' \
--form 'file=@C:\Users\rbryanben\Videos\video.mp4' \
--form filename=video.mp4
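You should receive a 200 response containing a query string that identifies the file. Judging by the query strings used later in this guide, it takes the form instance@filename, for example Kalahari@video.mp4.
If you did not receive a 200 response with a query string, here are the other response codes and what they mean.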
- Response:
  - 400 (Missing Parameters): One or more of the body parameters file and filename is missing.
  - 401 (Denied): The SHARD-KEY defined on creating the instance container did not match the one supplied in the headers.
  - 500 (Internal Server Error): You should not get this.
  - 200 (Success): File was uploaded successfully.
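The upload request and the response codes above can also be sketched in Python with the standard library alone. The helper names (`build_multipart`, `build_store_request`, `describe`) are invented for this example, and the multipart body is assembled by hand because `urllib` has no built-in form encoder; treat it as a sketch rather than a reference client.

```python
import urllib.request
import uuid
from typing import Dict, Optional, Tuple

# Meanings taken from the response list above.
STATUS_MEANING = {
    200: "Success: file was uploaded",
    400: "Missing parameters: file or filename is absent",
    401: "Denied: SHARD-KEY does not match the instance",
    500: "Internal server error",
}


def describe(status: int) -> str:
    return STATUS_MEANING.get(status, f"Unexpected status {status}")


def build_multipart(fields: Dict[str, str], filename: str, content: bytes,
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    # The file part uses the field name "file", as in the template above.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream'
        f"\r\n\r\n".encode("utf-8") + content + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_store_request(base_url: str, shard_key: str, filename: str,
                        content: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request to /api/v1/store."""
    body, content_type = build_multipart({"filename": filename}, filename, content)
    return urllib.request.Request(
        f"{base_url}/api/v1/store",
        data=body,
        headers={"SHARD-KEY": shard_key, "Content-Type": content_type},
        method="POST",
    )


request = build_store_request("http://localhost:7111", "BasicPassword", "video.mp4", b"abc")
print(request.get_method(), request.full_url, len(request.data))
```

To actually send it, pass `request` to `urllib.request.urlopen`; a 4xx or 5xx reply raises `urllib.error.HTTPError`, whose `code` attribute can be fed to `describe`.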
For this you need nothing but a browser. There are two options for retrieving a file: Stream or Download retrieval.
When streaming, the file is returned in small pieces called chunks. Streaming suits video apps where you want to start watching immediately and be able to skip to a certain part, features that are not available with Download. Download retrieval returns the file faster than Stream.
Now go ahead and stream the video we uploaded earlier using this link.
http://localhost:7111/api/v1/stream/Kalahari@video.mp4
To download the file, simply change stream to download in the URL.
http://localhost:7111/api/v1/download/Kalahari@video.mp4
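For scripted retrieval, the two URLs differ only in one path segment, which a small helper can capture. The names `retrieval_url` and `save_download` are hypothetical, the query string `Kalahari@video.mp4` is assumed to be what the upload returned, and percent-encoding of unusual characters is an assumption about what the server accepts.

```python
import shutil
import urllib.request
from urllib.parse import quote


def retrieval_url(query_string: str, mode: str = "stream",
                  base_url: str = "http://localhost:7111") -> str:
    """Build the stream or download URL for a stored file."""
    if mode not in ("stream", "download"):
        raise ValueError("mode must be 'stream' or 'download'")
    # Keep '@' readable; percent-encode characters such as spaces.
    return f"{base_url}/api/v1/{mode}/{quote(query_string, safe='@')}"


def save_download(query_string: str, destination: str,
                  base_url: str = "http://localhost:7111") -> None:
    """Save a file to disk chunk by chunk. Needs a running instance."""
    url = retrieval_url(query_string, "download", base_url)
    with urllib.request.urlopen(url) as resp, open(destination, "wb") as out:
        shutil.copyfileobj(resp, out)


print(retrieval_url("Kalahari@video.mp4"))
print(retrieval_url("Kalahari@video.mp4", mode="download"))
```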
Now let's secure our files with access control measures. There are two we can use: Presigned URLs and IPv4 Access. With presigned URLs, a unique URL is generated for you, which you can in turn give to a client. These presigned URLs are very long so that they are hard to guess, as there are 36^256 possibilities. IPv4 Access is as simple as granting access to a specific IP address. Let's upload another video of your choice, but this time we will add another body parameter, mode, which defines whether a file is public or private.
- Request: POST
- Headers:
  - SHARD-KEY: The key we defined earlier as an environment variable
- Body (multipart form):
  - file: The file itself
  - filename: The name to save the file as
  - mode: private
Your request should look something like this
curl --request POST \
--url http://localhost:7111/api/v1/store \
--header 'Content-Type: multipart/form-data' \
--header 'SHARD-KEY: BasicPassword' \
--form 'file=@C:\Users\rbryanben\Videos\another_video.mp4' \
--form filename=another_video.mp4 \
--form mode=private
A GET request to the path /api/v1/presign/[your_query_string] is used to generate presigned URLs. The request must include the parameter duration and the header SHARD-KEY, as defined below.
- Request: GET
- Headers:
  - SHARD-KEY: The key we defined earlier as an environment variable
- Parameters:
  - duration: Time in seconds the presigned URL should remain valid
Using curl your request should look something like this. It is important to note that you cannot generate a presigned url for a public file.
curl --request GET \
--url 'http://localhost:7111/api/v1/presign/Kalahari@another_video.mp4?duration=60' \
--header 'SHARD-KEY: BasicPassword'
In return you should get a signed query string that looks like this
Kalahari@another_video.mp4?signature=9cfb765b-58fd-419a-8990-4a9b18ef6ffb5bc47f08-48be-4704-a2a4-648c4a212c3fd662bdb1-1a54-4381-8b81-a451b26e7962181ef282-ea9a-4709-bcb8-63d300bf652e
Send this string to the path /api/v1/stream to stream the video.
http://localhost:7111/api/v1/stream/[your_signed_query_string]
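The two steps, requesting a signature and then streaming with it, can be sketched as follows. The helper names are hypothetical, and the code assumes the presign endpoint replies with the signed query string as plain text, which is what the example response above suggests.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:7111"  # instance address used in this guide


def build_presign_request(query_string: str, duration: int, shard_key: str,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the GET request that asks for a presigned URL."""
    if duration <= 0:
        raise ValueError("duration must be a positive number of seconds")
    params = urlencode({"duration": duration})
    url = f"{base_url}/api/v1/presign/{query_string}?{params}"
    return urllib.request.Request(url, headers={"SHARD-KEY": shard_key}, method="GET")


def signed_stream_url(signed_query_string: str, base_url: str = BASE_URL) -> str:
    """Turn the signed query string returned by presign into a stream URL."""
    return f"{base_url}/api/v1/stream/{signed_query_string.strip()}"


def presign_and_build_url(query_string: str, duration: int, shard_key: str) -> str:
    """Full round trip. Needs a running instance holding a private file."""
    request = build_presign_request(query_string, duration, shard_key)
    with urllib.request.urlopen(request) as resp:
        # Assumption: the body is the signed query string as plain text.
        return signed_stream_url(resp.read().decode("utf-8"))


presign = build_presign_request("Kalahari@another_video.mp4", 60, "BasicPassword")
print(presign.full_url)
print(signed_stream_url("Kalahari@another_video.mp4?signature=example"))
```

The URL returned by `presign_and_build_url` is the one to hand to a client; it stops working once the duration has elapsed.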
A GET request to the path /api/v1/ipv4access/[your_query_string] is used to grant IPv4 access. The request must include the parameters duration and ipv4 and the header SHARD-KEY, as defined by the template below.
- Request: GET
- Headers:
  - SHARD-KEY: The key we defined earlier as an environment variable
- Parameters:
  - duration: Time in seconds the access should remain valid
  - ipv4: IP address to give access to
Your curl request should look something like this
curl --request GET \
--url 'http://localhost:7111/api/v1/ipv4access/Kalahari@another_video.mp4?duration=60&ipv4=172.17.0.1' \
--header 'SHARD-KEY: BasicPassword'
You should get the IP address you specified as an indication of success.
172.17.0.1
Note that we did not use the IP address 127.0.0.1 and instead used the Docker network gateway 172.17.0.1 to grant access. This is because requests to containers arrive from the network gateway unless you have macvlan set up on the container network. To learn more about how you can use your actual IP, read the feature docs on Access Control. Now go ahead and stream the file.
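When granting access from a script, it helps to validate the address before calling the endpoint. The sketch below uses Python's `ipaddress` module for that; the function name `ipv4_access_url` is made up for this example, and the path and parameter names are the ones from the template above.

```python
import ipaddress
from urllib.parse import urlencode


def ipv4_access_url(query_string: str, ipv4: str, duration: int,
                    base_url: str = "http://localhost:7111") -> str:
    """Build the URL that grants `ipv4` access to a file for `duration` seconds."""
    # IPv4Address raises AddressValueError (a ValueError subclass) on bad input.
    address = ipaddress.IPv4Address(ipv4)
    params = urlencode({"duration": int(duration), "ipv4": str(address)})
    return f"{base_url}/api/v1/ipv4access/{query_string}?{params}"


# 172.17.0.1 is the Docker gateway used in this guide.
print(ipv4_access_url("Kalahari@another_video.mp4", "172.17.0.1", 60))
```

The resulting URL still has to be requested with the SHARD-KEY header, exactly as in the curl command.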
Now comes the key feature, which is distributing instances to increase availability and storage capacity. Let us start by understanding what a gossip instance is.
The gossip instance is a sort of leader among a group of instances. It keeps a record of all instances that are linked together and gives each instance information about the others, so that they can retrieve files from one another. An instance becomes a gossip instance when another instance registers to it, meaning a gossip instance starts out as a regular instance.
To register to an instance, simply set the environment variable GOSSIP_INSTANCE on your container to the address of the instance you want to register to. NB: Do not include https:// or http:// in the address, otherwise the instance will interpret it as http://http://<your_address> on connection, which will not work. The instance will try to connect over HTTP and, if that fails, will try over HTTPS.
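Since a scheme in GOSSIP_INSTANCE breaks registration, a deployment script can strip it before the value is passed to the container. This is a minimal sketch; `normalize_gossip_address` is a hypothetical helper, and removing a trailing slash is an extra precaution assumed here, not something the note above requires.

```python
def normalize_gossip_address(address: str) -> str:
    """Return an address suitable for GOSSIP_INSTANCE (no scheme, no trailing slash)."""
    cleaned = address.strip()
    for scheme in ("http://", "https://"):
        if cleaned.lower().startswith(scheme):
            cleaned = cleaned[len(scheme):]  # drop the scheme the instance adds itself
            break
    return cleaned.rstrip("/")


for raw in ("http://172.17.0.2", "https://files.example.com/", "kalahari"):
    print(f"GOSSIP_INSTANCE={normalize_gossip_address(raw)}")
```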
Let us start off by creating an instance named Kalahari that will act as our gossip instance, assuming you do not already have one from the previous tutorials.
docker run --name instance-kalahari -d -e INSTANCE_NAME=Kalahari -e SHARD_KEY=BasicPassword -p 7110:80 rbryanben/hifadhi:test
Now let us obtain the instance IP using this command
docker inspect -f "{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}" instance-kalahari
Once the instance is running and you have the IP, let's create another instance that runs on port 7510 with the same SHARD_KEY and with GOSSIP_INSTANCE set to the IP address of the instance Kalahari. If the shard key does not match, the instance will not be registered.
docker run --name instance-sahara -d -e INSTANCE_NAME=Sahara -e SHARD_KEY=BasicPassword -e GOSSIP_INSTANCE=<Kalahari_ip> -p 7510:80 rbryanben/hifadhi:test
That being done, let us check whether the instance was registered on the gossip instance by executing this curl command
curl --request GET \
--url http://localhost:7110/api/v1/registered_instances \
--header 'Content-Type: multipart/form-data' \
--header 'SHARD-KEY: BasicPassword'
You should get a list of all registered instances if everything executed correctly.
{
  "Sahara": {
    "ipv4": "172.17.0.3",
    "total_memory": 244,
    "used_memory": 90,
    "stored_files_size": 0,
    "cached_files_size": 0,
    "instance_name": "Sahara",
    "stored_files_count": 0,
    "cached_files_count": 0,
    "uptime": 0,
    "healthy": true
  },
  "Kalahari": {
    "ipv4": "localhost:7110",
    "total_memory": 244.47995376586914,
    "used_memory": 90.63045501708984,
    "stored_files_size": 0,
    "cached_files_size": 0,
    "instance_name": "Kalahari",
    "stored_files_count": 0,
    "cached_files_count": 0,
    "uptime": 798.4496290683746
  }
}
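A response like the one above is easy to summarise in a script. The sketch below is illustrative: `summarize_instances` is a hypothetical helper, the memory figures are used in whatever unit the instance reports, and the missing `healthy` key on the gossip instance's own entry is handled by returning `None`.

```python
import json
from typing import Dict, List


def summarize_instances(body: str) -> List[Dict]:
    """Turn the registered_instances JSON into one summary row per instance."""
    registry = json.loads(body)
    rows = []
    for name, info in sorted(registry.items()):
        rows.append({
            "name": name,
            "address": info.get("ipv4"),
            # The gossip instance's own entry has no "healthy" key in the sample.
            "healthy": info.get("healthy"),
            "free_memory": round(info["total_memory"] - info["used_memory"], 2),
        })
    return rows


# Trimmed-down copy of the sample response, for an offline demonstration.
sample = json.dumps({
    "Sahara": {"ipv4": "172.17.0.3", "total_memory": 244, "used_memory": 90, "healthy": True},
    "Kalahari": {"ipv4": "localhost:7110", "total_memory": 244.5, "used_memory": 90.5},
})
rows = summarize_instances(sample)
for row in rows:
    print(row)
```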
Congratulations if you received the above response. You have set up your very first distributed file server. Now go ahead and upload a file to any one of the instances, then try to stream the file from both instances.
Let's now set up three distributed instances, this time using Docker Compose so that we can manage the instances easily instead of starting each one by hand. Paste the following YAML into your docker-compose.yml file and then run it.
version: "3.3"
services:
  kalahari:
    image: "rbryanben/hifadhi:test"
    ports:
      - 8000:80
    environment:
      - SHARD_KEY=BasicPassword
      - INSTANCE_NAME=kalahari
  namib:
    image: "rbryanben/hifadhi:test"
    ports:
      - 8001:80
    environment:
      - SHARD_KEY=BasicPassword
      - INSTANCE_NAME=namib
      - GOSSIP_INSTANCE=kalahari
    depends_on:
      - kalahari
  sahara:
    image: "rbryanben/hifadhi:test"
    ports:
      - 8002:80
    environment:
      - SHARD_KEY=BasicPassword
      - INSTANCE_NAME=sahara
      - GOSSIP_INSTANCE=kalahari
    depends_on:
      - kalahari
In my case I used the command below to build and start the services in the compose file
docker-compose up -d --build
Go ahead and navigate to localhost ports 8000, 8001, and 8002 to verify that the instances are running.
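Checking three ports by hand gets tedious, so here is a sketch that polls the health endpoint on each one. It assumes the `/api/v1/health` endpoint from earlier in this guide; `check_ports` and `http_get` are hypothetical names, and the fetch function is injectable so the logic can be tried without any containers running.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable


def http_get(url: str) -> str:
    """Plain GET returning the body as text. Needs a reachable server."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def check_ports(ports: Iterable[int],
                fetch: Callable[[str], str] = http_get) -> Dict[int, bool]:
    """Map each localhost port to True if its instance reports itself healthy."""
    results = {}
    for port in ports:
        url = f"http://localhost:{port}/api/v1/health"
        try:
            results[port] = json.loads(fetch(url)).get("status") == "healthy"
        except (OSError, ValueError):
            # OSError covers connection failures; ValueError covers invalid JSON.
            results[port] = False
    return results


def fake_fetch(url: str) -> str:
    """Stand-in for the network: pretend the instance on port 8002 is down."""
    if ":8002/" in url:
        raise OSError("connection refused")
    return '{"status": "healthy"}'


print(check_ports([8000, 8001, 8002], fetch=fake_fetch))
```

Against the real compose stack, call `check_ports([8000, 8001, 8002])` without the `fetch` argument.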
Congrats if you have reached this point, which is basically all there is to it. The last thing to do is to run a production version of Hifadhi instead of the test version we have been using, which runs on a development server.
For production you should use the latest tag instead of the test tag. Go ahead and pull the latest version of Hifadhi from Docker Hub
docker pull rbryanben/hifadhi:latest
The production version of Hifadhi uses Gunicorn as its HTTP server. Gunicorn needs to be told how many worker processes and how many threads to use. If you are running only one instance, it is recommended to set the number of workers between 4 and 12, as stated in the Gunicorn documentation, and to set the threads to four.
To run a production instance, you need WORKERS and THREADS set as environment variables.
docker run --name instance-production -d -e INSTANCE_NAME=ProductionDemo -e SHARD_KEY=BasicPassword -e WORKERS=4 -e THREADS=4 -p 9000:80 rbryanben/hifadhi:latest
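To pick a worker count, one option is Gunicorn's documented rule of thumb of (2 x cores) + 1, clamped to the 4 to 12 range mentioned above. The sketch below does that and assembles the matching docker run command; the helper names are hypothetical and the clamping is a choice made for this example, not something Hifadhi enforces.

```python
import os
import shlex


def suggested_workers(cores: int, low: int = 4, high: int = 12) -> int:
    """Gunicorn's (2 x cores) + 1 rule of thumb, clamped to [low, high]."""
    return max(low, min(high, 2 * cores + 1))


def production_command(name: str, instance_name: str, shard_key: str,
                       port: int, workers: int, threads: int = 4) -> str:
    """Assemble the docker run command for a production instance."""
    args = [
        "docker", "run", "--name", name, "-d",
        "-e", f"INSTANCE_NAME={instance_name}",
        "-e", f"SHARD_KEY={shard_key}",
        "-e", f"WORKERS={workers}",
        "-e", f"THREADS={threads}",
        "-p", f"{port}:80",
        "rbryanben/hifadhi:latest",
    ]
    return shlex.join(args)  # quotes any argument that needs it


cores = os.cpu_count() or 1  # cpu_count() can return None
print(production_command("instance-production", "ProductionDemo", "BasicPassword",
                         9000, suggested_workers(cores)))
```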
Go ahead and navigate to port 9000 on localhost to verify that the production instance is running.