Vagrant is the recommended method for developing with DataHub. It provides a VM matching the DataHub production server, regardless of your host operating system.
If you would prefer to install DataHub manually, see Manual Installation below.
- Install VirtualBox: https://www.virtualbox.org/
- Install Vagrant: https://www.vagrantup.com/downloads.html
- Clone DataHub:
$ git clone https://github.com/datahuborg/datahub.git
- Add this line to your hosts file (/etc/hosts on most systems):
192.168.50.4 datahub-local.mit.edu
- From your clone, start the VM:
$ vagrant up
This last step might take several minutes depending on your connection and computer.
Once vagrant up finishes, you can see your environment running at http://datahub-local.mit.edu.
If you see a "Datahub module not found" error, this is due to an unresolved issue where the Thrift code is not compiled after the first vagrant up. Please see this thread for a resolution: #119.
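If the site does not load at all, it can help to confirm that the hosts entry from the step above is actually in place. The following is a minimal sketch, assuming a Unix-style hosts file at /etc/hosts; the function name hosts_has_entry is made up for this example and is not part of DataHub.

```python
def hosts_has_entry(hosts_text, ip, hostname):
    """Return True if hosts_text contains a line mapping hostname to ip."""
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # A hosts line is "<ip> <name> [<alias> ...]".
        if fields[0] == ip and hostname in fields[1:]:
            return True
    return False


if __name__ == "__main__":
    hosts_path = "/etc/hosts"  # assumption: differs on some systems
    try:
        with open(hosts_path) as f:
            text = f.read()
    except OSError:
        text = ""
    found = hosts_has_entry(text, "192.168.50.4", "datahub-local.mit.edu")
    print("hosts entry found" if found else "hosts entry missing")
```

Commented-out lines are ignored, so an entry that was disabled with a leading # is reported as missing.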
Note
Vagrant keeps your working copy and the VM in sync, so edits you make to DataHub's code will be reflected on datahub-local.mit.edu. Changes to static files like CSS, JS, and documentation must be collected before the server will notice them. For more information, see management commands below.
If your host environment does not allow use of ports 80 and 443, it is possible to use DataHub on forwarded ports but some extra configuration is required.
- Edit the Vagrantfile to expose ports 80 and/or 443 on usable ports.
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  ...
  config.vm.network "forwarded_port", guest: 80, host: 18080
  config.vm.network "forwarded_port", guest: 443, host: 18081
  ...
- Edit the nginx configuration file at provisions/nginx/default.conf to make the reverse proxy aware of what the new ports are.
# Uncomment and customize:
map $scheme $port_to_forward {
    default 18080;
    https 18081;
}
...
location / {
    ...
    # Uncomment:
    proxy_set_header X-Forwarded-Host $host:$port_to_forward;
    proxy_set_header X-Forwarded-Server $server_name;
    proxy_set_header X-Forwarded-Port $port_to_forward;
    ...
}
- Edit the Django settings file at src/config/settings.py to make Django look for those headers.
# Uncomment and set to True:
USE_X_FORWARDED_HOST = True
- From the host, run vagrant reload to bring up the VM with your custom ports forwarded.
  If you don't mind losing all of your existing DataHub data, running vagrant destroy -f && vagrant up instead will rebuild the entire site using your new custom config. If you want to keep your existing VM's data, follow the next step (step 5).
- Inside the VM, run:
$ cd /vagrant
$ sudo sh provisions/docker/build-images.sh
$ sudo docker rm -f web
$ sudo docker create --name web \
    --volumes-from logs \
    --volumes-from app \
    -v /ssl/:/etc/nginx/ssl/ \
    --net=datahub_dev \
    -p 80:80 -p 443:443 \
    datahuborg/nginx
$ sudo docker start web
At the end of these steps, DataHub should be reachable at http://localhost:18080 and https://localhost:18081.
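To make the forwarded headers concrete, here is a small sketch that mirrors the nginx map block and proxy_set_header lines from the steps above. It is an illustration only, not code from DataHub or nginx: the function name forwarded_headers and the server name "datahub" are assumptions, and the host argument stands in for nginx's $host, which carries no port.

```python
# Mirrors: map $scheme $port_to_forward { default 18080; https 18081; }
DEFAULT_PORT = 18080
PORT_BY_SCHEME = {"https": 18081}


def forwarded_headers(scheme, host, server_name):
    """Return the headers the reverse proxy would add for one request."""
    port = PORT_BY_SCHEME.get(scheme, DEFAULT_PORT)
    return {
        "X-Forwarded-Host": "{}:{}".format(host, port),
        "X-Forwarded-Server": server_name,
        "X-Forwarded-Port": str(port),
    }


if __name__ == "__main__":
    # "datahub" is a placeholder server name for this example.
    for scheme in ("http", "https"):
        print(scheme, forwarded_headers(scheme, "localhost", "datahub"))
```

With USE_X_FORWARDED_HOST = True, Django prefers the X-Forwarded-Host value when it builds absolute URLs, so the forwarded port ends up in the links it generates.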
Follow these steps if you would prefer to forgo Vagrant and install DataHub locally. Please note that other sections of the documentation assume that you are using the Vagrant (quickstart) setup.
- Make sure to clone the repo,
git clone https://github.com/datahuborg/datahub.git
- Navigate into the repo,
cd datahub
DataHub is built on the PostgreSQL database.
- Install Postgres and create a user called postgres. See here for step-by-step instructions.
- When the Postgres server is running, open the Postgres shell,
psql -U postgres
- Create a database for DataHub,
CREATE DATABASE datahub;
- Quit the shell with
\q
- Navigate to the root directory,
cd /
- Create the user_data directory as root user,
sudo mkdir user_data
We realize that this is not the best location for the user_data
directory. In future commits, we'll make this option configurable and
perhaps default to a different location.
It's useful to install python dependencies in a virtual environment so they are isolated from other python packages in your system. To do this, use virtualenv.
- Install virtualenv with pip,
pip install virtualenv
- Create a virtual environment (called venv) within the datahub directory,
virtualenv venv
- Activate the virtual environment,
source venv/bin/activate
When you are finished with the virtual environment, run deactivate to close it.
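If you are unsure whether the virtual environment is active before installing dependencies, you can ask the interpreter directly. This is a minimal sketch, not part of DataHub; the function name in_virtual_environment is made up for this example. It covers both older virtualenv releases, which set sys.real_prefix, and newer ones, which make sys.prefix differ from sys.base_prefix.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases record the original prefix here.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases change sys.prefix instead.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("interpreter prefix:", sys.prefix)
```

Run it with the same python command you plan to use for pip install, since that is the interpreter whose environment matters.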
Installing the dependencies for DataHub is easy using the pip package manager.
- Install the dependencies with
pip install -r requirements.txt
- Update src/settings.py with your postgres username and password.
- Set up the server environment,
source src/setup.sh
(Please note that this must be sourced from the root directory.)
- Generate a custom SECRET_KEY,
python src/scripts/generate_secret_key.py
- Sync with the database,
python src/manage.py migrate
- Migrate the data models,
python src/manage.py migrate inventory
- Run the server,
python src/manage.py runserver
- Navigate to localhost:8000
NOTE: If the server complains that a module is missing, you may need to source src/setup.sh and run pip install -r requirements.txt again. Then run python src/manage.py runserver and navigate to localhost:8000.
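For background on the SECRET_KEY step above, the sketch below shows one common way to produce a random key of the kind Django expects. It is not the contents of src/scripts/generate_secret_key.py, which may work differently; the alphabet, the default length of 50, and the function name are assumptions for illustration. It uses the secrets module, so it needs Python 3.6 or later.

```python
import secrets
import string

# Assumption: letters, digits, and a few punctuation characters.
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*(-_=+)"


def generate_secret_key(length=50):
    """Return a random string drawn from ALPHABET using a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_secret_key())
```

Whatever method you use, keep the generated key out of version control.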
DataHub uses Sphinx to build its documentation.
Using the default Vagrant setup:
$ vagrant ssh
$ sudo su
$ dh-rebuild-and-collect-static-files
Using a local installation of Sphinx (Sphinx is included in requirements.txt):
$ cd /path/to/datahub
$ make html
When submitting a pull request, you must include Sphinx documentation. You can achieve this by adding *.rst files and linking to them from other *.rst files. See the Sphinx tutorial for more information.
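Before opening a pull request, a rough check for new *.rst files that nothing links to can save a review round. The sketch below is a crude heuristic, not a DataHub or Sphinx tool: it only looks for each document's name in the text of the other documents, and the helper names and the docs directory argument are hypothetical. The root document (typically index) is expected to show up in the result, since nothing links to it.

```python
from pathlib import Path


def unlinked_documents(texts):
    """Given {document name: source text}, return names no other document mentions."""
    unlinked = []
    for name in texts:
        short_name = Path(name).name  # last path component, without directories
        mentioned = any(
            short_name in body
            for other, body in texts.items()
            if other != name
        )
        if not mentioned:
            unlinked.append(name)
    return sorted(unlinked)


def load_rst(docs_dir):
    """Read every *.rst file under docs_dir, keyed by path without the suffix."""
    root = Path(docs_dir)
    return {
        path.relative_to(root).with_suffix("").as_posix(): path.read_text(encoding="utf-8")
        for path in root.rglob("*.rst")
    }


if __name__ == "__main__":
    import sys

    # Assumption: the docs directory is passed as the first argument.
    docs_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in unlinked_documents(load_rst(docs_dir)):
        print("not linked from any other document:", name)
```

Sphinx itself also warns about documents that are not included in any toctree when you build the docs, which is the more reliable signal.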