You can install Presidio locally using Docker, as a service in Kubernetes or use it as a framework in a python application.
You will need to have Docker installed and running, and make installed on your system.
Sync this repo use make
to build and deploy locally.
For convenience the script build.sh
at the root of this repo will run the make
commands for you. If you use the script remember to make it executable by running chmod +x build.sh
after syncing the code.
NOTE: Building the deps images currently takes some time (~70 minutes, depending on the build machine). We are working on improving the build time through improving the build and providing pre-built dependencies.
NOTE: Error message You may see error messages like this:
Error response from daemon: pull access denied for presidio/presidio-golang-deps, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
when running the make
commands. These can be ignored.
NOTE: Memory requirements if you get an error message like this
tests/test_analyzer_engine.py ...............The command '/bin/sh -c pipenv run pytest' returned a non-zero code: 137
while building you may need to increase the docker memory limit for your machine
If you prefer to run each step by hand instead of using the build.sh
script these are as follows:
# Build the images
export DOCKER_REGISTRY=presidio
export PRESIDIO_LABEL=latest
make DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_LABEL=${PRESIDIO_LABEL} docker-build-deps
make DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_LABEL=${PRESIDIO_LABEL} docker-build
# Run the containers
docker network create mynetwork
docker run --rm --name redis --network mynetwork -d -p 6379:6379 redis
docker run --rm --name presidio-analyzer --network mynetwork -d -p 3000:3000 -e GRPC_PORT=3000 -e RECOGNIZERS_STORE_SVC_ADDRESS=presidio-recognizers-store:3004 $(DOCKER_REGISTRY)/presidio-analyzer:$(PRESIDIO_LABEL)
docker run --rm --name presidio-anonymizer --network mynetwork -d -p 3001:3001 -e GRPC_PORT=3001 $(DOCKER_REGISTRY)/presidio-anonymizer:$(PRESIDIO_LABEL)
docker run --rm --name presidio-recognizers-store --network mynetwork -d -p 3004:3004 -e GRPC_PORT=3004 -e REDIS_URL=redis:6379 $(DOCKER_REGISTRY)/presidio-recognizers-store:$(PRESIDIO_LABEL)
sleep 30 # Wait for the analyzer model to load
docker run --rm --name presidio-api --network mynetwork -d -p 8080:8080 -e WEB_PORT=8080 -e ANALYZER_SVC_ADDRESS=presidio-analyzer:3000 -e ANONYMIZER_SVC_ADDRESS=presidio-anonymizer:3001 -e RECOGNIZERS_STORE_SVC_ADDRESS=presidio-recognizers-store:3004 $(DOCKER_REGISTRY)/presidio-api:$(PRESIDIO_LABEL)
Once the build is complete you can verify the local deployment by running docker ps
. You should see 4 Presidio containers and one Redis container running with the following names:
presidio-api
presidio-recognizers-store
presidio-anonymizer
presidio-analyzer
redis
- Kubernetes 1.9+ with RBAC enabled.
- Note the pod's resources requirements (CPU and memory) and plan the cluster accordingly.
- Helm
Follow the installation guide at the Readme page
-
Install Redis (Cache for storage and database scanners)
helm install --name redis stable/redis --set usePassword=false,rbac.create=true --namespace presidio-system
-
Optional - Ingress controller for presidio API.
Note that presidio is not deployed with an ingress controller by default.
to change this behavior, deploy the helm chart with api.ingress.enabled=true and specify they type of ingress controller to be used with api.ingress.class=nginx (supported classes are: nginx, traefik or istio). -
Verify that Redis and Traefik/NGINX are installed correctly
-
Deploy from
/charts/presidio
# Based on the DOCKER_REGISTRY and PRESIDIO_LABEL from the previous steps helm install --name presidio-demo --set registry=${DOCKER_REGISTRY},tag=${PRESIDIO_LABEL} . --namespace presidio
-
For more options over the deployment, follow the Development guide
If you're interested in running the analyzer alone, you can install it as a standalone python package by packaging it into a wheel
file. Note that Presidio requires Python >= 3.6.
In the presidio-analyzer folder, run:
python setup.py bdist_wheel
-
Copy the created wheel file (from the
dist
folder of presidio-analyzer) into a clean virtual environment -
install
wheel
package
pip install wheel
- Install the presidio-analyzer wheel file
pip install WHEEL_FILE
Where WHEEL_FILE
is the path to the created wheel file
- Install the Spacy model from Github (not installed during the standard installation)
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.1.0/en_core_web_lg-2.1.0.tar.gz
Note that if you skip this step, the Spacy model would install lazily during the first call to the AnalyzerEngine
- Optional : install
re2
andpyre2
:
-
Install re2:
re2_version="2018-12-01" wget -O re2.tar.gz https://github.com/google/re2/archive/${re2_version}.tar.gz mkdir re2 tar --extract --file "re2.tar.gz" --directory "re2" --strip-components 1 cd re2 && make install
-
Install
pyre2
's fork:pip install https://github.com/torosent/pyre2/archive/release/0.2.23.zip
Note: If you don't install
re2
, Presidio will use theregex
package for regular expressions handling
- Test the installation
To test, run Python on the virtual env you've installed the presidio-analyzer in. Then, make sure this code returns an answer:
from presidio_analyzer import AnalyzerEngine
engine = AnalyzerEngine()
text = "My name is David and I live in Miami"
response = engine.analyze(correlation_id=0,
text = text,
entities=[],
language='en',
all_fields=True,
score_threshold=0.5)
for item in response:
print("Start = {}, end = {}, entity = {}, confidence = {}".format(item.start,
item.end,
item.entity_type,
item.score))