Play piano using Roboflow Inference
This project shows how to run an object detection model and a gaze detection model using Roboflow Inference, an open source project.
- We will use an object detection model trained to detect soft drinks on a product shelf. More details about the dataset can be found at: https://universe.roboflow.com/product-recognition-h6t0g/drink-detection.
- We will run the gaze detection model provided by Roboflow Inference.
- The drink you gaze at will be highlighted, and its nutrition info will be displayed. The nutrition info was collected manually by the author via Google search.
- Imagine each drink is a piano key: a sound plays when you gaze at it (see the sketch after this list)! Sound files are from here: https://github.com/py2ai/Piano .
- Have fun! :)
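The "piano key" idea boils down to mapping each detected drink to a note and playing it when the gaze lands on it. Below is a minimal sketch of that mapping, assuming pygame for audio; the file names and the `on_gaze_at` helper are hypothetical, not the exact code in this repo.

```python
import pygame

# Hypothetical sound files; the repo uses the wav files from py2ai/Piano.
pygame.mixer.init()
SOUNDS = [pygame.mixer.Sound(f"sounds/key{i}.wav") for i in range(7)]

def on_gaze_at(drink_index: int) -> None:
    """Play the note mapped to the drink currently gazed at."""
    SOUNDS[drink_index % len(SOUNDS)].play()
```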
- Start the Roboflow Inference docker container (a sketch of querying the running server follows these setup steps):

```bash
docker run -p 9001:9001 --name inference-server -d roboflow/roboflow-inference-server-cpu
```
- Download this project to your local machine and run the following commands:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
- Replace the `API_KEY` in `src/main.py` with your API key from Roboflow. Here is the guide for generating a key: https://docs.roboflow.com/api-reference/authentication. Simply speaking: sign up, create a workspace, go to the workspace settings, and generate a key.
- Start the program:

```bash
cd src && python main.py
```
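Once the container is up, the program talks to it over HTTP. Here is a minimal sketch of querying the gaze detection route on the local server; the `/gaze/gaze_detection` path and the response layout follow Roboflow's gaze demo and are assumptions that may vary across inference versions.

```python
import base64
import requests

GAZE_URL = "http://127.0.0.1:9001/gaze/gaze_detection"
API_KEY = "YOUR_ROBOFLOW_API_KEY"  # placeholder, see the step above

# Encode one camera frame as base64 (frame.jpg is a stand-in here).
with open("frame.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    GAZE_URL,
    json={"api_key": API_KEY, "image": {"type": "base64", "value": img_b64}},
)
resp.raise_for_status()

# Each detected face comes with yaw/pitch in radians (assumed layout).
for face in resp.json()[0]["predictions"]:
    print(face["yaw"], face["pitch"])
```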
Demo video: example.mp4
It really depends on how you define the relative movement between your head and the camera.
The gaze detection model estimates two angles in radians, yaw and pitch, with right/up as the positive directions.
That means if you turn your head right/up, the detected yaw/pitch will be positive.
However, you'll appear to be turning left in the camera view when you actually turn right,
because the laptop camera works like a mirror.
Feel free to flip the direction by adding/removing the `-` sign of the variables `dx`, `dy` in `main.py`.
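If the highlight moves opposite to your head motion, the fix is a one-line sign flip. A minimal sketch, assuming `dx`, `dy` hold the computed shifts as in `main.py` (the `mirror` helper is illustrative):

```python
def mirror(dx: float, dy: float) -> tuple[float, float]:
    """Negate the gaze shifts so a mirror-style camera feels natural."""
    return -dx, -dy
```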
With the gaze detection model, we get yaw and pitch (in radians).
Here I assume the distance between you and the object is around 1 meter, and that you're looking at the camera center when both yaw and pitch are zero.
Then we can easily calculate the horizontal/vertical shift in physical units (see the formulas below).
To convert these values to pixels, I assume all the detected cans have the same physical width of 66.2 mm.
- Horizontal shift: `DISTANCE_TO_OBJECT * tan(yaw)`
- Vertical shift: `DISTANCE_TO_OBJECT * tan(pitch) / cos(yaw)` (looking sideways stretches the head-to-target distance to `DISTANCE_TO_OBJECT / cos(yaw)`, hence the extra factor)
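Putting the pieces together, here is a minimal sketch of the angle-to-pixel mapping under the stated assumptions (1 m viewing distance, 66.2 mm can width); `gaze_point_px` and its arguments are illustrative names, not the exact code in `main.py`.

```python
import numpy as np

DISTANCE_TO_OBJECT_MM = 1000.0  # assumed ~1 meter to the shelf
CAN_WIDTH_MM = 66.2             # assumed physical width of every can

def gaze_point_px(yaw, pitch, can_width_px, frame_w, frame_h):
    """Map gaze angles (radians) to a pixel in the camera frame.

    Assumes the gaze hits the frame center when yaw == pitch == 0,
    and uses a detected can's pixel width to convert mm to pixels.
    """
    dx_mm = DISTANCE_TO_OBJECT_MM * np.tan(yaw)
    dy_mm = DISTANCE_TO_OBJECT_MM * np.tan(pitch) / np.cos(yaw)

    px_per_mm = can_width_px / CAN_WIDTH_MM

    # Negate dx for the mirror-like camera; image y grows downward,
    # so positive pitch (looking up) decreases y.
    x = frame_w / 2 - dx_mm * px_per_mm
    y = frame_h / 2 - dy_mm * px_per_mm
    return int(x), int(y)
```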