Play piano using Roboflow Inference
This project shows how to run an object detection model and a gaze detection model using Roboflow Inference, an open source project.
- We will use an object detection model trained to detect soft drinks on a product shelf. More details about the dataset can be found at: https://universe.roboflow.com/product-recognition-h6t0g/drink-detection.
- We will run the gaze detection model provided by Roboflow Inference.
- The drink you gaze at will be highlighted, and its nutrition info will be displayed. The nutrition info was collected manually by the author via Google search.
- Imagine each drink is a piano key: a sound plays when you gaze at it (see the sketch after this list)! Sound files are from here: https://github.com/py2ai/Piano .
- Have fun! :)
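The "piano key" idea boils down to mapping each detected drink to a note and playing it when the gaze lands on it. Below is a minimal sketch of that mapping, assuming pygame for audio; the file names and the `on_gaze_at` helper are hypothetical, not the exact code in this repo.

```python
import pygame

# Hypothetical sound files; the repo uses the wav files from py2ai/Piano.
pygame.mixer.init()
SOUNDS = [pygame.mixer.Sound(f"sounds/key{i}.wav") for i in range(7)]

def on_gaze_at(drink_index: int) -> None:
    """Play the note mapped to the drink currently gazed at."""
    SOUNDS[drink_index % len(SOUNDS)].play()
```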
- Start the Roboflow Inference docker container (a sketch of querying the running server follows these setup steps):

```bash
docker run -p 9001:9001 --name inference-server -d roboflow/roboflow-inference-server-cpu
```
- Download this project to your local machine and run the following commands:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
- Replace the `API_KEY` in `src/main.py` with your API key from Roboflow. Here is the guide for generating a key: https://docs.roboflow.com/api-reference/authentication. Simply speaking: sign up, create a workspace, go to the workspace settings, and generate a key.
- Start the program:

```bash
cd src && python main.py
```
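Once the container is up, the program talks to it over HTTP. Here is a minimal sketch of querying the gaze detection route on the local server; the `/gaze/gaze_detection` path and the response layout follow Roboflow's gaze demo and are assumptions that may vary across inference versions.

```python
import base64
import requests

GAZE_URL = "http://127.0.0.1:9001/gaze/gaze_detection"
API_KEY = "YOUR_ROBOFLOW_API_KEY"  # placeholder, see the step above

# Encode one camera frame as base64 (frame.jpg is a stand-in here).
with open("frame.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    GAZE_URL,
    json={"api_key": API_KEY, "image": {"type": "base64", "value": img_b64}},
)
resp.raise_for_status()

# Each detected face comes with yaw/pitch in radians (assumed layout).
for face in resp.json()[0]["predictions"]:
    print(face["yaw"], face["pitch"])
```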
Demo video: example.mp4
It really depends on how you define the relative movement between your head and the camera.
The gaze detection model estimates two angles in radians, yaw and pitch, with right/up as the positive directions.
That means if you turn your head right/up, the detected yaw/pitch will be positive.
However, you'll appear to be turning left in the camera view when you actually turn right,
because the laptop camera works like a mirror.
Feel free to flip the direction by adding/removing the `-` sign of the variables `dx`, `dy` in `main.py`.
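If the highlight moves opposite to your head motion, the fix is a one-line sign flip. A minimal sketch, assuming `dx`, `dy` hold the computed shifts as in `main.py` (the `mirror` helper is illustrative):

```python
def mirror(dx: float, dy: float) -> tuple[float, float]:
    """Negate the gaze shifts so a mirror-style camera feels natural."""
    return -dx, -dy
```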
With the gaze detection model, we get yaw and pitch (in radians).
Here I assume the distance between you and the object is around 1 meter, and that you're looking at the camera center when both yaw and pitch are zero.
Then we can easily calculate the horizontal/vertical shift in physical units (see the formulas below).
To convert these values to pixels, I assume all the detected cans have the same physical width of 66.2 mm.
- Horizontal shift: `DISTANCE_TO_OBJECT * tan(yaw)`
- Vertical shift: `DISTANCE_TO_OBJECT * tan(pitch) / cos(yaw)` (looking sideways stretches the head-to-target distance to `DISTANCE_TO_OBJECT / cos(yaw)`, hence the extra factor)
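Putting the pieces together, here is a minimal sketch of the angle-to-pixel mapping under the stated assumptions (1 m viewing distance, 66.2 mm can width); `gaze_point_px` and its arguments are illustrative names, not the exact code in `main.py`.

```python
import numpy as np

DISTANCE_TO_OBJECT_MM = 1000.0  # assumed ~1 meter to the shelf
CAN_WIDTH_MM = 66.2             # assumed physical width of every can

def gaze_point_px(yaw, pitch, can_width_px, frame_w, frame_h):
    """Map gaze angles (radians) to a pixel in the camera frame.

    Assumes the gaze hits the frame center when yaw == pitch == 0,
    and uses a detected can's pixel width to convert mm to pixels.
    """
    dx_mm = DISTANCE_TO_OBJECT_MM * np.tan(yaw)
    dy_mm = DISTANCE_TO_OBJECT_MM * np.tan(pitch) / np.cos(yaw)

    px_per_mm = can_width_px / CAN_WIDTH_MM

    # Negate dx for the mirror-like camera; image y grows downward,
    # so positive pitch (looking up) decreases y.
    x = frame_w / 2 - dx_mm * px_per_mm
    y = frame_h / 2 - dy_mm * px_per_mm
    return int(x), int(y)
```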