- This is the repo for TensorFlow / TensorFlow.js experiments in computer vision, from a class featuring browser-based machine learning.
User testing was conducted with two classes of 20+ students.
- Spearheaded real-time HTTPS communication for web and mobile, enabling live video transmission and AI prediction.
- Implemented a neural network leveraging TensorFlow Handpose recognition, achieving 80%+ classification accuracy.
- Integrated an automated data collection system, improving the user experience and boosting data collection efficiency by 50%.
- What problem am I trying to address❓ I noticed how difficult it is to quickly interact with peers during multi-user livestream video (e.g. Zoom, Google Meet). For example, in an online class, if a user wants to raise a hand to ask a question, they have to click the emoji button -> select the emoji -> deselect the emoji (three steps) to complete the flow of interacting with the professor.
- How can AI help solve this problem❓ An AI algorithm, likely computer vision, could classify users' hand postures and directly emit signals to their peers.
- What data is needed to create an AI that helps address the issue❓ Input data that can precisely determine a person's hand posture, such as hand keypoint coordinates.
This prototype is based on Daniel Shiffman's Coding Train tutorial. I reduced the wait time before data collection starts and extended the collection window, so the system automatically records more data samples per run. This design improved the user experience of data collection; a sketch of the timing logic is shown below.
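A minimal sketch of that timing adjustment, assuming a browser setup where a `getKeypoints()` helper returns the current hand keypoints and `onSample()` stores a labeled sample (both names are hypothetical, not the repo's exact code):

```javascript
// Adjusted data-collection timing: after a short wait, keypoints are sampled
// repeatedly over a longer window, so one button press records many labeled samples.
const WAIT_MS = 1000;      // reduced wait before collection starts
const COLLECT_MS = 10000;  // extended collection window
const SAMPLE_MS = 100;     // record one sample every 100 ms

function collectSamples(label, getKeypoints, onSample) {
  setTimeout(() => {
    const timer = setInterval(() => {
      const keypoints = getKeypoints(); // flattened landmark coordinates
      if (keypoints) onSample({ label, keypoints });
    }, SAMPLE_MS);
    setTimeout(() => clearInterval(timer), COLLECT_MS);
  }, WAIT_MS);
}
```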
This prototype is based on TensorFlow Handpose with MediaPipe (v2). It has higher performance and lower latency than the previous prototype; a minimal detection sketch follows.
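A rough sketch of hand keypoint detection with the `@tensorflow-models/hand-pose-detection` package and the MediaPipe Hands model; the runtime, model type, and helper names are assumptions rather than the repo's exact configuration:

```javascript
// Requires a TF.js backend as well (e.g. @tensorflow/tfjs-backend-webgl) when runtime is 'tfjs'.
import * as handPoseDetection from '@tensorflow-models/hand-pose-detection';

async function setupDetector() {
  const model = handPoseDetection.SupportedModels.MediaPipeHands;
  return handPoseDetection.createDetector(model, {
    runtime: 'tfjs',   // assumption; 'mediapipe' with a solutionPath also works
    modelType: 'lite',
    maxHands: 1,
  });
}

async function getKeypoints(detector, video) {
  const hands = await detector.estimateHands(video);
  if (hands.length === 0) return null;
  // 21 keypoints per hand; flatten coordinates into a single feature vector
  return hands[0].keypoints3D
    ? hands[0].keypoints3D.flatMap(k => [k.x, k.y, k.z])
    : hands[0].keypoints.flatMap(k => [k.x, k.y]);
}
```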
- Deep Learning model trained with Jupyter Notebook: Link
- util.py: Python functions to load data (read JSON data into NumPy arrays, shuffle data), preprocess data (split X_train and y_train into training and validation sets), build the model (define the neural network), and test the model.
- main.ipynb: the main workflow for training the machine learning model step by step.
- Model summary:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 32) 2048
dense_1 (Dense) (None, 4) 132
=================================================================
Total params: 2180 (8.52 KB)
Trainable params: 2180 (8.52 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
- Model accuracy: 0.89552241563797
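The 2,048 parameters in the first Dense layer are consistent with a 63-dimensional input (21 hand landmarks × 3 coordinates). For browser-side prediction, an equivalent architecture can be expressed in TensorFlow.js; the activations, optimizer, and model path below are assumptions, not the exact training configuration:

```javascript
import * as tf from '@tensorflow/tfjs';

function buildModel() {
  const model = tf.sequential();
  // 63 inputs -> Dense(32): 63*32 + 32 = 2048 params, matching the summary
  model.add(tf.layers.dense({ inputShape: [63], units: 32, activation: 'relu' }));
  // Dense(4): 32*4 + 4 = 132 params, one unit per hand-posture class
  model.add(tf.layers.dense({ units: 4, activation: 'softmax' }));
  model.compile({ optimizer: 'adam', loss: 'categoricalCrossentropy', metrics: ['accuracy'] });
  return model;
}

// In the browser, a model converted from the notebook (e.g. with tensorflowjs_converter)
// could instead be loaded and used for prediction:
async function predictPosture(keypoints63) {
  const model = await tf.loadLayersModel('model/model.json'); // hypothetical path
  const input = tf.tensor2d([keypoints63]);                   // shape [1, 63]
  const probs = model.predict(input);
  const classIndex = probs.argMax(-1).dataSync()[0];          // most likely of the 4 postures
  tf.dispose([input, probs]);
  return classIndex;
}
```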
- Linux commands:
ssh [email protected]
[email protected]'s password:
root@ruby-zhang:~# cd ./live-web/week5
root@ruby-zhang:~/live-web/week5# node server.js
qz2432.itp.io:
- Implementing a video chat application; examples include Zoom, Microsoft Teams, and Google Meet.
- Technology: WebRTC provides APIs for capturing audio and video streams from the user's camera and microphone. These streams can be transmitted in real time between peers, enabling video and audio calls directly in the browser without third-party plugins (see the sketch after this list).
- Experience: Participants can join meetings via web browsers or dedicated applications on various devices.
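A minimal sketch of that capture-and-connect flow; the `<video>` element IDs, STUN server, and `sendToPeer()` signaling callback are assumptions, and the signaling exchange itself is omitted:

```javascript
async function startCall(sendToPeer) {
  // Capture audio and video from the user's camera and microphone
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  document.querySelector('video#local').srcObject = stream;

  // Create a peer connection and add the local tracks to it
  const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] });
  stream.getTracks().forEach(track => pc.addTrack(track, stream));

  // Remote tracks arrive here and are shown in a second <video> element
  pc.ontrack = (event) => {
    document.querySelector('video#remote').srcObject = event.streams[0];
  };

  // Create an offer and hand it to whatever signaling channel the app uses
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer(offer);
  return pc;
}
```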
- Live chatbox created using the GSAP library and the DOM; a rough sketch of the message animation is shown below.
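A rough sketch of animating an incoming message with GSAP, assuming a `#chatbox` container element (hypothetical ID, not the repo's exact markup):

```javascript
function addChatMessage(text) {
  const box = document.getElementById('chatbox');
  const msg = document.createElement('p');
  msg.textContent = text;
  box.appendChild(msg);

  // Fade/slide the new message in, then keep the box scrolled to the bottom
  gsap.from(msg, {
    opacity: 0,
    y: 20,
    duration: 0.4,
    onComplete: () => { box.scrollTop = box.scrollHeight; },
  });
}
```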
- Live video prototype using WebSocket
- I tested the web application with the webcams of my two laptops. This live video prototype is based on the HTML `<canvas>` and `<img>` elements. The WebSocket server receives canvas frame data and emits it to all other clients, and each client updates the `src` of its `<img>` element; a sketch of both sides is shown below.
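A sketch of the client side of that relay, assuming socket.io as the WebSocket layer; the element selectors, event name, and frame interval are hypothetical:

```javascript
const socket = io();
const video = document.querySelector('video');
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');

// Draw the webcam to the canvas and send a compressed JPEG data URL ~10 times per second
setInterval(() => {
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  socket.emit('frame', canvas.toDataURL('image/jpeg', 0.5));
}, 100);

// Frames from other clients arrive here and update the <img> element's src
socket.on('frame', (dataUrl) => {
  document.querySelector('img#remote').src = dataUrl;
});
```

And a corresponding relay on the server, as a sketch of what server.js might contain (port and event name are assumptions):

```javascript
const http = require('http').createServer();
const io = require('socket.io')(http);

io.on('connection', (socket) => {
  socket.on('frame', (dataUrl) => {
    socket.broadcast.emit('frame', dataUrl); // relay to every other connected client
  });
});

http.listen(8080);
```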