Spoken to Signed
Amit Moryossef edited this page Aug 1, 2024 · 29 revisions
flowchart TD
A0[Spoken Language Audio] --> A1(Spoken Language Text)
A1[Spoken Language Text] --> B[<a href='https://github.com/sign/translate/issues/10'>Language Identification</a>]
A1 --> C(<a href='https://github.com/sign/translate/tree/master/functions/src/text-normalization'>Normalized Text</a>)
B --> C
C & B --> Q(<a href='https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter'>Sentence Splitter</a>)
Q & B --> D(<a href='https://github.com/sign-language-processing/signbank-plus'>SignWriting</a>)
C -.-> M(<a href='https://github.com/ZurichNLP/spoken-to-signed-translation' title='We would like to move away from glosses'>Glosses</a>)
M -.-> E
D --> E(<a href='https://github.com/sign-language-processing/signwriting-animation'>Pose Sequence</a>)
D -.-> I(<a href='https://github.com/sign-language-processing/signwriting-illustration'>Illustration</a>)
N --> H(<a href='https://github.com/sign/translate/issues/68'>3D Avatar</a>)
N --> G(<a href='https://github.com/sign-language-processing/pose'>Skeleton Viewer</a>)
N --> F(<a href='https://github.com/sign-language-processing/pose-to-video' title='Help wanted!'>Human GAN</a>)
H & G & F --> J(Video)
J --> K(Share Translation)
D -.-> L(<a href='https://github.com/sign-language-processing/signwriting-description' title='Poor performance. Help wanted!'>Description</a>)
O --> N(<a href='https://github.com/sign-language-processing/fluent-pose-synthesis' title='Currently skipped. Help Wanted!'>Fluent Pose Sequence</a>)
E --> O(<a href='https://github.com/sign-language-processing/pose-anonymization'>Pose Appearance Transfer</a>)
linkStyle default stroke:green;
linkStyle 3,5,7 stroke:lightgreen;
linkStyle 10,11,12,15 stroke:red;
linkStyle 8,9,14,19,20 stroke:orange;
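The Sentence Splitter node in the flowchart links to the `Intl.Segmenter` documentation. As a minimal sketch of how that step could look, assuming the language code comes from the Language Identification step, the function below splits normalized text into sentences. The name `splitSentences` is hypothetical and is not taken from the project's code.

```typescript
// Hypothetical helper: split text into sentences with the built-in Intl.Segmenter.
// `language` is assumed to be a BCP 47 code from the language identification step.
function splitSentences(text: string, language: string): string[] {
  const segmenter = new Intl.Segmenter(language, { granularity: "sentence" });
  const sentences: string[] = [];
  for (const part of segmenter.segment(text)) {
    // Each segment keeps its trailing whitespace, so trim it and skip empty pieces.
    const sentence = part.segment.trim();
    if (sentence.length > 0) {
      sentences.push(sentence);
    }
  }
  return sentences;
}

// Example call; each returned sentence would be translated separately.
const exampleSentences = splitSentences("Hello world. How are you?", "en");
console.log(exampleSentences);
```

Splitting on the segmenter's boundaries rather than on punctuation characters lets the same code handle languages with different sentence-ending conventions.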
- Spoken Language Audio → Spoken Language Text:
  - Full support of local speech-to-text (and text-to-speech), in all locally supported languages. (no Firefox support)
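The page does not say which API provides local speech-to-text. Assuming it is the browser's Web Speech API (where the recognizer is exposed as `SpeechRecognition` or the prefixed `webkitSpeechRecognition`), a feature check along these lines could explain the "no Firefox support" caveat. The type and function names are hypothetical.

```typescript
// Assumption: speech-to-text relies on the Web Speech API, which some browsers
// expose as `SpeechRecognition`, others as `webkitSpeechRecognition`, and some not at all.
type RecognizerCtor = new () => unknown;

interface SpeechGlobals {
  SpeechRecognition?: RecognizerCtor;
  webkitSpeechRecognition?: RecognizerCtor;
}

// Hypothetical helper: return the recognizer constructor, or null when unsupported.
function getRecognizerCtor(globals: SpeechGlobals): RecognizerCtor | null {
  return globals.SpeechRecognition ?? globals.webkitSpeechRecognition ?? null;
}

// In a browser, pass the global object; the UI can hide the microphone button on null.
const recognizerCtor = getRecognizerCtor(globalThis as unknown as SpeechGlobals);
console.log(recognizerCtor === null ? "speech-to-text unavailable" : "speech-to-text available");
```

Passing the globals as a parameter keeps the check testable outside a browser.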
- Spoken Language Text → Language Identification:
- Spoken Language Text → Normalized Text:
  - LLM-based, server-side multilingual text normalization model
- Spoken Language Text → SignWriting:
  - Server-side multilingual machine translation model (low~ quality)
  - Client/server-side translation implementation with Bergamot (#46)
  - Serve translation models (#57)
- SignWriting → Pose Sequence:
  - Server-side implementation of sign-stitching (low~ quality, using OpenPose poses, reliant on spoken language text)
  - New server-side implementation, animating directly from SignWriting(/HamNoSys) sequences (work by Rotem, #15)
  - Offline client-side inference support for the animation model
- Pose Sequence →
  - Skeleton Viewer: Barebones viewer using our in-house Pose Viewer (fast, low power, and helpful for debugging)
  - Human GAN: Uses a client-side machine learning model to skin the pose like a human. Relies on a (heavy) model to generate low-resolution images (`256x256`), and a (fast) model to upscale the images (`768x768`). (#25, #58)
  - 3D Avatar: Animates a 3D human-looking avatar using machine learning (#16), including AR support.
  - Additional Features:
    - Pose sequences are transformed into videos, saving device power and memory (#45)
    - Once videos are ready, they support `Copy`, `Download`, and `Share` operations
- Internationalization:
  - Supports 104 languages, and both `LTR` and `RTL` layouts.
  - Uses the user's browser/phone language, and different languages via a URL parameter.
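The language selection described under Internationalization can be sketched as follows: a URL parameter, when present and supported, overrides the browser language, and the layout direction follows from the chosen language. This is only an illustration. The parameter name `lang`, the fallback `en`, the function names, and the short RTL list are assumptions, not the project's actual configuration.

```typescript
// Illustrative subset of right-to-left languages (assumption, not the project's list).
const RTL_LANGUAGES: ReadonlySet<string> = new Set(["ar", "he", "fa", "ur"]);

// Hypothetical helper: pick the UI language from the URL, then the browser, then a fallback.
function resolveLanguage(
  pageUrl: string,
  browserLanguage: string,
  supported: ReadonlySet<string>,
  fallback: string = "en",
): string {
  // Assumed parameter name: `lang`.
  const fromUrl = new URL(pageUrl).searchParams.get("lang");
  // Try the exact browser tag first (e.g. "de-CH"), then its base language ("de").
  const candidates = [fromUrl, browserLanguage, browserLanguage.split("-")[0]];
  for (const candidate of candidates) {
    if (candidate && supported.has(candidate)) {
      return candidate;
    }
  }
  return fallback;
}

// Hypothetical helper: map a language code to its layout direction.
function textDirection(language: string): "ltr" | "rtl" {
  const base = language.split("-")[0] ?? language;
  return RTL_LANGUAGES.has(base) ? "rtl" : "ltr";
}

// Example: the URL parameter wins over the browser language.
const supportedLanguages = new Set(["en", "de", "he"]);
const chosen = resolveLanguage("https://example.com/?lang=he", "en-US", supportedLanguages);
console.log(chosen, textDirection(chosen));
```

In a browser, `pageUrl` would typically be `window.location.href` and `browserLanguage` would be `navigator.language`.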