Revise use cases with transformers #507

Merged
merged 2 commits into from
Jan 26, 2024
216 changes: 209 additions & 7 deletions index.bs
@@ -437,8 +437,8 @@ A user joins a teleconference via a web-based video conferencing application at
her desk since no meeting room in her office is available. During the
teleconference, she does not wish that her room and people in the background are
visible. To protect the privacy of the other people and the surroundings, the
- application runs a machine learning model such as [[DeepLabv3+]] or
- [[MaskR-CNN]] to semantically split an image into segments and replaces
application runs a machine learning model such as [[DeepLabv3+]], [[MaskR-CNN]]
or [[SegAny]] to semantically split an image into segments and replaces
segments that represent other people and background with another picture.
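
As an illustration only (not part of this change), the replacement step described above can be sketched as a per-pixel composite. The sketch assumes the segmentation model has already produced a binary mask in which 1 marks pixels belonging to the user; the function name `replaceBackground` and the mask layout are hypothetical, and the pixel buffers are assumed to be RGBA with 4 bytes per pixel.

```typescript
// Hypothetical helper: keep the user's pixels and take every other pixel
// from a replacement picture. Buffers are RGBA, 4 bytes per pixel.
// mask[p] === 1 means pixel p belongs to the user (assumed mask layout).
function replaceBackground(
  frame: Uint8ClampedArray,
  background: Uint8ClampedArray,
  mask: Uint8Array,
): Uint8ClampedArray {
  if (frame.length !== background.length || frame.length !== mask.length * 4) {
    throw new Error("frame, background and mask sizes do not match");
  }
  const out = new Uint8ClampedArray(frame.length);
  for (let p = 0; p < mask.length; p++) {
    // Pick the source buffer for this pixel, then copy its 4 channels.
    const src = mask[p] === 1 ? frame : background;
    out.set(src.subarray(p * 4, p * 4 + 4), p * 4);
  }
  return out;
}
```

In a real application the mask would come from the segmentation model's output and the buffers from canvas `ImageData`; a soft (fractional) mask would call for alpha blending rather than the hard selection shown here.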

### Skeleton Detection ### {#usecase-skeleton-detection}
@@ -490,6 +490,20 @@ For better accessibility, a web-based presentation application provides
automatic image captioning by running a machine learning model such as
[[im2txt]] which predicts explanatory words of the presentation slides.

### Text-to-image ### {#usecase-text-to-image}

Images are a core part of modern web experiences. The ability to generate images
from text input in a privacy-preserving manner enables visual
personalization and adaptation of web applications and content. For example, a web
application can take as input a natural language description on the web page,
or a description provided by the user in a text prompt, and produce an
image matching that description. This text-to-image use case, enabled by the
latent diffusion model architecture [[LDM]], forms the basis for additional
text-to-image use cases. Examples include inpainting, where a portion of an existing
image on the web page is selectively modified using newly generated content,
and the converse, outpainting, where an original image is extended beyond its
original dimensions by filling the empty space with generated content.

### Machine Translation ### {#usecase-translation}

Multiple people from various countries are talking via a web-based real-time
@@ -520,6 +534,29 @@ noise suppression using Recurrent Neural Network such as [[RNNoise]] for
suppressing background dynamic noise like baby cry or dog barking to improve
audio experiences in video conferences.

### Speech Recognition ### {#usecase-speech-recognition}

Speech recognition, also known as speech to text, enables recognition and
translation of spoken language into text. Example applications of speech
recognition include transcription, automatic translation, multimodal interaction,
real-time captioning and virtual assistants. Speech recognition improves
accessibility of auditory content and makes it possible to interact with such
content in textual form in a privacy-preserving manner. Examples of common
use cases include watching videos or participating in online meetings with
real-time captioning. Models such as [[Whisper]] approach human-level accuracy
and robustness and are well positioned to improve accessibility in such use cases.

### Text Generation ### {#usecase-text-generation}

Various text generation use cases are enabled by large language models (LLMs) that
are able to perform tasks where a general ability to predict the next item
in a text sequence is required. This class of models can translate texts, answer
questions based on a text input, summarize a larger body of text, or generate
text output based on a textual input. LLMs enable better performance compared to
older models based on RNN, CNN, or LSTM architectures and further improve the
performance of many other use cases discussed in this section.
Examples of LLMs include [[t5-small]], [[m2m100_418M]], [[gpt2]], and [[llama-2-7b]].
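
As an illustration only (not part of this change), the "predict the next item in a text sequence" ability can be sketched as a greedy decoding loop. The `NextTokenScores` type is a hypothetical stand-in for a real LLM inference call, token ids are plain numbers, and greedy selection is just one of several possible decoding strategies.

```typescript
// Hypothetical model interface: given the tokens so far, return one score
// per vocabulary entry for the next token. A real application would back
// this with an actual LLM inference call.
type NextTokenScores = (tokens: readonly number[]) => number[];

// Index of the highest score (first one wins on ties).
function argmax(scores: readonly number[]): number {
  let bestIndex = 0;
  let bestScore = -Infinity;
  scores.forEach((score, index) => {
    if (score > bestScore) {
      bestScore = score;
      bestIndex = index;
    }
  });
  return bestIndex;
}

// Greedy decoding: repeatedly append the most likely next token until the
// end-of-sequence token appears or the token budget is used up.
function generateGreedy(
  model: NextTokenScores,
  prompt: readonly number[],
  maxNewTokens: number,
  eosToken: number,
): number[] {
  const tokens = [...prompt];
  for (let i = 0; i < maxNewTokens; i++) {
    const next = argmax(model(tokens));
    tokens.push(next);
    if (next === eosToken) {
      break;
    }
  }
  return tokens;
}
```

Translation, question answering, and summarization all reduce to this loop with different prompts; production systems typically replace the greedy `argmax` step with sampling or beam search.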

### Detecting fake video ### {#usecase-detecting-fake-video}

A user is exposed to realistic fake videos generated by ‘deepfake’ on the web.
@@ -6530,6 +6567,25 @@ Thanks to Dwayne Robinson for his work investigating and providing recommendatio
],
"date": "January 2018"
},
"SegAny": {
"href": "https://arxiv.org/abs/2304.02643",
"title": "Segment Anything",
"authors": [
"Alexander Kirillov",
"Alex Berg",
"Chloe Rolland",
"Eric Mintun",
"Hanzi Mao",
"Laura Gustafson",
"Nikhila Ravi",
"Piotr Dollar",
"Ross Girshick",
"Spencer Whitehead",
"Wan-Yen Lo",
"Tete Xiao"
],
"date": "April 2023"
},
"PoseNet": {
"href": "https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5",
"title": "Real-time Human Pose Estimation in the Browser with TensorFlow.js",
@@ -6607,6 +6663,18 @@ Thanks to Dwayne Robinson for his work investigating and providing recommendatio
],
"date": "September 2016"
},
"LDM": {
"href": "https://arxiv.org/abs/2112.10752",
"title": "High-Resolution Image Synthesis with Latent Diffusion Models",
"authors": [
"Robin Rombach",
"Andreas Blattmann",
"Dominik Lorenz",
"Patrick Esser",
"Björn Ommer"
],
"date": "April 2022"
},
"GNMT": {
"href": "https://github.com/tensorflow/nmt",
"title": "Neural Machine Translation (seq2seq) Tutorial",
@@ -6680,6 +6748,19 @@ Thanks to Dwayne Robinson for his work investigating and providing recommendatio
],
"date": "September 2017"
},
"Whisper": {
"href": "https://arxiv.org/abs/2212.04356",
"title": "Robust Speech Recognition via Large-Scale Weak Supervision",
"authors": [
"Alec Radford",
"Jong Wook Kim",
"Tao Xu",
"Greg Brockman",
"Christine McLeavey",
"Ilya Sutskever"
],
"date": "December 2022"
},
"GRU": {
"href": "https://arxiv.org/pdf/1406.1078.pdf",
"title": "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation",
@@ -6772,12 +6853,133 @@ Thanks to Dwayne Robinson for his work investigating and providing recommendatio
],
"date": "November 2019"
},
- "POWERFUL-FEATURES": {
- "href": "https://w3c.github.io/webappsec-secure-contexts/",
- "title": "Secure Contexts",
"t5-small": {
"href": "https://jmlr.org/papers/volume21/20-074/20-074.pdf",
"title": "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer",
"authors": [
"Colin Raffel",
"Noam Shazeer",
"Adam Roberts",
"Katherine Lee",
"Sharan Narang",
"Michael Matena",
"Yanqi Zhou",
"Wei Li",
"Peter J. Liu"
],
"date": "June 2020"
},
"m2m100_418M": {
"href": "https://arxiv.org/abs/2010.11125",
"title": "Beyond English-Centric Multilingual Machine Translation",
"authors": [
- "Mike West"
- ]
"Angela Fan",
"Shruti Bhosale",
"Holger Schwenk",
"Zhiyi Ma",
"Ahmed El-Kishky",
"Siddharth Goyal",
"Mandeep Baines",
"Onur Celebi",
"Guillaume Wenzek",
"Vishrav Chaudhary",
"Naman Goyal",
"Tom Birch",
"Vitaliy Liptchinsky",
"Sergey Edunov",
"Edouard Grave",
"Michael Auli",
"Armand Joulin"
],
"date": "October 2020"
},
"gpt2": {
"href": "https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf",
"title": "Language Models are Unsupervised Multitask Learners",
"authors": [
"Alec Radford",
"Jeffrey Wu",
"Rewon Child",
"David Luan",
"Dario Amodei",
"Ilya Sutskever"
],
"date": "February 2019"
},
"llama-2-7b": {
"href": "https://arxiv.org/abs/2307.09288",
"title": "Llama 2: Open Foundation and Fine-Tuned Chat Models",
"authors": [
"Hugo Touvron",
"Louis Martin",
"Kevin Stone",
"Peter Albert",
"Amjad Almahairi",
"Yasmine Babaei",
"Nikolay Bashlykov",
"Soumya Batra",
"Prajjwal Bhargava",
"Shruti Bhosale",
"Dan Bikel",
"Lukas Blecher",
"Cristian Canton Ferrer",
"Moya Chen",
"Guillem Cucurull",
"David Esiobu",
"Jude Fernandes",
"Jeremy Fu",
"Wenyin Fu",
"Brian Fuller",
"Cynthia Gao",
"Vedanuj Goswami",
"Naman Goyal",
"Anthony Hartshorn",
"Saghar Hosseini",
"Rui Hou",
"Hakan Inan",
"Marcin Kardas",
"Viktor Kerkez",
"Madian Khabsa",
"Isabel Kloumann",
"Artem Korenev",
"Punit Singh Koura",
"Marie-Anne Lachaux",
"Thibaut Lavril",
"Jenya Lee",
"Diana Liskovich",
"Yinghai Lu",
"Yuning Mao",
"Xavier Martinet",
"Todor Mihaylov",
"Pushkar Mishra",
"Igor Molybog",
"Yixin Nie",
"Andrew Poulton",
"Jeremy Reizenstein",
"Rashi Rungta",
"Kalyan Saladi",
"Alan Schelten",
"Ruan Silva",
"Eric Michael Smith",
"Ranjan Subramanian",
"Xiaoqing Ellen Tan",
"Binh Tang",
"Ross Taylor",
"Adina Williams",
"Jian Xiang Kuan",
"Puxin Xu",
"Zheng Yan",
"Iliyan Zarov",
"Yuchen Zhang",
"Angela Fan",
"Melanie Kambadur",
"Sharan Narang",
"Aurelien Rodriguez",
"Robert Stojnic",
"Sergey Edunov",
"Thomas Scialom"
],
"date": "July 2023"
}
}
</pre>