-
Notifications
You must be signed in to change notification settings - Fork 66
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #85 from PrathamSingla15/Text_invs
⚡ Add summary for Textual_inversion
- Loading branch information
Showing
5 changed files
with
41 additions
and
0 deletions.
There are no files selected for viewing
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
# An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | ||
Rinon Gal1, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or | ||
**ICVR** **2023** | ||
|
||
## Summary | ||
The paper introduces a method which enables users a high level of creative freedom in generating images from natural language prompts by finding new words in the textual embedding space, represented by pseudo-words which helps in introducing new specific concepts into text-to-image models. | ||
<img src='../images/textual_inversion 1.png'> | ||
|
||
|
||
## Contributions | ||
|
||
- Addressing the challenge of large-scale text-to-image models which are constrained by user's ability to describe desired target through text, by introducing personalized text-to-image generation. | ||
|
||
- Introduction of 'Textual inversions' as a method to find pseudo words in embedding space for capturing semantics and visual details. | ||
|
||
- Address the challenges associated with fine-tuning or re-training models for each new concept while preserving model's existing capabilities. | ||
|
||
## Methodology | ||
|
||
The primary challenge being addressed in the text is the difficulty of introducing new concepts into large-scale text-to-image models. The common challenges faced include the high cost of re-training models with expanded datasets for each new concept and the problem of catastrophic forgetting when fine-tuning on few examples. The proposed solution aims to tackle these challenges by introducing a method called "Textual Inversion" which works by transforming specific concepts into new made-up words. These made-up words can then be used to describe and create new scenes using simple sentences, making it easy for users to make changes. | ||
|
||
<img src='../images/textual_inversion 2.png'> | ||
|
||
The approach looks at the word-embedding stage of the text encoders which focuses on visually reconstructing images ensuring that the generated pictures match the desired concepts more accurately. | ||
|
||
## Results | ||
The method is tested on Latent Diffusion models and the extensive experiments shows the effectiveness of proposed approach in capturing and representing concepts using natural language, especially with a single word and offering good flexibility. | ||
<img src='../images/textual_inversion 3.png'> | ||
<img src='../images/textual_inversion 4.png'> | ||
|
||
## Two-Cents | ||
|
||
- The paper provides more freedom in creating images from words. It's good at capturing the general idea of a concept, even though it might not be super precise with details. | ||
|
||
- This method makes it easier to edit pictures, but there's a risk that people might use it to create fake images, especially of private individuals. This suggests a need for more research in this area to understand and address these potential misuses. | ||
|
||
## Resources | ||
- Link to [Paper](https://arxiv.org/pdf/2208.01618.pdf) and [Code](https://github.com/rinongal/textual_inversion) | ||
- [YouTube video](https://youtu.be/opD_H9bED9Y?si=dMgGO0d_2ClKDUY7) | ||
- [Blog](https://medium.com/@onkarmishra/how-textual-inversion-works-and-its-applications-5e3fda4aa0bc) | ||
|