
Is it possible to use pretrained model fine tuned with LORA? #159

Open
krNeko9t opened this issue Jun 20, 2023 · 8 comments

Comments

@krNeko9t

Same as the title: what if I want to use a model fine-tuned with LoRA to generate the reference image? Does the paper support something like that?

@thuliu-yt16
Collaborator

I'm not quite sure what you mean. Do you mean using the LoRA-tuned model to generate images in prolificdreamer?

@krNeko9t
Author

> Not quite sure about what you mean. Do you mean to use the lora-tuned model to generate images in prolificdreamer?

Yes. Most of these methods rely on a frozen T2I model, so I wonder whether we can use a version fine-tuned with LoRA instead. Thanks for the reply.

To be specific, I mean the model used in prolificdreamer that is referenced by the path system.guidance.pretrained_model_name_or_path.

@thuliu-yt16
Collaborator

Let me clarify. There are actually two different things related to what I said, and we may each be referring to a different one.

  1. Just run the prolificdreamer pipeline without any change. From the pipeline we get a LoRA-tuned model trained on the specific prompt and conditioned on camera pose, and we can sample images from this model using T2I schedulers such as DPM-Solver. This is supported in threestudio; just add system.visualize_samples=True.

  2. In prolificdreamer, replace the model that is supposed to be trained with LoRA during optimization with a model that has already been fine-tuned with LoRA. In this case, if the model is completely frozen, I guess it should not work, because the model needs to estimate the distribution of the current UNDER-OPTIMIZED rendered images rather than NEARLY PERFECT rendered images. If the model is still trained during the pipeline (for example, loading some weights from ControlNet and continuing to train with VSD), I guess it could work to some extent, though I am not very sure what the input should be. In this case we would need a function to load LoRA weights, which is very easy to implement.
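For background on the "load lora weights" step in option 2: a LoRA checkpoint stores low-rank factors A and B per layer, and loading it into a frozen base amounts to merging W' = W + scale * (B @ A). The sketch below illustrates that merge in pure Python with hypothetical names and toy shapes; it is not threestudio's or diffusers' actual implementation.

```python
# Toy illustration of merging a pre-trained LoRA update into a frozen base
# weight matrix: W' = W + scale * (B @ A). Names and shapes are illustrative.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def merge_lora(base, lora_a, lora_b, scale=1.0):
    """Return base + scale * (B @ A); `base` itself is left untouched (frozen)."""
    delta = matmul(lora_b, lora_a)  # (out, rank) @ (rank, in) -> (out, in)
    return [[base[i][j] + scale * delta[i][j]
             for j in range(len(base[0]))] for i in range(len(base))]

# 2x2 base weight and rank-1 LoRA factors
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # (out=2, rank=1)
A = [[3.0, 4.0]]     # (rank=1, in=2)
W_merged = merge_lora(W, A, B, scale=0.5)
print(W_merged)  # [[2.5, 2.0], [3.0, 5.0]]
```

Because the merge produces ordinary dense weights, the resulting T2I + LoRA model can then serve as a frozen base on which a second, trainable LoRA is stacked.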

@krNeko9t
Author

In the paper's illustration there are two models, the T2I model and the LoRA, so I guess they correspond to the members of the StableDiffusionVSDGuidance class: pipe and pipe_lora? (Forgive me for not reading it carefully.)

So, from your description, the T2I model has already been fine-tuned with LoRA? But what was it fine-tuned for? Are there three sets of weights: the T2I model, the pretrained LoRA, and the optimization-target LoRA?

Or is pipe + pipe_lora together the T2I system mentioned in the paper, with pipe_lora referring to a full Hugging Face model?

I'm a little confused and new to this project. If I need to take more time to read the paper or the code, please tell me. Thanks for your help!

@DSaurus
Collaborator

DSaurus commented Jun 21, 2023

Hi, @krNeko9t.

In StableDiffusionVSDGuidance, "pipe" represents the frozen T2I base model, while "pipe_lora" represents the frozen T2I model with an additional unfrozen LORA1 (for 3D generation). So if you want to use a pre-trained model fine-tuned with LoRA, your base model becomes T2I + LORA2 (your LoRA model). It's important to note that LORA1 (for 3D generation) and LORA2 (your LoRA model) are completely distinct. However, threestudio currently only supports a single T2I model without any additional modules as the base model, so you'll need to implement some code to support T2I + LORA2. One possible approach, as suggested by @thuliu-yt16, is as follows:

  • Implement a new pipe that loads a T2I + LORA2 model as the base model. Both T2I and LORA2 should be frozen.
  • Implement a new pipe_lora that trains a (T2I + LORA2) + LORA1 model. T2I and LORA2 remain frozen, while LORA1 is trained.
  • If splitting and training two LoRA models is challenging, you can instead incorporate an alternative module such as a ControlNet or a UNet (mentioned in prolificdreamer). In that case your new pipe_lora becomes (T2I + LORA2) + ControlNet/UNet, where the ControlNet/UNet plays a role similar to LORA1.
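The split in the steps above ultimately reduces to bookkeeping: which parameter groups the optimizer sees and which stay frozen. A hypothetical sketch of that partition (not actual threestudio code; all names are invented for illustration):

```python
# Hypothetical sketch of the parameter split described above: the T2I base
# and the pre-trained LORA2 stay frozen, while only a fresh LORA1 (or a
# ControlNet/UNet head) is exposed to the optimizer.

class ParamGroup:
    def __init__(self, name, trainable):
        self.name = name
        self.trainable = trainable

def build_vsd_param_groups(use_pretrained_lora=True):
    """Return the parameter groups for a (T2I + LORA2) + LORA1 setup."""
    groups = [ParamGroup("t2i_base", trainable=False)]
    if use_pretrained_lora:
        groups.append(ParamGroup("lora2_pretrained", trainable=False))
    groups.append(ParamGroup("lora1_vsd", trainable=True))
    return groups

groups = build_vsd_param_groups()
print([g.name for g in groups if g.trainable])      # ['lora1_vsd']
print([g.name for g in groups if not g.trainable])  # ['t2i_base', 'lora2_pretrained']
```

In a real PyTorch implementation this would be done by setting requires_grad=False on the frozen modules and passing only LORA1's parameters to the optimizer.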

@krNeko9t
Author

Can I ask another question: why are there two SD models used in the prolificdreamer system? The original paper doesn't seem to mention this. What's the benefit of this strategy?

@thuliu-yt16
Collaborator

The SD models 2-1-base and 2-1 are in eps-prediction and v-prediction mode, respectively. The authors said that using a v-prediction model for the LoRA works better. You can definitely try both; I think there is no big difference.
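For context on the eps- vs. v-prediction distinction: under the common parameterization x_t = a_t*x0 + s_t*eps with a_t^2 + s_t^2 = 1, v-prediction targets v = a_t*eps - s_t*x0, and the two targets are interconvertible via eps = a_t*v + s_t*x_t and x0 = a_t*x_t - s_t*v. A quick numeric check of those identities with toy scalars (not SD code):

```python
import math

# Toy check that eps-prediction and v-prediction targets carry the same
# information: x_t = a*x0 + s*eps, v = a*eps - s*x0, with a^2 + s^2 = 1.
a = math.cos(0.3)   # alpha_t
s = math.sin(0.3)   # sigma_t
x0, eps = 1.7, -0.4

x_t = a * x0 + s * eps
v = a * eps - s * x0
eps_from_v = a * v + s * x_t   # recover eps from v and x_t
x0_from_v = a * x_t - s * v    # recover x0 likewise

print(abs(eps_from_v - eps) < 1e-12)  # True
print(abs(x0_from_v - x0) < 1e-12)    # True
```

So the choice of prediction mode changes the training target and its loss weighting, not the information the network can express, which is consistent with the "no big difference" observation above.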
