[Feature Request] Support for MiniCPM-V #92

Open
2 tasks done
DarioPTWR opened this issue Nov 30, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@DarioPTWR

Required prerequisites

Motivation

Hi! Love the work done in this repo. I see that DPO currently supports various Text+Image -> Text models, and was wondering if you could extend this support to the MiniCPM-V MLLM as well? That would be greatly appreciated!

MiniCPM-V (i.e., OmniLMM-3B) is an efficient model with promising performance for deployment. It is built on SigLip-400M and MiniCPM-2.4B.

Thank you!

Solution

No response

Alternatives

No response

Additional context

No response

DarioPTWR added the enhancement (New feature or request) label Nov 30, 2024
@Gaiejj
Member

Gaiejj commented Dec 1, 2024

Thank you very much for your support and recognition of our work! We are preparing to refactor the chat_template, which will facilitate the integration of new models. In this PR, we will also update the support for MiniCPM-V. Stay tuned!

@DarioPTWR
Author

Thank you! Can't wait to see the update soon for MiniCPM-V 1.0! In the meantime, I was just wondering if it would still be possible to apply DPO retraining to MiniCPM-V with the current state of the repo? Or would it be completely impossible? Thanks!

@Gaiejj
Member

Gaiejj commented Dec 2, 2024

We have implemented DPO fine-tuning for MiniCPM-V in #93 (SFT and RLHF are on the way). We would like to invite you to try this change as a beta user.

You can clone this repository using the following command.

git clone https://github.com/Gaiejj/align-anything.git -b dev-refactor

Then install the necessary dependencies with the commands below. They assume conda is available and use it to install CUDA.

conda install nvidia/label/cuda-12.2.0::cuda
export CUDA_HOME=$CONDA_PREFIX

pip install -e ".[train]"
pip install -e ".[minicpmv]"

After that, you only need to run the following script under the scripts folder:

./minicpmv_dpo.sh
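
Because the steps above depend on CUDA_HOME pointing at the conda-installed CUDA toolkit, a quick check before the pip installs can save a failed build. The following is only a minimal sketch: `check_cuda_home` is a hypothetical helper that is not part of align-anything, and it assumes the conda CUDA package places `nvcc` under `$CUDA_HOME/bin`, which may not hold for every setup.

```shell
# Hypothetical helper (not part of align-anything): checks that CUDA_HOME
# is set and appears to contain the CUDA compiler.
check_cuda_home() {
    # Fail if CUDA_HOME is unset or empty.
    if [ -z "${CUDA_HOME:-}" ]; then
        echo "CUDA_HOME is not set" >&2
        return 1
    fi
    # Assumption: nvcc is installed as an executable at $CUDA_HOME/bin/nvcc.
    if [ ! -x "$CUDA_HOME/bin/nvcc" ]; then
        echo "nvcc not found under $CUDA_HOME/bin" >&2
        return 1
    fi
    echo "CUDA_HOME looks usable: $CUDA_HOME"
    return 0
}
```

After defining the function in the same shell session, running `check_cuda_home` right after `export CUDA_HOME=$CONDA_PREFIX` should print the confirmation line if the toolkit is where this sketch expects it.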

If you notice any anomalies or are not satisfied with the training results, please send us feedback. It is very important to us.

@DarioPTWR
Author

Thank you so much! Will test it out on my end, and comment in this PR again if there are any anomalies spotted.
