Impressive work on the innovative data selection method!
I recently finished reading your paper and am particularly curious about the computation of the gradient projection. You mention using a 125M-parameter model and reducing the gradient dimension to 16384. Does this imply storing a 125M × 16384 projection matrix, i.e. roughly 2×10^12 entries (about 8 TB in fp32)? That seems impractical given memory constraints. Even if the random projection matrix were generated on the fly, the computational cost of the projection itself would still be substantial. Yet the paper states that projection costs only about 1% of the forward-backward pass. I find this confusing. Could you provide some information on this matter? Thank you very much!
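(For anyone else puzzling over this: one common way to avoid materializing the full D × K matrix is to regenerate it block-by-block from a fixed seed and accumulate partial products, so peak memory is O(chunk × K) rather than O(D × K). This is only a minimal NumPy sketch of that seeded on-the-fly idea, not the authors' implementation; the function name `project_gradient`, the Gaussian entries, and the chunk size are illustrative assumptions, and real implementations typically use fused GPU kernels for speed.)

```python
import numpy as np

def project_gradient(grad, k, seed=0, chunk=4096):
    """Project a flat D-dim gradient down to k dims without ever
    materializing the full D x k projection matrix.

    Each row-block of the (implicit) projection matrix is regenerated
    on the fly from a deterministic per-block seed, so peak extra
    memory is O(chunk * k) instead of O(D * k). Using the same seed
    for every gradient guarantees all gradients are projected by the
    same (implicit) matrix.
    """
    d = grad.shape[0]
    out = np.zeros(k, dtype=np.float64)
    for i, start in enumerate(range(0, d, chunk)):
        stop = min(start + chunk, d)
        # Deterministic Gaussian block, scaled so the projection
        # approximately preserves inner products (JL-style).
        rng = np.random.default_rng(seed + i)
        block = rng.standard_normal((stop - start, k)) / np.sqrt(k)
        out += grad[start:stop] @ block
    return out
```

Since the blocks are regenerated from seeds, the projection of two different gradients uses the identical implicit matrix, which is what makes downstream inner products between projected gradients meaningful.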
Hi @aztec1900, did you make some progress on this issue? I am also very interested in it, but I didn't find the code to estimate datamodels in this repo.