how long convolution ensures causal language modeling #47

0205090923 · 2024-09-05T06:50:41Z

Hello, I would like to know how the long convolution ensures causality for language modeling. I couldn't find any explicit padding applied in the code.

0205090923 · 2024-09-05T07:08:08Z

I noticed in another response that you mentioned zero padding is applied to the kernel. I would like to know where this step is performed in the code. Looking forward to your reply.

0205090923 · 2024-09-05T13:21:01Z

Hello, I noticed that long_conv_kernel.py has self.L = L*2 if not causal else L, so we should set self.L = L for the causal case? This seems inconsistent with the explanations elsewhere. I'm confused; could you kindly explain how causality works for LongConv?

DanFu09 · 2024-09-05T13:26:19Z

That looks like a bug. That code is only used for LRA, so it might affect some of those results. I don’t believe it’s used anywhere else.


0205090923 · 2024-09-05T13:36:48Z

So we should set L = 2 * L for causal? Could you kindly explain how this works for the causal case? It seems no explicit padding is applied in the code. Thank you.

DanFu09 · 2024-09-05T13:48:32Z

Actually, I remembered that this is not how this code works. self.L = L creates a kernel of length L that gets padded implicitly up to 2L later on. self.L = 2L creates a kernel of length 2L. The FFT is still of length 2L, so this is actually not a bidirectional kernel. See this blog post: the 2L version is equivalent to the “wrap it around” case in the blog, and it computes a circular convolution.
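
For readers following along, here is a minimal NumPy sketch of the two cases described above. It is not the repository's code: fft_conv, k_short, and k_long are hypothetical names, and the kernels are random values chosen only for illustration. It assumes the FFT size is 2L in both cases, as stated in the reply. With a length-L kernel, the FFT's implicit zero padding gives an ordinary linear convolution, so changing a later input leaves earlier outputs untouched. With a length-2L kernel, the tail of the kernel wraps around and reaches earlier output positions.

```python
import numpy as np

def fft_conv(u, k, fft_size):
    """Multiply in the frequency domain with an FFT of length fft_size,
    then keep the first len(u) outputs."""
    u_f = np.fft.rfft(u, n=fft_size)  # rfft zero-pads inputs shorter than n
    k_f = np.fft.rfft(k, n=fft_size)
    y = np.fft.irfft(u_f * k_f, n=fft_size)
    return y[: len(u)]

L = 8
rng = np.random.default_rng(0)
u = rng.standard_normal(L)
k_short = rng.standard_normal(L)      # kernel of length L (illustrative)
k_long = rng.standard_normal(2 * L)   # kernel of length 2L (illustrative)

# Case 1: length-L kernel, FFT of length 2L.
# The zero padding makes this match a direct linear convolution.
y_causal = fft_conv(u, k_short, 2 * L)
matches_linear = np.allclose(y_causal, np.convolve(u, k_short)[:L])

# Perturb the input at position 5 and check the outputs before position 5.
u_perturbed = u.copy()
u_perturbed[5] += 1.0
y_causal_perturbed = fft_conv(u_perturbed, k_short, 2 * L)
causal_ok = np.allclose(y_causal[:5], y_causal_perturbed[:5])

# Case 2: length-2L kernel, same FFT size.
# Nothing is padded on the kernel side, so the convolution is circular
# and the kernel tail wraps around onto earlier positions.
y_circ = fft_conv(u, k_long, 2 * L)
y_circ_perturbed = fft_conv(u_perturbed, k_long, 2 * L)
wraps_around = not np.allclose(y_circ[:5], y_circ_perturbed[:5])

print(matches_linear, causal_ok, wraps_around)
```

In this sketch the only difference between the two cases is the kernel length; the padding is never written out explicitly because passing n to rfft does it.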

