Pin torch and diffuser versions in requirements #158


@gu-ma gu-ma commented Nov 16, 2023

No description provided.

J4BEZ added a commit to J4BEZ/riffusion that referenced this pull request Dec 14, 2023
…x_iter' " Issue

Thank you for providing me with an interesting project.

While exploring Riffusion through the Streamlit app,
I discovered an **issue** and a **solution** for it.

So I came here to share the solution.

TL;DR
> pin `torchaudio` to `2.0.x` (`2.0.2`, alongside `torch==2.0.1`)
> same solution as riffusion#158

---
# What is the Issue🤔
![image](https://github.com/riffusion/riffusion/assets/43560917/bdcb46a6-0915-4888-95b7-4ec97b3db682)
`TypeError: __init__() got an unexpected keyword argument 'max_iter'`

# When does it occur⏰
It occurs while running '**Text to Audio**' in the Riffusion Playground.
Converting the text into a spectrogram image works well,
but the step that **extracts audio from the converted image** fails.

# What is the cause of the issue
The cause appears to be a torchaudio update
that changed the signature of **torchaudio.transforms.InverseMelScale**.
![difference between InverseMelScale in torchaudio 2.0.1 and 2.1.1](https://github.com/riffusion/riffusion/assets/43560917/89154faf-5a50-4b8a-af35-0aca40722022)

The call at `riffusion-inference\riffusion\spectrogram_converter.py", line 87, in __init__`
passes the keyword arguments '**max_iter**', '**tolerance_loss**', '**tolerance_change**' and '**sgdargs**', which the torchaudio 2.1.1 documentation no longer lists:
```
        # https://pytorch.org/audio/stable/generated/torchaudio.transforms.InverseMelScale.html
        self.inverse_mel_scaler = torchaudio.transforms.InverseMelScale(
            n_stft=params.n_fft // 2 + 1,
            n_mels=params.num_frequencies,
            sample_rate=params.sample_rate,
            f_min=params.min_frequency,
            f_max=params.max_frequency,
            max_iter=params.max_mel_iters,
            tolerance_loss=1e-5,
            tolerance_change=1e-8,
            sgdargs=None,
            norm=params.mel_scale_norm,
            mel_scale=params.mel_scale_type,
        ).to(self.device)
```
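For setups where pinning torchaudio is not possible, the mismatch can also be detected at runtime. The following is only a minimal sketch and is not what this commit does: the helper `supported_kwargs` and the stand-in classes `OldStyle` and `NewStyle` are hypothetical and merely mimic the two signatures described in the 2.0.1 and 2.1.1 documentation pages. It assumes that `inspect.signature` can read the constructor of the real class in the same way.

```python
import inspect


def supported_kwargs(factory, kwargs):
    """Return only the keyword arguments that `factory` declares."""
    params = inspect.signature(factory).parameters
    # A **kwargs catch-all means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


# Hypothetical stand-ins that only imitate the two constructor signatures.
class OldStyle:
    def __init__(self, n_stft, n_mels=128, max_iter=100000, tolerance_loss=1e-5):
        self.max_iter = max_iter


class NewStyle:
    def __init__(self, n_stft, n_mels=128, driver="gels"):
        self.driver = driver


wanted = {"n_stft": 257, "n_mels": 512, "max_iter": 200, "tolerance_loss": 1e-5}

old = OldStyle(**supported_kwargs(OldStyle, wanted))  # all four keywords kept
new = NewStyle(**supported_kwargs(NewStyle, wanted))  # legacy keywords dropped
```

Dropping the legacy keywords avoids the `TypeError`, but it may also change how the inversion is computed, so pinning the versions remains the simpler fix.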

> 📑 A good article to refer to
> < [InverseMelScale - Torchaudio 2.0.1 documentation](https://pytorch.org/audio/2.0.1/generated/torchaudio.transforms.InverseMelScale.html?highlight=inversemelscale#torchaudio.transforms.InverseMelScale) >
> < [InverseMelScale - Torchaudio 2.1.1 documentation](https://pytorch.org/audio/2.1.1/generated/torchaudio.transforms.InverseMelScale.html?highlight=inverse#torchaudio.transforms.InverseMelScale) >


# How to solve the issue☑
We can solve this issue by installing torchaudio 2.0.x.

1. Please remove the existing torchaudio, torchvision and torch.

2. Please revise the `requirements.txt` as below to install pytorch 2.0.1
Before:
```
...
torch
torchaudio
torchvision
...
```

After:
```
...
torch==2.0.1
torchaudio==2.0.2
torchvision==0.15.2
...
```

3. If you want to use CUDA, install PyTorch and torchaudio with CUDA support before running `python -m pip install -r requirements.txt`.
> PyTorch 2.0.1 is recommended because the torchaudio version is pinned to 2.0.x.
> < [PyTorch 2.0.1 Install Guide](https://pytorch.org/get-started/previous-versions/#v201) >

4. Run `python -m pip install -r requirements.txt`.
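After the steps above, it can help to confirm that the environment actually contains the pinned versions. This is a small sketch under stated assumptions: the helper names `base_version` and `find_mismatches` are made up for illustration, and the pins are copied from the `requirements.txt` shown in step 2. CUDA wheels report versions such as `2.0.1+cu118`, so the local build tag is stripped before comparing.

```python
from importlib import metadata

# Pins copied from the revised requirements.txt above.
PINS = {"torch": "2.0.1", "torchaudio": "2.0.2", "torchvision": "0.15.2"}


def base_version(version):
    """Strip a local build tag such as '+cu118' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(pins, get_version=metadata.version):
    """Map each package whose installed version differs from its pin
    to a tuple (expected, installed); installed is None if missing."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = base_version(get_version(name))
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {installed}")
```

If the script prints nothing, all three packages match the pins.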

# Result:


---
Thank you for reading this long issue
I hope you all have a great and peaceful Christmas season.
Merry Christmas🎄