Regression: TTS Journey voices with Opus: "This voice currently only supports LINEAR16 and MULAW output". #10879

Closed
chrbsg opened this issue Sep 17, 2024 · 4 comments

chrbsg commented Sep 17, 2024

Client

Go v1 SDK but looks like a backend server regression/bug.

Environment

Linux

Code and Dependencies

        req := texttospeechpb.SynthesizeSpeechRequest{
                Voice: &texttospeechpb.VoiceSelectionParams{
                        LanguageCode: "en-US",
                        Name:         "en-US-Journey-D",
                },
                AudioConfig: &texttospeechpb.AudioConfig{
                        AudioEncoding:   texttospeechpb.AudioEncoding_OGG_OPUS,
                        SampleRateHertz: 48000,
                },
        }
        req.Input = &texttospeechpb.SynthesisInput{
                InputSource: &texttospeechpb.SynthesisInput_Ssml{
                        Ssml: "hi",
                },
        }
        resp, err := client.SynthesizeSpeech(ctx, &req)

Expected behavior

Opus audio should be generated.

Actual behavior

This code has been working for a long time (since Journey voices were first introduced), but today the client.SynthesizeSpeech call is returning an error:

code = InvalidArgument 
desc = This voice currently only supports LINEAR16 and MULAW output.

Additional context

This looks a lot like the transcription regression where Opus support was apparently removed accidentally: googleapis/google-cloud-node#5609 @danielbankhead

chrbsg added the "triage me" label on Sep 17, 2024

codyoss (Member) commented Sep 17, 2024

This sounds like a backend regression rather than an issue with the client library itself. I would recommend reporting it through the service's support page: https://cloud.google.com/speech-to-text/docs/support

codyoss closed this as not planned on Sep 17, 2024

chrbsg (Author) commented Sep 17, 2024

I opened issue https://issuetracker.google.com/issues/367647327 on the Cloud Platform issue tracker.

SuppliedOrange commented Sep 18, 2024

As of the latest update, here's what they say:
https://cloud.google.com/text-to-speech/docs/voice-types#journey_voices_preview

Note: Journey Voices doesn't support SSML input, speaking rate and pitch audio encodings, and only returns LINEAR16 or MULAW audio.

Here's the full list of changes for future reference: https://cloud.google.com/text-to-speech/docs/release-notes
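
Until that changes, one client-side option is to fall back to a supported encoding when the voice is a Journey voice. The following is only a minimal sketch under stated assumptions: `pickEncoding` is a hypothetical helper, the encodings are plain strings that mirror the `AudioEncoding` enum names, and matching on "-Journey-" in the voice name is a guess based on names like en-US-Journey-D, not a documented rule.

```go
package main

import (
	"fmt"
	"strings"
)

// journeySupported lists the encodings that the documentation note says
// Journey voices return. Plain strings stand in for the AudioEncoding enum.
var journeySupported = map[string]bool{"LINEAR16": true, "MULAW": true}

// pickEncoding is a hypothetical helper. It returns the preferred encoding
// unless the voice name looks like a Journey voice (assumption: the name
// contains "-Journey-") and the preferred encoding is not in the supported
// set, in which case it falls back to LINEAR16.
func pickEncoding(voiceName, preferred string) string {
	isJourney := strings.Contains(voiceName, "-Journey-")
	if isJourney && !journeySupported[preferred] {
		return "LINEAR16"
	}
	return preferred
}

func main() {
	fmt.Println(pickEncoding("en-US-Journey-D", "OGG_OPUS"))
	fmt.Println(pickEncoding("en-US-Journey-D", "MULAW"))
	fmt.Println(pickEncoding("en-US-Standard-A", "OGG_OPUS"))
}
```

The result would then be mapped back to the real enum value before building the request, and any Opus or MP3 output would have to be produced by transcoding the LINEAR16 audio locally.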

SuppliedOrange commented Sep 19, 2024

Any advice on converting the resulting .wav to .mp3 is appreciated. I've tried ffmpeg and pydub (which also uses ffmpeg underneath), but the audio comes out sounding weird and monodirectional.
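
One way to narrow this down might be to read the channel count and sample rate from the WAV header first, then pass them to ffmpeg explicitly so nothing is guessed during conversion. The sketch below is not a confirmed fix. It assumes a canonical 44-byte PCM header, and the helper names (`parseWAVHeader`, `canonicalHeader`, `ffmpegArgs`) and the 24000 Hz mono sample values are made up for illustration. `-c:a libmp3lame`, `-ar` and `-ac` are standard ffmpeg options, though `libmp3lame` has to be compiled into the local ffmpeg build.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// wavInfo holds the header fields that matter when transcoding.
type wavInfo struct {
	Channels      uint16
	SampleRate    uint32
	BitsPerSample uint16
}

// parseWAVHeader reads a canonical 44-byte RIFF/WAVE header.
// Assumption: the "fmt " chunk comes first; other layouts are rejected.
func parseWAVHeader(b []byte) (wavInfo, error) {
	if len(b) < 44 || string(b[0:4]) != "RIFF" ||
		string(b[8:12]) != "WAVE" || string(b[12:16]) != "fmt " {
		return wavInfo{}, errors.New("not a canonical WAV header")
	}
	return wavInfo{
		Channels:      binary.LittleEndian.Uint16(b[22:24]),
		SampleRate:    binary.LittleEndian.Uint32(b[24:28]),
		BitsPerSample: binary.LittleEndian.Uint16(b[34:36]),
	}, nil
}

// canonicalHeader builds a 44-byte PCM header, used here only to make
// the example runnable without a real audio file.
func canonicalHeader(info wavInfo, dataLen uint32) []byte {
	b := make([]byte, 44)
	copy(b[0:], "RIFF")
	binary.LittleEndian.PutUint32(b[4:], 36+dataLen)
	copy(b[8:], "WAVEfmt ")
	binary.LittleEndian.PutUint32(b[16:], 16) // size of the fmt chunk
	binary.LittleEndian.PutUint16(b[20:], 1)  // audio format 1 = PCM
	binary.LittleEndian.PutUint16(b[22:], info.Channels)
	binary.LittleEndian.PutUint32(b[24:], info.SampleRate)
	blockAlign := info.Channels * info.BitsPerSample / 8
	binary.LittleEndian.PutUint32(b[28:], info.SampleRate*uint32(blockAlign))
	binary.LittleEndian.PutUint16(b[32:], blockAlign)
	binary.LittleEndian.PutUint16(b[34:], info.BitsPerSample)
	copy(b[36:], "data")
	binary.LittleEndian.PutUint32(b[40:], dataLen)
	return b
}

// ffmpegArgs builds arguments that keep the source sample rate and
// channel count and name the encoder explicitly.
func ffmpegArgs(in, out, codec string, info wavInfo) []string {
	return []string{
		"-y", "-i", in,
		"-c:a", codec,
		"-ar", strconv.Itoa(int(info.SampleRate)),
		"-ac", strconv.Itoa(int(info.Channels)),
		out,
	}
}

func main() {
	// Sample values only; read the first 44 bytes of the real file instead.
	hdr := canonicalHeader(wavInfo{Channels: 1, SampleRate: 24000, BitsPerSample: 16}, 0)
	info, err := parseWAVHeader(hdr)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", info)
	args := ffmpegArgs("speech.wav", "speech.mp3", "libmp3lame", info)
	fmt.Println("ffmpeg " + strings.Join(args, " "))
}
```

The argument slice can be handed to `exec.Command("ffmpeg", args...)`. Passing `libopus` with an `.ogg` or `.opus` output name instead should give Opus output, again assuming that encoder is available in the local build.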
