Regression: TTS Journey voices with Opus: "This voice currently only supports LINEAR16 and MULAW output". #10879

chrbsg · 2024-09-17T10:21:00Z

Client

Go v1 SDK, but this looks like a backend server regression rather than a client library bug.

Environment

Linux

Code and Dependencies

        req := texttospeechpb.SynthesizeSpeechRequest{
                Voice: &texttospeechpb.VoiceSelectionParams{
                        LanguageCode: "en-US",
                        Name:         "en-US-Journey-D",
                },
                AudioConfig: &texttospeechpb.AudioConfig{
                        AudioEncoding:   texttospeechpb.AudioEncoding_OGG_OPUS,
                        SampleRateHertz: 48000,
                },
        }
        req.Input = &texttospeechpb.SynthesisInput{
                InputSource: &texttospeechpb.SynthesisInput_Ssml{
                        Ssml: "hi",
                },
        }
        resp, err := client.SynthesizeSpeech(ctx, &req)

Expected behavior

Opus audio should be generated.

Actual behavior

This code has been working for a long time (since Journey voices were first introduced), but today the client.SynthesizeSpeech call is returning an error:

code = InvalidArgument 
desc = This voice currently only supports LINEAR16 and MULAW output.

Additional context

This looks a lot like the transcription regression where Opus support was apparently removed accidentally: googleapis/google-cloud-node#5609 @danielbankhead


codyoss · 2024-09-17T13:29:40Z

This sounds like a backend regression and not related to the client library itself. I would recommend reaching out from the service support page to report the issue: https://cloud.google.com/speech-to-text/docs/support

chrbsg · 2024-09-17T15:46:48Z

I opened issue https://issuetracker.google.com/issues/367647327 on the Cloud Platform issue tracker.

SuppliedOrange · 2024-09-18T22:37:08Z

As of the latest update, here's what they say:
https://cloud.google.com/text-to-speech/docs/voice-types#journey_voices_preview

Note: Journey Voices doesn't support SSML input, speaking rate and pitch audio encodings, and only returns LINEAR16 or MULAW audio.

Here's the full list of changes for future reference: https://cloud.google.com/text-to-speech/docs/release-notes
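
Given the documentation note quoted above, one client-side workaround is to fall back to a supported encoding whenever a Journey voice is selected. The following is only a minimal sketch, not part of the SDK: `isJourneyVoice` and `pickEncoding` are hypothetical helpers, encodings are plain strings that mirror the `AudioEncoding` enum names, and detecting Journey voices by the substring "-Journey-" is an assumption based on the voice name `en-US-Journey-D` used in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// journeySupported lists the encodings that, per the documentation note
// quoted in this thread, Journey voices can return.
var journeySupported = map[string]bool{
	"LINEAR16": true,
	"MULAW":    true,
}

// isJourneyVoice is a heuristic, not an API guarantee: it assumes every
// Journey voice name contains "-Journey-", as "en-US-Journey-D" does.
func isJourneyVoice(name string) bool {
	return strings.Contains(name, "-Journey-")
}

// pickEncoding returns the requested encoding unchanged, unless the voice
// is a Journey voice and the encoding is not in journeySupported, in which
// case it falls back to LINEAR16.
func pickEncoding(voiceName, requested string) string {
	if isJourneyVoice(voiceName) && !journeySupported[requested] {
		return "LINEAR16"
	}
	return requested
}

func main() {
	fmt.Println(pickEncoding("en-US-Journey-D", "OGG_OPUS")) // falls back
	fmt.Println(pickEncoding("en-US-Journey-D", "MULAW"))    // kept as-is
	fmt.Println(pickEncoding("en-US-Wavenet-D", "OGG_OPUS")) // not a Journey voice
}
```

The result would then be mapped back to the matching `texttospeechpb.AudioEncoding` value before building the request. Since the same note says Journey voices do not accept SSML, a request like the one in the original report would presumably also need a plain-text input instead of `SynthesisInput_Ssml`.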

SuppliedOrange · 2024-09-19T11:36:14Z

Any advice on converting the resulting .wav to .mp3 would be appreciated. I've tried ffmpeg and pydub (which also uses ffmpeg underneath), but the converted audio sounds weird and monodirectional.

chrbsg added the "triage me" label Sep 17, 2024

codyoss closed this as not planned Sep 17, 2024
