
Surprising errors in production code using v2 of @google-cloud/speech #5609

Closed · sorokinvj opened this issue Aug 9, 2024 · 14 comments
Assignee: danielbankhead
Labels: priority: p1, type: bug

sorokinvj commented Aug 9, 2024

Hey guys, our system started producing a surprising number of errors today; our PROD server and all of our users are affected.

Error on "error" in recognizeStream {"code":3,"details":"Audio data does not appear to be in a supported encoding. If you believe this to be incorrect, try explicitly specifying the decoding parameters.","metadata":{}}

We did not change anything in our implementation, and I hope the recent Google Chrome update did not touch the audio interfaces either.
We are using the MediaRecorder API, and up until today all users were happy and had their streams recognized successfully.

Here is our main service:

// Imports shown for context (assumes a @google-cloud/speech version that
// exports v2); app-specific helpers (logger, parseErrorMessage,
// findRecognizerByLanguageCode, transformGoogleResponse, getText, Sender,
// MachineEvent, TranscriptionService) are omitted.
import {v2 as speech, protos} from '@google-cloud/speech';

type StreamingRecognitionConfig =
  protos.google.cloud.speech.v2.IStreamingRecognitionConfig;
type StreamingRecognizeResponse =
  protos.google.cloud.speech.v2.StreamingRecognizeResponse;

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });

      const recognizer = findRecognizerByLanguageCode(language).name;

      const streamingConfig: StreamingRecognitionConfig = {
        config: {
          autoDecodingConfig: {},
        },
        streamingFeatures: {
          interimResults: false,
          enableVoiceActivityEvents: true, // enable voice-activity start/end events
          voiceActivityTimeout: {
            speechStartTimeout: { seconds: 60 },
            speechEndTimeout: { seconds: 60 },
          },
        },
      };
      const configRequest = {
        recognizer,
        streamingConfig,
      };

      logger.info('Creating Google service with recognizer:', recognizer);

      const recognizeStream = client
        ._streamingRecognize()
        .on('error', error => {
          logger.error(
            'Error on "error" in recognizeStream',
            JSON.stringify(error)
          );
          send({ type: 'ERROR', data: parseErrorMessage(error) });
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          logger.warn('Google recognizeStream ended');
        });

      let configSent = false;
      let headersSent = false;
      // Write protocol: the first write sends the config request, the second
      // sends the WebM header chunk, and every later write streams audio.
      // Note that the audio argument of the first two calls is not forwarded.
      const transcribeAudio = (audio: Buffer, headers: Buffer) => {
        if (!configSent) {
          recognizeStream.write(configRequest);
          configSent = true;
          return;
        }
        if (configSent && !headersSent) {
          recognizeStream.write({ audio: headers });
          headersSent = true;
          return;
        }
        recognizeStream.write({ audio });
      };

      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };
      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};
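
For context, we consume the service roughly like this (a simplified sketch; send, audioChunk, and headerChunk come from our own plumbing):

const service = await createGoogleService({ language: 'en-US', send });

// called for every audio chunk coming from the browser;
// headerChunk is the first MediaRecorder chunk (the WebM header)
service.transcribeAudio(audioChunk, headerChunk);

// on hangup
service.stop();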
danielbankhead self-assigned this Aug 9, 2024
danielbankhead (Contributor):

Hey @sorokinvj, which file types are affected?

sorokinvj (Author) commented Aug 10, 2024

> Hey @sorokinvj, which file types are affected?

Hey @danielbankhead, we are doing real-time transcription.
Surprisingly, until yesterday we were able to do real-time with v2 and WEBM_OPUS-encoded audio, although I now see that v2's explicit encoding enum has no such value, only:

AUDIO_ENCODING_UNSPECIFIED = 0,
LINEAR16 = 1,
MULAW = 2,
ALAW = 3

Our setup uses autoDecodingConfig: {}, though. Do you support 'audio/webm;codecs=opus' in v2?
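
To make the two v2 decoding modes concrete, here is a minimal sketch based on the v2 protos (WebM/Opus can only go through auto-detection, since the explicit enum has no WEBM_OPUS value):

const autoConfig: StreamingRecognitionConfig['config'] = {
  autoDecodingConfig: {}, // service detects container/codec, incl. WebM/Opus
};

const explicitConfig: StreamingRecognitionConfig['config'] = {
  explicitDecodingConfig: {
    // only headerless raw formats appear in v2's explicit enum
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    audioChannelCount: 1,
  },
};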

For now we have rolled back to v1 with the code below, and everything is back to normal:

// Imports shown for context (v1 client); app-specific helpers are omitted
// as in the v2 snippet above.
import * as speech from '@google-cloud/speech';
import {protos} from '@google-cloud/speech';

type StreamingRecognizeResponse =
  protos.google.cloud.speech.v1.StreamingRecognizeResponse;

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });

      const recognizeStream = client
        .streamingRecognize({
          config: {
            encoding: 'WEBM_OPUS',
            sampleRateHertz: 48000,
            languageCode: language,
            enableAutomaticPunctuation: true,
            enableSpokenPunctuation: {
              value: true,
            },
          },
          interimResults: false,
          enableVoiceActivityEvents: true,
        })
        .on('error', error => {
          logger.error('Error on "error" in recognizeStream', error);
          send({ type: 'ERROR', data: parseErrorMessage(error) });
          reject(error);
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          send({
            type: 'TRANSCRIPTION_SERVICE_CLOSED',
            data: 'TRANSCRIPTION_SERVICE_CLOSED',
          });
        });

      let headersSent = false;

      const transcribeAudio = (audio: Buffer, headers: Buffer) => {
        if (!headersSent) {
          recognizeStream.write(headers);
          headersSent = true;
          return;
        }
        recognizeStream.write(audio);
      };

      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };

      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};
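
(As far as I understand, v1's streamingRecognize helper sends the config for us and wraps each subsequent raw Buffer write into an audio request, which is why we can write the header and audio chunks directly. It matches the documented pattern of piping raw audio straight in:)

fs.createReadStream('audio.raw').pipe(recognizeStream);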

On the frontend we use the standard MediaRecorder API to send the data:

// constraints, MIME_TYPE, TIMESLICE_INTERVAL, SilenceDetector, sendAudioChunk
// and logger are our own code/constants.
navigator.mediaDevices
  .getUserMedia(constraints)
  .then((media) => {
    // Continue to play the captured audio back to the user.
    const output = new AudioContext();
    const source = output.createMediaStreamSource(media);
    source.connect(output.destination);

    const audioStream = new MediaStream(media.getAudioTracks());
    const silenceDetector = new SilenceDetector(audioStream);
    const mediaRecorder = new MediaRecorder(audioStream, {
      mimeType: MIME_TYPE,
    });

    // The first chunk contains the WebM/EBML header; keep it so the backend
    // can decode every later chunk.
    let audioHeaders: BlobEvent | undefined;
    mediaRecorder.ondataavailable = (event: BlobEvent) => {
      if (!audioHeaders) {
        audioHeaders = event;
      }

      if (!silenceDetector.getIsSilent()) {
        sendAudioChunk(event, audioHeaders);
      }
    };

    mediaRecorder.start(TIMESLICE_INTERVAL);
  })
  .catch((error) => {
    logger.error('Error accessing the microphone:', error);
  });
meitarbe commented Aug 12, 2024

@danielbankhead We have the same use case and the same issue, and it is very difficult for us to move to v1. Any update on it? This is critical for our system.

danielbankhead (Contributor):

WEBM_OPUS should be supported; I will see what's going on.

danielbankhead added status: investigating, type: bug and priority: p2 labels Aug 12, 2024
meitarbe:
> WEBM_OPUS should be supported; I will see what's going on.

@danielbankhead Thanks for the quick reply! If it helps: it seems to have started around Aug 6 (we began seeing tons of these errors in our GCP logs). I also tested WebM files that I am 100% sure worked before (we store each audio file together with the text it produced), and they no longer work even though nothing has changed on our side.

paullombardcartello commented Aug 12, 2024

Also experiencing this issue, also with WebM; it seems to have broken a few days ago.

asafda commented Aug 13, 2024

Also experiencing this issue

danielbankhead added priority: p1 and removed priority: p2 labels Aug 13, 2024
danielbankhead (Contributor):

Update: the service team is aware of this issue; I should have another update soon.

danielbankhead removed the status: investigating label Aug 14, 2024
paullombardcartello:
Any updates?

danielbankhead (Contributor):

A fix is rolling out and should be available shortly.

sorokinvj (Author):

@danielbankhead any news on the fix? Is it available already? Do you know which release version I should be looking for?

danielbankhead (Contributor):

The issue is on the service side; no update is required on the client side. The rollback should have been rolled out by now, but I'm waiting for the service team to confirm.

danielbankhead (Contributor):

The fix should be widely available now.

felabrecque:
I can confirm that the problem is fixed. I sent an audio/webm file to the google-cloud-speech v2 recognize functionality and it worked (it didn't tell me the file format was invalid).
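
For anyone who wants to re-check on their side, a minimal probe against v2 recognize might look like this (a sketch; the recognizer path and file name are placeholders):

import {v2 as speech} from '@google-cloud/speech';
import {readFileSync} from 'node:fs';

const client = new speech.SpeechClient();

async function probe() {
  const [response] = await client.recognize({
    recognizer:
      'projects/PROJECT_ID/locations/global/recognizers/my-recognizer',
    config: { autoDecodingConfig: {} }, // let the service detect WebM/Opus
    content: readFileSync('sample.webm'),
  });
  console.log(JSON.stringify(response.results, null, 2));
}

probe().catch(console.error);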
