
Surprising errors in production code using v2 of @google-cloud/speech #5609

Closed · sorokinvj opened this issue Aug 9, 2024 · 14 comments
Assignee: danielbankhead
Labels: priority: p1, type: bug

sorokinvj commented Aug 9, 2024

Hey guys, our system started producing a surprising number of errors today; our PROD server and all of our users are affected.

Error on "error" in recognizeStream {"code":3,"details":"Audio data does not appear to be in a supported encoding. If you believe this to be incorrect, try explicitly specifying the decoding parameters.","metadata":{}}

We did not change anything in our implementation, and I hope the recent Google Chrome update did not touch the audio interfaces either.
We are using the MediaRecorder API, and up until today all users were happy and had their streams recognized successfully.

Here is our main service:

// Imports shown for context (assumes a @google-cloud/speech version that
// exports v2); app-specific helpers (logger, parseErrorMessage,
// findRecognizerByLanguageCode, transformGoogleResponse, getText, Sender,
// MachineEvent, TranscriptionService) are omitted.
import {v2 as speech, protos} from '@google-cloud/speech';

type StreamingRecognitionConfig =
  protos.google.cloud.speech.v2.IStreamingRecognitionConfig;
type StreamingRecognizeResponse =
  protos.google.cloud.speech.v2.StreamingRecognizeResponse;

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });

      const recognizer = findRecognizerByLanguageCode(language).name;

      const streamingConfig: StreamingRecognitionConfig = {
        config: {
          autoDecodingConfig: {},
        },
        streamingFeatures: {
          interimResults: false,
          enableVoiceActivityEvents: true, // enable voice-activity start/end events
          voiceActivityTimeout: {
            speechStartTimeout: { seconds: 60 },
            speechEndTimeout: { seconds: 60 },
          },
        },
      };
      const configRequest = {
        recognizer,
        streamingConfig,
      };

      logger.info('Creating Google service with recognizer:', recognizer);

      const recognizeStream = client
        ._streamingRecognize()
        .on('error', error => {
          logger.error(
            'Error on "error" in recognizeStream',
            JSON.stringify(error)
          );
          send({ type: 'ERROR', data: parseErrorMessage(error) });
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          logger.warn('Google recognizeStream ended');
        });

      let configSent = false;
      let headersSent = false;
      // Write protocol: the first write sends the config request, the second
      // sends the WebM header chunk, and every later write streams audio.
      // Note that the audio argument of the first two calls is not forwarded.
      const transcribeAudio = (audio: Buffer, headers: Buffer) => {
        if (!configSent) {
          recognizeStream.write(configRequest);
          configSent = true;
          return;
        }
        if (configSent && !headersSent) {
          recognizeStream.write({ audio: headers });
          headersSent = true;
          return;
        }
        recognizeStream.write({ audio });
      };

      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };
      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};
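
For context, we consume the service roughly like this (a simplified sketch; send, audioChunk, and headerChunk come from our own plumbing):

const service = await createGoogleService({ language: 'en-US', send });

// called for every audio chunk coming from the browser;
// headerChunk is the first MediaRecorder chunk (the WebM header)
service.transcribeAudio(audioChunk, headerChunk);

// on hangup
service.stop();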
danielbankhead self-assigned this Aug 9, 2024
danielbankhead (Contributor):

Hey @sorokinvj, which file types are affected?

sorokinvj (Author) commented Aug 10, 2024

> Hey @sorokinvj, which file types are affected?

Hey @danielbankhead, we are doing real-time transcription.
Surprisingly, until yesterday we were able to do real-time with v2 and WEBM_OPUS-encoded audio, although I now see that v2's explicit encoding enum has no such value, only:

AUDIO_ENCODING_UNSPECIFIED = 0,
LINEAR16 = 1,
MULAW = 2,
ALAW = 3

Our setup uses autoDecodingConfig: {}, though. Do you support 'audio/webm;codecs=opus' in v2?
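
To make the two v2 decoding modes concrete, here is a minimal sketch based on the v2 protos (WebM/Opus can only go through auto-detection, since the explicit enum has no WEBM_OPUS value):

const autoConfig: StreamingRecognitionConfig['config'] = {
  autoDecodingConfig: {}, // service detects container/codec, incl. WebM/Opus
};

const explicitConfig: StreamingRecognitionConfig['config'] = {
  explicitDecodingConfig: {
    // only headerless raw formats appear in v2's explicit enum
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    audioChannelCount: 1,
  },
};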

For now we have rolled back to v1 with the code below, and everything is back to normal:

// Imports shown for context (v1 client); app-specific helpers are omitted
// as in the v2 snippet above.
import * as speech from '@google-cloud/speech';
import {protos} from '@google-cloud/speech';

type StreamingRecognizeResponse =
  protos.google.cloud.speech.v1.StreamingRecognizeResponse;

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });

      const recognizeStream = client
        .streamingRecognize({
          config: {
            encoding: 'WEBM_OPUS',
            sampleRateHertz: 48000,
            languageCode: language,
            enableAutomaticPunctuation: true,
            enableSpokenPunctuation: {
              value: true,
            },
          },
          interimResults: false,
          enableVoiceActivityEvents: true,
        })
        .on('error', error => {
          logger.error('Error on "error" in recognizeStream', error);
          send({ type: 'ERROR', data: parseErrorMessage(error) });
          reject(error);
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          send({
            type: 'TRANSCRIPTION_SERVICE_CLOSED',
            data: 'TRANSCRIPTION_SERVICE_CLOSED',
          });
        });

      let headersSent = false;

      const transcribeAudio = (audio: Buffer, headers: Buffer) => {
        if (!headersSent) {
          recognizeStream.write(headers);
          headersSent = true;
          return;
        }
        recognizeStream.write(audio);
      };

      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };

      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};
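
(As far as I understand, v1's streamingRecognize helper sends the config for us and wraps each subsequent raw Buffer write into an audio request, which is why we can write the header and audio chunks directly. It matches the documented pattern of piping raw audio straight in:)

fs.createReadStream('audio.raw').pipe(recognizeStream);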

On the frontend we use the standard MediaRecorder API to send the data:

// constraints, MIME_TYPE, TIMESLICE_INTERVAL, SilenceDetector, sendAudioChunk
// and logger are our own code/constants.
navigator.mediaDevices
  .getUserMedia(constraints)
  .then((media) => {
    // Continue to play the captured audio back to the user.
    const output = new AudioContext();
    const source = output.createMediaStreamSource(media);
    source.connect(output.destination);

    const audioStream = new MediaStream(media.getAudioTracks());
    const silenceDetector = new SilenceDetector(audioStream);
    const mediaRecorder = new MediaRecorder(audioStream, {
      mimeType: MIME_TYPE,
    });

    // The first chunk contains the WebM/EBML header; keep it so the backend
    // can decode every later chunk.
    let audioHeaders: BlobEvent | undefined;
    mediaRecorder.ondataavailable = (event: BlobEvent) => {
      if (!audioHeaders) {
        audioHeaders = event;
      }

      if (!silenceDetector.getIsSilent()) {
        sendAudioChunk(event, audioHeaders);
      }
    };

    mediaRecorder.start(TIMESLICE_INTERVAL);
  })
  .catch((error) => {
    logger.error('Error accessing the microphone:', error);
  });
meitarbe commented Aug 12, 2024

@danielbankhead We have the same use case and the same issue, and it is very difficult for us to move to v1. Any update on it? This is critical for our system.

danielbankhead (Contributor):

WEBM_OPUS should be supported; I will see what's going on.

danielbankhead added status: investigating, type: bug and priority: p2 labels Aug 12, 2024
meitarbe:
> WEBM_OPUS should be supported; I will see what's going on.

@danielbankhead Thanks for the quick reply! If it helps: it seems to have started around Aug 6 (we began seeing tons of these errors in our GCP logs). I also tested WebM files that I am 100% sure worked before (we store each audio file together with the text it produced), and they no longer work even though nothing has changed on our side.

paullombardcartello commented Aug 12, 2024

Also experiencing this issue, also with WebM; it seems to have broken a few days ago.

asafda commented Aug 13, 2024

Also experiencing this issue

danielbankhead added priority: p1 and removed priority: p2 labels Aug 13, 2024
danielbankhead (Contributor):

Update: the service team is aware of this issue; I should have another update soon.

danielbankhead removed the status: investigating label Aug 14, 2024
paullombardcartello:
Any updates?

danielbankhead (Contributor):

A fix is rolling out and should be available shortly.

sorokinvj (Author):

@danielbankhead any news on the fix? Is it available already? Do you know which release version I should be looking for?

danielbankhead (Contributor):

The issue is on the service side; no update is required on the client side. The rollback should have been rolled out by now, but I'm waiting for the service team to confirm.

danielbankhead (Contributor):

The fix should be widely available now.

felabrecque:
I can confirm that the problem is fixed. I sent an audio/webm file to the google-cloud-speech v2 recognize functionality and it worked (it didn't tell me the file format was invalid).
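
For anyone who wants to re-check on their side, a minimal probe against v2 recognize might look like this (a sketch; the recognizer path and file name are placeholders):

import {v2 as speech} from '@google-cloud/speech';
import {readFileSync} from 'node:fs';

const client = new speech.SpeechClient();

async function probe() {
  const [response] = await client.recognize({
    recognizer:
      'projects/PROJECT_ID/locations/global/recognizers/my-recognizer',
    config: { autoDecodingConfig: {} }, // let the service detect WebM/Opus
    content: readFileSync('sample.webm'),
  });
  console.log(JSON.stringify(response.results, null, 2));
}

probe().catch(console.error);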
