Use Gemini in video and audio transcript #359

Merged
merged 29 commits into from
Feb 7, 2025


@MrOrz MrOrz commented Feb 2, 2025

Fixes #322.
This pull request introduces the use of Gemini for handling video and audio transcripts.

  • Remove ffmpeg and whisper; clean up ffmpeg installation steps in various places
  • Implement Gemini-based transcription
    • Use Gemini-2.0-flash as the main model and Gemini-1.5-pro-002 as the backup
    • Connect to Langfuse
  • Introduce tests on different types of videos
  • Refactor: add a tag for the current environment (staging, production, etc.) to all Langfuse instances
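The main/backup arrangement described above can be sketched roughly as follows. This is only an illustration of the fallback idea, not the code in this PR: the `Transcriber` type, the `ModelEntry` shape, and `transcribeWithFallback` are hypothetical names, and the actual Gemini and Langfuse calls are deliberately left out.

```typescript
// Hypothetical shape of a transcription call; not the PR's actual API.
type Transcriber = (mediaUrl: string) => Promise<string>;

// Hypothetical pairing of a model label with the function that calls it.
interface ModelEntry {
  name: string;
  transcribe: Transcriber;
}

// Try each model in the given order and return the first successful
// transcript together with the name of the model that produced it.
// If every model fails, throw one error that lists all the failures.
async function transcribeWithFallback(
  mediaUrl: string,
  models: ModelEntry[],
): Promise<{ model: string; text: string }> {
  const errors: string[] = [];
  for (const { name, transcribe } of models) {
    try {
      const text = await transcribe(mediaUrl);
      return { model: name, text };
    } catch (e) {
      // Record the failure and move on to the next (backup) model.
      errors.push(`${name}: ${e instanceof Error ? e.message : String(e)}`);
    }
  }
  throw new Error(`All models failed: ${errors.join("; ")}`);
}
```

Under this sketch, the caller would pass the main model first and the backup second, so the backup is only invoked when the main model throws.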

Langfuse traces for the transcripts

@MrOrz MrOrz force-pushed the llm-transcript branch 2 times, most recently from 820190c to faf254a Compare February 2, 2025 05:37
@MrOrz MrOrz changed the base branch from master to refactor-uploadmedia February 2, 2025 05:38
Base automatically changed from refactor-uploadmedia to master February 2, 2025 15:39
MrOrz added 16 commits February 3, 2025 00:24
… to pass

- Gemini-1.5-pro-002 works OK, just much slower than Gemini-2.0-flash-exp
- When it hallucinates, Gemini 1.5 outputs long runs of "\n" or spaces
- Video and audio transcripts are usually under 1K tokens
- Try to address this error on CI: ClientError: [VertexAI.ClientError]: got status: 400 Bad Request. {"error":{"code":400,"message":"Failed to initialize the ffmpeg demuxer, please make sure 1. Vertex's P4SA has permission to access input data; 2. video URL is valid; 3. video data is valid.","status":"INVALID_ARGUMENT"}}
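
The commit note above, that a hallucinating Gemini 1.5 emits many "\n" characters or spaces, suggests a cheap guard on the model output. The following is a minimal sketch under assumptions: `looksHallucinated` and the threshold of 20 are hypothetical and do not come from this PR, which does not say how (or whether) such output is detected.

```typescript
// Assumed threshold; the PR does not specify one.
const MAX_WHITESPACE_RUN = 20;

// Returns true when the transcript contains a run of whitespace
// (spaces, tabs, newlines) longer than `maxRun` characters.
function looksHallucinated(
  transcript: string,
  maxRun: number = MAX_WHITESPACE_RUN,
): boolean {
  // \s{n,} matches n or more consecutive whitespace characters.
  const longRun = new RegExp(`\\s{${maxRun + 1},}`);
  return longRun.test(transcript);
}
```

A caller could treat a `true` result as a failed attempt and retry or switch models; whether that is appropriate depends on how often legitimate transcripts contain long blank stretches.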

coveralls commented Feb 6, 2025

Coverage Status

coverage: 83.053% (+0.3%) from 82.719%
when pulling a88e2b7 on llm-transcript
into 251fe5f on master.

- us-west4 is Oregon, this is where sea cable goes on land
@MrOrz MrOrz requested a review from nonumpa February 6, 2025 18:42
@MrOrz MrOrz self-assigned this Feb 6, 2025
@MrOrz MrOrz requested review from andyy0216 and bil4444 February 6, 2025 18:42
@MrOrz MrOrz marked this pull request as ready for review February 6, 2025 18:42
@bil4444 bil4444 merged commit e188e81 into master Feb 7, 2025
4 checks passed
@bil4444 bil4444 deleted the llm-transcript branch February 7, 2025 15:35
Development

Successfully merging this pull request may close these issues.

Reduce Text-to-speech hallucination
4 participants