- Transcripts: Speech recognition to convert speech into text and a timestamp.
- Search: Search the transcript and jump to a particular part of a conversation.
- Chapters: Break down an episode into auto-generated chapters based on topics.
- Subtitles: Make podcasts accessible to people with hearing difficulties.
This project is the result of a combination and continued development of two of my previous projects:
- Spotify Topics: During the summer of 2020, I participated in Spotify’s summer hackathon and developed a tool that lets you fast forward to timestamps where certain topics were being discussed.
- Spotify Subtitles: In 2022, I continued to experiment by building subtitles for podcasts based on a feature idea which received 4500+ upvotes on Spotify’s community forum.
In 2023, in the midst of the ChatGPT hype, I got inspired to combine my two previous projects into one podcast player and improve it by utilizing Open AI’s APIs.
The technologies used in this project can be found in the table below.
Technology | Use case |
---|---|
React | Frontend framework |
Tailwind | CSS styling library |
Python | Backend to handle transcription logic |
Flask | Connects python backend with react frontend |
Spotify API | To get information about podcast episodes |
Google Speech Recognition API | Converts speech to text, i.e transcribes the podcast |
Open AI's GPT 3.5 API | Segment transcript into chapters based on transcript |
I wanted to learn how to connect a React frontend to a Python backend so I used this project as a learning opportunity to do that. As a result, I did some overengineering by building my own API to handle transcriptions on a Python backend instead of calling a plug-and-play API in the frontend.
More specifically, the frontend makes a call to the Spotify API and gets the URL of the requested podcast. The URL is sent as a request to the backend that downloads the podcast as an mp3 in order to process it.
The reason that the mp3 needs to be processed is that I need to get timestamps for each sentence in order to display them at the correct time in the subtitles. I identify sentences in the transcript by listening for a silence (< 14 decibels) longer than 500 ms. When a silence is identified, I split the original audio file to create a set of smaller audio files, one for each sentence. By doing this, I was able to calculate the start and end time of each sentence by looking at the length of each smaller audio file, see figure below.
All of the audio files are now sent to Google’s speech recognition API and returns a string of the transcribed audio. The transcription is now being sent back to the frontend who makes a request to Open AI’s API to segment the transcript and identify potential topics to divide the episode into different chapters.
Spotify’s API does not allow you to download full podcast episodes, only 30 second previews. This makes the app very limited to use and it is therefore only a proof of concept.
Create a .env file in the root directory and add your API keys:
REACT_APP_SPOTFY_CLIENT_ID=YOUR_SPOTIFY_CLIENT_ID_GOES_HERE
REACT_APP_OPEN_AI_KEY=YOUR_OPEN_AI_KEY_GOES_HERE
Use the following commands to run the project. Start the frontend in one terminal and the backend in another terminal.
export FLASK_APP=backend
export FLASK_DEBUG=1
flask run
cd frontend
npm start
Watch a 1 min demo of the project here.
Home page with Spotify authentication
Discovery page
Loading screen
Episode screen
Episode screen
Subtitles in fullscreen
Overview of chapters within an episode
Audio player divided by chapters
Search transcript