In this tech preview we apply Whisper from OpenAI and the waveform visualization library wavesurfer.js to transcribe audio into text and visualize the audio as waveforms. The waveforms may be dragged to try out synchronization.
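Whisper's transcribe() returns, alongside the full text, a list of timestamped segments. A small helper like the sketch below (hypothetical, not part of the app's code) could shift those segment timestamps by the offset produced when a waveform is dragged:

```python
# Hypothetical helper: Whisper segments are dicts with "start"/"end"
# times in seconds and a "text" field; shift them by a drag offset.
def shift_segments(segments, offset):
    """Return new segments with start/end moved by offset, clamped at 0."""
    return [
        {**seg,
         "start": max(0.0, seg["start"] + offset),
         "end": max(0.0, seg["end"] + offset)}
        for seg in segments
    ]

segs = [{"start": 0.0, "end": 2.5, "text": "hello"},
        {"start": 2.5, "end": 4.0, "text": "world"}]
print(shift_segments(segs, 1.5))
```

The front-end would then re-render the transcript cues at the shifted times after each drag.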
This preview is designed to run on a single Linux server with limited memory and storage, so we do not allow files larger than 20 MB.
The code is available at: https://gitlab.origo.io/origosys/whisper-playing-field.
If you are into Kubernetes, you can run this web app as a pre-built Docker image with this yaml file.
The back-end of this application consists of a single Python file; the front-end, of an HTML file and a JavaScript file. No frameworks, no shadow DOMs, no hydration, no object stores, no GitHub Actions, no queueing systems (though one would obviously be needed if we wanted to scale it); just a bit of plain JavaScript and a Python file. Sometimes the right tool for driving in a nail is just a hammer.
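To illustrate how small such a one-file backend can stay, here is a hypothetical standard-library-only sketch (the real code lives at the GitLab URL above; the served markup and the transcription step are placeholders):

```python
# Hypothetical one-file backend sketch using only Python's stdlib:
# GET serves the front-end page, POST receives an audio upload.
from http.server import BaseHTTPRequestHandler, HTTPServer

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # mirror the preview's 20 MB limit

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the single HTML page (placeholder content here).
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<!-- index.html would be served here -->")

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        if length > MAX_UPLOAD_BYTES:
            self.send_response(413)  # Payload Too Large: reject oversized files
            self.end_headers()
            return
        audio = self.rfile.read(length)
        # ... hand `audio` to Whisper for transcription here ...
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"transcription would be returned here")

def run(port: int = 8000):
    """Start the server; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

A queueing system, as noted above, would replace the inline transcription step once more than one upload needs to be processed at a time.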