Speech to Text with Python or Word

I am thinking about writing a book on Python scripting.

I started to tinker with Text to Speech from PDF with Python scripts and then looked at reversing the process, going from Speech to Text. I looked for a video on it, found one, and fossicked on my mobile for an audio recorder to create a file to convert.

Python Speech to Text script

I found the video below and this article, How to Convert Speech to Text in Python, and used the code from the article.

The audio file I’d recorded was in .MP3 format and it wouldn’t work with the script, as the library would not read MP3 and needed a format such as .WAV, so I looked to:

  1. Use an online process for converting MP3 to WAV.
  2. Get a script that converted MP3 to WAV and then did Speech to Text on the file.
  3. Use a different app that recorded in .WAV format.

The first process worked well, and I was impressed with the results of the conversion. The only catch was that there was no punctuation: I hadn’t used words like comma or full stop in the audio file, so it came out as a continuous stream of text. It also removed my “umms”, of which there were many.

The 2nd process was not as successful, as I needed some other files on my PC to make the conversion work. After downloading the files and trying to run them, I was unable to get any results from the script.
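For anyone who wants to try the second option, here is a minimal sketch of the conversion step. It assumes the FFmpeg command-line tool is installed and on the PATH; that is a common extra dependency for this kind of conversion, but whether it is what the script I tried needed is only a guess. The function names are my own, for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp3_path, wav_path):
    """Return the ffmpeg argument list for an MP3 to WAV conversion."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", str(mp3_path),
        "-ac", "1",        # mix down to a single (mono) channel
        "-ar", "16000",    # resample to 16 kHz
        str(wav_path),
    ]


def convert_mp3_to_wav(mp3_path):
    """Convert mp3_path to a .wav file beside it and return the new path."""
    # Assumption: FFmpeg is installed separately; fail clearly if it is not.
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on the PATH")
    wav_path = Path(mp3_path).with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(mp3_path, wav_path), check=True)
    return wav_path
```

Calling convert_mp3_to_wav('Test.mp3') should leave a Test.wav next to the original, which can then be passed to the Speech to Text script.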

Process 3 was the easiest solution, and I found an app in the Store that would record in .WAV format. I did a recording, sent it to my PC and converted it, and it did a good job, still with no punctuation of course.

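import speech_recognition as sr

# Folder and file name of the recording to convert
path = 'Text2Speech2Text/Speach2Text/'
fileIn = 'Test.wav'

r = sr.Recognizer()
audio_file = sr.AudioFile(path + fileIn)

# Read the whole audio file into memory
with audio_file as source:
    audio = r.record(source)

# Send the audio to Google's web speech service and print the text it returns
text = r.recognize_google(audio)
print(text)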

This script just printed the text out to the terminal. I now had a process for doing the conversion.
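If the text is going to end up in a book, saving it to a file is more useful than printing it. A minimal sketch, using only the standard library; the function name save_transcript and the 80-column width are my own choices, not something from the article.

```python
import textwrap
from pathlib import Path


def save_transcript(text, out_path, width=80):
    """Wrap text to `width` columns, write it to out_path and return it."""
    # The recogniser returns one long line, so wrap it to make it readable.
    wrapped = textwrap.fill(text, width=width)
    Path(out_path).write_text(wrapped + "\n", encoding="utf-8")
    return wrapped
```

In the script above that would mean calling save_transcript(text, 'Test.txt') in place of print(text), where Test.txt is just an example name.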

the rain in Spain falls mainly in the plane and this is a recording to see it because I want to start recording audio so that I can actually write my python booking put my python ideas for the book down because a python book is going to be about beginner to intermediate practical tools for doing personal Bespoke apps that you want to use yourself

Result of running the script on the audio file
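The transcript above shows the punctuation problem. One possible workaround is to say the punctuation out loud and tidy the text up afterwards. The sketch below assumes the recogniser returns the spoken words "comma" and "full stop" literally, which I have not verified, and the word list is a hypothetical one of my own.

```python
import re

# Hypothetical mapping of spoken words to punctuation marks.
SPOKEN_PUNCTUATION = {
    "full stop": ".",
    "question mark": "?",
    "comma": ",",
}


def apply_spoken_punctuation(text):
    """Replace spoken punctuation words and capitalise sentence starts."""
    for phrase, mark in SPOKEN_PUNCTUATION.items():
        # Also swallow the space before the phrase, so "Spain comma" -> "Spain,"
        pattern = r"\s*\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, mark, text, flags=re.IGNORECASE)
    # Capitalise the first letter of the text and of each new sentence.
    return re.sub(
        r"(^|[.?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


print(apply_spoken_punctuation(
    "the rain in Spain comma falls mainly in the plane full stop this is a test full stop"
))
# prints: The rain in Spain, falls mainly in the plane. This is a test.
```

The obvious weakness is that any genuine use of the word "comma" in a sentence would be replaced too, so the result still needs a read-through.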

I had planned to use this as a method of dictating part of the book as I was out and about. Then I came across another method of doing the same process.

Speech to Text Dictation or Transcribe in Word Online

This video showed me another method that I could use. I could use the audio file and get it to write the text, or I could dictate and get it to transcribe on the screen.

I tried the 2nd process and was quite impressed with the results and decided to use it to write some paragraphs for my book.

It did a very good job of removing the “umms”, and because you can see the text as it is being transcribed, you can add punctuation such as a comma or full stop as you go.

It was a little slow, and after a while, whilst editing the text, I just carried on typing and found that I had stopped using the voice transcription altogether.

I think I’m used to the speed and thought process of typing, so I would find it difficult to change to transcribing, as it works at a different rhythm from the one I work at.

I will need to think about using the dictation process when I’m out and about and then using Word Online to convert it to text. I may try this method later.

Advantages:

  1. It will transcribe into Word, which is the tool I’m using to write my book.
  2. I can quickly make voice notes on my phone, and so capture thoughts while out and about.

Disadvantages:

  1. After recording separate files I have to email them to my PC, then download them, then open Word Online, upload them and transcribe.
  2. After transcribing I need to edit the text, then cut/copy and paste it into the document it pertains to.

If I write an idea down while out and about instead, I still need to email it and then cut/copy and paste it into the document it pertains to, and format it as well.

Comment & more – Live Transcribe for Android phone & Google Keep

Some interesting tools I was not aware of, and after playing with them I can see some uses for them.

As I’m dictating, unless I’m totally focused, I forget the next sentence, so there is lots of umming and ahhing while I try to get the next point down. This is not such an issue with typing, although typing is a more tedious process on a mobile phone’s small keyboard.

This made me wonder if there was voice to text on Android phones, and a search came up with Google Live Transcribe. The only issue is that cut/paste is a bit of a nuisance, but I will definitely try it to capture ideas and paste them into a note. This may get around the keyboard issues on a mobile phone.

I found the article 10 best dictation apps for Android to transcribe audio to text, and one of the apps was Google Keep. So I’ll try using that to transcribe notes as well.

I’ve got Google Keep linked to my PC as well as my mobile, so that may be the best process for capturing ideas.

AI transcribe with Colab & with Whisper on PC

This looks interesting. It takes audio files and transcribes them to a file, with punctuation.

You can put it on your PC as per:

This is definitely an interesting project to explore.
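To give a feel for what running it on a PC might look like, here is a minimal sketch. It assumes the openai-whisper Python package and FFmpeg are both installed, and that the small "base" model is good enough; the setup shown in the video may well differ. The function names are my own.

```python
from pathlib import Path


def transcript_path_for(audio_path):
    """Return the .txt path that sits next to the given audio file."""
    return Path(audio_path).with_suffix(".txt")


def transcribe_with_whisper(audio_path, model_name="base"):
    """Transcribe audio_path with Whisper and save the text beside it."""
    # Imported here because openai-whisper is a separate install
    # (assumption: pip install openai-whisper, plus FFmpeg on the PATH).
    import whisper

    model = whisper.load_model(model_name)      # downloads the model on first use
    result = model.transcribe(str(audio_path))  # dict with the transcript under "text"
    out_path = transcript_path_for(audio_path)
    out_path.write_text(result["text"].strip() + "\n", encoding="utf-8")
    return out_path
```

Calling transcribe_with_whisper('Test.wav') should write a Test.txt next to the recording. I have not compared its accuracy with the Google script above.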

End comment

Halfway through this article I thought about using voice instead of the keyboard on Android, and that threw up some opportunities. Later I saw the video on AI Whisper with Python for PC, so that may be another way to go, especially as it uses the same kind of AI technology as LLMs and that is an opportunity to play with that technology. So there will be more on this topic.

I did also wonder about AI text-to-text translation of my books into different languages. That may be worth exploring.