
Voice recognition still needs some work

T-Mobile has a feature that transcribes voice mail and then texts it to you. I imagine other mobile carriers do the same. Overall it's remarkably good, but it does have a weak spot: names.

I frequently get messages from Kaiser Permanente that begin "This call is for Kevin Drum." The last four of them have been transcribed as Kevin Dum, Kevin Durham, Kevin Drones, and Kevin Jerome. Clearly there's still some work to be done.

17 thoughts on “Voice recognition still needs some work”

  1. MattBallAZ

    Although many people rave about voice recognition on the Google Pixel with their Tensor chip, I've not been very wowed. And I have a flat Ohio accent.

    1. trying_to_be_optimistic

      Interesting that it went with “Dum” instead of the more common “Dumb”. Dum seems to be a word but obviously uncommon. Does it have a filter “do not return any word from insulting word list as a name”?

  2. Yehouda

    They should add a feature that asks you to give your name, and then, when a transcribed name closely matches it, assumes that it is your name.
    Considering the number of such errors that it must be making, I am surprised they haven't thought about it.
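
A minimal sketch of that suggestion, assuming the service stores the account holder's name and compares each transcribed word against it with a generic string-similarity score. The function name `correct_name` and the 0.75 cutoff are hypothetical choices for illustration, not anything a carrier is known to use.

```python
import difflib


def correct_name(transcribed: str, owner_name: str, cutoff: float = 0.75) -> str:
    """Replace words in a transcribed name that closely match the owner's name.

    Hypothetical helper: compares word by word using difflib's similarity
    ratio and leaves any word below the cutoff untouched.
    """
    # Map lowercase name parts back to their proper spelling.
    parts = {part.lower(): part for part in owner_name.split()}
    fixed = []
    for word in transcribed.split():
        match = difflib.get_close_matches(word.lower(), list(parts), n=1, cutoff=cutoff)
        fixed.append(parts[match[0]] if match else word)
    return " ".join(fixed)


if __name__ == "__main__":
    for heard in ["Kevin Dum", "Kevin Durham", "Kevin Drones", "Kevin Jerome"]:
        print(heard, "->", correct_name(heard, "Kevin Drum"))
```

With this cutoff only the nearest miss ("Dum") is corrected. Lowering the cutoff would catch more of the errors, but it would also risk rewriting names that really are different, so spelling similarity alone probably isn't enough.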

  3. rick_jones

    "The last four of them have been transcribed as Kevin Dum, Kevin Durham, Kevin Drones, and Kevin Jerome. Clearly there's still some work to be done."

    Yes, but is it all on the recognition side, or might some of it be on the “input” side and how well the people calling are pronouncing your name?

  4. azumbrunn

    I don't think the problem is only with voice recognition. The other factor is how people talk when they leave messages. I have heard quite a few messages I didn't understand because the pronunciation of the speaker was so sloppy. At least voice recognition generally gets phone numbers right even if they are spoken with such insane speed that I would be forced to listen several times before being sure I had noted it down correctly.

    1. Altoid

      "The other factor is how people talk when they leave messages"

      I'm with you on that. People are so used to saying their own names or the business they're calling from that they'll speed-mumble through all that just to get it out of the way. Plus so many people's phone etiquette these days is just atrocious and they can mumble through a message while waving the phone in the air. Plus the audio quality in smartphone calling can be abysmal. All that can make auto-transcription a minor miracle to get halfway right.

      OTOH google's phone app now does in-call transcription that's been very good. And I used to use an app called Visual Voicemail that did a better job all around than google phone's voicemail transcription has been doing.

      Still, reading even a badly garbled message is a whole lot quicker than trying to listen to voicemails. I've never liked voicemails. (And get off my lawn already!)

  5. iamr4man

    I was watching a podcast about AI music this morning. The podcaster is Rick Beato. He said that he tried out new software that you train with your voice; it then takes one of his podcasts and translates it into another language, but with his American accent. So you would be able to do a podcast and have different language versions that would sync with the American version. He said that the lips don’t sync but they are working on that. He said he tried it in German because his assistant speaks it, and the assistant told him the translation was very good.
    https://www.youtube.com/live/97iMxC3FF6E?si=qhwC5Wvbys13rZ7K

  6. Kalimac

    I was in the overflow jury pool in court. We sat in another room watching the voir dire proceedings on tv with an instant transcript service. The word "juror," which predictably appeared rather frequently in the proceedings, was not in its vocabulary, so we got frequent "German" and "Karen" and the occasional "jerk."

    The one moment that made everybody laugh was when a prospective juror whose name was Stephen Kirwan was transcribed as "Stephen King Kong." Stephen King ... Kong?

    1. Altoid

      Great stuff. And reminds me that for some good laughs there's nothing like the closed captioning on MSNBC or sometimes even CNN. MSNBC's are obviously auto-generated and can make such a hash of the audio you'll think they're on Mars or something. Especially fun with names (Rhonda Santos, for example).

      But not even that can beat the occasional howler on Jeopardy captions where it's obvious the caption writer doesn't have the foggiest notion what's going on. Especially hilarious when I think how many months they have between taping and airing, and unlimited money, to get it right.

  7. danove

    I once used voice to dictate a text in which I referred to my mother, Ethel. It printed as a not very kind part of one's derriere. I admit we've had our problems but that's a bit harsh. Always proofread.
