[ros-users] [Discourse.ros.org] Do I need a cloud for speech recognition or speech-to-text?

Thu Mar 29 12:25:38 UTC 2018

This is an [X-Post to StackExchange "Robotics" section](https://robotics.stackexchange.com/q/15422/10747).

**Definition**
In the context of _speaking to a robot and making it understand_: Is there a difference between the two terms **speech recognition** and **speech to text**?

**My tasks**
Given my current knowledge and plans, I only need speech to text in the sense that the robot records spoken words through its microphones and converts them into strings, nothing more. How those strings are interpreted is the business of the programmer, i.e. the scripts I put on the machine.

It is not my goal to automatically extract "sense" from the spoken words, as the "smart" speakers try to do.
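To illustrate the split described above (speech to text produces a plain string; the programmer's own scripts decide what it means), here is a minimal sketch of the "script" side, assuming some local speech-to-text engine has already returned a transcript. The command table and the `interpret` function are hypothetical and not part of any robot, ROS, or speech library API; the point is only that plain keyword matching on strings needs no cloud and no "smart" interpretation.

```python
import re
from typing import List

# Hypothetical command table: keywords the programmer's scripts react to.
# These names are illustrative only, not part of any robot or ROS API.
COMMANDS = {
    "stop": "STOP",
    "forward": "MOVE_FORWARD",
    "back": "MOVE_BACKWARD",
    "hello": "GREET",
}


def interpret(transcript: str) -> List[str]:
    """Map a plain speech-to-text transcript to a list of command names.

    The transcript is assumed to come from some local (offline) speech-to-text
    engine; this function only does simple keyword matching on the string.
    """
    # Lowercase and split into words, dropping punctuation.
    words = re.findall(r"[a-z']+", transcript.lower())
    # Keep only the words that appear in the command table, in spoken order.
    return [COMMANDS[w] for w in words if w in COMMANDS]


if __name__ == "__main__":
    # Example transcript standing in for the output of an STT engine.
    print(interpret("Hello robot, please move forward and then stop."))
```

Because the interpretation step works on strings only, it runs the same whether the transcript comes from an on-board engine or an external machine; where the speech-to-text itself runs is the separate question asked below.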

**Do I need a "cloud"?**
When and why would I need an external computer system (e.g. a cloud-based service such as one of the data-hungry US "AI" systems, or an NVIDIA Jetson system) for these tasks? What are the limits of the different solutions?

**Why I ask?**
My question is not about specific products or coding problems. I am preparing to buy a research robot (I don't want to advertise a product here) and am trying to figure out the concrete setup and configuration of the machine.

During the research, the machine will be in contact with vulnerable people. So there are many reasons why cloud-based services are not an option: the privacy of the subjects, data-security laws, and ethical concerns (ethics commissions would never approve it).

The goal of my question is to get a bigger picture of this topic and its related topics.

---
[Visit Topic](https://discourse.ros.org/t/do-i-need-a-cloud-for-speech-recognition-or-speech-to-text/4347/1) or reply to this email to respond.