Modern speech-to-text (STT) technology, i.e. software for turning your voice into text on screen, is amazing. If you've never tried Whisper, I recommend giving it a try to see what I mean. There are quite a few apps for this, but here's an online one: https://huggingface.co/spaces/openai/whisper

When using Whisper you _DO NOT_ have to worry about transcription errors; that's how good it is. Compare this to the built-in transcription on your phone[^1].

# It's faster to speak than to type

There's a reason stenographers don't use a standard keyboard: it isn't fast enough to keep up with human speech. It's faster to talk than to type. Need I say more?

There is of course the minor downside that you have zero privacy and might even disrupt the quiet environment you're in. Fair enough, but when possible, talking to the computer is a great way to get information out of your head and into a digital format.

This is 10x true for people who can't touch type, and there are many of those people. It's also 10x true for typing on mobile, where touch keyboards make rapid text entry impossible. Some people type faster than others, but speaking is faster still, and it's accessible to all.

Unfortunately, Apple continues to ship subpar STT technology. It bears reiterating: with a state-of-the-art STT system in 2025, you don't have to worry about it misunderstanding you. You just speak.

# Speaking is more intuitive and accessible than typing

Everyone is used to speaking, so it's a more natural way to interact with a computer. I'm speaking merely of text entry in this post, but future iterations of UIs will likely let you speak (or type) to operate the interface. LLMs enable very flexible interfaces that take unstructured human input and do something meaningful with it.

[^1]: To be fair, I'm not up to date on the state of Android. Perhaps it offers better transcription than iOS.
# Speaking out loud lends itself well to instructing AI

As I write this in early 2025, LLMs do quite well with lots of information. When typing, there's a temptation to be more concise. When trying to get the AI to write some code, the entire point of the exercise is to save time, so any additional moment spent typing is a cost. This is of course still true while speaking, but since speaking is easier than typing, it's less of a burden.

It's also easy to put on the figurative hat of a product manager instructing an IC on what is needed. Tell the AI what needs to be done and then let it rip. This works quite well in my experience.

# A caveat on formatting

STT tools like Whisper are great at turning speech into text, but they don't handle any formatting for you. This means when you want formatting you have to either:

- Reformat the text yourself, adding line breaks or headings or whatnot
- Pass your transcript to a second AI to reformat

For prose the second approach works _very_ well. A competent LLM can easily turn a wall of text into multiple paragraphs, and even bulleted lists where appropriate. It won't, however, add bold or italics. In the future, perhaps the transcription model can encode emotion or emphasis in its metadata.

For code I still don't have a good solution other than dictating instructions, inserting code manually, and then dictating more instructions. It works well enough.
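The second-AI approach for prose can be sketched roughly like this. This is a minimal sketch, not a fixed recipe: the model name, the prompt wording, and the function names here are my assumptions, and any competent chat model would do for the formatting pass.

```python
def build_reformat_prompt(transcript: str) -> str:
    """Wrap a raw transcript in instructions for the formatting pass.

    The prompt asks only for structure (paragraphs, headings, lists),
    not for rewording, so the transcript's content survives intact.
    """
    return (
        "Reformat the following transcript into clean Markdown. "
        "Add paragraph breaks, headings, and bulleted lists where they fit, "
        "but do not change the wording.\n\n"
        f"{transcript}"
    )


def reformat_transcript(transcript: str) -> str:
    """Send the transcript to a second model for formatting.

    Assumes the official `openai` Python package and an API key in the
    environment; the model name is a placeholder.
    """
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": build_reformat_prompt(transcript)}],
    )
    return response.choices[0].message.content
```

The key design point is in the prompt: constrain the second model to structural changes only, so it can't drift from what you actually said.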