Best Voice Recognition Solutions

Hi,

After testing several voice-recognition solutions on Linux, I found that each one has its own strengths depending on what you're looking for.

Murmur is probably the simplest and lightest free option. It runs locally on Whisper.cpp, so privacy is excellent since everything stays offline. I also found the recognition surprisingly good with the medium or large Whisper models. On the other hand, the project still feels a bit young: system-wide integration can be rough, and performance depends heavily on CPU power unless GPU acceleration is configured properly.

Speech Note is another very solid free choice. Installation through Flatpak is straightforward, and the software remains lightweight while offering very good transcription quality in both English and French. Whisper.cpp performs impressively well offline, but real-time dictation can occasionally introduce some latency on weaker machines. The interface is functional, though still less polished than commercial solutions.

Voicy, a commercial application with a Linux version, is the one that impressed me the most overall. It goes far beyond simple dictation and feels more like a complete AI productivity assistant. Voice recognition quality is excellent, multilingual support works very well, and the software is available on most platforms, which is still relatively rare in this field. What really stands out are the AI features: automatic rewriting, summarization, custom prompts, text formatting, smart replacements, and more. The interface is modern, comfortable to use, and clearly designed for long-term productivity. It also has strong accessibility potential for users who rely heavily on voice interaction.

Are there other solutions you know of?

I've tested Handy once for a user.

Personally, I use Blabby.

Never used it, as I think it was built for Debian. The only one I have ever heard of is Julius. I used Brave A.I. to get more information:

"Julius is an open-source, high-performance speech recognition engine (not a pre-packaged app) designed for large vocabulary continuous speech recognition (LVCSR). It is primarily written in C and released under a BSD 3-clause license, making it a lightweight, modular tool for developers and researchers rather than a consumer-facing application.

Key characteristics of Julius include:

  • Performance: It uses a two-pass tree-trellis search algorithm to achieve real-time decoding on various devices, from micro-computers to cloud servers, with low memory requirements (less than 32MB for basic work areas).
  • Technical Standards: The engine supports N-gram language models (ARPA format) and acoustic models (HTK ASCII format), along with support for Deep Neural Networks (DNN) in recent versions.
  • Capabilities: It handles continuous dictation, grammar-based parsing (via its integrated parser Julian), and isolated word recognition. It also supports multi-model decoding in a single thread and offers a server mode with a control API.
  • Platform Support: While primarily designed for Linux and Unix systems, it also runs on Windows, macOS, and Android.

Because Julius is a backend engine, users typically integrate it into their own software or use community-created wrappers (such as those for SIGVerse or ROS) to create a speech-to-text interface. It does not come with a built-in graphical user interface for end-users.

AI-generated answer. Please verify critical facts."
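To make the "server mode" point a bit more concrete: as far as I know, starting Julius with -module makes it listen on a TCP port (10500 by default) and stream XML-like messages, each terminated by a line containing only a dot. Below is a rough Python sketch of how a client could pull the recognized words out of that stream. The SAMPLE text is modeled on that format and is an assumption, not output captured from a live run, and the function names are my own. Regex matching is used because attributes such as CLASSID="<s>" make the stream invalid XML.

```python
import re

# Sample modeled on Julius module-mode output (assumed, not a captured log).
SAMPLE = '''<RECOGOUT>
  <SHYPO RANK="1" SCORE="-2778.29">
    <WHYPO WORD="" CLASSID="<s>" PHONE="silB" CM="0.748"/>
    <WHYPO WORD="HELLO" CLASSID="HELLO" PHONE="hh ax l ow" CM="0.912"/>
    <WHYPO WORD="" CLASSID="</s>" PHONE="silE" CM="1.000"/>
  </SHYPO>
</RECOGOUT>
.
'''

# Captures the WORD attribute of every word hypothesis element.
WORD_RE = re.compile(r'<WHYPO\s+WORD="([^"]*)"')


def split_messages(text):
    """Split a stream into messages; each one ends with a line holding only '.'.

    A trailing message without its terminating '.' line is treated as
    incomplete and dropped.
    """
    messages, current = [], []
    for line in text.splitlines():
        if line.strip() == ".":
            if current:
                messages.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return messages


def recognized_words(message):
    """Return the non-empty WORD values of a <RECOGOUT> message, in order."""
    if "<RECOGOUT>" not in message:
        return []
    return [word for word in WORD_RE.findall(message) if word]


if __name__ == "__main__":
    for msg in split_messages(SAMPLE):
        print(recognized_words(msg))
```

In a real client the text would come from a socket connected to the Julius port rather than from a string, with the same splitting applied to the received data.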

How do I install Julius on Linux?

"To install Julius on Linux, you have two primary methods:

1. Install from Package Manager (Easiest)

For Debian-based distributions like Ubuntu, the simplest method is using apt-get:

sudo apt-get install julius

You may also want the VoxForge language model package for English:

sudo apt-get install julius-voxforge

2. Compile from Source (Latest Version)

For the most up-to-date version, compile from the official GitHub repository:

# Install build dependencies
sudo apt-get install build-essential zlib1g-dev libsdl2-dev libasound2-dev

# Clone the repository and compile
git clone https://github.com/julius-speech/julius.git
cd julius
./configure --enable-words-int
make -j4

After compilation, the executable will be located at julius/julius."

How do I use Julius for speech recognition?

"To use Julius for speech recognition, you need to provide it with three essential models: an acoustic model, a language model, and a dictionary.

1. Prepare the Required Models

  • Acoustic Model: A Hidden Markov Model (HMM) in HTK ASCII format, describing the audio characteristics of phonemes. Pre-built models are available for languages like Japanese and English (e.g., from VoxForge).
  • Language Model: Either an N-gram model (in ARPA format) that defines the probability of word sequences, or a rule-based grammar (compiled to a DFA) for specific commands.
  • Dictionary: A file listing words and their phonetic pronunciations.

2. Run Julius

After installing the models, use a command like this to start recognition from the microphone:

julius -input mic -C /path/to/your/jconf_file.jconf

The .jconf file is a configuration file that specifies the paths to your acoustic model, language model, and dictionary, along with other settings."
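For anyone wondering what goes into that .jconf file: it holds the same options you would pass on the command line, one per line. Here is a small Python sketch that writes out a minimal grammar-based configuration. The options -h, -hlist, -gram and -input are Julius options as I understand them; the paths (model/hmmdefs, model/tiedlist, command) and the function name are placeholders I made up, so substitute whatever your acoustic model package actually ships.

```python
def render_jconf(hmmdefs, tiedlist, gram_prefix, input_source="mic"):
    """Build the text of a minimal grammar-based .jconf file.

    All paths are supplied by the caller; nothing here checks that they exist.
    """
    lines = [
        "# acoustic model (HTK ASCII hmmdefs) and its HMM list",
        f"-h {hmmdefs}",
        f"-hlist {tiedlist}",
        "# grammar prefix: Julius loads <prefix>.dfa and <prefix>.dict",
        f"-gram {gram_prefix}",
        "# audio source",
        f"-input {input_source}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    text = render_jconf("model/hmmdefs", "model/tiedlist", "command")
    with open("command.jconf", "w", encoding="utf-8") as handle:
        handle.write(text)
    print(text, end="")
```

With a file like that in place, the command shown above would simply point -C at it.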

How do I create a custom dictionary for Julius?

"To create a custom dictionary for Julius, you do not write the .dict file directly. Instead, you create a .voca file and compile it using the mkdfa.pl script.

1. Create a .voca File

The .voca file defines your words and their phonetic pronunciations. Each line starts with the word to be recognized, followed by its phonemes (based on your acoustic model).

Format:

% CATEGORY_NAME
word1 phoneme1 phoneme2 ...
word2 phoneme1 phoneme2 ...

Example (custom.voca):

% COMMANDS
HELLO hh ax l ow
WORLD w er l d
STOP s t aa p

2. Compile with mkdfa.pl

Use the mkdfa.pl script (included with Julius) to compile your .voca file (and its corresponding .grammar file) into the .dict format Julius uses:

mkdfa.pl custom

This command generates custom.dfa and custom.dict files."
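Before running mkdfa.pl it can help to sanity-check the .voca file, since a line with a word but no phonemes, or an entry placed before any % CATEGORY line, is easy to miss. Here is a rough Python sketch of such a check, assuming the layout described above (a % line opens a category, then one word plus its phonemes per line). Skipping lines that start with # as comments is my assumption, and the function name and sample text are made up for illustration.

```python
SAMPLE_VOCA = """\
% COMMANDS
STOP s t aa p
GO g ow
"""


def parse_voca(text):
    """Parse .voca text into {category: [(word, [phonemes...]), ...]}.

    Raises ValueError for an entry outside a category or without phonemes.
    """
    categories = {}
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or (assumed) comment
        if line.startswith("%"):
            current = line[1:].strip()
            categories.setdefault(current, [])
            continue
        if current is None:
            raise ValueError(f"line {lineno}: entry before any % CATEGORY line")
        # First token is the word, the remaining tokens are its phonemes.
        word, *phonemes = line.split()
        if not phonemes:
            raise ValueError(f"line {lineno}: '{word}' has no phonemes")
        categories[current].append((word, phonemes))
    return categories


if __name__ == "__main__":
    for category, entries in parse_voca(SAMPLE_VOCA).items():
        for word, phonemes in entries:
            print(category, word, " ".join(phonemes))
```

A natural extension would be to compare each phoneme against the set used by your acoustic model, but that list depends on the model you downloaded.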

How do I handle multi-word phrases in Julius?

"To handle multi-word phrases in Julius, you need to define them in a grammar file (.grammar) and link it with your vocabulary file (.voca). The mkdfa.pl script then compiles these into a deterministic finite automaton (.dfa) that recognizes the exact phrase sequences.

1. Create a .grammar File

Define the sentence structure using a BNF-like syntax. Use category names (in uppercase) from your .voca file to build phrases.

Example (command.grammar):

S : NS_B COMMAND NS_E
COMMAND : TURN_ON_LIGHT
COMMAND : TURN_OFF_LIGHT

Here, S is the start symbol, NS_B and NS_E represent the beginning and end of a sentence, and TURN_ON_LIGHT is a category name for a multi-word phrase.

2. Define Words in the .voca File

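Map the category names to actual words and their phonemes. Julius reads the first whitespace-separated token on each line as the word and everything after it as phonemes, so a multi-word phrase has to be written as a single token (joined with underscores here).

Example (command.voca):

% TURN_ON_LIGHT
turn_on_the_light t er n ao n dh ax l ay t
% TURN_OFF_LIGHT
turn_off_the_light t er n ao f dh ax l ay t
% NS_B
[s] silB
% NS_E
[/s] silE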

3. Compile with mkdfa.pl

Run the script to generate the .dfa and .dict files:

mkdfa.pl command

This creates command.dfa and command.dict, which you then use when starting Julius."
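One thing to watch with multi-word phrases: as far as I understand the .voca format, the first whitespace-separated token on a line is the word and the rest are phonemes, so a phrase needs to be a single token. Here is a small Python sketch that builds such a line from a per-word pronunciation table. The LEXICON contents, the underscore-joining convention and the function name are my own assumptions for illustration; the phonemes have to match whatever your acoustic model uses.

```python
# Hypothetical per-word pronunciations; replace with ones from your model's lexicon.
LEXICON = {
    "turn": ["t", "er", "n"],
    "on": ["ao", "n"],
    "off": ["ao", "f"],
    "the": ["dh", "ax"],
    "light": ["l", "ay", "t"],
}


def phrase_entry(phrase, lexicon):
    """Return one .voca line for a phrase: a single joined token, then phonemes."""
    words = phrase.lower().split()
    missing = [word for word in words if word not in lexicon]
    if missing:
        raise KeyError(f"no pronunciation for: {', '.join(missing)}")
    token = "_".join(words)  # one token, so it is read as the word
    phonemes = [p for word in words for p in lexicon[word]]
    return f"{token} {' '.join(phonemes)}"


if __name__ == "__main__":
    print("% TURN_ON_LIGHT")
    print(phrase_entry("turn on the light", LEXICON))
    print("% TURN_OFF_LIGHT")
    print(phrase_entry("turn off the light", LEXICON))
```

The alternative design is to give each word its own category and spell the phrase out in the .grammar file, which keeps the dictionary smaller when phrases share words.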

How does Julius speech recognition compare to newer models?

"Julius is an older, compact open-source speech recognition engine built on older techniques, so its model quality and noise robustness are inferior to those of modern speech recognition engines. While Julius is lightweight, runs locally on CPU only, and offers privacy by not requiring internet access, its accuracy generally lags behind contemporary solutions.

Key comparisons include:

  • Accuracy and Robustness: Modern engines like Google STT or local models such as Whisper (via Python) provide significantly higher accuracy and better performance in challenging acoustic conditions. Julius’s accuracy is heavily dependent on the availability and quality of specific language models, with Japanese continuous dictation models being the most robust due to the project's origins in Japan.

  • Technology: Julius relies on Hidden Markov Models (HMM) and Deep Neural Networks (DNN) within a two-pass decoding framework. In contrast, many modern systems utilize end-to-end deep learning architectures that have seen substantial advancements in accuracy and efficiency over the last decade.

  • Use Case: Julius remains suitable for embedded systems, low-resource devices (like Raspberry Pi), or scenarios where offline operation and low memory usage are prioritized over state-of-the-art accuracy. For general-purpose or high-accuracy applications, integrating cloud-based STT or modern local models is recommended.

AI-generated answer. Please verify critical facts."