Unraveling speech-to-text transcription processes in children with and without reading and writing difficulties


  • Sanna Kraft Linnæus University, Sweden
  • Åsa Wengelin University of Gothenburg, Sweden
  • Vibeke Rønneberg University of Stavanger, Norway
  • John Rack Linnæus University, Sweden
  • Fredrik Thurfjell Habiliteringens resurscenter, Stockholm


Children with reading and writing difficulties struggle to achieve fluent transcription because of spelling difficulties (Beers et al., 2017). This lack of fluency can impede their formulation processes and negatively affect the final text (Sumner et al., 2013). Speech-to-text (STT) technology has been proposed as a potential solution because it bypasses the spelling process. However, there is a risk that other factors hinder the production process instead.

In our study, we investigated transcription and error correction processes in 28 Swedish 10–13-year-olds, both with and without reading and writing difficulties, using STT for writing. We examined the influence of individual abilities—working memory, spelling, decoding, and general STT skill—on various text production processes: burst length (words dictated in one go), burst accuracy, and overall production rate (text length/time on task). We used linear mixed-effects regression analysis to investigate whether the independent variables predicted text production processes.
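The three process measures named above can be illustrated with a minimal sketch. This is not the authors' analysis code: the `Burst` record, the choice of words and minutes as units, and the definition of burst accuracy as the proportion of dictated words transcribed as intended are all assumptions made for illustration. Only the production-rate formula (text length/time on task) comes from the description above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Burst:
    """One dictation burst (hypothetical record layout, not from the study)."""
    dictated_words: int  # words dictated in one go
    correct_words: int   # words the STT tool rendered as intended (assumed definition)


def burst_length(bursts):
    """Mean number of words dictated per burst."""
    return mean(b.dictated_words for b in bursts)


def burst_accuracy(bursts):
    """Assumed definition: share of all dictated words that were transcribed correctly."""
    total = sum(b.dictated_words for b in bursts)
    return sum(b.correct_words for b in bursts) / total


def production_rate(text_length_words, time_on_task_min):
    """Text length divided by time on task (units assumed: words per minute)."""
    return text_length_words / time_on_task_min


# Made-up example values, purely for illustration.
bursts = [Burst(4, 4), Burst(2, 1), Burst(6, 5)]
print(burst_length(bursts), burst_accuracy(bursts), production_rate(60, 10))
```

In a sketch like this, longer and more accurate bursts leave fewer interruptions for error correction, which is one way to read the link between burst measures and overall production rate.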

Our findings revealed that production rate was influenced by working-memory capacity, burst length, and burst accuracy. Interestingly, burst accuracy was solely predicted by general STT skill, not by any other individual ability. We also identified two effective transcription strategies: dictating more than one word at a time and combining STT with keyboard use.

The results underscore that producing text using STT is a cognitively intricate process, placing substantial demands on working memory. Moreover, STT skill (the combined effect of technical capabilities of the tool and the participant’s output) plays a pivotal role in achieving fluent transcription without unnecessary interruptions. Pedagogical implications will be discussed.




