Newsletter

Text-to-speech in publisher apps has shifted from a nice-to-have to a habit-builder

In-app audio is evolving from a fringe experiment into a core publisher tool – helping news apps boost engagement, build daily listening habits and extend the reach of journalism without the overhead of traditional audio production.

4th March 2026

Text-to-speech is enabling publishers to leverage traditional text content in a new, engaging format

For years, audio in publishing was almost synonymous with podcasts, a resource-intensive format that was mostly consumed off-platform. Occasionally, a few publishers experimented with narrated articles or long reads voiced by journalists and actors, but these remained niche. Fast forward to today and AI-driven text-to-speech is reshaping that landscape. The technology has evolved rapidly, offering a scalable way to transform written journalism into a fluid, mobile listening experience. It allows readers to stay with a story even when they can’t keep their eyes on the screen, and it gives publishers a new dimension of engagement with little operational overhead.

In today’s edition of the Pugpig Media Bulletin, we will examine how forward-thinking publishers are integrating text-to-speech into their apps, what’s powering this increase in audio and how newsrooms can maximise its value as part of a modern content strategy.

Why audio is now on everyone’s roadmap

News publishers are experimenting with new media formats as the old growth engine, driven by search and social referrals, weakens amid the rise of “zero-click” experiences and AI summaries. A couple of weeks ago, we recapped FT Strategies’ News in the Digital Age event in London, where speakers repeatedly connected the shift toward formats like audio to these changes in discovery and distribution, noting that the “traditional middle” of commodity news is becoming harder to sustain.

Our 2025 Media App Report demonstrated that building a deeper connection with audiences increasingly depends on using the full capabilities of mobile devices. Apps that offer richer content formats consistently deliver stronger engagement, with more visits, longer sessions and higher minutes per user each month. Audio is only one part of that mix, but when we looked specifically at apps that include audio, users who engage with it spend nearly twice as much time in the app as those who don’t.

Audience expectations are shifting, too. As more publishers invest in audio and video, users are getting used to apps as multi-format environments rather than text-only experiences. For many, text-to-speech is increasingly viewed as table stakes for a dynamic mobile experience, another way to earn attention and keep it.

Turning articles into audio is a low-cost way to increase engagement and habit

Over the past year, using AI-powered text-to-speech technology to convert article content into audio has become an increasingly important component of publishers’ apps. The reason is simple. Readers want to stay close to journalism at moments when reading is awkward or impossible.

Text-to-speech lets publishers repurpose the content they already produce into an engaging audio experience at relatively low cost. It extends the utility and reach of existing reporting and it opens up new times in the day when screens aren’t practical but habit-building is still possible. It also provides a fast way to trial audio-led experiences and learn what resonates, before deciding where human narration or richer production is worth the investment. By expanding what a mobile app is usable for, publishers can compete for attention in moments they previously struggled to reach.

From a revenue perspective, the first goal is often to increase time spent and frequency of use to support existing subscription and advertising models. Text-to-speech tends to work best as part of a broader value proposition: either as a retention and engagement feature within existing subscriptions, or as a differentiator in premium tiers and membership offers when paired with other benefits like offline access, extra newsletters or events.

How publishers are building listening flows for commutes, catch-ups and long reads

Publishers increasingly see the opportunity to build around moments and formats where reading can require significant effort, like long-form explainers or features. Text-to-speech also works well for “catch-up” flows in the morning or evening that let users stay on top of breaking news while commuting, as well as topic-led listening in areas like sport, business or culture, where audiences are happy for the app to keep playing related stories. Many are leaning into “listen to all” experiences built around a section or edition, alongside thematic playlists. We’re also seeing more hybrid audio editions that combine human-narrated flagship content, such as podcasts, columns and cover stories, with text-to-speech from a wider set of articles.

When apps package this into something more finite and intentional, like an audio edition or a curated playlist, text-to-speech starts to shift from an add-on to a core product feature. At that point, it’s less about scattered one-off plays and more about a structured, time-boxed listening routine that people can return to every day.

In news apps, text-to-speech is often prioritised for live news, explainers and analysis, where “listen while I do something else” moments are most common. The Independent, for instance, lets readers listen to “5 things you need to know today” from the top of the home screen in its app, while The New York Times app features a Listen tab that blends a daily curated playlist and NYT podcasts with text-to-speech audio articles.

Consumer magazines and business titles are often drawn to text-to-speech for long-form commentary and features, the pieces that define the brand but can be hardest to fit into busy lives. Both New Scientist and The Economist allow readers to queue an entire issue and listen to it as a series of audio tracks, preserving the habit-building feel of a regular edition while making it easier to consume on the move.

Publishers are mitigating risk with labelling, voices and pronunciation controls

Despite all the interest and clear benefits, text-to-speech isn’t without its challenges, and there are several concerns we tend to come across in our conversations with publishers. In an environment where AI is already a source of anxiety for newsrooms, some worry about how synthetic voices will be perceived, especially on sensitive topics. As a result, a spectrum of approaches is emerging, from very explicit labelling, of which The New York Times is a notable advocate, to lighter-touch iconography or brief help text. Underneath that sits a concern that anything blurring the line between human reporting and machine output risks weakening trust that may have taken years to build.

Quality and tone are another major risk. Mispronounced names, places or acronyms can undermine credibility quickly, whilst tonal missteps on stories involving conflict, tragedy or politics can be just as damaging.

Some of the mitigation is technical and requires teams to correct pronunciations, maintain custom dictionaries or assign different voices to different content types. But much of the perceived risk is editorial and reputational, which matters all the more in an era when many studies, including those from the Reuters Institute, show trust in news declining.

Even as text-to-speech becomes more affordable, it isn’t free, either in direct costs or in operational overhead. Publishers need to consider how much audio they need to generate to make this worthwhile, and whether they should roll it out across everything or focus on specific sections and formats. Product teams need to prove text-to-speech is adding value rather than simply cannibalising reading time.

Publishers should develop a clear narrative for how text-to-speech contributes to habit, retention and ARPU. That need is pushing more product and audience teams to treat it not just as a feature, but as an experiment that requires proper measurement.

What’s next

Text-to-speech quality has advanced rapidly over the past year, narrowing the gap with human narration, but expectations are rising just as fast. Publishers are demanding more natural pacing, greater tonal control and consistent pronunciation across key names, places and brand vocabulary. The next wave of providers is already responding.

As audio becomes more embedded in news and magazine apps, discovery will define success as much as sound. Even the best listening experiences will underperform if users don’t encounter them at the right moments, so smart design, proactive onboarding and well-timed prompts will be crucial to turning casual listening into routine use.

The publishers pulling ahead will be those treating text-to-speech not as a bolt-on feature but as a product in its own right, with clear goals, defined user behaviours and robust measurement. Done well, it can deepen engagement without alienating readers who prefer to consume in other ways.

In 2026, the differentiator won’t be who offers text-to-speech, but who turns it into a daily habit that audiences depend on.

Industry News

Here are some of the most important headlines about the business of news and publishing, as well as strategies and tactics in product, audience growth and newsroom strategy.