Comparing the best automated transcription services for qualitative research

In the last couple of years, several qualitative analysis software providers have launched add-ons for automatic transcription of qualitative data. But are they worth the extra money, and how do they compare with more mainstream options? And what about free alternatives?

We tested three transcription services from CAQDAS providers: NVivo, MAXQDA and our own, Quirkos. We also ran the same interview recording through otter.ai, one of the most popular general transcription services.

Now, I have to admit, the audio file we used for the testing is the worst recording I have from more than 10 years of recording interviews for qualitative research.

It's in a really noisy cafe, with music in the background. I left my dictaphone at home, so it's recorded on my phone. The interviewee has a very soft voice, and a very thick Scottish accent. We were running late, so ended up having lunch – there's noise of eating and cutlery, and the server asking 'Do you want sauce with that?'. When I had this professionally transcribed many years ago, I had to pay double because the quality of the recording was so bad.

But those are all things that make it a good test. They are often the realities of qualitative research: sometimes you need to do interviews in places where the participant feels comfortable, and where you can't always control the sound. And you shouldn't have to exclude people from research because they have accents!

So last week, I fed the same audio file to all the services (after trimming it – NVivo only gives you 15 minutes on the trial, and you'll soon see why...). I timed how long each service took, compared the unedited output, and also managed to find out what each service costs, something that MAXQDA and NVivo make quite hard to find! So let's look at the transcription results. I've taken just the first sentence of the interview, starting with the 'definitive' version from the professional (human) transcriber:

Professional:
In the 70’s actually when I lived up in the highlands but kind of a bit lapsed since then. Once you have children, get married and have a family and whatever, other priorities take over a bit.

NVivo:
Actually, when I lived up in the Highlands, part of it was insane. Once you showed the bodies of Obama and the other color of just take over of it.

NVivo's transcription struggled the most with the noise and accents, and the output was nonsensical and read like a political conspiracy theory!

otter.ai:
Someone who's actually lived up in the highlands. Been a bit lost since then once you have children that know the love bombing, other priorities take over of it.

otter.ai fared a bit better, but its output still contained some bizarre nonsense about children who love bombing (or love-bombing?).

MAXQDA:
70s, actually, when I lived up in the Highlands, but kind of a bit lax. Since then, once you have children, married, never family and other priorities take over a bit.

This one isn't bad. It has 'never family' instead of 'have a family', and for some reason it missed the first few words (as NVivo and otter.ai did too).

In fact, this is interesting, because if you trawl through the privacy conditions of the MAXQDA and NVivo services, you'll see they both send the data to the same third-party provider, Speechmatics. So why would MAXQDA do so much better than NVivo? Well, Speechmatics' own documentation describes different levels of accuracy, including 'Standard' and 'Enhanced'. The prices are different too (at least for small customers), ranging from $0.30/hr for 'lite' (where data is transcribed using the Standard model at a slower rate, and the data may be retained by Speechmatics to improve their services) to $1.04/hr for 'enhanced'. NVivo seems to be selling you one of the cheaper options, while MAXQDA isn't quite so stingy. However, if this pricing is accurate, even MAXQDA's service is passed on to you at a whopping markup of roughly 9x over the 'enhanced accuracy' price.

Finally, let's look at Quirkos Transcribe:

Quirkos:
In the 70s actually, when I lived up in the Highlands, but kind of a bit lax, since then, once you have children, get married and have a family, where the other priorities take over a bit, actually.

Nearly perfect. The only differences are 'where the other priorities' instead of 'and whatever, other priorities', 'lax' for 'lapsed', and a stray 'actually' at the end. It's also the only one that caught the first few words of the interview.
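If you would rather put a number on these comparisons than judge them by eye, one common measure is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn a transcript into the reference, divided by the number of words in the reference. The sketch below is only an illustration using the sentences quoted above. It is not how we scored the services in this test, the function names are our own, and a single sentence is far too small a sample to draw firm conclusions from.

```python
import re


def normalise(text: str) -> list[str]:
    """Lowercase, strip punctuation and split into words."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = normalise(reference), normalise(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Sentences copied from the comparison above.
PROFESSIONAL = (
    "In the 70's actually when I lived up in the highlands but kind of a "
    "bit lapsed since then. Once you have children, get married and have "
    "a family and whatever, other priorities take over a bit."
)
OUTPUTS = {
    "NVivo": (
        "Actually, when I lived up in the Highlands, part of it was insane. "
        "Once you showed the bodies of Obama and the other color of just "
        "take over of it."
    ),
    "otter.ai": (
        "Someone who's actually lived up in the highlands. Been a bit lost "
        "since then once you have children that know the love bombing, "
        "other priorities take over of it."
    ),
    "MAXQDA": (
        "70s, actually, when I lived up in the Highlands, but kind of a bit "
        "lax. Since then, once you have children, married, never family and "
        "other priorities take over a bit."
    ),
    "Quirkos": (
        "In the 70s actually, when I lived up in the Highlands, but kind of "
        "a bit lax, since then, once you have children, get married and "
        "have a family, where the other priorities take over a bit, actually."
    ),
}

scores = {name: word_error_rate(PROFESSIONAL, text) for name, text in OUTPUTS.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.0%} word error rate")
```

A lower percentage means fewer corrections to make by hand. WER treats every word as equally important, so it can't tell a harmless slip from one that changes the meaning, and it is no substitute for reading the transcript against the audio.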

So Quirkos Transcribe wins the accuracy benchmark on a noisy interview, but what about other aspects of the service? How does Quirkos compare for speed, cost and security?

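| | Quirkos | MAXQDA | NVivo | otter.ai |
| --- | --- | --- | --- | --- |
| Encrypted transfer | Check | | | |
| No data shared with 3rd parties | Check | | | |
| Free minutes | 100 | 60 | 15 | 30 |
| Cost per hour | $0.24 | $9* | $20* | $5* |
| Time to transcribe one hour | 12 min | 24 min | 30 min | 20 min |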

First of all, Quirkos is the only provider that guarantees end-to-end encryption of the data, using your own computer to encrypt the data before it's sent. As we run our own server, with open-source but offline transcription software, it's also the only one that doesn't send the data anywhere else.

It also turns transcripts around in nearly half the time of the next fastest (otter.ai), and is 2.5 times faster than the slowest and least accurate service in this test (NVivo).

However, the cost may be the most significant difference, especially if you have many hours of interviews. On an hourly basis, a subscription at $15 a month for 50 hours of transcription ($45 a quarter for 150 hours) gives a cost of $0.24 an hour, and it's even less if you subscribe for a whole year. That's about 20 times cheaper than the next cheapest (otter.ai). That's why we've said that our service can change how you do qualitative research, as you don't have to worry about transcribing everything, including your own thoughts and whole days of ethnography.

MAXQDA and NVivo are a lot more expensive, and each requires you to buy 'blocks' of transcription time. For example, with MAXQDA it's €80 for 10 hours of transcription, so if you had 12 hours of interviews you'd be paying a second lot of €80 just for 2 extra hours! Admittedly Quirkos Transcribe has a minimum 3-month subscription, so the least you can pay is $45 (€41), but that gives you 150 hours of transcription! It's half the outlay, for 15 times the transcription!
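To see how block pricing plays out for your own project, here is a rough sketch of the arithmetic, using only the figures quoted in this post. The function name is made up for illustration, and treating the Quirkos minimum subscription as a single 150-hour 'block' at €41 is a simplification on our part.

```python
import math


def block_cost(hours_needed: float, block_hours: float, block_price: float) -> float:
    """Total price when time can only be bought in whole blocks."""
    blocks = math.ceil(hours_needed / block_hours)
    return blocks * block_price


hours = 12  # the 12 hours of interviews from the example above

# MAXQDA: EUR 80 per block of 10 hours, so 12 hours needs two blocks.
maxqda_total = block_cost(hours, block_hours=10, block_price=80)

# Quirkos (simplified): one 3-month minimum of EUR 41 covering 150 hours.
quirkos_total = block_cost(hours, block_hours=150, block_price=41)

for name, total in [("MAXQDA", maxqda_total), ("Quirkos", quirkos_total)]:
    print(f"{name}: EUR {total:.0f} total, EUR {total / hours:.2f} per hour actually used")
```

Change `hours` to the amount of audio in your own project to see where the block boundaries fall. The per-hour figure here is the price divided by the hours you actually transcribe, so it will be higher than the headline rate whenever you don't use the full allowance.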

There are some limitations though: Quirkos doesn't offer automatic speaker identification at the moment (although we hope to add this soon) or allow you to code directly with the audio in your projects. And while we don't make any specific promises about supporting dozens of languages, so far our system is working well - we've had great feedback from users on how good it is in Swahili!

Quirkos also gives away the most transcription time for free: 100 minutes before you need to pay anything. We chose that number because most qualitative interviews are just over an hour, so this makes sure you can test with a full interview.

So what are you waiting for? Don't just take our word for it: give it a try today for free! You'll need a Quirkos Cloud account, so if you don't have one you can sign up for the free 14-day trial. Even on the trial, you have 100 minutes of free transcription time before you have to pay for anything!

Prices for other services were checked on 23 November 2023, and may have changed since then.