DIY qualitative transcription

In a previous blog article I talked about some of the practicalities and costs involved in using a professional transcribing service to turn your beautifully recorded qualitative interviews and focus groups into text data ready for analysis. However, hiring a transcriber is expensive, and is often beyond the means of most post-graduate researchers.


There are also serious advantages to doing the transcription yourself: it makes for a better end result and gets you much closer to your data. In this article I’m going to go through some practical tips that should make doing transcription a little less painful.


But first, a little more on the benefits of transcribing your own data. You were there in the room with the respondent: you asked the questions, and you were watching and listening to the participant. Do the transcription soon after the interview and you are likely to remember words that might be muffled in the recording, points that the respondent emphasised by shaking their head – lots of little details to capture.


It’s important to remember that transcription is an interpretive act (Bailey 2008): you can’t just convert an interview into a perfect text version of that data. While this might be obvious when working between different languages where translation is required, I would argue that a transcriber always makes subjective decisions about misheard words and how to record pauses and inflections, or unconsciously changes words or their order.


As I’ve mentioned before, you lose a lot of the nuance of an interview when moving to text, and the transcriber has to make choices about how to mitigate this: Was this hesitation or just pausing for breath? How should I indicate the participant banged on the table for emphasis? Capturing this non-verbal communication in a transcript can really change the interpretation in qualitative data, so I like it when this process is in the control of the researcher. For a lot more on these and other issues there is a review of the qualitative transcription literature by Davidson (2009).


What do I actually type?

In a word, everything: the questions, the answers, the hesitations and mumbles, and things that were communicated, but not said verbally.


First, some guidelines for what the transcription should look like, bearing in mind that there is no one standard. You can use a word processor, or a spreadsheet like Excel. It can be a little more difficult to get formatting right in a spreadsheet: for example, you will need to use a shortcut such as Alt+Enter (in Excel on Windows) to start a new line within a cell, and getting it to look right on a printed page is more of a challenge. Yet since interviews and especially focus groups will usually have more than one voice to assign text to, you need some way to structure the data.


In a spreadsheet you can use three columns: the first for an occasional time index (so you see where in the audio this section of text occurs), the second for the name of the voice, and the third, widest one for the text. While you can use a table to do the same thing in Word, spreadsheets will do auto-complete for your names, making things a bit faster. However, for just a one-on-one interview, it’s easy to just use Q: / A: formatting for each speaker in a word processor, and put periodic time stamps in brackets at the top of each page.
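If you would rather keep the three-column layout in a plain CSV file (which any spreadsheet can open), a minimal sketch might look like the following. The column names, speaker labels and example lines are all made up for illustration, so adapt them to your own project.

```python
import csv
import io

# Hypothetical example rows: (time index, speaker, text).
# The time index is only filled in occasionally, as described above.
rows = [
    ("00:00:05", "Interviewer", "Can you tell me about your first week?"),
    ("", "Participant 1", "Well... [hesitates] it was a bit overwhelming."),
    ("00:01:30", "Participant 2", "I agree. [laughter]"),
]


def write_transcript(rows, handle):
    """Write the rows as CSV with a header for the three columns."""
    writer = csv.writer(handle)
    writer.writerow(["Time", "Speaker", "Text"])
    writer.writerows(rows)


def text_by_speaker(rows):
    """Group the transcribed text under each speaker's name."""
    grouped = {}
    for _time, speaker, text in rows:
        grouped.setdefault(speaker, []).append(text)
    return grouped


# Write to an in-memory buffer here; use open("interview01.csv", "w", newline="")
# instead to save a real file (the filename is just an example).
buffer = io.StringIO()
write_transcript(rows, buffer)
print(buffer.getvalue())
```

Keeping the speaker in its own column makes it easy to pull out everything one participant said later on, which is what the small text_by_speaker helper illustrates.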


Second, record non-verbal data in a consistent way, usually in square brackets. For example [hesitates], [laughter], [bangs fist on table], or even when [coffee is delivered]. You may choose to use italics or bold type to show when someone puts emphasis on a word, but choose one or the other and be consistent.


Next, consider your system for indicating pauses. Usually a short pause is represented by three dots ‘…’. Anything longer is recorded in square brackets and roughly timed: [5 second pause]. These pauses can show hesitation in the participant to answer a difficult question, and long pauses may have special meaning. There is actually a whole article on the importance of silences by Poland and Pederson (1998).
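One practical payoff of using square brackets consistently is that the notes become easy to find again. As a rough sketch, assuming annotations are always written as [something] and timed pauses exactly as [N second pause], a few lines of Python can list them and add up the timed pauses. The sample sentence is invented for illustration.

```python
import re
from collections import Counter

# Matches anything written in square brackets, e.g. [laughter].
ANNOTATION = re.compile(r"\[([^\[\]]+)\]")
# Matches a timed pause written exactly as "N second pause" (an assumed convention).
TIMED_PAUSE = re.compile(r"^(\d+) second pause$")


def annotations(text):
    """Return the contents of every square-bracket note, in order."""
    return [note.strip() for note in ANNOTATION.findall(text)]


def total_pause_seconds(text):
    """Add up the seconds from all timed pause notes."""
    total = 0
    for note in annotations(text):
        match = TIMED_PAUSE.match(note)
        if match:
            total += int(match.group(1))
    return total


sample = "I... [hesitates] I'm not sure. [5 second pause] It was hard. [laughter]"
print(Counter(annotations(sample)))
print(total_pause_seconds(sample))
```

If you sometimes write [pause 5s] and sometimes [5 second pause], a check like this will miss some of them, which is another reason to pick one style and stick to it.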


When you are transcribing, you also need to decide on the level of detail. Will you record every Um, Er, and stutter? In verbal speech these are surprisingly common. Most qualitative research does want this level of detail, but it is obviously more time-consuming to type. You’ll often have corrections in the speech as well, commonly “I’ve… I’ll never say that ag... any more”. Do you include the first self-correction? It’s clear in the audio the participant was going to say ‘again’ but corrected themselves to ‘any more’ - should I record this? Decide on the level of detail early on, and be consistent.


Sometimes people can go completely off topic, or there will be a section in the audio where you were complaining about the traffic, ordering coffee, or a phone call interrupted things. If you decide it’s not relevant to capture, just indicate with time markings what happened in square brackets: [cup smashed on the floor, 5min to clear up].


Once you are done with an interview, it’s a good idea to listen to it back, reading through the transcript and correcting any mistakes. The first few times you will be surprised at how often you swapped a few words, or got strange typos.



So how long will it all take?

Starting out with all this can be daunting, especially if you have a large number of interviews to transcribe. A good rule of thumb is that transcribing an interview verbatim will take between 3 and 6 times longer than the audio. So for an hour of recording, it could take as little as three hours, or as much as six to type up.


This sounds horrifying, and it is. I’m quite a fast typist, and have done quite a bit of transcription before, but I average between 3x and 4x the audio time. If you are slow at typing, need to pause the audio a lot, or have to put in a lot of extra descriptive detail, it can take a lot longer. The tips below should help you get towards the 3x benchmark, but it’s worth planning out your time a little before you begin.


If you have twenty interviews each lasting on average 1 hour, you should probably plan for at least 60 hours of transcription time. At a standard 9-5 work day that is nearly 9 days, or almost two working weeks. I don’t say this to frighten you, just to mentally acclimatise you to the task ahead!
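To plan for your own project, the same sum is easy to rerun with different numbers. This is only a back-of-the-envelope sketch: the seven working hours per day is my assumption (a 9-5 day with an hour for lunch), and the multipliers are the 3x to 6x rule of thumb from above.

```python
def transcription_hours(interviews, hours_each, multiplier):
    """Total typing time: audio length times the rule-of-thumb multiplier."""
    return interviews * hours_each * multiplier


def working_days(hours, hours_per_day=7):
    """Convert hours to working days (7 hours per day is an assumption)."""
    return hours / hours_per_day


# Twenty one-hour interviews at the fast, typical and slow ends of the range.
for multiplier in (3, 4, 6):
    hours = transcription_hours(20, 1, multiplier)
    days = working_days(hours)
    print(f"{multiplier}x: {hours} hours, about {days:.1f} working days")
```

Remember to add in the breaks and the final listen-through check, which this simple estimate leaves out.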


It’s also worth noting that transcription is very intensive work. You will be frantically typing as fast as you can, and it requires extreme mental concentration to listen and type simultaneously, while also watching for errors and fixing typos. I don’t think most people could do sessions of two or three hours at a time without going a little crazy! So you need to plan in some breaks, or at least some different non-typing work.


If this sounds insurmountable, don’t panic. Just spread out the work, especially if you can do the transcripts after each interview, instead of in one huge batch. This is generally better since you can review one interview before you do the next one, giving you a chance to change how you ask questions and cover any gaps. Transcription can also be quite engrossing (since you can’t possibly do anything else at the same time), and it’s nice to see the hours ticking off.




So how can you make this faster?

You need to set up your computer (or laptop) to be a professional transcribing station, where you can hear the audio, start and stop it easily, and type comfortably for a long period of time.


Even if you type really fast, you won’t be able to keep up with the speed at which people speak, meaning you will have to frequently start and stop the audio to catch up. Most professionals will use a ‘foot-pedal’ to do this, so that they don’t have to stop typing, come out of the word processing software and pause an audio player. Even if you are playing audio from a dictaphone next to you, going away from the keyboard, stopping and starting the buttons on the dictaphone and coming back to type again quickly becomes tedious.


A foot-pedal lets you start and stop the audio by tapping with your foot (or toe) and often has additional buttons to rewind a little (very useful) or fast-forward through the audio. Now, these cost around £30/$40 or more, but can be a worthwhile investment. However, it’s also worth checking to see if you can borrow one from a colleague, or even if your department or library has one for hire.


But if you are a cheapskate like me, there are other ways to do this. Did you know that you can have two or more keyboards attached to a computer, and they will both work? An extra keyboard (with a USB connector) can cost as little as £10/$15 if you don’t already have a spare lying around, and can be plugged into a laptop as well. You can set up one of its keys as a ‘global shortcut’ in an audio player like VLC. Here’s a forum detailing how to set up a certain key so that it will start and stop the audio even if you are typing in another programme. Put your second keyboard on the floor, and tap your chosen key with your toe to start and stop! Even if you only use one keyboard, you can set a shortcut in VLC (for example Alt+1), and every time you press that combination it will play or pause the audio, even if the VLC player is hidden.


There’s another advantage to using VLC: it can slow down your recordings as they are played back! Once your audio is playing, click on the Playback menu item, then Speed. Change to Slower, and listen as your participants magically start talking like sleepy drunks! This helps me more than anything, because I can slow down the speech to a level that means I can type constantly without getting behind. This method does warp the speech, and slowing it down too much can make it difficult to understand. However, the less you have to pause and stop the audio to catch up with your typing, the faster your transcription will go.


You can also do this with audio software like Audacity. Here, import your audio file, and click on Effect, and Change Tempo. Drag the slider to the left to slow down the speech (try slowing it by 20% – 50%) without changing the ‘pitch’, so everyone doesn’t end up sounding like Barry White. You can then save the file with your desired speed, and the quality can be a little better than the live speed changes in VLC.
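Bear in mind that slowed audio takes longer to play through. As a quick sketch, assuming the percentage is a change in tempo (so -50% means half speed and double the length), you can work out how long a recording will last after slowing it down:

```python
def slowed_minutes(audio_minutes, percent_change):
    """Length of the audio after a tempo change.

    percent_change is negative to slow down, e.g. -20 for 20% slower.
    Assumes -50 means half speed, so the audio lasts twice as long.
    """
    if percent_change <= -100:
        raise ValueError("percent_change must be greater than -100")
    return audio_minutes / (1 + percent_change / 100)


# How long does a one-hour interview last at different settings?
for change in (-20, -33, -50):
    minutes = slowed_minutes(60, change)
    print(f"{change}%: 60 minutes of audio plays in about {minutes:.0f} minutes")
```

Even at half speed, a one-hour interview plays through in two hours, which is still quicker than the 3x rule of thumb if it means you can type without stopping.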


General tips for good typing can help too. Watch the screen as you type, not your fingers, so that you can quickly pick up on mistakes. Learn to use all your fingers to type rather than just ‘hunt and peck’. If you don’t do this already, a quick typing tutorial might save you hours in the long run.


Last of all, consider your posture. I’m serious! If you are going to be hunched up and typing for days and days, bad posture is going to make you ache and get stressed. Make sure your desk and chair are the right height for you, and try using a proper keyboard if working from a laptop (or at least prop up the laptop to a good angle). Make sure the lighting is good and there is no screen glare, and use a foot rest if this helps the position of your back. Scrunched up on a sofa with a laptop in your lap for 60 hours is a great way to get cramp, back-ache and RSI. Try to take a break at least every half an hour: get up and stretch, especially your hands and arms.


So, you have your beautiful and detailed transcripts? Now you can bring them into Quirkos to analyse them! Quirkos is ideal for students doing their first qualitative analysis project, as it makes coding and analysis of text visual, colourful and easy to learn. There’s a free trial on our website, and you can bring in data from lots of different sources to work with.

