Codebooks for qualitative research

Many people undertaking qualitative analysis will use some form of coding to help explore and categorise their data. Often the researcher will use hundreds of codes to do this. The list of codes, themes or topics used to analyse the data is called the coding framework, and a codebook describes them. A codebook doesn’t contain the extracts of data themselves. Instead it holds a detailed description of the codes: how they should be used, how they relate to each other, and what should be included in and excluded from each code.

The term codebook actually comes from quantitative statistics, where a codebook is used to keep metadata such as variable names, valid ranges, data types etc. In qualitative coding, the codebook has a similar function: collecting useful meta-information about the codes that is more than the code name itself. Often the concept of codebooks is connected to framework analysis, or analytical techniques where most of the codes are decided in advance. However, it’s just as helpful for grounded theory or emergent analysis, although here it will be constantly updated with new codes and themes as they are identified in the data.

It may feel that creating a formal codebook isn’t necessary when there is just one person doing the analysis, as there isn’t an immediate need to communicate what the codes are to someone else. After all, I know what I mean, right? However, there is often a need for self-communication: a note to yourself that acts as an aide-mémoire for how a code should be used, through a dynamic process that might take months. As you work through your data, the exact meaning of a code or theme often evolves, as different people have different perspectives on the same thing. A code that sounds simple, like ‘Anger’, may start off being easy to understand, but as you hear from more people you may realise that what seems like ‘Anger’ is really better theorised as ‘Emotional Outlets’, ‘Injustice’, ‘Powerlessness’ or even ‘Rage’.

The subtleties of how a code is defined can be crucial to correctly interpreting the data. While it can be convenient to use a single word to name a theme, that word may change over time, so it helps to keep a more detailed explanation of what the code means. In Quirkos this is the ‘Description’ you can write for each code (or Quirk), a much longer definition that appears when you hover your mouse over a code or Quirk (although all qualitative software has something similar). A description may also note what type of code it is (for example deductive/emergent, axial, thematic or linguistic), especially if you use more than one coding approach, or have several stages.

A codebook is also useful for communicating with your future self: the poor person who actually has to write up the data, describe how it was analysed, and possibly debate the coding with supervisors or journal reviewers. If your codebook contains a history of how codes evolved, guidelines for what to code into them, and detailed descriptions of what they mean, it makes the writing-up process a lot easier. It’s tempting to consider the coded data itself (i.e. the quotes/highlights) as the main thing to draw on when writing up findings, but justifying the way it was structured and organised is often just as important. Often the meaning of themes and codes will shift as more data is coded, and the structure that a coding framework imposes on a data set fundamentally shapes how it is interpreted and the conclusions drawn from it.

One good way to document how codes should be used is to write coding examples, or use-case rules for each code. This might be a literal example: ‘I was really angry with them’ or a set of descriptive guidelines:
‘Use for anger, rage and other violent emotions, when felt by the participant, about other people or circumstances. Do not include instances of “Angry with self”: use the code “Personal Frustrations”. Do not use when the person is talking about someone else being angry: use “Other people’s feelings”.’
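Guidelines like these can also be kept in a structured form, which makes them easier to check and share. The following is only a minimal sketch in Python: the `CodebookEntry` record, its field names and the `redirect` helper are assumptions made for illustration, not the format used by Quirkos or any other package.

```python
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    """One code in a codebook (hypothetical structure, for illustration only)."""
    name: str
    description: str
    code_type: str = "emergent"  # e.g. deductive, emergent, axial
    include: list[str] = field(default_factory=list)  # when to use this code
    # Situations that should NOT use this code, mapped to the code to use instead.
    exclude: dict[str, str] = field(default_factory=dict)
    example: str = ""  # a literal example quote


anger = CodebookEntry(
    name="Anger",
    description="Anger, rage and other violent emotions felt by the participant.",
    include=["Felt by the participant", "About other people or circumstances"],
    exclude={
        "Angry with self": "Personal Frustrations",
        "Someone else being angry": "Other people's feelings",
    },
    example="I was really angry with them",
)


def redirect(entry: CodebookEntry, situation: str) -> str:
    """Return the alternative code if the situation is excluded, else this code."""
    return entry.exclude.get(situation, entry.name)
```

With this sketch, `redirect(anger, "Angry with self")` points the coder to ‘Personal Frustrations’, while any situation not listed as an exclusion stays with ‘Anger’.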

Now it might seem really daunting to have this level of detail about your coding framework when you start, but it’s usually something that will develop as you progress through coding, especially when taking an iterative, cyclical approach. However, thinking through exactly how a code should (and shouldn’t) be used can help you work out what other codes might be needed, and how each is conceptualised to link back to answering the research question.

Another key part of a codebook is the relationship between codes. For example, you may have sub-categories (and sub-sub-categories), and you can show these hierarchies and relationships in the codebook. Alternatively, you may have a ‘flat’ structure, where no code sits beneath another. A codebook might also show the connections between codes, even when non-hierarchical (for example that Anger was often coded together with Frustration). It may also show a diagram of how codes developed into themes, and snapshots through the process, which is important when using iterative approaches like open and axial coding. It might also have ‘origin’ details: either who created the code (if working as part of a group) or which source of data it was first created for.
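Both kinds of relationship described above, hierarchy and co-occurrence, can be recorded quite simply. The sketch below is one possible way to do it in Python, using made-up codes and extracts: the `parents` mapping, the `path` and `co_occurrence` helpers and the sample data are all hypothetical, and are not how any particular software stores them.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hierarchy: each code maps to its parent (None = top level).
parents = {
    "Emotions": None,
    "Anger": "Emotions",
    "Frustration": "Emotions",
    "Rage": "Anger",
}


def path(code: str) -> list[str]:
    """Return the chain of codes from the top level down to `code`."""
    chain = []
    current = code
    while current is not None:
        chain.append(current)
        current = parents[current]
    return list(reversed(chain))


# Made-up coded extracts: each set holds the codes applied to one quote.
coded_extracts = [
    {"Anger", "Frustration"},
    {"Anger", "Frustration", "Rage"},
    {"Frustration"},
]


def co_occurrence(extracts) -> Counter:
    """Count how often each pair of codes was applied to the same extract."""
    counts = Counter()
    for codes in extracts:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(codes), 2):
            counts[pair] += 1
    return counts


pairs = co_occurrence(coded_extracts)
```

Here `path("Rage")` gives the sub-sub-category chain Emotions, Anger, Rage, and `pairs[("Anger", "Frustration")]` counts how many extracts carry both codes. A flat structure would simply have `None` as every parent.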

And of course, if you are using a collaborative or team-based approach to coding, codebooks are a really, really good idea. Regardless of whether people are coding together, each taking different sources, or just reviewing or appraising coding at the end, every person involved needs to know exactly what codes to use, and when. A codebook is invaluable for this. While it is easy to integrate one in a software tool (like Quirkos), you should also have one that you can share when working manually: this might be printed out, or a shared document somewhere. You will also need to consider whether everyone can update and modify the codebook, or if it will be ‘set in stone’ at the beginning of the analysis, having been decided on by the whole team.

There’s another crucial use for codebooks, and that’s when archiving, reviewing or allowing secondary analysis of your data. If a thesis or paper reviewer wants to see how you have coded qualitative data, the codebook should be a clear way to document that. And when archiving data (increasingly a requirement for publicly funded research), the codebook would be key to someone else being able to understand your coding and interpretation.
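For sharing or archiving, a codebook kept as structured data can be written out in a plain, software-independent format. The following Python sketch assumes a hypothetical set of columns and two invented rows, and is not the export format of Quirkos or any other tool. It serialises the codebook to CSV text that could then be saved alongside the archived data.

```python
import csv
import io

# Hypothetical codebook rows; the column names are illustrative only.
codebook = [
    {
        "code": "Anger",
        "parent": "Emotions",
        "type": "emergent",
        "description": "Violent emotions felt by the participant about others.",
        "created_by": "Researcher A",
    },
    {
        "code": "Personal Frustrations",
        "parent": "Emotions",
        "type": "emergent",
        "description": "Anger or frustration directed at oneself.",
        "created_by": "Researcher B",
    },
]


def codebook_to_csv(rows) -> str:
    """Serialise codebook rows to CSV text with a header line."""
    fieldnames = ["code", "parent", "type", "description", "created_by"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = codebook_to_csv(codebook)
```

CSV is chosen here only because it opens in almost any spreadsheet or statistics package. A shared document or the export from your analysis software serves the same purpose.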

There are a couple of articles that can guide the creation of codebooks and go into more detail, for example DeCuir-Gunby et al. (2011) and Sage Methods Datasets (2019). However, it’s one of those classic things that everyone seems to use, but few people bother formally discussing. It’s not mentioned in any of my go-to textbooks on general qualitative research (although of course it’s in the wonderful Coding Manual for Qualitative Researchers by Johnny Saldaña (2012)).

One of the great reasons to use qualitative analysis software like Quirkos is that it will allow you to manage and update your codebook as you go along, and export it at any time. Quirkos aims to make qualitative analysis more accessible, by making software that is affordable, simple to learn and visual. Download a free trial and see for yourself!