archive secondary data


Last month, Quirkos was invited to a one day workshop in New York on archiving qualitative data. The event was hosted by Syracuse University, and you can read a short summary of the event here. This links neatly into the KWALON led initiative to create a common standard for interchange of coded data between qualitative software packages.

The eventual aim is to develop a standardised file format for qualitative data, which not only allows use of data on any qualitative analysis software, but also for coded qualitative data sets to be available in online archives for researchers to access and explore. There are several such initiatives around the world, for example QDR in the United States, and the UK Data Archive in the United Kingdom. Increasingly, research funders are requiring that data from research is deposited in such public archives.

A qualitative archive should be a dynamic and accessible resource. It is of little use creating something like the warehouse at the end of ‘Raiders of the Lost Ark’: a giant safehouse of boxed up data which is difficult to get to. Just like the UK Data Archive it must be searchable, easy to download and display, and have enough detail and metadata to make sure data is discoverable and properly catalogued. While there is a direct benefit to having data from previous projects archived, the real value comes from the reuse of that data: to answer new questions or provide historical context. To do this, we need a change in practice from researchers, participants and funding bodies.

In some disciplines, secondary data analysis of archival data is common place, think for example of history research that looks at letters and news from the Jacobean period (eg Coast 2014). The ‘Digital Humanities’ movement in academia is a cross-dicipline look at how digital archives of data (often qualitative and text based) can be better utilised by researchers. However, there is a missed opportunity here, as many researchers in digital humanities don’t use standard qualitative analysis software, preferring instead their own bespoke solutions. Most of the time this is because the archived data is in such a variety of different formats and formatting.

However, there is a real benefit to making not just historical data like letters and newspapers, but contemporary qualitative research projects (of the typical semi-structured interview / survey / focus-group type) openly available. First, it allows others to examine the full dataset, and check or challenge conclusions drawn from the data. This allows for a higher level of rigour in research - a benefit not just restricted to qualitative data, but which can help challenge the view that qualitative research is subjective, by making the full data and analysis process available to everyone.


Second, there is huge potential for secondary analysis of qualitative data. Researchers from different parts of the country working on similar themes can examine other data sources to compare and contrast differences in social trends regionally or historically. Data that was collected for a particular research question may also have valuable insights about other issues, something especially applicable to rich qualitative data. Asking people about healthy eating for example may also record answers which cover related topics like attitudes to health fads in the media, or diet and exercise regimes.


At the QDR meeting last month, it was postulated that the generation of new data might become a rare activity for researchers: it is expensive, time consuming, and often the questions can be answered by examining existing data. With research funding facing an increasing squeeze, many funding bodies are taking the opinion that they get better value from grants when the research has impact beyond one project, when the data can be reused again and again.


There is still an element of elitism about generating new data in research – it is an important career pathway, and the prestige given to PIs running their own large research projects are not shared with those doing ‘desk-based’ secondary analysis of someone else’s data. However, these attitudes are mostly irrational and institutionally driven: they need to change. Those undertaking new data collection will increasingly need to make sure that they design research projects to maximise secondary analysis of their data, by providing good documentation on the collection process, research questions and detailed metadata of the sources.


The UK now has a very popular research grant programme specifically to fund researchers to do secondary data analysis. Loiuse Corti told the group that despite uptake being slow in the first year, the call has become very competitive (like the rest of grant funding). These initiatives will really help raise the profile and acceptability of doing secondary analysis, and make sure that the most value is being gained from existing data.

However, making qualitative data available for secondary analysis does not necessarily mean that it is publicly available. Consent and ethics may not allow for the release of the complete data, making it difficult to archive. While researchers should increasingly start planning research projects and consent forms to make data archivable and shareable, it is not always appropriate. Sometimes, despite attempts to anonymise qualitative data, the detail of life stories and circumstances that participants share can make them identifiable. However, the UK Data Archive has a sort of ‘vetting’ scheme, where more sensitive datasets can only be accessed by verified researchers who have signed appropriate confidentiality agreements. There are many levels of access so that the maximum amount of data can be made available, with appropriate safeguards to protect participants.

I also think that it is a fallacy to claim that participants wouldn’t want their data to be shared with other researchers – in fact I think many respondents assume this will happen. If they are going to give their time to take part in a research project, they expect this to have maximum value and impact, many would be outraged to think of it sitting on the shelf unused for many years.

But these data archives still don’t have a good way to share coded qualitative data or the project files generated by qualitative software. Again, there was much discussion on this at the meeting, and Louise Corti gave a talk about her attempts to produce a standard for qualitative data (QuDex). But such a format can become very complicated, if we wish to support and share all the detailed aspects of qualitative research projects. These include not just multimedia sources and coding, but date and metadata for sources, information about the researchers and research questions, and possibly even the researcher’s journals and notes, which could be argued to be part of the research itself.


The QuDex format is extensive, but complicated to implement – especially as it requires researchers to spend a lot of time entering metadata to be worthwhile. Really, this requires a behaviour change for researchers: they need to record and document more aspects of their research projects. However, in some ways the standard was not comprehensive enough – it could not capture the different ways that notes and memos were recorded in Atlas.ti for example. This standard has yet to gain traction as there was little support from the qualitative software developers (for commercial and other reasons). However, the software landscape and attitudes to open data are changing, and major agreements from qualitative software developers (including Quirkos) mean that work is underway to create a standard that should eventually create allow not just for the interchange of coded qualitative data, but hopefully for easy archival storage as well.

So the future of archived qualitative archive requires behaviour change from researchers, ethics boards, participants, funding bodies and the designers of qualitative analysis software. However, I would say that these changes, while not necessarily fast enough, are already in motion, and the word is moving towards a more open world for qualitative (and quantitative) research.

For things to be aware of when considering using secondary sources of qualitative data and social media, read this post. And while you are at it, why not give Quirkos a try and see if it would be a good fit for your next qualitative research project – be it primary or secondary data. There is a free trial of the full version to download, with no registration or obligations. Quirkos is the simplest and most visual software tool for qualitative research, so see if it is right for you today!



Tags : archivequalitativedataresearchsecondary