The importance of the new qualitative data exchange standard

Last week, a group of software developers from ATLAS.ti, f4analyse, NVivo (QSR), Transana, QDA Miner (Provalis) and Quirkos met in Montreal for the third international meeting on creating a common file format for exchanging qualitative data projects. The initiative is also supported by Dedoose and MAXQDA, which means that all the major qualitative data analysis software (QDAS) providers have agreed to support a standard that will let researchers move their data between any of the existing QDAS platforms.

This work has been almost two years in the making, and the first part of the standard was announced last week: a ‘codebook’ exchange file, which lets users share their coding framework, i.e. the list of codes/nodes/themes/Quirks that you use in your project. This is already useful if you have developed a long or standardised coding framework and want to reuse it in another project in a different qualitative analysis package.

However, this is really the tip of the iceberg. It is hoped that by early next year the full standard will be complete and released, allowing entire projects (including text and multimedia sources and coding) to be exchanged between whichever software packages you like. The official page, qdasoftware.org (currently redirecting to https://web.ato.uqam.ca/developpements/formats_echange/QDAS-XML), lists more details of the aims and format of the exchange initiative, but it is necessarily technical. Here I’d like to briefly discuss why I think this is the most important piece of news for qualitative research in the last 20 years.

Analysing and coding qualitative data is extremely time-consuming, even with software to help. It can also be mentally and emotionally draining, and the idea of having to redo this work is too much for most researchers to swallow: it would be like rewriting a novel from scratch, and for many large qualitative projects it is probably a similar amount of work.

And until now, there have been very few options for moving a project from one piece of software to another. Imagine writing your novel in Word and then being unable to share it with the public, or even with your editor, because they use a different software package. While some QDAS allow limited import and export of certain features from certain other packages, this is usually tortuous, if it is possible at all. For example, MAXQDA currently seems able to import projects from NVivo 8 or 9 (but not the more recent versions 10, 11 or 12), and only by installing MS SQL Server 2008, and only on Windows. You can’t export the project back again, and every time a new version of the software comes out this procedure has to change (or, as in this example, gets stuck at an older version), so your data can easily become trapped.

If you move to a university that subscribes to a different tool from the one where you previously worked or studied, you can’t access your data. You can’t easily collaborate with someone who has a licence for a different qualitative software package, because you probably can’t share your project files. In the past this has limited cross-institutional research projects I’ve been part of. And if you’ve done most of your work in one package but want to use one cool feature in another, you are out of luck.

Qualitative analysis software is expensive, and the university departments that buy it usually only provide one package at a time. And woe betide you if someone high up decides they aren’t buying, say, ATLAS.ti anymore and you all have to use MAXQDA instead. All your previous work is probably inaccessible, or can only be restored by painstakingly recollecting and redoing everything you had done in your former software.

And even if you have finished that previous project, the richness of qualitative data means that many different things can often be read from the same set of sources. For example, a project that interviewed people about job prospects and training might also contain interesting data about people’s self-esteem and identity over their careers. The current situation, where data is trapped in a single proprietary format, really limits the potential for revisiting the analysis in the future.

So that’s the internal problem for qualitative researchers. But the impact on wider society is far greater.

In theory, when you submit a research article for publication, the editor or reviewers can ask to examine any of the project’s data, checking for bias or errors in interpretation. But for qualitative research this is much more difficult because of the large number of different formats the data might be in. I feel this has contributed to some of the accusations of bias and lack of replicability levelled at qualitative research. It’s really hard for anyone to see your analysis process, even a reviewer assessing your article for publication, and that scrutiny is the fundamental basis of trust in scientific publishing.

This links to problems with data archiving. Making an anonymised version of your data publicly available is increasingly a requirement for publicly funded research. Some of this is possible, since the raw data will likely be transcripts in a common text format. But the working out, the coding and the details of your analysis and conclusions may be stored in, for example, an .nvp (NVivo project) file or similar. If you don’t have that exact version of the software, or you work on a Mac, you can’t open that file. Again, the rapid changes in these file formats do little for future-proofing: 10 years from now there may be no software that can open your old project.

This means that archives of qualitative data are currently of limited use, since they either don’t include coded data or share it in a proprietary format that most people can’t open. There is no free ‘reader’ app for most of these proprietary project files.

So why has this happened, and taken so long to fix?

Firstly, there are commercial arguments: it seems to make business sense to lock users into a particular software package, as it makes them less likely to switch to a rival. I’m not sure how big a consideration this actually is, but it’s a common practice across many industries. Personally, I am always struck by the fantastic camaraderie between the ‘rival’ software developers at the meetings about creating the exchange format: we are all there for the users (and many of us are qualitative researchers ourselves).

Secondly, it is very hard to develop these open standards, and this is not the first attempt; the UK DExT format is one example. Several such proposals and specifications have been published before, but none of them attracted support from more than one developer. Getting that cross-developer support is obviously crucial for adoption; otherwise a new format just adds complexity and uncertainty to the field:

https://xkcd.com/927/

And this is why I think this QDA-XML exchange format is going to succeed. An excellent, independent committee, led by Jeanine Evers from KWALON and Erasmus University Rotterdam, has managed to get signed commitments of support from all the major qualitative software developers, and nearly all of them have been working on the standard for the last two years.

Support is likely to be good because decisions about the format have been negotiated (often at great length) between all the contributing members. Participants in the meetings know what their software can and can’t do, and the best way to implement each feature. Reaching this first version has often been a painful process of compromise, as many software packages have unique features.

So that is the one caveat: this format will not be 100% comprehensive. A particularly pretty output graph you crafted in one software package won’t appear the same way in another, and ways of working that are unique to one package will be lost in translation.

But I think that for most users the format will allow them to transfer and preserve around 90% of their work, and certainly all the basics: codes and coding, sources and metadata, groupings and categories, notes and memos. These things won’t look exactly the same in every package (for example, Quirkos supports 16 million colours for codes, while some packages don’t support colours at all). However, the important parts of your data and analysis will come through, allowing greater flexibility and more opportunities for sharing, archiving and secondary analysis. To me, this opens the door to a fundamentally better understanding of the world.

An open, liberally licensed (MIT) standard means that anyone can support it; it is not limited to the current developers, and it is very much a forward-looking initiative. While I suspect it will be some time in 2019 before full support appears in releases of your favourite qualitative analysis software (CAQDAS), I think the promise of an open standard is nearer to being delivered than ever before, and that it will fundamentally change the world of qualitative research for the better.