How to organise notes and memos in Quirkos

[Image: post-it notes. Credit: EraserGirl]

 

Many people have asked how they can integrate notes or memos into their project in Quirkos. There isn’t a dedicated memo feature in the current version (v1.0), but one is planned as a free upgrade later in the year.


However, there are already two ways users can integrate notes and memos into their project, and both offer a great deal of flexibility.


The first, and most obvious, ‘workaround’ is to create a separate source for your notes and memos. Create a blank source by pressing the (+) button at the bottom right of the screen and selecting ‘New Source’. In the source properties view (top right) you can rename it ‘Memos’, ‘Thoughts’ or something similarly appropriate. You can then edit this source by long-clicking or right-clicking on it and selecting ‘Edit Source Text’. This gives you a dialogue box in which you can keep track of all your thoughts or memos during the coding process, and you can keep coming back to add more.


The advantage of having your memo as a source is that you can code it in exactly the same way as any of your other sources. So you can write a note like ‘I’m not sure about keeping Fear and Anxiety as separate codes’ and actually drag and drop that text onto the Anxiety and Fear bubbles, assigning that section of your note to those categories. When running queries or reports, you can easily see your comments alongside the coding for that source, or just look at all your notes together.


This approach is most useful if you want to record your thoughts on the coding process or on developing your analysis framework. You can also have a series of note sources – for example if you had several people coding on a project. Don’t forget that you can export a source as a Word file with all the annotations, should you want to print or share just your notes on a project. One further tip is to create a Yes/No source property called ‘Memo’ or ‘Note’ so you can record which source(s) contain memos. Then when running queries or reports you can quickly choose whether to include coded memos or not.


However, if you want to record specific notes about each source, the second method is to create a source property for comments and notes. For example, you might want to record details of the interview that have contextual importance. You can create a source property called ‘Interview conditions’ and note things like ‘Noisy room’ or ‘Respondent scared by Dictaphone’. By making this property multiple choice, you can record several notes here, which can of course be reused across all the sources. This would let you quickly mark which interviewees were nervous about being recorded, and even see whether responses from these people differed in a query comparison view.


You can also have a source property for more general notes, and add as many values to it as you like. At the moment you can enter very long values for source properties, but only the first few words will be shown. We are going to change this in an update in the next few weeks, which will allow you to view much longer notes stored as property values.


These two approaches should give you plenty of ways to record notes, memos and musings as you go through and analyse your project. They also give you a lot of ways to sort and explore those notes, which is useful once you get to the stage of having lots of them! In future releases we will add a dedicated memo feature, which will also give you the option of attaching a note to a specific coding event, implemented in a unique but intuitive way. Watch this space!

The dangers of data mining for text

[Image credit: Alexandre Dulaunoy, CC, flickr.com/photos/adulau/12528646393]

There is an interesting new article out which looks at some of the commonly used algorithms in data mining, and finds that they are often neither very accurate nor reproducible.

 

Specifically, the study by Lancichinetti et al. (2015) looks at automated topic classification using latent Dirichlet allocation (LDA), a widely used machine learning algorithm that takes a probabilistic approach to categorising and filtering large collections of text, and a staple of text data mining.
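For readers who haven’t met LDA before, here is a minimal sketch (not taken from the article) of what topic modelling looks like in practice, using scikit-learn’s implementation on a hypothetical toy corpus:

```python
# A minimal, illustrative LDA sketch using scikit-learn; the corpus is made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "interview transcript about housing and rent",
    "focus group on health services and waiting times",
    "survey comments about rent increases and landlords",
    "notes on hospital waiting lists and staffing",
]

# Turn the raw text into a bag-of-words matrix: the model only ever sees word counts
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Fit a two-topic LDA model
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the highest-weighted words for each topic
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```

Note that the model works purely from word co-occurrence statistics; it has no notion of meaning, emphasis or context, which is exactly why the reliability question below matters.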

 

But the Lancichinetti et al. (2015) article finds that, even with a well-structured source of data such as Wikipedia, the results are, to put it mildly, disappointing. Around 20% of the time the results did not come back the same, and when looking at a more complex collection of scientific articles, reliability was as low as 55%.
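This isn’t the authors’ own test procedure, but a rough sketch of the kind of stability check their finding motivates: fit the same model twice on the same data with different random seeds and measure how often documents end up grouped the same way (the corpus and seed values here are purely illustrative):

```python
# Illustrative reproducibility check: same data, same settings, different random start.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics import adjusted_rand_score

docs = ["rent and housing", "hospital waiting times", "landlords raising rent",
        "health service staffing", "housing benefit changes", "GP appointments"]
X = CountVectorizer().fit_transform(docs)

def dominant_topics(seed):
    lda = LatentDirichletAllocation(n_components=2, random_state=seed)
    return lda.fit_transform(X).argmax(axis=1)  # most likely topic per document

run_a = dominant_topics(seed=1)
run_b = dominant_topics(seed=2)

# 1.0 means the two runs grouped the documents identically;
# lower values mean the 'same' analysis gave a different answer
print("agreement between runs:", adjusted_rand_score(run_a, run_b))
```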

 

As the authors point out, there has been little attempt to test the accuracy and validity of these data mining approaches, and they warn users against relying on inferences drawn from these methods. They then go on to describe a method that produces much better levels of reliability, but until now most analyses would have carried this unknown level of inaccuracy: even if the test had been re-run with the same data, there is a good chance the results would have been different!

 

This underlines one of the perils of statistical attempts to mine large amounts of text data automatically: it's too easy to do without really knowing what you are doing. There is still no reliable alternative to having a trained researcher and their brain (or even an average person off the street) read through the text and tell you what it is about. The forums I engage with are full of people asking how they can do qualitative analysis automatically, or whether there is some software that will do all their transcription for them, but the realistic answer is that nothing like this currently exists.

 

Data mining can be a powerful tool, but it is essentially all based on statistical probabilities, churned out by a computer that doesn't know what it is supposed to be looking at. The process is usually akin to giving your text to a large number of fairly dumb monkeys on typewriters: sure, they'll get through the data quickly, but odds are most of it won't be much use! Like monkeys, computers don't have much intuition, and can't guess what you might be interested in, or which parts are more emotionally important than others.

 

The closest we have come so far is probably a system like IBM's Watson, a natural language processing machine that requires a supercomputer with 2,880 CPU cores and 16 terabytes of RAM (16,384GB), and is essentially doing the same thing: a really, really large number of dumb monkeys, and a process that picks the best-looking stats from a lot of numbers. If lots of really smart researchers programme it for months, it can win a TV show like Jeopardy. But if you wanted to win Family Feud, you'd have to programme it all over again.

 

Now, a statistical overview can be a good place to start, but researchers need to understand what is going on, look at the results intelligently, and work out which parts of the output don't make sense. To do this well, you still need to be familiar with some of the source material, and have a good grip on the topics, themes and likely outcomes. Since a human can't read and remember thousands of documents, I still think that in most cases, in-depth reading of a few dozen good sources probably gives better outcomes than statistically scan-reading thousands.

 

Algorithms will improve, as outlined above, and as computers get more powerful and data gets more plentiful, statistical inferences will get better. But until then, most users are better off treating the computer as a tool to aid their thought process, not as a way to get a single statistical answer to a complicated question.

 

Analysing text using qualitative software

I'm really happy to see that the talks from the University of Surrey CAQDAS 2014 conference are now up online (that's 'Computer Assisted Qualitative Data Analysis Software' to you and me). It was a great conference about the current state of software for qualitative analysis, but for me the most interesting talks were from experienced software trainers, about how people were actually using the packages in practice.

Many important findings were shared, but for me one of the most striking was that people spend most of their time coding, and most of what they are coding is text.

In a small survey of CAQDAS users from a qualitative research network in Poland, Haratyk and Kordasiewicz found that 97% of users were coding text, while only 28% were coding images, and 23% directly coding audio. In many ways, the low numbers of people working with images and audio are not surprising, but it is a shame. Text is a lot quicker to skim through to find passages than audio, and most people (especially researchers) can read a lot faster than people speak. At the moment, most of the software available for qualitative analysis struggles to match audio with meaning, either by syncing up transcripts, or through automatic transcription to help people understand what someone is saying.

Most qualitative researchers use audio as an intermediary stage: they record a research event, such as an interview or focus group, and have the text typed up word-for-word to analyse. But with this approach you risk losing all of the nuance that we are attuned to hear in the spoken word (emphasis, emotion, sarcasm), which can subtly or completely transform the meaning of the text. However, since audio is usually much more laborious to work with, I can understand why 97% of people code with text. Still, I always try to keep the audio of an interview close to hand when coding, so that I can listen to any interesting or ambiguous sections and make sure I am interpreting them fairly.

Since coding text is what most people spend most of their time doing, we spent a lot of time making sure the text coding process in Quirkos is as good as it can be. We certainly plan to add audio capabilities in the future, but this needs to be done carefully, so that the audio connects closely with the text and can be coded and retrieved as easily as possible.

 

But the main focus of the talk was the gaps in users' theoretical knowledge that the survey revealed. For example, when asked which analytical framework they used, only 23% of researchers described their approach as Grounded Theory. However, when the Grounded Theory approach was described in more detail, 61% of respondents recognised this method as being how they worked. You may recall from the previous top-down, bottom-up blog article that Grounded Theory is essentially finding themes in the text as they appear, rather than starting with a pre-defined list of what a researcher is looking for. An excellent and detailed overview can be found here.

Did over a third of the people in this sample really not know what analytical approach they were using? Of course, it could simply be that they know it by another name, Emergent Coding for example, or, as Dey (1999) laments, there may be “as many versions of grounded theory as there were grounded theorists”.

 

Finally, the study noted users' comments on the advantages and disadvantages of current software packages. People found that CAQDAS software helped them analyse text faster and manage lots of different sources, but they also mentioned a difficult learning curve, and licence costs higher than the monthly salary of a PhD student in Poland. Hopefully Quirkos will be able to help on both of these points...