Spring software update for Quirkos


Even in Edinburgh it’s finally beginning to get warmer, and we are planning the first update for Quirkos. This will be a minor release, but will add several features that users have been requesting.


The first of these is a batch import facility: you will be able to import a whole folder of text files, or just several files at once. This will be very useful for bringing in all your transcripts in one go, or for importing data from existing research projects.


Secondly, we are improving support for non-Latin scripts, so that reports and other outputs can display languages such as Arabic, Chinese, Hebrew and Korean, to name a few. Quirkos already lets you work with all these languages in your project, but now generated reports will render these scripts properly too.


We’ve also made a series of tweaks and alterations that will improve usability. These include fixes for some small display issues with longer source properties, and for scrolling on Windows 8 touchscreen devices. There should also be a few speed improvements, and quirks will now grow more visibly as text is added to them, making differences between emerging themes much more obvious.


Finally, we are really excited that this release will add two new platforms for Quirkos. First is our Android ‘app’, designed specifically for tablets. Unlike any other qualitative software for mobile platforms, the Android version of Quirkos has exactly the same feature set as the full desktop/laptop version. This means you can not only code on the go, on a great touchscreen interface, but also generate reports, run searches and add sources. Files are completely compatible across all platforms, so you can start work on your laptop, send your project to your tablet to do some coding on the train, and finish it off at home on your desktop.


Please note that, while there is technically nothing to stop you using this on your phone as well, it obviously becomes very fiddly on small screens! I’ve used Quirkos on a 4” phone, and while everything works, you’d need very small fingers and a lot of patience! Having said that, pair a Bluetooth mouse with a phone or tablet, and you quickly have a very flexible and portable coding tool.


Second, I am thrilled to be able to release the first commercial qualitative software package for Linux! I’ve long been a user and advocate of Linux in all its different flavours, so supporting it is a great step towards letting people work on whatever platform they like. Again, the features and file compatibility will be identical, but with all the stability and security that Linux offers. There are already two great open-source packages for Linux, but the RQDA plugin for R is best suited to statistical analysis, and the lovely Weft QDA hasn’t been updated in nearly a decade.


This update will be free for all existing users, and any new downloads will include the latest release. It doesn’t change the file structure at all, so there will be no compatibility problems. We hope to have this all ready and tested for you in March, so keep following this blog for the latest announcements!

 

How to organise notes and memos in Quirkos


 

Many people have asked how they can integrate notes or memos into their project in Quirkos. There isn’t a dedicated memo feature in the current version (v1.0), but one is planned for a free upgrade later in the year.


However, there are already two ways users can integrate notes and memos into their project, and both offer a great deal of flexibility.


The first, and most obvious, ‘workaround’ is to create a separate source for notes and memos. First, create a blank source by pressing the (+) button on the bottom right of the screen and selecting ‘New Source’. In the source properties view (top right) you can change its name to ‘Memos’ or ‘Thoughts’ or something appropriate. You can then edit this source by long-clicking or right-clicking on the source and selecting ‘Edit Source Text’. This gives you a dialogue box in which you can keep track of all your thoughts or memos during the coding process, and you can keep coming back to add more.


The advantage of having your memo as a source is that you can code it in exactly the same way as any of your other sources. So you can write a note like ‘I’m not sure about keeping Fear and Anxiety as separate codes’ and actually drag and drop that text onto the Anxiety and Fear bubbles – assigning that section of your note as being about those categories. When running queries or reports, you can easily see your comments together with the coding for that source, or just look at all your notes together.


This approach is most useful if you want to record your thoughts on the coding process or on developing your analysis framework. You can also have a series of note sources – for example, if you have several people coding on a project. Don’t forget that you can export a source as a Word file with all its annotations, should you want to print or share just your notes on a project. One further tip is to create a Yes/No source property called ‘Memo’ or ‘Note’ so you can record which source(s) contain memos. Then, when running queries or reports, you can quickly choose whether or not to include coded memos.


If you want to record specific notes about each source, however, the second method is to create a source property for comments and notes. For example, you might want to record details of the interview that have contextual importance. You can create a source property for ‘Interview conditions’ and note things like ‘Noisy room’ or ‘Respondent scared by Dictaphone’. By making this property multiple choice, you can record several notes here, and these values can of course be reused across all the sources. This lets you quickly mark which interviewees were nervous about being recorded, and even see whether responses from these people differed in a query comparison view.


You can also create a source property for more general notes, and add as many values to it as you like. At the moment you can enter very long values for source properties, but only the first few words will be shown. We are going to change this in an update in the next few weeks, so that you can view much longer notes stored as property values.


These two approaches should give you plenty of ways to record notes, memos and musings as you go through and analyse your project. They also give you a lot of ways to sort and explore those notes – useful once you get to the stage of having lots of them! In future releases we will add a dedicated memo feature, which will also give you the option to attach a note to a specific coding event, implemented in a unique but intuitive way. Watch this space!

The dangers of data mining for text

[Image: Alexandre Dulaunoy, CC – flickr.com/photos/adulau/12528646393]

There is an interesting new article out which looks at some commonly used data-mining algorithms and finds that they are generally not very accurate, or even reproducible.

 

Specifically, the study by Lancichinetti et al. (2015) looks at automated topic classification using latent Dirichlet allocation (LDA), a widely used machine-learning technique that takes a probabilistic approach to categorising and filtering large collections of text. Essentially, this is a common approach in data mining.
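
For readers who want a feel for what this looks like in practice, here is a minimal sketch of LDA topic modelling using Python’s scikit-learn library. The tiny corpus and parameter choices are purely illustrative assumptions, not those used in the study:

```python
# A minimal, illustrative sketch of LDA topic classification with
# scikit-learn. The tiny corpus and parameters are assumptions for
# demonstration only, not those used by Lancichinetti et al. (2015).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "the interview explored fear and anxiety about being recorded",
    "some respondents were nervous about the recording equipment",
    "topic models assign probabilities to words within documents",
    "the algorithm infers hidden topics from simple word counts",
]

# LDA works on word counts, not raw text, so vectorise first.
counts = CountVectorizer(stop_words="english").fit_transform(documents)

# Fit a two-topic model; random_state pins the (otherwise random) start.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Each row is one document's probability distribution over the topics.
print(doc_topics.round(2))
```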

 

But the Lancichinetti et al. (2015) article finds that, even with a well-structured source of data such as Wikipedia, the results are, to put it mildly, disappointing. Around 20% of the time the results did not come back the same, and on a more complex collection of scientific articles, reliability was as low as 55%.

 

As the authors point out, there has been little attempt to test the accuracy and validity of these data-mining approaches, and they warn that users should be wary of relying on inferences drawn from them. They then go on to describe a method that produces much better reliability, but until now most analyses would have carried this unknown level of inaccuracy: even if a test had been re-run with the same data, there is a good chance the results would have been different!
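
This run-to-run instability is easy to demonstrate yourself. As a hedged sketch (again using scikit-learn and a made-up corpus), fit the same model to the same data with two different random seeds and compare the top words in each topic:

```python
# Sketch: identical data, identical model, two different random seeds.
# Because LDA optimises from a random starting point, separate runs can
# converge to noticeably different topics on exactly the same corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "interviews explored fear and anxiety about recording",
    "respondents discussed trust in the research process",
    "word counts feed the probabilistic topic model",
    "hidden topics are inferred from document word counts",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)
vocab = vectorizer.get_feature_names_out()

for seed in (1, 2):
    lda = LatentDirichletAllocation(n_components=2, random_state=seed)
    lda.fit(counts)
    for topic, weights in enumerate(lda.components_):
        # The three highest-weighted words characterise each topic.
        top_words = [vocab[i] for i in weights.argsort()[-3:][::-1]]
        print(f"seed {seed}, topic {topic}: {top_words}")

# If the printed topics differ between seeds, the 'same' analysis has
# partitioned the corpus differently from run to run.
```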

 

This underlines one of the perils of statistical attempts to mine large amounts of text automatically: it’s too easy to do without really knowing what you are doing. There is still no reliable alternative to having a trained researcher and their brain (or even an average person off the street) read through the text and tell you what it is about. The forums I engage with are full of people asking how they can do qualitative analysis automatically, or whether there is software that will do all their transcription for them – but the realistic answer is that nothing like this currently exists.

 

Data mining can be a powerful tool, but it is essentially all based on statistical probabilities, churned out by a computer that doesn’t know what it is supposed to be looking at. It is usually a process akin to giving your text to a large number of fairly dumb monkeys on typewriters: sure, they’ll get through the data quickly, but odds are most of it won’t be much use! Like monkeys, computers don’t have much intuition, and can’t guess what you might be interested in, or which parts are more emotionally important than others.

 

The closest we have come so far is probably a system like IBM’s Watson, a natural-language-processing machine that requires a supercomputer with 2,880 CPU cores and 16 terabytes of RAM (16,384GB). It is essentially doing the same thing: a really, really large number of dumb monkeys, plus a process that picks the best-looking stats from a lot of numbers. If loads of really smart researchers program it for months, it can win a TV show like Jeopardy. But if you wanted to win Family Feud, you’d have to program it all over again.

 

Now, a statistical overview can be a good place to start, but researchers need to understand what is going on, look at the results intelligently, and work out which parts of the output don’t make sense. To do this well, you still need to be familiar with some of the source material, and have a good grip on the topics, themes and likely outcomes. Since a human can’t read and remember thousands of documents, I still think that in most cases, in-depth reading of a few dozen good sources gives better outcomes than statistically scan-reading thousands.

 

Algorithms will improve, as outlined above, and as computers get more powerful and data more plentiful, statistical inferences will get better. But until then, most users are better off treating the computer as a tool to aid their thought process, rather than as a way to get a single statistical answer to a complicated question.