Finding, using and some cautions on secondary qualitative data


 

Many researchers instinctively plan to collect and create new data when starting a research project. However, this is not always necessary, and even if you do end up collecting your own data, looking for sources that are already out there can help prevent redundancy and improve your conceptualisation of a project. Broadly, you can think of two types of secondary data: sources collected previously for specific research projects, and data ‘scraped’ from other sources, such as Hansard transcripts or Tweets.

 

Other data sources include policy documents, news articles, social media posts, other research articles, and repositories of qualitative data collected by other researchers, which might be interviews, focus groups, diaries, videos or other sources. Secondary analysis of qualitative data is not very common, since the data tends to be collected with a very narrow research interest, and at a depth that makes anonymisation and aggregation difficult. However, there is a huge amount of rich data that could be reused for other subjects: see Heaton (2008) for a general overview, and Irwin (2013) for a discussion of some of the ethical issues.

 

The advantage of using qualitative data analysis software is that you can keep many disparate sources of evidence together in one place. If you have an article positing a particular theory, you can quickly cross-code supportive evidence with sections of text from that article, and evidence why your data does or does not support the literature. Examining social policy? Put the law or government guidelines into your project file, and cross-reference statements from participants that challenge or support these dictates in practice. The best research does not exist in isolation: it must engage with both the existing literature and real policy to make an impact.

 


Data from other sources has the advantage that the researcher doesn’t have to spend time and resources on recruitment and collection. However, the constant disadvantage is that the data was not specifically obtained to meet your particular research questions or needs. For example, data from Twitter might give valuable insights into people’s political views. But the statements people make do not always equate with their views (this is true of directly collected data as well): someone may make a controversial statement just to get more followers, or suppress their true beliefs if they think expressing them will be unpopular.

 

Each website has its own culture as well, which can affect what people share, and in how much detail. A paper by Hine (2012) shows that posts on the popular UK ‘Mumsnet’ forum reflect particular attitudes that are considered acceptable, and that posters are often looking for validation of their behaviour from others. Twitter and Facebook are no exception: they each have different styles and norms of acceptable posting that true internet ethnographers should understand well!

 

Even when using secondary data collected for a specific academic research project, the data might not be suitable for your needs. A great series of qualitative interviews about political views may seem a perfect fit for your research, but might not have asked a key question (for example, about respondents’ parents’ beliefs), which renders the data unusable for your purpose. Additionally, it is usually impossible to identify respondents in secondary data sets to ask follow-up questions, since the data is anonymised. It’s sometimes even difficult to see the original research questions and interview schedule, and so to find out what questions were asked and for what purpose.

 

But despite all this, it is usually a good idea to look for secondary sources. They might give you insights into the area of study you hadn’t considered, highlighting interesting issues that other research has picked up on. They might also reduce the amount of data you need to collect: if someone has done something similar in this area, you can design your data collection to address relevant gaps, and build on what has been done before (theoretically, all research should do this to a certain degree).

 

I know it’s something I keep reiterating, but it’s really important to understand who your data represents: you need some kind of contextual or demographic data. This is sometimes difficult to find when using data gathered from social media, where people are often given the option to state only very basic details, such as gender, location or age, and many may not disclose even those. It can also be a pain to extract comments from social media posts in such a way that the identity of the poster is kept with their posts; however, there are third-party tools that can help with this.
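As a rough sketch of what keeping posts and poster details together might look like (the field names and example posts here are entirely hypothetical, not tied to any real platform’s export format), a simple record per comment can preserve whatever demographic context is disclosed:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Post:
    """One extracted social media comment, kept with its poster's details."""
    poster_id: str                 # pseudonymised handle, not the real username
    text: str
    source_url: str
    # Demographic fields are optional: many people simply do not disclose them
    gender: Optional[str] = None
    location: Optional[str] = None
    age: Optional[int] = None

posts = [
    Post("user_001", "Great advice, thanks!", "https://example.com/thread/1",
         location="UK"),
    Post("user_002", "I disagree completely.", "https://example.com/thread/1"),
]

# How much demographic context do we actually have for this sample?
disclosed = sum(1 for p in posts if p.gender or p.location or p.age)
print(f"{disclosed} of {len(posts)} posters disclosed any demographic data")
```

Keeping even sparse demographic fields attached to each comment makes it much easier, later on, to say honestly who your data does and does not represent.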

 

When writing up your research, you will also want to make explicit how you found and collected this source of data. For example, if you are searching Twitter for a particular hashtag or phrase, when did you run the search? If you run it the next day, or even the next minute, the results will be different. How far back did you include posts? What languages? Are there comments you excluded, especially ones that look like spam or promotional posts? Think about making it replicable: what information would someone need to get the same data as you?
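One lightweight way to keep such a search replicable (this is just a sketch; the field names and the example query are invented, not a standard) is to log the exact parameters alongside the results you save:

```python
import json
from datetime import datetime, timezone

# Record exactly how and when the search was run, so someone else
# (or future you) can see what would be needed to reproduce it.
search_record = {
    "query": "#examplehashtag",          # hypothetical search term
    "platform": "Twitter",
    "run_at": datetime(2023, 5, 1, 9, 30, tzinfo=timezone.utc).isoformat(),
    "earliest_post": "2023-01-01",       # how far back posts were included
    "languages": ["en"],
    "excluded": "posts flagged as spam or promotional",
}

# Store this JSON next to the downloaded data itself
print(json.dumps(search_record, indent=2))
```

Even a small record like this answers most of the replicability questions above in one place.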

 

You should also try to be as comprehensive as possible. If you are examining newspaper articles for something that has changed over time (such as use of the phrase ‘tactical warfare’), don’t assume all your results will be online. While some projects have digitised newspaper archives from major titles, there are a lot of sources that are still print-only, or reside in special databases. You can gain help and access to these from national libraries, such as the British Library.


There are growing repositories of open access data, including qualitative datasets. A good place to start is the UK Data Service, even if you are outside the UK, as it contains links to a number of international stores of qualitative data. Start there, but note that you will generally have to register, or even gain approval, to access some datasets. This shouldn’t put you off, but don’t expect always to be able to access the data immediately, and plan to prepare a case for why you should be granted access. In the USA there is a dedicated qualitative data repository, the Qualitative Data Repository (QDR), hosted by Syracuse University.

 

If you have found a research article based on interesting data that is not held in a public repository, it is worth contacting the authors anyway to see if they are able to share it. Research based on government funding increasingly comes with stipulations that the data should be made freely available, but this is still a fairly new requirement, and investigators from other projects may still be willing and able to grant access. However, authors can be protective of their data, and may not have acquired consent from participants in a way that allows them to share it with third parties. This is something to consider in your own work: make sure that you are able to give back to the research community and share your own data in the future.

 


Finally, a note of caution about tailored results. Google, Facebook and other platforms do not show the same results in the same order to all people. Results are customised to what they think you will be interested in seeing, based on your own search history and their assumptions about your gender, location, ethnicity and political leanings. This ‘filter bubble’ will affect the results you get from social media (especially Facebook).

 

To get around this in a search, you can use a privacy-focused search engine (like DuckDuckGo), add &pws=0 to the end of a Google search URL, or use a browser in ‘Private’ or ‘Incognito’ mode. However, it’s much more difficult to get neutral results from Facebook, so bear this in mind.
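For example, a minimal sketch of building such a depersonalised Google query URL programmatically (treat this as illustrative only, since search parameter behaviour can change over time):

```python
from urllib.parse import urlencode

def google_search_url(query: str) -> str:
    """Build a Google search URL with personalised results turned off."""
    # pws=0 requests non-personalised ("no personal web search") results
    params = urlencode({"q": query, "pws": "0"})
    return f"https://www.google.com/search?{params}"

print(google_search_url("qualitative secondary data"))
```

Logging the exact URL used also feeds neatly into the replicability record discussed earlier in this post.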

 

Hopefully these tips have given you food for thought about the sources and limitations of secondary data. If you have any comments or suggestions, please share them in the forum. Don’t forget that Quirkos has simple copy-and-paste source generation that allows you to bring in secondary data from lots of different formats and internet feeds, and its visual interface makes coding and exploring them a breeze. Download a free trial from www.quirkos.com/get.html.

 

 

Sampling considerations in qualitative research


 

Two weeks ago I talked about the importance of developing a recruitment strategy when designing a research project. This week we will take a brief overview of sampling for qualitative research, although it is a huge and complicated issue. There’s a great chapter, ‘Designing and Selecting Samples’, in the book Qualitative Research Practice (Ritchie et al., 2013) which goes over many of these methods in detail.

 

Your research questions and methodological approach (e.g. grounded theory) will guide you to the right sampling methods for your study – there is never a one-size-fits-all approach in qualitative research! For more detail on this, especially on the importance of culturally embedded sampling, there is a well-cited article by Luborsky and Rubinstein (1995). But it’s also worth talking to colleagues, supervisors and peers to get advice and feedback on your proposals.

 

Marshall (1996) briefly describes three different approaches to qualitative sampling: judgement/purposeful sampling, theoretical sampling and convenience sampling.

 

But before you choose any approach, you need to decide what you are trying to achieve with your sampling. Do you have a specific group of people that you need to have in your study, or should it be representative of the general population? Are you trying to discover something about a niche, or something that is generalizable to everyone? A lot of qualitative research is about a specific group of people, and Marshall notes:
“This is a more intellectual strategy than the simple demographic stratification of epidemiological studies, though age, gender and social class might be important variables. If the subjects are known to the research, they may be stratified according to known public attitudes or beliefs.”

 

Broadly speaking, convenience, judgement and theoretical sampling can all be seen as purposeful – deliberately selecting people of interest in some way. However, randomly selecting people from a large population is still a desirable approach in some qualitative research. Because qualitative studies tend to have a small sample size, due to the in-depth nature of engagement with each participant, this can be a problem if you want a representative sample: if you randomly select 15 people, you might by chance end up with more women than men, or a younger-than-desired sample. That is why qualitative studies may add a little purposeful sampling, finding people to make sure the final profile matches the desired sampling frame. For much more on this, check out the last blog post on recruitment.
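That ‘topping up’ logic can be illustrated with a toy sketch (the quotas and the random draw below are invented for the example): compare a randomly drawn sample against the desired sampling frame, and see which groups still need purposeful recruitment.

```python
from collections import Counter

# Desired sampling frame: how many participants we want per group
frame = {"women": 8, "men": 7}

# What a random draw actually gave us (hypothetical outcome of chance)
random_sample = ["women"] * 10 + ["men"] * 5
counts = Counter(random_sample)

# Gaps that purposeful sampling now needs to fill
shortfall = {
    group: frame[group] - counts.get(group, 0)
    for group in frame
    if counts.get(group, 0) < frame[group]
}
print(shortfall)
```

Here chance over-recruited women and under-recruited men, so the shortfall tells you the purposeful step needs two more men to match the frame.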

 

Sample size will often also depend on conceptual approach: if you are testing a prior hypothesis, you may be able to get away with a smaller sample size, while a grounded theory approach to develop new insights might need a larger group of respondents to test that the findings are applicable. Here, you are likely to take a ‘theoretical sampling’ approach (Glaser and Strauss 1967) where you specifically choose people who have experiences that would contribute to a theoretical construct. This is often iterative, in that after reviewing the data (for theoretical insights) the researcher goes out again to find other participants the model suggests might be of interest.

 

The convenience sampling approach, which Marshall mentions as being the ‘least rigorous technique’, is where researchers target the most ‘easily accessible’ respondents. This could even be friends, family or faculty. This approach can rarely be methodologically justified, and is unlikely to provide a representative sample. However, it is endemic in many fields, especially psychology, where researchers tend to turn to easily accessible psychology students for experiments: skewing the results towards white, rich, well-educated Western students.

 

Now we turn to snowball sampling (Goodman 1961). This differs from purposeful sampling in that new respondents are suggested by existing ones. In general, it is most suited to work with ‘marginalised or hard-to-reach’ populations, where respondents are not often forthcoming (Sadler et al 2010). For example, people may not be open about their drug use, political views or living with stigmatising conditions, yet often form closely connected networks. Thus, by gaining trust with one person in the group, others can be recommended to the researcher. However, it is important to note the limitations of this approach: there is a risk of systematic bias, because if the first person you recruit is not representative in some way, their referrals may not be either. So you may be looking at people living with HIV/AIDS, and recruit through a support group that is formed entirely of men: they are unlikely to suggest women for the study.
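To see how that bias propagates, here is a toy sketch (the referral network is entirely invented) that models snowball recruitment as a traversal of who-recommends-whom:

```python
from collections import deque

# Hypothetical referral network: who each participant would recommend.
# The men-only support group never refers any women.
referrals = {
    "seed_m1": ["m2", "m3"],
    "m2": ["m4"],
    "m3": ["m4", "m5"],
    "m4": [],
    "m5": [],
    "w1": ["w2"],   # women exist in the population but are never reached
    "w2": [],
}

def snowball(seed: str) -> list:
    """Breadth-first recruitment starting from a single seed participant."""
    recruited, queue, seen = [], deque([seed]), {seed}
    while queue:
        person = queue.popleft()
        recruited.append(person)
        for referral in referrals.get(person, []):
            if referral not in seen:
                seen.add(referral)
                queue.append(referral)
    return recruited

sample = snowball("seed_m1")
print(sample)
```

The whole recruited sample is determined by the seed’s network: no woman can ever enter the study, however many waves of referrals you run.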

 

For these reasons there are limits to the generalisability and appropriateness of snowball sampling for most subjects of inquiry, and it should not be taken as an easy fix. Yet while many practitioners explain the limitations of snowball research, it can be very well suited to certain kinds of social and action research: Noy (2008) outlines some of the potential benefits for power relations and studying social networks.

 

Finally, there is the issue of sample size and ‘saturation’. This is the point at which enough data has been collected to confidently answer the research questions. For a lot of qualitative research this means data that has been coded as well as collected, especially if using some variant of grounded theory. However, saturation is often a source of anxiety for researchers: see for example the amusingly titled article “Are We There Yet?” by Fusch and Ness (2015). Unlike quantitative studies, where a sample size can be determined from the desired effect size and confidence interval in a chosen statistical test, it is difficult to put an exact figure on the right number of participant responses. This is especially because responses are themselves qualitative, not just numbers in a list: one response may be more data-rich than another.

 

While a general rule of thumb would suggest there is no harm in collecting more data than is strictly necessary, there is always a practical limitation, especially in resource- and time-constrained post-graduate studies. It can also be more difficult to recruit than anticipated, and many projects working with very specific or hard-to-reach groups can struggle to reach a large enough sample size. This is not always a disaster, but it may require a re-examination of the research questions, to see what insights and conclusions are still obtainable.

 

Generally, researchers should have a target sample size and definition of what data saturation will look like for their project before they begin sampling and recruitment. Don’t forget that qualitative case studies may only include one respondent or data point, and in some situations that can be appropriate. However, getting the sampling approach and sample size right is something that comes with experience, advice and practice.

 

As I always seem to be saying in this blog, it’s also worth considering the intended audience for your research outputs. If you want to publish in a certain journal or academic discipline, it may not be receptive to research based on qualitative methods with small or ‘non-representative’ samples. Silverman (2013, p424) mentions this explicitly, with examples of students who had publications rejected for these reasons.

 

So as ever, plan ahead for what you want to achieve with your research project and the questions you want to answer, and work backwards to choose the appropriate methodology, methods and sample for your work. Also check the companion article about recruitment, as most of these issues need to be considered in tandem.

 

Once you have your data, Quirkos can be a great way to analyse it, whether your sample includes one respondent or dozens! There is a free trial and example data sets so you can see for yourself if it suits your way of working, and much more information in these pages. We also have a newly relaunched forum, with specific sections on qualitative methodology, if you want to ask questions or comment on anything raised in this blog series.