Many researchers instinctively plan to collect and create new data when starting a research project. However, this is not always needed, and even if you end up having to collect your own data, looking for other sources already out there can help prevent redundancy and improve your conceptualisation of a project. Broadly you can think of two different types of secondary data: sources collected previously for specific research projects, and data 'scraped' from other sources, such as Hansard transcripts or Tweets.
Other data sources include policy documents, news articles, social media posts, other research articles, and repositories of qualitative data collected by other researchers, which might be interviews, focus groups, diaries, videos or other sources. Secondary analysis of qualitative data is not very common, since the data tends to be collected with a very narrow research interest, and of a depth that makes anonymisation and aggregation difficult. However, there is a huge amount of rich data that could be used again for other subjects, see this article by Heaton 2008 for a general overview, and Irwin 2013 for discussion of some of the ethical issues.
The advantage with using qualitative data analysis software is that you can keep many disperate sources of evidence together in one place. If you have an article positing a particular theory, you can quickly cross code supportive evidence with sections of text from that article, and evidence why your data does or does not support the literature. Examining social policy? Put the law or government guidelines into your project file, and cross reference statements from participants that challenge or support these dictates in practice. The best research does not exist in isolation: it must engage with both the existing literature and real policy to make an impact.
Data from other sources has the advantage that the researcher doesn’t have to spend time and resources collecting and recruiting. However, the constant disadvantage is that the data was not specifically obtained to meet your particular research questions or needs. For example, data from Twitter might give valuable insights into people’s political views. But statements that people make do not always equate with their views (this is true for directly collected research methods as well), so someone may make a controversial statement just to get more followers, or be suppressing their true beliefs if they believe that expressing them will be unpopular.
Each different website has it’s own culture as well, which can affect what people share, and in how much detail. A paper by Hine (2012) shows that posts from the popular UK 'Mumsnet' forum have particular attitudes that are acceptable, and often posters are looking for validation of their behaviour from others. Twitter and Facebook are no exception, their each have different styles and acceptable posts that true internet ethnographers should understand well!
Even when using secondary data collected for a specific academic research project, the data might not be suitable for your needs. A great series of qualitative interviews about political views may seem to be a great fit for your research, but might not have asked a key question (for example about respondents parent’s beliefs) which renders the data unusable for your purpose. Additionally, it is usually impossible to identify respondents in secondary data sets to ask follow on questions, since data is anonymised. It’s sometimes even difficult to see the original research questions and interview schedule, and so to find out what questions were asked and for what purpose.
But despite all this, it is usually a good idea to look for secondary sources. It might give you an insight into the area of study you hadn’t considered, highlighting interesting issues that other research as picked up on. It might also reduce the amount of data you need to collect: if someone has done something similar in this area, you can design your data collection to highlight relevant gaps, and build on what has been done before (theoretically, all research should do this to a certain degree).
I know it’s something I keep reiterating, but it’s really important to understand who your data represents: you need some kind of contextual or demographic data. This is sometimes difficult to find when using data gathered from social media, where people are often given the option to only state very basic data, such as gender, location or age, but many people may not disclose. It can also be a pain to extract comments from social media posts in such a way that the identity of the poster and their posts are kept with it – however there are 3rd party tools that can help with this.
When writing up your research, you will also want to make explicit how you found and collected this source of data. For example, if you are searching Twitter for a particular hashtag or phrase, when did you run the search? If you run it the next day, or even minute, the results will be different, and how far back did you include posts? What languages? Are there comments that you excluded - especially if they look like spam or promotional posts? Think about making it replicable: what information would someone need to get the same data as you?
You should also try and be as comprehensive as possible. If you are examining newspaper articles for something that has changed over time (such as use of the phrase ‘tactical warfare’) don’t assume all your results will be online. While some projects have digitised newspaper archives from major titles, there are a lot of sources that are still print only, or reside in special databases. You can gain help and access to these from national libraries, such as the British Library.
There are growing repositories of open access data, including qualitative datasets. A good place to start is the UK Data Service, even if you are outside the UK, as it contains links to a number of international stores of qualitative data. Start here, but note that you will generally have to register, or even gain approval to access some datasets. This shouldn’t put you off, but don’t expect to always be able to access the data immediately, and plan to prepare a case for why you should be granted access. In the USA there is a qualitative specific data repository called the Qualitative Data Repository (or QDR) hosted by Syracuse University.
If you have found a research article based on interesting data that is not held on a public repository, it is worth contacting the authors anyway to see if they are able to share it. Research based on government funding increasingly comes with stipulations that the data should be made freely available, but this is still a fairly new requirement, and investigators from other projects may still be willing and able to grant access. However, authors can be protective over their data, and may not have acquired consent from participants in such a way that they would be allowed to share the data with third parties. This is something to consider when you do your own work: make sure that you are able to give back to the research community and share your own data in the future.
Finally, a note of caution about tailored results. Google, Facebook and other search engines do not show the same results in the same order to all people. They are customised for what they think you will be interested in seeing, based on your own search history and their assumptions of your gender, location, ethnicity, and political leanings. This article explains the impact of the ‘research bubble’ and this will impact the results that you get from social media (especially Facebook).
To get around this in a search, you can use a privacy focused search engine (like DuckDuckGo), add &pws=0 to the end of a Google search, or use a browser in ‘Private’ or ‘Incognito’ mode. However, it’s much more difficult to get neutral results from Facebook, so bare this in mind.
Hopefully these tips have given you food-for-thought about sources and limitations of secondary data sources, if you have any comments or suggestions, please share them in the forum. Don’t forget that Quirkos has simple copy and paste source generation that allows you to bring in secondary data from lots of different formats and internet feeds, and the visual interface makes coding and exploring them a breeze. Download a free trial from www.quirkos.com/get.html.