University of York Library

Data: a Practical Guide

Final prototype for the Data practical guide.
Qualitative data analysis

Qualitative data is non-numerical information, which makes it much more time-consuming to analyse effectively than numerical data. But there are ways to make the task more manageable.

Tools for working with text data

A lot of qualitative data analysis involves working with text data. There are many ways of working with text data, including using qualitative data software like NVivo, but also using a text editor or processor (e.g. Notepad/TextEdit, Word or Google Docs), a spreadsheet, or a coding language like Python.

Using text editing tools

If you have text data, text editing tools are a natural place to start. They can do basic searches for text strings, often reporting how many times a search term appears, and their find and replace features can help you clean up text data.

You might use a text editor like Notepad (on Windows) or TextEdit (on Mac) to work with plain text (that is, text that is not visually formatted), such as files saved in the .txt format. These editors can search files and find and replace text, and plain text files are also a good starting point for working with coding languages like Python when you take your text data analysis further.
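As a rough illustration of the kind of count a text editor's search reports, the Python sketch below counts whole-word, case-insensitive matches of a few search terms. The function name count_term is our own, and the sample sentence stands in for a real .txt file, which you could load with open(path, encoding="utf-8").read().

```python
import re

def count_term(text: str, term: str) -> int:
    """Count whole-word, case-insensitive matches of term in text."""
    # re.escape treats the term literally; \b restricts matches to whole words
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Stand-in for the contents of a plain text file
sample = "The lake can be really beautiful at times. But the geese can get a bit violent!"

for term in ["lake", "geese", "the"]:
    print(term, count_term(sample, term))
```

Whole-word matching means that "lake" is not counted inside "lakeside"; drop the \b markers if you want partial matches too.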

You could also use word processing tools like Microsoft Word or Google Docs. These are designed for creating documents, but you can also use them to find and replace text data if needed.

If you're interested in a more advanced example of using find and replace features in text tools, The Programming Historian has a lesson on Understanding Regular Expressions which uses a word processing tool to manipulate a text document using regular expressions, a way of expressing more complicated search patterns for text data. If you want to learn more about using regular expressions, or regex for short, there are many helpful resources out there, for example RegExr.
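For a feel of what regular expressions do, here is a minimal Python sketch using the standard re module: one pattern collapses runs of spaces and tabs, and another standardises a couple of -ize/-yze spellings to -ise/-yse. The sample text and the choice of words are illustrative assumptions; a real cleaning job would need patterns tailored to your own data.

```python
import re

text = "We  analyze the   data, then analyse it again.\nOrganize  notes."

# Collapse runs of spaces or tabs into a single space (line breaks are kept)
cleaned = re.sub(r"[ \t]+", " ", text)

# Standardise -ize/-yze spellings of two example words to -ise/-yse.
# \1 and \2 re-insert the captured word stem and ending around the new "s".
cleaned = re.sub(r"\b(analy|organi)z(e|es|ed|ing)\b", r"\1s\2", cleaned, flags=re.IGNORECASE)

print(cleaned)
```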

Using spreadsheets

If you have text data in a structured format like CSV, then spreadsheets can be a useful way to work with it. As with text editors, you can search and use find and replace to look for particular terms, and you can use filtering features to drill further into the data. You can also transform the data in various ways, depending on its structure. The Spreadsheets guide covers much of what you'll need to get started.
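If you later want to script the same kind of filtering, Python's built-in csv module can do it. This sketch assumes a hypothetical CSV export with respondent and comment columns, held in a string here so the example runs as-is, and keeps only the rows whose comment mentions "lake".

```python
import csv
import io

# Hypothetical survey export: one free-text comment per row
data = """respondent,comment
R1,The lake can be really beautiful at times.
R2,The geese can get a bit violent!
R3,Lovely walk around the lake.
"""

# DictReader maps each row to a dict keyed by the header names
rows = list(csv.DictReader(io.StringIO(data)))

# Keep only rows whose comment mentions "lake", ignoring case
lake_rows = [row for row in rows if "lake" in row["comment"].lower()]

for row in lake_rows:
    print(row["respondent"], row["comment"])
```

With a real file you would pass an open file object (opened with newline="") to csv.DictReader instead of the io.StringIO wrapper.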

Qualitative analysis doesn't have to be purely digital. Sometimes just getting the highlighter pens out is a perfectly valid approach. But however you're working with your qualitative information, using a spreadsheet as somewhere to tally counts, references, and the like, can be a really good idea.

Using coding languages

If you need more customised functionality when working with text data, or want to be able to automate analysis, then coding can be very useful. Python is one example of a coding language often used for working with text data, and our Python guide page suggests online resources to explore for working with text data.

The Programming Historian also has a lesson on Cleaning OCR'd text with Regular Expressions using Python, if you're looking for an example of how Python has been used to work with text data.
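The lesson has its own approach; as a separate, minimal sketch of the general idea, the snippet below applies two common OCR clean-up steps with Python's re module: rejoining words hyphenated across line breaks, and turning single line breaks within a paragraph into spaces while keeping blank lines between paragraphs. The sample text is made up.

```python
import re

ocr_text = "The geese were seen near the la-\nke most mornings, and the\nwater was calm.\n\nNext paragraph."

# Rejoin words split by a hyphen at the end of a line ("la-\nke" becomes "lake")
text = re.sub(r"(\w)-\n(\w)", r"\1\2", ocr_text)

# Replace a line break with a space unless it is part of a blank line (paragraph break)
text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)

print(text)
```

Be aware that the first rule would also join genuinely hyphenated compounds that happen to fall at a line end (for example "well-" followed by "known" on the next line becomes "wellknown"), so check the results.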

Marking up

Marking up or coding is the process of going through some qualitative data and applying labels (codes) to areas that refer to a particular subject or have a particular sentiment. That way you can count up the number of times a particular thing was mentioned or a particular emotion was expressed.
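Once passages have been coded, the counting itself is straightforward. This minimal Python sketch assumes a simple, hypothetical representation in which each coded passage is stored with a list of its codes, and uses collections.Counter to tally how many passages each code was applied to. The first two passages come from the example discussed below; the third is invented for illustration.

```python
from collections import Counter

# Each coded passage is paired with the list of codes applied to it.
# The third passage is invented for illustration.
coded_passages = [
    ("The lake can be really beautiful at times.", ["lake", "positive"]),
    ("But the geese can get a bit violent!", ["geese", "negative"]),
    ("The path round the lake gets very muddy.", ["lake", "negative"]),
]

# Tally how many passages each code was applied to
code_counts = Counter(code for _, codes in coded_passages for code in codes)

for code, count in code_counts.most_common():
    print(code, count)
```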

Effectively the aim is to get something quantifiable out of something qualitative, and we need to be aware that this is a reductive process: we're taking data that is potentially rich and nuanced and reducing it to something simpler that we can count. A lot therefore rests on your ability to interpret the original data in an honest and accurate way.

The practice of marking up

Take the following text:

"The lake can be really beautiful at times. But the geese can get a bit violent!"

We might code that up in a number of ways; for instance:

Topic

"The lake can be really beautiful at times." [topic: lake] "But the geese can get a bit violent!" [topic: geese]

Here we're identifying the topic of what's being talked about: the lake, and the geese.

You might want to think in advance about what categories you're looking for (or even use a pre-existing set of terms). Or you could make it up as you go along. Think about what is appropriate for your particular research need. Either way, you'll need to interpret for yourself precisely what was being referred to in each case.
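If you decide on your categories in advance, a script can suggest candidate codes for you to check. The sketch below assumes a hypothetical codebook mapping each topic code to a few keywords, splits the text into sentences, and flags any sentence containing one of those keywords as a whole word. It is only a first pass: keyword matching cannot tell what a sentence is really about, so every suggestion still needs your own interpretation.

```python
import re

# Hypothetical codebook: each topic code lists keywords that suggest it
codebook = {
    "lake": {"lake", "water"},
    "geese": {"geese", "goose"},
}

def suggest_topics(sentence: str) -> list:
    """Return the topic codes whose keywords appear as whole words in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    # A code is suggested if any of its keywords is among the sentence's words
    return [code for code, keywords in codebook.items() if words & keywords]

text = "The lake can be really beautiful at times. But the geese can get a bit violent!"

# Split into sentences after ".", "!" or "?" followed by whitespace
for sentence in re.split(r"(?<=[.!?])\s+", text):
    print(sentence, "->", suggest_topics(sentence))
```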

Sentiment

"The lake can be really beautiful at times." [sentiment: positive] "But the geese can get a bit violent!" [sentiment: negative]

Here we're looking for whether comments are positive or negative: the comment about the lake is positive, and the comment about the geese is negative.

This sort of sentiment analysis requires even more of a judgement on your part than coding by topic. You're having to interpret the meaning of what was being said, and there is always a risk that you'll impose a meaning that wasn't there and misrepresent the text. Again, you should factor that into your methodology.
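Because the sentiment judgements are yours, one way to keep them transparent is to record each one alongside its topic and then cross-tabulate. This Python sketch assumes manually assigned (topic, sentiment) pairs, two following the example sentence and one made up, and counts positive and negative comments per topic.

```python
from collections import Counter

# Manually assigned (topic, sentiment) codes, one pair per coded passage.
# The first two follow the example sentence; the third is made up.
codes = [
    ("lake", "positive"),
    ("geese", "negative"),
    ("lake", "negative"),
]

# Count each (topic, sentiment) combination
crosstab = Counter(codes)

for topic in sorted({topic for topic, _ in codes}):
    positive = crosstab[(topic, "positive")]  # Counter returns 0 for missing keys
    negative = crosstab[(topic, "negative")]
    print(f"{topic}: {positive} positive, {negative} negative")
```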


As the examples above demonstrate, a passage can sometimes have more than one code applied to it. That's perfectly justifiable, but you'll need to consider how such coding 'density' might affect your analysis. A longer text, or one that discusses more than one topic, might have more coding points than a shorter text or one that stays on topic. So be careful about reading too much into larger or smaller numbers.
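One simple way to make texts of different lengths easier to compare is to express code counts relative to length, for example as codes per 100 words. The sketch below shows one possible normalisation, not a standard method, and it assumes a word count based on splitting at whitespace; it reduces, but does not remove, the need for caution. The example sentence carries four codes (two topic codes and two sentiment codes).

```python
def codes_per_100_words(text: str, n_codes: int) -> float:
    """Normalise a code count by text length so long and short texts can be compared."""
    word_count = len(text.split())  # crude word count: split on whitespace
    return 100 * n_codes / word_count if word_count else 0.0

example = "The lake can be really beautiful at times. But the geese can get a bit violent!"

# 4 codes over 16 words
print(codes_per_100_words(example, 4))
```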

NVivo

Qualitative data analysis is the analysis of things other than numbers, usually text. It's mostly a case of careful reading, but it can also be advantageous to find ways of quantifying content.

NVivo is a tool for managing, organising and analysing qualitative data. It supports deep levels of analysis of rich text-based and/or multimedia information, in small or large volumes, and is often used for qualitative research and literature reviews.

Analysing spoken content

NVivo can be used to analyse more than just text-based files: images, audio and video can also be marked up. If you have an audio file of an interview, for example, you can annotate it directly in NVivo, without the need for transcription:

[Image: an audio waveform in NVivo with coding stripes]

If you do feel the need for a transcription, you're going to have to create it yourself (unless you can persuade somebody else to do it for you!). To save time, you could transcribe directly into NVivo and code as you go; or you could choose to code it up in some other way.

Auto-transcription is possible (though seldom very reliable). NVivo offers a paid transcription service, but there are free alternatives such as the ones we discuss on our Subtitling Skills Guide (if you're conducting an interview over Zoom, you can also enable its built-in auto-transcription). For research data, you'll want to check if the service has been approved in terms of security and safety. Be aware, though, that even with an automatic transcript, you're going to need to do a lot of work if what you're actually wanting is a perfectly punctuated verbatim text. It may well be nearly as quick to transcribe it manually.

If you create a transcript that you want to analyse in NVivo, be sure to use Styles in your document. This will allow you to do some basic auto-coding, such as identifying the interviewer and interviewee. You can also sync a transcript to a video, though this might require some preparation if you're working from a conventional subtitle file.
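Styles belong to NVivo and your word processor, but the underlying idea can be sketched in plain text: if every turn in a transcript starts with a speaker label, a script can group the turns by speaker. The transcript format and its contents below are assumptions for illustration, and this is not NVivo's own auto-coding feature.

```python
import re
from collections import defaultdict

# Hypothetical plain-text transcript where each turn starts with "Speaker: "
transcript = """Interviewer: What do you like about the campus?
Interviewee: The lake can be really beautiful at times.
Interviewer: Anything you don't like?
Interviewee: The geese can get a bit violent!"""

# Collect each speaker's turns in order
turns = defaultdict(list)
for line in transcript.splitlines():
    match = re.match(r"(\w+):\s*(.*)$", line)
    if match:
        speaker, speech = match.groups()
        turns[speaker].append(speech)

for speaker, speeches in turns.items():
    print(speaker, len(speeches))
```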
