Subject Guides: Data: a Practical Guide: Gathering data


Gathering data

There are many ways to gather your own data. You might be conducting an experiment and recording observations and results. You might be performing some sort of literature search or analysing texts. Or maybe you'll be conducting a survey. On this page we'll mostly be focusing on the latter approach (with practical considerations and tips for designing a survey, and a look at survey question types), but many of the principles (for instance the section on validating data) will apply to any recording of data.

Also in this section we'll look more closely at the specific survey tools available at York, and some of the ethical considerations of gathering and storing data:

Survey tools

Data ethics

Designing a survey - practical considerations

When putting together a form or a survey, there are a few things to consider. In terms of conducting research, you will need to select an appropriate balance of quantitative (countable) and qualitative (textual) questions, and the sort of questions you will need to ask will vary according to your field or task. However, there are also some broader principles of design that you should bear in mind:

What's the aim?

What is it you're trying to achieve? Get a clear understanding of what information you actually need. It will inform your choice of tools, your question design, and your analysis.

What's essential?

Don't waste people's time on redundant questions! The longer your survey, the less likely people will get to the end of it. Just focus on collecting what is absolutely necessary.

How will you analyse what you collect?

What methods will you use to crunch the responses? This will inform your question types.

Data analysis

What will you ask?

How you ask a question will affect the response you receive. Make sure you're asking what you think you're asking. Don't be afraid to ask other people; it's a survey after all!

Lock down the possible answers to your questions to give yourself something you can count without having to read every single response...

How sensitive is the data you're collecting?

Don't ask for data you don't need. And if you're collecting personal, confidential, or sensitive information, you need to think seriously not only about the questions you ask but also about the way you ask them, the tools you use, and the ways you'll be using the data afterwards.

Data ethics

How will people complete your survey?

What's your distribution plan? Or are you just going to spam people with a link they'll never click? Who's allowed to take your survey, and will you allow them to fill it in more than once? Do you know in advance who you're targeting, and if so, can that help you to cut down the questions in your survey? Are people completing the survey online or on paper? Desktop or mobile? If it's a long form, will they be able to take a break from filling it in? Will you send reminders? How many responses will you need?

What tool will you use?

If you're using a survey tool like Google Forms or Qualtrics, which can best accommodate your survey design, your distribution plan, and your subsequent analysis?

Survey tools

How will you structure your survey?

If a respondent doesn't need to see a question, don't ask it. You can use branching in a survey to reduce the number of irrelevant questions served up. But the survey still needs to flow logically, and you'll need to test every eventuality. Is everyone seeing the questions they need to see?
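One way to "test every eventuality" is to write the branching down as data and list every route through it. The minimal Python sketch below does that for a small, entirely hypothetical survey (the question IDs, answers and rules are made up for illustration and are not how Google Forms or Qualtrics store branching). It also reports any question that no route ever reaches.

```python
# Hypothetical branching survey. Each question maps each answer to the
# ID of the next question shown; None means the survey ends there.
# "q5_contact" is deliberately left unconnected to show the check below.
SURVEY = {
    "q1_student": {"yes": "q2_department", "no": "q3_staff_role"},
    "q2_department": {"science": "q4_feedback", "arts": "q4_feedback"},
    "q3_staff_role": {"academic": "q4_feedback", "professional": None},
    "q4_feedback": {"submitted": None},
    "q5_contact": {"submitted": None},
}


def all_routes(survey, start):
    """Return every distinct sequence of questions a respondent could see."""
    routes = set()
    stack = [(start,)]
    while stack:
        path = stack.pop()
        for next_q in survey[path[-1]].values():
            if next_q is None:
                routes.add(path)  # this answer ends the survey
            elif next_q in path:
                raise ValueError(f"Branching loops back to {next_q}")
            else:
                stack.append(path + (next_q,))
    return sorted(routes)


def unreachable(survey, start):
    """Return the questions that no route ever shows."""
    shown = {q for route in all_routes(survey, start) for q in route}
    return set(survey) - shown


for route in all_routes(SURVEY, "q1_student"):
    print(" -> ".join(route))
print("Never shown:", unreachable(SURVEY, "q1_student"))
```

Each printed route is a path someone should walk through when testing, and anything listed as "Never shown" is a sign the branching has a gap.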

Does your survey work?

Test everything. Read and re-read any introductory text to make sure it makes sense. Test that the questions are understandable. Do some test distributions. Get other people to test it. Gather some test data and make sure you can do what you need to do with it. The last thing you want is for people to abandon your survey because they're frustrated with it, and the second-last thing you want is a load of responses that don't make sense to you.

Distribute your survey!

Send it to people and hope. If you need to send reminders, send them. But avoid spamming people. If you spot any problems with your survey once it's released, you'll need to be careful that any modifications don't invalidate the results you've already collected.

Question types

When we ask people questions in real life we get whatever response the people choose to give us. They have a free choice as to how they respond.

If we ask someone...

What's your favourite colour?

...we might expect a range of answers: single word responses like "Blue", "Red", "Green", etc., but also "Ooh, I don't know...", "I think probably yellow but I'm also rather partial to purple", or even a lengthy response detailing their thoughts on every colour, and how their favourite colour changes according to mood.

This sort of qualitative answer can be valuable: it tells us a lot more about a person's views than a simple one-word answer. But it can also be difficult to analyse. So surveys often close down questions to restrict the number of possible responses to something more measurable: more quantitative. We might reduce "What's your favourite colour?" to a multiple choice range of options if we just want a quick answer and we're not really interested in someone's life story.

But perhaps our study is really interested in finding out the detail about people's colour preferences, and really does want to get some rich information back. In which case we'll want to offer a free text question on the topic.

And actually there are rather a lot of colours when you start to think about it, and pre-empting what colours people might choose could get difficult. In which case we might at the very least want a mix of the two approaches: an "If other..." question so that people can still say vermilion, heliotrope, or chartreuse, even if we overlooked them in our list of options.

Free text

Free text fields in a survey can have a number of uses, but generally speaking they let your respondent provide the response they want to give.

Free text fields generally come in two different flavours: short answer and long answer.

Essentially the difference comes down to the size of the box. Google Forms and Qualtrics actually have a few different lengths available to suit your needs, but in principle we can think of text boxes just in terms of those two types.

Short answer boxes encourage brevity. They're useful in cases where we can't realistically pre-empt a range of possible answers, for instance with a Name field. But we might also use them elsewhere (like in our colour example) to give the hint that we don't want a long answer.

Long answer boxes give a respondent the sense that they're allowed to write at length (and the actual size of the box may go some way to indicating the length expected).

Multiple choice

Multiple choice options are easily quantifiable in a way that free text isn't. The aim is to effectively restrict the range of possible responses to something more manageable and more countable. When completed, a multiple choice question gives us a uniform response, or a list of (usually comma-separated) uniform responses, which we can unpack and analyse.
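To show what "unpack and analyse" can mean in practice, here is a minimal Python sketch. It assumes (check your own tool's export) that a tick-all-that-apply question arrives as one spreadsheet column with the chosen options separated by commas; the responses themselves are invented.

```python
from collections import Counter

# Hypothetical exported cells for a tick-all-that-apply question:
# each cell lists the options one respondent chose, comma-separated.
responses = [
    "Email, Phone",
    "Email",
    "Post, Email",
    "",  # this respondent skipped the question
]


def count_options(cells):
    """Split each comma-separated cell and count how often each option was chosen."""
    counts = Counter()
    for cell in cells:
        options = [opt.strip() for opt in cell.split(",") if opt.strip()]
        counts.update(options)
    return counts


print(count_options(responses))  # Counter({'Email': 3, 'Phone': 1, 'Post': 1})
```

Splitting on commas only works if none of your option labels contain a comma themselves, which is one more reason to keep option text short and simple.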

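Multiple choice questions come in two main flavours: radio buttons, where the respondent can pick only one option, and checkboxes, where they can tick as many options as apply.

There's also the drop-down list, a space-saving variant of the radio button question.

[Image: a car radio with five radio buttons, one of which is pressed in]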

Radio buttons get their name from old-school car radios where buttons were tied to particular stations, and only one station could be pressed at any one time.

Once triggered, a radio button question can't easily be unselected, so you might want to make sure there are sufficient options to allow your respondent to 'escape' a situation where they're forced to inaccurately answer a question.

If you're making a multiple choice list, you'll need to be sure you're including every option you should. That may also mean including a way of specifying none of the above, not applicable, or other. While multiple choice questions are usually about limiting the choice available to respondents, it can sometimes be useful to offer a bit more freedom, and both Google Forms and Qualtrics allow you to specify an "Other" option with an accompanying free text field so that people can provide their own options.
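"Other" answers are worth tallying separately: if the same write-in keeps turning up, it probably belonged in your list. A minimal Python sketch, assuming a hypothetical export in which each response is a pair of (selected option, accompanying "Other" text); the real layout differs between tools, so check yours.

```python
from collections import Counter

# Options offered in the (hypothetical) favourite colour question.
LISTED = {"Blue", "Red", "Green", "Yellow", "Purple"}

# Hypothetical export rows: (selected option, "Other" free text or "").
rows = [
    ("Blue", ""),
    ("Other", "Vermilion"),
    ("Red", ""),
    ("Other", "chartreuse "),
    ("Other", "vermilion"),
]


def tally_with_other(rows, listed):
    """Count listed options, and separately count normalised 'Other' write-ins."""
    counts = Counter()
    other_answers = Counter()
    for choice, other_text in rows:
        if choice in listed:
            counts[choice] += 1
        else:
            counts["Other"] += 1
            text = other_text.strip().lower()  # so "Vermilion" == "vermilion "
            if text:
                other_answers[text] += 1
    return counts, other_answers


counts, other_answers = tally_with_other(rows, LISTED)
print(counts)
print(other_answers)
```

Lower-casing and trimming the write-ins is a deliberately crude normalisation; spelling variants would still need reading by a human.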

Linear scales and matrix tables

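Linear scale questions are a form of radio button multiple choice, in which respondents pick one point on a numbered scale: for example, from 1 (strongly disagree) to 5 (strongly agree).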

These are handy because you get a number to analyse rather than a piece of text that can be difficult to deal with. But the risk is that you just end up with an awful lot of 4s or 7s that don't really tell you as much as you might like.
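A quick way to spot the "awful lot of 4s" problem is to look at the full distribution alongside any average. The Python sketch below uses invented responses to a 1 to 5 scale. Because scale points are ordered categories rather than true measurements, many analysts prefer to report the median and the counts at each point rather than relying on the mean alone.

```python
from collections import Counter
from statistics import mean, median

# Invented responses to a 1-5 linear scale question.
scores = [4, 4, 5, 3, 4, 2, 4, 5, 4, 3]


def summarise_scale(values, low=1, high=5):
    """Return the count, mean, median, and responses at every scale point."""
    counts = Counter(values)
    distribution = {point: counts.get(point, 0) for point in range(low, high + 1)}
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "distribution": distribution,  # includes points nobody chose
    }


print(summarise_scale(scores))
```

Here half the respondents chose 4, which a mean of 3.8 on its own would not tell you.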

If you're using questions like this, the convention is to go from least positive value (left) to most positive value (right). People don't always read what you're asking, so try to stick with convention.

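If you've got several scale-type questions in a row, you can arrange them as a matrix table question. Each question becomes a row, and the answer options (for example Strongly disagree, Disagree, Neither, Agree, Strongly agree, N/A) run across the top as columns. This groups your questions together for more efficient use of space.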

If you're using multiple choice questions of any kind in Qualtrics (and particularly scale questions), be sure to check that the underlying values match what's being shown on screen, otherwise you might have problems when it comes to analysis.
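The check is simple to automate once you have the mapping between what respondents saw and what gets stored. In Qualtrics the stored numbers are known as recode values. The Python sketch below uses a hypothetical mapping in which two codes have ended up swapped, as can happen after options are added, deleted, or reordered; it flags every label whose code doesn't match its position on screen.

```python
# Hypothetical on-screen labels, in display order, and the numeric values
# stored for each in the export. "Agree" and "Strongly agree" are swapped.
displayed = ["Strongly disagree", "Disagree", "Neither", "Agree", "Strongly agree"]
coded = {"Strongly disagree": 1, "Disagree": 2, "Neither": 3, "Agree": 5, "Strongly agree": 4}


def check_codes(labels, codes):
    """Return (label, stored code, expected code) for every mismatch."""
    problems = []
    for position, label in enumerate(labels, start=1):
        if codes.get(label) != position:
            problems.append((label, codes.get(label), position))
    return problems


print(check_codes(displayed, coded))
```

An empty list means the stored values run 1, 2, 3... in the same order as the labels; anything else should be fixed before you calculate a single average.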

Special question types

Google Forms includes special date and time pickers which give properly formatted dates and times for analysis. Qualtrics has a whole load of additional question types including graphics-based questions, sliders, and ranking. Both also have the option to upload a file.

Even a free text field can be customised to behave in certain ways and only accept certain kinds of data. Take a look at the section on Validating data to find out more.

Choose a question for your form
Question types in Google Forms
Question types
Available questions in Qualtrics

Validating data

You need the data you collect to be useable, and those uses will probably include analysis. It's therefore important that you ask the right questions, and set them up in such a way that you can work effectively with the responses you get.

Data types

If we're analysing data, we might recognise specific types of information:

Text is great for rich, qualitative responses. But qualitative analysis requires lots of reading, and something more quantifiable may be more helpful for summary analysis...
Numbers are really easy to do maths with. You can get counts, averages, and all sorts of complicated statistics from numbers.
Dates and times might be thought of as a specific type of number, and one that crops up a lot in forms. By using specific date/time question types we can be sure to get responses in a consistent format we can work with, without any "a week next Tuesday" text answers complicating matters.
Multiple choice options are also countable in a way that free text isn't. They give us a uniform response, or a list of (usually comma-separated) responses, which we can unpack and analyse.
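To see why consistently formatted dates matter, the Python sketch below tries to read some invented answers as dates. It assumes the year-month-day (ISO 8601) format; the format your survey tool actually exports depends on the tool and your settings. Anything that fails, including impossible dates and free-text answers, is set aside for manual attention.

```python
from datetime import date

# Invented answers to "When did you start your course?"
raw_answers = ["2024-03-05", "2024-13-01", "a week next Tuesday"]


def parse_dates(answers):
    """Split answers into valid YYYY-MM-DD dates and answers that need checking."""
    valid, rejected = [], []
    for text in answers:
        try:
            valid.append(date.fromisoformat(text.strip()))
        except ValueError:  # wrong format, or not a real date (month 13)
            rejected.append(text)
    return valid, rejected


valid, rejected = parse_dates(raw_answers)
print(valid)
print(rejected)
```

With a proper date question type, the rejected pile should stay empty.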

Choosing the right question type can help ensure we get the right sort of data back: useable data we can work with.

Framing the question

We can limit the range of potential responses by asking the right questions. Suppose we asked...

How many siblings do you have?

We probably want a numeric answer. But it might not be that simple: what about step-siblings? And do only living siblings count? Your respondent may feel it necessary to provide an essay response to this question, which is going to make your analysis an awful lot harder.

You need to consider as many eventualities as possible when formulating survey questions, and provide as much clarity as you can. It might also be helpful to restrict how your respondents can respond — a process called validation.
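As an illustration of validation, the Python sketch below accepts only a whole number for the siblings question and rejects everything else. The upper limit of 20 is an arbitrary, hypothetical sanity check. Note that validation only controls the format: it can't tell you whether someone counted their step-siblings, so the wording of the question still has to make that clear.

```python
import re


def validate_sibling_count(text, maximum=20):
    """Return the answer as an int if it is a whole number from 0 to maximum, else None."""
    cleaned = text.strip()
    if not re.fullmatch(r"[0-9]+", cleaned):  # digits only: no words, signs, or notes
        return None
    value = int(cleaned)
    return value if value <= maximum else None  # maximum is a hypothetical sanity limit


answers = ["2", " 0 ", "two", "3 (plus 2 step-siblings)", "-1", "150"]
print([validate_sibling_count(a) for a in answers])
```

In a live form the same rule would stop the respondent and ask them to try again, rather than silently discarding their answer.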

Closed questions

If we ask someone...

What did you think of the survey?

...we might expect a range of answers: "I thought it was a bit long", "It was great!", "Well cool", "Rubbish", "Graet!", "I found the question about siblings a bit tricky", etc.

This sort of qualitative answer can be valuable, but it can also be difficult to analyse.

Instead we could provide a controlled multiple-choice list of responses from which our respondents can pick: something that's then going to be a lot easier to count.

Another common approach is a five-point or ten-point scale: "Please rate this survey out of 10", or whatever. Suddenly you're getting a numeric response that you can do proper maths with. But there’s a trade-off: the more you force a respondent to give a quantitative response, the less qualitative information (and the less nuance) you’ll potentially receive.

Validation rules

Even text questions can use validation: for instance to check that a response matches the pattern of an email address, phone number, or postcode. You can also put limits on the amount being written, or accept only numbers. In fact, with regular expression rules you can do very elaborate validation of a 'free'-text response. Just be careful not to annoy your respondents by making your survey too hard to complete. If rules apply, make them clear!
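Google Forms and Qualtrics let you set up these rules through their own settings, but the logic behind a pattern-based rule is the same everywhere. The Python sketch below uses deliberately simplified patterns for an email address and a UK postcode (real-world formats have edge cases these don't cover) plus a length limit, and returns a clear message when an answer fails, in the spirit of "if rules apply, make them clear".

```python
import re

# Simplified, illustrative patterns only; not complete validators.
RULES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "uk_postcode": re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}", re.IGNORECASE),
}


def check(field, text, max_length=100):
    """Return a message for the respondent, or None if the answer passes."""
    text = text.strip()
    if len(text) > max_length:
        return f"Please keep your answer to {max_length} characters or fewer."
    if not RULES[field].fullmatch(text):  # the whole answer must match the pattern
        return f"That doesn't look like a valid {field.replace('_', ' ')}."
    return None


for field, answer in [("email", "someone@york.ac.uk"), ("email", "not-an-email"),
                      ("uk_postcode", "YO10 5DD"), ("uk_postcode", "12345")]:
    print(field, repr(answer), check(field, answer))
```

The stricter a pattern is, the more genuine answers it risks rejecting, so test rules like these against real examples before releasing your survey.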

Set rules for your form
Setting validation rules in Google Forms
Response requirements & validation
A look at content validation in Qualtrics

Tips for survey design

From working out what you're asking and who you want to fill out your survey, to making sure you make your survey clear and have questions that people can actually fill out, there's a lot to think about when designing surveys. We look at 10 steps for designing forms and surveys, with prompts to help you think about your own data collection.

Survey design: 10 steps to less annoying forms and surveys
Our tips for designing surveys and forms that people will actually fill out.


Thinking inside the box
A look at name conventions and the potential problems of name fields; presented in the style of an online form.

A key issue covered in these tips is what data you actually collect with your form or survey. Being aware of how you'll use the data, and of your legal obligations when collecting it, will help you make better surveys.

Data Protection Training Resources
Library of resources on Data Protection from York's Information Governance team.

Data: Searching for data
Where and how to search for existing datasets.
Research Data Management: a Practical Guide
A guide to help you manage your research data well.

Forthcoming training sessions


There are more training events at:

Skills Guide: Training
Take a look at our list of events
