University of York Library

Forms & surveys


Designing a survey - practical considerations

When putting together a form or a survey, there are a few things to consider. In terms of conducting research, you will need to select an appropriate balance of quantitative and qualitative questions, and the sort of questions you will need to ask will vary according to your discipline. However, there are also some broader principles of design that you should bear in mind:

What's the aim?

What is it you're trying to achieve? Get a clear understanding of what information you actually need. It will inform your choice of tools, your question design, and your analysis.


What's essential?

Don't waste people's time on redundant questions! The longer your survey, the less likely people are to get to the end of it. Just focus on collecting what is absolutely necessary.


How will you analyse what you collect?

What methods will you use to crunch the responses? This will inform your question types.



What will you ask?

How you ask a question will affect the response you receive. Make sure you're asking what you think you're asking. Don't be afraid to ask other people; it's a survey after all!

Lock down the possible answers to your questions to give yourself something you can count without having to read every single response...


How sensitive is the data you're collecting?

Don't ask for data you don't need. And if you're collecting personal or sensitive information, you need to think seriously not only about the questions you ask but also about the way you ask them, the tools you use, and the ways you'll be using the data afterwards.



How will people complete your survey?

What's your distribution plan? Or are you just going to spam people with a link they'll never click? Who's allowed to take your survey, and will you allow them to fill it in more than once? Do you know in advance who you're targeting, and if so, can that help you to cut down the questions in your survey? Are people completing the survey online or on paper? Desktop or mobile? If it's a long form, will they be able to take a break from filling it in? Will you send reminders? How many responses will you need?


What tool will you use?

If you're using a survey tool like Google Forms or Qualtrics, which can best accommodate your survey design, your distribution plan, and your subsequent analysis?


How will you structure your survey?

If a respondent doesn't need to see a question, don't ask it. You can use branching in a survey to reduce the number of irrelevant questions served up. But the survey still needs to flow logically, and you'll need to test every eventuality. Is everyone seeing the questions they need to see?


Does your survey work?

Test everything. Read and re-read any introductory text to make sure it makes sense. Test that the questions are understandable. Do some test distributions. Get other people to test it. Gather some test data and make sure you can do what you need to do with it. The last thing you want is for people to not complete your survey because they're frustrated with it, and the second last thing you want is a load of responses that don't make sense to you.


Distribute your survey!

Send it to people and hope. If you need to send reminders, send them. But avoid spamming people. If you spot any problems with your survey once it's released, you'll need to be careful that any modifications don't invalidate the results you've already collected.

Tips for survey design

From working out what you're asking and who you want to fill out your survey, to making sure you make your survey clear and have questions that people can actually fill out, there's a lot to think about when designing surveys. We look at 10 steps for designing forms and surveys, with prompts to help you think about your own data collection.

Question types

When we ask people questions in real life we get whatever response the people choose to give us. They have a free choice as to how they respond.

If we ask someone...

What's your favourite colour?

...we might expect a range of answers: single word responses like "Blue", "Red", "Green", etc., but also "Ooh, I don't know...", "I think probably yellow but I'm also rather partial to purple", or even a lengthy response detailing their thoughts on every colour, and how their favourite colour changes according to mood.

This sort of qualitative answer can be valuable: it tells us a lot more about a person's views than a simple one-word answer. But it can also be difficult to analyse. So surveys often look to close down questions to restrict the number of possible responses to something more measurable: more quantitative. We might reduce "What's your favourite colour" to a multiple choice range of options if we're just wanting a quick answer and we're not really interested in someone's life-story.

But perhaps our study is really interested in finding out the detail about people's colour preferences, and really does want to get some rich information back. In which case we'll want to offer a free text question on the topic.

And actually there are rather a lot of colours when you start to think about it, and pre-empting what colours people might choose could get difficult. In which case we might at the very least want a mix of the two approaches: an "If other..." question so that people can still say vermilion, heliotrope, or chartreuse, even if we overlooked them in our list of options.

Free text

Free text fields in a survey can have a number of uses, but generally speaking they let your respondent provide the response they want to give.

Free text fields generally come in two different flavours: short answer and long answer.

Essentially the difference comes down to the size of box. Google Forms and Qualtrics actually have a few different lengths available to suit your needs, but in principle we can think of text boxes just in terms of those two types.

Short answer boxes encourage brevity. They're useful in cases where we can't realistically pre-empt a range of possible answers, for instance with a Name field. But we might also use them elsewhere (like in our colour example) to give the hint that we don't want a long answer.

Long answer boxes give a respondent the sense that they're allowed to write at length (and the actual size of the box may go some way to indicating the length expected).

Multiple choice

Multiple choice options are easily quantifiable in a way that free text isn't. The aim is to effectively restrict the range of possible responses to something more manageable and more countable. When completed, a multiple choice question gives us a uniform response, or a list of (usually comma-separated) uniform responses, which we can unpack and analyse.

Multiple choice questions come in two main flavours:

  • Radio buttons only allow you to choose one option.
  • Checkboxes allow you to pick multiple options.


There's also


Radio buttons get their name from old-school car radios, where each button was tied to a particular station and only one button could be pressed in at any one time.

Tip

Once triggered, a radio button question can't easily be unselected, so you might want to make sure there are sufficient options to allow your respondent to 'escape' a situation where they're forced to inaccurately answer a question.


If you're making a multiple choice list, you'll need to be sure you're including every option you should. That may also mean including a way of specifying none of the above, not applicable, or other. While multiple choice questions are usually about limiting the choice available to respondents, it can sometimes be useful to offer a bit more freedom, and both Google Forms and Qualtrics allow you to specify an "Other" option with an accompanying free text field so that people can provide their own options.


Linear scales and matrix tables

Linear scale questions are a form of radio button multiple choice:

strongly disagree   1   2   3   4   5   strongly agree

These are handy because you get a number to analyse rather than a piece of text that can be difficult to deal with. But the risk is that you just end up with an awful lot of 4s or 7s that don't really tell you as much as you might like.


Tip

If you're using questions like this, the convention is to go from least positive value (left) to most positive value (right). People don't always read what you're asking, so try to stick with convention.

If you've got several scale-type questions in a row, you can arrange them as a matrix table question:

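|                      | Strongly disagree | Disagree | Neither | Agree | Strongly agree | N/A |
|----------------------|-------------------|----------|---------|-------|----------------|-----|
| They group...        | ( )               | ( )      | ( )     | ( )   | ( )            | ( ) |
| ...your questions... | ( )               | ( )      | ( )     | ( )   | ( )            | ( ) |
| ...together for...   | ( )               | ( )      | ( )     | ( )   | ( )            | ( ) |
| ...more efficient... | ( )               | ( )      | ( )     | ( )   | ( )            | ( ) |
| ...use of space.     | ( )               | ( )      | ( )     | ( )   | ( )            | ( ) |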

Tip

If you're using multiple choice questions of any kind in Qualtrics (and particularly scale questions), be sure to check that the underlying values match what's being shown on screen, otherwise you might have problems when it comes to analysis.

Special question types

Google Forms includes special date and time pickers which give properly formatted dates and times for analysis. Qualtrics has a whole load of additional question types including graphics-based questions, sliders, and ranking. Both also have the option to upload a file.

Even a free text field can be customised to behave in certain ways and only accept certain kinds of data. Take a look at the section on Validating data to find out more.

Validating data

You need the data you collect to be useable, and those uses will probably include analysis. It's therefore important that you ask the right questions, and set them up in such a way that you can work effectively with the responses you get.

Data types

If we're analysing data, we might recognise specific types of information:

  • text like this is great for rich, qualitative responses. But qualitative analysis requires lots of reading. Something more quantifiable may be more helpful for summary analysis...
  • numbers for instance are really easy to do maths with. You can get counts, averages, and all sorts of complicated statistics from numbers.
  • dates and times might be thought of as a specific type of number, and one that crops up a lot in forms. By using specific date/time question-types we can be sure to get responses in a consistent format we can work with, without any "a week next Tuesday" text answers complicating matters.
  • Multiple choice options are also effectively countable in a way that free text isn't. They will give us a uniform response, or a list of (usually comma-separated) responses which we can unpack and analyse.

Choosing the right question type can help ensure we get the right sort of data back: useable data we can work with.
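
To illustrate what "unpacking" a comma-separated checkbox response might look like, here is a minimal Python sketch. The responses are made up for the example, and it assumes that none of the options contains a comma of its own; a real export may need more careful handling.

```python
from collections import Counter

# Hypothetical checkbox responses, one string per respondent.
# The empty string stands for someone who skipped the question.
responses = ["Blue, Green", "Red", "Blue", "Green, Red, Blue", ""]

# Split each response on commas, trim the spaces, and count every option.
counts = Counter(
    option.strip()
    for response in responses
    for option in response.split(",")
    if option.strip()
)

for option, count in counts.items():
    print(option, count)
```

Because every option is a uniform piece of text, counting is straightforward. The same approach would struggle with free text, where "blue", "Blue-ish", and "navy" would all be counted separately.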

Framing the question

We can limit the range of potential responses by asking the right questions. Suppose we asked...

How many siblings do you have?

We're probably wanting a numeric answer. But it might not be that simple: What about step-siblings? And is it just living siblings that count? Your respondent may feel it necessary to provide an essay response to this question, which is going to make your analysis an awful lot harder.

You need to consider as many eventualities as possible when formulating survey questions, and provide as much clarity as you can. It might also be helpful to restrict how your respondents can respond — a process called validation.

Closed questions

If we ask someone...

What did you think of the survey?

...we might expect a range of answers: "I thought it was a bit long", "It was great!", "Well cool", "Rubbish", "Graet!", "I found the question about siblings a bit tricky", etc.

This sort of qualitative answer can be valuable, but it can also be difficult to analyse.

Instead we could provide a controlled multiple-choice list of responses from which our respondents can pick: something that's then going to be a lot easier to count.

Another common approach is a five-point or ten-point scale: "Please rate this survey out of 10", or whatever. Suddenly you're getting a numeric response that you can do proper maths with. But there’s a trade-off: the more you force a respondent to give a quantitative response, the less qualitative information (and the less nuance) you’ll potentially receive.

Validation rules

Even text questions can use validation: for instance to check that a response matches the pattern of an email address, phone number, or postcode. You can also put limits on the amount being written, or accept only numbers. In fact, with regular expression rules you can do very elaborate validation of a 'free'-text response. Just be careful not to annoy your respondents by making your survey too hard to complete. If rules apply, make them clear!
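
In practice your survey tool applies rules like these for you, but it can help to see the idea spelled out. The following Python sketch is an illustration only: the patterns, the `is_valid` function, and the 100-character limit are all assumptions made for this example, and the email pattern is deliberately loose rather than a complete check.

```python
import re

# Illustrative patterns only: deliberately loose, not full validators.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
WHOLE_NUMBER_PATTERN = re.compile(r"\d+")


def is_valid(pattern: re.Pattern, response: str, max_length: int = 100) -> bool:
    """Return True if the trimmed response fits the length limit and the pattern."""
    text = response.strip()
    return len(text) <= max_length and pattern.fullmatch(text) is not None


print(is_valid(EMAIL_PATTERN, "someone@example.org"))  # True
print(is_valid(EMAIL_PATTERN, "someone at example"))   # False
print(is_valid(WHOLE_NUMBER_PATTERN, "3"))             # True
print(is_valid(WHOLE_NUMBER_PATTERN, "three"))         # False
```

Note how unforgiving the number rule is: "three" is a perfectly sensible answer to a person, but it fails the check. That's why it's worth telling respondents what format you expect.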

Survey tools

At York there are two main survey tools: Google Forms and Qualtrics.

In the slides below we take a look at how they work, and consider the respective pros and cons of each.

Google Forms

Google Forms is part of the Google Apps for Education suite.

It is great for simple forms and surveys and has the benefit that the survey can be locked down to University of York members as a whole, as well as to specific individuals. 

Responses can be fed in real time to a Google Sheet.



Qualtrics

Qualtrics has far more question types than Google Forms, with advanced options when it comes to structuring and branching your survey. It can also use distribution lists and pull across embedded data from such lists while maintaining anonymous responses.



Audience polling

Getting a show of hands is one thing, but sometimes you might want to get a bit more sophisticated in terms of audience polling. As well as traditional survey tools, the University has an institutional licence for Mentimeter, which can be used to provide real-time feedback in a presentation setting.

There's also the Q&A tool available in Google Slides.

And if you're presenting online, Zoom has a polling tool built in.

Cleaning data

It's one thing finding some data, but you probably need to manipulate it in some way before you can interrogate it...

Understanding the data you have

We might think of ‘data’ as values stored without context. Through processing that data we can seek to provide context and determine meaning. But even simple spreadsheet operations require us to have some understanding of what's in that dataset, and what constitutes ‘good’ data in the first place.

As an example, let’s ‘deconstruct’ some information:

“The appointment with Dr Watt is on Tuesday at 2:30pm at the Heslington Lane surgery.”

This information contains the following fields of data:

  • Who the appointment is with
  • The day (date) of the appointment
  • The time of the appointment
  • The location of the appointment

If you wanted to record appointments in a computer-based system you would need to use separate ‘fields’ for these — which in a spreadsheet might translate to separate columns.
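
As a rough sketch of the same idea in code, the sentence above could be broken into a record with one named field per piece of data. The class and field names here are assumptions chosen for the example; the point is simply that each field holds one kind of value.

```python
from dataclasses import asdict, dataclass
from datetime import time


@dataclass
class Appointment:
    """One appointment, with a separate field for each piece of data."""
    with_whom: str
    day: str
    start_time: time
    location: str


appointment = Appointment(
    with_whom="Dr Watt",
    day="Tuesday",
    start_time=time(14, 30),  # 2:30pm stored as a proper time value
    location="Heslington Lane surgery",
)

# Each field would become its own column in a spreadsheet.
print(asdict(appointment))
```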

When faced with an existing dataset, our first challenge might well be to reverse this process and rebuild our understanding of what information these fields convey. If you've got the data from a third-party source, look out for any explanatory notes that might help you with this.

Data types

Data processing systems struggle if you don’t stick to recognised data types, or if you add in values that don’t match others in the same context. For instance, in addition to text, spreadsheets observe the following special data types:

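| Data type | Good                                    | Unrecognisable                                    |
|-----------|-----------------------------------------|---------------------------------------------------|
| Number    | 5, 1.6, -350, 0.105                     | About 10, >5, 10-15, 25cm                         |
| Date/time | 01/01/2000, 23-11-1963, 15:30, 17:16:20 | 01.01.2000, Mon or Fri, Next Tuesday, About 10:30 |
| Boolean   | True, False                             | Maybe, ?                                          |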

For software to be able to analyse a number or a date, it needs a number or a date that it can parse: one that it can understand and calculate with. If a value doesn't match the necessary rules to qualify as 'parsable', it will be treated as text. This may have an effect on how you're able to interrogate that data. If you represent a number or date in a way that does not allow the program to determine its type correctly, you will not be able to sort and filter correctly, add up, find averages, find the interval between two dates, and so on. You might be able to understand that 20 + c.10 = c.30, but a computer can't make that leap. You're going to have to clean your data.
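
The same behaviour can be seen in a few lines of Python. This is a sketch, not a model of how any particular spreadsheet works: the function names are made up for the example, and the date check assumes a single day/month/year format, whereas a spreadsheet will usually recognise several.

```python
from datetime import datetime
from typing import Optional


def parse_number(value: str) -> Optional[float]:
    """Return the value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_date(value: str, date_format: str = "%d/%m/%Y") -> Optional[datetime]:
    """Return the value as a datetime, or None if it does not match the format."""
    try:
        return datetime.strptime(value, date_format)
    except ValueError:
        return None


for raw in ["5", "1.6", "-350", "About 10", ">5", "25cm"]:
    print(repr(raw), "->", parse_number(raw))

for raw in ["01/01/2000", "Next Tuesday"]:
    print(repr(raw), "->", parse_date(raw))
```

Values that come back as `None` are the ones that would be left as text, and so could not be summed, averaged, or sorted as numbers or dates.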

Cleaning

The success of any data processing will depend in large part on the quality of the source data you're working with. Data is often messy: columns might contain a mix of text and numerical data; some rows may have missing data; or perhaps you're trying to mash together two separate spreadsheets and the column names don’t quite match, or people have used a label in slightly different ways.

This is when you need to clean your data (a process also known as data munging or data wrangling). You need your data to be in a useful shape for your needs: if you're analysing or visualising data, what information (and types of data) does that analysis or visualisation require?

It’s all about ensuring that your data is validated and quantifiable. For instance, if you have a column of 'fuzzy' dates (e.g. c.1810 or 1990-1997), you might want to create a new column of 'parsed' dates — dates that are machine-readable (e.g. 1810, 1990). This might mean that you're losing some information and nuance from your data, and you'll need to keep that in mind in your analysis. But you'll at least have quantifiable data that you can analyse effectively.
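
As a minimal sketch of that 'parsed' column, the Python below pulls the first four-digit year out of each fuzzy date. Taking the first year of a range is an assumption that matches the example above (1990-1997 becomes 1990); the "unknown" value is a hypothetical addition to show what happens when there is nothing to parse.

```python
import re
from typing import Optional

# Matches the first run of four digits, e.g. the "1810" in "c.1810".
YEAR_PATTERN = re.compile(r"\d{4}")


def parse_year(fuzzy_date: str) -> Optional[int]:
    """Return the first four-digit year in the text, or None if there isn't one."""
    match = YEAR_PATTERN.search(fuzzy_date)
    return int(match.group()) if match else None


fuzzy_dates = ["c.1810", "1990-1997", "unknown"]

# Keep the original value alongside the parsed one so the nuance isn't thrown away.
rows = [(fuzzy, parse_year(fuzzy)) for fuzzy in fuzzy_dates]
print(rows)  # [('c.1810', 1810), ('1990-1997', 1990), ('unknown', None)]
```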

For small, straightforward datasets, you can do data cleaning in a spreadsheet: ensure that numbers and dates are formatted as their appropriate data type, and use filters to help you standardise any recurring text. Excel also has a Query Editor tool that makes a lot of this work easier.

The larger a dataset, the harder it is to work with it in a spreadsheet. Free tools like OpenRefine offer a relatively friendly way to clean up large amounts of data, while programming languages like R and Python have functions and libraries that can help with the tidying process.

Data structures

The way your data is laid out has an impact on how you can analyse it.

'Flat file' and relational data

Data is conventionally displayed as a two-dimensional table (rows and columns). Generally this will be laid out as a relationship between a case (a 'tuple') in each row, and its corresponding attributes (each with their own data type) in columns. Take this example of list structured data from a student fundraiser:

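|   | A          | B         | C       | D    | E         | F         | G       | H              | I            |
|---|------------|-----------|---------|------|-----------|-----------|---------|----------------|--------------|
| 1 | Student ID | Forename  | Surname | Year | College   | Bean bath | 10k run | Parachute jump | Tandem joust |
| 2 | 1001       | David     | Jones   | 2    | Derwith   | 60.00     | 75.50   |                | 55.00        |
| 3 | 1002       | Farrokh   | Bulsara | 1    | Alcricke  | 70.00     |         | 85.00          | 45.50        |
| 4 | 1003       | Catherine | Bush    | 2    | Langbrugh |           | 65.50   | 95.50          | 35.00        |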

Sometimes a single 'flat file' table of rows and columns is not enough. For instance:

You need to work with information about people and the research projects they are involved in. There will be several fields of data about the people, but also several about the projects.

It would be impossible to design one table that is suitable to hold all the data about people and projects, so in this case we create separate tables – one for people and one for projects – and find ways to express the connections between them.

In this example, one person can be involved in many projects, and one project can involve many people. This is a clear indication that the data is relational, and any attempt to work with it using a simple table will entail compromises.

This approach marks out the fundamental difference between a spreadsheet and a relational database.
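
To make the people-and-projects example more concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, column names, and sample rows are all hypothetical; the thing to notice is the third table, which holds one row for each person-project pairing and so expresses the connections between the other two.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projects (project_id INTEGER PRIMARY KEY, title TEXT);
    -- One row per person-project pairing expresses the connections.
    CREATE TABLE involvement (
        person_id INTEGER REFERENCES people (person_id),
        project_id INTEGER REFERENCES projects (project_id),
        PRIMARY KEY (person_id, project_id)
    );
    INSERT INTO people VALUES (1, 'Person A'), (2, 'Person B');
    INSERT INTO projects VALUES (10, 'Project X'), (20, 'Project Y');
    INSERT INTO involvement VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Join the three tables to list who is involved in which project.
pairings = connection.execute(
    """
    SELECT people.name, projects.title
    FROM involvement
    JOIN people ON people.person_id = involvement.person_id
    JOIN projects ON projects.project_id = involvement.project_id
    ORDER BY people.name, projects.title
    """
).fetchall()
print(pairings)
```

Each person and each project is stored once, however many connections they have. In a single flat table you would have to repeat a person's details for every project they're involved in, or a project's details for every person.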


Reshaping your data

Even the fundraising example in the table above may be better thought of as multiple tables: one table could index the students alongside their forenames, surnames, year, and college; a second table could list all the bean bathers (by Student ID) and the corresponding amount raised; a third could list the 10k runners, etc.

Depending on the analysis you need to do, it may be necessary to restructure your data. One common approach is to reorganise your data into what we might call a 'pivotable' format.

In our student fundraiser example, we have multiple columns all sharing the same attribute: amount raised. We might therefore look to move all these values into a single column:

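|    | A          | B         | C       | D    | E         | F              | G      |
|----|------------|-----------|---------|------|-----------|----------------|--------|
| 1  | Student ID | Forename  | Surname | Year | College   | Activity       | Amount |
| 2  | 1001       | David     | Jones   | 2    | Derwith   | Bean bath      | 60.00  |
| 3  | 1001       | David     | Jones   | 2    | Derwith   | 10k run        | 75.50  |
| 4  | 1001       | David     | Jones   | 2    | Derwith   | Tandem joust   | 55.00  |
| 5  | 1002       | Farrokh   | Bulsara | 1    | Alcricke  | Bean bath      | 70.00  |
| 6  | 1002       | Farrokh   | Bulsara | 1    | Alcricke  | Parachute jump | 85.00  |
| 7  | 1002       | Farrokh   | Bulsara | 1    | Alcricke  | Tandem joust   | 45.50  |
| 8  | 1003       | Catherine | Bush    | 2    | Langbrugh | 10k run        | 65.50  |
| 9  | 1003       | Catherine | Bush    | 2    | Langbrugh | Parachute jump | 95.50  |
| 10 | 1003       | Catherine | Bush    | 2    | Langbrugh | Tandem joust   | 35.00  |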

This table looks unusual when we're used to seeing one row per student. Now it's effectively one row per fundraising performance (we might even imagine a unique ID ascribed to each activity a student performs). But it means that all the fundraising amounts are now in the same column (G): we can get a total for that column very easily, and can even filter based on the activity, the student, or any other field. If we're using a spreadsheet, we can use this data in a pivot table, and if we're looking to make a visualisation, this is also the ideal format for a lot of visualisation tools.
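
The reshaping itself can be sketched in a few lines of plain Python. This is one possible approach rather than the only one, and the `unpivot` function is made up for this example. To keep it short it uses just the Student ID and the amounts for two of the students (1001 and 1003), with `None` standing in for an empty cell.

```python
# The wide layout: one row per student, one column per activity.
# None marks an activity the student did not take part in.
ACTIVITIES = ["Bean bath", "10k run", "Parachute jump", "Tandem joust"]
wide_rows = [
    {"Student ID": 1001, "Bean bath": 60.00, "10k run": 75.50,
     "Parachute jump": None, "Tandem joust": 55.00},
    {"Student ID": 1003, "Bean bath": None, "10k run": 65.50,
     "Parachute jump": 95.50, "Tandem joust": 35.00},
]


def unpivot(rows, id_field, value_fields):
    """Return one row per (student, activity) pair, skipping empty cells."""
    long_rows = []
    for row in rows:
        for field in value_fields:
            if row[field] is not None:
                long_rows.append(
                    {id_field: row[id_field], "Activity": field, "Amount": row[field]}
                )
    return long_rows


long_rows = unpivot(wide_rows, "Student ID", ACTIVITIES)

# With every amount in one column, totals and filters become one-liners.
total = sum(row["Amount"] for row in long_rows)
tandem_total = sum(
    row["Amount"] for row in long_rows if row["Activity"] == "Tandem joust"
)
print(len(long_rows), total, tandem_total)
```

Once the amounts share a single column, the overall total and the total for one activity use exactly the same calculation, just with a different filter.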

Restructuring data is not always straightforward. But some of the data wrangling tools below may help you. We've also got some guidance on using spreadsheets to unpivot 'pivoted' data.

Data sources

How do you get hold of data and statistics? Take a look at our guide to the different data sources available.