When putting together a form or a survey, there are a few things to consider. In terms of conducting research, you will need to select an appropriate balance of quantitative and qualitative questions, and the sort of questions you will need to ask will vary according to your discipline. However, there are also some broader principles of design that you should bear in mind:
What is it you're trying to achieve? Get a clear understanding of what information you actually need. It will inform your choice of tools, your question design, and your analysis.
Don't waste people's time on redundant questions! The longer your survey, the less likely people will get to the end of it. Just focus on collecting what is absolutely necessary.
What methods will you use to crunch the responses? This will inform your question types.
How you ask a question will affect the response you receive. Make sure you're asking what you think you're asking. Don't be afraid to ask other people; it's a survey after all!
Lock down the possible answers to your questions to give yourself something you can count without having to read every single response...
Don't ask for data you don't need. And if you're collecting personal or sensitive information, you need to think seriously not only about the questions you ask but also about the way you ask them, the tools you use, and the ways you'll be using the data afterwards.
What's your distribution plan? Or are you just going to spam people with a link they'll never click? Who's allowed to take your survey, and will you allow them to fill it in more than once? Do you know in advance who you're targeting, and if so, can that help you to cut down the questions in your survey? Are people completing the survey online or on paper? Desktop or mobile? If it's a long form, will they be able to take a break from filling it in? Will you send reminders? How many responses will you need?
If you're using a survey tool like Google Forms or Qualtrics, which one can best accommodate your survey design, your distribution plan, and your subsequent analysis?
If a respondent doesn't need to see a question, don't ask it. You can use branching in a survey to reduce the number of irrelevant questions served up. But the survey still needs to flow logically, and you'll need to test every eventuality. Is everyone seeing the questions they need to see?
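One way to test every eventuality is to write the branching rules down and walk through every possible combination of answers. The sketch below is a minimal illustration under stated assumptions, not a feature of Google Forms or Qualtrics: the three questions, their IDs, and the `SURVEY` rule format are all hypothetical.

```python
# Hypothetical branching rules: for each question, the options on offer and
# which question comes next for each answer ("*" = any other answer, None = end).
SURVEY = {
    "Q1": {"options": ["Yes", "No"], "next": {"Yes": "Q2", "*": "Q3"}},
    "Q2": {"options": ["1", "2", "3 or more"], "next": {"*": "Q3"}},
    "Q3": {"options": ["Good", "OK", "Bad"], "next": {"*": None}},
}

def all_paths(question="Q1", path=()):
    """Yield every route through the survey as a tuple of (question, answer) pairs."""
    if question is None:  # reached the end of the survey
        yield path
        return
    rules = SURVEY[question]
    for answer in rules["options"]:
        next_q = rules["next"].get(answer, rules["next"].get("*"))
        yield from all_paths(next_q, path + ((question, answer),))

paths = list(all_paths())
for p in paths:
    print(" -> ".join(f"{q}={a}" for q, a in p))

# Any question that no route reaches is a sign the branching is broken.
reachable = {q for p in paths for q, _ in p}
print("Unreachable:", sorted(set(SURVEY) - reachable) or "none")
```

Each printed line is one respondent's journey, which makes it easier to check that nobody is shown an irrelevant question or skips one they should see.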
Test everything. Read and re-read any introductory text to make sure it makes sense. Test that the questions are understandable. Do some test distributions. Get other people to test it. Gather some test data and make sure you can do what you need to do with it. The last thing you want is for people to not complete your survey because they're frustrated with it, and the second last thing you want is a load of responses that don't make sense to you.
Send it to people and hope. If you need to send reminders, send them. But avoid spamming people. If you spot any problems with your survey once it's released, you'll need to be careful that any modifications don't invalidate the results you've already collected.
From working out what you're asking and who you want to fill out your survey, to making sure you make your survey clear and have questions that people can actually fill out, there's a lot to think about when designing surveys. We look at 10 steps for designing forms and surveys, with prompts to help you think about your own data collection.
When we ask people questions in real life we get whatever response the people choose to give us. They have a free choice as to how they respond.
If we ask someone...
What's your favourite colour?
...we might expect a range of answers: single word responses like "Blue", "Red", "Green", etc., but also "Ooh, I don't know...", "I think probably yellow but I'm also rather partial to purple", or even a lengthy response detailing their thoughts on every colour, and how their favourite colour changes according to mood.
This sort of qualitative answer can be valuable: it tells us a lot more about a person's views than a simple one-word answer. But it can also be difficult to analyse. So surveys often look to close down questions to restrict the number of possible responses to something more measurable: more quantitative. We might reduce "What's your favourite colour?" to a multiple choice range of options if we just want a quick answer and we're not really interested in someone's life story.
But perhaps our study is really interested in finding out the detail about people's colour preferences, and really does want to get some rich information back. In which case we'll want to offer a free text question on the topic.
And actually there are rather a lot of colours when you start to think about it, and pre-empting what colours people might choose could get difficult. In which case we might at the very least want a mix of the two approaches: an "If other..." question so that people can still say vermilion, heliotrope, or chartreuse, even if we overlooked them in our list of options.
Free text fields in a survey can have a number of uses, but generally speaking they let your respondent provide the response they want to give.
Free text fields generally come in two different flavours: short answer and long answer.
Essentially the difference comes down to the size of box. Google Forms and Qualtrics actually have a few different lengths available to suit your needs, but in principle we can think of text boxes just in terms of those two types.
Short answer boxes encourage brevity. They're useful in cases where we can't realistically pre-empt a range of possible answers, for instance with a Name field. But we might also use them elsewhere (like in our colour example) to give the hint that we don't want a long answer.
Long answer boxes give a respondent the sense that they're allowed to write at length (and the actual size of the box may go some way to indicating the length expected).
Multiple choice options are easily quantifiable in a way that free text isn't. The aim is to effectively restrict the range of possible responses to something more manageable and more countable. When completed, a multiple choice question gives us a uniform response, or a list of (usually comma-separated) uniform responses, which we can unpack and analyse.
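As a rough illustration of unpacking those comma-separated responses, here's a sketch in Python. The export format (one comma-separated string per respondent) and the colour options are assumptions; check what your own tool actually produces.

```python
from collections import Counter

# Hypothetical export of a checkbox question: each respondent's selections
# arrive as a single comma-separated string.
responses = ["Blue, Green", "Red", "Blue, Red, Purple", ""]

# Split each cell, tidy the whitespace, skip blanks, and count each option.
counts = Counter(
    option.strip()
    for cell in responses
    for option in cell.split(",")
    if option.strip()
)
print(counts.most_common())
```

This naive split will go wrong if any of your option labels contain a comma themselves, which is one more reason to keep option labels short and simple.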
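Multiple choice questions come in two main flavours: radio buttons, which allow only one option to be selected, and checkboxes, which allow any number of options to be selected.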
Once triggered, a radio button question can't easily be unselected, so you might want to make sure there are sufficient options to allow your respondent to 'escape' a situation where they're forced to inaccurately answer a question.
If you're making a multiple choice list, you'll need to be sure you're including every option you should. That may also mean including a way of specifying none of the above, not applicable, or other. While multiple choice questions are usually about limiting the choice available to respondents, it can sometimes be useful to offer a bit more freedom, and both Google Forms and Qualtrics allow you to specify an "Other" option with an accompanying free text field so that people can provide their own options.
Linear scale questions are a form of radio button multiple choice:
1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|
strongly disagree | | | | strongly agree |
These are handy because you get a number to analyse rather than a piece of text that can be difficult to deal with. But the risk is that you just end up with an awful lot of 4s or 7s that don't really tell you as much as you might like.
If you're using questions like this, the convention is to go from least positive value (left) to most positive value (right). People don't always read what you're asking, so try to stick with convention.
If you've got several scale-type questions in a row, you can arrange them as a matrix table question:
Statement | Strongly disagree | Disagree | Neither | Agree | Strongly agree | N/A |
---|---|---|---|---|---|---|
They group... | | | | | | |
...your questions... | | | | | | |
...together for... | | | | | | |
...more efficient... | | | | | | |
...use of space. | | | | | | |
If you're using multiple choice questions of any kind in Qualtrics (and particularly scale questions), be sure to check that the underlying values match what's being shown on screen, otherwise you might have problems when it comes to analysis.
Google Forms includes special date and time pickers which give properly formatted dates and times for analysis. Qualtrics has a whole load of additional question types including graphics-based questions, sliders, and ranking. Both also have the option to upload a file.
Even a free text field can be customised to behave in certain ways and only accept certain kinds of data. Take a look at the section on Validating data to find out more.
You need the data you collect to be useable, and those uses will probably include analysis. It's therefore important that you ask the right questions, and set them up in such a way that you can work effectively with the responses you get.
If we're analysing data, we might recognise specific types of information:
Choosing the right question type can help ensure we get the right sort of data back: useable data we can work with.
We can limit the range of potential responses by asking the right questions. Suppose we asked...
How many siblings do you have?
We're probably wanting a numeric answer. But it might not be that simple: what about step-siblings? And is it only living siblings that count? Your respondent may feel it necessary to provide an essay response to this question, which is going to make your analysis an awful lot harder.
You need to consider as many eventualities as possible when formulating survey questions, and provide as much clarity as you can. It might also be helpful to restrict how your respondents can respond — a process called validation.
If we ask someone...
What did you think of the survey?
...we might expect a range of answers: "I thought it was a bit long", "It was great!", "Well cool", "Rubbish", "Graet!", "I found the question about siblings a bit tricky", etc.
This sort of qualitative answer can be valuable, but it can also be difficult to analyse.
Instead we could provide a controlled multiple-choice list of responses from which our respondents can pick: something that's then going to be a lot easier to count.
Another common approach is a five-point or ten-point scale: "Please rate this survey out of 10", or whatever. Suddenly you're getting a numeric response that you can do proper maths with. But there’s a trade-off: the more you force a respondent to give a quantitative response, the less qualitative information (and the less nuance) you’ll potentially receive.
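To show the difference in practice, here's a minimal sketch using Python's standard library on some made-up responses: a controlled list can simply be counted, while a numeric rating also supports averages.

```python
from collections import Counter
from statistics import mean, median

# Made-up answers to a controlled multiple-choice question...
length_feedback = ["Too long", "About right", "About right", "Too short", "About right"]
# ...and to "Please rate this survey out of 10".
ratings = [7, 8, 4, 9, 7, 10, 6]

print(Counter(length_feedback).most_common())  # how many chose each option
print(f"mean {mean(ratings):.2f}, median {median(ratings)}")
```

Neither result tells you *why* someone found the survey too long; that's the nuance you trade away.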
Even text questions can use validation: for instance to check that a response matches the pattern of an email address, phone number, or postcode. You can also put limits on the amount being written, or accept only numbers. In fact, with regular expression rules you can do very elaborate validation of a 'free'-text response. Just be careful not to annoy your respondents by making your survey too hard to complete. If rules apply, make them clear!
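Here's a minimal sketch of that kind of pattern-matching in Python. The patterns are deliberately simplified assumptions (real email addresses and UK postcodes follow more rules than this), and the exact regular expression syntax your survey tool accepts may differ.

```python
import re

# Deliberately simplified patterns, for illustration only.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Rough shape of a UK postcode such as "YO10 5DD" (not exhaustive).
UK_POSTCODE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}", re.IGNORECASE)
WHOLE_NUMBER = re.compile(r"[0-9]+")

def is_valid(pattern, response):
    """True if the entire response (ignoring outer spaces) matches the pattern."""
    return pattern.fullmatch(response.strip()) is not None

print(is_valid(EMAIL, "someone@york.ac.uk"))   # a well-formed address
print(is_valid(UK_POSTCODE, "yo10 5dd"))       # case is ignored
print(is_valid(WHOLE_NUMBER, "about 2"))       # not just digits, so rejected
```

Using a full match (rather than searching anywhere in the text) is what stops "about 2" slipping through a digits-only rule.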
At York there are two main survey tools: Google Forms and Qualtrics.
In the slides below we take a look at how they work, and consider the respective pros and cons of each.
Google Forms is part of the Google Apps for Education suite.
It is great for simple forms and surveys and has the benefit that the survey can be locked down to University of York members as a whole, as well as to specific individuals.
Responses can be fed in real time to a Google Sheet.
Qualtrics has far more question types than Google Forms, with advanced options when it comes to structuring and branching your survey. It can also use distribution lists and pull across embedded data from such lists while maintaining anonymous responses.
Getting a show of hands is one thing, but sometimes you might want to get a bit more sophisticated in terms of audience polling. As well as traditional survey tools, the University has an institutional licence for Mentimeter, which can be used to provide real-time feedback in a presentation setting.
There's also the Q&A tool available in Google Slides.
And if you're presenting online, Zoom has a polling tool built in.
It's one thing finding some data, but you probably need to manipulate it in some way before you can interrogate it...
We might think of ‘data’ as values stored without context. Through processing that data we can seek to provide context and determine meaning. But even simple spreadsheet operations require us to have some understanding of what's in that dataset, and what constitutes ‘good’ data in the first place.
As an example, let’s ‘deconstruct’ some information:
“The appointment with Dr Watt is on Tuesday at 2:30pm at the Heslington Lane surgery.”
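This information contains the following fields of data: the doctor (Dr Watt), the day (Tuesday), the time (2:30pm), and the location (Heslington Lane surgery).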
If you wanted to record appointments in a computer-based system you would need to use separate ‘fields’ for these — which in a spreadsheet might translate to separate columns.
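As a sketch of that idea, the appointment can be held as a record with one attribute per field and written out as a header row plus a data row, much as a spreadsheet would store it. The field names are illustrative choices, and the time has been converted to 24-hour format.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields

@dataclass
class Appointment:
    # One attribute per field of data; in a spreadsheet, one column each.
    doctor: str
    day: str
    time: str
    location: str

appointment = Appointment("Dr Watt", "Tuesday", "14:30", "Heslington Lane surgery")

# Write a header row followed by the data row, as a spreadsheet would lay it out.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(f.name for f in fields(Appointment))
writer.writerow(astuple(appointment))
print(buffer.getvalue())
```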
When faced with an existing dataset, our first challenge might well be to reverse this process and rebuild our understanding of what information these fields convey. If you've got the data from a third-party source, look out for any explanatory notes that might help you with this.
Data processing systems struggle if you don’t stick to recognised data types, or if you add in values that don’t match others in the same context. For instance, in addition to text, spreadsheets recognise the following special data types:
Data type | Good | Unrecognisable |
---|---|---|
Number | 5, 1.6, -350, 0.105 | About 10, >5, 10-15, 25cm |
Date/time | 01/01/2000, 23-11-1963, 15:30, 17:16:20 | 01.01.2000, Mon or Fri, Next Tuesday, About 10:30 |
Boolean | True, False | Maybe, ? |
For software to be able to analyse a number or a date, it needs a number or a date that it can parse: one that it can understand and calculate with. If a value doesn't match the rules needed to qualify as 'parsable', it will be treated as text. This may affect how you're able to interrogate that data. If you represent a number or date in a way that does not allow the program to determine its type correctly, you won't be able to sort and filter it properly, add it up, find averages, or calculate the interval between two dates. You might be able to understand that 20 + c.10 = c.30, but a computer can't make that leap. You're going to have to clean your data.
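The sketch below mimics, very roughly, how software decides whether a value is parsable, using examples from the table. It is an illustration with assumptions: the accepted date/time formats are hand-picked here, and real spreadsheet software applies its own (often locale-dependent) rules.

```python
from datetime import datetime

# Assumed formats for this sketch; real software has its own (often locale-specific) rules.
DATE_FORMATS = ["%d/%m/%Y", "%d-%m-%Y", "%H:%M", "%H:%M:%S"]

def parse_value(raw):
    """Classify a cell as boolean, number, date/time, or (failing all those) text."""
    text = raw.strip()
    if text in ("True", "False"):
        return "boolean", text == "True"
    try:
        return "number", float(text)
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            return "date/time", datetime.strptime(text, fmt)
        except ValueError:
            continue
    return "text", text

for cell in ["1.6", "About 10", "23-11-1963", "Next Tuesday", "17:16:20", "Maybe"]:
    print(f"{cell!r:>14} -> {parse_value(cell)[0]}")
```

Running `parse_value` over `"20"` and `"c.10"` gives a number and a piece of text respectively, which is why the sum can't be done until the second value is cleaned.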
The success of any data processing will depend in large part on the quality of the source data you're working with. Data is often messy: columns might contain a mix of text and numerical data; some rows may have missing data; or perhaps you're trying to mash together two separate spreadsheets and the column names don’t quite match, or people have used a label in slightly different ways.
This is when you need to clean your data (a process also known as data munging or data wrangling). You need your data to be in a useful shape for your needs: if you're analysing or visualising data, what information (and types of data) does that analysis or visualisation require?
It’s all about ensuring that your data is validated and quantifiable. For instance, if you have a column of 'fuzzy' dates (e.g. c.1810 or 1990-1997), you might want to create a new column of 'parsed' dates — dates that are machine-readable (e.g. 1810, 1990). This might mean that you're losing some information and nuance from your data, and you'll need to keep that in mind in your analysis. But you'll at least have quantifiable data that you can analyse effectively.
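A minimal sketch of that kind of parsing in Python, assuming each fuzzy date contains a four-digit year and that taking the first year is acceptable for your analysis. The original values are kept alongside the parsed ones so the nuance isn't lost entirely.

```python
import re

def parse_year(fuzzy):
    """Return the first standalone four-digit year in a fuzzy date, or None."""
    # (?<!\d) and (?!\d) stop us matching four digits inside a longer number.
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", fuzzy)
    return int(match.group(1)) if match else None

# Keep the original 'fuzzy' column and add a parsed column next to it.
fuzzy_dates = ["c.1810", "1990-1997", "1850s", "unknown"]
table = [{"date (as recorded)": d, "year (parsed)": parse_year(d)} for d in fuzzy_dates]
for row in table:
    print(row)
```

Values that can't be parsed come back as `None` rather than a guess, so they stay visible for manual review.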
For small, straightforward datasets, you can do data cleaning in a spreadsheet: ensure that numbers and dates are formatted as their appropriate data type, and use filters to help you standardise any recurring text. Excel also has a Query Editor tool that makes a lot of this work easier.
The larger a dataset, the harder it is to work with it in a spreadsheet. Free tools like OpenRefine offer a relatively friendly way to clean up large amounts of data, while programming languages like R and Python have functions and libraries that can help with the tidying process.
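For instance, standardising recurring text in Python might look something like this sketch. The college names come from the fundraiser example below, but the messy variants and the `CANONICAL` lookup are hypothetical: in practice you'd build the lookup after reviewing the distinct values in your column, much as you would with a spreadsheet filter.

```python
# Hypothetical messy entries for the same three colleges.
raw = ["Derwith", "derwith ", "DERWITH", "Alcricke", "alcricke  college", "Langbrugh"]

# Hypothetical lookup from a normalised variant to the agreed label.
CANONICAL = {
    "derwith": "Derwith",
    "alcricke": "Alcricke",
    "alcricke college": "Alcricke",
    "langbrugh": "Langbrugh",
}

def standardise(label):
    """Collapse whitespace and case, then map to the agreed label if known."""
    key = " ".join(label.split()).lower()
    return CANONICAL.get(key, label)  # unknown values pass through for manual review

cleaned = [standardise(x) for x in raw]
print(sorted(set(cleaned)))
```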
The way your data is laid out has an impact on how you can analyse it.
Data is conventionally displayed as a two-dimensional table (rows and columns). Generally this will be laid out as a relationship between a case (a 'tuple') in each row, and its corresponding attributes (each with their own data type) in columns. Take this example of list-structured data from a student fundraiser:
| | A | B | C | D | E | F | G | H | I |
---|---|---|---|---|---|---|---|---|---|
1 | Student ID | Forename | Surname | Year | College | Bean bath | 10k run | Parachute jump | Tandem joust |
2 | 1001 | David | Jones | 2 | Derwith | 60.00 | 75.50 | | 55.00 |
3 | 1002 | Farrokh | Bulsara | 1 | Alcricke | 70.00 | | 85.00 | 45.50 |
4 | 1003 | Catherine | Bush | 2 | Langbrugh | | 65.50 | 95.50 | 35.00 |
Sometimes a single 'flat file' table of rows and columns is not enough. For instance:
You need to work with information about people and the research projects they are involved in. There will be several fields of data about the people, but also several about the projects.
It would be very hard to design one table that is suitable to hold all the data about people and projects, so in this case we create separate tables, one for people and one for projects, and find ways to express the connections between them.
In this example, one person can be involved in many projects, and one project can involve many people. This is a clear indication that the data is relational, and any attempt to work with it using a simple table will entail compromises.
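As a sketch of how a relational database expresses that many-to-many connection, here's an example using Python's built-in `sqlite3` module. The people, projects, and table names are hypothetical; the key idea is the third 'link' table, which holds one row per person-project pairing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE people   (person_id  INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE projects (project_id INTEGER PRIMARY KEY, title TEXT);
-- Link table: one row per (person, project) pairing expresses many-to-many.
CREATE TABLE involvement (
    person_id  INTEGER REFERENCES people(person_id),
    project_id INTEGER REFERENCES projects(project_id),
    PRIMARY KEY (person_id, project_id)
);
""")
con.executemany("INSERT INTO people VALUES (?, ?)", [(1, "Ada"), (2, "Ben")])
con.executemany("INSERT INTO projects VALUES (?, ?)", [(10, "Survey design"), (20, "Data cleaning")])
con.executemany("INSERT INTO involvement VALUES (?, ?)", [(1, 10), (1, 20), (2, 20)])

# Join the three tables back together to see who works on what.
rows = con.execute("""
    SELECT projects.title, people.name
    FROM involvement
    JOIN people   USING (person_id)
    JOIN projects USING (project_id)
    ORDER BY projects.title, people.name
""").fetchall()
for title, name in rows:
    print(title, "-", name)
```

Each person and each project is stored once; adding someone to another project is just one more row in `involvement`.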
This approach marks out the fundamental difference between a spreadsheet and a relational database.
Even the fundraising example in the table above may be better thought of as multiple tables: one table could index the students alongside their forenames, surnames, year, and college; a second table could list all the bean bathers (by Student ID) and the corresponding amount raised; a third could list the 10k runners, etc.
Depending on the analysis you need to do, it may be necessary to restructure your data. One common approach is to reorganise your data into what we might call a 'pivotable' format.
In our student fundraiser example, we have multiple columns all sharing the same attribute: amount raised. We might therefore look to move all these values into a single column:
| | A | B | C | D | E | F | G |
---|---|---|---|---|---|---|---|
1 | Student ID | Forename | Surname | Year | College | Activity | Amount |
2 | 1001 | David | Jones | 2 | Derwith | Bean bath | 60.00 |
3 | 1001 | David | Jones | 2 | Derwith | 10k run | 75.50 |
4 | 1001 | David | Jones | 2 | Derwith | Tandem joust | 55.00 |
5 | 1002 | Farrokh | Bulsara | 1 | Alcricke | Bean bath | 70.00 |
6 | 1002 | Farrokh | Bulsara | 1 | Alcricke | Parachute jump | 85.00 |
7 | 1002 | Farrokh | Bulsara | 1 | Alcricke | Tandem joust | 45.50 |
8 | 1003 | Catherine | Bush | 2 | Langbrugh | 10k run | 65.50 |
9 | 1003 | Catherine | Bush | 2 | Langbrugh | Parachute jump | 95.50 |
10 | 1003 | Catherine | Bush | 2 | Langbrugh | Tandem joust | 35.00 |
This table looks unusual when we're used to seeing one row per student. Now it's effectively one row per fundraising performance (we might even imagine a unique ID ascribed to each activity a student performs). But it means that all the fundraising amounts are now in the same column (G): we can get a total for that column very easily, and can even filter based on the activity, the student, or any other field. If we're using a spreadsheet, we can use this data in a pivot table, and if we're looking to make a visualisation, this is also the ideal format for a lot of visualisation tools.
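The same unpivoting can be sketched in a few lines of Python, using the amounts from the wide fundraiser table. Blank cells are represented as `None` and skipped; the `unpivot` function is written for this illustration rather than taken from any library (pandas users would reach for `DataFrame.melt`, which does a similar job).

```python
# Wide layout: one row per student, one column per activity (None = blank cell).
ACTIVITIES = ["Bean bath", "10k run", "Parachute jump", "Tandem joust"]
wide = [
    {"Student ID": 1001, "Forename": "David", "Surname": "Jones", "Year": 2, "College": "Derwith",
     "Bean bath": 60.00, "10k run": 75.50, "Parachute jump": None, "Tandem joust": 55.00},
    {"Student ID": 1002, "Forename": "Farrokh", "Surname": "Bulsara", "Year": 1, "College": "Alcricke",
     "Bean bath": 70.00, "10k run": None, "Parachute jump": 85.00, "Tandem joust": 45.50},
    {"Student ID": 1003, "Forename": "Catherine", "Surname": "Bush", "Year": 2, "College": "Langbrugh",
     "Bean bath": None, "10k run": 65.50, "Parachute jump": 95.50, "Tandem joust": 35.00},
]

def unpivot(rows, value_columns, name_col="Activity", value_col="Amount"):
    """Wide -> long: one output row per non-blank cell in the value columns."""
    long_rows = []
    for row in rows:
        # Columns that identify the case (the student) are copied onto every output row.
        ids = {key: val for key, val in row.items() if key not in value_columns}
        for column in value_columns:
            if row[column] is not None:
                long_rows.append({**ids, name_col: column, value_col: row[column]})
    return long_rows

long_rows = unpivot(wide, ACTIVITIES)
print(len(long_rows), "rows; total raised:", sum(r["Amount"] for r in long_rows))
print([r["Forename"] for r in long_rows if r["Activity"] == "Tandem joust"])
```

Once every amount sits in one `Amount` field, totalling and filtering by activity become one-liners, just as with column G in the spreadsheet.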
Restructuring data is not always straightforward. But some of the data wrangling tools below may help you. We've also got some guidance on using spreadsheets to unpivot 'pivoted' data.
How do you get hold of data and statistics? Take a look at our guide to the different data sources available: