At York there are two main survey tools: Google Forms and Qualtrics.
In the slides below we take a look at how they work, and consider the respective pros and cons of each.
When putting together a form or a survey, there are a few things to consider. In terms of conducting research, you will need to select an appropriate balance of quantitative and qualitative questions, and the sort of questions you will need to ask will vary according to your discipline. However, there are also some broader principles of design that you should bear in mind:
What is it you're trying to achieve? Get a clear understanding of what information you actually need. It will inform your choice of tools, your question design, and your analysis.
Don't waste people's time on redundant questions! The longer your survey, the less likely people will get to the end of it. Just focus on collecting what is absolutely necessary.
What methods will you use to crunch the responses? This will inform your question types.
How you ask a question will affect the response you receive. Make sure you're asking what you think you're asking. Don't be afraid to ask other people; it's a survey after all!
Lock down the possible answers to your questions to give yourself something you can count without having to read every single response...
Don't ask for data you don't need. And if you're collecting personal or sensitive information, you need to think seriously not only about the questions you ask but also about the way you ask them, the tools you use, and the ways you'll be using the data afterwards.
What's your distribution plan? Or are you just going to spam people with a link they'll never click? Who's allowed to take your survey, and will you allow them to fill it in more than once? Do you know in advance who you're targeting, and if so, can that help you to cut down the questions in your survey? Are people completing the survey online or on paper? Desktop or mobile? If it's a long form, will they be able to take a break from filling it in? Will you send reminders? How many responses will you need?
If you're using a survey tool like Google Forms or Qualtrics, which one can best accommodate your survey design, your distribution plan, and your subsequent analysis?
If a respondent doesn't need to see a question, don't ask them it. You can use branching in a survey to reduce the number of irrelevant questions served up. But the survey still needs to flow logically, and you'll need to test every eventuality. Is everyone seeing the questions they need to see?
Test everything. Read and re-read any introductory text to make sure it makes sense. Test that the questions are understandable. Do some test distributions. Get other people to test it. Gather some test data and make sure you can do what you need to do with it. The last thing you want is for people to not complete your survey because they're frustrated with it, and the second last thing you want is a load of responses that don't make sense to you.
Send it to people and hope. If you need to send reminders, send them. But avoid spamming people. If you spot any problems with your survey once it's released, you'll need to be careful that any modifications don't invalidate the results you've already collected.
Google Forms is part of the Google Apps for Education suite.
It is great for simple forms and surveys and has the benefit that the survey can be locked down to University of York members as a whole, as well as to specific individuals.
Responses can be fed in real time to a Google Sheet.
You need the data you collect to be useable, and those uses will probably include analysis. It's therefore important that you ask the right questions, and set them up in such a way that you can work effectively with the responses you get.
If we're analysing data, we might recognise specific types of information:
Choosing the right question type can help ensure we get the right sort of data back: useable data we can work with.
We can limit the range of potential responses by asking the right questions. Suppose we asked...
How many siblings do you have?
We're probably wanting a numeric answer. But it might not be that simple: What about step-siblings? And do only living siblings count? Your respondent may feel it necessary to provide an essay response to this question, which is going to make your analysis an awful lot harder.
You need to consider as many eventualities as possible when formulating survey questions, and provide as much clarity as you can. It might also be helpful to restrict how your respondents can respond — a process called validation.
If we ask someone...
What did you think of the survey?
...we might expect a range of answers: "I thought it was a bit long", "It was great!", "Well cool", "Rubbish", "Graet!", "I found the question about siblings a bit tricky", etc.
This sort of qualitative answer can be valuable, but it can also be difficult to analyse.
Instead we could provide a controlled multiple-choice list of responses from which our respondents can pick: something that's then going to be a lot easier to count.
Another common approach is a five-point or ten-point scale: "Please rate this survey out of 10", or whatever. Suddenly you're getting a numeric response that you can do proper maths with. But there’s a trade-off: the more you force a respondent to give a quantitative response, the less qualitative information (and the less nuance) you’ll potentially receive.
Even text questions can use validation: for instance to check that a response matches the pattern of an email address, phone number, or postcode. You can also put limits on the amount being written, or accept only numbers. In fact, with regular expression rules you can do very elaborate validation of a 'free'-text response. Just be careful not to annoy your respondents by making your survey too hard to complete. If rules apply, make them clear!
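As a sketch of how this sort of pattern-based validation works behind the scenes, here's some Python using regular expressions. The patterns are simplified for illustration (real-world email and postcode validation is usually more involved):

```python
import re

# Simplified patterns for illustration only -- real-world rules,
# especially for email addresses, are considerably more complex.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# A loose sketch of a UK postcode, e.g. "YO10 5DD"
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$", re.IGNORECASE)

def is_valid_email(text: str) -> bool:
    """Return True if the response looks like an email address."""
    return bool(EMAIL_RE.match(text.strip()))

def is_valid_postcode(text: str) -> bool:
    """Return True if the response looks like a UK postcode."""
    return bool(POSTCODE_RE.match(text.strip()))
```

A survey tool applying rules like these would reject `"not an email"` but accept `"someone@york.ac.uk"`, which is exactly the kind of gatekeeping that keeps your response data clean.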
Qualtrics has far more question types than Google Forms, with advanced options when it comes to structuring and branching your survey. It can also use distribution lists and pull across embedded data from such lists while maintaining anonymous responses.
How do you get hold of data and statistics? Take a look at our guide to the different data sources available:
It's one thing finding some data, but you probably need to manipulate it in some way before you can interrogate it...
We might think of ‘data’ as values stored without context. Through processing that data we can seek to provide context and determine meaning. But even simple spreadsheet operations require us to have some understanding of what's in that dataset, and what constitutes ‘good’ data in the first place.
As an example, let’s ‘deconstruct’ some information:
“The appointment with Dr Watt is on Tuesday at 2:30pm at the Heslington Lane surgery.”
This information contains the following fields of data:

Doctor: Dr Watt
Day: Tuesday
Time: 2:30pm
Location: Heslington Lane surgery
If you wanted to record appointments in a computer-based system you would need to use separate ‘fields’ for these — which in a spreadsheet might translate to separate columns.
When faced with an existing dataset, our first challenge might well be to reverse this process and rebuild our understanding of what information these fields convey. If you've got the data from a third-party source, look out for any explanatory notes that might help you with this.
Data processing systems struggle if you don’t stick to recognised data types, or if you add in values that don’t match others in the same context. For instance, in addition to text, spreadsheets observe the following special data types:

Numbers (including currency and percentage formats)
Dates and times (including day names like Mon or Fri)
Booleans (TRUE/FALSE)
For software to be able to analyse a number or a date, it needs a number or a date that it can parse — that it can understand and calculate with. If a value doesn't match the necessary rules to qualify as 'parsable', it will be treated as text. This may have an effect on how you're able to interrogate that data. If you represent a number or date in a way that does not allow the program to determine its type correctly, you will not be able to sort and filter correctly, you will not be able to add up, find averages, find the interval between two dates, and so on. You might be able to understand that 20 + c.10 = c.30, but a computer can't make that leap. You're going to have to clean your data.
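To see why parsability matters, here's a small Python sketch (the values are invented). The parsable values can be summed; the unparsable ones fall back to being treated as text:

```python
def try_parse_number(value: str):
    """Return the value as a float if it parses, otherwise None."""
    try:
        return float(value)
    except ValueError:
        return None

raw = ["20", "c.10", "15", "unknown"]
parsed = [try_parse_number(v) for v in raw]
# "20" and "15" parse; "c.10" and "unknown" don't, so a program would
# treat them as text -- you can't sum or average the column as-is.
total = sum(v for v in parsed if v is not None)  # only the parsable values
```

A human reading the raw list can guess that "c.10" means "about ten", but the program simply fails to parse it: that's the gap data cleaning has to close.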
The success of any data processing will depend in large part on the quality of the source data you're working with. Data is often messy: columns might contain a mix of text and numerical data; some rows may have missing data; or perhaps you're trying to mash together two separate spreadsheets and the column names don’t quite match, or people have used a label in slightly different ways.
This is when you need to clean your data (a process also known as data munging or data wrangling). You need your data to be in a useful shape for your needs: if you're analysing or visualising data, what information (and types of data) does that analysis or visualisation require?
It’s all about ensuring that your data is validated and quantifiable. For instance, if you have a column of 'fuzzy' dates (e.g. c.1810 or 1990-1997), you might want to create a new column of 'parsed' dates — dates that are machine-readable (e.g. 1810, 1990). This might mean that you're losing some information and nuance from your data, and you'll need to keep that in mind in your analysis. But you'll at least have quantifiable data that you can analyse effectively.
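One simple way to build that 'parsed' column is to pull the first four-digit year out of each fuzzy date. Here's a Python sketch (the dates are invented examples), which, as noted above, deliberately discards the 'circa' qualifier and the end of any range:

```python
import re

def parse_year(fuzzy: str):
    """Extract the first four-digit year from a 'fuzzy' date string.

    'c.1810'    -> 1810  (loses the 'circa' qualifier)
    '1990-1997' -> 1990  (loses the end of the range)
    """
    match = re.search(r"\d{4}", fuzzy)
    return int(match.group()) if match else None

fuzzy_dates = ["c.1810", "1990-1997", "1846"]
parsed_years = [parse_year(d) for d in fuzzy_dates]
```

Keeping the original fuzzy column alongside the parsed one means the lost nuance is still there to consult when you interpret your results.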
For small, straightforward datasets, you can do data cleaning in a spreadsheet: ensure that numbers and dates are formatted as their appropriate data type, and use filters to help you standardise any recurring text. Excel even has a Query Editor tool that makes a lot of this work even easier.
The larger a dataset, the harder it is to work with it in a spreadsheet. Free tools like OpenRefine offer a relatively friendly way to clean up large amounts of data, while programming languages like R and Python have functions and libraries that can help with the tidying process.
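As a flavour of the kind of tidying these tools automate, here's a Python sketch of one common task: mapping variant spellings of the same label to a single standard form. The variants and college names here are invented for illustration:

```python
# Map known variants (lower-cased, whitespace-normalised) to one
# canonical label. The variants listed are invented examples.
CANONICAL = {
    "derwent": "Derwent",
    "derwent college": "Derwent",
    "halifax": "Halifax",
    "halifax college": "Halifax",
}

def standardise(label: str) -> str:
    """Normalise case and whitespace, then map known variants."""
    key = " ".join(label.lower().split())
    return CANONICAL.get(key, label.strip())

raw = ["Derwent", "derwent  College", "HALIFAX", "Vanbrugh"]
clean = [standardise(v) for v in raw]
```

OpenRefine's clustering features do essentially this at scale, suggesting groups of values that probably mean the same thing.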
The way your data is laid out has an impact on how you can analyse it.
Data is conventionally displayed as a two-dimensional table (rows and columns). Generally this will be laid out as a relationship between a case (a 'tuple') in each row, and its corresponding attributes (each with their own data type) in columns. Take this example of list structured data from a student fundraiser:
Student ID | Forename | Surname | Year | College | Bean bath | 10k run | Parachute jump | Tandem joust
Sometimes a single 'flat file' table of rows and columns is not enough. For instance:
You need to work with information about people and the research projects they are involved in. There will be several fields of data about the people, but also several about the projects.
It would be impractical to design one table to hold all the data about people and projects, so in this case we create separate tables – one for people and one for projects – and find ways to express the connections between them.
In this example, one person can be involved in many projects, and one project can involve many people. This is a clear indication that the data is relational, and any attempt to work with it using a simple table will entail compromises.
This approach marks out the fundamental difference between a spreadsheet and a relational database.
Even the fundraising example in the table above may be better thought of as multiple tables: one table could index the students alongside their forenames, surnames, year, and college; a second table could list all the bean bathers (by Student ID) and the corresponding amount raised; a third could list the 10k runners, etc.
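The people-and-projects example above can be sketched in Python as three tables, with a 'link' table expressing the many-to-many relationship. The names, titles, and IDs are invented for illustration:

```python
# Two 'tables' keyed by ID, plus a link table of (person, project) pairs.
people = {1: {"name": "Asha"}, 2: {"name": "Ben"}}
projects = {10: {"title": "Survey design"}, 20: {"title": "Data cleaning"}}

# Each row pairs one person with one project they are involved in.
involvement = [(1, 10), (1, 20), (2, 10)]

def projects_for(person_id):
    """All project titles a given person is involved in."""
    return [projects[proj]["title"]
            for pers, proj in involvement if pers == person_id]

def people_on(project_id):
    """All people involved in a given project."""
    return [people[pers]["name"]
            for pers, proj in involvement if proj == project_id]
```

In a relational database the link table would be a table in its own right, and the lookups would be joins; the shape of the solution is the same.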
Depending on the analysis you need to do, it may be necessary to restructure your data. One common approach is to reorganise your data into what we might call a 'pivotable' format.
In our student fundraiser example, we have multiple columns all sharing the same attribute: amount raised. We might therefore look to move all these values into a single column:
This table looks unusual when we're used to seeing one row per student. Now it's effectively one row per fundraising performance (we might even imagine a unique ID ascribed to each activity a student performs). But it means that all the fundraising amounts are now in the same column (G): we can get a total for that column very easily, and can even filter based on the activity, the student, or any other field. If we're using a spreadsheet, we can use this data in a pivot table, and if we're looking to make a visualisation, this is also the ideal format for a lot of visualisation tools.
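Here's a minimal Python sketch of this 'unpivot' restructuring, turning one-row-per-student data into one-row-per-fundraising-performance (the amounts are invented examples):

```python
# Wide format: one row per student, one column per activity.
wide_rows = [
    {"Student ID": "S1", "Bean bath": 40, "10k run": 0, "Parachute jump": 100},
    {"Student ID": "S2", "Bean bath": 0, "10k run": 55, "Parachute jump": 0},
]
activities = ["Bean bath", "10k run", "Parachute jump"]

# Long ('pivotable') format: one row per (student, activity, amount).
long_rows = [
    {"Student ID": row["Student ID"], "Activity": act, "Amount": row[act]}
    for row in wide_rows
    for act in activities
    if row[act]  # skip activities the student didn't take part in
]

# All amounts now live in one field, so totals and filters are easy.
total_raised = sum(r["Amount"] for r in long_rows)
```

Spreadsheet pivot tables, and functions like `melt` in R and Python data libraries, perform this same wide-to-long reshaping.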
Restructuring data is not always straightforward. But some of the data wrangling tools below may help you. We've also got some guidance on using spreadsheets to unpivot 'pivoted' data.