On this page we explore the wonderful world of digital audio...
We start by getting our ears around what it actually is, before taking a look at waveforms and how to edit them, and considering how to get the volume right. We investigate some common effects we can apply to our recordings, and how we can work with more than one audio track at the same time.
As usual for this guide, you'll also find on this page a copy of a presentation about audio editing, a breakdown of available wave editing software at York, a look at some common audio filetypes, and suggestions for sourcing sounds from the internet.
Sound is a wave-like vibration of air (or other material). We can capture that sound by capturing an impression of that vibration.
In a gramophone recording, the sound vibrated against a needle which cut a correspondingly jagged groove in a rotating disc. During playback, the groove vibrates the needle which vibrates the air. You can test this with a vinyl record and a five pound note:
This is an analogue recording: the shape of the soundwave is essentially directly captured, like for like, in the recording medium (in this case the jagged groove of the vinyl) — much like how a seismograph draws a wavy line when jostled by an earthquake.
The wave might change medium more than once before it's captured: to be recorded on magnetic tape, the vibration hits a microphone which produces a corresponding electrical current which produces a corresponding magnetic field. But whatever the medium, it's ultimately a direct replication of the original waveform.
Digital sound is not like analogue sound... it's more like animation: a series of snapshots (samples) arranged together to create the illusion of continuous sound, just as an animation is a series of still images. The more snapshots taken, the more accurate the representation of the waveform.
In the above graph, the blue samples are snapshots of the sound wave. There are two dimensions to those samples: the sample rate (x-axis) and the bit depth (y-axis). Generally you'd just press record and not worry about it, but here are the gory details:
The sample rate is the number of samples being made in a given period of time. Think of it like frames of film....
The above gif is made up of 15 still images (numbered 2-16) animating at 10 frames per second (10 fps or 10 Hz). Most cinema film animates at 24 fps (24 Hz): that's sufficient to trick the eye for most humans.
Sound needs a lot more frames to be understandable, not least because sound itself is made up of tiny vibrations: human hearing can discern vibrations between 20 and 20,000 Hz as different pitches, so 24 samples per second is not going to come remotely close to replicating the nuances of those sounds. A sample rate of 8,000 Hz serves for basic telephone communication (though ess-es sound like effs), but to properly capture the full range of human hearing you need at least two sample points per cycle of the highest pitch we can hear (in other words, double 20,000 Hz). That's why CD audio has a sample rate of 44,100 Hz (or 44.1 kHz).
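To make that doubling concrete, here's a tiny, purely illustrative sketch in Python (the figures are just the ones quoted above):

```python
# To capture a vibration you need at least two samples per cycle, so the
# sample rate must be at least double the highest frequency you want to keep.
highest_audible_hz = 20_000
minimum_sample_rate = 2 * highest_audible_hz   # 40,000 Hz
cd_sample_rate = 44_100                        # CD audio, with a little headroom to spare
print(minimum_sample_rate, cd_sample_rate)     # 40000 44100
```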
The free sound editing tool Audacity supports rates from 8 kHz to 384 kHz. In theory, 60 kHz captures more detail than the human ear can discern, and 48 kHz is considered the standard rate for most uses.
Bit depth is the amount of information being recorded in each sample. If samples are like frames of animation, bit depth is like the size of the image being animated, or, more accurately, like the number of colours being used in that image.
The above image uses a 4-bit colour palette: it is made from just 16 colours (4 binary digits give values from 0000 to 1111, or 0 to 15, which is 16 values in all). Because there are only 16 colours to play with, the natural colour of the rabbit photo is reduced to blocks of the nearest available colour in the palette.
With audio bit depth, the 'colours' are amplitudes (amplitude is the amount of vibration occurring in the sound medium — the volume of the sound). The amplitude of a wave at a particular sample point is rounded to the nearest available value ('quantization'). Let's have that graph again from earlier to see that happening:
The greater the bit depth, the more accurate the representation of the sound (the 'resolution'). CD audio uses a bit depth of 16 which gives a resolution of 65,536 possible amplitudes. To carry on with our colour palette analogy, that's something looking more like this:
24-bit audio offers over 16 million values and is considered to be of a professional standard. It's a close approximation to human hearing capabilities.
In addition to 16 and 24 bit formats, the free sound editing tool Audacity has a 32-bit option, albeit using a 'floating point' method which gives a greater rounding error at larger values. For most purposes, 24-bit is more than sufficient.
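If you'd like to see the idea in code, here's a rough sketch of quantization (assuming Python with numpy; it illustrates the principle rather than how any particular editor implements it):

```python
import numpy as np

def quantize(samples, bits):
    """Round each sample to the nearest of the 2**bits amplitudes available."""
    levels = 2 ** bits                       # e.g. 65,536 levels at 16-bit
    step = 2.0 / levels                      # sample values run from -1.0 to +1.0
    return np.round(samples / step) * step   # snap to the nearest available amplitude

t = np.linspace(0, 1, 44_100)                # one second at a 44.1 kHz sample rate
wave = np.sin(2 * np.pi * 440 * t)           # a 440 Hz tone
blocky = quantize(wave, 4)                   # audibly 'steppy', like the 16-colour rabbit
cd_quality = quantize(wave, 16)              # 65,536 levels: far closer to the original
```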
Most audio editing is done using a waveform. Here's a waveform of one of us trying to sing:
What you've got here is the vibrations of sound plotted over time (x-axis). The louder the sound, the greater the vibration: the amplitude (y-axis). When there is little to no sound, you get a straight horizontal line. When there is sustained noise, you get sharp zig-zags of activity diverging from that line.
Moments of activity can easily be identified in a waveform. For instance, if we zoom in we can see distinct syllables of words:
Where… | the | du | – | cks | play | foot | – | ball…
The higher the spike, the louder the sound. For instance, the 'ck' sound in 'duck' peaks higher than the rest of the waveform in the example above — it is louder than the rest of that section. Plosive sounds like 'p', 't', and 'k' often pick up particularly strongly on microphones owing to the blast of air they produce (this is why some microphones are fitted with pop shields that mitigate this effect).
Waveforms are pretty easy to edit: you can select bits of the wave in the same way that you would select text in a text editing program like Word or Google Docs. And you can cut and paste in a wave editor like Audacity in much the same way that you can in a text editor. You could, for instance, quite easily cut out the word "foot" from the above waveform without it sounding too weird.
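To show just how simple a cut is at the sample level, here's a quick sketch (assuming Python with numpy and a 44.1 kHz mono recording; the function is purely hypothetical):

```python
import numpy as np

SAMPLE_RATE = 44_100

def cut(samples, start_seconds, end_seconds):
    """Remove everything between the two times and join what's left back together."""
    start = int(start_seconds * SAMPLE_RATE)
    end = int(end_seconds * SAMPLE_RATE)
    return np.concatenate((samples[:start], samples[end:]))
```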
The closer the waveform is to a flat line at both sides of an edit, the 'cleaner' (less jarring) the cut will tend to be. So editing words out of a passage of slow, deliberate, well enunciated speech, delivered in an otherwise silent room, should be very straightforward. But often bits of speech blend into each other, so it can be hard to get a clean cut. And if there's ambient sound (like background noise), that will make any edits show too, a bit like it would in a photograph:
Wave editors may have an option to correct your selection to start and end on zero crossings — parts of the wave that have an amplitude of zero (and so are bang on that middle horizontal line). This is a bit like when Microsoft Word tries to highlight whole words for you. If both sides of your selection are at zero, you'll get a neater cut, a bit like if we were to make our cuts to that photograph at spots where the hill was at the same height:
You can adjust your selection to zero crossings in Audacity by pressing the "z" key. But bear in mind that this will move your selection, and it might not be possible to select what you need in this way. Sometimes you'll have to compromise.
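Under the hood, a zero crossing is just a point where the samples change sign. A minimal sketch (assuming Python with numpy, not Audacity's actual implementation):

```python
import numpy as np

def zero_crossings(samples):
    """Return the indices just before the waveform crosses the silence line."""
    signs = np.sign(samples)
    return np.where(np.diff(signs) != 0)[0]

# Snapping the start and end of a selection to the nearest of these indices
# is what gives the 'neater cut' described above.
```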
In analogue days, when people were cutting up and reassembling bits of magnetic tape (a process called splicing) it was common to splice the tape at a 45° angle: this would blur the sounds together slightly, rather than giving a hard cut, a bit like if we were to blur the hard line in the above photograph. Some wave editors like Adobe Audition have a Mix Paste option that will do a similar thing. Otherwise, you could achieve the same basic effect using a short crossfade in a multitrack session.
The above waveform examples are mono waveforms. If you're recording from a stereo source (a source with left and right channels) you will see two parallel waveforms in the editor. Some editors will let you edit each of these tracks separately. To do this in Audacity you would need to use the dropdown in the track controls (to the left of the wave) to split the stereo track into two mono tracks. We'll explore this concept a bit more when we look at multitrack editing.
The amplitude of a wave is effectively its volume: its loudness. The extent of this amplitude is indicated by the vertical axis of the waveform: the bigger the peak of a wave, the louder it is.
When we're making a digital recording, we're essentially drawing a waveform on a canvas and that canvas has edges — its full scale (which is to some extent determined by the bit depth). When colouring in a colouring book, we're told not to go over the edges. The same is true of digital recording.
Different applications express amplitude in different ways. Audacity uses a decimal scale to indicate the relative amplitude: the full scale (highest possible peak) is expressed as 1 / -1, with silence at 0. Adobe Audition uses a logarithmic dBFS (decibels relative to full scale) measure, where the full-scale limits sit at 0 and silence is at -∞.
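The two scales are related by a simple formula. Here's a sketch of the conversion (assuming the usual 20 × log₁₀ definition of dBFS):

```python
import math

def linear_to_dbfs(amplitude):
    """Convert a linear amplitude (0 to 1, Audacity-style) into dBFS."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")

print(linear_to_dbfs(1.0))   #  0.0  dBFS: full scale
print(linear_to_dbfs(0.5))   # -6.02 dBFS: half the full-scale amplitude
print(linear_to_dbfs(0.0))   # -inf       : silence
```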
If you're recording audio, you'll need to ensure that the input volume is neither too loud nor too quiet. There may be multiple options to adjust the input volume — the gain — be it in your recording application, on the microphone itself, or in the settings for the device you're using. If the gain is set too low, the subject of your recording may be lost in noise (ambient noise from the room, and electronic noise caused by your equipment). If set too high, your audio will clip — in essence, the waveform will be peaking too high for your recording and the top part of the wave will get cut off:
Ideally, you want your gain set so that the loudest sounds you're likely to capture are only just touching the top of the waveform representation while the quietest sounds you need are still audible. You might need to do some test recordings to get it right.
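For the curious, clipping amounts to something like this at the sample level (a sketch assuming Python with numpy; in a real recording the damage happens at the microphone or converter, not in code):

```python
import numpy as np

t = np.linspace(0, 1, 44_100)
too_loud = 1.5 * np.sin(2 * np.pi * 440 * t)   # gain set too high: peaks beyond full scale
clipped = np.clip(too_loud, -1.0, 1.0)         # the tops get chopped off, which sounds distorted
```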
The video above talks you through the basics of getting sound into Audacity at the right volume. It's using an old operating system but the principles are still the same. To get to the Control Panel (in order to change sound settings) on a more recent version of Windows, the easiest thing to do is to press the Start button and type "control panel". The video talks about setting levels in Audacity itself from 2:23.
Wave editors will have many options for doing things to the volume of a wave. For a start, there will likely be a basic Amplify setting that lets you modify the volume of a selection as a whole. But there are some other tools to be aware of, too:
The normalise option typically adjusts the overall volume of a selection so that the highest peak in the wave is matched to the full scale, with everything else amplified relative to that. This is particularly useful if your recording is quite quiet and isn't getting close to the metaphorical edge of the paper.
You might think that normalising would be useful for boosting multiple recordings to the same volume level. But that's not how normalising works. Take these two samples of audio from the same recording:
If we normalised each sample independently they'd look like this:
The sample on the left wouldn't change much because it's already peaking quite high, while the sample on the right would become a lot louder, despite both samples being recorded at a broadly similar level (the peaks in the first sample might just be popping consonants). If you're working with a single recording that you intend to chop up into multiple files, and you want to normalise at some point, normalise the whole file before you start cutting it up; otherwise you might end up with some very inconsistent volumes.
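For reference, peak normalisation boils down to something like this sketch (assuming Python with numpy; real editors add options such as a target level a little below full scale):

```python
import numpy as np

def normalise(samples, target_peak=1.0):
    """Scale the selection so its loudest peak sits at the target level."""
    peak = np.max(np.abs(samples))
    return samples * (target_peak / peak) if peak > 0 else samples

# Normalising the whole recording once, before chopping it into clips,
# keeps the relative volumes of those clips consistent.
```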
A fade is a gradual increase (or decrease) in volume. The rate of change is usually linear and most often will be from (or to) silence. Audacity has simple fade effects that apply a linear fade from (or to) silence across a selection, but it also has an Adjustable Fade option to allow fine-tuning (including non-linear fades and fades from (and to) other volume levels).
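A linear fade is just a ramp multiplied against the samples. A quick sketch (assuming Python with numpy):

```python
import numpy as np

def fade_in(samples):
    return samples * np.linspace(0.0, 1.0, len(samples))   # silence up to full volume

def fade_out(samples):
    return samples * np.linspace(1.0, 0.0, len(samples))   # full volume down to silence
```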
In multitrack editing you might need a finer degree of volume control. We've got a specific section on multitrack editing that looks at these options in more detail.
Wave editors have a range of effects. Here are a few core ones to know about. We'd recommend keeping regular copies of your files as you work on them, just in case you later decide you want to go back to an earlier version.
Background noise is a fact of life. Because of this, most wave editors have a noise reduction tool. These usually work from a sample of noise (a noise profile) that you need to provide to the tool in advance. You can then use that noise profile to remove similar noise from elsewhere.
Generally speaking, the longer the noise profile you provide, the better the noise reduction. But if your profile is too noisy, the results of your noise reduction might sound weird (like an underwater robot or something) and harder to listen to than the original 'noisy' version!
With analogue sound there is a direct relationship between the speed of something and its pitch. After all, the shorter the wavelength, the higher the frequency. If you play back an analogue recording at double speed, it will go twice as fast (double the tempo) and sound at double pitch (in other words, it will sound an octave higher, a bit like a Chipmunk). If you play back an analogue recording at half the speed, it will go half as fast (half the tempo) and sound at half the pitch (in other words, an octave lower, a bit like Barry White).
In fact, the way that the Chipmunks' songs were created (and the songs of similar acts like Pinky & Perky) was that they were recorded at a slower speed (a slower tempo) and then sped up to the right speed (increasing the tempo and the pitch). Search YouTube for Chipmunks at the original speed, we dare you!
Wave editors will let you adjust the speed of a wave in the same way, simultaneously affecting tempo and pitch. But digital technology also allows us to resample the audio: remember that digital audio is a series of samples — snapshots of a wave. Those snapshots can be processed individually, spaced out differently, or rescaled. In other words we can play around with the pitch without altering the tempo, and vice versa: we can make ourselves sound like Chipmunks without having to record our songs really slowly, or we can speed up the pace of speech without it sounding like a bunch of singing rodents.
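The old-fashioned speed change is easy to sketch (assuming Python with numpy; these are the crudest possible versions, and shifting pitch or tempo independently needs far cleverer resampling):

```python
import numpy as np

def double_speed(samples):
    """Chipmunk mode: keep every other sample, so tempo and pitch both double."""
    return samples[::2]

def half_speed(samples):
    """Barry White mode: play every sample twice, so tempo and pitch both halve."""
    return np.repeat(samples, 2)
```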
Some programs, including Audacity, will allow you to perform a Sliding Stretch whereby the pitch or tempo shift is applied gradually (rather than uniformly) across your selection. You could use this method to give the effect of, say, a robot slowly powering down, or a merry-go-round speeding up to a terrifying pace.
While we're considering these 'mechanical' aspects of audio manipulation, it's also worth mentioning another effect that predates digital audio: playing the recording backwards. Backwards audio sounds weird, which is part of its enduring appeal. So, inevitably, most wave editors will let you reverse your wave.
Most wave editing software allows you to add reverb (short for reverberation) effects to your audio. Reverb is essentially a slight 'ambient' echo. Some programs may even include specific room profiles so that you can easily make your audio sound like it was recorded in a different sort of room (anything from a small bathroom to a huge auditorium). In other programs you might need to use a bit of trial and error to get the level of reverb you want. Reverb is often used to make a music recording sound more 'professional' but be careful not to overdo it!
As well as reverb, you may be able to apply a more explicit echo or delay effect: essentially playing multiple versions of your audio with a set delay between them. Audacity has a couple of options for this, but you could also achieve similar effects by multi-tracking several copies of your audio.
Equalisation (EQ) tools allow you to boost or restrict particular frequencies (pitches) across your waveform. For instance, you might want to boost lower frequencies for a bass-heavy booming sound. There are usually a few different options to play with, such as Graphic Equalisation and Filter Curve Equalisation, along with other Filter effects for restricting high or low frequencies. There are also tools like Limiters, Gates, and Compressors that do similar sorts of things using the amplitude rather than the frequency. Have a play with the tools you have available to see the kind of things they do.
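To give a flavour of what a frequency filter does, here's a deliberately crude low-pass sketch (assuming Python with numpy; real EQ tools use far more sophisticated filters):

```python
import numpy as np

def crude_low_pass(samples, window=32):
    """Average each sample with its neighbours, smoothing away the fastest
    vibrations and so muffling the higher frequencies."""
    kernel = np.ones(window) / window
    return np.convolve(samples, kernel, mode="same")
```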
So far we've just looked at modifying a single wavefile, but very often you will have multiple wavefiles you'll want to combine into a collage or mix. For that sort of thing you'll want to build a multitrack project.
The most obvious example of multitrack mixing is in music. Imagine a band with a drummer, a guitarist, a bassist, and a vocalist. Each band member might be recorded to their own individual wavefile (a track), with those four waves then assembled into a multitrack project. The four tracks can be lined up to be in sync with each other, and their volumes can be set to the right levels with respect to each other.
For each track in a multitrack mix, you can control certain properties at an overall track level:
You can set the overall 'input' volume (the gain) for the track. This is useful where a track is just generally louder or quieter than other tracks in the mix.
You can adjust whereabouts the wave is positioned within the stereo field (its balance). For instance, you could have the wave playing only in the left speaker, only in the right speaker, or anywhere in between.
This control (sometimes just shown as M in some programs) will silence (mute) the track during playback. This is useful when you're working on the mix and want to be able to hear what the other tracks are sounding like without this track.
This control (sometimes just shown as S in some programs) will mute every track except this track, so you can hear only the track you've selected — useful when assembling your mix and working with this track alone.
In addition to the global track settings, and to the general track-editing tools already discussed, you can also make use of envelope controls. These may also be available for single track editing, but come in particularly useful in a multitrack setting.
An envelope tool allows you to draw dynamic changes in volume across your wave: by adding new markers (vertices) to the envelope line and dragging them into place, you can fade different parts of your wave in and out to different levels as required, without having to modify the wave itself. Imagine you've got a bass part just plodding along and then suddenly there's a bass solo: you could use the envelope control to up the volume of the bass at this point and make it more prominent in the mix, before dropping it again afterwards.
Audacity's envelope tool only controls volume, but other programs often let you adjust the balance in this way too (a process known as panning). This allows you to create dynamic pans across the stereo field (from one speaker to another).
If you wanted to create a dynamic pan in Audacity you would need to double track your wave (have two copies of it in the mix), with one panned to the left and the other panned to the right in the track controls. To pan from left to right, you'd need to use the envelope control to fade out the 'left' channel, and fade in the 'right' channel.
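In sample terms, that double-tracked pan amounts to complementary fades across the two channels. A sketch (assuming Python with numpy and a mono source):

```python
import numpy as np

def pan_left_to_right(samples):
    """Move a mono sound across the stereo field over its full duration."""
    ramp = np.linspace(0.0, 1.0, len(samples))
    left = samples * (1.0 - ramp)            # fades out of the left speaker
    right = samples * ramp                   # fades into the right speaker
    return np.column_stack((left, right))    # one (left, right) pair per sample
```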
When you save a multitrack session it will be saved as a special multitrack filetype so that you can return to it and do further modifications to the mix. These filetypes are usually native to the program and aren't something that you could easily play in anything else. To mix down your session to a conventional audio filetype you'll need to export it and choose the filetype you want.
Not everyone will be able to hear the audio you've created. You might therefore need to provide some sort of written summary or transcript...
Ever wanted to create a soundtrack? Like the idea of mixing audio? We provide some tips, principles and examples on how to get creative with sound.
Audacity is a free audio-editing tool. It can be downloaded for use on your own computer but it's also available on campus machines.
There are also lots of online resources on editing audio with Audacity (for example, here is a lesson from Programming Historian on Editing Audio with Audacity) and searching online for what you're trying to do will bring up suggestions that you can critically evaluate to see if they'll be useful.
Adobe Audition is a more advanced audio-editing tool. It's available as part of the Adobe Creative Cloud suite.
There's also a range of free online tools for sound creation and editing. Here's a selection of interesting tools and resources for use at your own risk:
There are a number of audio file formats. The most common are:
Waveform audio file format (.wav) is the standard filetype for digital audio. It is used for professional and archival purposes. Wave files are generally an uncompressed representation of the string of samples that make up a digital recording.
Because they are generally uncompressed, wave files need to record every point of data in the file. Consequently, file sizes can be very large. A CD-quality 16-bit 44.1 kHz stereo recording takes up about 10 MB of file-space per minute:
44,100 samples per second
× 16 bits per sample
× 2 channels
= 1,411,200 bits per second
= 176,400 bytes per second
≈ 172 KB per second
≈ 10.09 MB per minute
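The same arithmetic works for any uncompressed recording (a quick sketch, ignoring the small file header):

```python
def wav_megabytes_per_minute(sample_rate=44_100, bit_depth=16, channels=2):
    bits_per_second = sample_rate * bit_depth * channels   # 1,411,200 for CD audio
    bytes_per_minute = bits_per_second / 8 * 60
    return bytes_per_minute / 1024 / 1024

print(wav_megabytes_per_minute())   # ≈ 10.09
```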
MPEG-1 (or MPEG-2...) Audio Layer III (not that anyone calls it that) is the most common audio file format because it generally has much smaller file sizes than a wave file. It uses 'lossy' compression a bit like the compression used to make a JPEG image file. As with JPEG, there's the ability to select the 'quality' of the file, in this case in the form of the bitrate (the number of kilobits of data being used each second): the lower the bitrate, the smaller the size, but the more compression artifacts (in other words, the worse the sound quality).
The free sound editing tool Audacity supports a range of bitrates up to 320 kbps, but even this will lose some information from the original recording. The video below demonstrates this by isolating the lost information from a 320 kbps encoding of "Tom's Diner" by Suzanne Vega:
Audacity has a range of mp3 encoding options including a number of preset settings.
And then there's...
MIDI (Musical Instrument Digital Interface) files are not waveform files but they're worth mentioning. They're kind of like the musical equivalent of an SVG file: essentially a set of instructions that can operate a musical instrument, a bit like a digital version of one of those old player-piano rolls:
Indeed MIDI files are often displayed in a format that looks a lot like a piano roll, because fundamentally they're the same technology:
Audacity can play a MIDI file but it can't do much else with it. To create MIDI files you'd either need a MIDI instrument or MIDI editing software. Some MIDI software will even let you work using stave notation:
Bear in mind that published sounds are always subject to copyright law, so you can't just use any sound you want.
Fortunately, there are plenty of free-to-use sounds out there. You can find some here:
For more information and advice, take a look at:
If you're looking to do audio editing to help you make a podcast, you might also want to explore our guidance on making podcasts to help you think about the format and style. You'll want to focus on editing speech and getting pacing right, so it may also be useful to think about video editing techniques for interviews as well.