
Music Requirements

Cheap technology means far more people are creating music these days, but with the music business in decline and making money so difficult, many turn to the film/TV scene. There are two types of composer working (or trying to work) in this field: those who compose film scores and those who compose library music. While there are many similarities between the two, there are also significant differences and different skill sets required. Ultimately, though, both are creating music for use in film/TV, and there are technical and usage requirements to fulfil which are quite different from the almost complete lack of requirements in the music business. I'm creating this post to help avoid the issues which seem to be increasing, partly because lower budgets cause producers and directors to hire less experienced composers, and partly because the internet allows anyone to offer their music for use in film/TV.

Delivery: The audio format should be 24-bit, 48kHz WAV or AIFF, with a maximum peak level of -6dBFS. If you're working with a 32-bit (or higher) mix engine you do not need to dither to 24-bit. If for some obscure reason you do need to dither, use a standard TPDF dither, never a noise-shaped dither. If you're thinking of creating 5.1 music, don't! By all means work in surround (it's a growing market) but the .1 is for low frequency effects, not for music. So 5.0 is OK, and 4.0 might be even better as it leaves the centre channel free for dialogue and Foley. If you are hired as a composer for a surround film/TV programme, you will need to discuss with the director and the supervising sound editor or re-recording mixer which channel format to use for the music.
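As a sketch of the delivery spec above, here's how the -6dBFS peak ceiling and a TPDF dither to 24-bit could be checked and applied in a script. This is a minimal illustration in plain Python; the function names are my own, not a standard tool, and a real delivery check would read the actual WAV file.

```python
import math
import random

def peak_dbfs(samples):
    """Peak level in dBFS of float samples scaled to +/-1.0 full scale."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def tpdf_dither_to_24bit(samples, seed=0):
    """Quantise float samples to 24-bit steps with TPDF dither (the sum of
    two uniform random values has a triangular probability distribution)."""
    rng = random.Random(seed)
    q = 1.0 / (2 ** 23)  # one 24-bit quantisation step
    out = []
    for s in samples:
        d = (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)) * q
        out.append(round((s + d) / q) * q)
    return out

# A sine peaking at half of full scale sits at roughly -6dBFS
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(48000)]
peak = peak_dbfs(tone)                 # ~ -6.02 dBFS, just inside the spec
dithered = tpdf_dither_to_24bit(tone)
```

The dither adds noise smaller than one 24-bit step, which is why it is inaudible but still decorrelates the quantisation error from the signal.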

Usage Considerations: With the exception of the end credits, the vast majority of the time your music will not be the most important audio element; the dialogue will be, and frequently some of the SFX or Foley will take precedence as well. If your music gets in the way of the dialogue, I (the re-recording mixer) will get rid of it! I'll lower the level of your music, I'll EQ it to reduce the frequencies interfering with the dialogue, or I might even add reverb to it; commonly I'll use a combination of these. In extreme situations I may have to edit the music out, in part or entirely! If you're a hired composer you have the benefit of being able to orchestrate around the dialogue, but even as a library music composer there are certain things you can do to help keep your music sounding as you intended and make the director's and re-recording mixer's jobs easier. For example, make your music mixes wetter than you otherwise would, and use compression sparingly and only for musical reasons, not just to make your music louder (as you would in the music business).

Check Your Mix:
  • Check for reasonably good mono compatibility and very good stereo compatibility if you're working in surround.
  • Check your mix still works well at low levels. If the re-recording mixer has to lower the level of your music significantly to accommodate other audio elements, the perception of your mix will change. The most obvious change will be that the bass frequencies in your mix will seem quieter or will disappear entirely.
  • Check your mix still works well at high levels. Dubbing theatre systems are incredibly revealing, you need to make sure there is nothing in your mix which will make you look incompetent. Editing clicks and spurious noises completely invisible in an average music studio may suddenly become incredibly obvious in the dubbing theatre (and of course in the cinema or on a good home cinema system).
  • Check the bass! Cinemas often have subs which go down as low as 12Hz, and even home cinema systems commonly go as low as 18-20Hz. Near-field monitors and most other home and studio speakers usually can't reproduce much below 40Hz, so you really need to use full-range speakers. If budget doesn't allow for full-range speakers, your only workaround is to check your mix on a good set of headphones (with a wide frequency response) and see if you can find anything untoward in a spectrum analyser.
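As a rough illustration of the mono-compatibility point in the first bullet, a simple correlation measurement between the two channels flags material that will cancel when summed to mono. This is my own minimal sketch; real phase-correlation meters work on short windows over time rather than the whole file.

```python
import math

def channel_correlation(left, right):
    """+1 means fully mono-compatible, 0 uncorrelated, negative values
    mean the channels partly cancel when summed to mono."""
    denom = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return sum(l * r for l, r in zip(left, right)) / denom if denom else 1.0

sig = [math.sin(2 * math.pi * 220 * n / 48000) for n in range(48000)]
in_phase = channel_correlation(sig, [0.8 * s for s in sig])   # same signal -> 1.0
out_of_phase = channel_correlation(sig, [-s for s in sig])    # inverted -> -1.0
```

A mix hovering near zero or negative correlation is the one that will thin out or disappear on a mono playback chain.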

If there is much interest in the information in this post, I might add to it as useful points to remember spring to mind.

G
 
Hey, that was a great post. Reading this kind of stuff helps neophytes like me talk to the experts and not sound like a complete idiot. Please feel free to add to this post and enlighten us further.
Cheers,
Aveek
 
Question: If I only have an mp3 file, will it help to convert it to a wav before putting in my film's timeline or is the damage already done once it becomes an mp3?

Sadly, the latter. When you remove bits to compress the file (as an .mp3), they don't get restored when you expand it to a .wav. If all you have is an .mp3, then that's all you have, but try to get your music folks to give you .wavs when they can (and they should be able to... no one in their right mind records directly to .mp3).

Oh, and AudioPostExpert, if you ever feel like critiquing a score, let me know! Like the rest of the folks on this board, I'm always trying to get better!
 
Oh, and AudioPostExpert, if you ever feel like critiquing a score, let me know! Like the rest of the folks on this board, I'm always trying to get better!

Your answer to moonshieldmedia is correct.

I would critique your score, but in practice it's not so simple unless you're talking about a library music score. If we're not talking about library music, I can't critique your score unless I understand its context within the film.

If you want to send me a link to your score and an explanation of it, I'll let you know if I can critique it and actually do a critique when I've got the time.

G
 
Poof! I gain +1 audio knowledge. Thanks G, I have a much better understanding of the use of the 5.1 sound scape now :) Awesome. This is a cool post, I'm going to tell my friends about it.
 
Thanks for the post. I will follow your advice and use headphones with a wide frequency response.

Let me ask: if I use an adapter to plug a P2 headphone cable into the P10 monitor output of my interface, does the sound quality lose something? I mean, is there any drawback to using an adapter?
 
Poof! I gain +1 audio knowledge. Thanks G, I have a much better understanding of the use of the 5.1 sound scape now :) Awesome. This is a cool post, I'm going to tell my friends about it.

I could go into far more detail about 5.1. There are quite a few potential pitfalls for the unwary when setting up 5.1 monitoring because you have 6 sound sources rather than the 2 of stereo and therefore many more issues with acoustic reflections such as phase cancellation and standing waves. Not using the .1 (Sub/LFE) is a blessing for the composer as this is another whole potential nightmare area.

One thing to remember, though, is that there are essentially two different 5.1 set-ups: one for cinema 5.1 and the other for consumer 5.1 media (HDTV, DVD, BluRay, etc). With cinema 5.1 the two rear speakers are set 3dB lower than the front 3 speakers, but with HDTV all 5 speakers are set to the same level. Exactly what that SPL should be is a little complicated, depending on the size of your room, but precise room calibration is not so important for the composer (although it's essential for the re-recording mixer). There are also different settings for the sub/LFE, but again this doesn't affect the composer. There may be some potential issues with the cinema X-curve, which is a pair of EQ low-pass/high-pass filters applied to the speaker outputs in the mix room to compensate for the projection screen and other sound absorption factors present in a cinema. Let me know if anyone wants me to go into more detail on any of these issues.

Let me ask: if I use an adapter to plug a P2 headphone cable into the P10 monitor output of my interface, does the sound quality lose something? I mean, is there any drawback to using an adapter?

Sorry, what's P2 and P10? Not terms I'm familiar with.

Providing the adapter isn't changing the gain or doing some other audio processing then it shouldn't affect sound quality unless it is faulty. More precisely, any adapter or audio connection will degrade the signal slightly, but with any decent quality adapter or connection the degradation should be many times below the threshold of audibility.

G
 
Awesome post. *sammi learned something*

Especially illuminating considering I'm in the middle of working with a composer on my current short. Thanks for the specifications on technical format. Now I know exactly what to ask for.

One quick question: you mentioned max decibels, but what about minimum? What I mean to say is that, on the score I've gotten from my composer, he's done a bit of the fading/mixing (not an expert, sorry if those terms are wrong) himself. I'd like to get a more even level, volume-wise, so I can have more options in the final audio mix. What general range, volume-wise, should I ask him to create for the final render? And on a similar note, obviously a lot of 'fading' or 'swelling' isn't so much about volume but about the particular place the music needs to go. Is it possible to maintain the musical aspect of, say, swelling strings, while creating a file with a more consistent volume? Does that question even make sense?

Thanks again!

edit: I'm also taking quotes for a 7-minute 'fix and mix' ;)
 
One quick question: you mentioned max decibels, but what about minimum? What I mean to say is that, on the score I've gotten from my composer, he's done a bit of the fading/mixing (not an expert, sorry if those terms are wrong) himself. I'd like to get a more even level, volume-wise, so I can have more options in the final audio mix. What general range, volume-wise, should I ask him to create for the final render? And on a similar note, obviously a lot of 'fading' or 'swelling' isn't so much about volume but about the particular place the music needs to go. Is it possible to maintain the musical aspect of, say, swelling strings, while creating a file with a more consistent volume? Does that question even make sense?

There isn't really a technical minimum, and artistically it depends on the requirements of the music cue. The -6dBFS max peak is purely for technical reasons: to avoid the possibility of clipping intersample peaks when adding further processing in the re-recording. I could explain what all this means, but it would take a while and would bore you to death! As far as dynamic range is concerned, again that's really an artistic choice.

You can run into an awful lot of problems when talking about volume or loudness to a sound engineer. The reason being that volume seems a very simple concept but in reality it's a very complex perception which the brain carries out automatically. To cut a long story very short, there is simply no way with current science or technology to accurately measure loudness (as strange as that may seem).

The easy way to get a more even level is to use an audio compressor, which is a basic tool included in pretty much any audio software package. Compressors are used extensively in TV sound, and a compressor is probably your best bet, but I personally tend to avoid them far more in film; there are side effects to compressor use which actually make final mixing more difficult. You also have to consider the old chestnut again: getting an even level is not the same thing as getting an even loudness. If you're talking about an even loudness then there is no simple solution; that comes under the heading of "skill of the mixer".

I personally often ask the composer for the music to be broken down into sub-mixes: drums, vocals and guitars maybe, or strings, brass, woodwind and percussion in the case of orchestral music. Although this is more work for me to deal with, it also means I have more flexibility to get the music to fit perfectly with the dialogue and other audio elements.
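To make concrete why a compressor evens out level (and why it changes the material), here's a toy static compressor in Python. This is my own simplified sketch; real compressors use attack/release envelopes and level detection over time, which is exactly where the side effects mentioned above come from.

```python
import math

def compress(samples, threshold_db=-20.0, ratio=4.0):
    """Toy static compressor: the portion of each sample's level above the
    threshold is reduced by the given ratio. No attack/release envelope,
    so this is illustration only, not a usable dynamics processor."""
    out = []
    for s in samples:
        level_db = 20 * math.log10(abs(s)) if s else -120.0
        over = max(level_db - threshold_db, 0.0)
        gain_db = -over * (1.0 - 1.0 / ratio)   # 4:1 keeps 1/4 of the overshoot
        out.append(s * 10 ** (gain_db / 20))
    return out

quiet_and_loud = [0.05, -0.9]
evened = compress(quiet_and_loud)   # loud sample pulled down, quiet one untouched
```

Note that the loud sample is reduced a lot while the quiet one passes unchanged, which narrows the level range but also flattens any musical swells that cross the threshold.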

edit: I'm also taking quotes for a 7-minute 'fix and mix' ;)

I would need to know a little more before I can provide a quote: Are we talking about a 5.1 mix or stereo? What genre is the short? Is it for TV broadcast, internet or cinema? If it's for TV I would like to know which broadcaster (they all have different specifications and some require much more work than others). Can you give me a little more detail on what needs fixing please? I'll send you a PM with my email address so we can communicate privately.

G
 
You can run into an awful lot of problems when talking about volume or loudness to a sound engineer. The reason being that volume seems a very simple concept but in reality it's a very complex perception which the brain carries out automatically.

Ain't that the truth!!! One of the biggest concepts that confuses the issue is "apparent loudness." Every single audio clip on one track of a timeline may be set at -6dB, but some will sound louder or softer than others. This has a lot to do with frequency content (as an example, upper mid-range frequencies can be harsh, so they seem louder because they are "annoying"), digital distortion (again, the harshness factor), and dynamic processing (which can artificially make things seem louder - the volume/loudness wars debate has been going on in the music industry for years).

This is why you can't mix "by the numbers."

Once you have reached the budget level of being able to mix in a real "certified" mixing facility you should also be able to get stems from the composer; don't worry, stems aren't your problem, your rerecording mixer(s) will know what to do with them. Basically the score is supplied to the rerecording mixer(s) as separate sections - strings, brass, reeds, percussion, synths, etc.

As I mentioned in another thread, it is important for the composer and supervising sound editor to work together. It saves the rerecording mixer lots of hassles trying to get the score, sound FX, Foley and dialog to all work together as a cohesive whole.
 
Ain't that the truth!!! One of the biggest concepts that confuses the issue is "apparent loudness." Every single audio clip on one track of a timeline may be set at -6dB, but some will sound louder or softer than others. This has a lot to do with frequency content (as an example, upper mid-range frequencies can be harsh, so they seem louder because they are "annoying")

The ear is most sensitive to frequencies around 3kHz, the resonant frequency of the ear canal for most people. Certainly SPL and frequency content are the two most influential sound wave properties used by the brain to determine volume, but not the only ones. For example, distance and the relative balance and spectrum of the early reflections within the Haas Effect window also affect volume perception. You can output pink noise at 85dBSPL(C) sitting a few feet from the speakers, then move 30 feet away and turn up the volume so the SPL from the new listening position is still 85dBSPL, and the new listening position will sound quieter. That's why the correct calibration of a smaller mix room could be anywhere from about 72dBSPL(C) to the standard 85dBSPL(C) in a full-sized Dolby certified facility.

Once you have reached the budget level of being able to mix in a real "certified" mixing facility you should also be able to get stems from the composer; don't worry, stems aren't your problem, your rerecording mixer(s) will know what to do with them. Basically the score is supplied to the rerecording mixer(s) as separate sections - strings, brass, reeds, percussion, synths, etc.

Just a tiny point: we have to be a little careful with nomenclature between the re-recording mixer and the composer; I've run into this problem before. The word "stems" is used differently: to a re-recording mixer, stems are Foley, DX, FX and Music. Sub-stems may be ambience, PFX, ADR, etc. So with the music I prefer to use the term "sub-mixes" to describe the separate section mixes of the ensemble rather than "stems", just to avoid potential confusion.

G
 
I think now might be a good opportunity to cover some basics regarding levels, which might be of use to filmmakers, picture editors and those starting out with sound or music software:

As mentioned previously, we can't really measure volume or loudness, but what we can measure very accurately is the energy contained within a sound wave. The scale we use to measure this energy is the decibel (dB) scale. Actually there isn't one decibel scale but a whole range of them, measuring different types of energy: volts, watts, sound pressure, etc. Mostly, the composer or filmmaker is concerned with two of these scales: dBFS and dBSPL, although when recording music you are likely to also come across the dBm, dBu and dBV scales. For the re-recording mixer there are a bunch of measures to know about in addition to all those already mentioned: PPM, VU, LUFS, LKFS, dBTP and DialNorm are the common ones. But I want this to stay simple, so I'll stick for now to dBFS and dBSPL.

The first thing to be aware of is that the dB scale is logarithmic, not linear! This means that double 50dBSPL is not 100dBSPL, it's 56dBSPL! 120dBSPL is not 2 times the level of 60dBSPL, it's 1,000 times greater!! Again, don't confuse the level (energy) with the perception of volume. Let me give you an example: which would appear louder, a 60Hz sound wave at 80dBSPL or a 3kHz sound wave at 70dBSPL? Answer: the 3kHz sound wave, because our ears are very sensitive to frequencies around 2-3kHz but very insensitive to low frequencies. In fact, for the 60Hz sound wave to sound the same volume as the 3kHz sound wave it would need about 60dB (1,000 times) more level. This is why, when you mix a bass guitar and a hi-hat together, the bass guitar fader is always going to be substantially higher than the hi-hat fader (provided they are both recorded at roughly the same level). The sensitivity of the ear varies with frequency; look up Fletcher-Munson if you want more info on this subject.
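The logarithmic arithmetic above can be checked in a couple of lines, using the amplitude convention (where +6dB is double) that the post uses for dBSPL and dBFS:

```python
import math

def db_from_ratio(ratio):
    """dB difference corresponding to an amplitude ratio."""
    return 20 * math.log10(ratio)

def ratio_from_db(db):
    """Amplitude ratio corresponding to a dB difference."""
    return 10 ** (db / 20)

doubling = db_from_ratio(2)          # ~6.02 dB: why double 50dB is ~56dB
thousand_fold = ratio_from_db(60)    # a 60dB difference = 1,000 times the level
```

For power scales such as dBW the factor is 10 instead of 20, which is why doubling the wattage is only +3dB there.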

Back to our two scales: dBFS (dB Full Scale) is the scale used to measure digital audio, the maximum possible value of the dBFS scale is 0dBFS, so all dBFS values are negative. The rule applies though that double the level is +6dB, so double the level of -20dBFS is -14dBFS, this rule also applies to the dBSPL scale but not to all dB scales, for example with dBW (Watts) double the wattage is +3dBW. The dBSPL (dB Sound Pressure Level) scale measures the energy contained in a sound travelling through the air (the sound you actually hear from your speakers for example or the sound which enters your microphones). All dB scales are a ratio, a figure relative to a known reference. dBFS is relative to 0dBFS the maximum possible value of digital audio (all the digital bits set to "1"). The dBSPL scale is relative to 20 micropascals which is supposed to be a rough average of the quietest sound a human can hear. It should be noted that there are 3 different weightings to the dBSPL scale to roughly mimic our perception of volume, they are hideously inaccurate for this purpose but are still commonly used, for example the dBSPL(C) weighting is always used for calibrating a sound system (in a dubbing theatre, cinema or music studio).
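Since all dBFS values are relative to digital full scale, converting an integer sample value to dBFS is a one-liner. A sketch assuming 24-bit samples:

```python
import math

FULL_SCALE = 2 ** 23 - 1   # largest positive sample value in 24-bit audio

def sample_to_dbfs(sample):
    """Level of a 24-bit integer sample relative to digital full scale."""
    return 20 * math.log10(abs(sample) / FULL_SCALE)

half_scale = sample_to_dbfs(FULL_SCALE // 2)   # half of full scale ~ -6.02 dBFS
```

A full-scale sample comes out at exactly 0dBFS, which is why every other value on the scale is negative.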

To make matters a little more complicated, just in case you found the above a bit too easy :)
There is no direct or fixed relation between the dBFS and dBSPL scales, because speakers, amps and room sizes all vary. So you have to create a relation yourself by calibrating your system. In film, the calibration for a cinema or Dolby approved dubbing theatre is -20dBFS = +4dBu (1.228 volts) = 85dBSPL(C). In other words, if you output a channel of pink noise at -20dBFS, this should give a sound pressure level of 85dBSPL(C) at the listening position (behind the mixing desk in a Dolby room, or at a datum point chosen by a Dolby technician in a cinema). The music business is a bit different, with no fixed reference point or agreed calibration. Most converters out of the box are calibrated to -18dBFS = +4dBu, but not all; some music studios calibrate to -16 or even -14dBFS = +4dBu.
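The calibration described above creates a fixed offset between the two scales, so once the room is calibrated the mapping reduces to simple addition. A sketch, using the film reference values quoted in the post:

```python
def dbfs_to_dbspl(level_dbfs, ref_dbfs=-20.0, ref_dbspl=85.0):
    """Once a room is calibrated so that pink noise at ref_dbfs reads
    ref_dbspl at the listening position, digital levels map to SPL
    by a constant offset."""
    return ref_dbspl + (level_dbfs - ref_dbfs)

film_peak = dbfs_to_dbspl(-6.0)   # the -6dBFS delivery ceiling -> 99dB SPL
music_room = dbfs_to_dbspl(-18.0, ref_dbfs=-18.0, ref_dbspl=85.0)  # -> 85
```

Change the reference pair and the same digital level lands at a different SPL, which is exactly why an uncalibrated room tells you very little about how loud a mix really is.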

This can all get rather confusing for those not used to dealing with audio engineering, so I'll stop now before it gets out of hand but hopefully I've provided some information useful to some of you. If anyone has any specific questions, I'll try to answer them.

G
 
So you have to create a relation yourself by calibrating your system. In film, the calibration for a cinema or Dolby approved dubbing theatre is -20dBFS = +4dBu (1.228 volts) = 85dBSPL(C). In other words, if you output a channel of pink noise at -20dBFS, this should give a sound pressure level of 85dBSPL(C) at the listening position (behind the mixing desk in a Dolby room, or at a datum point chosen by a Dolby technician in a cinema). The music business is a bit different, with no fixed reference point or agreed calibration. Most converters out of the box are calibrated to -18dBFS = +4dBu, but not all; some music studios calibrate to -16 or even -14dBFS = +4dBu.


Can't argue with that, in that it's great for people working in a Dolby approved dubbing theatre... but how many indie filmmakers on here are working in such an environment? I imagine many are working in a small edit suite, or living room, or even a converted bedroom. You may suggest that they should hand their audio editing over to a professional who IS working in one. But, in the case of a budget film, many won't be able to afford that.
In that case, then, for a budget production done in a small room using predominantly nearfield monitoring, where the speakers are at close range, calibrating to work at 85dBSPL is going to seem VERY loud. Would it not be better to calibrate to a lower level of around 79dBSPL (C-weighted) or similar, as is done in many smaller TV edit suites?

ps: In order to calibrate at all, they would need a pink noise generator of some sort and an SPL meter.
 