Physical modelling is the field of audio modelling that aims to replicate physical instruments or sounds, using any type of synthesis. In the strict sense, physical modelling models the physical parts of an instrument and computes how the air interacts with them, as well as the vibrations that result from this interaction. In this chapter, we take physical modelling in the broad sense: we consider as physical modelling any synthesis technique that tries to replicate the sound of a real instrument, even if the model does not closely follow how the instrument physically works.
\section{Strings synthesis}
When synthesizing a string instrument, several aspects need to be taken into account, including:
\begin{itemize}
\item the material the string is made of (nylon, metal, etc.),
\item the length of the string,
\item the way the string is excited (plucked, bowed, etc.).
\end{itemize}
\begin{figure}[h]
\centering
\caption{A Takamine G90 series acoustic guitar. The strings are played either with the fingers or with a pick.}
\end{figure}
The list of features can be extended almost indefinitely. However, no matter how complex a string model is, it always relies on the following basic principles.
\begin{enumerate}
\item The string is first actively excited, and then vibrates passively.
\item The string oscillates periodically, and its frequency depends mainly on its length (for a given tension and material).
\item Damping occurs during the passive vibration phase.
\end{enumerate}
A string model therefore needs to account for an excitation phase, a fixed frequency, and a decay.
\subsection{Karplus-Strong string synthesis}
The Karplus-Strong string synthesis is an algorithm for generating string sounds proposed by Karplus and Strong in 1983\cite{karplus}. The model consists of a white noise generator that creates a short noise burst, which is sent to the output and, at the same time, fed into a feedback loop made of a delay line and a lowpass filter. The diagram is as follows.
The idea behind this model is that a string is first excited by plucking, striking, or bowing (the noise burst), and then resonates by itself (the delay loop). The delay of the loop sets the frequency of the string, while the lowpass filter in the loop models the fading of the vibration over time, the high frequencies dying out faster than the low ones. The model can therefore be characterized by the following parameters:
\begin{itemize}
\item the fundamental frequency $f$ of the string in Hz, from which we compute the delay $N$ in samples as $N = \lfloor f_s / f \rfloor$, where $f_s$ is the sampling frequency;
\item the picking duration $p$ in seconds. A short picking duration (10\,ms) generates a sharp sound, reminiscent of a plucked or struck string, while a longer one (100\,ms) evokes a bowed string;
\item the sustain gain $g$, which controls the gain of the feedback loop. It must satisfy $0 < g < 1$ for the vibration to die out, and a smaller gain creates shorter string vibrations;
\item the damping $D$, which determines how damped the string is while vibrating. The damping can be seen as controlling the ``order'' of the lowpass filter: the stronger the lowpass, the greater the damping. In the simplest implementation, the lowpass is a moving average over $D$ samples, so the larger the window, the greater the damping.
\end{itemize}
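To make the loop concrete, here is a minimal pure-Python sketch of the model, under a few assumptions of ours: the excitation is uniform white noise in $[-1, 1]$, and the loop lowpass is a moving average over $D$ samples (the simplest implementation mentioned above). The function name \texttt{karplus\_strong} and its \texttt{seed} argument are illustrative and not part of the original algorithm description.

```python
import random

def karplus_strong(f, p, g, D, duration, fs=44100, seed=0):
    """Noise burst sent to the output and into a delay loop with a lowpass.

    f: fundamental frequency (Hz), p: picking duration (s),
    g: sustain gain (0 < g < 1), D: moving-average window (samples, >= 1),
    duration: output length (s), fs: sampling frequency (Hz).
    """
    rng = random.Random(seed)          # seeded so the burst is reproducible
    N = int(fs // f)                   # loop delay in samples, N = floor(fs / f)
    burst_len = int(p * fs)            # number of samples of excitation noise
    y = [0.0] * int(duration * fs)
    for n in range(len(y)):
        # assumption: uniform white noise in [-1, 1], only during the picking phase
        x = rng.uniform(-1.0, 1.0) if n < burst_len else 0.0
        # assumption: the loop lowpass is a D-point moving average; the feedback
        # is g times the mean of the D outputs ending N samples ago
        fb = 0.0
        if n >= N:
            window = y[max(0, n - N - D + 1): n - N + 1]
            fb = g * sum(window) / D
        y[n] = x + fb
    return y
```

For instance, \texttt{karplus\_strong(440.0, 0.01, 0.98, 2, 1.0)} should give one second of a plucked A4 at 44.1\,kHz. Note that a $D$-point moving average delays the signal by $(D-1)/2$ samples, so the loop period is about $N + (D-1)/2$ samples and the perceived pitch is closer to $f_s / (N + (D-1)/2)$ than to $f$; precise tuning would require a fractional delay.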
\subsection{Digital waveguide synthesis}
A generalization of the Karplus-Strong model was proposed by Julius O. Smith III (a name you will meet quite often if you check the references of this class). This generalization, called \textbf{Digital Waveguide Synthesis}\cite{waveguide}\cite{PASPWEB2010}, aims to simulate wind and string instruments with an arbitrary number of parameters. While waveguide models achieve fair results when implemented properly, they rely on fairly heavy physics, which makes them difficult to work with. Moreover, much simpler methods such as sampling, or machine learning models that are somewhat simpler to use, reproduce physical instruments at least as well, while sparing us the physical equations. We will therefore focus on simple implementations that trade realism for originality!
\section{Drums synthesis}
Synthetic drums became widespread in the 1980s and have been a popular synthetic instrument ever since, as acoustic drums both take up a lot of space and are difficult to record with microphones. Many electronic devices, such as drum machines, have tried to replicate drum sounds in many different ways. In this section, we review one popular synthesis technique for each of the main components of a drum kit: the kick, the snare, and the cymbals. There is no state-of-the-art way of generating drum sounds (and it is a good thing there isn't: diversity is key to creativity!), although there seems to be a general pattern for making these sounds, which we will follow\footnote{A good tutorial by one of the creators of Ableton Live nicely summarizes the general approach to making drum sounds: \url{https://www.youtube.com/watch?v=rfeY0_k1ctk}}.
\subsection{Kick}
Physically, a kick (bass drum) is a large drum, around 20 inches in diameter. One side of it, the batter head, is hit with a beater, generally attached to a pedal or held in the hand. The other side is covered with a second drum head, the resonant head. The tension of the drum heads can be adjusted with screws, and is tuned so that the drum resonates optimally given its geometry. When the beater hits the batter head, the tension in the head briefly increases, generating a very short (around 20\,ms) high-pitched noise. This initial ``burst'' contributes to the attack (the ``kick'') of the bass drum sound. Once the beater has left the head, the head keeps vibrating at a lower tension, resulting in a lower pitch that immediately follows the initial ``burst'' and fades away after about 200\,ms. \\
\begin{figure}[h]
\centering
\caption{A 24-inch bass drum, fitted with an Evans EQ4 drum head.}
\end{figure}
Amplitude-wise, a kick has a fast attack, going from 0 to 1 during the initial burst phase. Then, as soon as the resonance phase starts, the amplitude decreases logarithmically back to 0 over about 200\,ms. Both the pitch sweep and the envelope are shown in Figure~\ref{kick}. \\
\begin{figure}[h]
\centering
\caption{Logarithmic sweep (left) and envelope (right) of a synthetic kick.}
\label{kick}
\end{figure}
A simple way to model a kick is to generate a sine sweep that starts at a high pitch, around 14\,kHz, quickly goes down to a low pitch of 20\,Hz, and stays there for about 200\,ms. Simply multiply this sweep by the envelope described above and you obtain a very recognizable electronic kick sound. Toms and other simple drums can be modelled in a similar fashion by using different start and end frequencies for the sweep. Likewise, the resonance of the drum can be tuned through the length of the sweep and of the envelope. For instance, the 1980s saw a surge in the popularity of bass drums without a resonant head, which produce shorter, tighter kicks; a shorter envelope mimics this sound.
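A minimal pure-Python sketch of this recipe follows. It assumes an exponential (constant octaves-per-second) sweep from 14\,kHz down to 20\,Hz during the first 20\,ms, a short linear attack, and an exponential amplitude decay that falls below 1\% of the peak by 200\,ms with the default \texttt{decay\_rate}. The function \texttt{kick} and all its default values are illustrative choices, not a reference implementation.

```python
import math

def kick(fs=44100, f_start=14000.0, f_end=20.0, sweep_time=0.02,
         length=0.2, attack=0.005, decay_rate=25.0):
    """Sine sweep multiplied by an attack/decay envelope (illustrative defaults)."""
    out, phase = [], 0.0
    for n in range(int(length * fs)):
        t = n / fs
        # pitch (assumed shape): exponential sweep from f_start to f_end, then hold
        if t < sweep_time:
            freq = f_start * (f_end / f_start) ** (t / sweep_time)
        else:
            freq = f_end
        phase += 2.0 * math.pi * freq / fs    # integrate frequency into phase
        # amplitude (assumed shape): linear rise 0 -> 1, then exponential decay
        if t < attack:
            amp = t / attack
        else:
            amp = math.exp(-decay_rate * (t - attack))
        out.append(amp * math.sin(phase))
    return out

y = kick()
```

The phase is accumulated sample by sample from the instantaneous frequency; computing \texttt{sin(2*pi*f(t)*t)} directly with a time-varying \texttt{f(t)} would produce a different, distorted sweep. Changing \texttt{f\_start} and \texttt{f\_end} gives tom-like sounds, and a larger \texttt{decay\_rate} gives a tighter kick.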
\subsection{Snare}
A snare drum is a drum that is shallower than the other drums (typically 7 inches deep) and features snares (hence the name), i.e., wires stretched against the resonant head. Both heads of the snare are generally tuned to a higher tension than those of the other drums. The shallow shell generates a very sharp, short tom-like sound, while the snares add the drum's signature noisy sound.
\begin{figure}[h]
\centering
\caption{Bottom view of a 10-inch snare drum, fitted with Remo Ambassador\textregistered\ Renaissance\textregistered\ drum heads. The visible snares are responsible for the signature noisy sound of the snare.}
\end{figure}
To model a simple snare, we use two components: a noise modelling the snares, and a sine wave modelling the resonance of the drum's body. The noise is a filtered white noise with frequencies concentrated between 100 and 2000\,Hz. The sine is a simple 100\,Hz wave lasting 1/4 second. Note that the heads of a snare drum are tight enough that we do not observe the logarithmic sweep effect of the bass drum, hence a simple sine is enough. The noise and the sine are then each passed through their own envelope, with a shape similar to that of the bass drum. The envelope of the noise is shorter than that of the sine, as the snares stop vibrating before the heads themselves. Finally, we simply add the noise and the sine, with more noise than sine. The resulting waveform is shown in Figure~\ref{snare}.
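A minimal pure-Python sketch of this snare follows. It assumes a crude 100 to 2000\,Hz band-pass built from two first-order filters, an instantaneous attack, and exponential decays, with the noise decaying faster than the sine. The names \texttt{snare} and \texttt{one\_pole\_lowpass}, the decay rates and the 0.8/0.2 noise/sine mix are illustrative values to be tuned by ear.

```python
import math
import random

def one_pole_lowpass(x, fc, fs):
    """First-order lowpass with cutoff fc (Hz)."""
    a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    y, state = [], 0.0
    for v in x:
        state += a * (v - state)
        y.append(state)
    return y

def snare(fs=44100, length=0.25, noise_decay=30.0, tone_decay=15.0,
          noise_gain=0.8, tone_gain=0.2, seed=0):
    rng = random.Random(seed)
    n_samples = int(length * fs)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    # assumption: crude 100-2000 Hz band-pass; subtracting a 100 Hz lowpass
    # gives a first-order highpass, which is then lowpassed at 2 kHz
    lows = one_pole_lowpass(noise, 100.0, fs)
    band = one_pole_lowpass([v - l for v, l in zip(noise, lows)], 2000.0, fs)
    peak = max(abs(v) for v in band) or 1.0
    band = [v / peak for v in band]                 # normalize noise to [-1, 1]
    out = []
    for n in range(n_samples):
        t = n / fs
        tone = math.sin(2.0 * math.pi * 100.0 * t)  # 100 Hz body resonance
        # the snares (noise) fade faster than the drum body (sine);
        # more noise than sine in the mix
        out.append(noise_gain * band[n] * math.exp(-noise_decay * t)
                   + tone_gain * tone * math.exp(-tone_decay * t))
    return out
```

Since both envelopes start at 1 and the gains sum to 1, the output stays within $[-1, 1]$. A steeper band-pass (for instance a biquad) would concentrate the noise more sharply in the 100 to 2000\,Hz band.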
\subsection{Cymbals}
Cymbals are worked bronze plates with very versatile uses in a drum kit. Some cymbals keep the rhythm of the music, while others highlight accents on particular beats. Among the cymbals, the hi-hat is probably the most important one. It consists of two cymbals facing each other, mounted on a stand with a pedal that controls the distance between them. If the distance is large, the hi-hat is said to be open and the cymbals can resonate. If the distance is small, i.e., the cymbals touch each other, the hi-hat is said to be closed and the sound is shorter and more muffled.\\
\begin{figure}[h]
\centering
\caption{A hi-hat stand with a 13-inch Meinl HCS hi-hat mounted on it.}
\end{figure}
Modelling cymbals is actually fairly easy. One can simply generate white noise and highpass-filter it to keep only the high frequencies. The higher the cutoff frequency, the crisper the cymbal sounds. Then, an envelope similar to the previous ones can be applied to control how long the cymbal rings. A short envelope creates a closed hi-hat sound, while a longer envelope generates an open hi-hat or regular cymbal sound.
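A minimal pure-Python sketch of a cymbal follows, assuming a first-order highpass (white noise minus its one-pole lowpass) and an exponential envelope. The function \texttt{hihat}, the 7\,kHz default cutoff and the decay times are hypothetical choices for illustration.

```python
import math
import random

def hihat(decay_time, cutoff=7000.0, fs=44100, seed=0):
    """Highpass-filtered white noise with an exponential envelope.

    decay_time: seconds until the envelope reaches 1/1000 (-60 dB);
    short values give a closed hi-hat, long values an open hi-hat or cymbal.
    cutoff: highpass cutoff (Hz); raising it gives a crisper sound.
    """
    rng = random.Random(seed)
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)  # one-pole coefficient
    rate = math.log(1000.0) / decay_time               # envelope decay rate (1/s)
    out, low = [], 0.0
    for n in range(int(decay_time * fs)):
        x = rng.uniform(-1.0, 1.0)
        low += a * (x - low)       # one-pole lowpass at the cutoff...
        high = x - low             # ...subtracted from the input: first-order highpass
        out.append(high * math.exp(-rate * n / fs))
    return out

closed = hihat(0.05)   # short envelope: closed hi-hat
opened = hihat(0.5)    # long envelope: open hi-hat or regular cymbal
```

Only \texttt{decay\_time} distinguishes the closed and open sounds here; raising \texttt{cutoff} makes either one crisper. A first-order highpass rolls off gently, so a steeper filter would remove more of the low end.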
\section{Wind instruments synthesis}
Wind instruments have the particularity that they keep playing at a constant volume as long as the musician blows into them. However, a player generally cannot blow a perfectly even amount of air over time. This results in slight variations in the pitch of the instrument over time, which are quite typical of wind instruments. \\
Amplitude-wise, the musician has to start blowing air into the instrument (for instance, a trumpet) as precisely as possible to produce a sound. Here again, however precise the player is, the onset remains less abrupt than plucking a string; hence wind instruments typically have a longer attack and release than plucked or struck instruments. \\
Generally, the base synthetic sound used to create a brass wind instrument is a layer of sawtooth waves (for instance, two of them). The sawtooth waveform is rich in harmonics and gives a crisp sound that recalls the brightness of brass. For woodwind instruments, smoother waveforms such as a triangle wave can be used. The pressure in the instrument is not constant while playing: when one starts playing the trumpet, the air pressure increases quickly, but still slowly enough to be audible. This results in a quick rise in pitch at the start of the note. It is also popular to add a second pitch change to the second sawtooth, this time from above, creating more ``breath'' in the attack. This can be modelled by two pitch envelopes, one approaching the target pitch from below and one from above. The two pitch envelopes are shown in Figure~\ref{pitch_envelopes}.
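A minimal sketch of the two pitch envelopes follows, assuming that the offset from the target pitch, measured in semitones, decays exponentially. The function \texttt{pitch\_envelope}, the glide amounts and the glide times are illustrative values, not taken from a reference design. It returns one frequency per sample so that it can drive an oscillator directly.

```python
import math

def pitch_envelope(target_hz, start_semitones, glide_time, length, fs=44100):
    """Per-sample frequency (Hz) gliding exponentially towards target_hz.

    start_semitones < 0: the pitch starts below the target (rising envelope);
    start_semitones > 0: it starts above the target (falling envelope).
    Assumption: the semitone offset shrinks to 1% of its start value after glide_time.
    """
    rate = math.log(100.0) / glide_time
    return [target_hz * 2.0 ** (start_semitones * math.exp(-rate * n / fs) / 12.0)
            for n in range(int(length * fs))]

# first sawtooth rises from 2 semitones below, second falls from 1 semitone above
env_below = pitch_envelope(440.0, -2.0, 0.08, 1.0)
env_above = pitch_envelope(440.0, +1.0, 0.05, 1.0)
```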
\begin{figure}[h]
\centering
\caption{Dual pitch envelopes for the wind instrument sound.}
\label{pitch_envelopes}
\end{figure}
The second thing to discuss is that even once the desired pitch is reached, it is hard for a player to keep an exactly constant pressure in the instrument, which results in very small variations in pitch over time. This can simply be modelled by sinusoidal LFOs slightly modulating the pitch of both sawteeth. Once both sawteeth have their pitch envelopes and LFOs, we pass each of them through a lowpass filter around 2\,kHz, as wind instruments generally emit comparatively little energy above this frequency. A higher cutoff results in a brighter sound, while a lower cutoff produces a more mellow sound, like the one obtained when playing with a \textit{mute} (popular in jazz!). We then simply mix both sawteeth together.\\
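A minimal sketch of the vibrato, filtering and mixing stages follows, assuming naive (non-band-limited) sawteeth, a vibrato depth of about $\pm 5$ cents ($0.3\%$ of the frequency) at around 5\,Hz, and a first-order lowpass at 2\,kHz. Each voice takes a per-sample frequency list, so a pitch envelope (or a constant pitch, as in the last lines) can be plugged in directly. All names and values are illustrative.

```python
import math

def saw_voice(freqs, fs=44100, lfo_rate=5.0, lfo_depth=0.003, lfo_phase=0.0):
    """Naive sawtooth following freqs (one value per sample), with sine-LFO vibrato."""
    out, phase = [], 0.0
    for n, f in enumerate(freqs):
        lfo = math.sin(2.0 * math.pi * lfo_rate * n / fs + lfo_phase)
        phase = (phase + f * (1.0 + lfo_depth * lfo) / fs) % 1.0  # phase in [0, 1)
        out.append(2.0 * phase - 1.0)                              # ramp in [-1, 1)
    return out

def one_pole_lowpass(x, fc, fs=44100):
    """First-order lowpass with cutoff fc (Hz)."""
    a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    y, state = [], 0.0
    for v in x:
        state += a * (v - state)
        y.append(state)
    return y

def brass_layer(freqs_a, freqs_b, cutoff=2000.0, fs=44100):
    """Two vibrato'd sawteeth, each lowpassed, then mixed at equal level."""
    # assumption: slightly different LFO rates and phases so the wobbles do not move in lockstep
    saw_a = one_pole_lowpass(saw_voice(freqs_a, fs, 5.0, 0.003, 0.0), cutoff, fs)
    saw_b = one_pole_lowpass(saw_voice(freqs_b, fs, 5.5, 0.003, 1.0), cutoff, fs)
    return [0.5 * (a + b) for a, b in zip(saw_a, saw_b)]

steady = [440.0] * 44100          # constant pitch, e.g. once the pitch envelopes settle
mix = brass_layer(steady, steady)
```

A first-order filter rolls off gently (6\,dB per octave), so a steeper lowpass would tame the upper harmonics more strongly. The naive sawtooth also aliases at high pitches; a band-limited oscillator would avoid this.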
The last step is to apply an amplitude envelope to the mix. As said above, the attack of a wind instrument is rather slow; the sustain is then slightly quieter than the attack peak and stays constant as long as the player keeps blowing into the trumpet; finally, the release is quite slow too. The amplitude envelope is shown in Figure~\ref{wind_env}.
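A minimal sketch of such an envelope follows, assuming linear segments (attack, decay to the sustain level, sustain, release). The function \texttt{adsr} and its default times and levels are illustrative; \texttt{note\_len} is assumed to be at least \texttt{attack + decay}.

```python
def adsr(note_len, attack=0.08, decay=0.05, sustain=0.8, release=0.2, fs=44100):
    """Linear ADSR envelope, one gain per sample.

    note_len: time (s) the player keeps blowing (assumed >= attack + decay);
    the release starts at note_len and lasts `release` seconds.
    """
    env = []
    for n in range(int((note_len + release) * fs)):
        t = n / fs
        if t < attack:                    # slow rise to the peak
            g = t / attack
        elif t < attack + decay:          # fall from the peak to the sustain level
            g = 1.0 - (1.0 - sustain) * (t - attack) / decay
        elif t < note_len:                # constant while the player keeps blowing
            g = sustain
        else:                             # slow fade once the player stops
            g = sustain * max(0.0, 1.0 - (t - note_len) / release)
        env.append(g)
    return env

env = adsr(1.0)
```

The envelope is applied by multiplying it sample by sample with the filtered mix, e.g.\ \texttt{[e * v for e, v in zip(env, mix)]}, where \texttt{mix} is assumed to last \texttt{note\_len + release} seconds.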