
Motivation
• Temporal structure is important in music, and it can exist even without melody and harmony
• Intriguing and fun
• MIR goals: detecting the onsets of notes and accents, and tapping along with the beat as a human listener would
• Tasks: onset detection, tempo estimation, beat tracking, …

Onset detection
• Recall the attack-decay-sustain-release (ADSR) curve
• Transient: the noise-like sound component of short duration and high amplitude that typically occurs at the beginning of a musical tone
• Onset: the instant marking the start of the transient

From: M. Mueller, Fundamentals of Music Processing, Chapter 6, Springer 2015

Energy-based novelty (1)
• Playing a note on an instrument often coincides with a sudden increase of the signal’s energy
• Local energy: given a window function $h(m)$ supported on $m \in [-M, M]$, we have
$$E_h^x(n) := \sum_{m=-M}^{M} \left| x(n+m)\, h(m) \right|^2$$
• Energy-based novelty function:
$$\Delta_{\mathrm{Energy}}(n) := \left| E_h^x(n+1) - E_h^x(n) \right|_{\geq 0}$$
• Half-wave rectification function $|\cdot|_{\geq 0}$:
$$|r|_{\geq 0} := \frac{r + |r|}{2} = \begin{cases} r, & \text{if } r \geq 0 \\ 0, & \text{if } r < 0 \end{cases}$$

Energy-based novelty (2)
• Human perception of sound intensity is logarithmic in nature
• Log-energy-based novelty function:
$$\Delta_{\mathrm{Energy}}^{\mathrm{Log}}(n) := \left| \log E_h^x(n+1) - \log E_h^x(n) \right|_{\geq 0}$$
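The two energy-based novelty functions can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's reference code: the Hann window, the function names `local_energy` and `energy_novelty`, and the small constant `eps` that guards the logarithm are all assumptions.

```python
import numpy as np

def local_energy(x, M):
    """E(n) = sum_{m=-M..M} |x(n+m) h(m)|^2 with a Hann window h (assumed)."""
    h = np.hanning(2 * M + 1)
    # h is symmetric, so convolving x^2 with h^2 evaluates the sum for
    # every n at once; the signal is implicitly zero-padded at the edges.
    return np.convolve(x ** 2, h ** 2, mode="same")

def energy_novelty(x, M, use_log=False, eps=1e-10):
    """Half-wave rectified first difference of the (log-)energy."""
    E = local_energy(x, M)
    if use_log:
        E = np.log(E + eps)        # eps avoids log(0) in silent passages
    diff = E[1:] - E[:-1]          # E(n+1) - E(n)
    return np.maximum(diff, 0.0)   # keep energy increases only

# Toy signal: half a second of silence, then a 440 Hz tone.
fs = 8000
x = np.zeros(fs)
x[4000:] = np.sin(2 * np.pi * 440 * np.arange(4000) / fs)
nov = energy_novelty(x, M=256)
onset_estimate = int(np.argmax(nov))   # lies close to sample 4000
```

With `use_log=True` the curve responds to relative rather than absolute energy changes, so quiet onsets are emphasised; the price is a stronger reaction to noise in near-silent passages.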


Energy-based novelty (3) • Waveform and energy-based novelty function of the note C4 (261.6 Hz) played by different instruments – piano (left), violin (middle) and flute (right)


Recap: short-time Fourier transform
• Given a discrete-time signal $x$ sampled at rate $f_s$, a window $h$ of $N$ samples, and a hop size of $H$ samples, the short-time Fourier transform (STFT) $X(n, k)$ is
$$X(n, k) := \sum_{m=0}^{N-1} x(m + nH)\, h(m)\, e^{-j 2\pi k m / N}$$
• $k$: frequency index, $f(k) := \dfrac{k f_s}{N}$
• $n$: time index, $t(n) := \dfrac{nH}{f_s}$
• Spectrogram: $|X(n, k)|^2$
• Logarithmic compression: $Y_\gamma(n, k) := \log\left(1 + \gamma\, |X(n, k)|\right)$

Spectral-based novelty (1)
• The energy-based novelty function falls short in several cases: pitch changes, low-intensity notes masked by high-intensity ones, frequency-dependent transients, and others
• Spectral flux:
$$\Delta_{\mathrm{Spectral}}(n) := \sum_{k=0}^{K} \left| Y_\gamma(n+1, k) - Y_\gamma(n, k) \right|_{\geq 0}$$
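A minimal sketch of the spectral flux, built on a hand-rolled STFT so that the indices match the recap formulas. The Hann window, the default parameter values, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def stft_mag(x, N=1024, H=256):
    """Magnitude STFT |X(n, k)| with a Hann window; rows are frames n."""
    h = np.hanning(N)
    n_frames = 1 + (len(x) - N) // H
    frames = np.stack([x[n * H : n * H + N] * h for n in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))   # bins k = 0 .. N/2

def spectral_flux(x, N=1024, H=256, gamma=100.0):
    """Sum over k of the half-wave rectified difference of Y_gamma."""
    Y = np.log1p(gamma * stft_mag(x, N, H))      # Y = log(1 + gamma |X|)
    diff = Y[1:] - Y[:-1]                        # Y(n+1, k) - Y(n, k)
    return np.maximum(diff, 0.0).sum(axis=1)     # rectify, then sum over k

# Toy signal: a pitch change (440 Hz to 660 Hz) at constant amplitude,
# which changes the spectrum but hardly the energy.
fs = 8000
t = np.arange(fs) / fs
x = np.concatenate([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 660 * t)])
flux = spectral_flux(x)
change_frame = int(np.argmax(flux))   # a frame whose window straddles sample 8000
```

Frame index `n` corresponds to time `n * H / fs` seconds, so the detected frame can be converted back to a time stamp with the formula for $t(n)$.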

• Example: spectrograms with logarithmic compression, $\gamma = 1$ and $\gamma = 1000$

Spectral-based novelty (2): post-processing
• Adaptive thresholding and peak picking; basic idea: subtract a moving average
• Local average function:
$$\mu(n) := \frac{1}{2M+1} \sum_{m=-M}^{M} \Delta_{\mathrm{Spectral}}(n+m)$$
• Enhanced novelty function:
$$\bar{\Delta}_{\mathrm{Spectral}}(n) := \left| \Delta_{\mathrm{Spectral}}(n) - \mu(n) \right|_{\geq 0}$$
• Other ideas: median filtering
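The post-processing step can be sketched as follows. The helper names `enhance_novelty` and `pick_peaks` are hypothetical, and the simple "strict local maximum above a threshold" rule is only one of many possible peak pickers.

```python
import numpy as np

def enhance_novelty(nov, M):
    """Subtract the centred moving average mu(n), then half-wave rectify."""
    kernel = np.ones(2 * M + 1) / (2 * M + 1)
    mu = np.convolve(nov, kernel, mode="same")   # zero-padded at the edges
    return np.maximum(nov - mu, 0.0)

def pick_peaks(nov, threshold=0.0):
    """Indices of strict local maxima that exceed a fixed threshold."""
    n = np.arange(1, len(nov) - 1)
    is_peak = (nov[n] > nov[n - 1]) & (nov[n] > nov[n + 1]) & (nov[n] > threshold)
    return n[is_peak]

nov = np.array([1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0])
enhanced = enhance_novelty(nov, M=2)
peaks = pick_peaks(enhanced)        # the two bumps at indices 2 and 6
```

Because the moving average is zero-padded, the local average is underestimated near the borders, which leaves a small residue in the last sample of this toy example; the peak picker ignores the first and last samples for that reason.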

Spectral-based novelty (3) • Shostakovich’s Waltz No. 2


Spectral-based novelty (4)
• Percussive onsets (e.g., percussion instruments, piano) can be considered a solved problem, but soft onsets, vibrato, and tremolo remain major challenges
• With spectral flux, the frequency variation of a vibrato is easily mistaken for onsets; with energy-based novelty, the amplitude variation of a tremolo is easily mistaken for onsets
• How to improve (Böck and Widmer, 2013): (1) take the difference over a longer time span of $\mu > 1$ frames (here $\mu$ is a frame offset, not the local average), (2) apply maximum filtering along frequency with half-width $\eta$:
$$SF\left(n + \frac{\mu}{2}\right) := \sum_{k=0}^{K} \left| Y_\gamma(n+\mu, k) - \max_{k-\eta \leq k' \leq k+\eta} Y_\gamma(n, k') \right|_{\geq 0}$$
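A minimal sketch of the maximum-filtered difference, assuming a log-compressed spectrogram `Y` with frames as rows. The function names and the edge padding are assumptions; the sketch is not taken from the authors' implementation.

```python
import numpy as np

def max_filter_freq(Y, eta):
    """Running maximum over bins k-eta .. k+eta for every frame (row)."""
    K = Y.shape[1]
    padded = np.pad(Y, ((0, 0), (eta, eta)), mode="edge")
    shifted = [padded[:, s : s + K] for s in range(2 * eta + 1)]
    return np.max(np.stack(shifted), axis=0)

def max_filtered_flux(Y, mu=1, eta=1):
    """sum_k | Y(n+mu, k) - max_{|k'-k| <= eta} Y(n, k') |_{>=0}, mu >= 1."""
    ref = max_filter_freq(Y, eta)[:-mu]      # max-filtered reference frame n
    cur = Y[mu:]                             # unfiltered frame n + mu
    return np.maximum(cur - ref, 0.0).sum(axis=1)

# Toy "vibrato": a single partial wobbling between bins 3 and 4.
Y = np.zeros((3, 8))
Y[0, 3] = Y[1, 4] = Y[2, 3] = 1.0
plain = max_filtered_flux(Y, mu=1, eta=0)    # plain spectral flux: reacts to the wobble
robust = max_filtered_flux(Y, mu=1, eta=1)   # the wobble is absorbed by the filter
```

With `eta=0` the function reduces to ordinary spectral flux, so the same code can be used to compare both variants on the same spectrogram.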

Spectral-based novelty (5) • Vibrato suppression

S. Böck and G. Widmer, “Maximum filter vibrato suppression for onset detection,” in DAFx 2013

Phase-based novelty
• Phase is also important: stationary tones have a stable phase (i.e., it evolves linearly with time), while transients have an unstable phase
• Polar coordinate representation: $X(n, k) = |X(n, k)|\, e^{2\pi i \phi(n, k)}$
• Phase derivatives ($\phi'' \approx 0$ in steady state):
$$\phi'(n, k) := \phi(n, k) - \phi(n-1, k)$$
$$\phi''(n, k) := \phi'(n, k) - \phi'(n-1, k)$$
• Phase-based novelty function:
$$\Delta_{\mathrm{Phase}}(n) = \sum_{k=0}^{K} \left| \phi''(n, k) \right|$$
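The phase-based novelty can be sketched as below, assuming a complex STFT `X` with frames as rows. Mapping each difference back to the principal interval is an assumption of this sketch; it stands in for explicit phase unwrapping.

```python
import numpy as np

def principal(phi):
    """Map normalised phase differences to the interval [-0.5, 0.5)."""
    return (phi + 0.5) % 1.0 - 0.5

def phase_novelty(X):
    """X: complex STFT, rows = frames. Returns sum_k |phi''(n, k)|."""
    phi = np.angle(X) / (2 * np.pi)          # X = |X| exp(2 pi i phi)
    d1 = principal(phi[1:] - phi[:-1])       # phi'(n, k)
    d2 = principal(d1[1:] - d1[:-1])         # phi''(n, k)
    return np.abs(d2).sum(axis=1)            # two frames shorter than X

# Toy STFT: linearly evolving phase in every bin, with a jump of 0.25
# cycles inserted at frame 3.
n = np.arange(6)[:, None]
k = np.arange(4)[None, :]
phase = 0.1 * k * n
phase[3:] += 0.25
X = np.exp(2j * np.pi * phase)
nov = phase_novelty(X)    # non-zero only around the jump
```

In this toy case every bin has the same magnitude; on real audio, bins with very small magnitude contribute erratic phase values, which is the weakness the complex-domain approach addresses.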

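Complex-domain novelty
• Problems in phase-based novelty:
  - The phase wraps around (it jumps between $-\pi$ and $\pi$), so it needs a procedure called phase unwrapping, which is unstable
  - When $|X(n, k)|$ is very small, $\phi(n, k)$ can be very chaotic, giving a large $|\phi''(n, k)|$
• Considering both magnitude and phase, the steady-state estimate of the next frame is
$$\hat{X}(n+1, k) := |X(n, k)|\, e^{2\pi i \left( \phi(n, k) + \phi'(n, k) \right)}$$
• Deviation from the estimate, counted only where the magnitude increases:
$$X^{+}(n, k) := \begin{cases} \left| \hat{X}(n, k) - X(n, k) \right| & \text{for } |X(n, k)| > |X(n-1, k)| \\ 0 & \text{otherwise} \end{cases}$$
• Complex-domain novelty function:
$$\Delta_{\mathrm{Complex}}(n) = \sum_{k=0}^{K} X^{+}(n, k)$$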
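A minimal sketch of the complex-domain novelty, again assuming a complex STFT `X` with frames as rows. The indexing convention (the prediction for frame n is built from frames n-1 and n-2) is an assumption made so that all quantities refer to the same frame.

```python
import numpy as np

def complex_novelty(X):
    """X: complex STFT, rows = frames n, columns = bins k."""
    mag = np.abs(X)
    phase = np.angle(X)                       # radians, i.e. 2 * pi * phi
    dphase = phase[1:-1] - phase[:-2]         # phase advance from n-2 to n-1
    # Steady-state prediction of frame n: keep the magnitude of frame n-1
    # and advance its phase by the previous phase increment.
    X_hat = mag[1:-1] * np.exp(1j * (phase[1:-1] + dphase))
    err = np.abs(X_hat - X[2:])               # |X_hat(n, k) - X(n, k)|
    rising = mag[2:] > mag[1:-1]              # count magnitude increases only
    return np.where(rising, err, 0.0).sum(axis=1)

# Toy STFT: steady partials whose magnitude doubles at frame 4.
n = np.arange(6)[:, None]
k = np.arange(4)[None, :]
X = np.exp(2j * np.pi * 0.1 * k * n)
X[4:] *= 2.0
nov = complex_novelty(X)    # entry j refers to frame j + 2
```

No phase unwrapping is needed here, because the phase only enters through the complex exponential, which is insensitive to jumps of a full cycle.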

Further readings
• S. Böck and G. Widmer, “Maximum filter vibrato suppression for onset detection,” in Proceedings of the 16th International Conference on Digital Audio Effects (DAFx-13), 2013.
• S. Dixon, “Onset detection revisited,” in Proceedings of the 9th International Conference on Digital Audio Effects (DAFx-06), Montreal, Quebec, Canada, September 2006, pp. 133–137.
• A. Holzapfel, Y. Stylianou, A. C. Gedik, and B. Bozkurt, “Three dimensions of pitched instrument onset detection,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 6, pp. 1517–1527, 2010.

Tempo, beat, and rhythm
• Tempo: the speed or pace of music
• How to describe tempo?
  - Italian tempo markings: Largo, Andante, Presto, …
  - Note-level markings, e.g., beats per minute (BPM): the number of “pulses” per minute
• Beat: the unit of time of music
  - Beat level: just “beat”
  - Measure level: downbeat (the first beat of a measure)
• Rhythm: a characteristic pattern of beats at different levels
  - Meter: a rhythmic structure describing the relation between downbeats (accents, measures) and beats (e.g., 2/4, 3/4, …)
  - Rhythm can sometimes be thought of as “genre”: Cha Cha, Rumba, Tango, Waltz, …

Tatum, tactus, and measure
• Rhythm is hierarchical: various levels are presumed to contribute to the human perception of tempo and beat
  - Tatum: the fastest repetition rate of musically meaningful accents
  - Tactus: typically the foot-tapping rate (and the quarter-note level)
  - Measure: typically the rate of one cycle of counting beats
• BPM (beats per minute)

Tasks and problems
• Tempo/meter analysis
  - As in pitch detection, tempo estimation has an octave ambiguity
  - “Tempo harmonics” of a song with BPM $\tau$: $2\tau, 3\tau, \dots$
  - “Tempo subharmonics”: $\tau/2, \tau/3, \dots$
  - Basic feature: the tempogram, which indicates for each time instance the local relevance of a specific tempo in a given music recording
• Beat tracking and downbeat tracking
  - Backbeat, syncopation
  - Rubato, swing
  - Other genre- or subject-dependent issues
• Rhythm classification

Fourier tempogram (1)
• Given the novelty function $\Delta(m)$ of a music recording and a window function $w(m)$, the Fourier tempogram $F(n, \omega)$ is the STFT of the novelty function:
$$F(n, \omega) := \sum_{m \in \mathbb{Z}} \Delta(m)\, w(m-n)\, e^{-2\pi i \omega m}$$
• Beat rate: $\omega = 1/T$ (beats per second), where $T$ is the beat period in seconds
• BPM: $\tau = 60\,\omega$
• Discrete Fourier tempogram: $T^{F}(n, \tau) = \left| F(n, \tau/60) \right|$
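The Fourier tempogram can be sketched by evaluating the windowed sum directly for a set of tempi. The parameter `feature_rate` (samples of the novelty curve per second), the Hann window, and the zero padding that centres each window on `n` are assumptions of this sketch.

```python
import numpy as np

def fourier_tempogram(nov, feature_rate, bpm, win_len, hop=1):
    """|F(n, tau/60)|: rows = tempi in `bpm`, columns = window positions."""
    w = np.hanning(win_len)
    pad = win_len // 2
    nov_p = np.pad(nov, pad)                       # centre windows on n
    t = np.arange(len(nov_p)) / feature_rate       # time axis in seconds
    centers = np.arange(0, len(nov), hop)
    T = np.zeros((len(bpm), len(centers)))
    for i, tau in enumerate(bpm):
        # Multiply by exp(-2 pi i omega t) once, then window and sum.
        modulated = nov_p * np.exp(-2j * np.pi * (tau / 60.0) * t)
        for j, n in enumerate(centers):
            T[i, j] = np.abs(np.sum(modulated[n : n + win_len] * w))
    return T

# Toy novelty curve pulsing twice per second (120 BPM), 10 s at 100 Hz.
feature_rate = 100
t = np.arange(10 * feature_rate) / feature_rate
nov = 0.5 * (1.0 + np.cos(2 * np.pi * 2.0 * t))
bpm = np.arange(30, 301)
T = fourier_tempogram(nov, feature_rate, bpm, win_len=501, hop=10)
local_tempo = bpm[np.argmax(T[:, T.shape[1] // 2])]   # tempo at the middle window
```

A novelty curve made of sharp impulses would additionally show tempo harmonics (240 BPM, 360 BPM, …) of comparable strength; the smooth toy curve used here contains the fundamental only.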

Fourier tempogram (2) • Example: Shostakovich’s Waltz No. 2


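Autocorrelation tempogram (1)
• Recall from the earlier lectures on pitch detection the two views of finding a frequency and finding a periodicity
• Alternative way of finding the tempo: take a time-varying autocorrelation function (ACF) of the novelty curve
• Recall the ACF:
$$R_{xx}(l) = \sum_{m \in \mathbb{Z}} x(m)\, x(m-l)$$
• Short-time ACF of the novelty curve (windowed ACF):
$$A(n, l) := \sum_{m \in \mathbb{Z}} \Delta(m)\, w(m-n)\, \Delta(m-l)\, w(m-n-l)$$
• Alternative computation via the FFT (Wiener-Khinchin theorem), applied to the windowed novelty curve as a function of $m$:
$$A(n, \cdot) = \mathrm{IFFT}\left( \left| \mathrm{FFT}\left( w(m-n)\, \Delta(m) \right) \right|^2 \right)$$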

Autocorrelation tempogram (2)
• ACF tempogram: $T^{A}(n, \tau) := A(n, l)$
• BPM: $\tau = 60 / (r\, l)$, where $r$ is the “sampling period” of the novelty curve (in seconds)
• Comparison:
  - STFT: compares the novelty curve with a Fourier basis
  - ACF: compares the novelty curve with shifted copies of itself (“self basis”)
• Example: Shostakovich’s Waltz No. 2
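The windowed ACF and the lag-to-BPM conversion can be sketched together. As before, `feature_rate`, the Hann window, and the centring by zero padding are assumptions; the FFT is zero-padded to twice the window length so that the result is a linear rather than a circular autocorrelation.

```python
import numpy as np

def acf_tempogram(nov, feature_rate, win_len, hop=1):
    """Windowed ACF A(n, l) for lags l = 1 .. win_len-1, plus the BPM axis."""
    w = np.hanning(win_len)
    pad = win_len // 2
    nov_p = np.pad(nov, pad)
    centers = np.arange(0, len(nov), hop)
    A = np.zeros((win_len, len(centers)))
    for j, n in enumerate(centers):
        seg = nov_p[n : n + win_len] * w
        spec = np.fft.rfft(seg, 2 * win_len)           # zero-padded FFT
        A[:, j] = np.fft.irfft(np.abs(spec) ** 2)[:win_len]
    lags = np.arange(1, win_len)
    bpm = 60.0 * feature_rate / lags    # tau = 60 / (r * l), r = 1 / feature_rate
    return A[1:], bpm                   # lag 0 has no finite tempo and is dropped

# Toy novelty curve pulsing at 120 BPM (period: 50 samples at 100 Hz).
feature_rate = 100
t = np.arange(10 * feature_rate) / feature_rate
nov = 0.5 * (1.0 + np.cos(2 * np.pi * 2.0 * t))
A, bpm = acf_tempogram(nov, feature_rate, win_len=501, hop=10)
col = A[:, A.shape[1] // 2]
mask = (bpm >= 60) & (bpm <= 240)
local_tempo = bpm[mask][np.argmax(col[mask])]   # close to 120 BPM
```

The BPM axis obtained from integer lags is non-uniform (fine at slow tempi, coarse at fast tempi), which is why a lag-based tempogram is usually resampled before it is compared with a Fourier tempogram.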

Autocorrelation tempogram (3)
• Conversion from lag to tempo: from a time-lag representation to a time-frequency (time-tempo) representation

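Global and local tempo
• For a music recording with $N$ time instances, let $\Theta$ be the set of expected tempi
• Average tempogram:
$$T_{\mathrm{Average}}(\tau) := \frac{1}{N} \sum_{n \in [1:N]} T(n, \tau)$$
• Average (global) tempo estimate:
$$\hat{\tau} := \operatorname{argmax}_{\tau \in \Theta}\, T_{\mathrm{Average}}(\tau)$$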
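As a small sketch, the global tempo estimate is the tempo that maximises the time-averaged tempogram within the expected range. The function name `global_tempo` and the default range `theta` are assumptions.

```python
import numpy as np

def global_tempo(T, bpm, theta=(30.0, 300.0)):
    """T: tempogram (rows = tempi, columns = time); bpm: tempo of each row."""
    avg = T.mean(axis=1)                              # T_Average(tau)
    in_range = (bpm >= theta[0]) & (bpm <= theta[1])  # restrict to Theta
    avg = np.where(in_range, avg, -np.inf)
    return bpm[np.argmax(avg)]

# Tiny example: the strongest row (600 BPM) lies outside Theta and is ignored.
T = np.array([[1.0, 1.0], [3.0, 5.0], [9.0, 9.0]])
bpm = np.array([60.0, 120.0, 600.0])
tempo = global_tempo(T, bpm)
```

Restricting the search to a plausible tempo range is what keeps tempo harmonics and subharmonics from being reported as the global tempo.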

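Cyclic tempogram (1)
• Motivation: like the pitch scale, the tempo scale also has “octave equivalence”
  - A song with BPM = 120 can usually also be interpreted as BPM = 240 (or BPM = 60)
• Define the tempo equivalence class $[\tau] := \{ 2^k \tau \mid k \in \mathbb{Z} \}$
  - Example: for $\tau = 120$ one obtains $[\tau] = \{\dots, 30, 60, 120, 240, 480, \dots\}$
• The cyclic tempogram:
$$C(n, [\tau]) := \sum_{\lambda \in [\tau]} T(n, \lambda)$$
• The cyclic tempogram relative to a reference tempo $\tau_0$: $C_{\tau_0}(n, s) := C(n, [s \cdot \tau_0])$, where $s$ is a scaling parameter
  - Notice that $C_{\tau_0}(n, s) = C_{\tau_0}(n, 2^k s)$ for $k \in \mathbb{Z}$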
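A minimal sketch of the cyclic tempogram on a logarithmically spaced grid of scaling parameters $s \in [1, 2)$. The number of bins, the set of tempo octaves that are summed, and the use of linear interpolation on the BPM axis are assumptions of this sketch.

```python
import numpy as np

def cyclic_tempogram(T, bpm, tau0=60.0, n_bins=12, octaves=(-1, 0, 1, 2)):
    """C_{tau0}(n, s): rows = scaling parameters s, columns = time.

    T: tempogram (rows = tempi), bpm: increasing tempo axis of T.
    """
    s = 2.0 ** (np.arange(n_bins) / n_bins)       # s in [1, 2), log-spaced
    C = np.zeros((n_bins, T.shape[1]))
    for k in octaves:
        target = s * tau0 * 2.0 ** k              # tempi 2^k * s * tau0
        valid = (target >= bpm[0]) & (target <= bpm[-1])
        for j in range(T.shape[1]):
            # Read the tempogram at the target tempi and accumulate.
            vals = np.interp(target, bpm, T[:, j])
            C[:, j] += np.where(valid, vals, 0.0)
    return s, C

# A peak at 120 BPM and a peak at 240 BPM fall into the same cyclic bin.
bpm = np.arange(30.0, 481.0)
T120 = np.zeros((len(bpm), 1))
T120[bpm == 120] = 1.0
T240 = np.zeros((len(bpm), 1))
T240[bpm == 240] = 1.0
s, C120 = cyclic_tempogram(T120, bpm)
_, C240 = cyclic_tempogram(T240, bpm)
```

The analogy with the chromagram is direct: `s` plays the role of the pitch class and the summation over `octaves` the role of octave folding.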

Cyclic tempogram (2)
• Example: a pulse sequence with tempo increasing from 110 to 130 BPM
• Left: Fourier tempogram
• Right: ACF tempogram

Tempogram and musical structure • Fourier tempogram (top) and ACF tempogram (bottom) • “In the year 2525” (Zager and Evans) • Hungarian Dance No.5 (Johannes Brahms)


Comparison


Cyclic tempogram and chromagram


Beat tracking by PLP (1)
• Tempo estimation: identifying the frequency (or periodicity) of the novelty curve
• Beat tracking: identifying the phase of the novelty curve at the local tempo frequency
• Predominant local pulse (PLP)
• Local tempo estimation at time $n$:
$$\tau_n := \operatorname{argmax}_{\tau \in \Theta}\, T^{F}(n, \tau)$$
• Local phase estimation: the phase $\phi_n$ of the windowed sinusoid of tempo $\tau_n$:
$$\phi_n := \frac{1}{2\pi} \arccos\left( \frac{\mathrm{Re}\left( F(n, \tau_n/60) \right)}{\left| F(n, \tau_n/60) \right|} \right)$$

Beat tracking by PLP (2)
• The windowed sinusoid that best “matches” the local novelty curve is the sinusoid of tempo $\tau_n$ that is in phase with the novelty curve:
$$\kappa_n(m) := w(m-n) \cos\left( 2\pi \left( \frac{\tau_n}{60}\, m - \phi_n \right) \right)$$
• The PLP function: accumulate the windowed sinusoids by overlap-add (OLA) and keep only the positive values:
$$\Gamma(m) := \left| \sum_{n \in \mathbb{Z}} \kappa_n(m) \right|_{\geq 0}$$
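The PLP computation can be sketched end to end as follows. Two points are assumptions of this sketch rather than part of the slides: the time axis is measured in seconds via a hypothetical `feature_rate`, and the phase is taken as the signed angle `-arg(F) / (2 pi)`, which agrees with the arccos expression up to the sign that arccos cannot recover.

```python
import numpy as np

def plp_curve(nov, feature_rate, bpm, win_len, hop=1):
    """Predominant local pulse curve by overlap-adding windowed sinusoids."""
    w = np.hanning(win_len)
    pad = win_len // 2
    nov_p = np.pad(nov, pad)                          # centre windows on n
    t = np.arange(len(nov_p)) / feature_rate          # seconds (padded axis)
    omegas = np.asarray(bpm) / 60.0                   # beats per second
    kernels = np.exp(-2j * np.pi * omegas[:, None] * t[None, :])
    acc = np.zeros(len(nov_p))
    for n in range(0, len(nov), hop):
        sl = slice(n, n + win_len)
        F = (kernels[:, sl] * (nov_p[sl] * w)).sum(axis=1)  # F(n, omega), all tempi
        i = np.argmax(np.abs(F))                            # local tempo tau_n
        phi = -np.angle(F[i]) / (2 * np.pi)                 # signed phase (assumption)
        acc[sl] += w * np.cos(2 * np.pi * (omegas[i] * t[sl] - phi))
    return np.maximum(acc[pad : pad + len(nov)], 0.0)       # half-wave rectify

# Toy novelty curve with peaks every 50 samples (120 BPM at 100 Hz).
feature_rate = 100
t = np.arange(10 * feature_rate) / feature_rate
nov = 0.5 * (1.0 + np.cos(2 * np.pi * 2.0 * t))
plp = plp_curve(nov, feature_rate, np.arange(60, 241), win_len=501, hop=5)
```

The peaks of `plp` line up with the peaks of the novelty curve, and picking them (for example with a simple local-maximum rule) yields beat candidates.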

Beat tracking by PLP (3)


Example of PLP • Hungarian Dance No.5 (Johannes Brahms)


PLP extracted in different bands


Dynamic programming for beat tracking
• Assumption: the tempo of the music is more or less constant
• A beat sequence $B = (b_1, b_2, \dots, b_L)$ and an estimated beat period $\hat{\delta}$ (derived from the estimated tempo)
• Penalty function: $P_{\hat{\delta}}(\delta) := -\left( \log_2 (\delta / \hat{\delta}) \right)^2$
• The objective function:
$$S(B) := \sum_{l=1}^{L} \Delta(b_l) + \lambda \sum_{l=2}^{L} P_{\hat{\delta}}(b_l - b_{l-1})$$
• Let $\mathcal{B}^N$ be the set of all possible beat sequences; the optimal sequence is $B^{*} := \operatorname{argmax}_{B \in \mathcal{B}^N} S(B)$
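The maximisation can be carried out with a standard dynamic-programming recursion: the best score of a sequence ending at position n is the novelty at n plus the best penalised score of any earlier position (or zero, if n starts the sequence). The sketch below is a minimal version of that idea; the function name, the search window of two beat periods, and the toy input are assumptions.

```python
import math

def track_beats(nov, beat_period, lam=1.0):
    """Maximise sum(nov[b_l]) + lam * sum(P(b_l - b_{l-1})) over beat sequences."""
    N = len(nov)
    score = [0.0] * N            # best score of a beat sequence ending at n
    prev = [-1] * N              # back-pointer to the preceding beat
    for n in range(N):
        best, arg = 0.0, -1      # 0.0 means: n is the first beat
        lo = max(0, n - 2 * beat_period)     # search window (heuristic)
        for m in range(lo, n):
            penalty = -math.log2((n - m) / beat_period) ** 2
            cand = score[m] + lam * penalty
            if cand > best:
                best, arg = cand, m
        score[n] = nov[n] + best
        prev[n] = arg
    # Backtrack from the position with the highest accumulated score.
    b = max(range(N), key=score.__getitem__)
    beats = []
    while b >= 0:
        beats.append(b)
        b = prev[b]
    return beats[::-1]

# Toy novelty curve: impulses roughly every 10 samples, one of them late.
nov = [0.0] * 45
for pos in (10, 20, 31, 40):
    nov[pos] = 1.0
beats = track_beats(nov, beat_period=10)
```

The weight `lam` trades off the two terms: a small value follows the novelty peaks closely, while a large value enforces a nearly constant beat period. The double loop costs on the order of N times the search window, which is small compared with enumerating all beat sequences.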

Algorithm


Further readings
• Tempogram toolbox (M. Mueller)
• Beat tracking by dynamic programming (Dan Ellis’ code): D. Ellis, “Beat tracking by dynamic programming,” Journal of New Music Research, Special Issue on Beat and Tempo Extraction, vol. 36, no. 1, March 2007, pp. 51–60. http://labrosa.ee.columbia.edu/projects/beattrack/
• Beat tracking with dynamic Bayesian networks: A. Srinivasamurthy, A. Holzapfel, A. Cemgil, and X. Serra, “Particle filters for efficient meter tracking with dynamic Bayesian networks,” in ISMIR 2015.
• Downbeat tracking with neural networks: S. Durand, J. P. Bello, B. David, and G. Richard, “Downbeat tracking with multiple features and deep neural networks,” in ICASSP 2015, pp. 409–413.

Wrap-up • The time-frequency structure of music: what have we learned in this semester?

Example: visualization and sonification
• Spectrogram, log-scale spectrogram, chromagram, novelty curve, tempogram, PLP curve, …
• Examples
  - PSY – 〈Gangnam Style〉 (2012) (2.57B views on YouTube)
  - 田馥甄 – 〈小幸運〉 (2015) (79.5M views on YouTube): verse 1, chorus 1, chorus 2
  - 玖壹壹 – 〈下輩子〉 (2012) (10M views on YouTube): bass part and others; how about calculating the novelty curve from the spectrogram in the region from 0 to 200 Hz?
  - 濁水溪公社 – 〈留在台西鄉賺錢〉 (2014): syncopation; rhythm of different instruments

Example (1): Gangnam Style
Figure panels: spectrogram (frequency in kHz), pitch profile (C2 to C8), chromagram (C to B), novelty curve, novelty curve (bass spectrogram), Fourier tempogram (BPM), Fourier tempogram (bass, BPM); horizontal axis: time (s)

Example (2): 小幸運 Verse 1
Figure panels: spectrogram (frequency in kHz), pitch profile (C2 to C8), chromagram (C to B), novelty curve, novelty curve (bass spectrogram), Fourier tempogram (BPM), Fourier tempogram (bass, BPM); horizontal axis: time (s)

Example (3): 小幸運 Chorus 1
Figure panels: spectrogram (frequency in kHz), pitch profile (C2 to C8), chromagram (C to B), novelty curve, novelty curve (bass spectrogram), Fourier tempogram (BPM), Fourier tempogram (bass, BPM); horizontal axis: time (s)

Example (4): 小幸運 Chorus 2, window size for tempogram = 8 s
Figure panels: spectrogram (frequency in kHz), pitch profile (C2 to C8), chromagram (C to B), novelty curve, novelty curve (bass spectrogram), Fourier tempogram (BPM), Fourier tempogram (bass, BPM); horizontal axis: time (s)

Example (5): 小幸運 Chorus 2, window size for tempogram = 16 s
Figure panels: spectrogram (frequency in kHz), pitch profile (C2 to C8), chromagram (C to B), novelty curve, novelty curve (bass spectrogram), Fourier tempogram (BPM), Fourier tempogram (bass, BPM); horizontal axis: time (s)

Example (6): 下輩子 Chorus
Figure panels: spectrogram (frequency in kHz), pitch profile (C2 to C8), chromagram (C to B), novelty curve, novelty curve (bass spectrogram), Fourier tempogram (BPM), Fourier tempogram (bass, BPM); horizontal axis: time (s)

Example (7): 留在台西鄉賺錢
Figure panels: spectrogram (frequency in kHz), pitch profile (C2 to C8), chromagram (C to B), novelty curve, novelty curve (bass spectrogram), Fourier tempogram (BPM), Fourier tempogram (bass, BPM); horizontal axis: time (s)