Lecture 7: Pitch and Chord (2) HMM, pitch detection functions Li Su 2016/03/31

Chord progressions
• Chord progressions are not arbitrary
  ◦ Example 1: I-IV-I-V-I (C-F-C-G-C)
  ◦ Example 2: I-V-VI-III-IV-I-II-V (C-G-Am-Em-F-C-Dm-G)

From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

Markov chains of chord progressions
• Markov states $\alpha_1, \alpha_2, \alpha_3$ in a sequence $s_1 s_2 s_3 \ldots$
• Markov property:
  ◦ $P(s_{n+1} = \alpha_j \mid s_n = \alpha_i, s_{n-1} = \alpha_k, \ldots) = P(s_{n+1} = \alpha_j \mid s_n = \alpha_i)$

From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
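As a minimal sketch of what the Markov property buys us, the probability of a whole chord sequence factors into an initial probability times one transition probability per step. The three-chord state set and all numbers below are made up for illustration; they are not taken from the lecture or from any training data.

```python
# Hypothetical first-order transition probabilities between three chords.
# The values are illustrative assumptions; each row sums to 1.
TRANSITIONS = {
    "C": {"C": 0.6, "F": 0.2, "G": 0.2},
    "F": {"C": 0.5, "F": 0.4, "G": 0.1},
    "G": {"C": 0.7, "F": 0.1, "G": 0.2},
}
# Hypothetical probabilities of starting a progression on each chord.
INITIAL = {"C": 0.6, "F": 0.2, "G": 0.2}


def sequence_probability(chords, initial=INITIAL, transitions=TRANSITIONS):
    """P(s_1, ..., s_N) = P(s_1) * prod_n P(s_{n+1} | s_n) under the Markov property."""
    prob = initial[chords[0]]
    for prev, nxt in zip(chords, chords[1:]):
        # Only the previous chord matters, not the earlier history.
        prob *= transitions[prev][nxt]
    return prob


# Example 1 from the slides: I-IV-I-V-I in C major.
p_example = sequence_probability(["C", "F", "C", "G", "C"])
print(p_example)
```

With these assumed numbers the product is 0.6 · 0.2 · 0.5 · 0.2 · 0.7.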

HMM model
• Observations
  ◦ Chroma features
  ◦ Or template-based result
• Hidden states
  ◦ “Refined” chord sequence
  ◦ The answer we want
• Transition probability
  ◦ From training data
• Emission probability
  ◦ From training data

From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

Discrete HMM components
• Map an arbitrary chroma feature in the test data to one of a finite set of prototype vectors (codebook)
  ◦ Quantization: map the feature to a prototype
  ◦ Clustering: train the codebook

From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
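The quantization step can be sketched as a nearest-prototype search. The codebook below is a hand-written assumption (binary triad templates for C, F and G major) standing in for a codebook that would normally come out of the clustering step; the function name `quantize` and the Euclidean distance are also illustrative choices.

```python
import math


def triad(root, third, fifth):
    """Binary 12-dimensional chroma template (pitch classes C=0 ... B=11)."""
    return [1.0 if pc in (root, third, fifth) else 0.0 for pc in range(12)]


# Hypothetical codebook: in practice it would be trained by clustering.
CODEBOOK = [triad(0, 4, 7), triad(5, 9, 0), triad(7, 11, 2)]  # C, F, G major


def quantize(feature, codebook=CODEBOOK):
    """Return the index k of the prototype closest (Euclidean) to `feature`."""
    distances = [math.dist(feature, prototype) for prototype in codebook]
    return distances.index(min(distances))


# A made-up, slightly noisy chroma vector dominated by C, E and G.
noisy_c_major = [0.9, 0.1, 0.0, 0.0, 0.8, 0.1, 0.0, 0.7, 0.0, 0.1, 0.0, 0.0]
symbol = quantize(noisy_c_major)
print(symbol)
```

The returned index is the discrete observation symbol that the HMM sees in place of the raw chroma vector.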

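A naïve HMM model training method
• $I$ states (e.g., $I = 24$), $K$ observation symbols
• For the training data:
  ◦ $c_i = \text{number of transitions from } \alpha_i \text{ at time } n = 1$ (normalized over all states so that the $c_i$ sum to 1)
  ◦ $a_{ij} = \dfrac{\text{number of transitions from } \alpha_i \text{ to } \alpha_j}{\text{number of transitions from } \alpha_i}$
  ◦ $b_{ik} = \dfrac{\text{number of transitions from } \alpha_i \text{ and observing } \beta_k}{\text{number of transitions from } \alpha_i}$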

From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
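The counting scheme can be sketched in a few lines. This is only an illustration under assumptions: states and observation symbols are encoded as integers, every training sequence comes with frame-wise state labels, and the function name `train_hmm` is hypothetical. Rows that never occur in the training data are left as zeros rather than smoothed.

```python
def train_hmm(state_seqs, obs_seqs, num_states, num_symbols):
    """Estimate (C, A, B) from labeled sequences by relative frequencies."""
    C = [0.0] * num_states
    A = [[0.0] * num_states for _ in range(num_states)]
    B = [[0.0] * num_symbols for _ in range(num_states)]
    for states, observations in zip(state_seqs, obs_seqs):
        C[states[0]] += 1  # state occupied at time n = 1
        for n in range(len(states) - 1):
            A[states[n]][states[n + 1]] += 1   # transition alpha_i -> alpha_j
            B[states[n]][observations[n]] += 1  # alpha_i observed with beta_k

    def normalize(row):
        total = sum(row)
        # Assumption: unseen rows stay all-zero instead of being smoothed.
        return [value / total for value in row] if total else row

    return normalize(C), [normalize(r) for r in A], [normalize(r) for r in B]


# Two tiny made-up training sequences with 2 states and 2 symbols.
C, A, B = train_hmm(
    state_seqs=[[0, 0, 1, 1, 0], [1, 0, 0]],
    obs_seqs=[[0, 0, 1, 1, 0], [1, 1, 0]],
    num_states=2,
    num_symbols=2,
)
print(C, A, B)
```

Because both `A` and `B` are counted over frames that have a successor, each row is divided by the number of transitions leaving that state, matching the denominators above.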

The uncovering problem of HMM
• Given:
  ◦ An HMM specified by $\Theta = (\mathcal{A}, A, C, \mathcal{B}, B)$
  ◦ An observation sequence $O = (o_1, o_2, \ldots, o_N)$
• Find:
  ◦ The single state sequence $S = (s_1, s_2, \ldots, s_N)$, $s_n \in \mathcal{A}$, that “best explains” the observation sequence: $S^* = \operatorname{argmax}_S P(O, S \mid \Theta)$
  ◦ $I$ states, $N$ time frames -> $I^N$ possible paths in total
  ◦ How to solve this problem?

Viterbi’s algorithm (1)
• Based on dynamic programming: the optimal result for a problem is built on the optimal results for its sub-problems

From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

Viterbi’s algorithm (2)

For backtracking

From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
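A compact sketch of the dynamic-programming recursion and the backtracking step follows. It assumes integer-coded states and observation symbols, with `C`, `A` and `B` as plain nested lists; the toy model values are invented for illustration. Raw probabilities are multiplied here for readability, whereas a practical implementation would usually work with log-probabilities to avoid underflow on long sequences.

```python
def viterbi(observations, C, A, B):
    """Return the state path maximizing P(O, S | Theta) and that probability."""
    num_states = len(C)
    # D[n][i]: probability of the best partial path ending in state i at frame n.
    D = [[C[i] * B[i][observations[0]] for i in range(num_states)]]
    # E[n][i]: best predecessor of state i at frame n + 1 (kept for backtracking).
    E = []
    for obs in observations[1:]:
        prev = D[-1]
        row, back = [], []
        for i in range(num_states):
            best_j = max(range(num_states), key=lambda j: prev[j] * A[j][i])
            row.append(prev[best_j] * A[best_j][i] * B[i][obs])
            back.append(best_j)
        D.append(row)
        E.append(back)
    # Backtracking: start at the best final state and follow the stored pointers.
    state = max(range(num_states), key=lambda i: D[-1][i])
    best_prob = D[-1][state]
    path = [state]
    for back in reversed(E):
        state = back[state]
        path.append(state)
    path.reverse()
    return path, best_prob


# Toy model with 2 states and 3 observation symbols (made-up numbers).
C = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
best_path, best_prob = viterbi([0, 1, 2], C, A, B)
print(best_path, best_prob)
```

The cost is on the order of $I^2 N$ operations instead of enumerating all $I^N$ paths.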

An example of Viterbi’s algorithm (1)

From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

An example of Viterbi’s algorithm (2)

From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

Result
• Better than temporal smoothing

From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

Pitch detection

Pitch detection
• Pitch detection from the spectrum
  ◦ Problem 1: missing fundamental
  ◦ Problem 2: inharmonicity
• Periodicity-based pitch detection?

From: http://sites.sinauer.com/wolfe4e/wa10.02.html

From: http://www.21harmony.com/blog/illuminating-inharmonicity

“Periodicity” detection
• We have discussed some techniques in spectrum estimation / frequency detection
• What is the difference between frequency and periodicity?
• Formally, a periodic signal is defined as
  ◦ $x(t) = x(t + T_0), \ \forall t$
• What is the definition of frequency?
• Find the fundamental frequency/period
• Applications: pitch detection, transcription, beat tracking …

Pitch detection theory: a historical remark

August Seebeck (1805-1849)

Georg Simon Ohm (1789-1854) Herman von Helmholtz (1821-1894)

Harvey Fletcher (1884-1981)

Jan Frederik Schouten (1910-1980)

Seebeck’s experiment (1841) and Ohm’s second law
• Ohm’s second law: a pitch can be heard only if the wave contains power at that frequency (“Fourierism” perspective)
• Ohm: Seebeck’s finding is just an illusion
• Seebeck: “Pitch is periodicity!”
• Ohm: “Pitch is frequency!”

Helmholtz’s theory
• 《On the Sensations of Tone as a Physiological Basis for the Theory of Music》(1877)
• “Fourierism” perspective: distortion products are generated in the ear, so we can hear the weak fundamental
• Fletcher: discovered the “missing fundamental” by applying a high-pass filter to the audio signal
• Helmholtz: “I support Ohm’s position, and I have a beautiful explanation”
• Fletcher: “I support Helmholtz’s position!”

Schouten’s experiment I (1938)
• Input signal: 400 Hz, 600 Hz, 800 Hz, …, with a distortion product at 200 Hz (Helmholtz’s theory)
• Add a pure tone of 206 Hz: beats should be heard
  ◦ No beats were heard
Things are not quite so simple…

Schouten’s experiment II (1938)
• Input signal: 1000 Hz, 1200 Hz, 1400 Hz
  ◦ A clear pitch at 200 Hz should be heard (Helmholtz’s theory)
• Input signal: 1040 Hz, 1240 Hz, 1440 Hz
  ◦ A clear pitch at 200 Hz should also be heard (Helmholtz’s theory)
• Experiment: the perceived pitch is about 207 Hz
Things are not quite so simple…

Challenges
• Quasi-periodicity
• Multiple periodicity (polyphonic: overlap and harmonic)
• Transients

Basic idea of periodicity detection
• Formally, a periodic signal is defined as
  ◦ $x(t) = x(t + T_0), \ \forall t$
• Formally, the frequency spectrum of a signal is defined as…
• Frequency analysis: the relationship between the signal and the sinusoidal basis
• Periodicity analysis: the relationship between the signal and itself

Basic periodicity detection functions
• Autocorrelation function (ACF)
• Average magnitude difference function (AMDF)
• YIN and its periodicity detector
• Generalized ACF and Cepstrum

Autocorrelation function (ACF)
• Cross product measures similarity across time
• Cross correlation:
  ◦ $R_{xy}(\tau) = \dfrac{1}{N-1} \sum_{t=0}^{N-1-\tau} x(t)\, y(t+\tau)$
• Autocorrelation:
  ◦ $R_{xx}(\tau) = \dfrac{1}{N-1} \sum_{t=0}^{N-1-\tau} x(t)\, x(t+\tau)$
• $t$: time-domain
• $\tau$: lag-domain
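A direct, unoptimized sketch of the autocorrelation sum is given here, using the $1/(N-1)$ scaling and the summation range $t = 0, \ldots, N-1-\tau$. The function name `acf` and the toy signal are illustrative assumptions.

```python
def acf(x, max_lag):
    """R_xx(tau) = 1/(N-1) * sum_{t=0}^{N-1-tau} x(t) * x(t+tau), tau = 0..max_lag."""
    n = len(x)
    return [
        sum(x[t] * x[t + lag] for t in range(n - lag)) / (n - 1)
        for lag in range(max_lag + 1)
    ]


# Toy signal with a period of exactly 4 samples.
x = [1.0, 0.0, -1.0, 0.0] * 4
r = acf(x, max_lag=8)
print(r)
```

For this signal the ACF is positive at lags 4 and 8 (whole periods) and negative at lag 2 (half a period), which is the behaviour a periodicity detector relies on.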

Other relevant pitch detection functions
• Average magnitude difference function (AMDF)
  ◦ $\mathrm{AMDF}_{xx}(\tau) = \dfrac{1}{N-1} \sum_{t=0}^{N-1-\tau} |x(t) - x(t+\tau)|$
• The pitch detection function used in YIN
  ◦ $\mathrm{YIN}_{xx}(\tau) = \dfrac{1}{N-1} \sum_{t=0}^{N-1-\tau} \left( x(t) - x(t+\tau) \right)^2$
  ◦ Ref: Alain de Cheveigné and Hideki Kawahara, “YIN, a fundamental frequency estimator for speech and music,” J. Acoust. Soc. Am. 111 (4), April 2002
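Both difference-based functions can be sketched the same way as the ACF, with the product replaced by an absolute or squared difference. The function names are illustrative; only the plain squared-difference function stated above is shown, and any further steps of the full YIN method are not covered in this sketch.

```python
def amdf(x, max_lag):
    """AMDF(tau) = 1/(N-1) * sum_{t=0}^{N-1-tau} |x(t) - x(t+tau)|."""
    n = len(x)
    return [
        sum(abs(x[t] - x[t + lag]) for t in range(n - lag)) / (n - 1)
        for lag in range(max_lag + 1)
    ]


def yin_difference(x, max_lag):
    """Squared-difference function: 1/(N-1) * sum (x(t) - x(t+tau))^2."""
    n = len(x)
    return [
        sum((x[t] - x[t + lag]) ** 2 for t in range(n - lag)) / (n - 1)
        for lag in range(max_lag + 1)
    ]


# Toy signal with a period of exactly 4 samples.
x = [1.0, 0.0, -1.0, 0.0] * 4
a = amdf(x, max_lag=8)
d = yin_difference(x, max_lag=8)
print(a)
print(d)
```

Unlike the ACF, both functions are small where the signal matches a shifted copy of itself, so the period shows up as a minimum (here exactly zero at lags 4 and 8).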

Pre-processing → Pitch detection function → Post-processing

Result
• A violin D4
• $f_0 = 293$ Hz
• $T = 3.41$ msec
• Pitch indicator:
  ◦ Discarding zero-lag term (for zero lag the signal matches the signal itself)
  ◦ $p^* = \operatorname{argmax}_p \mathrm{ACF}(p)$
  ◦ $p^* = \operatorname{argmin}_p \mathrm{AMDF}(p)$

[Figure: four panels. Time-domain signal (x-axis: Time (s), 0 to 0.1), followed by ACF, AMDF and YIN (x-axis: Lag (s), 0 to 0.02).]
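The argmax pitch indicator can be tried on a synthetic tone. Everything specific here is an assumption made for the sketch: the 16 kHz sampling rate, the three-partial test signal standing in for the violin recording, and the 200 to 1000 Hz search range. Restricting the lag range is one simple way of discarding the zero-lag region, since lags right next to zero also score highly.

```python
import math


def acf_at(x, lags):
    """ACF values (1/(N-1) scaling) for the requested lags only."""
    n = len(x)
    return {
        lag: sum(x[t] * x[t + lag] for t in range(n - lag)) / (n - 1)
        for lag in lags
    }


def estimate_f0(x, fs, f_min=200.0, f_max=1000.0):
    """Pick the lag with the largest ACF inside an assumed pitch search range."""
    lag_min = int(fs / f_max)  # shortest period considered
    lag_max = int(fs / f_min)  # longest period considered
    r = acf_at(x, range(lag_min, lag_max + 1))
    best_lag = max(r, key=r.get)
    return fs / best_lag


# Synthetic stand-in for the violin D4: 0.1 s of a 293 Hz tone with 3 partials.
fs = 16000
f0 = 293.0
partials = [(1, 1.0), (2, 0.5), (3, 0.25)]  # (harmonic number, amplitude)
x = [
    sum(a * math.sin(2 * math.pi * k * f0 * t / fs) for k, a in partials)
    for t in range(int(0.1 * fs))
]
f0_est = estimate_f0(x, fs)
print(f0_est)
```

Because the lag is an integer number of samples, the estimate is quantized to `fs / lag`; interpolation around the peak would be a natural post-processing step.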

Wiener-Khinchin Theorem
• The computational complexity of an $N$-point ACF:
  ◦ $O(N^2)$
  ◦ Is there any way to accelerate it?
• Wiener-Khinchin theorem: the ACF is the inverse Fourier transform of the power spectrum
  ◦ $R_{xx}(\tau) = \mathrm{IFFT}\left( |\mathrm{FFT}(x(t))|^2 \right)$
  ◦ Complexity: $O(N \log N)$
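The identity can be checked numerically with a short sketch. To stay dependency-free it uses a small recursive radix-2 FFT written for this example (in practice a library FFT such as the one in NumPy would be used). One detail not spelled out above is assumed here: the signal is zero-padded to at least twice its length so that the circular correlation produced by the FFT equals the linear ACF sum.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Inverse is unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def acf_fft(x):
    """ACF via IFFT(|FFT(x)|^2), with the 1/(N-1) scaling of the direct sum."""
    n = len(x)
    size = 1
    while size < 2 * n:  # zero-pad to avoid circular wrap-around
        size *= 2
    spectrum = fft(list(x) + [0.0] * (size - n))
    power = [abs(c) ** 2 for c in spectrum]
    r = fft(power, inverse=True)
    return [r[lag].real / size / (n - 1) for lag in range(n)]


def acf_direct(x):
    """Reference O(N^2) implementation of the same sum."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) / (n - 1)
            for lag in range(n)]


x = [0.5, -1.0, 2.0, 0.25, 1.5]
print(acf_fft(x))
print(acf_direct(x))
```

The two printed lists agree up to floating-point rounding.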

Generalized ACF
• Consider a generalization of ACF:
  ◦ $R_{xx}(\tau) = \mathrm{IFFT}\left( |\mathrm{FFT}(x(t))|^{\gamma} \right), \quad 0 < \gamma < 2$
  ◦ Or, $R_{xx}(\tau) = \mathrm{IFFT}\left( \log |\mathrm{FFT}(x(t))| \right)$?
• What are the advantages of generalized ACF?
  ◦ Recall the “logarithmic compression” part of the chromagram!
• Reference:
  ◦ Helge Indefrey, Wolfgang Hess, and Günter Seeser, “Design and evaluation of double-transform pitch determination algorithms with nonlinear distortion in the frequency domain - preliminary results,” in Proc. ICASSP, 1985.
  ◦ Anssi Klapuri, “Multipitch analysis of polyphonic music and speech signals using an auditory model,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 255-266, 2008.
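A sketch of the generalized ACF replaces the squaring of the magnitude spectrum by another compression. The helper names, the zero-padding, the omission of the $1/(N-1)$ scaling and the small `eps` added before the logarithm (to avoid `log(0)`) are all assumptions of this example rather than part of the definitions above. The small radix-2 FFT is included only to keep the block self-contained.

```python
import cmath
import math


def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Inverse is unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def double_transform(x, nonlinearity):
    """IFFT(nonlinearity(|FFT(x)|)) on a zero-padded signal, first len(x) lags."""
    n = len(x)
    size = 1
    while size < 2 * n:
        size *= 2
    spectrum = fft(list(x) + [0.0] * (size - n))
    compressed = [nonlinearity(abs(c)) for c in spectrum]
    return [v.real / size for v in fft(compressed, inverse=True)[:n]]


def generalized_acf(x, gamma):
    """gamma = 2 gives the ordinary (unscaled) ACF; smaller gamma compresses peaks."""
    return double_transform(x, lambda mag: mag ** gamma)


def log_acf(x, eps=1e-12):
    """Logarithmic variant; eps is an assumed guard against log(0)."""
    return double_transform(x, lambda mag: math.log(mag + eps))


x = [0.5, -1.0, 2.0, 0.25, 1.5]
print(generalized_acf(x, gamma=2.0))
print(generalized_acf(x, gamma=0.2))
print(log_acf(x))
```

With `gamma=2.0` the output equals the plain sum $\sum_t x(t)\,x(t+\tau)$, which gives a quick sanity check before trying compressed variants.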

Preliminary result
• A violin D4 ($f_0 = 293$ Hz, $T = 3.41$ msec)
• Pitch indicator:
  ◦ $\gamma = 2$ (ACF)
  ◦ $\gamma = 0.2$
  ◦ Logarithm

[Figure: four panels. Time-domain signal (x-axis: Time (s), 0 to 0.1) and three lag-domain panels (x-axis: Lag (s), 0 to 0.02); panel titles as extracted: ACF, AMDF, YIN.]