
10420CS 573100 Music Information Retrieval

Lecture 1 Introduction Yi-Hsuan Yang Ph.D. http://www.citi.sinica.edu.tw/pages/yang/ [email protected]

Music & Audio Computing Lab, Research Center for IT Innovation, Academia Sinica

Outline • Logistic issues  Syllabus  Some notes on the final project

• Brief introduction of Music Information Retrieval (MIR) • Lab  Sonic Visualiser  Matlab  Python

Logistic issues • Lecturers  Yi-Hsuan Yang (楊奕軒)  http://mac.citi.sinica.edu.tw/~yang/  [email protected]

 Li Su (蘇黎)  https://sites.google.com/site/lisupage/  [email protected]

• Office hour  Friday 9-11am, or by appointment  Office: Room 645, EECS Building (資電館)

Logistic issues • TA  Jeffrey Huang (黃彥學)  [email protected]

• Office hour  Thursday 1-3pm  Office: Room 719, EECS Building (資電館)

Logistic issues • Time: R2R3R4  09:10-10:00 (i.e. 10 mins later)  10:10-11:00  11:10-12:00

Logistic issues • Main textbook

Meinard Müller, Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications. 483 p., 249 illus., 30 illus. in color, hardcover. ISBN: 978-3-319-21944-8. Springer, 2015. https://www.audiolabs-erlangen.de/fau/professor/mueller/bookFMP

• Related books  Music Similarity and Retrieval, Springer  Music Recommendation and Discovery, Springer  Music Emotion Recognition, CRC Press  Music Data Mining, CRC Press  Speech and Audio Signal Processing, Wiley

Logistic issues • Grading policy  Participation (10%)  Assignments (60%), 4 times  Final Project (30%): for a team of 3 or so  (we will talk more about this later)

• Prerequisites  Programming in Matlab or Python

Logistic issues • MIR = {signal processing, machine learning} + music
• Course objectives
 Share with you the fun of MIR
 Teach you the fundamental techniques of MIR
 Build hands-on experience
 Invite you to contribute to this field

• We welcome  People who enjoy listening to music and wish to do something about it via programming  People who want to learn the techniques covered in this course

Logistic issues • We won’t  assume that you play an instrument  assume that you’ve taken DSP or ML related courses  teach you how to make music by computer (our focus is music information retrieval, not computer music)  teach you subjects such as music theory, acoustics, audio engineering, or audio synthesis

• We will, however,  consider you as an NTHU graduate student

Logistic issues • Extra enrollment  fine

• Auditing  fine

Logistic issues • Course website https://twtmir.wordpress.com/ (TW Teaching MIR)

Syllabus
• W1: Intro
• W2: STFT
• W3: timbre
• W4: classification
• W5: pitch
• W6: synchronization
• W7: pitch
• W8: chord
• W9: separation
• W10: separation
• W11: beat, tempo
• W12: rhythm
• W13: transcription
• W14: tagging/recom.
• W15: MAClab
• W16: structure
• W18: final project

Assignments • Programming (in Python or Matlab) + report (in English; preferably using LaTeX)
• HW1: timbre + classification
• HW2: pitch
• HW3: source separation (tentative)
• HW4: tempo (tentative)

• HW3 likely due on W12 (before the course withdrawal deadline)

Final Project • Goal: Invite you to contribute to this field • DSP or ML  Melodic transcription for a specific genre of music  Piano quintet, string quartet, violin sonata, voice duet, choir, wind quintet, etc

 Real-time score following  Cross-cultural MIR  Taiwanese Pop songs, folk songs, aboriginal music, etc

 Or on your own

Final Project (Cont’) • Start thinking now about what you want to do • Try to recruit team members early • Project pitch: likely W10 • Deadline for team-up: likely W12 (we will help you find partners, if needed) • Final presentation: W18 • Deadline for final report: W18+1

Resources • Learning on your own
 Dan Ellis @ Columbia (moved to Google) https://www.ee.columbia.edu/~dpwe/
 Meinard Müller @ Universität Erlangen-Nürnberg https://www.audiolabs-erlangen.de/fau/professor/mueller/teaching
 Juan Bello @ NYU http://www.nyu.edu/classes/bello/Teaching.html
 CCRMA summer school @ Stanford https://ccrma.stanford.edu/workshops/music-information-retrieval-mir-2015
 Xavier Serra @ UPF, Spain https://zh-tw.coursera.org/course/audio

Resources (Cont’) • Learning on your own  Roger Jang @ NTU: http://mirlab.org/jang/

• Conference proceedings  Int’l Soc. Music Information Retrieval Conf. (ISMIR)  Int’l Conf. Acoustics, Speech, and Signal Processing (ICASSP)  ACM MM, ACM ICMR, ACM SIGIR, IEEE ICME

• Transactions  IEEE Trans. Audio, Speech and Language Processing (TASLP)  IEEE Trans. Multimedia (TMM)  IEEE Trans. Signal Processing (TSP)

Resources (Cont’) • MIREX (MIR Evaluation eXchange)
 Part of ISMIR
 http://www.music-ir.org/mirex/wiki/MIREX_HOME
• MIREX tasks
 Audio Onset Detection
 Audio Beat Tracking
 Audio Key Detection
 Audio Downbeat Detection
 Real-time Audio to Score Alignment (a.k.a. Score Following)
 Audio Cover Song Identification
 Discovery of Repeated Themes & Sections
 Audio Melody Extraction
 Query by Singing/Humming
 Audio Chord Estimation
 Singing Voice Separation
 Audio Fingerprinting
 Music/Speech Classification/Detection
 Audio Offset Detection


Intelligent Music Systems and Applications
• Why is it relevant to computer science?
• Why is it interesting?
• Why is it difficult?
• What can we do?

Why Relevant to Computer Science? (1/8) • Listening to music online (subscription-based music streaming services)


Why Relevant to Computer Science? (2/8) • (i) Radio → (ii) CDs → (iii) downloading music → (iv) listening to music online
• Paradigm shift
 (i→ii) DJ’s selection → your selection
 (ii→iii) Internet, PC, walkman, phones
 (iii→iv) a few thousand → millions of songs (anytime, anywhere, and “any song”)

Why Relevant to Computer Science? (3/8) • (i) Radio → (ii) CDs → (iii) downloading music → (iv) listening to music online
• Research focus
 (i), (ii) audio engineering, computer music
 (iii: PC+Web) auto-classification, similarity search
 (iv: million songs) recommendation
 data science (information retrieval + pattern recognition)
 audio content analysis (signal processing + pattern recognition)

Why Relevant to Computer Science? (4/8) • Research focus
 (i), (ii) audio engineering, computer music
 (iii: PC+Web) auto-classification, similarity search
 (iv: million songs) recommendation
• Related fields: machine learning, signal processing, information retrieval, data mining, musicology, human-computer interaction, music psychology

Why Relevant to Computer Science? (5/8) • Relevant conferences
 International Computer Music Conference (ICMC), since 1975
 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), since 1976
 ACM Multimedia (MM), since 1993
 IEEE International Conference on Image Processing (ICIP), since 1994
 International Society for Music Information Retrieval Conference (ISMIR), since 2000

Why Relevant to Computer Science? (6/8) • Relevant conferences Int’l Computer Music Conf. (ICMC) Int’l Society for Music Info. Retrieval Conf. (ISMIR)

• Goal  computer music: computers that “make” music  music information retrieval (MIR): computers that “listen” to music and understand/perceive it as human beings do  and, consequently, help us find the right music

Why Relevant to Computer Science? (7/8)

(2015)

Why Relevant to Computer Science? (8/8)


Why Is It Interesting? (1/8)

Pop Danthology Mashup of 50+ Pop Songs


Why Is It Interesting? (2/8) • Songle: web service for active music listening


Why Is It Interesting? (3/8) • PHENICX: performances as highly enriched and interactive concert experiences


Why Is It Interesting? (4/8) • Flow-machines: “An Ode to Music Generation”


Why Is It Interesting? (5/8) • MusicBricks: Internet of Music Things “In some cases it’s “just” entertainment, in some cases this revolutionises how music is performed, in others it produces a new world-wide business”…


Why Is It Interesting? (6/8) • iRACE: iOS-based rhythmic auditory cueing evaluation for Parkinson’s Disease


Why Is It Interesting? (7/8) • CompMusic: computational model for the discovery of the world’s music


Why Is It Interesting? (8/8) • Entertainment, education, culture, healthcare… • We see great potential in Taiwan… • MIR courses in the world: https://teachingmir.wikispaces.com/courses

Why Is It Difficult? (1/6) • Music is polyphonic  audio → “musical score” is not easy
(Figure: score excerpt with 8ve markings)

Why Is It Difficult? (2/6) • Music consists of multiple instruments/layers  source separation, again, is not easy

Why Is It Difficult? (3/6) • Music is an art of time
 what we consciously perceive or expect in a piece is at the level of events (notes, chords, etc.), not frames
 music has structure
(Figure from Jordan Smith's slides)

Why Is It Difficult? (4/6) • Music is performed (i.e. it’s expressive)
 Mozart’s Variationen (1st phrase): scherzando, tranquillo, maestoso, risoluto

Why Is It Difficult? (5/6) • Music is perceived (i.e. it’s subjective)
(Figure: four panels, x-axis: valence (negative → positive), y-axis: arousal (low → high); (a) Smells Like Teen Spirit, (b) A Whole New World, (c) The Rose, (d) Tell Laura I Love Her)

Why Is It Difficult? (6/6) • Music is listened to in different contexts
(Diagram: Music, Context, User)
• Context
 Activity: driving, studying, working, walking
 Mood: happy, sad, angry, relaxed
 Location: home, work, public place
 Social company: alone, w/ friends, w/ strangers
• User
 age
 gender
 personality
 cultural background
 musical background

What Can We Do? (1/3) • Learn from musicians → rule-based systems • Learn from data  feature design: music theory + signal processing  model learning: machine learning

• Learn from data, a more fashionable way,  feature learning: deep learning  model learning: deep learning
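As a rough illustration of the "feature design + model learning" route, the sketch below computes two hand-designed spectral features and fits a nearest-class-mean classifier. Everything in it is an assumption made for the example and not taken from the course material: the synthetic "dull" and "bright" notes, the choice of spectral centroid and spread as features, the function names, and the classifier itself.

```python
import numpy as np

FS = 8000  # sampling rate in Hz (arbitrary choice for this toy example)


def synth_note(f0, n_harmonics, rng, dur=0.5):
    """Toy note: harmonics k*f0 with amplitude 1/k, plus faint noise."""
    t = np.arange(int(FS * dur)) / FS
    x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harmonics + 1))
    return x + 1e-3 * rng.standard_normal(t.size)


def timbre_features(x):
    """Feature design: spectral centroid and spectral spread, both in Hz."""
    mag = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / FS)
    p = mag / mag.sum()                       # spectrum as a distribution
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p))
    return np.array([centroid, spread])


def make_dataset(n_per_class, rng):
    """Class 0: pure tones ('dull'); class 1: 10 harmonics ('bright')."""
    X, y = [], []
    for label, n_harm in enumerate((1, 10)):
        for f0 in rng.uniform(220.0, 330.0, n_per_class):
            X.append(timbre_features(synth_note(f0, n_harm, rng)))
            y.append(label)
    return np.array(X), np.array(y)


rng = np.random.default_rng(0)
X_train, y_train = make_dataset(20, rng)
X_test, y_test = make_dataset(20, rng)

# Model learning: standardise the features, then store one mean vector per class.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
class_means = np.array(
    [((X_train[y_train == c] - mu) / sigma).mean(axis=0) for c in (0, 1)]
)

# Prediction: assign each test note to the nearest class mean.
Z = (X_test - mu) / sigma
dists = np.linalg.norm(Z[:, None, :] - class_means[None, :, :], axis=2)
accuracy = np.mean(dists.argmin(axis=1) == y_test)
print("test accuracy:", accuracy)
```

In the deep-learning route, both the feature function and the classifier would be replaced by a learned model; the split into "features" and "model" shown here is what that route does away with.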


What Can We Do? (2/3) • Music is polyphonic & multi-instrument  unsupervised and supervised approaches  hidden Markov model (HMM), non-negative matrix factorization (NMF), sparse coding (SC), Bayesian approaches

• Music is an art of time  beat tracking, structure segmentation, sequential motif discovery, HMM, recurrent neural network (RNN)

• Music is listened to in different contexts  context-aware recommendation, listening behavior analysis
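To make the "art of time" item a little more concrete, here is a minimal tempo-estimation sketch. It is not the beat tracker covered later in the course: the synthetic click track, the energy-difference onset curve, the 60-200 BPM search range, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

fs = 8000            # sampling rate in Hz (toy value)
bpm_true = 120.0     # tempo of the synthetic click track
dur = 8.0            # length in seconds

# Click track: the same short decaying noise burst on every beat.
rng = np.random.default_rng(0)
x = np.zeros(int(fs * dur))
burst = rng.standard_normal(400) * np.exp(-np.arange(400) / 80.0)
for beat_time in np.arange(0.0, dur - 0.1, 60.0 / bpm_true):
    start = int(round(beat_time * fs))
    x[start:start + 400] += burst

# Frame-wise energy (10 ms hop, 20 ms frames), then the half-wave
# rectified difference as a crude onset-strength curve.
hop = 80
energy = np.array(
    [np.sum(x[i:i + 2 * hop] ** 2) for i in range(0, x.size - 2 * hop, hop)]
)
onset = np.maximum(np.diff(energy), 0.0)

# Autocorrelation of the onset curve; the strongest lag inside a
# plausible tempo range (60-200 BPM) is taken as the beat period.
ac = np.correlate(onset, onset, mode="full")[onset.size - 1:]
frame_rate = fs / hop
lags = np.arange(int(frame_rate * 60 / 200), int(frame_rate * 60 / 60) + 1)
best_lag = lags[np.argmax(ac[lags])]
bpm_est = 60.0 * frame_rate / best_lag
print("estimated tempo (BPM):", bpm_est)
```

On real recordings the onset curve is far noisier and the autocorrelation typically also peaks at half and double the tempo, which is one reason sequence models such as HMMs and RNNs appear in the list above.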

What Can We Do? (3/3)
• W1: Intro
• W2: STFT
• W3: timbre
• W4: classification
• W5: pitch
• W6: synchronization
• W7: pitch
• W8: chord
• W9: separation
• W10: separation
• W11: beat, tempo
• W12: rhythm
• W13: transcription
• W14: tagging/recom.
• W15: MAClab
• W16: structure
• W18: final project

Wrap-Up • Why is it relevant to computer science? (i) Radio → (ii) CDs → (iii) downloading music → (iv) listening to music online

• Why is it interesting? Entertainment, education, culture, healthcare

• Why is it difficult? Polyphonic, multi-layer, art of time, expressive, subjective, listening contexts

• What can we do?


Sonic Visualiser (1/3) • http://www.sonicvisualiser.org/

Sonic Visualiser (2/3) • https://www.freesound.org/
 https://www.freesound.org/people/acclivity/sounds/22347/
 https://www.freesound.org/people/Rudmer_Rotteveel/sounds/316915/
 https://www.freesound.org/people/Jaylew1987/sounds/321112/
 https://www.freesound.org/people/mickel11/sounds/90803/
(Find some interesting ones and send them to me, with the spectrogram (in jpg or png) attached!)

Sonic Visualiser (3/3) • VAMP plugins http://www.vamp-plugins.org/

Matlab (1/3) • https://www.mathworks.com/matlabcentral/newsreader/view_thread/260379

    fs = 8192;                  % Hz
    t = 0:1/fs:3;               % seconds
    f0 = 440;                   % Hz
    y = sin(2.*pi.*f0.*t);
    sound(y,fs)

• http://www.mathworks.com/help/signal/ref/chirp.html?requestedDomain=www.mathworks.com

    y2 = chirp(t,f0,3,2*f0);    % help chirp, or doc chirp
    sound(y2,fs);

    w = 1024;                   % window size of STFT
    figure(1), spectrogram(y,w,w/2,w,fs,'yaxis')
    figure(2), spectrogram(y2,w,w/2,w,fs,'yaxis')
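For readers following along in Python, the Matlab snippet above (pure tone, linear chirp, STFT) can be approximated with NumPy alone. This is a sketch under assumptions: the `stft_mag` helper, the Hann window, and the closed-form chirp are choices made here, and the result will not match Matlab's `spectrogram` output sample for sample.

```python
import numpy as np

fs = 8192                        # sampling rate in Hz, as in the Matlab snippet
t = np.arange(0, 3, 1.0 / fs)    # 3 seconds of sample times (end point excluded)
f0 = 440.0                       # Hz

# Pure tone at f0.
y = np.sin(2 * np.pi * f0 * t)

# Linear chirp whose instantaneous frequency goes from f0 at t = 0
# to 2*f0 at t = 3 s: phase = 2*pi*(f0*t + 0.5*k*t^2), k = (f1 - f0)/t1.
t1, f1 = 3.0, 2 * f0
y2 = np.cos(2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / t1 * t ** 2))


def stft_mag(x, w=1024, hop=512):
    """Magnitude STFT: Hann-windowed frames of length w, one FFT per frame."""
    window = np.hanning(w)
    n_frames = 1 + (len(x) - w) // hop
    frames = np.stack([x[i * hop:i * hop + w] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T    # (frequency bins, frames)


freqs = np.fft.rfftfreq(1024, d=1.0 / fs)    # centre frequency of each bin, Hz
S = stft_mag(y)
S2 = stft_mag(y2)

peak_tone = freqs[S.argmax(axis=0)]      # strongest frequency in each frame
peak_chirp = freqs[S2.argmax(axis=0)]
print(peak_tone[0], peak_chirp[0], peak_chirp[-1])
```

With `w = 1024` and `fs = 8192` each bin is 8 Hz wide, so the per-frame peak of the tone sits on the 440 Hz bin while the peak of the chirp climbs from frame to frame.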

Matlab (2/3) • http://mirlab.org/jang/books/audiosignalprocessing/matlab4waveRead.asp?title=4-2%20Reading%20Wave%20Files

    dpath = 'C:\Users\affige\Dropbox\#MIR_course\lecture_1\audio\';
    fname = fullfile(dpath,'316915__rudmer-rotteveel__cats-fighting-amplified.wav');
    [y,fs] = audioread(fname);          % wavread in older Matlab releases
    sound(y,fs)
    figure(3), subplot(311), plot(y)
    y = mean(y,2);                      % stereo to mono
    y = downsample(y,2); fs = fs/2;     % downsample to 22050
    figure(3), subplot(3,1,[2 3]), spectrogram(y,w,w/2,w,fs,'yaxis')   % w as defined on the previous slide

• http://labrosa.ee.columbia.edu/matlab/mp3read.html • https://www.ee.columbia.edu/~dpwe/resources/matlab/
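One caveat about the `downsample(y,2)` step: keeping every second sample halves the sampling rate without low-pass filtering first, so any content above the new Nyquist frequency folds back into the band. The NumPy sketch below illustrates this with an assumed 44100 Hz source rate and a 15 kHz test tone; the windowed-sinc filter is a basic textbook design chosen for the example, not what Matlab's `resample` or `decimate` use internally.

```python
import numpy as np

fs = 44100                           # assumed original sampling rate, Hz
t = np.arange(fs) / fs               # 1 second
x = np.sin(2 * np.pi * 15000 * t)    # 15 kHz tone, above the new Nyquist (11025 Hz)


def peak_hz(sig, rate):
    """Frequency (Hz) of the strongest bin in the magnitude spectrum."""
    spec = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
    return np.fft.rfftfreq(len(sig), d=1.0 / rate)[spec.argmax()]


def rms(sig):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(sig ** 2)))


def lowpass_fir(cutoff_hz, rate, n_taps=101):
    """Windowed-sinc low-pass FIR filter with unit gain at DC."""
    n = np.arange(n_taps) - (n_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / rate * n) * np.hamming(n_taps)
    return h / h.sum()


# Keeping every 2nd sample with no filtering: the tone aliases
# to 22050 - 15000 = 7050 Hz at the new rate.
naive = x[::2]

# Low-pass below the new Nyquist first, then keep every 2nd sample:
# the out-of-band tone is strongly attenuated instead of aliased.
filtered = np.convolve(x, lowpass_fir(10000, fs), mode="same")[::2]

print(peak_hz(x, fs), peak_hz(naive, fs // 2))
print(rms(naive), rms(filtered[100:-100]))
```

For natural recordings with little energy above 11 kHz the difference may be hard to hear, but it shows up in the spectrogram as mirrored high-frequency content.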

Matlab (3/3) • MIRtoolbox https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox

    addpath(genpath(YOUR_FOLDER_OF_MIRTOOLBOX))
    addpath(genpath(YOUR_FOLDER_OF_MP3READ))
    fname = fullfile(dpath,'321112__jaylew1987__fur-elise-intro.mp3');
    [y,fs,nbits] = mp3read(fname);                  % read mp3
    audiowrite('tmp.wav',y,fs);                     % write wav (wavwrite(y,fs,nbits,'tmp.wav') in older Matlab releases)
    a = miraudio(mean(y,2),fs,'trim')
    S = mirspectrum(a,'frame','dB')
    o = mironsets(a)                                % onset estimation
    p = mirpitch(mirspectrum(a,'frame'),'mono')     % pitch estimation

Python (1/4) • python (v2.7 is suggested) https://www.python.org/
• easy_install or pip https://pypi.python.org/pypi/setuptools
• key packages to install
 numpy, scipy (for scientific programming)
 ipython, notebook (interactive programming environment)
 matplotlib (for plots)
 cython (C-Extensions for Python)
 spyder (for Matlab-like interface)
 ffmpeg (for MP3)

Python (2/4) • librosa https://github.com/bmcfee/librosa http://bmcfee.github.io/librosa/

    import numpy as np
    import librosa
    import librosa.display            # explicit import needed for specshow in recent librosa versions
    import matplotlib.pyplot as plt
    %matplotlib inline                # IPython/Jupyter notebook magic
    import IPython.display

    audio_path = 'C:/Users/affige/Music/Linkin Park - Minutes To Midnight/08. No More Sorrow.mp3'
    print(audio_path)
    y, sr = librosa.load(audio_path, duration=30.0)
    IPython.display.Audio(data=y, rate=sr)

Python (3/4)

    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    log_S = librosa.power_to_db(S, ref=np.max)    # librosa.logamplitude(S, ref_power=np.max) in early librosa versions
    plt.figure(figsize=(15,5))
    librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')
    plt.title('mel power spectrogram')

    # harmonic percussion separation
    y_h, y_p = librosa.effects.hpss(y)
    S_h = librosa.feature.melspectrogram(y=y_h, sr=sr, n_mels=128)
    log_S_h = librosa.power_to_db(S_h, ref=np.max)
    plt.figure(figsize=(15,5))
    librosa.display.specshow(log_S_h, sr=sr, x_axis='time', y_axis='mel')
    plt.title('mel power spectrogram (Harmonic)')
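The `y_axis='mel'` and `n_mels=128` arguments above refer to the mel scale, a perceptually motivated warping of the frequency axis. The sketch below uses one common definition (the HTK-style formula) to show how 128 band centres spread between 0 Hz and 11025 Hz, half of librosa's default 22050 Hz loading rate. The function names are made up for this example, and librosa's own default mel scale follows a slightly different (Slaney-style) formula, so the exact numbers are not expected to match librosa's filter bank.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centres(n_mels, f_min, f_max):
    """Centre frequencies (Hz) of n_mels bands spaced evenly on the mel axis."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


centres = mel_band_centres(n_mels=128, f_min=0.0, f_max=11025.0)
print([round(c, 1) for c in centres[:3]], round(centres[-1], 1))
```

The bands are narrow at low frequencies and widen towards the top of the range, which is why a mel spectrogram shows more detail in the bass and midrange than a linear-frequency spectrogram with the same number of rows.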

Python (4/4)

    IPython.display.Audio(data=y_h, rate=sr)
    S_p = librosa.feature.melspectrogram(y=y_p, sr=sr, n_mels=128)
    log_S_p = librosa.power_to_db(S_p, ref=np.max)    # librosa.logamplitude(S_p, ref_power=np.max) in early librosa versions
    plt.figure(figsize=(15,5))
    librosa.display.specshow(log_S_p, sr=sr, x_axis='time', y_axis='mel')
    plt.title('mel power spectrogram (Percussive)')
    IPython.display.Audio(data=y_p, rate=sr)

• scikit-learn http://scikit-learn.org/stable/install.html • essentia https://github.com/MTG/essentia

Tips • Find your favorite music and play around with the toolboxes/libraries  listen to the music and check the result

• Preview before the class  online resources  textbooks

• Start thinking about your final project early