10420CS 573100 Music Information Retrieval
Lecture 1 Introduction Yi-Hsuan Yang Ph.D. http://www.citi.sinica.edu.tw/pages/yang/
[email protected]
Music & Audio Computing Lab, Research Center for IT Innovation, Academia Sinica
Outline • Logistic issues Syllabus Some notes on the final project
• Brief introduction of Music Information Retrieval (MIR) • Lab Sonic Visualiser Matlab Python
Logistic issues • Lecturers Yi-Hsuan Yang (楊奕軒) http://mac.citi.sinica.edu.tw/~yang/
[email protected]
Li Su (蘇黎) https://sites.google.com/site/lisupage/
[email protected]
• Office hour Friday 9-11am, or by appointment Office: Room 645, EECS Building (資電館)
Logistic issues • TA Jeffrey Huang (黃彥學)
[email protected]
• Office hour Thursday 1-3pm Office: Room 719, EECS Building (資電館)
Logistic issues • Time: R2R3R4 09:10-10:00 (i.e. 10 mins later) 10:10-11:00 11:10-12:00
Logistic issues • Main textbook
Meinard Müller
Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications 483 p., 249 illus., 30 illus. in color, hardcover ISBN: 978-3-319-21944-8 Springer, 2015 https://www.audiolabs-erlangen.de/fau/professor/mueller/bookFMP
• Related books Music Similarity and Retrieval, Springer Music Recommendation and Discovery, Springer Music Emotion Recognition, CRC Press Music Data Mining, CRC Press Speech and Audio Signal Processing, Wiley
Logistic issues • Grading policy Participation (10%) Assignments (60%), 4 times Final Project (30%): for a team of 3 or so (we will talk more about this later)
• Prerequisites Programming in Matlab or Python
Logistic issues • MIR = {signal processing, machine learning} + music • Course objectives
Share with you the fun of MIR Teach you the fundamental techniques of MIR Build hands-on experience Invite you to contribute to this field
• We welcome People who enjoy listening to music and wish to do something about it via programming People who want to learn the techniques covered in this course
Logistic issues • We won’t assume that you play an instrument assume that you’ve taken DSP or ML related courses teach you how to make music by computer (our focus is music information retrieval, not computer music) teach you subjects such as music theory, acoustics, audio engineering, or audio synthesis
• We will, however, consider you as an NTHU graduate student
Logistic issues • Extra enrollment: fine
• Auditing: fine
Logistic issues • Course website https://twtmir.wordpress.com/ (TW Teaching MIR)
Syllabus
• W1: Intro
• W2: STFT
• W3: timbre
• W4: classification
• W5: pitch
• W6: synchronization
• W7: pitch
• W8: chord
• W9: separation
• W10: separation
• W11: beat, tempo
• W12: rhythm
• W13: transcription
• W14: tagging/recom.
• W15: MAClab
• W16: structure
• W18: final project
Assignments • Programming (in Python or Matlab) + report (in English; preferably using LaTeX)
• HW1: timbre + classification
• HW2: pitch
• HW3: source separation (tentative)
• HW4: tempo (tentative)
• HW3 likely due on W12 (before the course withdrawal deadline)
Final Project • Goal: Invite you to contribute to this field • DSP or ML
Melodic transcription for a specific genre of music: piano quintet, string quartet, violin sonata, voice duet, choir, wind quintet, etc.
Real-time score following
Cross-cultural MIR: Taiwanese pop songs, folk songs, aboriginal music, etc.
Or a topic of your own
Final Project (Cont’d) • Start thinking now about what you want to do • Try to recruit team members early • Project pitch: likely W10 • Deadline for team-up: likely W12 (we will help you find partners, if needed) • Final presentation: W18 • Deadline for final report: W18+1
Resources • Learning on your own
Dan Ellis @ Columbia (moved to Google) https://www.ee.columbia.edu/~dpwe/
Meinard Müller @ Universität Erlangen-Nürnberg https://www.audiolabs-erlangen.de/fau/professor/mueller/teaching
Juan Bello @ NYU http://www.nyu.edu/classes/bello/Teaching.html
CCRMA summer school @ Stanford https://ccrma.stanford.edu/workshops/music-information-retrieval-mir-2015
Xavier Serra @ UPF, Spain https://zh-tw.coursera.org/course/audio
Resources (Cont’) • Learning on your own Roger Jang @ NTU: http://mirlab.org/jang/
• Conference proceedings Int’l Soc. Music Information Retrieval Conf. (ISMIR) Int’l Conf. Acoustics, Speech, and Signal Processing (ICASSP) ACM MM, ACM ICMR, ACM SIGIR, IEEE ICME
• Transactions IEEE Trans. Audio, Speech and Language Processing (TASLP) IEEE Trans. Multimedia (TMM) IEEE Trans. Signal Processing (TSP)
Resources (Cont’) • MIREX (MIR Evaluation eXchange) Part of ISMIR http://www.music-ir.org/mirex/wiki/MIREX_HOME
Audio Onset Detection Audio Beat Tracking Audio Key Detection Audio Downbeat Detection Real-time Audio to Score Alignment (a.k.a. Score Following) Audio Cover Song Identification Discovery of Repeated Themes & Sections
Audio Melody Extraction Query by Singing/Humming Audio Chord Estimation Singing Voice Separation Audio Fingerprinting Music/Speech Classification/Detection Audio Offset Detection
Outline • Logistic issues Syllabus Some notes on the final project
• Brief introduction of MIR • Lab Sonic Visualiser Matlab Python
Intelligent Music Systems and Applications
• Why is it relevant to computer science?
• Why is it interesting?
• Why is it difficult?
• What can we do?
Why Relevant to Computer Science? (1/8) • Listening to music online (subscription-based music streaming services)
Why Relevant to Computer Science? (2/8) • (i) Radio → (ii) CDs → (iii) downloading music → (iv) listening to music online • Paradigm shift (i→ii) DJ’s selection → your selection (ii→iii) Internet, PC, walkman, phones (iii→iv) a few thousand → millions of songs (anytime, anywhere, and “any song”)
Why Relevant to Computer Science? (3/8) • (i) Radio → (ii) CDs → (iii) downloading music → (iv) listening to music online • Research focus (i), (ii) audio engineering, computer music (iii: PC+Web) auto-classification, similarity search (iv: million songs) recommendation data science (information retrieval + pattern recognition)
audio content analysis (signal processing + pattern recognition)
Why Relevant to Computer Science? (4/8) • Research focus (i), (ii) audio engineering, computer music (iii: PC+Web) auto-classification, similarity search (iv: million songs) recommendation
• Related fields: machine learning, signal processing, information retrieval, data mining, musicology, human-computer interaction, music psychology
Why Relevant to Computer Science? (5/8) • Relevant conferences International Computer Music Conference (ICMC), since 1975 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), since 1976 ACM Multimedia (MM), since 1993 IEEE International Conference on Image Processing (ICIP), since 1994 International Society for Music Information Retrieval Conference (ISMIR), since 2000
Why Relevant to Computer Science? (6/8) • Relevant conferences Int’l Computer Music Conf. (ICMC) Int’l Society for Music Info. Retrieval Conf. (ISMIR)
• Goal computer music: computers that “make” music music information retrieval (MIR): computers that “listen” to music, understand/perceive it as human beings do and, consequently, help us find the right music
Why Relevant to Computer Science? (7/8)
(2015)
Why Relevant to Computer Science? (8/8)
Why Is It Interesting? (1/8)
Pop Danthology Mashup of 50+ Pop Songs
Why Is It Interesting? (2/8) • Songle: web service for active music listening
Why Is It Interesting? (3/8) • PHENICX: performances as highly enriched and interactive concert experiences
Why Is It Interesting? (4/8) • Flow-machines: “An Ode to Music Generation”
Why Is It Interesting? (5/8) • MusicBricks: Internet of Music Things “In some cases it’s “just” entertainment, in some cases this revolutionises how music is performed, in others it produces a new world-wide business”…
Why Is It Interesting? (6/8) • iRACE: iOS-based rhythmic auditory cueing evaluation for Parkinson’s Disease
Why Is It Interesting? (7/8) • CompMusic: computational model for the discovery of the world’s music
Why Is It Interesting? (8/8) • Entertainment, education, culture, healthcare… • We see great potential in Taiwan… MIR courses in the world
https://teachingmir.wikispaces.com/courses
Why Is It Difficult? (1/6) • Music is polyphonic: going from audio to a “musical score” is not easy
[Figure: score excerpt with octave (8ve) markings]
Why Is It Difficult? (2/6) • Music consists of multiple instruments/layers source separation, again, is not easy
Why Is It Difficult? (3/6) • Music is an art of time what we consciously perceive or expect in a piece is at the level of events (notes, chords, etc.), not frames music has structure
(Figure from Jordan Smith's slides)
Why Is It Difficult? (4/6) • Music is performed (i.e. it’s expressive) Mozart’s Variationen (1st phrase): scherzando, tranquillo, maestoso, risoluto
Why Is It Difficult? (5/6) • Music is perceived (i.e. it’s subjective) x-axis: valence (negative → positive) y-axis: arousal (low → high)
(a) Smells Like Teen Spirit (b) A Whole New World (c) The Rose (d) Tell Laura I Love Her
Why Is It Difficult? (6/6) • Music is listened to in different contexts
• Music
• Context
Activity: driving, studying, working, walking
Mood: happy, sad, angry, relaxed
Location: home, work, public place
Social company: alone, w/ friends, w/ strangers
• User
age, gender, personality, cultural background, musical background
What Can We Do? (1/3) • Learn from musicians → rule-based systems • Learn from data feature design: music theory + signal processing model learning: machine learning
• Learn from data, in a more fashionable way feature learning: deep learning model learning: deep learning
What Can We Do? (2/3) • Music is polyphonic & multi-instrument unsupervised and supervised approaches hidden Markov model (HMM), non-negative matrix factorization (NMF), sparse coding (SC), Bayesian approaches
• Music is an art of time beat tracking, structure segmentation, sequential motif discovery, HMM, recurrent neural network (RNN)
• Music is listened to in different contexts context-aware recommendation, listening behavior analysis
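To make one of the listed techniques concrete, here is a minimal sketch of non-negative matrix factorization (NMF) using the standard multiplicative updates for the Euclidean cost. It is only an illustration, not the method used in the course assignments: the function name `nmf`, the toy "spectrogram" `V`, and all parameter values are assumptions made for this example.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (bins x frames) as W @ H.

    W holds spectral templates (bins x rank) and H their activations
    over time (rank x frames). Multiplicative updates keep both
    factors non-negative.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update templates
    return W, H

# Hypothetical toy "spectrogram": two sources with fixed spectra,
# active in different frames (both active in the last frame).
templates = np.array([[1.0, 0.0],
                      [0.5, 0.0],
                      [0.0, 1.0],
                      [0.0, 0.8]])                    # 4 bins x 2 sources
activations = np.array([[1.0, 1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 1.0, 1.0]])   # 2 sources x 5 frames
V = templates @ activations

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print('relative reconstruction error:', rel_err)
```

In a source separation setting, V would be a magnitude spectrogram, and each column of W together with its row of H would describe one learned component.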
What Can We Do? (3/3)
• W1: Intro
• W2: STFT
• W3: timbre
• W4: classification
• W5: pitch
• W6: synchronization
• W7: pitch
• W8: chord
• W9: separation
• W10: separation
• W11: beat, tempo
• W12: rhythm
• W13: transcription
• W14: tagging/recom.
• W15: MAClab
• W16: structure
• W18: final project
Wrap-Up • Why is it relevant to computer science? (i) Radio → (ii) CDs → (iii) downloading music → (iv) listening to music online
• Why is it interesting? Entertainment, education, culture, healthcare
• Why is it difficult? Polyphonic, multi-layer, art of time, expressive, subjective, listening contexts
• What can we do?
Outline • Logistic issues Syllabus Some notes on the final project
• Brief introduction of MIR • Lab Sonic Visualiser Matlab Python
Sonic Visualiser (1/3) • http://www.sonicvisualiser.org/
Sonic Visualiser (2/3) • https://www.freesound.org/
https://www.freesound.org/people/acclivity/sounds/22347/
https://www.freesound.org/people/Rudmer_Rotteveel/sounds/316915/
https://www.freesound.org/people/Jaylew1987/sounds/321112/
https://www.freesound.org/people/mickel11/sounds/90803/
(Find some interesting ones and send them to me, with the spectrogram (in jpg or png) attached!)
Sonic Visualiser (3/3) • VAMP plugins http://www.vamp-plugins.org/
Matlab (1/3) • https://www.mathworks.com/matlabcentral/newsreader/view_thread/260379
fs = 8192;                 % Hz
t = 0:1/fs:3;              % seconds
f0 = 440;                  % Hz
y = sin(2.*pi.*f0.*t);
sound(y,fs)
• http://www.mathworks.com/help/signal/ref/chirp.html?requestedDomain=www.mathworks.com
y2 = chirp(t,f0,3,2*f0);   % help chirp, or doc chirp
sound(y2,fs);
w = 1024;                  % window size of STFT
figure(1), spectrogram(y,w,w/2,w,fs,'yaxis')
figure(2), spectrogram(y2,w,w/2,w,fs,'yaxis')
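For those who choose Python, the same experiment can be sketched with NumPy and SciPy. This is an approximate counterpart rather than an exact port: it assumes a Hamming window to mirror Matlab's default for an integer window argument, and the names `peak_tone`, `peak_start`, and `peak_end` are introduced here only to inspect the result without plotting.

```python
import numpy as np
from scipy import signal

fs = 8192                            # sampling rate (Hz), as on the slide
t = np.arange(0, 3 * fs + 1) / fs    # 0 to 3 seconds inclusive, like 0:1/fs:3
f0 = 440.0                           # Hz

y = np.sin(2 * np.pi * f0 * t)                    # pure tone at 440 Hz
y2 = signal.chirp(t, f0=f0, t1=3.0, f1=2 * f0)    # linear sweep from 440 to 880 Hz

w = 1024                             # STFT window size
# 50% overlap and FFT size equal to the window size, as in the Matlab calls
freqs, times, S = signal.spectrogram(y, fs=fs, window='hamming',
                                     nperseg=w, noverlap=w // 2, nfft=w)
_, _, S2 = signal.spectrogram(y2, fs=fs, window='hamming',
                              nperseg=w, noverlap=w // 2, nfft=w)

# Strongest frequency bin: averaged over time for the tone,
# and in the first and last frames for the chirp.
peak_tone = freqs[S.mean(axis=1).argmax()]
peak_start = freqs[S2[:, 0].argmax()]
peak_end = freqs[S2[:, -1].argmax()]
print(peak_tone, peak_start, peak_end)
```

To view the spectrograms, `matplotlib.pyplot.pcolormesh(times, freqs, 10 * np.log10(S + 1e-12))` is one option.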
Matlab (2/3) • http://mirlab.org/jang/books/audiosignalprocessing/matlab4waveRead.asp?title=4-2%20Reading%20Wave%20Files
dpath = 'C:\Users\affige\Dropbox\#MIR_course\lecture_1\audio\';
fname = fullfile(dpath,'316915__rudmer-rotteveel__cats-fighting-amplified.wav');
[y,fs] = wavread(fname);          % in newer MATLAB releases, use audioread(fname) instead
sound(y,fs)
figure(3), subplot(311), plot(y)
y = mean(y,2);                    % stereo to mono
y = downsample(y,2); fs = fs/2;   % downsample to 22050
figure(3), subplot(3,1,[2 3]), spectrogram(y,w,w/2,w,fs,'yaxis')   % w = 1024, from the previous slide
• http://labrosa.ee.columbia.edu/matlab/mp3read.html • https://www.ee.columbia.edu/~dpwe/resources/matlab/
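The read, mix-to-mono, and downsample steps from this slide can be sketched in Python with SciPy. To keep the example self-contained it first writes a hypothetical one-second stereo test file (`example_stereo.wav`, an assumption made for this illustration) instead of using the freesound recording. Note one deliberate difference: `scipy.signal.decimate` applies an anti-aliasing low-pass filter before dropping samples, whereas Matlab's `downsample` only keeps every second sample.

```python
import os
import tempfile
import numpy as np
from scipy import signal
from scipy.io import wavfile

# Write a hypothetical 1-second stereo file: 440 Hz on the left, 660 Hz on the right.
fs = 44100
t = np.arange(fs) / fs
left = 0.5 * np.sin(2 * np.pi * 440.0 * t)
right = 0.5 * np.sin(2 * np.pi * 660.0 * t)
stereo = np.stack([left, right], axis=1).astype(np.float32)
path = os.path.join(tempfile.mkdtemp(), 'example_stereo.wav')
wavfile.write(path, fs, stereo)

# Read it back: y has shape (samples, channels).
fs_in, y = wavfile.read(path)
y = y.astype(np.float64)

y_mono = y.mean(axis=1)              # stereo to mono, like mean(y,2)
y_ds = signal.decimate(y_mono, 2)    # low-pass filter, then keep every 2nd sample
fs_ds = fs_in // 2                   # 22050 Hz
print(fs_in, y.shape, fs_ds, y_ds.shape)
```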
Matlab (3/3) • MIRtoolbox https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox
addpath(genpath(YOUR_FOLDER_OF_MIRTOOLBOX))
addpath(genpath(YOUR_FOLDER_OF_MP3READ))
fname = fullfile(dpath,'321112__jaylew1987__fur-elise-intro.mp3');
[y,fs,nbits] = mp3read(fname);                 % read mp3
wavwrite(y,fs,nbits,'tmp.wav');                % write wav
a = miraudio(mean(y,2),fs,'trim')
S = mirspectrum(a,'frame','dB')
o = mironsets(a)                               % onset estimation
p = mirpitch(mirspectrum(a,'frame'),'mono')    % pitch estimation
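To give a feel for what monophonic pitch estimation involves, here is a small Python sketch based on autocorrelation: a periodic signal correlates strongly with itself when shifted by one period. This is a swapped-in textbook technique for illustration, not a description of how `mirpitch` works internally. The function name `estimate_f0_autocorr` and the search range of 80 to 1000 Hz are assumptions.

```python
import numpy as np

def estimate_f0_autocorr(frame, sr, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one monophonic frame."""
    frame = frame - np.mean(frame)
    # Autocorrelation for non-negative lags only.
    acf = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lag_min = int(sr / fmax)     # shortest period considered (samples)
    lag_max = int(sr / fmin)     # longest period considered (samples)
    best_lag = lag_min + np.argmax(acf[lag_min:lag_max + 1])
    return sr / best_lag

sr = 22050
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 440.0 * t)   # synthetic A4 test tone
f0 = estimate_f0_autocorr(frame, sr)
print('estimated f0 (Hz):', f0)
```

Because the lag is an integer number of samples, the estimate is quantized; real pitch trackers refine it, for example by interpolating around the peak.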
Python (1/4) • Python (v2.7 is suggested) https://www.python.org/ • easy_install or pip https://pypi.python.org/pypi/setuptools • key packages to install
numpy, scipy (for scientific programming)
ipython, notebook (interactive programming environment)
matplotlib (for plots)
cython (C-Extensions for Python)
spyder (for Matlab-like interface)
ffmpeg (for MP3)
Python (2/4) • librosa https://github.com/bmcfee/librosa http://bmcfee.github.io/librosa/
import numpy as np
import librosa
import librosa.display             # needed for specshow on the next slide
import matplotlib.pyplot as plt
%matplotlib inline
import IPython.display
audio_path = 'C:/Users/affige/Music/Linkin Park - Minutes To Midnight/08. No More Sorrow.mp3'
print(audio_path)
y, sr = librosa.load(audio_path, duration=30.0)   # load the first 30 seconds
IPython.display.Audio(data=y, rate=sr)
Python (3/4)
# mel spectrogram of the full mix
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_S = librosa.power_to_db(S, ref=np.max)   # older releases: librosa.logamplitude(S, ref_power=np.max)
plt.figure(figsize=(15,5))
librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')
plt.title('mel power spectrogram')
# harmonic-percussive separation
y_h, y_p = librosa.effects.hpss(y)
S_h = librosa.feature.melspectrogram(y=y_h, sr=sr, n_mels=128)
log_S_h = librosa.power_to_db(S_h, ref=np.max)
plt.figure(figsize=(15,5))
librosa.display.specshow(log_S_h, sr=sr, x_axis='time', y_axis='mel')
plt.title('mel power spectrogram (Harmonic)')
Python (4/4)
IPython.display.Audio(data=y_h, rate=sr)
S_p = librosa.feature.melspectrogram(y=y_p, sr=sr, n_mels=128)
log_S_p = librosa.power_to_db(S_p, ref=np.max)   # older releases: librosa.logamplitude(S_p, ref_power=np.max)
plt.figure(figsize=(15,5))
librosa.display.specshow(log_S_p, sr=sr, x_axis='time', y_axis='mel')
plt.title('mel power spectrogram (Percussive)')
IPython.display.Audio(data=y_p, rate=sr)
• scikit-learn http://scikit-learn.org/stable/install.html • essentia https://github.com/MTG/essentia
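The harmonic/percussive split used on the librosa slides can be illustrated without librosa. The sketch below follows the general idea of median-filtering separation: harmonic sounds form horizontal ridges in a spectrogram (stable over time), while percussive sounds form vertical ridges (broadband, short). It is a simplified stand-in with binary masks, not librosa's implementation; the function name `hpss_median`, the kernel size, and the synthetic test signal are assumptions.

```python
import numpy as np
from scipy import signal
from scipy.ndimage import median_filter

def hpss_median(y, sr, n_fft=1024, kernel=17):
    """Split y into harmonic and percussive parts using binary masks."""
    _, _, Z = signal.stft(y, fs=sr, nperseg=n_fft)   # Z: (frequency, time)
    mag = np.abs(Z)
    harm = median_filter(mag, size=(1, kernel))      # smooth along time
    perc = median_filter(mag, size=(kernel, 1))      # smooth along frequency
    mask_h = harm >= perc                            # bin looks more harmonic
    mask_p = ~mask_h                                 # bin looks more percussive
    _, y_h = signal.istft(Z * mask_h, fs=sr, nperseg=n_fft)
    _, y_p = signal.istft(Z * mask_p, fs=sr, nperseg=n_fft)
    return y_h[:len(y)], y_p[:len(y)]

# Synthetic test signal: a steady 440 Hz tone plus a click every 0.25 s.
sr = 22050
t = np.arange(2 * sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
clicks = np.zeros_like(t)
clicks[::sr // 4] = 1.0
y = tone + clicks

y_h, y_p = hpss_median(y, sr)
print(y_h.shape, y_p.shape)
```

Because the two masks are complementary, the two outputs add back up to the input signal, which is a handy sanity check when experimenting with the kernel size.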
Tips • Find your favorite music and play around with the toolboxes/libraries listen to the music and check the result
• Preview before the class online resources textbooks
• Start thinking about your final project early