
Compressed Sensing Meets Machine Learning
Classification of Mixture Subspace Models via Sparse Representation

Allen Y. Yang, UC Berkeley
Feb. 25, 2008

Outline: Introduction · Classification via Sparse Representation · Distributed Pattern Recognition · Conclusion


What is Sparsity?

Sparsity: a signal is sparse if most of its coefficients are (approximately) zero.

Figure: 2-D DCT transform. (a) Harmonic basis functions; (b) magnitude spectrum.
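A minimal numerical sketch of this claim (not from the slides; the toy image and threshold are illustrative), using SciPy's DCT: a smooth image has nearly all of its 2-D DCT coefficients close to zero.

```python
import numpy as np
from scipy.fft import dctn

# Toy smooth "image": its energy concentrates in a few low-frequency
# DCT coefficients, so the transform-domain representation is sparse.
u, v = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
img = np.cos(2 * np.pi * u) + 0.5 * np.cos(4 * np.pi * v)

coeffs = dctn(img, norm="ortho")               # 2-D DCT
mag = np.abs(coeffs)
frac_near_zero = np.mean(mag < 1e-3 * mag.max())
print(f"{100 * frac_near_zero:.1f}% of DCT coefficients are (near) zero")
```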


Sparsity in the Spatial Domain

Gene microarray data [Drmanac et al. 1993].


Sparsity in the Human Visual Cortex [Olshausen & Field 1997, Serre & Poggio 2006]

1. Feed-forward: no iterative feedback loop.
2. Redundancy: on average 80-200 neurons per feature representation.
3. Recognition: information exchange between stages is not about individual neurons, but about how many neurons fire together as a group.


Sparsity and $\ell_1$-Minimization

The "black gold" age [Claerbout & Muir 1973; Taylor, Banks & McCoy 1979].

Figure: Deconvolution of a spike train.


Sparse Support Estimators

Sparse support estimator [Donoho 1992; Meinshausen & Bühlmann 2006; Yu 2006; Wainwright 2006; Ramchandran 2007; Gastpar 2007].

Basis pursuit [Chen & Donoho 1999]: given $y = Ax$ with $x$ unknown,
$$x^* = \arg\min \|x\|_1 \quad \text{subject to} \quad y = Ax.$$

The Lasso (least absolute shrinkage and selection operator) [Tibshirani 1996]:
$$x^* = \arg\min \|y - Ax\|_2 \quad \text{subject to} \quad \|x\|_1 \leq k.$$
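Basis pursuit is a linear program: introducing auxiliary bounds $|x_i| \leq t_i$ turns $\min \|x\|_1$ into an LP in $(x, t)$. A minimal sketch with `scipy.optimize.linprog` (the function name and test sizes are illustrative, not from the slides):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """min ||x||_1  s.t.  A x = y, cast as an LP in the variables (x, t)."""
    d, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])     # minimize sum(t)
    I = np.eye(n)
    A_ub = np.block([[I, -I], [-I, -I]])              # enforces |x_i| <= t_i
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((d, n))])           # A x = y
    bounds = [(None, None)] * n + [(0, None)] * n     # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[:n]

# Sanity check: recover a 3-sparse x0 from d < n random measurements.
rng = np.random.default_rng(0)
n, d, k = 50, 20, 3
A = rng.standard_normal((d, n))
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x1 = basis_pursuit(A, A @ x0)
print("recovery error:", np.linalg.norm(x1 - x0))
```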


Taking Advantage of Sparsity

What generates sparsity? (d'après Emmanuel Candès) Measure first, analyze later; the curse of dimensionality.

1. Numerical analysis: sparsity reduces the cost of storage and computation.
2. Regularization in classification:

Figure: Linear support vector machine (SVM). (a) Decision boundary; (b) maximal margin.


Our Contributions

1. Classification via compressed sensing.
2. Performance in face recognition.
3. Extensions: outlier rejection, occlusion compensation.
4. Distributed pattern recognition in sensor networks.


Problem Formulation in Face Recognition

1. Notation.
   Training: for $K$ classes, collect training samples $\{v_{1,1}, \cdots, v_{1,n_1}\}, \cdots, \{v_{K,1}, \cdots, v_{K,n_K}\} \subset \mathbb{R}^D$.
   Test: present a new $y \in \mathbb{R}^D$; solve for $\mathrm{label}(y) \in \{1, 2, \cdots, K\}$.
2. Construct the $\mathbb{R}^D$ sample space via stacking.
   Figure: for images, a 3-channel 640 × 480 image gives $D = 3 \cdot 640 \cdot 480 \approx 10^6$.
3. Assume $y$ belongs to class $i$ [Belhumeur et al. 1997, Basri & Jacobs 2003]:
   $$y = \alpha_{i,1} v_{i,1} + \alpha_{i,2} v_{i,2} + \cdots + \alpha_{i,n_i} v_{i,n_i} = A_i \alpha_i,$$
   where $A_i = [v_{i,1}, v_{i,2}, \cdots, v_{i,n_i}]$.
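As a concrete sketch of the stacking step (the helper names are hypothetical, not from the slides):

```python
import numpy as np

def stack_image(img):
    """Stack an image (H x W [x C] array) into a column vector in R^D."""
    return np.asarray(img, dtype=float).reshape(-1)

def class_dictionary(images):
    """A_i = [v_{i,1}, ..., v_{i,n_i}]: one stacked training image per column."""
    return np.column_stack([stack_image(im) for im in images])

# e.g. with hypothetical 640x480 RGB training images of subject i:
# A_i = class_dictionary(subject_i_images)   # shape (3*640*480, n_i)
```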


1. However, the class label $i$ is the unknown we must solve for. Global representation:
   $$y = [A_1, A_2, \cdots, A_K] \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_K \end{bmatrix} = A x_0.$$
2. Over-determined system: $A \in \mathbb{R}^{D \times n}$, where $D \gg n = n_1 + \cdots + n_K$. The vector $x_0$ encodes the membership of $y$: if $y$ belongs to subject $i$,
   $$x_0 = [\, 0 \, \cdots \, 0 \;\; \alpha_i^T \;\; 0 \, \cdots \, 0 \,]^T \in \mathbb{R}^n.$$

Problems to face:
- Solving for $x_0$ in $\mathbb{R}^D$ is intractable.
- The true solution $x_0$ is sparse: on average only $\frac{1}{K}$ of its terms are nonzero.


Dimensionality Reduction

1. Construct a linear projection $R \in \mathbb{R}^{d \times D}$, where $d$ is the feature dimension, $d \ll D$:
   $$\tilde{y} \doteq R y = R A x_0 = \tilde{A} x_0 \in \mathbb{R}^d.$$
   $\tilde{A} \in \mathbb{R}^{d \times n}$, but $x_0$ is unchanged.
2. Holistic features: Eigenfaces [Turk 1991], Fisherfaces [Belhumeur 1997], Laplacianfaces [He 2005].
3. Partial features.
4. Unconventional features: downsampled faces, random projections.


$\ell_0$-Minimization

1. Solve for the sparsest solution via $\ell_0$-minimization:
   $$x_0 = \arg\min_x \|x\|_0 \quad \text{s.t.} \quad \tilde{y} = \tilde{A} x,$$
   where $\|\cdot\|_0$ simply counts the number of nonzero terms.
2. $\ell_0$-Ball: the $\ell_0$-ball is not convex, and $\ell_0$-minimization is NP-hard.


$\ell_1/\ell_0$ Equivalence

1. Compressed sensing: if $x_0$ is sparse enough, $\ell_0$-minimization is equivalent to
   $$(P_1) \quad \min \|x\|_1 \quad \text{s.t.} \quad \tilde{y} = \tilde{A} x,$$
   where $\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|$.
2. $\ell_1$-Ball: $\ell_1$-minimization is convex, and its solution equals that of $\ell_0$-minimization.
3. $\ell_1/\ell_0$ equivalence [Donoho 2002, 2004; Candès et al. 2004; Baraniuk 2006]: given $\tilde{y} = \tilde{A} x_0$, there exists an equivalence breakdown point (EBP) $\rho(\tilde{A})$ such that if $\|x_0\|_0 < \rho$, the $\ell_1$-solution is unique and $x_1 = x_0$.


$\ell_1$-Minimization Routines

Matching pursuit [Mallat 1993]:
1. Find the vector $v_i$ in $\tilde{A}$ most correlated with $y$: $i = \arg\max_j \langle y, v_j \rangle$.
2. Update: $\tilde{A} \leftarrow \tilde{A}_{\hat{i}}$ (remove the selected atom), $x_i \leftarrow \langle y, v_i \rangle$, $y \leftarrow y - x_i v_i$.
3. Repeat until $\|y\| < \epsilon$.

Basis pursuit [Chen 1998]:
1. Assume $x_0$ is $m$-sparse.
2. Select $m$ linearly independent vectors $B_m$ in $\tilde{A}$ as a basis: $x_m = B_m^{\dagger} y$.
3. Swap one basis vector in $B_m$ for another vector in $\tilde{A}$ whenever the swap improves $\|y - B_m x_m\|$.
4. Stop when $\|y - B_m x_m\|_2 < \epsilon$.

Quadratic solvers for the noisy case $\tilde{y} = \tilde{A} x_0 + z \in \mathbb{R}^d$, where $\|z\|_2 < \epsilon$:
$$x^* = \arg\min \{\|x\|_1 + \lambda \|y - \tilde{A} x\|_2\}$$
[Lasso, second-order cone programming]: more expensive.

Matlab toolboxes: $\ell_1$-Magic by Candès at Caltech; SparseLab by Donoho at Stanford; cvx by Boyd at Stanford.
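A minimal NumPy sketch of the matching pursuit loop above (assuming the columns of $\tilde{A}$ are unit-norm; names and defaults are illustrative):

```python
import numpy as np

def matching_pursuit(A, y, eps=1e-6, max_iter=200):
    """Greedy matching pursuit [Mallat 1993].
    Assumes the columns (atoms) of A have unit l2 norm; unlike the
    slide's variant, atoms here may be re-selected across iterations."""
    r = y.astype(float).copy()          # residual
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        corr = A.T @ r                  # correlation with every atom
        i = np.argmax(np.abs(corr))     # most correlated atom
        x[i] += corr[i]                 # accumulate its coefficient
        r -= corr[i] * A[:, i]          # deflate the residual
        if np.linalg.norm(r) < eps:
            break
    return x
```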


Classification

1. Project $x_1$ onto the face subspaces:
   $$\delta_1(x_1) = \begin{bmatrix} \alpha_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \delta_2(x_1) = \begin{bmatrix} 0 \\ \alpha_2 \\ \vdots \\ 0 \end{bmatrix}, \quad \cdots, \quad \delta_K(x_1) = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ \alpha_K \end{bmatrix}.$$
2. Define the residual $r_i = \|\tilde{y} - \tilde{A} \delta_i(x_1)\|_2$ for subject $i$:
   $$\mathrm{id}(y) = \arg\min_{i=1,\cdots,K} \{ r_i \} \qquad (1)$$
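A sketch of the classification rule (1) (the names are hypothetical; `labels[j]` maps column $j$ of $\tilde{A}$ to its subject):

```python
import numpy as np

def src_classify(A_tilde, y_tilde, x1, labels):
    """Rule (1): keep only class i's coefficients of x1, measure how
    well they alone reconstruct y, and return the class with the
    smallest residual r_i = ||y_tilde - A_tilde @ delta_i(x1)||_2."""
    labels = np.asarray(labels)
    best_class, best_residual = None, np.inf
    for i in np.unique(labels):
        delta_i = np.where(labels == i, x1, 0.0)   # zero out other classes
        r_i = np.linalg.norm(y_tilde - A_tilde @ delta_i)
        if r_i < best_residual:
            best_class, best_residual = i, r_i
    return best_class
```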


AR Database: 100 Subjects (Illumination and Expression Variation)

Table I. Nearest Neighbor
Dimension      30    54    130   540
Eigen [%]      68.1  74.8  79.3  80.5
Laplacian [%]  73.1  77.1  83.8  89.7
Random [%]     56.7  63.7  71.4  75.0
Down [%]       51.7  60.9  69.2  73.7
Fisher [%]     83.4  86.8  N/A   N/A

Table II. Nearest Subspace
Dimension      30    54    130   540
Eigen [%]      64.1  77.1  82.0  85.1
Laplacian [%]  66.0  77.5  84.3  90.3
Random [%]     59.2  68.2  80.0  83.3
Down [%]       56.2  67.7  77.0  82.1
Fisher [%]     80.3  85.8  N/A   N/A

Table III. Linear SVM
Dimension      30    54    130   540
Eigen [%]      73.0  84.3  89.0  92.0
Laplacian [%]  73.4  85.8  90.8  95.7
Random [%]     54.1  70.8  81.6  88.8
Down [%]       51.4  73.0  83.4  90.3
Fisher [%]     86.3  93.3  N/A   N/A

Table IV. $\ell_1$-Minimization
Dimension      30    54    130   540
Eigen [%]      71.1  80.0  85.7  92.0
Laplacian [%]  73.7  84.7  91.0  94.3
Random [%]     57.8  75.5  87.6  94.7
Down [%]       46.8  67.0  84.6  93.9
Fisher [%]     87.0  92.3  N/A   N/A


Sparsity vs. non-sparsity: $\ell_1$ and SVM decisively outperform NN and NS.
1. Our framework seeks sparsity in the representation of $y$.
2. SVM seeks sparsity in the decision boundaries on $A = [v_1, \cdots, v_n]$.
3. NN and NS do not enforce sparsity.

$\ell_1$-minimization vs. SVM: the performance of SVM depends on the choice of features.
1. Random projection performs poorly with SVMs.
2. $\ell_1$-minimization guarantees performance convergence across different features.
3. In lower-dimensional feature spaces, Fisher features outperform.

Table III. Linear SVM
Dimension      30    54    130   540
Eigen [%]      73.0  84.3  89.0  92.0
Laplacian [%]  73.4  85.8  90.8  95.7
Random [%]     54.1  70.8  81.6  88.8
Down [%]       51.4  73.0  83.4  90.3
Fisher [%]     86.3  93.3  N/A   N/A

Table IV. $\ell_1$-Minimization
Dimension      30    54    130   540
Eigen [%]      71.1  80.0  85.7  92.0
Laplacian [%]  73.7  84.7  91.0  94.3
Random [%]     57.8  75.5  87.6  94.7
Down [%]       46.8  67.0  84.6  93.9
Fisher [%]     87.0  92.3  N/A   N/A


Randomfaces

Blessing of dimensionality [Donoho 2000]: in a high-dimensional data space $\mathbb{R}^D$, with overwhelming probability, $\ell_1/\ell_0$ equivalence holds for a random projection $R$.

Unconventional properties:
1. Domain independent!
2. Data independent!
3. Fast to generate and compute!

Reference: Yang et al., "Feature selection in face recognition: A sparse representation perspective," Berkeley Tech Report, 2007.
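A sketch of generating such a projection (the row normalization is one common convention, assumed here rather than prescribed by the slides):

```python
import numpy as np

def randomfaces_projection(d, D, seed=0):
    """Data- and domain-independent projection R in R^{d x D}:
    i.i.d. Gaussian entries with rows normalized to unit length."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((d, D))
    return R / np.linalg.norm(R, axis=1, keepdims=True)

# Usage: project a stacked face image y in R^D down to d dimensions.
# y_tilde = randomfaces_projection(540, D) @ y
```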


Variation: Outlier Rejection

$\ell_1$-coefficients for invalid images.

Outlier rejection: when the $\ell_1$-solution is not sparse or not concentrated on one subspace, the test sample is invalid.

Sparsity concentration index:
$$\mathrm{SCI}(x) \doteq \frac{K \cdot \max_i \|\delta_i(x)\|_1 / \|x\|_1 - 1}{K - 1} \in [0, 1].$$
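The index translates directly into code (a sketch; `labels[j]` gives the class of coefficient $j$):

```python
import numpy as np

def sci(x, labels):
    """Sparsity concentration index: 1 when all l1 mass of x sits in
    one class, 0 when it is spread evenly over all K classes."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    K = len(classes)
    class_mass = [np.abs(x[labels == i]).sum() for i in classes]
    return (K * max(class_mass) / np.abs(x).sum() - 1) / (K - 1)
```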


Variation: Occlusion Compensation

1. Sparse representation plus sparse error: $y = A x + e$.
2. Occlusion compensation:
   $$y = [\, A \mid I \,] \begin{bmatrix} x \\ e \end{bmatrix} = B w.$$

Reference: Wright et al., "Robust face recognition via sparse representation," UIUC Tech Report, 2007.
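A sketch of the extended dictionary, reusing any $\ell_1$ solver (e.g. the `basis_pursuit` sketch earlier; the names here are illustrative):

```python
import numpy as np

def occlusion_compensate(A, y, l1_solver):
    """Solve y = [A | I] w with w = [x; e] sparse: x is the face
    representation, e the sparse error covering occluded pixels."""
    D, n = A.shape
    B = np.hstack([A, np.eye(D)])     # extended dictionary B = [A | I]
    w = l1_solver(B, y)
    x, e = w[:n], w[n:]
    return x, y - e                   # coefficients and compensated image
```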


Distributed Pattern Recognition

Figure: d-Oracle: distributed object recognition via camera wireless networks.

Key components:
1. Each sensor observes only a partial profile of the event: this demands a global classification framework.
2. Each individual sensor has limited classification ability: sensors become active only when certain events are locally detected.
3. The network configuration is dynamic: the global classifier must adapt to changes in the set of active sensors.


Problem Formulation for Distributed Action Recognition

Architecture:
- 8 sensors distributed on the human body.
- The locations of the sensors are given and fixed.
- Each sensor carries a triaxial accelerometer and a biaxial gyroscope.
- Sampling frequency: 20 Hz.

Figure: Readings from the 8 x-axis accelerometers and x-axis gyroscopes for a stand-kneel-stand sequence.


Challenges for Distributed Action Recognition

1. Simultaneous segmentation and classification.
2. Individual sensors are not sufficient to classify all human actions.
3. Sensor failure and network congestion are simulated by varying the subset of active sensors.
4. Identity independence: examples from the test subject are excluded from the training data.

Figure: The same actions performed by two subjects.


Mixture Subspace Model for Distributed Action Recognition

1. Training samples are segmented manually with correct labels.
2. On each sensor node $i$, normalize the stacked vector form:
   $$v_i = [x(1), \cdots, x(h),\; y(1), \cdots, y(h),\; z(1), \cdots, z(h),\; \theta(1), \cdots, \theta(h),\; \rho(1), \cdots, \rho(h)]^T \in \mathbb{R}^{5h}.$$
3. Full-body motion. Training sample: $v = \begin{bmatrix} v_1 \\ \vdots \\ v_8 \end{bmatrix}$; test sample: $y = \begin{bmatrix} y_1 \\ \vdots \\ y_8 \end{bmatrix} \in \mathbb{R}^{8 \cdot 5h}$.
4. Mixture subspace model:
   $$y = \begin{bmatrix} y_1 \\ \vdots \\ y_8 \end{bmatrix} = \left[ \begin{bmatrix} v_1 \\ \vdots \\ v_8 \end{bmatrix}_1, \cdots, \begin{bmatrix} v_1 \\ \vdots \\ v_8 \end{bmatrix}_n \right] x = A x.$$
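A sketch of step 2's stacking on one node (the channel names are hypothetical; x, y, z are accelerometer axes and theta, rho gyroscope axes, each with $h$ samples):

```python
import numpy as np

def sensor_vector(ax, ay, az, theta, rho):
    """Stack h samples of the 3 accelerometer and 2 gyroscope channels
    into v_i in R^{5h}, normalized to unit length."""
    v = np.concatenate([ax, ay, az, theta, rho]).astype(float)
    return v / np.linalg.norm(v)

# Full-body sample: concatenate the 8 node vectors into R^{8*5h}.
# v = np.concatenate([sensor_vector(*node_channels[i]) for i in range(8)])
```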


Localized Classifiers

Distributed sparse representation:
$$\begin{bmatrix} y_1 \\ \vdots \\ y_8 \end{bmatrix} = \left[ \begin{bmatrix} v_1 \\ \vdots \\ v_8 \end{bmatrix}_1, \cdots, \begin{bmatrix} v_1 \\ \vdots \\ v_8 \end{bmatrix}_n \right] x \quad \Longleftrightarrow \quad \begin{cases} y_1 = (v_{1,1}, \cdots, v_{1,n})\, x \\ \;\;\vdots \\ y_8 = (v_{8,1}, \cdots, v_{8,n})\, x \end{cases}$$

On each sensor node $i$:
1. Given a (long) test sequence at time $t$, apply multiple duration hypotheses: $y_i \in \mathbb{R}^{5h}$.
2. Choose Fisher features $R_i \in \mathbb{R}^{10 \times 5h}$:
   $$\tilde{y}_i = R_i y_i = R_i A_i x = \tilde{A}_i x \in \mathbb{R}^{10}.$$


Localized Classifiers (continued)

1. Equivalently, define $R_i' = (0 \cdots R_i \cdots 0)$:
   $$\tilde{y}_i = (0 \cdots R_i \cdots 0) \begin{bmatrix} y_1 \\ \vdots \\ y_8 \end{bmatrix} = (0 \cdots R_i \cdots 0) \left[ \begin{bmatrix} v_1 \\ \vdots \\ v_8 \end{bmatrix}_1, \cdots, \begin{bmatrix} v_1 \\ \vdots \\ v_8 \end{bmatrix}_n \right] x = R_i' A x \in \mathbb{R}^{10}.$$
2. For all segmentation hypotheses, apply a sparsity concentration index (SCI) threshold $\sigma_1$.

Figure: (a) valid segmentation; (b) invalid segmentation.

Local sparsity threshold $\sigma_1$: if $\mathrm{SCI}(x) > \sigma_1$, sensor $i$ becomes active and transmits $\tilde{y}_i \in \mathbb{R}^{10}$; $\tilde{y}_i$ provides a segmentation hypothesis at time $t$ and length $h_i$.


Adaptive Global Classifier

Adaptive classification for a subset of active sensors (suppose sensors $1, \ldots, L$ are active at time $t$ with durations $h_i$). Define the global feature matrix
$$R' = \begin{bmatrix} R_1 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & R_L & \cdots & 0 \end{bmatrix}:$$
$$\begin{bmatrix} \tilde{y}_1 \\ \vdots \\ \tilde{y}_L \end{bmatrix} = R' \begin{bmatrix} y_1 \\ \vdots \\ y_8 \end{bmatrix} = R' \begin{bmatrix} A_1 \\ \vdots \\ A_8 \end{bmatrix} x = R' A x.$$

Global segmentation: regardless of which $L$ sensors are active, given a global threshold $\sigma_2$, if $\mathrm{SCI}(x) > \sigma_2$, accept $y$ as a global segmentation with the label given by $x$.

Distributed classification via compressed sensing:
1. Reformulate adaptive classification via the feature matrix $R'$:
   $$\text{Local: } R' = (0 \cdots R_i \cdots 0), \qquad \text{Global: } R' = \begin{bmatrix} R_1 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & R_L & \cdots & 0 \end{bmatrix} \quad \Longleftrightarrow \quad R' y = R' A x.$$
2. The representation $x$ and the training matrix $A$ remain invariant.
3. Segmentation, recognition, and outlier rejection are all unified on $x$.
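A sketch of assembling $R'$ for an arbitrary active subset (hypothetical names; `R_list[i]` is node $i$'s 10 × 5h Fisher matrix):

```python
import numpy as np

def global_feature_matrix(R_list, active, h):
    """Stack one 10-row band per active sensor, each placed at its
    node's 5h-wide column block; inactive nodes contribute zeros."""
    num_nodes = len(R_list)                        # 8 body sensors
    R_prime = np.zeros((10 * len(active), 5 * h * num_nodes))
    for row, i in enumerate(sorted(active)):
        R_prime[10 * row:10 * (row + 1), 5 * h * i:5 * h * (i + 1)] = R_list[i]
    return R_prime

# E.g. with sensors {0, 6} active, R_prime @ y stacks [y_0_tilde; y_6_tilde],
# while the representation x and training matrix A stay unchanged.
```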


Experiment

The algorithm depends on only two parameters: $(\sigma_1, \sigma_2)$. The data communicated between the sensors and the base station are 10-D action features.

Precision vs. recall:
Sensors   2     7     2,7   1,2,7  1-3,7,8  1-8
Prec [%]  89.8  94.6  94.4  92.8   94.6     98.8
Rec [%]   65.0  61.5  82.5  80.6   89.5     94.2

Figure: Confusion tables. (a) Sensors 1-8; (b) sensor 7.

Reference: Yang et al., "Distributed segmentation and classification of human actions using a wearable motion sensor network," Berkeley Tech Report, 2007.


Conclusion

1. Sparsity is important for the classification of high-dimensional data.
2. A new recognition framework via compressed sensing.
3. In a high-dimensional feature space, the choice of an "optimal" feature becomes insignificant.
4. Randomfaces, outlier rejection, and occlusion compensation.
5. Distributed pattern recognition in body sensor networks.


Future Directions

- Distributed camera networks.
- Biosensor networks in health care.


Acknowledgments

Collaborators:
- Berkeley: Shankar Sastry, Ruzena Bajcsy
- UIUC: Yi Ma
- UT-Dallas: Roozbeh Jafari

MATLAB toolboxes: $\ell_1$-Magic by Candès at Caltech; SparseLab by Donoho at Stanford; cvx by Boyd at Stanford.

References:
- Robust face recognition via sparse representation. Submitted to PAMI, 2008.
- Distributed segmentation and classification of human actions using a wearable motion sensor network. Berkeley Tech Report, 2007.
