Camera Calibration and Pose Estimation


The purpose of camera calibration is to determine the intrinsic camera parameters: c0, r0, sx, sy, and f. Camera calibration is also referred to as the interior orientation problem in photogrammetry. The goal of pose estimation is to determine the exterior camera parameters: ω, φ, κ, tx, ty, and tz. In other words, pose estimation determines the position and orientation of the object coordinate frame relative to the camera coordinate frame, or vice versa.


Linear Algebra Basics

Understand the following:
• Range, span, null space, determinant, and rank of a matrix
• Eigenvalues and eigenvectors of a matrix
• Pseudo-inverse (generalized inverse) of a matrix
• Singular value decomposition (SVD) and its relationship to eigenvalues and eigenvectors
• Setting up a system of linear equations and its linear least-squares solution
• Homogeneous coordinate systems


Camera Calibration

The purpose of camera calibration is to determine the intrinsic camera parameters (c0, r0), f, sx, sy, the skew parameter (s = cot α), and the lens distortion (radial distortion coefficient k1). The skew parameter is determined by the angle α between the c and r axes of a pixel. Most CCD cameras have rectangular pixels, so the skew angle is 90 degrees and the skew parameter is zero. k1 is often very small and can be assumed to be zero. The intrinsic camera matrix is

        [ f sx   cot α          c0 ]
    W = [ 0      f sy / sin α   r0 ]
        [ 0      0              1  ]

If we assume α = 90, then we have

        [ f sx   0      c0 ]
    W = [ 0      f sy   r0 ]
        [ 0      0      1  ]

f sx and f sy are treated as two combined parameters; f, sx, and sy cannot be recovered individually as three separate parameters.
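As a quick numerical illustration (a sketch, not part of the notes; the parameter values below are made up), the simplified W maps a 3D point expressed in the camera frame to pixel coordinates:

```python
import numpy as np

# Example (made-up) intrinsic parameters: f*sx, f*sy in pixels, principal point (c0, r0)
fsx, fsy, c0, r0 = 800.0, 780.0, 320.0, 240.0
W = np.array([[fsx, 0.0, c0],
              [0.0, fsy, r0],
              [0.0, 0.0, 1.0]])

# A 3D point expressed in the camera coordinate frame
M = np.array([0.1, -0.05, 2.0])

# Perspective projection: homogeneous image point, then divide by the third coordinate
m = W @ M
c, r = m[0] / m[2], m[1] / m[2]
print(c, r)  # 360.0 220.5
```

Note how only the products f sx and f sy enter the projection, which is why they are treated as two combined parameters.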

Camera Calibration Methods

• Conventional method: use a calibration pattern with known 2D/3D correspondences to compute the camera parameters. It needs only a single image of the calibration pattern.
• Camera self-calibration: needs only 2D image data of the calibration pattern, but requires multiple 2D images taken from multiple views. The problem is often less constrained than the conventional method.

Conventional Camera Calibration

Given a 3D calibration pattern, extract image features from the calibration pattern and use the 2D image features together with the corresponding 3D features on the calibration pattern. The features can be points, lines, curves, or their combination.
• Determine the projection matrix P
• Derive the camera parameters from P

            [ sx f r1^t + c0 r3^t   sx f tx + c0 tz ]
    Pfull = [ sy f r2^t + r0 r3^t   sy f ty + r0 tz ]
            [ r3^t                  tz              ]
Compute P: Linear Method using 2D/3D Points

Given image points mi = (ci, ri)^t and the corresponding 3D points Mi = (xi, yi, zi)^t, where i = 1, ..., N, we want to compute P. Let P be represented as

        [ p1^t  p14 ]
    P = [ p2^t  p24 ]        (1)
        [ p3^t  p34 ]

where pi, i = 1, 2, 3, are 3 × 1 vectors and pi4, i = 1, 2, 3, are scalars.

Compute P: Linear Method (cont'd)

Then for each pair of 2D-3D points, we have

    Mi^t p1 + p14 - ci Mi^t p3 - ci p34 = 0
    Mi^t p2 + p24 - ri Mi^t p3 - ri p34 = 0

Each pair of 2D/3D points gives two equations on the projection matrix. For the 12 unknowns in the projection matrix, we need a minimum of N = 6 pairs of 2D/3D points. If N < 6, there is an infinite number of solutions. For N = 6 points, there is a unique exact solution. We can set up a system of linear equations to solve for V, i.e., AV = 0, where A is a 12 × 12 matrix depending only on the 3D and 2D coordinates of the calibration points, and V is the 12 × 1 vector

    V = (p1^t  p14  p2^t  p24  p3^t  p34)^t



        [ M1^t   1   0^t    0   -c1 M1^t   -c1 ]
        [ 0^t    0   M1^t   1   -r1 M1^t   -r1 ]
    A = [  ...                                  ]
        [ MN^t   1   0^t    0   -cN MN^t   -cN ]
        [ 0^t    0   MN^t   1   -rN MN^t   -rN ]

where 0^t = [0 0 0].

Compute P: Linear Method (cont'd)

For N > 6 there is no exact solution, but we can find the linear least-squares solution by minimizing ε² = ||AV||₂², i.e.,

    V* = arg min_V ||AV||₂²

Taking the derivative of ε² = ||AV||₂² w.r.t. V and setting it to zero yields

    ∂ε²/∂V = ∂(AV)^t(AV)/∂V = 0
    A^t A V = 0

Compute P: Linear Method (cont'd)

For the linear method to work, we need N ≥ 6 and the N points cannot be coplanar.

Rank of A

The solutions depend on the rank of A. In general, given AV = 0, the rank of A is less than 12, which means the solution is not unique. But due to the effect of noise and locational errors, A may be full rank, in which case only the trivial solution V = 0 exists. Besides the number of points, the rank of A may also change for certain special configurations of the input 3D points, for example collinear or coplanar points. The issue is of practical relevance.

Rank of A (cont'd)

The rank of A is a function of the input point configuration (see section 3.4.1.3 of Faugeras). If the 3D points are coplanar, then rank(A) < 11 (in fact, it equals 8), which means there is an infinite number of solutions. Faugeras proves that 1) in general, for non-coplanar configurations of 6 or more points, rank(A) = 11; 2) for coplanar points (N ≥ 4), rank(A) = 8, since three points are needed to determine a plane (11 - 3 = 8). What is the rank of A if the points are located on a sphere, or on planes that are orthogonal or parallel to each other? Hint: how many points are needed to determine a sphere?

Linear Solution 1

Given ε² = ||AV||₂², the linear least-squares solution is

    V* = arg min_V ε²,  which leads to  A^t A V = 0

The solution to A^t A V = 0 is not unique (it is determined only up to a scale factor). It lies in the null space of A^t A. As the null space of A^t A is the same as that of A, we can find the null space of A. If rank(A) = 11, then V is the only null vector of A, multiplied by a scalar. If, on the other hand, rank(A) < 11, then the solution for V is a linear combination of all null vectors of A. The null space of A can be obtained by performing SVD on A, yielding

    A (m×n) = U (m×m) D (m×n) S^t (n×n)

where U and S are orthonormal matrices and D is a diagonal matrix whose diagonal entries, called singular values, are arranged in descending order. If rank(A) = 11, the only null vector of A is the last column (corresponding to the smallest singular value) of the S matrix. Alternatively, we can solve for V by directly minimizing ||AV||₂², which yields (A^t A)V = 0, or (A^t A)V = λV with λ = 0. As a result, the solution for V is the eigenvector of the matrix A^t A corresponding to the zero eigenvalue. This implies that the eigenvectors of A^t A correspond to the columns of the S matrix.

Note that V is solved only up to a scale factor. The scale factor can be recovered using the fact that ||p3||₂ = 1. Let the null vector of A be V′. The scale factor is

    α = 1 / sqrt(V′(9)² + V′(10)² + V′(11)²)

Hence, V = αV′.
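Linear Solution 1 can be sketched numerically as follows (assumptions: numpy, noise-free synthetic data, and an illustrative W; the null vector is the right singular vector of the smallest singular value, rescaled so that ||p3|| = 1 and compared up to sign):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: ground-truth P = W [R | T] with ||p3|| = 1, points in front of the camera
W = np.array([[800.0, 0, 320], [0, 780.0, 240], [0, 0, 1]])
P_true = W @ np.hstack([np.eye(3), np.array([[0.1], [-0.2], [5.0]])])
M = np.hstack([rng.uniform(-1, 1, (8, 3)), np.ones((8, 1))])  # 8 non-coplanar points
proj = M @ P_true.T
c, r = proj[:, 0] / proj[:, 2], proj[:, 1] / proj[:, 2]

# Build the 2N x 12 matrix A; unknown V = (p1^t p14 p2^t p24 p3^t p34)^t
rows = []
for X, ci, ri in zip(M, c, r):
    rows.append(np.concatenate([X, np.zeros(4), -ci * X]))
    rows.append(np.concatenate([np.zeros(4), X, -ri * X]))
A = np.array(rows)

# Null vector of A = right singular vector of the smallest singular value
_, _, St = np.linalg.svd(A)
P = St[-1].reshape(3, 4)
P /= np.linalg.norm(P[2, :3])   # rescale so that ||p3|| = 1

ok = np.allclose(P, P_true, atol=1e-4) or np.allclose(P, -P_true, atol=1e-4)
print(ok)
```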

Linear Solution 2

Let A = [B  b], where

        [ M1^t   1   0^t    0   -c1 M1^t ]
        [ 0^t    0   M1^t   1   -r1 M1^t ]
    B = [  ...                           ]
        [ MN^t   1   0^t    0   -cN MN^t ]
        [ 0^t    0   MN^t   1   -rN MN^t ]

    b = (-c1  -r1  ...  -cN  -rN)^t

and partition V as

    V = p34 [ Y ]        Y = (p1^t  p14  p2^t  p24  p3^t)^t / p34
            [ 1 ]

Then AV = p34(BY + b). Since p34 is a constant, minimizing ||AV||₂ is the same as minimizing ||BY + b||₂, whose solution is

    Y = -(B^t B)^{-1} B^t b

The rank of matrix B must be eleven.

Linear Solution 2 (cont'd)

The solution for Y is up to the scale factor p34. To recover the scale factor, we can use the fact that ||p3|| = 1. The scale factor p34 can be recovered as

    p34 = 1 / sqrt(Y(9)² + Y(10)² + Y(11)²)

where Y(9), Y(10), and Y(11) are the last 3 elements of Y. The final projection matrix (vector) is therefore

    V = [ p34 Y ]
        [ p34   ]
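Linear Solution 2 then reduces to an ordinary least-squares solve (same hedged synthetic setup as in the earlier sketch; assumes p34 ≠ 0):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same synthetic data as in the Linear Solution 1 sketch
W = np.array([[800.0, 0, 320], [0, 780.0, 240], [0, 0, 1]])
P_true = W @ np.hstack([np.eye(3), np.array([[0.1], [-0.2], [5.0]])])
M = np.hstack([rng.uniform(-1, 1, (8, 3)), np.ones((8, 1))])
proj = M @ P_true.T
c, r = proj[:, 0] / proj[:, 2], proj[:, 1] / proj[:, 2]

# A = [B b]: B carries the first 11 unknowns Y, b the coefficient of p34
Brows, brows = [], []
for X, ci, ri in zip(M, c, r):
    Brows.append(np.concatenate([X, np.zeros(4), -ci * X[:3]]))
    brows.append(-ci)
    Brows.append(np.concatenate([np.zeros(4), X, -ri * X[:3]]))
    brows.append(-ri)
B, b = np.array(Brows), np.array(brows)

# Minimize ||BY + b||: ordinary least squares
Y, *_ = np.linalg.lstsq(B, -b, rcond=None)

# Recover the scale from ||p3|| = 1; the last three elements of Y are p3/p34
p34 = 1.0 / np.linalg.norm(Y[8:11])
V = np.append(p34 * Y, p34)
print(np.allclose(V.reshape(3, 4), P_true, atol=1e-4))
```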

Linear Solution 3

Impose the orthonormality constraint R^t = R^{-1}, i.e., minimize ||AV||₂ subject to R^t = R^{-1}. The solution to this problem is non-linear!

To obtain a linear solution, we can instead impose one of the normality constraints, ||p3||₂² = 1; the problem is then converted to a constrained linear least-squares problem. That is, minimize ||AV||₂² subject to ||p3||₂² = 1:

    ε² = ||AV||₂² + λ(||p3||₂² - 1)        (2)

Decompose A into two matrices B and C, and V into Y and Z:

    A = (B  C)

    V = [ Y ]
        [ Z ]

                 [ M1^t   1   0^t    0   -c1 ]
                 [ 0^t    0   M1^t   1   -r1 ]
    B (2N × 9) = [  ...                      ]
                 [ MN^t   1   0^t    0   -cN ]
                 [ 0^t    0   MN^t   1   -rN ]

                 [ -c1 M1^t ]
                 [ -r1 M1^t ]
    C (2N × 3) = [  ...     ]
                 [ -cN MN^t ]
                 [ -rN MN^t ]

    Y = (p1^t  p14  p2^t  p24  p34)^t
    Z = p3

Then equation 2 can be rewritten as

    ε² = ||BY + CZ||₂² + λ(||Z||₂² - 1)

Taking partial derivatives of ε² with respect to Y and Z and setting them to zero yields

    ∂ε²/∂Y = 0  ⇒  Y = -(B^t B)^{-1} B^t C Z
    ∂ε²/∂Z = 0  ⇒  C^t (I - B (B^t B)^{-1} B^t) C Z = λZ

The solution for Z is therefore an eigenvector of the matrix C^t (I - B (B^t B)^{-1} B^t) C. Given Z, we can then obtain the solution for Y.

Substituting Y into ||BY + CZ||² leads to

    ||BY + CZ||² = ||-B (B^t B)^{-1} B^t C Z + C Z||²
                 = ||(I - B (B^t B)^{-1} B^t) C Z||²
                 = Z^t C^t (I - B (B^t B)^{-1} B^t)^t (I - B (B^t B)^{-1} B^t) C Z
                 = Z^t C^t (I - B (B^t B)^{-1} B^t) C Z
                 = Z^t λZ
                 = λ

This proves that the solution for Z is the eigenvector corresponding to the smallest positive eigenvalue of the matrix C^t (I - B (B^t B)^{-1} B^t) C. Note that

    (I - B (B^t B)^{-1} B^t)(I - B (B^t B)^{-1} B^t) = I - B (B^t B)^{-1} B^t
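Linear Solution 3 can be sketched with an eigen-decomposition (same hedged synthetic setup; the eigenvector sign ambiguity is handled by comparing up to sign):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same synthetic data again
Wm = np.array([[800.0, 0, 320], [0, 780.0, 240], [0, 0, 1]])
P_true = Wm @ np.hstack([np.eye(3), np.array([[0.1], [-0.2], [5.0]])])
M = np.hstack([rng.uniform(-1, 1, (8, 3)), np.ones((8, 1))])
proj = M @ P_true.T
c, r = proj[:, 0] / proj[:, 2], proj[:, 1] / proj[:, 2]

# A = (B C): Y = (p1^t p14 p2^t p24 p34)^t, Z = p3
Brows, Crows = [], []
for X, ci, ri in zip(M, c, r):
    Brows.append(np.concatenate([X, np.zeros(4), [-ci]]))
    Crows.append(-ci * X[:3])
    Brows.append(np.concatenate([np.zeros(4), X, [-ri]]))
    Crows.append(-ri * X[:3])
B, C = np.array(Brows), np.array(Crows)

# Z is the eigenvector of C^t (I - B (B^t B)^{-1} B^t) C for the smallest eigenvalue
Pi = np.eye(len(B)) - B @ np.linalg.inv(B.T @ B) @ B.T
evals, evecs = np.linalg.eigh(C.T @ Pi @ C)
Z = evecs[:, 0]                                  # ||Z|| = 1 automatically
Y = -np.linalg.inv(B.T @ B) @ B.T @ C @ Z

P = np.concatenate([Y[:4], Y[4:8], Z, [Y[8]]]).reshape(3, 4)
ok = np.allclose(P, P_true, atol=1e-4) or np.allclose(P, -P_true, atol=1e-4)
print(ok)
```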

Other Linear Techniques

• Another new method: https://www.ecse.rpi.edu/~qji/CV/new_method.pdf

Compute P: Unconstrained Non-linear Method

Let the 3D points be Mi = (xi, yi, zi)^t and the corresponding image points be mi = (ci, ri)^t for i = 1, 2, ..., N. The criterion function to minimize is the sum of the squared projection errors

    ε² = Σ_{i=1}^N ((Mi^t p1 + p14)/(Mi^t p3 + p34) - ci)² + ((Mi^t p2 + p24)/(Mi^t p3 + p34) - ri)²        (3)

Compute P: Unconstrained Non-linear Method (cont'd)

Let

    ĉi(p1, p3) = (Mi^t p1 + p14)/(Mi^t p3 + p34)
    r̂i(p2, p3) = (Mi^t p2 + p24)/(Mi^t p3 + p34)

    V̂(p1, p2, p3) = (ĉ1  r̂1  ...  ĉN  r̂N)^t
    V = (c1  r1  ...  cN  rN)^t

The objective function in Eq. 3 can then be rewritten as

    ε² = (V̂ - V)^t (V̂ - V)        (4)

Compute P: Constrained Non-linear Method 1

The criterion function to minimize is

    ε² = Σ_{i=1}^N ((Mi^t p1 + p14)/(Mi^t p3 + p34) - ci)² + ((Mi^t p2 + p24)/(Mi^t p3 + p34) - ri)²        (5)

subject to ||p3||₂² = 1. Introducing the Lagrange multiplier λ1, we have the constrained objective function

    ε² = (V̂ - V)^t (V̂ - V) + λ1 (||p3||₂² - 1)        (6)

Compute P: Constrained Non-linear Method 2

The criterion function to minimize is the same as in Eq. 5,

    ε² = Σ_{i=1}^N ((Mi^t p1 + p14)/(Mi^t p3 + p34) - ci)² + ((Mi^t p2 + p24)/(Mi^t p3 + p34) - ri)²        (7)

subject to ||p3||₂² = 1 and (p1 × p3)·(p2 × p3) = 0, i.e., the full orthonormality constraints on R.

Compute P: Constrained Non-linear Method 2 (cont'd)

Introducing the Lagrange multipliers λ1 and λ2, we have the constrained objective function

    ε² = (V̂ - V)^t (V̂ - V) + λ1 (||p3||₂² - 1) + λ2 (p1 × p3)·(p2 × p3)        (8)

Non-linear Least-Squares (NLS) Solutions

The solutions to non-linear least-squares problems are typically iterative, starting from an initial guess.
• The gradient descent method (first order, applicable to any objective function)
• The Gauss-Newton method (or Levenberg-Marquardt) (first order, applicable only to NLS problems)
• The Newton method (second order, applicable to any objective function)

Given an initial pi, non-linear methods iteratively update pi, i.e.,

    pi^t = pi^{t-1} + η Δpi

They differ in the way Δpi is computed.

Gradient Descent Method

First-order gradient descent may be used to solve for P iteratively. For objective function 8, we can solve for p3 as follows:

    p3^t = p3^{t-1} + η Δp3

where η is the learning rate and Δp3 = -∇p3 ε², which can be computed as follows:

    ∇p3 ε² = 2 (∂V̂/∂p3)(V̂ - V) + 2 λ1 p3 + λ2 [ (∂(p1 × p3)/∂p3) · (p2 × p3) + (∂(p2 × p3)/∂p3) · (p1 × p3) ]

where

    ∂V̂/∂p3 = [ ∂ĉ1/∂p3  ∂r̂1/∂p3  ...  ∂ĉN/∂p3  ∂r̂N/∂p3 ]

and

    ∂(p1 × p3)/∂p3 = (∂p1/∂p3) × p3 + p1 × (∂p3/∂p3) = p1 × (∂p3/∂p3)

since p1 does not depend on p3. ∂p3/∂p3 is a 3 × 3 identity matrix, and p1 × ∂p3/∂p3 is the cross product between a 3-vector and a matrix, which can be carried out as the cross product of the vector with each column of the matrix, yielding another 3 × 3 matrix. Similarly, we compute ∂(p2 × p3)/∂p3. We can follow the same procedure to compute the updates for p1, p14, p2, p24, and p34.

Note that other criterion functions, such as the Sampson error (section 3.2.6 of Hartley's book), can also be used. The Sampson error represents a first-order approximation to the geometric error in equation 3.

Gauss-Newton Method

For the unconstrained non-linear least-squares problem, we can apply the Gauss-Newton method. For p3 in Eq. 4, we have

    p3^t = p3^{t-1} + η Δp3

where Δp3 is computed as follows:

    Δp3 = -[(∂V̂/∂p3)(∂V̂/∂p3)^t]^{-1} (∂V̂/∂p3)(V̂ - V)

We can similarly solve for p1 and p2.

Gauss-Newton Method (cont'd)

For the constrained non-linear least-squares problem, let C(p3) be the equality constraint on p3. The objective function changes to

    ε² = ||V̂ - V||₂² + λ C(p3)

The Gauss-Newton update changes accordingly. For p3 in the above equation, we have

    p3^t = p3^{t-1} + η Δp3

where Δp3 is computed as follows:

    Δp3 = [(∂V̂/∂p3)(∂V̂/∂p3)^t]^{-1} [-(∂V̂/∂p3)(V̂ - V) + λ ∂C/∂p3]

We can similarly solve for p1 and p2.

Gauss-Newton Method (cont'd)

Detailed derivation of the Gauss-Newton method: a first-order Taylor expansion of V̂ w.r.t. the current p3⁰ yields

    V̂(p3) = V̂(p3⁰) + (∂V̂(p3⁰)/∂p3)^t Δp3

Substituting it into Eq. 4 yields

    ε² = [V̂(p3⁰) + (∂V̂(p3⁰)/∂p3)^t Δp3 - V]^t [V̂(p3⁰) + (∂V̂(p3⁰)/∂p3)^t Δp3 - V]

Taking the partial derivative of ε² w.r.t. Δp3 and setting the result to zero yields

    ∂ε²/∂Δp3 = 2 (∂V̂(p3⁰)/∂p3) [V̂(p3⁰) + (∂V̂(p3⁰)/∂p3)^t Δp3 - V] = 0

Solving for Δp3 yields

    Δp3 = -[(∂V̂(p3⁰)/∂p3)(∂V̂(p3⁰)/∂p3)^t]^{-1} (∂V̂(p3⁰)/∂p3)(V̂(p3⁰) - V)

With the constraint C(p3),

    ε² = [V̂(p3⁰) + (∂V̂(p3⁰)/∂p3)^t Δp3 - V]^t [V̂(p3⁰) + (∂V̂(p3⁰)/∂p3)^t Δp3 - V] + λ[C(p3⁰) + (∂C(p3⁰)/∂p3)^t Δp3]

Hence,

    Δp3 = [(∂V̂/∂p3)(∂V̂/∂p3)^t]^{-1} [-(∂V̂/∂p3)(V̂ - V) + λ ∂C/∂p3]

Levenberg-Marquardt (LM) Algorithm

The LM algorithm is an improved version of the Gauss-Newton method, i.e., for p3 in Eq. 4, we have

    p3^t = p3^{t-1} + η Δp3

where Δp3 is computed differently.

Unconstrained LM:

    Δp3 = -[(∂V̂/∂p3)(∂V̂/∂p3)^t + αI]^{-1} (∂V̂/∂p3)(V̂ - V)

Constrained LM:

    Δp3 = [(∂V̂/∂p3)(∂V̂/∂p3)^t + αI]^{-1} [-(∂V̂/∂p3)(V̂ - V) + λ ∂C/∂p3]

where I is an identity matrix and α is a damping factor that varies with each iteration. The iteration starts with a large α and gradually reduces it as the iteration proceeds. With a small α, the LM algorithm becomes the Gauss-Newton method; with a large α, it becomes the gradient descent method.
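A hedged sketch of LM refinement of the full projection vector (numpy with a numerical Jacobian; the damping schedule, noise level, and synthetic data are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ground truth and slightly noisy observations
W = np.array([[800.0, 0, 320], [0, 780.0, 240], [0, 0, 1]])
P_true = W @ np.hstack([np.eye(3), np.array([[0.1], [-0.2], [5.0]])])
M = np.hstack([rng.uniform(-1, 1, (10, 3)), np.ones((10, 1))])
proj = M @ P_true.T
obs = proj[:, :2] / proj[:, 2:3] + rng.normal(0, 0.1, (10, 2))  # 0.1 px noise

def residuals(p):
    """Stacked reprojection residuals for the projection vector p (12,)."""
    pr = M @ p.reshape(3, 4).T
    return (pr[:, :2] / pr[:, 2:3] - obs).ravel()

p = (P_true + rng.normal(0, 1e-3, (3, 4))).ravel()  # perturbed initial estimate
alpha = 1e-3                                        # damping factor
for _ in range(50):
    res = residuals(p)
    # Numerical Jacobian (2N x 12) of the residual vector
    J = np.array([(residuals(p + 1e-6 * e) - res) / 1e-6 for e in np.eye(12)]).T
    step = -np.linalg.solve(J.T @ J + alpha * np.eye(12), J.T @ res)
    if np.sum(residuals(p + step) ** 2) < np.sum(res ** 2):
        p, alpha = p + step, alpha * 0.5   # accept: move toward Gauss-Newton
    else:
        alpha *= 10.0                      # reject: move toward gradient descent

print(np.sum(residuals(p) ** 2))  # final sum of squared reprojection errors
```

Accepting a step only when the error decreases keeps the iteration monotone, which is the practical appeal of LM over plain Gauss-Newton.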

Unconstrained Newton Method

The Newton method is a second-order method, mainly for solving unconstrained optimization problems. It assumes the objective function is twice differentiable. Given the objective function

    ε² = (V̂ - V)^t (V̂ - V)

taking a second-order Taylor expansion of ε² w.r.t. p3⁰ yields

    ε² = (V̂(p3⁰) - V)^t (V̂(p3⁰) - V) + (∂ε²(p3⁰)/∂p3)^t Δp3 + (1/2) Δp3^t (∂²ε²(p3⁰)/∂p3²) Δp3

    ∂ε²/∂Δp3 = ∂ε²(p3⁰)/∂p3 + (∂²ε²(p3⁰)/∂p3²) Δp3 = 0

Hence

    Δp3 = -(∂²ε²(p3⁰)/∂p3²)^{-1} (∂ε²(p3⁰)/∂p3)

    p3^t = p3^{t-1} + η Δp3

where ∂²ε²/∂p3² is the Hessian matrix and ∂ε²/∂p3 is the gradient (the Jacobian of ε²).

Constrained Newton Method

Given the constrained objective function with constraint C(p3) on p3,

    ε² = (V̂ - V)^t (V̂ - V) + λ C(p3)

taking a second-order Taylor expansion of ε² w.r.t. p3⁰ yields

    ε² = (V̂(p3⁰) - V)^t (V̂(p3⁰) - V) + (∂ε²(p3⁰)/∂p3)^t Δp3 + (1/2) Δp3^t (∂²ε²(p3⁰)/∂p3²) Δp3
         + λ[C(p3⁰) + (∂C(p3⁰)/∂p3)^t Δp3 + (1/2) Δp3^t (∂²C(p3⁰)/∂p3²) Δp3]

    ∂ε²/∂Δp3 = ∂ε²(p3⁰)/∂p3 + (∂²ε²(p3⁰)/∂p3²) Δp3 + λ ∂C(p3⁰)/∂p3 + λ (∂²C(p3⁰)/∂p3²) Δp3 = 0

Hence

    Δp3 = -[∂²ε²(p3⁰)/∂p3² + λ ∂²C(p3⁰)/∂p3²]^{-1} [∂ε²(p3⁰)/∂p3 + λ ∂C(p3⁰)/∂p3]

    p3^t = p3^{t-1} + η Δp3

Compute P: Non-linear Method (cont'd)

Another way of solving this problem is to perform the minimization directly with respect to the intrinsic and extrinsic parameters.

Linear Method vs. Non-linear Method

• The linear method is simple but less accurate and less robust
• The linear solution can be made robust via a robust method such as RANSAC
• The linear method does not require an initial estimate
• The non-linear method is more accurate and robust, but is complex and requires a good initial estimate

The common approach in CV is two steps:
• Use a linear method to obtain initial estimates of the camera parameters.
• Refine the initial estimates using a non-linear method.

Data Normalization

Hartley introduces a data normalization technique to improve estimation accuracy. Details of the normalization technique may be found in section 3.4.4 of Hartley's book. This normalization should precede any estimation that involves image data. A brief discussion of this normalization procedure can also be found on page 156 of Trucco's book.

Robust Linear Method with RANSAC

The linear least-squares method is sensitive to image errors and outliers. One solution is to use a robust method. The most commonly used robust method in CV is RANSAC (Random Sample Consensus). It works as follows:
• Step 1: Randomly pick a subset of K (K ≥ 6) points from the N image points and compute the projection matrix P using the selected points.
• Step 2: For each of the remaining points, compute its projection error using the P computed in Step 1. If the error is within a threshold distance, increment a counter of the number of points (the "inliers") that agree with the hypothesized P.
• Step 3: Repeat Steps 1 and 2 a sufficient number of times (the exact number is determined by the required probability that at least one subset contains no outliers), and then select the subset of points corresponding to the P with the largest count.
• Step 4: Using the subset of points selected in Step 3 plus all of the other inlier points that contributed to the count, recompute the best P from all of these points.
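The four steps can be sketched as follows (assumes numpy; `estimate_P` reuses the SVD-based linear solve, and the 1-pixel threshold, 100 iterations, and synthetic outliers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def estimate_P(M, c, r):
    """Linear (SVD null-vector) estimate of the 3x4 projection matrix."""
    rows = []
    for X, ci, ri in zip(M, c, r):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -ci * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -ri * Xh]))
    _, _, St = np.linalg.svd(np.array(rows))
    return St[-1].reshape(3, 4)

def reproj_error(P, M, c, r):
    proj = np.hstack([M, np.ones((len(M), 1))]) @ P.T
    return np.hypot(proj[:, 0] / proj[:, 2] - c, proj[:, 1] / proj[:, 2] - r)

# Synthetic data: 25 clean correspondences plus 5 gross outliers
W = np.array([[800.0, 0, 320], [0, 780.0, 240], [0, 0, 1]])
P_true = W @ np.hstack([np.eye(3), np.array([[0.1], [-0.2], [5.0]])])
M = rng.uniform(-1, 1, (30, 3))
proj = np.hstack([M, np.ones((30, 1))]) @ P_true.T
c, r = proj[:, 0] / proj[:, 2], proj[:, 1] / proj[:, 2]
c[:5] += rng.uniform(50, 100, 5)          # corrupt five points

best_inliers = np.zeros(30, dtype=bool)
for _ in range(100):                      # Step 3: repeat Steps 1 and 2
    sample = rng.choice(30, 6, replace=False)        # Step 1: minimal sample
    P = estimate_P(M[sample], c[sample], r[sample])
    inliers = reproj_error(P, M, c, r) < 1.0         # Step 2: count agreement
    if inliers.sum() > best_inliers.sum():
        best_inliers = inliers

P_best = estimate_P(M[best_inliers], c[best_inliers], r[best_inliers])  # Step 4
print(best_inliers.sum())
```

Note that the reprojection error is invariant to the scale and sign of the estimated P, so no normalization of the null vector is needed inside the loop.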

Compute Camera Parameters from P

        [ sx f r1^t + c0 r3^t   sx f tx + c0 tz ]
    P = [ sy f r2^t + r0 r3^t   sy f ty + r0 tz ]
        [ r3^t                  tz              ]

    r3   = p3
    tz   = p34
    c0   = p1^t p3
    r0   = p2^t p3
    sx f = sqrt(p1^t p1 - c0²) = ||p1 × p3||
    sy f = sqrt(p2^t p2 - r0²) = ||p2 × p3||
    tx   = (p14 - c0 tz)/(sx f)
    ty   = (p24 - r0 tz)/(sy f)
    r1   = (p1 - c0 r3)/(sx f)
    r2   = (p2 - r0 r3)/(sy f)
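These recovery formulas can be checked numerically (a sketch assuming numpy and a P built from known example parameters, already scaled so that ||p3|| = 1):

```python
import numpy as np

# Build a ground-truth P = W [R | T] with known (made-up) parameters
sxf, syf, c0, r0 = 800.0, 780.0, 320.0, 240.0
W = np.array([[sxf, 0, c0], [0, syf, r0], [0, 0, 1.0]])
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1.0]])
T = np.array([0.1, -0.2, 5.0])
P = W @ np.hstack([R, T[:, None]])   # third row of R is unit, so ||p3|| = 1

p1, p2, p3 = P[0, :3], P[1, :3], P[2, :3]
p14, p24, p34 = P[:, 3]

# Parameter recovery following the formulas above
r3 = p3
tz = p34
c0_ = p1 @ p3
r0_ = p2 @ p3
sxf_ = np.linalg.norm(np.cross(p1, p3))
syf_ = np.linalg.norm(np.cross(p2, p3))
tx = (p14 - c0_ * tz) / sxf_
ty = (p24 - r0_ * tz) / syf_
r1 = (p1 - c0_ * r3) / sxf_
r2 = (p2 - r0_ * r3) / syf_

print(c0_, r0_, sxf_, syf_, tx, ty, tz)
```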

Compute Camera Parameters from P (cont'd)

Alternatively, we can compute W algebraically from P. Since P = WM = W[R T] = [WR  WT], let P3 be the first 3 × 3 submatrix of P; then P3 = WR. Following the RQ decomposition algorithm (different from the well-known QR decomposition) in https://math.stackexchange.com/posts/1640762/revisions we can decompose P3 into P3 = AB, where A is an upper triangular matrix and B is an orthonormal matrix. Hence, we have W = A and R = B. Given W, T can be recovered as T = W^{-1} P4, where P4 is the last column of P.
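A sketch of this alternative using scipy's RQ routine (`scipy.linalg.rq`, assumed available); forcing a positive diagonal on W is a sign convention assumed here:

```python
import numpy as np
from scipy.linalg import rq

W_true = np.array([[800.0, 0, 320], [0, 780.0, 240], [0, 0, 1.0]])
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1.0]])
T_true = np.array([0.1, -0.2, 5.0])
P = W_true @ np.hstack([R_true, T_true[:, None]])

P3 = P[:, :3]
A, B = rq(P3)              # P3 = A B, A upper triangular, B orthonormal

# Resolve sign ambiguities so that A has a positive diagonal
D = np.diag(np.sign(np.diag(A)))
W, R = A @ D, D @ B        # (A D)(D B) = A B since D^2 = I

W /= W[2, 2]               # fix the overall scale so that W[2,2] = 1
T = np.linalg.solve(W, P[:, 3])   # T = W^{-1} P4

print(np.allclose(W, W_true), np.allclose(R, R_true))
```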

Approximate Solution to Imposing Orthonormality

Let R̂ be the estimated rotation matrix; in general it is not orthonormal. We can find the orthonormal matrix R that is closest to R̂ via SVD. Let R̂ = UDV^t; find the matrix E that is closest to D and satisfies E^{-1} = E^t. If we take E to be a diagonal matrix, then E = I, the identity matrix. As a result, the new estimate of R is R = UIV^t = UV^t, which is the orthonormal matrix closest to R̂.
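A quick check of this SVD projection (sketch, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)

theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1.0]])
R_hat = R_true + rng.normal(0, 0.01, (3, 3))  # noisy, non-orthonormal estimate

U, D, Vt = np.linalg.svd(R_hat)
R = U @ Vt                                    # replace the D matrix by the identity

print(np.allclose(R @ R.T, np.eye(3)))        # R is orthonormal
```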

Image Center using Vanishing Points

Let Li, i = 1, 2, ..., N, be parallel lines in 3D and li the corresponding image lines. Due to perspective projection, the lines li appear to meet in a point, called the vanishing point, defined as the common intersection of all the image lines li. Given the orientation N = (nx, ny, nz)^t of the lines Li relative to the camera frame, the coordinates of the vanishing point in the image frame are (nx/nz, ny/nz). Let T be the triangle on the image plane defined by the three vanishing points of three mutually orthogonal sets of parallel lines in space. The image center, i.e., the principal point (c0, r0), is the orthocenter of T (the intersection of its three altitudes).

Non-Linear Direct Solution to Camera Parameters

Let

    Θ = (c0  r0  f  sx  sy  ω  φ  κ  tx  ty  tz)^t
    g(Θ, Mi) = (Mi^t p1 + p14)/(Mi^t p3 + p34)
    f(Θ, Mi) = (Mi^t p2 + p24)/(Mi^t p3 + p34)

Then the problem can be stated as follows: find Θ by minimizing

    ε² = Σ_{i=1}^N [g(Mi, Θ) - ci]² + [f(Mi, Θ) - ri]²

Gradient descent can be used to solve for each parameter iteratively, i.e.,

    Θ^t = Θ^{t-1} - α ∇Θ ε²

Other methods for non-linear optimization include the Newton, Gauss-Newton, and Levenberg-Marquardt methods. See chapter 3 of Forsyth and Ponce's book on how these methods work, and refer to appendix 4 of Hartley's book for additional iterative estimation methods. Non-linear methods all need good initial estimates to converge correctly. Implement one of the non-linear methods using Matlab; it can improve the results considerably. Organize the objective function in matrix format and then apply gradient descent, Gauss-Newton, or Levenberg-Marquardt directly with respect to the camera parameters.

Calibration with a Planar Object

Since planar points reduce the rank of the A matrix to 8, we cannot uniquely solve for V using a planar object: the solution for V is a linear combination of the 4 null vectors of the A matrix (up to four scale factors). But calibration becomes possible if we acquire two images of the planar object, producing two A matrices. With each A providing 8 independent equations, we have a total of 16 independent equations, theoretically sufficient to solve for the intrinsic matrix W. But since the two A matrices share the same W yet have different extrinsic parameters, the extrinsic parameters must be eliminated from the system of linear equations to determine W alone.

Calibration with a Planar Object (cont'd)

For a planar calibration object, we can place the object frame on the plane such that z = 0. Hence, given the ith 2D/3D point, we have

         [ ci ]                   [ xi ]       [ xi ]
    λi   [ ri ]  =  W [r1  r2  T] [ yi ]  =  H [ yi ]        (9)
         [ 1  ]                   [ 1  ]       [ 1  ]

where r1, r2, and T are, respectively, the first and second columns of the rotation matrix, and the translation vector. Given a minimum of 4 points, we can solve for H using either linear or non-linear methods. Given H, we have

    W r1 = h1
    W r2 = h2        (10)

Eq. 10 can be equivalently written as

    r1 = W^{-1} h1
    r2 = W^{-1} h2

Using the fact that r1 and r2 are unit vectors and orthogonal to each other, we can derive the following:

    ||r1||₂² = 1  ⇒  h1^t W^{-t} W^{-1} h1 = 1
    ||r2||₂² = 1  ⇒  h2^t W^{-t} W^{-1} h2 = 1        (11)
    r1^t r2  = 0  ⇒  h1^t W^{-t} W^{-1} h2 = 0        (12)

Let K = W^{-t} W^{-1}. Eqs. 11 and 12 can be rewritten as

    h1^t K h1 = 1
    h2^t K h2 = 1
    h1^t K h2 = 0        (13)

Eq. 13 provides three equations for K. We need another image to produce another three equations. In total, we have six equations to solve for the six unknowns in K, as K is symmetric. Given K, we can then apply matrix decomposition to solve for W. For details, see Zhengyou Zhang's calibration method at
http://research.microsoft.com/en-us/um/people/zhang/Calib/
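A numerical sketch of this constraint system (assumptions: numpy, illustrative intrinsics in normalized units, and exactly known homographies H = W[r1 r2 T]; three views are used here just to keep the linear system for K well conditioned):

```python
import numpy as np

def rot_x(a):
    return np.array([[1.0, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a), np.cos(a)]])

def rot_z(g):
    return np.array([[np.cos(g), -np.sin(g), 0],
                     [np.sin(g), np.cos(g), 0],
                     [0, 0, 1.0]])

W_true = np.array([[2.0, 0, 0.5], [0, 1.8, 0.4], [0, 0, 1.0]])

def v(h, g):
    # Row of the linear system representing h^t K g for symmetric K (6 unknowns)
    return np.array([h[0]*g[0], h[0]*g[1] + h[1]*g[0], h[0]*g[2] + h[2]*g[0],
                     h[1]*g[1], h[1]*g[2] + h[2]*g[1], h[2]*g[2]])

rows, rhs = [], []
for a, g, T in [(0.3, 0.2, [0.1, 0.2, 4.0]),
                (-0.4, 0.5, [-0.2, 0.1, 5.0]),
                (0.1, -0.6, [0.3, -0.1, 4.5])]:
    R = rot_x(a) @ rot_z(g)
    H = W_true @ np.column_stack([R[:, 0], R[:, 1], T])
    h1, h2 = H[:, 0], H[:, 1]
    rows += [v(h1, h1), v(h2, h2), v(h1, h2)]   # Eq. 13 for this view
    rhs += [1.0, 1.0, 0.0]

k, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
K = np.array([[k[0], k[1], k[2]], [k[1], k[3], k[4]], [k[2], k[4], k[5]]])

# K = W^{-t} W^{-1}  =>  K^{-1} = W W^t; factor with an upper-triangular W
C = np.linalg.inv(K)
J = np.eye(3)[::-1]                 # exchange (flip) matrix
L = np.linalg.cholesky(J @ C @ J)   # C = (J L J)(J L J)^t with J L J upper triangular
W = J @ L @ J
W /= W[2, 2]

print(np.allclose(W, W_true, atol=1e-6))
```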

Camera Calibration using Lines and Conics

Besides using points, we can also perform camera calibration using correspondences between 2D/3D lines and 2D/3D conics. Furthermore, we can extend point-based camera calibration to estimate the lens distortion coefficient k1. These can be topics for the final project.

Camera Calibration under Weak Perspective Projection

For weak perspective projection, we have

    [ c̄ ]             [ x̄ ]
    [ r̄ ]  =  M (2×3) [ ȳ ]
                       [ z̄ ]

Given the 2D/3D relative coordinates (c̄i, r̄i) and (x̄i, ȳi, z̄i), the goal is to solve for the matrix M. A minimum of 3 points is enough to uniquely solve for M, and, more importantly, these points can be coplanar.
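Since each point contributes two linear equations in the six entries of M, M follows from a least-squares solve (a sketch with numpy and made-up relative coordinates):

```python
import numpy as np

rng = np.random.default_rng(4)

M_true = rng.standard_normal((2, 3))     # example weak-perspective matrix
X = rng.standard_normal((5, 3))          # 3D relative coordinates (>= 3 points)
uv = X @ M_true.T                        # corresponding 2D relative coordinates

# Each 3D point constrains both rows of M; solve the two rows jointly
M_est, *_ = np.linalg.lstsq(X, uv, rcond=None)
M_est = M_est.T

print(np.allclose(M_est, M_true))
```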

Camera Calibration with Weak Perspective Projection (cont'd)

Given M and the parameterization of M introduced in the previous chapter, we have

    f sx / z̄c = |m1|
    f sy / z̄c = |m2|

where m1 and m2 are the first row and the second row of the M matrix. Then,

    r1 = m1 / |m1|
    r2 = m2 / |m2|
    r3 = r1 × r2

With weak perspective projection we need a minimum of 3 points, but we can only solve for the above parameters, i.e., f sx / z̄c, f sy / z̄c, and the orientation of the object frame relative to the camera frame.

Camera Calibration with Lens Distortion

We present an approach for the simultaneous linear estimation of the camera parameters and the lens distortion, based on the division lens distortion model proposed by Fitzgibbon (the paper appears in CVPR 01, pages 125-132). According to the division model, we have

    [ c - c0 ]              [ ĉ - c0 ]
    [ r - r0 ]  = (1 + ks²) [ r̂ - r0 ]

where s² = (ĉ - c0)² + (r̂ - r0)². This is an approximation to the camera's true distortion model. Hence,

       [ ĉ - c0  ]      [ x ]       [ c0 ]
    λ  [ r̂ - r0  ]  = P [ y ]  -  λ [ r0 ]
       [ 1 + ks² ]      [ z ]       [ 0  ]
                        [ 1 ]

After solving for λ = p3 (x y z 1)^t, some algebraic simplification yields

    (D1 + k D2) V = 0

where V = (p1  p2  p3)^t, pi is the ith row of the matrix P, and

    D1 = [ x  y  z  1  0  0  0  0  -ĉx  -ĉy  -ĉz  -c0 ]
         [ 0  0  0  0  x  y  z  1  -r̂x  -r̂y  -r̂z  -r0 ]

    D2 = [ xs²  ys²  zs²  s²  0    0    0    0   -c0 xs²  -c0 ys²  -c0 zs²  -c0 s² ]
         [ 0    0    0    0   xs²  ys²  zs²  s²  -r0 xs²  -r0 ys²  -r0 zs²  -r0 s² ]

(When k is small, D1 is close to the A matrix.) The above equation can be solved as a polynomial eigenvalue problem. The MATLAB function polyeig can be used to obtain the solution for both k and V. To use the polyeig function, the matrices on the left side of the above equation must be square. To achieve this, multiplying both sides of the above equation by (D1 + kD2)^t yields the following, which can be solved using polyeig:

    (D1^t D1 + k (D1^t D2 + D2^t D1) + k² D2^t D2) V = 0

The solution, however, assumes knowledge of the image center. We can fix it as the center of the image. Studies show that the precise location of the distortion center does not strongly affect the correction (see Ref. 9 of Fitzgibbon's paper). Alternatively, we can perform alternation: assume the image center as the principal point, use the above to compute k and the internal camera parameters, then substitute the computed center back to recompute k and the camera parameters. This process repeats until it converges, i.e., when the change in the estimated parameters is small. The procedure can be summarized as follows:
1. Assume the principal point is at the image center. This allows us to compute ĉ - c0, r̂ - r0, and s² = (ĉ - c0)² + (r̂ - r0)² for each point.
2. Use polyeig to solve for k and the matrix P.
3. Obtain the intrinsic camera parameters from P.
4. Repeat steps 2 and 3 with the new principal point until the change in the computed intrinsic parameters is less than a pre-defined threshold.

Degeneracy in Camera Calibration

Degeneracy occurs when the solution for the projection matrix is not unique due to special spatial point configurations. See sections 3.2.3 and 3.3.3 of Forsyth's book.

Camera Self-Calibration

Camera self-calibration refers to determining the interior camera parameters of a camera by using only image data obtained from different view directions or different viewpoints. Either the camera or the object must move to acquire the different images.

Methods for Camera Self-Calibration

• General camera movement (involving both rotation and translation)
• Only rotational movement (same viewpoint but different view directions)
• Only translational movement (different viewpoints but same view direction)

Camera Self-Calibration With Only Rotation

Exploiting the fact that we do not use 3D information, we can locate the object frame anywhere we want to simplify the subsequent calculations. Let us select the initial camera frame as the reference frame and let the object frame coincide with the initial camera frame. Let the image generated by the initial camera frame be represented with subscript 0.







    λ0 [c0 r0 1]^t = W [X Y Z]^t = W [I 0] [X Y Z 1]^t = W M0 [X Y Z 1]^t        (14)

If we rotate the camera frame from the reference frame by a rotation matrix Ri, we have

    λi [ci ri 1]^t = W Mi [X Y Z 1]^t = W [Ri 0] [X Y Z 1]^t = W Ri [X Y Z]^t        (15)

Denote λ̄i = λ0/λi. Substituting 14 into 15 to remove (X, Y, Z)^t yields

    [ci ri 1]^t = λ̄i W Ri W^{-1} [c0 r0 1]^t        (16)

Let

                      [ Bi11  Bi12  Bi13 ]
    Bi = W Ri W^{-1} = [ Bi21  Bi22  Bi23 ]
                      [ Bi31  Bi32  Bi33 ]

Then we have

    [ci ri 1]^t = λ̄i Bi [c0 r0 1]^t        (17)

This leads to three equations ci

=

λi (Bi11 c0 + Bi12 r0 + Bi13 )

ri

=

λi (Bi21 c0 + Bi22 r0 + Bi23 )

1 =

λi (Bi31 c0 + Bi32 r0 + Bi33 )

Since λi = 1/(Bi31 c0 + Bi32 r0 + Bi33 ), substituting λi to the above equations yields ci (Bi31 c0 + Bi32 r0 + Bi33 )

=

(Bi11 c0 + Bi12 r0 + Bi13 )

ri (Bi31 c0 + Bi32 r0 + Bi33 )

=

(Bi21 c0 + Bi22 r0 + Bi23 )

Given N points, we can setup a system of linear equations, through which we can solve B as the null vector of the measurement matrix up to a scale factor. Alternatively, we can divide the both sides of the above 2 80

(18)

equations by Bi33 and they can be rewritten in matrix format     −ci −c0 −r0 −1 0 0 0 c i c 0 c i r0   bi =   −ri 0 0 0 −c0 −r0 −1 ri c0 ri r0 (19) where bi = (Bi11 Bi12 Bi13 Bi21 Bi22 Bi23 Bi31 Bi32 )T /Bi33 . If we know at least 4 points in two images (such as i = 0, 1), we can solve for bi up to a scale factor. The scale factor can be solved using the fact that the determinant of WRi W−1 is unit. If Ri is known, then we can solve for W using the equation Bi W = W Ri . In this case, one rotation, i.e., a total of two images, is enough to solve for W . To solve for W with a unknown Ri . From Bi = WRi W−1 , we have −T = WT B−1 Ri = W−1 Bi W R−T i W i 81

Since R = R−T , therefore we have T (WWT )B−T = B (WW ) i i

Assume C = WWT , we have  sx f 0  C =  sy f  0 0 0  s2x f 2 + c20  =  c 0 r0  c0

c0



sx f

  r0   0 1 c0 c 0 r0 s2y f 2

+

r0

0 sy f

c0 r02

(20)

r0 

 r0   1

0



 0   1 (21)

equation 20 can be rewritten Bi CBT i −C=0 82

(22)

Since C is symmetric, Eq. 22 provides only six equations. To solve for C, it is necessary to use two Bi , i.e., two rotations, which leads to three images. Given two or more Bi , C can be solved using equation 22 up to a scale factor. The scale factor can subsequently be resolved using the fact that the last element of W is 1. Given C, from equation C = W W t , we can obtain W via SVD and RQ decomposition as detailed below. Since C is symmetric, from SVD, we have: C = U DU t where D is a diagonal matrix and U an orthonormal matrix, whose columns are the eigenvectors of C. Since D is diagonal, we may write D = EE t , where E is another diagonal matrix. As a 83

result, C =VVt where V = U E. Note the above SVD can be directly obtained via Choleski factorization. Since V is not upper triangular yet, we can perform a RQ decomposition on V , this yields to V = BQ, where B is an upper triangular matrix and Q is just an orthnormal matrix. Hence, W=B. We can also prove that the solution is unique. The solution requires C be positive definite.


Camera Self-Calibration With Only Translation

As in the previous case, the camera frame initially coincides with the object frame. We then translate the camera frame by $T_i$. For the image points in the reference frame, we have

$$
\lambda_0 \begin{pmatrix} c_0 \\ r_0 \\ 1 \end{pmatrix}
= W [I \;\; 0] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
= W \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\tag{23}
$$








and, for the frame translated by $T_i$,

$$
\lambda_i \begin{pmatrix} c_i \\ r_i \\ 1 \end{pmatrix}
= W [I \;\; T_i] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
= W \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} + W T_i
\tag{24}
$$

From the above equations, we have

$$
\lambda_i \begin{pmatrix} c_i \\ r_i \\ 1 \end{pmatrix}
= \lambda_0 \begin{pmatrix} c_0 \\ r_0 \\ 1 \end{pmatrix} + W T_i
\tag{25}
$$

Equation 25 can be rewritten as

$$
\lambda_i \begin{pmatrix} c_i \\ r_i \\ 1 \end{pmatrix}
= \lambda_0 \begin{pmatrix} c_0 \\ r_0 \\ 1 \end{pmatrix}
+ \begin{pmatrix} s_x f & 0 & c_0 \\ 0 & s_y f & r_0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} t_{ix} \\ t_{iy} \\ t_{iz} \end{pmatrix}
\tag{26}
$$

This gives three equations, assuming $T_i$ is known (there is no linear solution if $T_i$ is unknown):

$$
\begin{aligned}
\lambda_i c_i &= \lambda_0 c_0 + s_x f t_{ix} + c_0 t_{iz} \\
\lambda_i r_i &= \lambda_0 r_0 + s_y f t_{iy} + r_0 t_{iz} \\
\lambda_i &= \lambda_0 + t_{iz}
\end{aligned}
\tag{27}
$$

Note the notation clash here: the $(c_0, r_0)$ multiplied by $\lambda_0$ are the measured image coordinates of the point in the reference view, while the $(c_0, r_0)$ multiplying $t_{iz}$ are the principal-point entries of $W$.

Substituting $\lambda_i = \lambda_0 + t_{iz}$ into the first two equations yields

$$
\begin{aligned}
\lambda_0 c_i + t_{iz} c_i &= \lambda_0 c_0 + s_x f t_{ix} + c_0 t_{iz} \\
\lambda_0 r_i + t_{iz} r_i &= \lambda_0 r_0 + s_y f t_{iy} + r_0 t_{iz}
\end{aligned}
\tag{28}
$$

which can be rewritten in matrix form as

$$
\begin{pmatrix}
t_{ix} & 0 & t_{iz} & 0 & c_0 - c_i \\
0 & t_{iy} & 0 & t_{iz} & r_0 - r_i
\end{pmatrix}
W' =
\begin{pmatrix} t_{iz} c_i \\ t_{iz} r_i \end{pmatrix}
\tag{29}
$$

where $W' = (s_x f \;\; s_y f \;\; c_0 \;\; r_0 \;\; \lambda_0)^T$. If we know at least 3 points in three images (produced by two known translations), we can solve for $W'$ and then obtain $W$. Note that the matrix is rank deficient (rank = 3) if only one translation is used, no matter how many points are used, since the first $2 \times 4$ sub-matrix is the same for all points.
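The stacked system (29) can be sketched on synthetic data. Since $\lambda_0$ is specific to each point, the sketch below carries one $\lambda$ column per point while the four intrinsic unknowns are shared; the intrinsics, points, and translations are illustrative assumptions, not values from the notes.

```python
import numpy as np

# Illustrative intrinsics, 3D points, and two known camera translations
W = np.array([[800.0, 0.0, 320.0],
              [0.0, 750.0, 240.0],
              [0.0,   0.0,   1.0]])
pts = np.array([[0.2, -0.1, 2.0],
                [-0.3, 0.2, 2.5],
                [0.1, 0.3, 3.0]])
Ts = [np.array([0.1, 0.05, 0.3]),
      np.array([-0.2, 0.15, 0.5])]

def project(W, X):
    p = W @ X
    return p[:2] / p[2]

# Unknowns: (sx*f, sy*f, c0, r0) plus one projective depth lambda per point
n = len(pts)
A, b = [], []
for j, X in enumerate(pts):
    cm, rm = project(W, X)                 # measured point in the reference image
    for T in Ts:
        ci, ri = project(W, X + T)         # measured point after translating by T
        tx, ty, tz = T
        row1 = np.zeros(4 + n); row2 = np.zeros(4 + n)
        row1[[0, 2, 4 + j]] = tx, tz, cm - ci
        row2[[1, 3, 4 + j]] = ty, tz, rm - ri
        A += [row1, row2]; b += [tz * ci, tz * ri]

sol, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
```

With exact data, the first four entries of `sol` recover $s_x f$, $s_y f$, $c_0$, $r_0$, and the remaining entries recover each point's depth $\lambda_0 = Z$.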


Camera Self-Calibration Summary

Camera self-calibration can be carried out without using 3D data. It requires the camera to move to produce different images. The camera motions can be:

• general motion, involving both rotation and translation; the solution is unstable and less robust

• pure rotation: requires a minimum of two rotations (i.e., three images) if the rotations are unknown; if the rotation is known, one rotation (two images) is enough

• pure translation: requires a minimum of two translations, and they must be known to have a linear solution

• degenerate motions may occur; they cannot be used for self-calibration

Pose Estimation

The goal of pose estimation is to estimate the relative orientation and position between the object frame and the camera frame, i.e., to determine R and T, the extrinsic camera parameters. Pose estimation is an area with many applications, including HCI, robotics, and photogrammetry.


Linear Pose Estimation

For pose estimation, it is necessary to know 3D features and their 2D image projections. They are then used to solve for R and T. Assume W is known; then from the projection equation

$$
\lambda W^{-1} \begin{pmatrix} c \\ r \\ 1 \end{pmatrix}
= [R, \; T] \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
$$

Given at least 6 sets of 2D/3D points, we can solve for R and T linearly, in a similar fashion to linear camera calibration. The solution, however, does not impose the constraint that R be orthonormal, i.e., $R^{-1} = R^T$.

We can perform a postprocessing step on the estimated R to find another matrix $\hat{R}$ that is orthonormal and closest to R; see the previous pages for details. Alternatively, we can still obtain a linear solution if we impose one constraint, namely $\|r_3\| = 1$, during optimization. If W is unknown, we can follow the same procedure as camera calibration: first solve for the projection matrix P, then extract R and T from P.
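The linear solve plus the orthonormalization step can be sketched as follows, with normalized image coordinates standing in for $\lambda W^{-1}(c, r, 1)^T$ (i.e., a calibrated camera); the pose and point values are illustrative assumptions. The homogeneous system is solved via its SVD null vector, the scale is fixed by $\|r_3\| = 1$, and the closest orthonormal matrix is obtained from the SVD of the estimated R.

```python
import numpy as np

rng = np.random.default_rng(0)

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Illustrative ground-truth pose
R_true = rot_x(0.3) @ rot_z(-0.2)
T_true = np.array([0.1, -0.2, 4.0])

X = rng.uniform(-1, 1, (8, 3))            # 3D object points (at least 6 needed)
Xc = X @ R_true.T + T_true
xn = Xc[:, :2] / Xc[:, 2:3]               # normalized coords, i.e. W^-1 applied

# Homogeneous system A p = 0 with p = (r1, r2, r3, T)
A = []
for (x, y), P in zip(xn, X):
    A.append(np.concatenate([P, np.zeros(3), -x * P, [1, 0, -x]]))
    A.append(np.concatenate([np.zeros(3), P, -y * P, [0, 1, -y]]))
p = np.linalg.svd(np.array(A))[2][-1]     # right singular vector of smallest sigma

R = p[:9].reshape(3, 3)
T = p[9:]
s = np.linalg.norm(R[2])                  # impose ||r3|| = 1 to fix the scale
R, T = R / s, T / s
if R[2] @ X[0] + T[2] < 0:                # points must lie in front of the camera
    R, T = -R, -T

# Postprocess: closest orthonormal matrix to R via SVD
U, _, Vt = np.linalg.svd(R)
R_hat = U @ Vt
```

With exact data the null vector is recovered to numerical precision, so the orthonormalized R and the translation match the ground truth.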


Non-linear Pose Estimation (cont'd)

Alternatively, we can set pose estimation up as a non-linear optimization problem with the constraint $R^{-1} = R^T$ imposed.


Pose Estimation Under Weak Perspective Projection

For weak perspective projection, we have

$$
\begin{pmatrix} \bar{c} \\ \bar{r} \end{pmatrix}
= M_{2 \times 3} \begin{pmatrix} \bar{x} \\ \bar{y} \\ \bar{z} \end{pmatrix}
$$

Given 2D/3D relative coordinates $(\bar{c}_i, \bar{r}_i)$ and $(\bar{x}_i, \bar{y}_i, \bar{z}_i)$, the goal is to solve for the matrix M. A minimum of 3 points is enough to uniquely solve for M.


Pose Estimation Under Weak Perspective Projection

Given M and the parameterization for M introduced in the previous chapter, we have

$$
\frac{f s_x}{\bar{z}_c} = |m_1|, \qquad \frac{f s_y}{\bar{z}_c} = |m_2|
$$

where $m_1$ and $m_2$ are the first and second rows of the M matrix. Then

$$
r_1 = \frac{m_1}{|m_1|}, \qquad r_2 = \frac{m_2}{|m_2|}, \qquad r_3 = r_1 \times r_2
$$

Pose Estimation Under Weak Perspective Projection

Given R, and assuming W is known (i.e., a calibrated camera), T can be solved using

$$
\lambda W^{-1} \begin{pmatrix} c \\ r \\ 1 \end{pmatrix}
= R \begin{pmatrix} x \\ y \\ z \end{pmatrix} + T
$$

If W is unknown,

$$
\begin{pmatrix} c \\ r \end{pmatrix}
= M_{2 \times 3} \begin{pmatrix} x \\ y \\ z \end{pmatrix}
+ \begin{pmatrix} v_x \\ v_y \end{pmatrix}
$$

where $v_x = \frac{f s_x}{\bar{z}_c} t_x + c_0$ and $v_y = \frac{f s_y}{\bar{z}_c} t_y + r_0$ (in the projection-matrix notation of the previous chapter, $v_x = p_{14}/p_{34} + c_0$ and $v_y = p_{24}/p_{34} + r_0$). If we assume $(c_0, r_0)$ is at the image center, we can solve for the translation components $t_x$ and $t_y$ up to a scale factor; $t_z$ can never be recovered. In summary, with weak perspective projection we need a minimum of 3 points, and we can solve for R fully but for T only up to a scale factor.
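For the calibrated case above, T (together with the per-point depth $\lambda$) follows from a small linear system: each point contributes the three equations $\lambda_i W^{-1}(c_i, r_i, 1)^T = R X_i + T$, linear in $(\lambda_1, \dots, \lambda_n, T)$. A minimal sketch with illustrative values (the intrinsics, rotation, and points are assumptions, not from the notes):

```python
import numpy as np

# Illustrative known intrinsics and rotation; T is the unknown to recover
W = np.array([[800.0, 0.0, 320.0],
              [0.0, 750.0, 240.0],
              [0.0,   0.0,   1.0]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

R = rot_x(0.3)
T_true = np.array([0.2, -0.1, 5.0])

X = np.array([[0.5, 0.2, 1.0], [-0.4, 0.3, 0.5], [0.1, -0.2, 0.8]])
Xc = X @ R.T + T_true                      # camera-frame points
ph = Xc @ W.T                              # homogeneous image points
img = ph[:, :2] / ph[:, 2:3]

# lambda_i * W^-1 (c, r, 1)^T - T = R X_i: linear in (lambda_1..n, T)
n = len(X)
Winv = np.linalg.inv(W)
A = np.zeros((3 * n, n + 3))
b = np.zeros(3 * n)
for i in range(n):
    u = Winv @ np.append(img[i], 1.0)      # viewing direction of point i
    A[3 * i:3 * i + 3, i] = u
    A[3 * i:3 * i + 3, n:] = -np.eye(3)
    b[3 * i:3 * i + 3] = R @ X[i]

sol, *_ = np.linalg.lstsq(A, b, rcond=None)
T = sol[n:]
```

Two points already give six equations in five unknowns; with exact data the recovered depths equal the points' camera-frame z coordinates.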
