Variance Estimation NTA Mexico

Review of selected NTA profiles for Mexico 2004 and variance estimation IMG

February 25, 2010

Outline

- Variance estimation for age profiles
- Friedman's Super Smoother (supsmu)
- Private education consumption
- Private asset-based reallocations
- Labor income
- Private transfers: remittances and inter-household inflows

Variance estimation for age profiles

- Age profile estimation in NTA:

$$\bar{y}_a = \frac{y_a}{w_a} = \frac{\sum_{i=1}^{n_a} w_{ia}\, y_{ia}}{\sum_{i=1}^{n_a} w_{ia}} \tag{1}$$

  where $\bar{y}_a$ is the mean value of variable $y$ (e.g. education) for individuals aged $a$, $w_{ia}$ is the sampling weight of individual $i$ aged $a$, and $n_a$ is the sample size of age group $a$.

- Complex design survey (CDS): stratified multi-stage cluster sampling
  - Survey variables in CDS: 1) strata, 2) primary sampling units (PSU), 3) weights
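As an illustration of equation (1), the following minimal sketch computes a weighted mean by age. The record layout (age, weight, value) and the toy numbers are assumptions made for the example, not the ENIGH file structure.

```python
from collections import defaultdict


def weighted_age_profile(records):
    """Return {age: weighted mean of y} from (age, weight, y) tuples.

    Implements y_bar_a = sum_i(w_ia * y_ia) / sum_i(w_ia) for each age a.
    """
    num = defaultdict(float)  # sum of w_ia * y_ia per age
    den = defaultdict(float)  # sum of w_ia per age
    for age, w, y in records:
        num[age] += w * y
        den[age] += w
    return {a: num[a] / den[a] for a in sorted(den) if den[a] > 0}


# Hypothetical toy data: (age, sampling weight, education spending)
sample = [(10, 2.0, 100.0), (10, 1.0, 400.0), (11, 3.0, 50.0)]
profile = weighted_age_profile(sample)
```

With these toy records the two observations at age 10 are combined with weights 2 and 1, so the heavier-weighted observation dominates the mean.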

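Variance estimation for age profiles

- Variance estimation for Simple Random Samples (SRS): $\mathrm{Var}\left(\frac{y}{w}\right) = \frac{s^2}{n}$
- Variance estimation for CDS: $\mathrm{Var}\left(\frac{y}{w}\right) \neq \frac{\mathrm{Var}(y)}{\mathrm{Var}(w)}$
- Taylor series linearization method (TSL): define $r = y/w$; then

$$\mathrm{var}(\bar{y}_a) = \frac{1}{w^2}\left[\mathrm{var}(y) + r^2 \cdot \mathrm{var}(w) - 2 \cdot r \cdot \mathrm{cov}(y, w)\right]$$

  where:

$$
\begin{aligned}
\mathrm{var}(y) &= \sum_{h=1}^{H} \frac{n_h}{n_h - 1}\left[\sum_{\alpha=1}^{n_h} y_{h\alpha}^2 - \frac{y_h^2}{n_h}\right] \\
\mathrm{var}(w) &= \sum_{h=1}^{H} \frac{n_h}{n_h - 1}\left[\sum_{\alpha=1}^{n_h} w_{h\alpha}^2 - \frac{w_h^2}{n_h}\right] \\
\mathrm{cov}(y, w) &= \sum_{h=1}^{H} \frac{n_h}{n_h - 1}\left[\sum_{\alpha=1}^{n_h} y_{h\alpha} w_{h\alpha} - \frac{y_h w_h}{n_h}\right]
\end{aligned}
\tag{2}
$$

  where $H$ is the number of strata, $n_h$ is the number of individuals in stratum $h$, and $y_h = \sum_{\alpha} y_{h\alpha}$ and $w_h = \sum_{\alpha} w_{h\alpha}$ are the stratum totals.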
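The TSL formula in (2) can be sketched in a few lines. The input layout is an assumption for illustration: each stratum maps to a list of (y, w) pairs, one per unit α, where y is the unit's weighted total of the variable and w its weight total.

```python
def tsl_ratio_variance(strata):
    """Return (r, var_r) for the ratio r = y / w using equation (2).

    strata: {stratum_id: [(y_ha, w_ha), ...]} with at least two units
    per stratum (hypothetical layout, chosen for this sketch).
    """
    y_tot = sum(y for units in strata.values() for y, _ in units)
    w_tot = sum(w for units in strata.values() for _, w in units)
    r = y_tot / w_tot

    var_y = var_w = cov_yw = 0.0
    for units in strata.values():
        n_h = len(units)
        if n_h < 2:
            raise ValueError("each stratum needs at least two units")
        y_h = sum(y for y, _ in units)  # stratum total of y
        w_h = sum(w for _, w in units)  # stratum total of w
        k = n_h / (n_h - 1)
        var_y += k * (sum(y * y for y, _ in units) - y_h ** 2 / n_h)
        var_w += k * (sum(w * w for _, w in units) - w_h ** 2 / n_h)
        cov_yw += k * (sum(y * w for y, w in units) - y_h * w_h / n_h)

    var_r = (var_y + r * r * var_w - 2.0 * r * cov_yw) / w_tot ** 2
    return r, var_r
```

Two quick checks on the logic: with equal weights in a single stratum the result reduces to the SRS variance of the mean, and when y is exactly proportional to w the ratio has zero variance.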

Mexican survey:

- Income and expenditure survey (ENIGH)
- Survey design: multi-stage stratified cluster survey:
  - Stratified: by marginalization level (CONAPO) and geographic area (urban/rural). I combined the two stratification variables to obtain a total of 16 joint strata.
  - Primary sampling units: not explicitly defined in the survey, but constructed from the geographic information it reports as SECUs (sampling error computation units), a method widely used for variance estimation with survey data.
  - Sampling weights: reported in the survey. A new weight was constructed to adjust the survey population to the actual population.
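The slides do not say how the joint strata were labelled or how the new weight was built. The sketch below shows one plausible reading, purely as an assumption: a joint stratum label formed from the two stratification variables, and a uniform rescaling of the weights so that they sum to a known population total. The function names and the single scaling factor are hypothetical.

```python
def joined_stratum(marginalization_level, area):
    """Combine the two stratification variables into one stratum label."""
    return f"{marginalization_level}-{area}"


def rescale_weights(weights, population_total):
    """Scale sampling weights so that they sum to population_total.

    Assumption: one uniform factor for the whole sample; the actual
    adjustment used for ENIGH may have been done by age or sex group.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    factor = population_total / total
    return [w * factor for w in weights]
```

A uniform factor leaves weighted means such as equation (1) unchanged; it only matters for weighted totals.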

Private education consumption-CFE

Lifecycle deficit: Mexico 2000-2004 (Santiago-oct 99)

Education profile (CFE): methods of estimation

1. Direct method: in 2004, around 74% of total education expenditure is reported at the individual level (in 2005, around 68%). Using only this information, the age profile is obtained by tabulating (computing the mean of) education consumption by age. The remaining information is ignored.

2. Regression method: NTA methodology.

Education profile 2004 (CFE): without macro control

[Figure: education profile by age. Series: direct method, regression method. Axes: age (0 to 40) vs. Mexican pesos.]

Education profile 2004 (CFE): with macro control

[Figure: education profile by age. Series: direct method, regression method. Axes: age (0 to 40) vs. Mexican pesos.]

CFE-2004: direct method (survey)

[Figure: 95% confidence interval, direct method. Series: CDS-l, CFE, CDS-u, SRS-l, SRS-u. Axes: age (0 to 40) vs. Mexican pesos.]

CFE-2004: regression method

[Figure: 95% confidence interval, regression method. Series: CDS-l, CFE, CDS-u, SRS-l, SRS-u. Axes: age (0 to 40) vs. Mexican pesos.]

CFE-2004: coefficient of variation

[Figure: coefficient of variation $se(\bar{y}_a)/\bar{y}_a$ by age (0 to 40), vertical axis from 0% to 100%.]
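The quantities plotted in the confidence-interval and coefficient-of-variation figures follow directly from an age-specific mean and its standard error. A minimal sketch, assuming a normal approximation with z = 1.96 for the 95% interval (the slides do not state the critical value used):

```python
def ci_and_cv(mean, se, z=1.96):
    """Return (lower, upper, cv) for one age-specific estimate.

    lower/upper: normal-approximation confidence bounds.
    cv: coefficient of variation se / mean (NaN when the mean is zero).
    """
    lower = mean - z * se
    upper = mean + z * se
    cv = se / mean if mean != 0 else float("nan")
    return lower, upper, cv


# Hypothetical values: mean of 1000 pesos with a standard error of 100
lower, upper, cv = ci_and_cv(1000.0, 100.0)
```

Because the interval is symmetric, the lower bound can fall below zero for ages with a small mean and a large standard error.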

CFE-2005: direct method (SRS standard error)

[Figure: 95% confidence interval, direct method. Axes: age (0 to 40) vs. Mexican pesos.]

CFE-2005: regression method

Does the regression method reduce the standard errors?

[Figure: 95% confidence interval, regression method. Axes: age (0 to 40) vs. Mexican pesos.]

CFE-2005: coefficient of variation

[Figure: coefficient of variation $se(\bar{y}_a)/\bar{y}_a$ by age (0 to 40), vertical axis from 0% to 80%.]

Friedman's Super Smoother: supsmu

- Data $(x_1, y_1), \ldots, (x_n, y_n)$:

$$y_i = s(x_i) + r_i, \quad i = 1, \ldots, n \tag{3}$$

- Smoothed value at point $x_i$:

$$s(x_i) = \frac{1}{J} \sum_{j=i-J/2}^{i+J/2} y_j$$

- Expected squared error at point $x_i$, under $E(r_i) = 0$, $\mathrm{Var}(r_i) = \sigma^2$, where $f$ is the underlying smooth function:

$$e^2(x_i \mid J) = \left[f(x_i) - \frac{1}{J} \sum_{j=i-J/2}^{i+J/2} f(x_j)\right]^2 + \frac{1}{J}\sigma^2 \tag{4}$$
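The fixed-span running mean and the bias-plus-variance decomposition in (4) can be sketched as follows. Two choices here are assumptions of the sketch rather than part of the slides: windows are truncated at the ends of the series, and the divisor is the actual number of points in the window rather than the nominal J.

```python
def window_bounds(i, n, J):
    """Index range [lo, hi) of the window of about J points centered at i."""
    half = J // 2
    return max(0, i - half), min(n, i + half + 1)


def running_mean(y, J):
    """Smooth y with a centered running mean of span J."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = window_bounds(i, n, J)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def expected_squared_error(f, i, J, sigma2):
    """Squared bias plus variance of the running mean at index i.

    f: values of the underlying curve at the design points.
    sigma2: common variance of the residuals r_i.
    """
    lo, hi = window_bounds(i, len(f), J)
    m = hi - lo
    bias = f[i] - sum(f[lo:hi]) / m
    return bias ** 2 + sigma2 / m
```

For a linear f the interior bias is zero, so the error is purely the variance term, which shrinks as the span grows; curvature in f is what makes large spans costly.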

supsmu: Variable span smoother

- Span selection (e.g. 0.1, 0.2, 0.9, ...) defines the size of the neighborhood:
  - Tradeoff: a large span gives small variance but large bias, and vice versa
  - $J = \text{span} \times n$; e.g. $J = 0.2n$
- Choice of span:
  - Optimal selection: cross-validation, $J_{cv}$, which minimizes $e^2(x_i \mid J)$
  - Tone control (bass), $J_w$: people find smoother curves more visually pleasing (sacrificing accuracy for an estimate that is less rough). This method enhances the low-frequency (bass) component of the smoother output. Then:

$$J(x_i) = J_{cv}(x_i) + \left(J_w - J_{cv}(x_i)\right) R_i^{\,10-\alpha}, \quad 0 \le \alpha \le 10,$$

$$R_i = \left[\frac{\hat{e}\left(x_i \mid J_{cv}(x_i)\right)}{\hat{e}\left(x_i \mid J_w\right)}\right] \tag{5}$$
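The bass adjustment in (5) is a one-line interpolation between the cross-validated span and the larger span J_w. The sketch below only evaluates that formula; the error estimates are taken as given inputs, and the function name is hypothetical.

```python
def bass_adjusted_span(j_cv, j_w, e_cv, e_w, alpha):
    """Evaluate J = J_cv + (J_w - J_cv) * R ** (10 - alpha), with R = e_cv / e_w.

    j_cv: cross-validated span at the point; j_w: the large span.
    e_cv, e_w: estimated errors at the point for each span (e_w > 0).
    alpha: bass setting between 0 and 10.
    """
    if not 0 <= alpha <= 10:
        raise ValueError("alpha must lie between 0 and 10")
    ratio = e_cv / e_w
    return j_cv + (j_w - j_cv) * ratio ** (10 - alpha)
```

At alpha = 10 the exponent is zero, so the span is J_w everywhere; at alpha = 0 the ratio is raised to the tenth power, so the span stays close to J_cv unless the two error estimates are nearly equal.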

supsmu: R code

supsmu(x, y, wt, span = "cv", periodic = FALSE, bass = 0)

Arguments:
- x: x values for smoothing
- y: y values for smoothing
- wt: case weights, by default all equal
- span: the fraction of the observations in the span of the running lines smoother, or "cv" to choose this by leave-one-out cross-validation
- periodic: if TRUE, the x values are assumed to be in [0, 1] and of period 1
- bass: controls the smoothness of the fitted curve. Values of up to 10 indicate increasing smoothness.
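To make the idea of leave-one-out cross-validation of the span concrete, here is a simplified Python analogue. It is not R's supsmu: it uses a running mean instead of running lines and picks one global span, whereas supsmu chooses among its spans locally for each prediction. The candidate spans and function names are assumptions of the sketch.

```python
def loo_cv_score(y, J):
    """Mean squared leave-one-out error of a running mean with span J."""
    half = max(1, J // 2)
    n = len(y)
    total = 0.0
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        neighbours = y[lo:i] + y[i + 1:hi]  # window without point i
        prediction = sum(neighbours) / len(neighbours)
        total += (y[i] - prediction) ** 2
    return total / n


def choose_span(y, spans=(0.05, 0.2, 0.5)):
    """Return (best_span, scores) over candidate span fractions."""
    n = len(y)
    scores = {s: loo_cv_score(y, max(2, round(s * n))) for s in spans}
    best = min(scores, key=scores.get)
    return best, scores
```

On noise-free linear data the smallest span wins, because only the truncated end windows are biased and that bias grows with the span; with noisy data larger spans become competitive.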

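supsmu: NTA framework

- Data $(0, \bar{y}_0), \ldots, (\omega, \bar{y}_\omega)$:

$$\bar{y}_a = s(a) + r_a, \quad a = 0, \ldots, \omega \tag{6}$$

- Smoothed value at age $a$:

$$s(a) = \frac{1}{J} \sum_{j=a-J/2}^{a+J/2} \bar{y}_j$$

- Expected squared error at age $a$, under $E(r_a) = 0$, $\mathrm{Var}(r_a) = \sigma_a^2 = \mathrm{Var}_{cds}(\bar{y}_a)$:

$$e^2(a \mid J) = \left[f(a) - \frac{1}{J} \sum_{j=a-J/2}^{a+J/2} f(j)\right]^2 + \frac{1}{J^2} \sum_{j=a-J/2}^{a+J/2} \mathrm{Var}_{cds}(\bar{y}_j) \tag{7}$$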
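The variance term in (7) suggests a simple way to attach an interval to a smoothed age profile: average the age-specific means and sum their CDS variances over the window, divided by the squared window size. The sketch assumes the age-specific estimates are independent, as the formula does by omitting covariance terms, and truncates windows at the youngest and oldest ages.

```python
def smooth_with_variance(means, variances, J):
    """Running-mean smooth of an age profile with propagated variance.

    means[a], variances[a]: age-specific mean and its CDS variance.
    Returns (smoothed_means, smoothed_variances), where each smoothed
    variance is sum(variances in window) / window_size ** 2.
    """
    half = J // 2
    n = len(means)
    smoothed, smoothed_var = [], []
    for a in range(n):
        lo, hi = max(0, a - half), min(n, a + half + 1)
        m = hi - lo
        smoothed.append(sum(means[lo:hi]) / m)
        smoothed_var.append(sum(variances[lo:hi]) / m ** 2)
    return smoothed, smoothed_var
```

With equal variances across ages, the smoothed variance is the single-age variance divided by the window size, which is why wider spans give narrower intervals.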

Example-supsmu: remittances (span=0.05)

[Figure: 95% confidence interval. Series: cds-l, rem, cds-u, ci-l: span=0.05, ci-u: span=0.05. Axes: age (0 to 90) vs. Mexican pesos.]

Example-supsmu: remittances (span=0.1)

[Figure: 95% confidence interval. Series: cds-l, rem, cds-u, ci-l: span=0.1, ci-u: span=0.1. Axes: age (0 to 90) vs. Mexican pesos.]

Example-supsmu: remittances (span=0.3)

[Figure: 95% confidence interval. Series: cds-l, rem, cds-u, ci-l: span=0.3, ci-u: span=0.3. Axes: age (0 to 90) vs. Mexican pesos.]

Private asset-based reallocations

ABR: per capita interest expense (pcie)

- Information used for the age allocation:
  - interest payments
  - credit card payments (which include interest). I assume that 40% of each payment corresponds to interest, based on the interest rates usually applied in Mexico.
  - mortgage payments. I assume that 30% of each payment corresponds to interest.
- This information is reported at the household level.
- The total amount per household is assigned to the household head.
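The allocation rule above can be sketched as follows. The 40% and 30% shares come from the slide; the dictionary keys and function names are hypothetical, and sampling weights are left out to keep the example short.

```python
from collections import defaultdict

CREDIT_CARD_INTEREST_SHARE = 0.40  # share assumed in the slides
MORTGAGE_INTEREST_SHARE = 0.30     # share assumed in the slides


def household_interest_expense(interest, credit_card, mortgage):
    """Interest expense of one household from its three payment items."""
    return (interest
            + CREDIT_CARD_INTEREST_SHARE * credit_card
            + MORTGAGE_INTEREST_SHARE * mortgage)


def interest_by_head_age(households):
    """Assign each household's interest expense to the age of its head.

    households: iterable of dicts with the hypothetical keys
    'head_age', 'interest', 'credit_card' and 'mortgage'.
    Returns {head_age: unweighted total interest expense}.
    """
    totals = defaultdict(float)
    for hh in households:
        totals[hh["head_age"]] += household_interest_expense(
            hh["interest"], hh["credit_card"], hh["mortgage"])
    return dict(totals)
```

Assigning everything to the head means ages with few household heads, such as children, receive no interest expense by construction.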

ABR: per capita interest expense (pcie)

[Figure: 95% confidence interval. Series: CDS-l, pcie, CDS-u, SRS-l, SRS-u, pcie-smooth. Axes: age (0 to 90) vs. Mexican pesos.]

ABR: coefficient of variation (pcie)

[Figure: cv of pcie by age (0 to 90), vertical axis from 0% to 100%.]

ABR: per capita property income (pcpi)

- Information used for the age allocation:
  - renting out: houses, land, buildings, etc. (within the country and abroad)
  - interest received: from savings accounts, loans to other persons, short-term bank investments
  - yield: from dividends, shares
  - copyrights, patents
  - other property income
  - divestment: savings, "tandas" (informal, popular borrowing among households, neighbors, etc.)
  - sale of: gold, precious metals, jewelry, art, copyrights, bonds, stocks, houses, apartments, land, electronics. (I suspect that these items should not be included here, since they represent capital income?)
- This information is reported at the individual level.
- The total amount per household is assigned to the household head.

ABR: per capita property income (pcpi)

[Figure: 95% confidence interval. Series: CDS-l, pcpi, CDS-u, SRS-l, SRS-u, pcie-smooth. Axes: age (0 to 90) vs. Mexican pesos.]

ABR: coefficient of variation (pcpi)

[Figure: cv of pcpi by age (0 to 90), vertical axis from 0% to 100%.]

ABR: per capita familial capital transfers inflows (pcfcti)

- Information used for the age allocation:
  - repayments received on loans made to other households
  - borrowing from other households or institutions (excluding mortgages)
  - bequest, legacy or dowry.

ABR: per capita familial capital transfers inflows (pcfcti)

[Figure: 95% confidence interval. Series: CDS-l, pcfcti, CDS-u, SRS-l, SRS-u, pcie-smooth. Axes: age (0 to 90) vs. Mexican pesos.]

ABR: coefficient of variation (pcfcti)

[Figure: cv of pcfcti by age (0 to 90), vertical axis from 0% to 100%.]

ABR: per capita familial capital transfers outflows (pcfcto)

- Information used for the age allocation:
  - lending to other households
  - repayments of past borrowing from other households or institutions (excluding mortgages)
  - bequest, legacy or dowry.

ABR: per capita familial capital transfers outflows (pcfcto)

[Figure: 95% confidence interval. Series: CDS-l, pcfcto, CDS-u, SRS-l, SRS-u, pcie-smooth. Axes: age (0 to 90) vs. Mexican pesos.]

ABR: coefficient of variation (pcfcto)

[Figure: cv of pcfcto by age (0 to 90), vertical axis from 0% to 100%.]

ABR: housing (pcosh)

[Figure: 95% confidence interval. Series: CDS-l, pcosh, CDS-u, SRS-l, SRS-u, pcie-smooth. Axes: age (0 to 90) vs. Mexican pesos.]

ABR: coefficient of variation (pcosh)

[Figure: cv of pcosh by age (0 to 90), vertical axis from 0 to 1.]

ABR: per capita mixed income (pcmi)

[Figure: 95% confidence interval. Series: CDS-l, pcmi, CDS-u, SRS-l, SRS-u, pcmi-smooth. Axes: age (0 to 90) vs. Mexican pesos.]

ABR: coefficient of variation (pcmi)

[Figure: cv of pcmi by age (0 to 90), vertical axis from 0 to 1.]

Labor income

YL: earnings (yle)

[Figure: 95% confidence interval. Series: cds-l, yle, cds-u, srs-l, srs-u. Axes: age (0 to 90) vs. Mexican pesos.]

YL: coefficient of variation (yle)

[Figure: cv of yle by age (0 to 90), vertical axis from 0% to 100%.]

YL: entrepreneurial income (yls)

[Figure: 95% confidence interval. Series: cds-l, yls, cds-u, srs-l, srs-u. Axes: age (0 to 90) vs. Mexican pesos.]

YL: imputed self-employed income (ylss)

[Figure: 95% confidence interval. Series: cds-l, ylss, cds-u, srs-l, srs-u. Axes: age (0 to 90) vs. Mexican pesos.]

YL: coefficient of variation (yls)

[Figure: cv of yls by age (0 to 90), vertical axis from 0% to 100%.]

Remittances

Private transfers: remittances (rem)

- Information used for the age allocation: the survey reports income received from the rest of the world, which is used to allocate remittances by age.
- This information is reported at the individual level.
- The total amount per household is assigned to the household head.

Private transfers: remittances (rem)

[Figure: 95% confidence interval. Series: cds-l, rem, cds-u, srs-l, srs-u. Axes: age (0 to 90) vs. Mexican pesos.]

Private transfers: coefficient of variation (rem)

[Figure: cv of remittances by age (0 to 90), vertical axis from 0% to 100%.]

Private transfers: inter-household inflows (interhh-i)

[Figure: 95% confidence interval. Series: cds-l, interhh-i, cds-u, srs-l, srs-u. Axes: age (0 to 90) vs. Mexican pesos.]

Private transfers: coefficient of variation (interhh-i)

[Figure: cv of interhh-i by age (0 to 90), vertical axis from 0% to 100%.]