Review of selected NTA profiles for Mexico 2004 and variance estimation IMG
February 25, 2010
Outline
- Variance estimation for age profiles
- Friedman’s Super Smoother (supsmu)
- Private education consumption
- Private asset-based reallocations
- Labor income
- Private transfers: remittances and inter-household inflows
Variance estimation for age profiles
- Age profile estimation in NTA:

\[
\bar{y}_a = \frac{y_a}{w_a} = \frac{\sum_{i=1}^{n_a} w_{ia}\, y_{ia}}{\sum_{i=1}^{n_a} w_{ia}} \tag{1}
\]

where $\bar{y}_a$ is the mean value of variable $y$ (e.g. education) for individuals aged $a$, $w_{ia}$ is the sampling weight of individual $i$ aged $a$, and $n_a$ is the sample size of age group $a$.
- Complex design survey (CDS): stratified multi-stage cluster sample. Survey variables in a CDS:
  1) strata
  2) primary sampling units (PSU)
  3) weights
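Equation (1) is a weighted mean within each age group. A minimal Python sketch follows; the `(age, y, weight)` record layout and the function name `age_profile` are assumptions made for illustration, not the survey's actual file format.

```python
from collections import defaultdict


def age_profile(records):
    """Weighted mean of y by age: sum(w * y) / sum(w) within each age group.

    `records` is an iterable of (age, y, weight) tuples. This layout is an
    assumption of the sketch, not the ENIGH file structure.
    """
    weighted_total = defaultdict(float)  # y_a in equation (1)
    weight_sum = defaultdict(float)      # w_a in equation (1)
    for age, y, w in records:
        weighted_total[age] += w * y
        weight_sum[age] += w
    return {age: weighted_total[age] / weight_sum[age]
            for age in sorted(weighted_total)}


# Hypothetical toy data: two individuals aged 10, one aged 11.
sample = [(10, 100.0, 2.0), (10, 400.0, 1.0), (11, 50.0, 4.0)]
profile = age_profile(sample)
```

For the toy data, the age-10 mean is (2·100 + 1·400) / 3 = 200, which shows how a heavily weighted individual pulls the profile toward their value.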
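- Variance estimation for Simple Random Samples (SRS):

\[
\mathrm{Var}\left(\frac{y}{w}\right) = \frac{s^2}{n}
\]

- Variance estimation for CDS:

\[
\mathrm{Var}\left(\frac{y}{w}\right) \neq \frac{\mathrm{Var}(y)}{\mathrm{Var}(w)}
\]

- Taylor series linearization method (TSL): let's define $r = y/w$; then:

\[
\mathrm{var}(\bar{y}_a) = \frac{1}{w^2}\left[\mathrm{var}(y) + r^2 \cdot \mathrm{var}(w) - 2 \cdot r \cdot \mathrm{cov}(y, w)\right]
\]

where:

\[
\begin{aligned}
\mathrm{var}(y) &= \sum_{h=1}^{H} \left(\frac{n_h}{n_h - 1}\right) \left[\sum_{\alpha=1}^{n_h} y_{h\alpha}^2 - \frac{y_h^2}{n_h}\right] \\
\mathrm{var}(w) &= \sum_{h=1}^{H} \left(\frac{n_h}{n_h - 1}\right) \left[\sum_{\alpha=1}^{n_h} w_{h\alpha}^2 - \frac{w_h^2}{n_h}\right] \\
\mathrm{cov}(y, w) &= \sum_{h=1}^{H} \left(\frac{n_h}{n_h - 1}\right) \left[\sum_{\alpha=1}^{n_h} y_{h\alpha} w_{h\alpha} - \frac{y_h w_h}{n_h}\right]
\end{aligned}
\tag{2}
\]

where:
- $H$: number of strata
- $n_h$: number of individuals in stratum $h$
- $y_h = \sum_{\alpha} y_{h\alpha}$ and $w_h = \sum_{\alpha} w_{h\alpha}$: stratum totals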
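The linearization in (2) can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code used for the profiles: the input is a list of `(stratum, y, w)` tuples, one per sampled unit $\alpha$, where `y` is that unit's weighted value of the variable and `w` its weight; strata with a single unit are skipped, which is a simplification of this sketch.

```python
from collections import defaultdict


def tsl_variance_of_ratio(units):
    """Taylor-linearized variance of r = y / w under a stratified design.

    `units`: list of (stratum, y_ha, w_ha) tuples (hypothetical layout).
    Returns (r, var_r), following equation (2) term by term.
    """
    by_stratum = defaultdict(list)
    for h, y, w in units:
        by_stratum[h].append((y, w))

    y_total = sum(y for _, y, _ in units)
    w_total = sum(w for _, _, w in units)
    r = y_total / w_total

    var_y = var_w = cov_yw = 0.0
    for rows in by_stratum.values():
        n_h = len(rows)
        if n_h < 2:
            continue  # assumption: single-unit strata contribute nothing
        y_h = sum(y for y, _ in rows)
        w_h = sum(w for _, w in rows)
        k = n_h / (n_h - 1)
        var_y += k * (sum(y * y for y, _ in rows) - y_h ** 2 / n_h)
        var_w += k * (sum(w * w for _, w in rows) - w_h ** 2 / n_h)
        cov_yw += k * (sum(y * w for y, w in rows) - y_h * w_h / n_h)

    var_r = (var_y + r ** 2 * var_w - 2 * r * cov_yw) / w_total ** 2
    return r, var_r


# Hypothetical toy data: two strata with two units each.
toy = [("A", 10.0, 2.0), ("A", 14.0, 2.0), ("B", 9.0, 3.0), ("B", 15.0, 3.0)]
r_toy, var_toy = tsl_variance_of_ratio(toy)
```

A useful sanity check is that the variance is zero when `y` is exactly proportional to `w` in every unit, because the ratio is then the same in every possible sample.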
Mexican survey:
- Income and expenditure survey (ENIGH)
- Survey design: multi-stage stratified cluster survey:
  - Stratified: by marginalization level (CONAPO) and geographic area (urban/rural). I joined the two stratification variables to obtain a total of 16 joined strata.
  - Primary sampling units: not explicitly defined, but constructed from the geographic information reported in the survey as SECUs (sampling error computation units), a method widely used for variance estimation with survey data.
  - Sampling weights: reported in the survey. A new weight was constructed to adjust the survey population to the actual population.
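The two preparation steps described above (joining the stratification variables and adjusting the weights) might look like the sketch below. The slides do not say how the new weight was built, so the proportional rescaling here is purely an assumption, and both function names are hypothetical.

```python
def joined_stratum(marginalization_level, area):
    """Combine the two stratification variables into a single stratum label."""
    return f"{marginalization_level}-{area}"


def rescale_weights(weights, population_total):
    """Scale weights proportionally so they sum to `population_total`.

    Assumption of this sketch: a single common factor is applied; the
    actual adjustment used for the profiles may differ (e.g. by age group).
    """
    factor = population_total / sum(weights)
    return [w * factor for w in weights]


label = joined_stratum(3, "rural")
adjusted = rescale_weights([1.0, 3.0], 8.0)
```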
Private education consumption-CFE
Lifecycle deficit: Mexico 2000-2004 (Santiago-oct 99)
Education profile (CFE): methods of estimation
1. Direct method: in 2004, around 74% of total education expenditure is reported at the individual level (around 68% in 2005). Using only this information, the age profile is obtained by tabulating education consumption by age, i.e. computing the mean at each age. The remaining information is ignored.
2. Regression method: NTA methodology.
[Figure: Education profile 2004 (CFE), without macro control. Series: direct method, regression method. Axes: age (0 to 40) vs. Mexican pesos.]
[Figure: Education profile 2004 (CFE), with macro control. Series: direct method, regression method. Axes: age (0 to 40) vs. Mexican pesos.]
[Figure: CFE-2004, direct method (survey), with 95% confidence interval. Series: CFE, CDS-l, CDS-u, SRS-l, SRS-u. Axes: age (0 to 40) vs. Mexican pesos.]
[Figure: CFE-2004, regression method, with 95% confidence interval. Series: CFE, CDS-l, CDS-u, SRS-l, SRS-u. Axes: age (0 to 40) vs. Mexican pesos.]
[Figure: CFE-2004, coefficient of variation $se(\bar{y}_a)/\bar{y}_a$ by age (0 to 40), on a 0% to 100% scale.]
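The two quantities plotted in these figures, the 95% confidence interval and the coefficient of variation, follow directly from an age-specific mean and its standard error. A minimal sketch, assuming a normal approximation with z = 1.96 (the function name and the handling of a zero mean are choices of this sketch):

```python
def ci_and_cv(mean, se, z=1.96):
    """Return (lower, upper, cv) for an age-specific mean and standard error.

    cv = se / mean; it is undefined for a zero mean, so NaN is returned
    in that case (a choice made for this sketch).
    """
    lower = mean - z * se
    upper = mean + z * se
    cv = se / mean if mean != 0 else float("nan")
    return lower, upper, cv


# Hypothetical values: a mean of 1000 pesos with a standard error of 100.
lower, upper, cv = ci_and_cv(1000.0, 100.0)
```

The same function applies whether `se` comes from the SRS formula or from the CDS linearization; only the input standard error changes.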
[Figure: CFE-2005, direct method (SRS standard error), with 95% confidence interval. Axes: age (0 to 40) vs. Mexican pesos.]
[Figure: CFE-2005, regression method, with 95% confidence interval. Axes: age (0 to 40) vs. Mexican pesos.]
Does the regression method reduce the standard errors?
[Figure: CFE-2005, coefficient of variation $se(\bar{y}_a)/\bar{y}_a$ by age (0 to 40), on a 0% to 80% scale.]
Friedman’s Super Smoother: supsmu
- Data $(x_1, y_1), \dots, (x_n, y_n)$:

\[
y_i = s(x_i) + r_i, \quad i = 1, \dots, n \tag{3}
\]

- Smoothed value at point $x_i$:

\[
s(x_i) = \frac{1}{J} \sum_{j=i-J/2}^{i+J/2} y_j
\]

- Expected squared error at point $x_i$, under $E(r_i) = 0$, $\mathrm{Var}(r_i) = \sigma^2$:

\[
e^2(x_i \mid J) = \left[ f(x_i) - \frac{1}{J} \sum_{j=i-J/2}^{i+J/2} f(x_j) \right]^2 + \frac{1}{J}\,\sigma^2 \tag{4}
\]
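The fixed-span smoother described on this slide can be sketched as a running mean. Note two assumptions of this sketch: the window is truncated at the ends of the series and the divisor is the actual number of points in the window, and a plain mean is used as on the slide, whereas `supsmu` itself fits running lines.

```python
def running_mean(y, J):
    """Fixed-span running mean over the points i - J/2 .. i + J/2.

    The window is truncated at the boundaries and the mean is taken over
    the points actually available (an assumption of this sketch).
    """
    half = J // 2
    n = len(y)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        window = y[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical toy series.
smoothed = running_mean([1.0, 2.0, 3.0, 4.0, 5.0], 2)
```

A larger `J` averages over more neighbors, which lowers the variance term $\sigma^2/J$ in (4) but raises the squared-bias term wherever the underlying curve bends.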
supsmu: Variable span smoother
- Span selection, e.g. 0.1, 0.2, 0.9, ...: defines the size of the neighborhood:
  - Tradeoff: a big span gives small variance but big bias, and vice versa
  - J = span * n; e.g. J = 0.2n
- Choice of span:
  - Optimal selection: cross-validation, $J_{cv}$, which minimizes $e^2(x_i \mid J)$
  - Tone control (bass), $J_w$: people find smoother curves more visually pleasing (sacrificing accuracy for an estimate that is less rough). This method enhances the low-frequency (bass) component of the smoother output. Then:

\[
J(x_i) = J_{cv}(x_i) + \left(J_w - J_{cv}(x_i)\right) R_i^{\,10-\alpha}, \quad 0 \le \alpha \le 10, \qquad
R_i = \left[ \frac{\hat{e}\left(J_{cv}(x_i) \mid x_i\right)}{\hat{e}\left(J_w \mid x_i\right)} \right] \tag{5}
\]
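To see how the bass parameter moves the span between the cross-validated choice and the larger span $J_w$, equation (5) can be evaluated directly. The sketch below only evaluates the formula; the error estimates `e_cv` and `e_w` are taken as given inputs, and all names are hypothetical.

```python
def bass_adjusted_span(j_cv, j_w, e_cv, e_w, alpha):
    """Evaluate J = J_cv + (J_w - J_cv) * R**(10 - alpha), with R = e_cv / e_w.

    `e_cv` and `e_w` are the estimated errors at spans J_cv and J_w;
    how they are estimated is outside the scope of this sketch.
    """
    if not 0 <= alpha <= 10:
        raise ValueError("alpha must lie in [0, 10]")
    ratio = e_cv / e_w
    return j_cv + (j_w - j_cv) * ratio ** (10 - alpha)


# Hypothetical values: cross-validated span 0.05, large span 0.5, R = 0.5.
span_no_bass = bass_adjusted_span(0.05, 0.5, 1.0, 2.0, alpha=0)
span_full_bass = bass_adjusted_span(0.05, 0.5, 1.0, 2.0, alpha=10)
```

With `alpha=0` the result stays very close to the cross-validated span, while `alpha=10` returns the large span exactly, since the exponent becomes zero.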
supsmu: R code
supsmu(x, y, wt, span = "cv", periodic = FALSE, bass = 0)
Arguments:
- x: x values for smoothing
- y: y values for smoothing
- wt: case weights, by default all equal
- span: the fraction of the observations in the span of the running lines smoother, or "cv" to choose this by leave-one-out cross-validation
- periodic: if TRUE, the x values are assumed to be in [0, 1] and of period 1
- bass: controls the smoothness of the fitted curve. Values of up to 10 indicate increasing smoothness.
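supsmu: NTA framework
- Data $(0, \bar{y}_0), \dots, (\omega, \bar{y}_\omega)$:

\[
\bar{y}_a = s(\bar{y}_a) + r_a, \quad a = 0, \dots, \omega \tag{6}
\]

- Smoothed value at age $a$:

\[
s(\bar{y}_a) = \frac{1}{J} \sum_{j=a-J/2}^{a+J/2} \bar{y}_j
\]

- Expected squared error at age $a$, under $E(r_a) = 0$, $\mathrm{Var}(r_a) = \sigma_a^2 = \mathrm{Var}_{cds}(\bar{y}_a)$:

\[
e^2(a \mid J) = \left[ f(a) - \frac{1}{J} \sum_{j=a-J/2}^{a+J/2} f(j) \right]^2 + \frac{1}{J^2} \sum_{j=a-J/2}^{a+J/2} \mathrm{Var}_{cds}(\bar{y}_j) \tag{7}
\]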
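The variance term in (7) suggests a simple way to attach a confidence band to a smoothed profile: average the age-specific means over the window and divide the sum of their CDS variances by the squared window size. The sketch below assumes the age-specific estimates are independent (as the variance term implies), truncates windows at the youngest and oldest ages, and uses a plain running mean rather than the running lines of `supsmu`; all names are hypothetical.

```python
import math


def smooth_with_ci(means, variances, J, z=1.96):
    """Running-mean smooth of age-specific means with a pointwise band.

    `means[a]` and `variances[a]` are the estimate and its CDS variance at
    age a. Returns a list of (smoothed, lower, upper) tuples.
    """
    half = J // 2
    n = len(means)
    result = []
    for a in range(n):
        lo = max(0, a - half)
        hi = min(n, a + half + 1)
        k = hi - lo                                   # points in the window
        smoothed = sum(means[lo:hi]) / k
        var_smoothed = sum(variances[lo:hi]) / k ** 2  # variance term of (7)
        se = math.sqrt(var_smoothed)
        result.append((smoothed, smoothed - z * se, smoothed + z * se))
    return result


# Hypothetical toy profile with three ages and equal variances.
band = smooth_with_ci([10.0, 20.0, 30.0], [4.0, 4.0, 4.0], J=2)
```

The band narrows as the span grows, which matches the pattern in the remittances examples with span 0.05, 0.1 and 0.3; it does not account for the bias term in (7).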
[Figure: Example-supsmu, remittances (rem) with span=0.05 and 95% confidence interval. Series: rem, cds-l, cds-u, ci-l: span=0.05, ci-u: span=0.05. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: Example-supsmu, remittances (rem) with span=0.1 and 95% confidence interval. Series: rem, cds-l, cds-u, ci-l: span=0.1, ci-u: span=0.1. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: Example-supsmu, remittances (rem) with span=0.3 and 95% confidence interval. Series: rem, cds-l, cds-u, ci-l: span=0.3, ci-u: span=0.3. Axes: age (0 to 90) vs. Mexican pesos.]
Private asset-based reallocations
ABR: per capita interest expense (pcie)
- Information employed for the age allocation:
  - interest payments
  - credit card payments (these include interest). I assume that 40% of each payment corresponds to interest, based on the interest rates usually applied in Mexico.
  - mortgage payments. I assume that 30% of each payment corresponds to interest.
- This information is reported at the household level.
- The total amount by household is assigned to the household head.
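The allocation rule above can be written as a short sketch. The 40% and 30% interest shares are the assumptions stated on the slide; the function names and the member record layout (`is_head`) are hypothetical.

```python
CREDIT_CARD_INTEREST_SHARE = 0.40  # share of credit card payments treated as interest
MORTGAGE_INTEREST_SHARE = 0.30     # share of mortgage payments treated as interest


def household_interest_expense(interest_payments, credit_card_payments,
                               mortgage_payments):
    """Household interest expense under the shares assumed on the slide."""
    return (interest_payments
            + CREDIT_CARD_INTEREST_SHARE * credit_card_payments
            + MORTGAGE_INTEREST_SHARE * mortgage_payments)


def assign_to_head(household_amount, members):
    """Give the whole household amount to the head and zero to everyone else.

    `members` is a list of dicts with an 'is_head' flag (hypothetical layout).
    """
    return [household_amount if m["is_head"] else 0.0 for m in members]


# Hypothetical household: 100 in interest, 1000 on credit cards, 2000 on a mortgage.
total = household_interest_expense(100.0, 1000.0, 2000.0)
allocation = assign_to_head(total, [{"age": 45, "is_head": True},
                                    {"age": 12, "is_head": False}])
```

Because the full amount goes to the head, the resulting age profile reflects the age distribution of household heads, which is why it is zero at young ages.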
[Figure: ABR, per capita interest expense (pcie), with 95% confidence interval. Series: pcie, CDS-l, CDS-u, SRS-l, SRS-u, pcie-smooth. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: ABR, coefficient of variation of pcie by age (0 to 90), on a 0% to 100% scale.]
ABR: per capita property income (pcpi)
- Information employed for the age allocation:
  - rent (leasing) of houses, land, buildings, etc. (within the country and abroad)
  - interest received: from savings accounts, loans to other persons, short-term bank investments
  - yield: from dividends, shares
  - copyrights, patents
  - other property income
  - divestment: savings, "tandas" (informal, popular borrowing among households, neighbors, etc.)
  - sale of: gold, precious metals, jewelry, art, copyrights, bonds, stocks, houses, apartments, land, electronics. (I suspect that these items should not be included here, since they represent capital income?)
- This information is reported at the individual level.
- The total amount by household is assigned to the household head.
[Figure: ABR, per capita property income (pcpi), with 95% confidence interval. Series: pcpi, CDS-l, CDS-u, SRS-l, SRS-u, pcie-smooth. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: ABR, coefficient of variation of pcpi by age (0 to 90), on a 0% to 100% scale.]
ABR: per capita familial capital transfers inflows (pcfcti)
- Information employed for the age allocation:
  - payments received on loans made to other households
  - borrowing from other households or institutions (excluding mortgages)
  - bequest, legacy or dowry.
[Figure: ABR, per capita familial capital transfers inflows (pcfcti), with 95% confidence interval. Series: pcfcti, CDS-l, CDS-u, SRS-l, SRS-u, pcie-smooth. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: ABR, coefficient of variation of pcfcti by age (0 to 90), on a 0% to 100% scale.]
ABR: per capita familial capital transfers outflows (pcfcto)
- Information employed for the age allocation:
  - lending to other households
  - repayments of past borrowing from other households or institutions (excluding mortgages)
  - bequest, legacy or dowry.
[Figure: ABR, per capita familial capital transfers outflows (pcfcto), with 95% confidence interval. Series: pcfcto, CDS-l, CDS-u, SRS-l, SRS-u, pcie-smooth. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: ABR, coefficient of variation of pcfcto by age (0 to 90), on a 0% to 100% scale.]
[Figure: ABR, housing (pcosh), with 95% confidence interval. Series: pcosh, CDS-l, CDS-u, SRS-l, SRS-u, pcie-smooth. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: ABR, coefficient of variation of pcosh by age (0 to 90), on a 0 to 1 scale.]
[Figure: ABR, per capita mixed income (pcmi), with 95% confidence interval. Series: pcmi, CDS-l, CDS-u, SRS-l, SRS-u, pcmi-smooth. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: ABR, coefficient of variation of pcmi by age (0 to 90), on a 0 to 1 scale.]
Labor income
[Figure: YL, earnings (yle), with 95% confidence interval. Series: yle, cds-l, cds-u, srs-l, srs-u. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: YL, coefficient of variation of yle by age (0 to 90), on a 0% to 100% scale.]
[Figure: YL, entrepreneurial income (yls), with 95% confidence interval. Series: yls, cds-l, cds-u, srs-l, srs-u. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: YL, imputed self-employed income (ylss), with 95% confidence interval. Series: ylss, cds-l, cds-u, srs-l, srs-u. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: YL, coefficient of variation of yls by age (0 to 90), on a 0% to 100% scale.]
Remittances
Private transfers: remittances (rem)
- Information employed for the age allocation:
  - income from the rest of the world, as reported in the survey, is used to allocate remittances by age.
- This information is reported at the individual level.
- The total amount by household is assigned to the household head.
[Figure: Private transfers, remittances (rem), with 95% confidence interval. Series: rem, cds-l, cds-u, srs-l, srs-u. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: Private transfers, coefficient of variation of remittances by age (0 to 90), on a 0% to 100% scale.]
[Figure: Private transfers, inter-household inflows (interhh-i), with 95% confidence interval. Series: interhh-i, cds-l, cds-u, srs-l, srs-u. Axes: age (0 to 90) vs. Mexican pesos.]
[Figure: Private transfers, coefficient of variation of interhh-i by age (0 to 90), on a 0% to 100% scale.]