
The data for this survey came from the March 1996 Current Population
Survey (CPS), conducted by the Bureau of the Census. The March
survey uses two sets of questions, the basic CPS and the supplements.
Basic CPS. The monthly CPS collects
primarily labor force data about the civilian noninstitutional
population. Interviewers ask questions concerning labor force
participation about each member 15 years old and over in every
sample household.
March supplement. In addition to the
basic CPS questions, interviewers asked supplementary questions
in March about money income received in the previous calendar
year, educational attainment, household and family characteristics,
marital status and geographical mobility.
To obtain more reliable data for the Hispanic population, the
March CPS sample was increased by about 2,500 eligible housing
units. These housing units were interviewed the previous November
and contained at least one sample person of Hispanic origin. In
addition, the sample included persons in the Armed Forces living
off post or with their families on post.
Sample design. The present CPS
sample was selected from the 1990 Decennial Census files with
coverage in all 50 states and the District of Columbia. The sample
is continually updated to account for new residential construction.
The United States was divided into 2,007 geographic areas. In
most states, a geographic area consisted of a county or several
contiguous counties. In some areas of New England and Hawaii,
minor civil divisions are used instead of counties. A total of
754 geographic areas were selected for sample. About 50,000 occupied
households are eligible for interview every month. Interviewers
are unable to obtain interviews at about 3,200 of these units.
This occurs when the occupants are not found at home after repeated
calls or are unavailable for some other reason.
Since the introduction of the CPS, the Bureau of the Census has
redesigned the CPS sample several times. These redesigns have
improved the quality and accuracy of the data and have satisfied
changing data needs. The most recent changes were completely implemented
in July 1995.
Estimation procedure. This survey's
estimation procedure adjusts weighted sample results to agree
with independent estimates of the civilian noninstitutional population
of the United States by age, sex, race, Hispanic/non-Hispanic
origin, and state of residence. The adjusted estimate is called
the post-stratification ratio estimate. The independent estimates
are calculated based on information from four primary sources:
· The 1990 Decennial Census of Population and Housing.
· An adjustment for undercoverage in the 1990 census.
· Statistics on births, deaths, immigration, and emigration.
· Statistics on the size of the Armed Forces.
The estimation procedure for the March supplement included a further
adjustment so that the husband and wife in a household received the same
weight. The independent population estimates include some, but
not all, undocumented immigrants.
Since the CPS estimates come from a sample, they may differ from
figures from a complete census using the same questionnaires,
instructions, and enumerators. A sample survey estimate has two
possible types of error: sampling and nonsampling. The accuracy
of an estimate depends on both types of error, but the full extent
of the nonsampling error is unknown. Consequently, one should
be particularly careful when interpreting results based on a relatively
small number of cases or on small differences between estimates.
The standard errors for CPS estimates primarily indicate the magnitude
of sampling error. They also partially measure the effect of some
nonsampling errors in responses and enumeration, but do not measure
systematic biases in the data. (Bias is the average over all possible
samples of the differences between the sample estimates and the
desired value.)
Nonsampling variability. Sources
of nonsampling error include the following:
o Inability to get information about all sample cases.
o Definitional difficulties.
o Differences in interpretation of questions.
o Respondents' inability or unwillingness to provide correct information.
o Respondents' inability to recall information.
o Errors made in data collection, such as recording and coding data.
o Errors made in processing the data.
o Errors made in estimating values for missing data.
o Failure to represent all units with the sample (undercoverage).
CPS undercoverage results from missed housing units and missed
persons within sample households. Overall CPS undercoverage is
estimated to be about 8 percent. CPS undercoverage varies with
age, sex, and race. Generally, undercoverage is larger for males
than for females and larger for Blacks and other races combined
than for Whites. As described previously, ratio estimation to
independent age-sex-race-Hispanic population controls partially
corrects for the bias due to undercoverage. However, biases exist
in the estimates to the extent that missed persons in missed households
or missed persons in interviewed households have different characteristics
from those of interviewed persons in the same age-sex-race-origin-state
group.
A common measure of survey coverage is the coverage ratio, the
estimated population before post-stratification divided by the
independent population control. Table A shows CPS coverage ratios
for age-sex-race groups for a typical month. The CPS coverage
ratios can exhibit some variability from month to month. Other
Census Bureau household surveys experience similar coverage.
For additional information on nonsampling error including the
possible impact on CPS data when known, refer to Statistical Policy
Working Paper 3, An Error Profile: Employment as Measured
by the Current Population Survey, Office of Federal Statistical
Policy and Standards, U.S. Department of Commerce, 1978 and Technical
Paper 40, The Current Population Survey: Design and Methodology,
Bureau of the Census, U.S. Department of Commerce.
Comparability of data. Data
obtained from the CPS and other sources are not entirely comparable.
This results from differences in interviewer training and experience
and in differing survey processes. This is an example of nonsampling
variability not reflected in the standard errors. Use caution
when comparing results from different sources.
A number of changes were made in data collection and estimation
procedures beginning with the January 1994 CPS. The major change
was the use of a new questionnaire. The questionnaire was redesigned
to measure the official labor force concepts more precisely, to
expand the amount of data available, to implement several definitional
changes, and to adapt to a computer-assisted interviewing environment.
The March supplemental income questions were also modified for
adaptation to computer-assisted interviewing, although there were
no changes in definitions and concepts. Due to these and other
changes, one should use caution when comparing estimates from
data collected in 1994 and later years with estimates from earlier
years.
Caution should also be used when comparing data from this microdata
file, which reflects 1990 census-based population controls, with
microdata files from March 1993 and earlier years, which reflect
1980 census-based population controls. This change in population
controls had relatively little impact on summary measures such
as means, medians, and percentage distributions. It did have a
significant impact on levels. For example, use of 1990 census-based population
controls results in about a 1-percent increase in the civilian
noninstitutional population and in the number of families and
households. Thus, estimates of levels for data collected in 1994
and later years will differ from those for earlier years by more
than what could be attributed to actual changes in the population.
These differences could be disproportionately greater for certain
subpopulation groups than for the total population.
Since no independent population control totals for persons of
Hispanic origin were used before 1985, compare Hispanic estimates
over time cautiously.
Based on the results of each decennial census, the Bureau of the
Census gradually introduces a new sample design for the CPS. During
this phase-in period, CPS data are collected from sample designs
based on different censuses. While most CPS estimates have been
unaffected by this mixed sample, geographic estimates are subject
to greater error and variability. Users should exercise caution
when comparing estimates across years for metropolitan/nonmetropolitan
categories.
Note when using small estimates.
Because of the large standard errors involved, summary measures
probably do not reveal useful information when computed on a base
smaller than 75,000.
Take care in the interpretation of small differences. Even a small
amount of nonsampling error can cause a borderline difference
to appear significant or not, thus distorting a seemingly valid
hypothesis test.
Sampling variability. Sampling variability
is variation that occurs by chance because a sample was surveyed
rather than the entire population. Standard errors as calculated
below are primarily measures of sampling variability, but they
may include some nonsampling error.
Standard errors and their use. A
number of approximations are required to derive, at a moderate
cost, standard errors applicable to estimates from these data.
Instead of providing an individual standard error for each estimate,
generalized sets of standard errors are provided for various types
of characteristics. Thus, the tables show levels of magnitude
of standard errors rather than the precise standard errors.
Table B shows parameters to use for basic CPS monthly labor force
estimates. Table C shows parameters for March supplement data
including the Hispanic supplement.
The sample estimate and its standard error enable one to construct
a confidence interval. A confidence interval is a range that would
include the average result of all possible samples with a known
probability. For example, if all possible samples were surveyed
under essentially the same general conditions and the same sample
design, and if an estimate and its standard error were calculated
from each sample, then approximately 90 percent of the intervals
from 1.645 standard errors below the estimate to 1.645 standard
errors above the estimate would include the average result of
all possible samples.
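The interval can be sketched in a few lines of Python. This is a minimal illustration, not Census Bureau software: the function name `confidence_interval` is hypothetical, and the standard error is assumed to have been computed already with one of the formulas below. The default multiplier of 1.645 gives the 90-percent interval used throughout this document.

```python
def confidence_interval(estimate: float, std_error: float, z: float = 1.645) -> tuple[float, float]:
    """Return (lower, upper): the estimate minus and plus z standard errors.

    With the default z = 1.645 this is the 90-percent interval; about 90 percent
    of intervals built this way from all possible samples would contain the
    average result of all possible samples.
    """
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# Unemployed-females example: estimate 5,360,000 with a standard error of 124,000.
low, high = confidence_interval(5_360_000, 124_000)
print(f"{low:,.0f} to {high:,.0f}")  # 5,156,020 to 5,563,980
```

Rounded to the nearest thousand, this matches the 5,156,000 to 5,564,000 interval in the example that follows.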
A particular confidence interval may or may not contain the average
estimate derived from all possible samples. However, one can say
with specified confidence that the interval includes the average
estimate calculated from all possible samples.
Standard errors may be used to perform hypothesis testing. This
is a procedure for distinguishing between population parameters
using sample estimates. The most common type of hypothesis is
that the population parameters are different. An example of this
would be comparing the percentage of Whites with a college education
to the percentage of Blacks with a college education.
Tests may be performed at various levels of significance. A significance
level is the probability of concluding that the characteristics
are different when, in fact, they are the same. For example, to
conclude that two parameters are different at the 0.10 level of
significance, the absolute value of the estimated difference between
characteristics must be greater than or equal to 1.645 times the
standard error of the difference.
The Census Bureau uses 90-percent confidence intervals and 0.10
levels of significance to determine statistical validity. Consult
standard statistical texts for alternative criteria.
For information on calculating standard errors for labor force
data from the CPS which involve quarterly or yearly averages,
changes in consecutive quarterly or yearly averages, consecutive
month-to-month changes in estimates, and consecutive year-to-year
changes in monthly estimates see "Explanatory Notes and Estimates
of Error: Household Data" in the corresponding Employment
and Earnings published by the Bureau of Labor Statistics.
Standard errors of estimated numbers. The approximate standard error, $s_x$, of an estimated number from this microdata file can be obtained using this formula:
$$s_x = \sqrt{a x^2 + b x} \tag{1}$$
Here x is the size of the estimate and a and b are the parameters
in Table B or C associated with the particular type of characteristic.
When calculating standard errors for numbers from cross-tabulations
involving different characteristics, use the factor or set of
parameters for the characteristic which will give the largest
standard error.
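Formula (1) translates directly into code. In this minimal sketch the function name `se_number` is hypothetical, and a and b must be copied from Table B or C for the characteristic being tabulated; the two calls use the parameters quoted in the worked examples that follow.

```python
import math


def se_number(x: float, a: float, b: float) -> float:
    """Formula (1): approximate standard error of an estimated number x.

    a and b are the generalized variance parameters from Table B or C
    (a is usually a small negative number).
    """
    variance = a * x**2 + b * x
    if variance < 0:
        # With a < 0 this happens only when x > b / -a, far beyond a realistic estimate.
        raise ValueError("a*x**2 + b*x is negative; check x and the parameters")
    return math.sqrt(variance)


# Unemployed females (a = -0.000018, b = 2,957) and
# high school graduates aged 20 to 24 (a = -0.000017, b = 2,757).
print(round(se_number(5_360_000, -0.000018, 2_957), -3))  # 124000.0
print(round(se_number(8_419_000, -0.000017, 2_757), -3))  # 148000.0
```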
Suppose there were 5,360,000 unemployed females in the civilian
labor force. Use the appropriate parameters from Table B and formula
(1) to get
| Number, x | 5,360,000 |
| a parameter | -0.000018 |
| b parameter | 2,957 |
| standard error | 124,000 |
| 90% conf. int. | 5,156,000 to 5,564,000 |
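The standard error is calculated as
$$s_x = \sqrt{-0.000018 \times 5{,}360{,}000^2 + 2{,}957 \times 5{,}360{,}000} \approx 124{,}000$$
and the 90-percent confidence interval is calculated as 5,360,000
± 1.645 × 124,000.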
A conclusion that the average estimate derived from all possible
samples lies within a range computed in this way would be correct
for roughly 90 percent of all possible samples.
Suppose there are 8,419,000 high school graduates aged 20 to 24.
Use the appropriate parameters from Table C and formula
(1) to get
| Number, x | 8,419,000 |
| a parameter | -0.000017 |
| b parameter | 2,757 |
| Standard error | 148,000 |
| 90% conf. int. | 8,176,000 to 8,662,000 |
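The standard error is calculated as
$$s_x = \sqrt{-0.000017 \times 8{,}419{,}000^2 + 2{,}757 \times 8{,}419{,}000} \approx 148{,}000$$
The 90-percent confidence interval is calculated as 8,419,000
± 1.645 × 148,000.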
A conclusion that the average estimate derived from all possible
samples lies within a range computed in this way would be correct
for roughly 90 percent of all possible samples.
Standard errors of estimated percentages.
The reliability of an estimated percentage, computed using sample
data for both numerator and denominator, depends on the size of
the percentage and its base. Estimated percentages are relatively
more reliable than the corresponding estimates of the numerators
of the percentages, particularly if the percentages are 50 percent
or more. When the numerator and denominator of the percentage
are in different categories, use the factor or parameter from
Table B or C indicated by the numerator.
Alternatively, formula (2) will provide more accurate results:
$$s_{x,p} = \sqrt{\frac{b}{x}\, p\,(100 - p)} \tag{2}$$
Here x is the total number of persons, families, households, or
unrelated individuals in the base of the percentage, p is the
percentage (0 ≤ p ≤ 100) and b is the parameter in Table B or C associated
with the characteristic in the numerator of the percentage.
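A sketch of formula (2) under the same caveats: `se_percentage` is a hypothetical name, p is expressed in percent (0 to 100), and b must be the parameter for the characteristic in the numerator. The result is in percentage points.

```python
import math


def se_percentage(p: float, x: float, b: float) -> float:
    """Formula (2): standard error, in percentage points, of a percentage p on base x."""
    if not 0 <= p <= 100:
        raise ValueError("p must be a percentage between 0 and 100")
    return math.sqrt(b / x * p * (100 - p))


# 12 percent Black among 8,419,000 high school graduates aged 20 to 24 (b = 3,736).
print(round(se_percentage(12.0, 8_419_000, 3_736), 1))  # 0.7
```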
Suppose that of the 8,419,000 high school graduates aged 20 to
24, 12 percent were Black. Use the appropriate parameter from
Table C and formula (2) to get
| Percentage, p | 12.0 |
| Base, x | 8,419,000 |
| b parameter | 3,736 |
| Standard error | 0.7 |
| 90% conf. int. | 10.8 to 13.2 |
| Characteristic | a | b |
| Labor Force and Not In Labor Force Data Other than Agricultural Employment and Unemployment | | |
| Total¹ | | |
| Men¹ | | |
| Women | | |
| Both sexes, 16 to 19 years | | |
| White¹ | | |
| Men | | |
| Women | | |
| Both sexes, 16 to 19 years | | |
| Black | | |
| Men | | |
| Women | | |
| Both sexes, 16 to 19 years | | |
| Hispanic origin | | |
| Not In Labor Force (use only for Total, Total Men, and White) | | |
| Agricultural Employment | | |
| Total or White | | |
| Men | | |
| Women or | | |
| Black | | |
| Hispanic origin | | |
| Total or Women | | |
| Men or | | |
| Unemployment | | |
| Total or White | | |
| Black | | |
| Hispanic origin | | |
Note: These parameters are to be applied to basic CPS monthly
labor force estimates.
¹ For not-in-labor-force characteristics, use the Not
In Labor Force parameters.
The standard error is calculated as
$$s_{x,p} = \sqrt{\frac{3{,}736}{8{,}419{,}000} \times 12.0 \times (100 - 12.0)} \approx 0.7$$
The 90-percent confidence interval for the percentage of high
school graduates aged 20 to 24 who were Black is calculated as
12.0 ± 1.645 × 0.7.
Standard error of a difference. The standard error of the difference between two sample estimates is approximately equal to
$$s_{x-y} = \sqrt{s_x^2 + s_y^2} \tag{3}$$
where $s_x$ and $s_y$ are the standard errors
of the estimates, x and y. The estimates can be numbers, percentages,
ratios, etc. This will represent the actual standard error quite
accurately for the difference between estimates of the same characteristic
in two different areas, or for the difference between separate
and uncorrelated characteristics in the same area. However, if
there is a high positive (negative) correlation between the two
characteristics, the formula will overestimate (underestimate)
the true standard error.
Suppose 8,419,000 persons 20 to 24 years old and 8,228,000 persons
25 to 29 years old had completed four years of high school and
no more. Use the appropriate parameters from Table C and formulas
(1) and (3) to get
| Estimate | |||
| a parameter | |||
| b parameter | |||
| Standard error | |||
| 90% conf. int. |
The standard error of the difference is calculated as
$$s_{x-y} = \sqrt{148{,}000^2 + 147{,}000^2} \approx 209{,}000$$
The 90-percent confidence interval around the difference is calculated
as 191,000 ± 1.645 × 209,000.
Since this interval contains zero, we cannot conclude, at the
10-percent significance level, that the number of persons who
completed four years of high school and no more is different for
20 to 24 year olds and 25 to 29 year olds.
Suppose that of 6,285,000 employed males aged 20 to 24,
1,516,000 (24.1 percent) were part-time workers, and of
the 5,824,000 employed females aged 20 to 24, 2,169,000
(37.2 percent) were part-time workers. Use the appropriate parameters
from Table B and formulas (2) and (3) to get
| Percentage | |||
| Number, x | |||
| b parameter | |||
| Standard error | |||
| 90% conf. int. |
The standard error of the difference is calculated as

The 90-percent confidence interval around the difference is calculated
as 13.1 ± 1.645 × 1.3. Since
this interval does not include zero, we can conclude with 90-percent
confidence that the percentage of part-time workers among employed
females aged 20 to 24 is greater than the percentage among employed
males aged 20 to 24.
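Formula (3) and the 0.10-level test described earlier can be combined in a short sketch. The function names are hypothetical, and, as cautioned above, the formula treats the two estimates as uncorrelated. The two calls reuse the standard errors of the differences reported in the two examples (209,000 persons and 1.3 percentage points).

```python
import math


def se_difference(se_x: float, se_y: float) -> float:
    """Formula (3): standard error of x - y, treating x and y as uncorrelated."""
    return math.hypot(se_x, se_y)  # sqrt(se_x**2 + se_y**2)


def differs_at_010(diff: float, se_diff: float, z: float = 1.645) -> bool:
    """True when |diff| >= z * se_diff, i.e. significant at the 0.10 level for z = 1.645."""
    return abs(diff) >= z * se_diff


# High school completion, ages 20 to 24 vs. 25 to 29: difference 191,000, SE 209,000.
print(differs_at_010(191_000, 209_000))  # False: the 90-percent interval contains zero
# Part-time workers, females vs. males aged 20 to 24: 13.1 points, SE 1.3.
print(differs_at_010(13.1, 1.3))  # True
```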
Standard error of a mean for grouped data. The formula used to estimate the standard error of a mean for grouped data is
$$s_{\bar{x}} = \sqrt{\frac{b}{y}\, S^2} \tag{4}$$
In this formula, $y$ is the size of the base of the distribution and $b$ is a parameter from Table B or C. The variance, $S^2$, is given by the following formula:
$$S^2 = \sum_{i=1}^{c} p_i m_i^2 - \bar{x}^2 \tag{5}$$
where $\bar{x}$, the mean of the distribution, is estimated by
$$\bar{x} = \sum_{i=1}^{c} p_i m_i \tag{6}$$
$c$ is the number of groups; $i$ indicates a specific group, thus
taking on values 1 through $c$.
$p_i$ is the estimated proportion of households, families
or persons whose values, for the characteristic ($x$-values) being
considered, fall in group $i$.
$m_i = (Z_{i-1} + Z_i)/2$,
where $Z_{i-1}$ and $Z_i$ are the lower and upper
interval boundaries, respectively, for group $i$.
$m_i$ is assumed to be the most representative value for the characteristic
for households, families, and unrelated individuals or persons
in group $i$. Group $c$ is open-ended, i.e., no upper interval boundary
exists. For this group the approximate average value is
$$m_c = \frac{3}{2}\, Z_{c-1} \tag{7}$$
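Formulas (4) through (7) can be chained in a small function. This is a sketch, not Census Bureau code: `grouped_mean_and_se` is a hypothetical name, the distribution is passed as the lower boundaries $Z_0, \ldots, Z_{c-1}$ plus the proportions $p_i$, and the numbers in the usage line are invented for illustration.

```python
import math


def grouped_mean_and_se(lower_bounds, proportions, base, b):
    """Formulas (4)-(7): mean of a grouped distribution and its standard error.

    lower_bounds: Z_0 < Z_1 < ... < Z_{c-1}; group i spans Z_{i-1} to Z_i and
        the last group (values >= Z_{c-1}) is open-ended.
    proportions: p_i, the estimated share of units in each group (summing to 1).
    base: y, the size of the base of the distribution.
    b: the b parameter from Table B or C.
    """
    c = len(proportions)
    if len(lower_bounds) != c:
        raise ValueError("supply one lower boundary per group")
    # m_i = (Z_{i-1} + Z_i) / 2 for closed groups; formula (7) for the open-ended group.
    mids = [(lower_bounds[k] + lower_bounds[k + 1]) / 2 for k in range(c - 1)]
    mids.append(1.5 * lower_bounds[-1])
    mean = sum(p * m for p, m in zip(proportions, mids))                # formula (6)
    s2 = sum(p * m * m for p, m in zip(proportions, mids)) - mean ** 2  # formula (5)
    se = math.sqrt(b / base * s2)                                        # formula (4)
    return mean, se


# Invented three-group distribution on a base of 1,000,000 with b = 2,000.
mean, se = grouped_mean_and_se([0, 10, 20], [0.5, 0.3, 0.2], 1_000_000, 2_000)
print(round(mean, 2), round(se, 3))  # 13.0 0.427
```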
Standard error of a ratio. Certain estimates may be calculated as the ratio of two numbers. The standard error of a ratio, x/y, may be computed using
$$s_{x/y} = \frac{x}{y}\sqrt{\left(\frac{s_x}{x}\right)^2 + \left(\frac{s_y}{y}\right)^2 - 2r\,\frac{s_x s_y}{xy}} \tag{8}$$
The standard error of the numerator, $s_x$, and that
of the denominator, $s_y$, may be calculated using formulas
described earlier. In formula (8), r represents the correlation
between the numerator and the denominator of the estimate.
For one type of ratio, the denominator is a count of families
or households and the numerator is a count of persons in those
families or households with a certain characteristic. If there
is at least one person with the characteristic in every family
or household, use 0.7 as an estimate of r. An example of this
type is the mean number of children per family with children.
For all other types of ratios, r is assumed to be zero. If r is
actually positive (negative), then this procedure will provide
an overestimate (underestimate) of the standard error of the ratio.
Examples of this type are the mean number of children per family
and the poverty rate.
NOTE: For estimates expressed as the ratio of x per 100 y or x
per 1,000 y, multiply formula (8) by 100 or 1,000, respectively,
to obtain the standard error.
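A sketch of the ratio formula, including the per-100 or per-1,000 scaling from the note. The function name and the small numbers in the usage lines are hypothetical; they only show that a positive r lowers the standard error, which is why assuming r = 0 errs on the high side when the true correlation is positive.

```python
import math


def se_ratio(x, y, se_x, se_y, r=0.0, scale=1.0):
    """Formula (8): standard error of the ratio x / y.

    r: correlation between numerator and denominator (0.7 for persons per
       family or household when every unit has the characteristic, else 0).
    scale: 100 or 1,000 when the ratio is expressed per 100 or per 1,000 y.
    """
    rel_var = (se_x / x) ** 2 + (se_y / y) ** 2 - 2 * r * se_x * se_y / (x * y)
    rel_var = max(rel_var, 0.0)  # guard against a tiny negative value from rounding
    return scale * (x / y) * math.sqrt(rel_var)


# Hypothetical inputs: ratio 2.0 with 10-percent relative errors in x and y.
print(round(se_ratio(200, 100, 20, 10), 4))         # 0.2828 (r = 0)
print(round(se_ratio(200, 100, 20, 10, r=0.7), 4))  # 0.1549
```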
Suppose there are 641,000 male movers from abroad and 501,000 female movers from abroad. The ratio of male movers, x, to female movers, y, is 1.28. The standard error of this ratio is calculated as follows:
| ratio | |||
| Estimate | 1.28 | ||
| a parameter | |||
| b parameter | |||
| Standard error | |||
| 90% conf. int. |
Using formula (8) with r = 0, the estimate of the standard error is

Standard error of a median.
The sampling variability of an estimated median depends on the
form of the distribution and the size of the base. One can approximate
the reliability of an estimated median by determining a confidence
interval about it. (See the section on sampling variability for
a general discussion of confidence intervals.)
Estimate the 68-percent confidence limits of a median based on
sample data using the following procedure.
1. Determine, using formula (2), the standard error of the estimate
of 50 percent from the distribution.
2. Add to and subtract from 50 percent the standard error determined
in step 1.
3. Using the distribution of the characteristic, determine upper
and lower limits of the 68-percent confidence interval by calculating
values corresponding to the two points established in step 2.
Use the following formula to calculate the upper and lower limits.
(9)

where
$X_{pN}$ = estimated upper and lower bounds for the confidence
interval ($0 \le p \le 1$). For purposes of calculating the confidence interval, $p$ takes
on the values determined in step 2. Note that $X_{pN}$ estimates
the median when $p = 0.50$.
$N$ = for distribution of numbers: the total number of units
(persons, households, etc.) for the characteristic in the distribution.
$N$ = for distribution of percentages: the value 1.0.
$p$ = the values obtained in step 2.
$A_1$, $A_2$ = the lower and upper bounds, respectively,
of the interval containing $X_{pN}$.
$N_1$, $N_2$ = for distribution of numbers:
the estimated number of units (persons, households, etc.) with
values of the characteristic greater than or equal to $A_1$
and $A_2$, respectively.
$N_1$, $N_2$ = for distribution of percentages: the estimated percentage
of units (persons, households, etc.) having values of the characteristic
greater than or equal to $A_1$ and $A_2$, respectively.
4. Divide the difference between the two points determined in
step 3 by two to obtain the standard error of the median.
Suppose median income for families has the following distribution (family counts in thousands).
| Total families | 66,090 |
| Under $5,000 | 2,398 |
| Median income | $34,213 |
1. Using formula (2) with b = 2,241, the standard error of 50
percent on a base of 66,090,000 is about 0.3 percent.
2. To obtain a 68-percent confidence interval on an estimated
median, add to and subtract from 50 percent the standard error
found in step 1. This yields percent limits of 49.7 and 50.3.
3. The lower and upper limits for the interval in which the median
falls are $30,000 and $35,000, respectively.
Then, by addition, the estimated numbers of families with an income
greater than or equal to $30,000 and $35,000 are 37,597,000 and
32,303,000, respectively.
Using formula (9), the upper limit for the confidence interval of the median is found to be about

Similarly, the lower limit is found to be about

Thus, a 68-percent confidence interval for the median income for
families is from $34,100 to $34,500.
4. The standard error of the median is, therefore,
$$\frac{\$34{,}500 - \$34{,}100}{2} = \$200$$
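Steps 1 through 4 can be sketched as follows. This sketch assumes straight-line interpolation between $A_1$ and $A_2$ in place of evaluating formula (9) itself, and the function name is hypothetical. With the family-income inputs above (b = 2,241, base 66,090,000, $N_1$ = 37,597,000 at $30,000 and $N_2$ = 32,303,000 at $35,000) it gives limits of roughly $34,120 and $34,480 and a standard error of about $180, which round to the $34,100, $34,500, and $200 of the example; the example itself rounds the 50-percent standard error to 0.3 before interpolating.

```python
import math


def median_limits_68(n_total, b, bounds, counts_at_or_above):
    """68-percent confidence limits and standard error of a median (steps 1-4).

    bounds: ascending interval boundaries (e.g. income levels).
    counts_at_or_above: number of units with values >= each boundary.
    Assumes straight-line interpolation within the interval containing p*N.
    """
    # Step 1: formula (2) with p = 50, converted from percent to a proportion.
    se50 = math.sqrt(b / n_total * 50 * 50) / 100

    def value_at(p):
        target = p * n_total  # number of units at or above the value sought
        for k in range(len(bounds) - 1):
            n1, n2 = counts_at_or_above[k], counts_at_or_above[k + 1]
            if n1 >= target >= n2:
                a1, a2 = bounds[k], bounds[k + 1]
                return a1 + (n1 - target) / (n1 - n2) * (a2 - a1)
        raise ValueError("p*N falls outside the supplied intervals")

    # Steps 2-3: more units at or above the value (p above 0.5) means a lower value.
    lower, upper = value_at(0.5 + se50), value_at(0.5 - se50)
    # Step 4: half the width of the interval.
    return lower, upper, (upper - lower) / 2


lower, upper, se = median_limits_68(66_090_000, 2_241, [30_000, 35_000], [37_597_000, 32_303_000])
print(f"{lower:,.0f} to {upper:,.0f}; standard error about {se:,.0f}")
```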
Accuracy of state estimates.
The redesign of the CPS following the 1980 census provided an
opportunity to increase efficiency and accuracy of state data.
All strata are now defined within state boundaries. The sample
is allocated among the states to produce state and national estimates
with the required accuracy while keeping total sample size to
a minimum. Improved accuracy of state data has been achieved with
about the same sample size as in the 1970 design.
Since the CPS is designed to produce both state and national estimates,
the proportion of the total population sampled and the sampling
rates differ among the states. In general, the smaller the population
of the state the larger the sampling proportion. For example,
in Vermont approximately 1 in every 300 households was sampled
each month. In New York the sample was about 1 in every 1,600
households. Nevertheless, the size of the sample in New York is
four times larger than in Vermont because New York has a larger
population.
Computation of standard errors
for state estimates. Standard errors for a state may be
obtained by adjusting generalized standard errors given in the
tables or by adjusting the a and b parameters and using the standard
error equations described earlier.
Multiply the a and b parameters in Table B or C by f² from
Table D to obtain state parameters.
Suppose there were 11,200,000 persons 18 years old and over living
in New York, 2,542,000 (22.7 percent) of whom had completed college.
Use the appropriate parameter from Table C and formula (2) to
get
| Percentage, p | 22.7 |
| Base, x | 11,200,000 |
| b parameter | 2,757 |
| Standard error | 0.66 |
Table D shows the f factor for New York to be 0.94. Thus, the
standard error on the estimate of the percentage of persons 18
and older in New York State who had completed college is approximately
0.62 = 0.94 × 0.66.
To obtain state parameters for educational attainment in New York,
multiply the parameters in Table C by f² in Table D for the
state of interest. For educational attainment for Total or White
in New York this gives a = -0.000017 × 0.89
= -0.000015 and b = 2,757 × 0.89
= 2,454.
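Multiplying a and b by f² and then applying formula (1) gives the same result as multiplying the national standard error by f, because $\sqrt{f^2(ax^2 + bx)} = f\sqrt{ax^2 + bx}$. The sketch below checks this numerically; the function names are hypothetical, and the estimate of 1,000,000 is chosen only for illustration.

```python
import math


def se_number(x: float, a: float, b: float) -> float:
    """Formula (1) with national (or already state-adjusted) parameters."""
    return math.sqrt(a * x**2 + b * x)


def state_parameters(a: float, b: float, f: float) -> tuple[float, float]:
    """Multiply the Table B or C parameters by f**2, where f comes from Table D."""
    return a * f**2, b * f**2


a_state, b_state = state_parameters(-0.000017, 2_757, 0.94)
x = 1_000_000  # illustrative estimate
# Both routes give the same state standard error.
print(se_number(x, a_state, b_state), 0.94 * se_number(x, -0.000017, 2_757))
```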
Computation of a factor for groups of states.
The factor adjusting standard errors for a group of states may
be obtained by computing a weighted sum of the squared factors
for the individual states in the group and taking the square root
of the result. Depending on the combination of states, the resulting
figure can be an overestimate.
The squared factor for a group of $n$ states is given by
$$f^2 = \frac{\sum_{i=1}^{n} POP_i\, f_i^2}{\sum_{i=1}^{n} POP_i} \tag{10}$$
where $POP_i$ is the state population and $f_i^2$
is obtained from Table D. The 1996 civilian noninstitutionalized
population from the CPS for each state is also given in Table
D.
Suppose a factor for the state group Illinois-Wisconsin-Michigan was required. The appropriate squared factor would be

Multiply the a and b parameters by f², 1.02, to obtain parameters
for the state group; multiply standard errors by f, 1.01, for
standard errors for this state group.
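The group factor described above is a population-weighted average of the squared state factors. In this sketch the function name is hypothetical, and the two-state populations and factors are made up rather than taken from Table D; substitute the Table D figures for an actual state group.

```python
import math


def group_factor(populations, factors):
    """Return (f**2, f) for a group of states.

    f**2 is the population-weighted mean of the individual squared factors f_i**2.
    """
    total = sum(populations)
    f2 = sum(pop * f * f for pop, f in zip(populations, factors)) / total
    return f2, math.sqrt(f2)


# Hypothetical two-state group: populations and factors are invented.
f2, f = group_factor([9_000_000, 4_000_000], [1.00, 1.10])
print(round(f2, 3), round(f, 3))  # 1.065 1.032
```

As in the Illinois-Wisconsin-Michigan example, the a and b parameters are then multiplied by f2 and standard errors by f.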
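Computation of standard errors for data
for combined years. Sometimes estimates for multiple years
are combined to improve precision. For example, suppose $\bar{x}$
is a mean derived from $n$ consecutive years' data, i.e., $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where
the $x_i$ are the estimates for the individual years.
Use the formulas described previously to estimate the standard
error, $s_{x_i}$, of each year's estimate. Then the
standard error of $\bar{x}$ is
$$s_{\bar{x}} = \frac{s}{n} \tag{11}$$
where
$$s^2 = \sum_{i=1}^{n} s_{x_i}^2 + 2r\sum_{i=1}^{n-1} s_{x_i}\, s_{x_{i+1}} \tag{12}$$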
The correlation between consecutive years, r, is 0.35 for non-Hispanic
households and 0.55 for Hispanic households. Correlation between
nonconsecutive years is zero. The correlations were derived for
income estimates but they can be used for other types of estimates
where the year-to-year correlation between identical households
is high.
Suppose a mean for three consecutive years for some characteristic
is 1,000,000 and the standard errors for the individual years
are 67,000, 73,000, and 65,000.
Using formula (12), the standard error for the three years of combined data is

Therefore, the standard error of the mean, using formula (11), is

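Formulas (11) and (12) can be sketched as below, assuming, as the text states, that only consecutive years are correlated. The function name is hypothetical. The example above does not say which correlation applies, so the usage line assumes the non-Hispanic value r = 0.35.

```python
import math


def se_multiyear_mean(std_errors, r):
    """Standard error of the mean of n consecutive years' estimates.

    s**2 = sum of s_i**2 + 2*r*sum of s_i*s_(i+1)   (consecutive years only)
    SE(mean) = s / n
    """
    n = len(std_errors)
    s2 = sum(s * s for s in std_errors)
    s2 += 2 * r * sum(std_errors[i] * std_errors[i + 1] for i in range(n - 1))
    return math.sqrt(s2) / n


# Three consecutive years; r = 0.35 (non-Hispanic households) is an assumption here.
print(round(se_multiyear_mean([67_000, 73_000, 65_000], r=0.35)))
```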
| Characteristics | Total or White a | Total or White b | Black a | Black b | Hispanic origin a | Hispanic origin b |
| PERSONS | | | | | | |
| Educational Attainment | -0.000017 | 2,757 | -0.000200 | 3,736 | -0.000196 | 3,736 |
| Employment Characteristics | -0.000018 | 2,985 | -0.000125 | 3,139 | -0.000206 | 3,139 |
| Persons by Family Income | -0.000026 | 4,901 | -0.000260 | 5,611 | -0.000330 | 5,611 |
| Income | -0.000013 | 2,241 | -0.000119 | 2,447 | -0.000210 | 2,447 |
| Marital Status, Household & Family Characteristics, Health Insurance | | | | | | |
| | -0.000019 | 5,211 | -0.000221 | 7,486 | -0.000263 | 7,486 |
| | -0.000023 | 6,332 | -0.000326 | 11,039 | -0.000388 | 11,039 |
| Mobility Characteristics (Movers) Educational Attainment, Labor Force, Marital Status, Household, Family, and Income | -0.000011 | 2,869 | -0.000085 | 2,869 | -0.000101 | 2,869 |
| US, County, State, Region or MSA | -0.000030 | 7,791 | -0.000231 | 7,791 | -0.000282 | 7,791 |
| Poverty | -0.000039 | 10,380 | -0.000307 | 10,380 | -0.000366 | 10,380 |
| Unemployment | -0.000018 | 2,957 | -0.000212 | 3,150 | -0.000102 | 3,150 |
| FAMILIES, HOUSEHOLDS, OR UNRELATED INDIVIDUALS | | | | | | |
| Income | -0.000013 | 2,241 | -0.000119 | 2,447 | -0.000210 | 2,447 |
| Marital Status, Household and Family Characteristics, Educational Attainment, Population by Age and/or Sex | -0.000012 | 2,068 | -0.000077 | 1,871 | -0.000155 | 1,871 |
| Poverty | 0.000102 | 2,442 | 0.000102 | 2,442 | 0.000102 | 2,442 |
NOTES: These parameters are to be applied to March
supplemental data, including the Hispanic supplement.
Multiply the a and b parameters by 1.5 when tabulating nonmetropolitan characteristics. If the characteristic of interest is total state population, not subtotaled by race or ethnic origin, the a and b parameters are zero.
For foreign-born characteristics for Total and White, the a and b parameters should be multiplied by 1.3. No adjustment is necessary for foreign-born characteristics for Blacks and Hispanics.
Annual Demographic Survey (March 1996 CPS) Data Quality Page