
The data in this microdata file come from the October 1996 Current
Population Survey (CPS). The Bureau of the Census conducts the
survey every month, although this file has only October data.
The October survey uses two sets of questions, the basic CPS and
the supplement.
Basic CPS. The basic CPS collects primarily labor
force data about the civilian noninstitutional population. Interviewers
ask questions concerning labor force participation about each
member 15 years old and over in every sample household.
Sample design. The present CPS sample was selected
from the 1990 Decennial Census files with coverage in all 50 states
and the District of Columbia. The sample is continually updated
to account for new residential construction. The United States
was divided into 2,007 geographic areas. In most states, a geographic
area consisted of a county or several contiguous counties. In
some areas of New England and Hawaii, minor civil divisions are
used instead of counties. A total of 754 geographic areas were
selected for sample. About 50,000 occupied households are eligible
for interview every month. Interviewers are unable to obtain interviews
at about 3,200 of these units. This occurs when the occupants
are not found at home after repeated calls or are unavailable
for some other reason.
Since the introduction of the CPS, the Bureau of the Census has
redesigned the CPS sample several times. These redesigns have
improved the quality and accuracy of the data and have satisfied
changing data needs. The most recent changes were completely implemented
in July 1995.
October Supplement. In addition to the basic CPS
questions, interviewers asked supplementary questions in October
about school enrollment for all household members 3 years old
and over.
Estimation procedure. This survey's estimation procedure
adjusts weighted sample results to agree with independent estimates
of the civilian noninstitutional population of the United States
by age, sex, race, Hispanic/non-Hispanic origin, and state of
residence. The adjusted estimate is called the post-stratification
ratio estimate. The independent estimates are calculated based
on information from four primary sources:
The independent population estimates include some, but not all,
undocumented immigrants.
Since the CPS estimates come from a sample, they may differ from
figures from a complete census using the same questionnaires,
instructions, and enumerators. A sample survey estimate has two
possible types of errors: sampling and nonsampling. The accuracy
of an estimate depends on both types of errors, but the full extent
of the nonsampling error is unknown. Consequently, one should
be particularly careful when interpreting results based on a relatively
small number of cases or on small differences between estimates.
The standard errors for CPS estimates primarily indicate the magnitude
of sampling error. They also partially measure the effect of some
nonsampling errors in responses and enumeration, but do not measure
systematic biases in the data. (Bias is the average over all possible
samples of the differences between the sample estimates and the
desired value.)
Nonsampling Variability. There are several sources
of nonsampling errors including the following:
CPS undercoverage results from missed housing units and missed
persons within sample households. Overall CPS undercoverage is
estimated to be about 8 percent. CPS undercoverage varies with
age, sex, and race. Generally, undercoverage is larger for males
than for females and larger for Blacks and other races combined
than for Whites. As described previously, ratio estimation to
independent age-sex-race-Hispanic population controls partially
corrects for the bias due to undercoverage. However, biases exist
in the estimates to the extent that missed persons in missed households
or missed persons in interviewed households have different characteristics
from those of interviewed persons in the same age-sex-race-origin-state
group.
A common measure of survey coverage is the coverage ratio, the
estimated population before post-stratification divided by the
independent population control. Table A shows CPS coverage ratios
for age-sex-race groups for a typical month. The CPS coverage
ratios can exhibit some variability from month to month. Other
Census Bureau household surveys experience similar coverage.
| Age | |||||||
| 0-14 | |||||||
| 15 | |||||||
| 16-19 | |||||||
| 20-29 | |||||||
| 30-39 | |||||||
| 40-49 | |||||||
| 50-59 | |||||||
| 60-64 | |||||||
| 65-69 | |||||||
| 70+ | |||||||
| 15+ | |||||||
| 0+ | |||||||
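The coverage ratio defined above can be sketched in a few lines of Python. This is only an illustration: the function name and both population figures are hypothetical, chosen so that the ratio matches the roughly 8 percent overall undercoverage quoted in the text. Reading undercoverage as one minus the coverage ratio is an assumption.

```python
def coverage_ratio(weighted_estimate: float, population_control: float) -> float:
    """Estimated population before post-stratification / independent control."""
    return weighted_estimate / population_control


# Hypothetical figures, not taken from Table A.
ratio = coverage_ratio(184_000_000, 200_000_000)
print(f"coverage ratio: {ratio:.2f}")
# Assumption: undercoverage is read as 1 minus the coverage ratio.
print(f"implied undercoverage: {1 - ratio:.0%}")
```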
For additional information on nonsampling error including the
possible impact on CPS data when known, refer to Statistical Policy
Working Paper 3, An Error Profile: Employment as Measured
by the Current Population Survey, Office of Federal Statistical
Policy and Standards, U.S. Department of Commerce, 1978 and Technical
Paper 40, The Current Population Survey: Design and Methodology,
Bureau of the Census, U.S. Department of Commerce.
Comparability of data. Data obtained from the CPS
and other sources are not entirely comparable. This results from
differences in interviewer training and experience and in differing
survey processes. This is an example of nonsampling variability
not reflected in the standard errors. Use caution when comparing
results from different sources.
A number of changes were made in data collection and estimation
procedures beginning with the January 1994 CPS. The major change
was the use of a new questionnaire. The questionnaire was redesigned
to measure the official labor force concepts more precisely, to
expand the amount of data available, to implement several definitional
changes, and to adapt to a computer-assisted interviewing environment.
The March supplemental income questions were also modified for
adaptation to computer-assisted interviewing, although there were
no changes in definitions and concepts. Due to these and other
changes, one should use caution when comparing estimates from
data collected in 1994 and later years with estimates from earlier
years.
Caution should also be used when comparing data from this microdata
file, which reflects 1990 census-based population controls, with
microdata files from March 1993 and earlier years, which reflect
1980 census-based population controls. This change in population
controls had relatively little impact on summary measures such
as means, medians, and percentage distributions. It did have a
significant impact on levels. For example, use of 1990-based population
controls results in about a 1-percent increase in the civilian
noninstitutional population and in the number of families and
households. Thus, estimates of levels for data collected in 1994
and later years will differ from those for earlier years by more
than what could be attributed to actual changes in the population.
These differences could be disproportionately greater for certain
subpopulation groups than for the total population.
Since no independent population control totals for persons of
Hispanic origin were used before 1985, compare Hispanic estimates
over time cautiously.
Based on the results of each decennial census, the Bureau of the
Census gradually introduces a new sample design for the CPS. During
this phase-in period, CPS data are collected from sample designs
based on different censuses. While most CPS estimates have been
unaffected by this mixed sample, geographic estimates are subject
to greater error and variability. Users should exercise caution
when comparing estimates across years for metropolitan/nonmetropolitan
categories.
Note When Using Small Estimates. Because of the
large standard errors involved, summary measures (such as medians
and percentage distributions) would probably not reveal useful
information when computed on a smaller base than 75,000.
Take care in the interpretation of small differences. For instance,
even a small amount of nonsampling error can cause a borderline
difference to appear significant or not, thus distorting a seemingly
valid hypothesis test.
Sampling Variability. Sampling variability is variation
that occurred by chance because a sample was surveyed rather than
the entire population. Standard errors, as calculated by methods
described later in "Standard Errors and Their Use,"
are primarily measures of sampling variability, although they
may include some nonsampling error.
Standard Errors and Their Use. A number of approximations
are required to derive, at a moderate cost, standard errors applicable
to all the estimates in this microdata file. Instead of providing
an individual standard error for each estimate, parameters are
provided to calculate standard errors for various types of characteristics.
These parameters are listed in Tables 2-4. Table 5 shows factors
to apply to prior year parameters.
The sample estimate and its standard error enable one to construct
a confidence interval, a range that would include the average
result of all possible samples with a known probability. For example,
if all possible samples were surveyed under essentially the same
general conditions and using the same sample design, and if an
estimate and its standard error were calculated from each sample,
then approximately 90 percent of the intervals from 1.645 standard
errors below the estimate to 1.645 standard errors above the estimate
would include the average result of all possible samples.
A particular confidence interval may or may not contain the average
estimate derived from all possible samples. However, one can say
with specified confidence that the interval includes the average
estimate calculated from all possible samples.
Standard errors may also be used to perform hypothesis testing,
a procedure for distinguishing between population parameters using
sample estimates. One common type of hypothesis is that the population
parameters are different. An example of this would be comparing
the percentage of employed males 20 to 24 years old working part
time to the percentage of employed females in the same age group
who were part-time workers. An illustration of this is included
in the following pages.
Tests may be performed at various levels of significance. A significance
level is the probability of concluding that the characteristics
are different when, in fact, they are the same. To conclude that
two parameters are different at the 0.10 level of significance,
the absolute value of the estimated difference between characteristics
must be greater than or equal to 1.645 times the standard error
of the difference.
The Census Bureau uses 90-percent confidence intervals and 0.10
levels of significance to determine statistical validity. Consult
standard statistical textbooks for alternative criteria.
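The decision rule for the 0.10 level can be written as a small Python check. This is a minimal sketch: the function name is hypothetical, and the sample inputs are borrowed from the part-time worker illustration later in this document.

```python
Z_90 = 1.645  # multiplier the text gives for the 0.10 level of significance


def differs_at_10_percent(estimate_x: float, estimate_y: float,
                          se_difference: float) -> bool:
    """True when |x - y| is at least 1.645 standard errors of the difference."""
    return abs(estimate_x - estimate_y) >= Z_90 * se_difference


# Percentages of 24.1 and 37.2 with a standard error of the difference of 1.3.
print(differs_at_10_percent(24.1, 37.2, 1.3))
# A much smaller gap with the same standard error (hypothetical second value).
print(differs_at_10_percent(24.1, 25.0, 1.3))
```

For a stricter criterion, replace 1.645 with the multiplier for the chosen level.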
Standard errors of estimated numbers. The approximate standard error, $s_x$, of an estimated number from this microdata file, with the exception of school enrollment estimates, can be obtained using this formula:
$$ s_x = \sqrt{ax^2 + bx} \qquad (1) $$
Here x is the size of the estimate and a and b are the parameters
in Table 2 associated with the particular type of characteristic.
When calculating standard errors from crosstabulations involving
different characteristics, use the set of parameters for the characteristic
which will give the largest standard error.
Illustration
Suppose there were 6,000,000 unemployed men in the civilian labor
force. Use the appropriate parameters from Table 2 and formula
(1) to get
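| Number, x | 6,000,000 |
| a parameter | -0.000018 |
| b parameter | 2,957 |
| Standard error | 131,000 |
| 90% conf. int. | 5,785,000 to 6,215,000 |
The standard error is calculated as
$$ s_x = \sqrt{-0.000018 \times 6{,}000{,}000^2 + 2{,}957 \times 6{,}000{,}000} \approx 131{,}000 $$
The 90-percent confidence interval is calculated as 6,000,000
± 1.645 × 131,000.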
A conclusion that the average estimate derived from all possible
samples lies within a range computed in this way would be correct
for roughly 90 percent of all possible samples.
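The illustration can be reproduced with a short Python sketch. It assumes formula (1) has the form sqrt(a·x² + b·x), which matches the published standard error of 131,000. The function names are hypothetical.

```python
import math

Z_90 = 1.645  # multiplier for a 90-percent confidence interval


def se_estimated_number(x: float, a: float, b: float) -> float:
    """Assumed form of formula (1): s_x = sqrt(a*x^2 + b*x)."""
    return math.sqrt(a * x**2 + b * x)


def confidence_interval_90(estimate: float, se: float) -> tuple[float, float]:
    """Interval from 1.645 standard errors below to 1.645 above the estimate."""
    half_width = Z_90 * se
    return estimate - half_width, estimate + half_width


x = 6_000_000  # unemployed men, from the illustration
se = se_estimated_number(x, a=-0.000018, b=2_957)
se_rounded = round(se, -3)  # the illustration rounds to the nearest thousand
low, high = confidence_interval_90(x, se_rounded)
print(f"standard error ~ {se_rounded:,.0f}")
print(f"90% interval ~ {low:,.0f} to {high:,.0f}")
```

The interval is built from the rounded standard error, as in the illustration. Using the unrounded value moves the bounds only slightly.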
Standard errors of estimated school enrollment numbers. The approximate standard error, $s_x$, of an estimated school enrollment number from this microdata file can be obtained using the formula
$$ s_x = \sqrt{-\frac{b}{T}x^2 + bx} \qquad (2) $$
Here x is the size of the estimate, T is the total number of persons
in a specific age group and b is the parameter in Table 3 associated
with the particular type of characteristic. If T is not known,
use 100,000,000 for Total or White and 10,000,000 for Black or
Hispanic. When calculating standard errors for numbers from
cross-tabulations involving different characteristics, use the
set of parameters for the characteristic which will give the largest
standard error.
Illustration
Suppose that 4,274,000 children aged 3 and 4 were enrolled in school,
out of 6,711,000 children in that age group, in October 1996. Use the appropriate b parameter from Table 3 and formula (2) to get
| Number, x | 4,274,000 |
| Total, T | 6,711,000 |
| b parameter | 3,184 |
| Standard error | 70,000 |
| 90% conf. int. | 4,159,000 to 4,389,000 |
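The standard error is calculated as
$$ s_x = \sqrt{-\frac{3{,}184}{6{,}711{,}000} \times 4{,}274{,}000^2 + 3{,}184 \times 4{,}274{,}000} \approx 70{,}000 $$
The 90-percent confidence interval for this estimate is from 4,159,000
to 4,389,000, i.e., 4,274,000 ± 1.645 × 70,000.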
Therefore, a conclusion that the average estimate derived from
all possible samples lies within a range computed in this way
would be correct for roughly 90 percent of all possible samples.
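The school enrollment calculation can be sketched the same way. The sketch assumes formula (2) has the form sqrt(-(b/T)·x² + b·x), which reproduces the published standard error of 70,000. The function name and the `DEFAULT_TOTALS` lookup are hypothetical conveniences. The fallback values themselves come from the text.

```python
import math

# Fallback values of T that the text gives for use when T is not known.
DEFAULT_TOTALS = {
    "Total": 100_000_000,
    "White": 100_000_000,
    "Black": 10_000_000,
    "Hispanic": 10_000_000,
}


def se_school_enrollment(x: float, total: float, b: float) -> float:
    """Assumed form of formula (2): s_x = sqrt(-(b/T)*x^2 + b*x)."""
    return math.sqrt(-(b / total) * x**2 + b * x)


x, total, b = 4_274_000, 6_711_000, 3_184  # figures from the illustration
se = se_school_enrollment(x, total, b)
print(f"standard error ~ {round(se, -3):,.0f}")

# With the fallback T the subtracted term shrinks, so the result is larger.
se_fallback = se_school_enrollment(x, DEFAULT_TOTALS["Total"], b)
print(f"standard error with fallback T ~ {se_fallback:,.0f}")
```

Because the fallback totals are much larger than a single age group, using them gives a more conservative (larger) standard error than the actual T.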
Standard Errors of Estimated Percentages. The reliability
of an estimated percentage, computed using sample data for both
numerator and denominator, depends on the size of the percentage
and its base. Estimated percentages are relatively more reliable
than the corresponding estimates of the numerators of the percentages,
particularly if the percentages are 50 percent or more. When the
numerator and denominator of the percentage are in different categories,
use the parameter from Table 2 or 3 indicated by the numerator.
The approximate standard error, $s_{x,p}$, of an estimated percentage can be obtained by use of the formula
$$ s_{x,p} = \sqrt{\frac{b}{x}\,p\,(100 - p)} \qquad (3) $$
Here x is the total number of persons, families, households, or
unrelated individuals in the base of the percentage, p is the
percentage (0 ≤ p ≤ 100), and b is the parameter in Table 2 or 3 associated with the
characteristic in the numerator of the percentage.
Illustration
Suppose there were 15,016,000 persons aged 18 to 21, and that
44.9 percent were enrolled in college. Use the appropriate parameter
from Table 3 and formula (3) to get
| Percentage, p | 44.9 |
| Base, x | 15,016,000 |
| b parameter | 2,766 |
| Standard error | 0.7 |
| 90% conf. int. | 43.7 to 46.1 |
The standard error is calculated as
$$ s_{x,p} = \sqrt{\frac{2{,}766}{15{,}016{,}000} \times 44.9 \times (100 - 44.9)} \approx 0.7 $$
The 90-percent confidence interval for the estimated percentage
of persons aged 18 to 21 in 1996 enrolled in college is from 43.7
to 46.1 percent, i.e., 44.9 ± 1.645 × 0.7.
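A Python sketch of the percentage calculation follows. It assumes formula (3) has the form sqrt((b/x)·p·(100 − p)), which matches the 0.7 shown in the illustration. The function name is hypothetical.

```python
import math

Z_90 = 1.645  # multiplier for a 90-percent confidence interval


def se_percentage(p: float, base: float, b: float) -> float:
    """Assumed form of formula (3): s_{x,p} = sqrt((b/x) * p * (100 - p))."""
    if not 0 <= p <= 100:
        raise ValueError("p must be a percentage between 0 and 100")
    return math.sqrt((b / base) * p * (100 - p))


p, base, b = 44.9, 15_016_000, 2_766  # figures from the illustration
se = round(se_percentage(p, base, b), 1)  # the illustration rounds to 0.7
low, high = p - Z_90 * se, p + Z_90 * se
print(f"standard error ~ {se}")
print(f"90% interval ~ {low:.1f} to {high:.1f}")
```

The product p·(100 − p) peaks at p = 50, so for a fixed base the absolute standard error is largest near 50 percent and shrinks toward 0 or 100 percent.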
Standard Error of a Difference. The standard error of the difference between two sample estimates is approximately equal to
$$ s_{x-y} = \sqrt{s_x^2 + s_y^2} \qquad (4) $$
where $s_x$ and $s_y$ are the standard errors
of the estimates, x and y. The estimates can be numbers, percentages,
ratios, etc. This will result in accurate estimates of the standard
error of the same characteristic in two different areas, or for
the difference between separate and uncorrelated characteristics
in the same area. However, if there is a high positive (negative)
correlation between the two characteristics, the formula will
overestimate (underestimate) the true standard error.
Illustration
Suppose that of 6,285,000 employed men between 20-24 years of
age, 1,516,000 or 24.1 percent were part-time workers, and of
the 5,824,000 employed women between 20-24 years of age, 2,169,000
or 37.2 percent were part-time workers. Use the appropriate parameters
from Table 2 and formulas (3) and (4) to get
| | x | y | difference |
| Percentage, p | 24.1 | 37.2 | 13.1 |
| Number, x | 6,285,000 | 5,824,000 | - |
| b parameter | 2,764 | 2,530 | - |
| Standard error | 0.9 | 1.0 | 1.3 |
| 90% conf. int. | 22.6 to 25.6 | 35.6 to 38.8 | 11.0 to 15.2 |
The standard error of the difference is calculated as
$$ s_{x-y} = \sqrt{0.9^2 + 1.0^2} \approx 1.3 $$
The 90-percent confidence interval around the difference is calculated as 13.1 ± 1.645 × 1.3. Since this interval does not include zero, we can conclude with 90 percent confidence that the percentage of part-time women workers between 20-24 years of age is greater than the percentage of part-time men workers between 20-24 years of age.
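The difference calculation can be sketched in Python as follows. It assumes formula (4) combines the two standard errors as the square root of the sum of their squares, which matches the 1.3 in the illustration and holds only for uncorrelated estimates. The function name is hypothetical.

```python
import math

Z_90 = 1.645  # multiplier for a 90-percent confidence interval


def se_difference(se_x: float, se_y: float) -> float:
    """Assumed form of formula (4): sqrt(s_x^2 + s_y^2)."""
    return math.sqrt(se_x**2 + se_y**2)


# Part-time percentages and standard errors from the illustration.
p_men, p_women = 24.1, 37.2
diff = p_women - p_men
se_diff = round(se_difference(0.9, 1.0), 1)  # the illustration rounds to 1.3
low, high = diff - Z_90 * se_diff, diff + Z_90 * se_diff
interval_excludes_zero = low > 0 or high < 0

print(f"difference ~ {diff:.1f}, standard error ~ {se_diff}")
print(f"90% interval ~ {low:.1f} to {high:.1f}")
print(f"interval excludes zero: {interval_excludes_zero}")
```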
| Characteristics - October 1996 | ||
| Characteristic | ||
| Labor Force and Not In Labor Force Data Other than Agricultural Employment and Unemployment | ||
| Total 1 | ||
| - Men 1 | ||
| - Women | ||
| - Both sexes, 16 to 19 years | ||
| White 1 | ||
| - Men | ||
| - Women | ||
| - Both sexes, 16 to 19 years | ||
| Black | ||
| - Men | ||
| - Women | ||
| - Both sexes, 16 to 19 years | ||
| Hispanic origin | ||
| Not In Labor Force (use only for Total, Total Men, and White) | ||
| Agricultural Employment | ||
| Total or White | ||
| - Men | ||
| - Women or Both sexes, 16 to 19 years | ||
| Black | ||
| Hispanic origin | ||
| - Total or Women | ||
| - Men or Both sexes, 16 to 19 years | ||
| Unemployment | ||
| Total or White | ||
| Black | ||
| Hispanic origin | ||
Note: These parameters are to be applied to basic CPS monthly labor force estimates.
1 For not in labor force characteristics, use the Not
In Labor Force parameters.
| Characteristics | |||
| Persons Enrolled in School: | |||
| - Total | |||
| - Children 13 and under | |||
| Marital Status | |||
| Household Characteristics: | |||
| - Head, Wife, or Primary Individual | |||
| - Child or Other Relative in Primary Family, Secondary Family Member | |||
| Income, Earnings | |||
Notes: The b parameters should be multiplied by 1.5 for nonmetropolitan residence categories.
The b parameters should be multiplied by the factors in Table 4 for regional data.
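The effect of the nonmetropolitan adjustment can be sketched in Python. The sketch assumes the percentage formula has the form sqrt((b/x)·p·(100 − p)) and reuses the 18-to-21 college enrollment figures purely as hypothetical inputs. The function and constant names are not part of the source.

```python
import math

NONMETRO_FACTOR = 1.5  # multiplier for b stated in the note above


def se_percentage(p: float, base: float, b: float) -> float:
    """Assumed percentage formula: sqrt((b/x) * p * (100 - p))."""
    return math.sqrt((b / base) * p * (100 - p))


b = 2_766  # b parameter from the college enrollment illustration
se_unadjusted = se_percentage(44.9, 15_016_000, b)
se_nonmetro = se_percentage(44.9, 15_016_000, b * NONMETRO_FACTOR)

# Under this formula the standard error grows with the square root of b.
print(f"ratio of standard errors: {se_nonmetro / se_unadjusted:.3f}")
```

Under the assumed formula, the standard error is proportional to the square root of b, so multiplying b by 1.5 raises the standard error by a factor of sqrt(1.5), about 22 percent. A regional factor from Table 4 would scale b in the same way.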
| Table 4. Regional Factors to Apply to 1996 b Parameters | |
| U. S. Totals: | |
| Regions: | |
| Northeast | |
| Midwest | |
| South | |
| West | |
School Enrollment Estimates Prior to 1996 | ||||
| 1994-1995 | ||||
| 1990-1993 | ||||
| 1988-1989 | ||||
| 1985-1987 | ||||
| 1982-1984 | ||||
| 1977-1981 | ||||
| 1967-1976 | ||||
| 1957-1966 | ||||
| Before 1956 | ||||