Methodology and Documentation

Source and Accuracy Statement for the May 1995 CPS Microdata File for Race and Ethnicity


SOURCE OF DATA

The data for this microdata file come from the May 1995 Current Population Survey (CPS). This month's survey uses two sets of questions, the basic CPS and the supplement. The Bureau of the Census conducts the basic CPS every month and asks supplementary questions during certain months.

Basic CPS. The basic CPS collects primarily labor force data about the civilian noninstitutional population. Interviewers ask questions concerning labor force participation about each member 15 years old and over in every sample household.

The present CPS sample was selected from the 1990 Decennial Census files with coverage in all 50 states and the District of Columbia. The sample is continually updated to account for new residential construction. The United States was divided into 2,007 geographic areas. In most states, a geographic area consisted of a county or several contiguous counties. In some areas of New England and Hawaii, minor civil divisions are used instead of counties. A total of 792 geographic areas was selected for sample. About 58,000 occupied households are eligible for interview every month. Interviewers are unable to obtain interviews at about 3,500 of these units. This occurs when the occupants refuse to participate, are not found at home after repeated calls, or are unavailable for some other reason.

Since the introduction of the CPS, the Bureau of the Census has redesigned the CPS sample after the Decennial Censuses. These redesigns have improved the quality and accuracy of the data and have satisfied changing data needs. By May 1995, the CPS sample based on the 1990 census was almost entirely phased-in. The phase-in procedure started in April 1994 and was completed in July 1995.

May 1995 supplement. In addition to the basic CPS questions, interviewers asked supplementary questions on race and ethnicity. This is the first and only time that this particular supplement was performed. The purpose of the May 1995 supplement was to test the effect of different sets of questions on the collection of racial and ethnic information. Data were collected on all members in every sample household. Household members 15 years old and older were asked to respond for themselves and parents answered for children too young to answer for themselves. If a household member was not available, a proxy could respond for that household member except for questions which were self-response only.

The supplement sample organized the basic CPS into 4 equal panels, each containing 25 percent of the sample or approximately 15,000 households. The questions in each panel differed; all respondents within a household were asked the same questions. The four panels represent a two-by-two experimental design focusing on separate race and Hispanic origin questions versus a combined question for race and Hispanic origin and a multiracial category versus no multiracial category on the race question. The panels were as follows:

Panel 1: Separate race and Hispanic-origin questions; no multiracial category

Panel 2: Separate race and Hispanic-origin questions with a multiracial category

Panel 3: A combined race and Hispanic-origin question; no multiracial category

Panel 4: A combined race and Hispanic-origin question with a multiracial category

Estimation procedure for supplement. This supplement's estimation procedure uses the CPS base weight and two noninterview adjustments, one for the CPS and one for the supplement. The CPS base weight, the inverse of the probability of selecting a housing unit for sample, includes an adjustment for areas where an interviewer finds more housing units than expected and selects a subsample. The CPS noninterview adjustment is applied after the CPS base weight to adjust the weights of interviewed households to account for households from which labor force information was not obtained. The CPS nonresponse rate was 6.5 percent for May 1995. The supplement noninterview adjustment is applied after the CPS noninterview adjustment to adjust the weights of interviewed persons to account for persons from whom supplement information was not obtained. The May supplement nonresponse rate was 10.6 percent. Normally, other adjustments are made using data collected by the basic CPS (i.e., age, sex, race, Hispanic/Non-Hispanic, and state of residence) to inflate weighted sample results to independent estimates of the civilian noninstitutional population of the United States. However, the May supplement did not use these adjustments because they would distort the effects of the supplement's experimental design.
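The order of the two noninterview adjustments can be illustrated with a minimal sketch. It assumes a generic weighting-class adjustment, in which respondents in a cell absorb the weight of nonrespondents in the same cell. The cell definitions, the field names (cell, weight, cps_ok, supp_ok), and the records are hypothetical; this statement does not describe the cells the Bureau actually used, and the sketch treats households and persons as the same kind of record for brevity.

```python
from collections import defaultdict

def apply_noninterview_adjustment(records, responded_key, cell_key="cell"):
    """Weighting-class adjustment (an assumed, generic form).

    Within each cell, respondents' weights are multiplied by
    (weight of all eligible records) / (weight of responding records).
    Returns new records for respondents only.
    """
    total = defaultdict(float)       # weight of all eligible records per cell
    responding = defaultdict(float)  # weight of responding records per cell
    for rec in records:
        total[rec[cell_key]] += rec["weight"]
        if rec[responded_key]:
            responding[rec[cell_key]] += rec["weight"]

    adjusted = []
    for rec in records:
        if rec[responded_key]:
            factor = total[rec[cell_key]] / responding[rec[cell_key]]
            adjusted.append({**rec, "weight": rec["weight"] * factor})
    return adjusted

# Made-up records; "weight" starts as the base weight.
records = [
    {"cell": "A", "weight": 2000.0, "cps_ok": True,  "supp_ok": True},
    {"cell": "A", "weight": 2000.0, "cps_ok": True,  "supp_ok": False},
    {"cell": "A", "weight": 2000.0, "cps_ok": False, "supp_ok": False},
    {"cell": "B", "weight": 1500.0, "cps_ok": True,  "supp_ok": True},
]

# Step 1: CPS noninterview adjustment. Step 2: supplement noninterview adjustment.
after_cps = apply_noninterview_adjustment(records, "cps_ok")
final = apply_noninterview_adjustment(after_cps, "supp_ok")

for rec in final:
    print(rec["cell"], rec["weight"])
```

In this sketch the total weight (7,500) is preserved through both steps. No further inflation to independent population estimates is applied, which mirrors the omission described above.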

Racial and ethnic proportions from the supplement must be interpreted within the context of the experimental design, where only comparisons among the four panels are intended. Since each panel represents 25 percent of the basic CPS sample, the supplement's estimation procedure does not inflate each panel to represent the entire CPS sample. Therefore, data analysis should always be done focusing on percentages and not estimated levels.
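Because analysis should rest on percentages within a panel rather than on estimated levels, a weighted percentage is the natural quantity to compute. The following is a minimal sketch; the field names (panel, weight, hispanic) and the records are hypothetical and do not correspond to variable names on the microdata file.

```python
def weighted_percentage(records, panel, has_characteristic):
    """Percentage of a panel's weighted persons for whom has_characteristic is true."""
    base = 0.0
    numerator = 0.0
    for rec in records:
        if rec["panel"] != panel:
            continue
        base += rec["weight"]
        if has_characteristic(rec):
            numerator += rec["weight"]
    if base == 0:
        raise ValueError(f"no weighted persons in panel {panel}")
    return 100.0 * numerator / base

# Made-up person records, not values from the microdata file.
people = [
    {"panel": 1, "weight": 1000.0, "hispanic": True},
    {"panel": 1, "weight": 3000.0, "hispanic": False},
    {"panel": 2, "weight": 2000.0, "hispanic": True},
    {"panel": 2, "weight": 2000.0, "hispanic": False},
]

pct_panel_1 = weighted_percentage(people, 1, lambda r: r["hispanic"])  # 25.0
pct_panel_2 = weighted_percentage(people, 2, lambda r: r["hispanic"])  # 50.0
print(pct_panel_1, pct_panel_2)
```

Both the numerator and the base come from the same panel, so the fact that each panel carries only about a quarter of the CPS sample does not affect the result.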

ACCURACY OF THE ESTIMATES

Since the CPS estimates come from a sample, they may differ from figures from a complete census using the same questionnaires, instructions, and enumerators. A sample survey estimate has two possible types of error: sampling and nonsampling. The accuracy of an estimate depends on both types of error, but the full extent of the nonsampling error is unknown. Consequently, one should be particularly careful when interpreting results based on a relatively small number of cases or on small differences between estimates. The standard errors for CPS estimates primarily indicate the magnitude of sampling error. They also may partially measure the effect of some nonsampling errors in responses and enumeration, but do not measure systematic biases in the data. (Bias is the average over all possible samples of the differences between the sample estimates and the desired value.)

Nonsampling variability. Sources of nonsampling error include:

· Inability to get information about all sample cases.

· Definitional difficulties.

· Differences in the interpretation of questions.

· Respondents' inability or unwillingness to provide correct information.

· Respondents' inability to recall information.

· Errors made in data collection such as recording and coding data.

· Errors made in processing the data.

· Errors made in estimating values for missing data.

· Failure to represent all units with the sample (undercoverage).

CPS undercoverage results from missed housing units and missed persons within sample households. Compared to the level of the 1990 Decennial Census, overall CPS undercoverage is about 8 percent. CPS undercoverage varies with age, sex, and race. Generally, undercoverage is larger for males than for females and larger for Blacks and other races combined than for Whites. When the second-stage ratio estimate is used in the estimation procedure, it partially corrects for bias due to undercoverage. However, biases exist in the estimates to the extent that missed persons in missed households or missed persons in interviewed households have different characteristics from those of interviewed persons in the same age-sex-race-origin-state group.

A common measure of survey coverage is the coverage ratio, the estimated population before the survey estimate divided by independent estimates of the population. Table A shows CPS coverage ratios for age-sex-race groups for a typical month. The CPS coverage ratios can exhibit some variability from month to month.


Table A. CPS Coverage Ratios

            Non-Black          Black              All Persons
Age         M       F          M       F          M       F       Total
0-14        0.929   0.964      0.850   0.838      0.916   0.943   0.929
15          0.933   0.895      0.763   0.824      0.905   0.883   0.895
16-19       0.881   0.891      0.711   0.802      0.855   0.877   0.866
20-29       0.847   0.897      0.660   0.811      0.823   0.884   0.854
30-39       0.904   0.931      0.680   0.845      0.877   0.920   0.899
40-49       0.928   0.966      0.816   0.911      0.917   0.959   0.938
50-59       0.953   0.974      0.896   0.927      0.948   0.969   0.959
60-64       0.961   0.941      0.954   0.953      0.960   0.942   0.950
65-69       0.919   0.972      0.982   0.984      0.924   0.973   0.951
70+         0.993   1.004      0.996   0.979      0.993   1.002   0.998
15+         0.914   0.945      0.767   0.874      0.898   0.927   0.918
0+          0.918   0.949      0.793   0.864      0.902   0.931   0.921
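The relationship between a coverage ratio and undercoverage can be sketched as follows. The function names and the population counts are hypothetical and chosen only to show the arithmetic; the ratio 0.921 is the all-persons total for ages 0+ in Table A.

```python
def coverage_ratio(survey_estimate, independent_estimate):
    """Survey-based population estimate divided by the independent estimate."""
    if independent_estimate <= 0:
        raise ValueError("independent estimate must be positive")
    return survey_estimate / independent_estimate

def undercoverage_percent(ratio):
    """Shortfall of the survey estimate relative to the independent estimate, in percent."""
    return 100.0 * (1.0 - ratio)

# Hypothetical counts, chosen only to illustrate the arithmetic.
ratio = coverage_ratio(921_000, 1_000_000)
print(round(ratio, 3))                         # 0.921

# All-persons total for ages 0+ from Table A.
print(round(undercoverage_percent(0.921), 1))  # 7.9
```

A ratio of 0.921 corresponds to undercoverage of roughly 8 percent, in line with the overall figure quoted above.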

For additional information on nonsampling error including the possible impact on CPS data when known, refer to Statistical Policy Working Paper 3, An Error Profile: Employment as Measured by the Current Population Survey, Office of Federal Statistical Policy and Standards, U.S. Department of Commerce, 1978 and Technical Paper 40, The Current Population Survey: Design and Methodology, Bureau of the Census, U.S. Department of Commerce.

Comparability of data. Data obtained from the CPS and other sources are not entirely comparable. This results from differences in interviewer training and experience and in differing survey processes. This is an example of nonsampling variability not reflected in the standard errors. Use caution when comparing results from different sources.

A number of changes were made in data collection and estimation procedures beginning with the January 1994 CPS. The major change was the use of a new questionnaire. The questionnaire was redesigned to measure the official labor force concepts more precisely, to expand the amount of data available, to implement several definitional changes, and to adapt to a computer-assisted interviewing environment. The supplemental questions are also computerized. Due to these and other changes, one should use caution when comparing estimates from data collected in 1994 and later years with estimates from earlier years.

For more information on the introduction of the new questionnaire and the modernized data collection methods, see "Revisions in the Current Population Survey Effective January 1994" in the February 1994 issue of Employment and Earnings published by the Bureau of Labor Statistics.

Data users should be aware of the effect of the redesigned CPS sample phase-in period from April 1994 through June 1995 on the metropolitan/nonmetropolitan estimates. During this phase-in period, CPS data were collected from sample designs based on both the 1980 and 1990 censuses. While most CPS estimates have been unaffected by this mixed sample, metropolitan and nonmetropolitan estimates have been affected. The 1990 sample cases were recoded to reflect the 1980 metropolitan/nonmetropolitan definitions to allow the estimates to be comparable with earlier data. The gross error rate for the conversions of central cities/suburbs is not expected to exceed 5%.

Note when using small estimates. Because of the large standard errors involved, the percent distributions probably do not reveal useful information when computed on an estimated base smaller than 75,000. Take care in the interpretation of small differences. For instance, even a small amount of nonsampling error can cause a borderline difference to appear significant or not, thus distorting a seemingly valid hypothesis test.

Sampling variability. Sampling variability is variation that occurred by chance because a sample was surveyed rather than the entire population. Standard errors, as calculated below, are primarily measures of sampling variability, but they may include some nonsampling error.

Standard errors and their use. A number of approximations are required to derive, at a moderate cost, standard errors applicable to estimates from this microdata file. Instead of providing an individual standard error for each estimate, b parameters are provided to calculate standard errors for each type of characteristic. These parameters are in Table B.

The sample estimate and its standard error enable one to construct a confidence interval. A confidence interval is a range that would include the average result of all possible samples with a known probability. For example, if all possible samples were surveyed under essentially the same general conditions and using the same sample design, and if an estimate and its standard error were calculated from each sample, then approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average result of all possible samples.

A particular confidence interval may or may not contain the average estimate derived from all possible samples. However, one can say with specified confidence that the interval includes the average estimate calculated from all possible samples.
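The interval construction described above amounts to a few lines of arithmetic. The sketch below uses the estimate and standard error from Illustration #1 later in this statement; the function name is hypothetical.

```python
Z_90 = 1.645  # multiplier for a 90-percent confidence interval

def confidence_interval(estimate, standard_error, z=Z_90):
    """Return (lower, upper) bounds: estimate -/+ z standard errors."""
    margin = z * standard_error
    return estimate - margin, estimate + margin

# Values from Illustration #1: 10.79 percent with a standard error of 0.34.
low, high = confidence_interval(10.79, 0.34)
print(f"{low:.2f} to {high:.2f}")  # 10.23 to 11.35
```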

Standard errors may also be used to perform hypothesis testing. This is a procedure for distinguishing between population parameters using sample estimates. One common type of hypothesis is that two population parameters are different. An example of this would be comparing the percentage of persons in panel 1 who were Hispanic with the percentage of persons in panel 2 who were Hispanic.

Tests may be performed at various levels of significance. A significance level is the probability of concluding that the characteristics are different when, in fact, they are the same. To conclude that two parameters are different at the 0.10 level of significance, for example, the absolute value of the estimated difference between characteristics must be greater than or equal to 1.645 times the standard error of the difference.
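The decision rule at the 0.10 level can be written as a short check. This is a sketch with a hypothetical function name. The first call uses the percentages and the standard error of the difference from Illustration #3 later in this statement; the second uses a made-up pair of percentages for contrast.

```python
def differs_at_level(estimate_a, estimate_b, se_difference, z=1.645):
    """True if |a - b| is at least z standard errors of the difference.

    z = 1.645 corresponds to the 0.10 level of significance.
    """
    return abs(estimate_a - estimate_b) >= z * se_difference

# Illustration #3: 10.41 vs. 8.58 percent, standard error of the difference 0.39.
print(differs_at_level(10.41, 8.58, 0.39))   # True  (1.83 >= 0.64)

# Made-up comparison with the same standard error.
print(differs_at_level(10.41, 10.00, 0.39))  # False (0.41 < 0.64)
```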

The Census Bureau uses 90-percent confidence intervals and 0.10 levels of significance to determine statistical validity. Consult standard statistical textbooks for alternative criteria.

Standard errors of estimated percentages. The reliability of an estimated percentage, computed using sample data from both numerator and denominator, depends on both the size of the percentage and its base. Estimated percentages are relatively more reliable than the corresponding estimates of the numerators of the percentages, particularly if the percentages are 50 percent or more. When the numerator and denominator of the percentage are in different categories, use the parameter from Table B indicated by the numerator.

The approximate standard error, $s_{x,p}$, of an estimated percentage can be obtained by use of the formula

$$s_{x,p} = \sqrt{\frac{b}{x}\, p\,(100 - p)} \qquad (1)$$

Here x is the total number of persons in the base of the percentage, p is the percentage (0 ≤ p ≤ 100), and b is the parameter in Table B associated with the characteristic in the numerator of the percentage.

Illustration #1

Of the 60,207,300 persons in panel 1, 10.79 percent identified themselves as Hispanic. Use the appropriate parameter from Table B and formula (1) to get

Percentage, p       10.79
Base, x             60,207,300
b parameter         7,214
Standard error      0.34
90% conf. int.      10.23 to 11.35

The standard error is calculated as

$$s_{x,p} = \sqrt{\frac{7{,}214}{60{,}207{,}300} \times 10.79 \times (100 - 10.79)} = 0.34$$

The 90-percent confidence interval of the percentage of Hispanic persons in panel 1 is calculated as 10.79 ± 1.645 × 0.34.

Illustration #2

Of the 9,605,600 persons who identified themselves as Black in all panels, 28.07 percent preferred the term "African American" when given a list of terms describing their racial group. Use the appropriate parameter from Table B and formula (1) to get

Percentage, p       28.07
Base, x             9,605,600
b parameter         2,975
Standard error      0.79
90% conf. int.      26.77 to 29.37

The standard error is calculated as

$$s_{x,p} = \sqrt{\frac{2{,}975}{9{,}605{,}600} \times 28.07 \times (100 - 28.07)} = 0.79$$

The 90-percent confidence interval of the percentage of persons in all panels who identified themselves as Black and preferred the term "African American" to describe their racial group is calculated as 28.07 ± 1.645 × 0.79.
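The standard errors in Illustrations #1 and #2 can be checked with a short sketch. It assumes formula (1) has the form sqrt((b / x) × p × (100 − p)), which reproduces both published values; the function name is hypothetical.

```python
import math

def se_percentage(p, base, b):
    """Approximate standard error of an estimated percentage, per formula (1).

    p    -- the percentage, between 0 and 100
    base -- weighted number of persons in the base of the percentage
    b    -- the b parameter from Table B
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    if base <= 0:
        raise ValueError("base must be positive")
    return math.sqrt((b / base) * p * (100.0 - p))

# Illustration #1: Hispanic persons in panel 1.
se_1 = se_percentage(10.79, 60_207_300, 7_214)
print(round(se_1, 2))  # 0.34

# Illustration #2: Black persons in all panels preferring "African American".
se_2 = se_percentage(28.07, 9_605_600, 2_975)
print(round(se_2, 2))  # 0.79
```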

Standard error of a difference. The standard error of the difference between two sample percentages is approximately equal to

$$s_{x-y} = \sqrt{s_x^2 + s_y^2 - 2\,r\,s_x s_y} \qquad (2)$$

where $s_x$ and $s_y$ are the standard errors of the percentages, x and y. The correlation coefficient, r, is determined from Table C depending on the difference being calculated. This will represent the actual standard error quite accurately for the difference between percentages of the same characteristic in two different panels or for the difference between separate and uncorrelated characteristics in the same panel. However, if there is a high positive (negative) correlation between the two characteristics, the formula will overestimate (underestimate) the true standard error.

Illustration #3

Of the 59,973,600 persons in panel 2, 10.41 percent identified themselves as Hispanic. Of the 61,190,200 persons in panel 4, 8.58 percent identified themselves as Hispanic. Use the appropriate parameter from Table B, correlation coefficient from Table C, and formulas (1) and (2) to get

                    x                y                difference
Percentage, p       10.41            8.58             1.83
Number, x           59,973,600       61,190,200       -
b parameter         7,214            7,214            -
Standard error      0.33             0.30             0.39
90% conf. int.      9.87 to 10.95    8.09 to 9.07     1.19 to 2.47

The standard error of the difference is calculated as

$$s_{x-y} = \sqrt{0.33^2 + 0.30^2 - 2 \times 0.22 \times 0.33 \times 0.30} = 0.39$$

The 90-percent confidence interval around the difference is calculated as 1.83 ± 1.645 × 0.39. Since this interval does not include zero, we can conclude with 90 percent confidence that the percentage of persons in panel 2 who identified themselves as Hispanic is greater than the percentage of persons in panel 4 who identified themselves as Hispanic.
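The arithmetic in Illustration #3 can be reproduced with a short sketch. It assumes formula (2) has the form sqrt(s_x² + s_y² − 2·r·s_x·s_y), which matches the published result when r = 0.22 from Table C. The function name is hypothetical, and the standard error of the difference is rounded to two decimals before the interval is built, mirroring the rounding in the illustration.

```python
import math

def se_difference(se_x, se_y, r=0.0):
    """Approximate standard error of a difference between two percentages, per formula (2)."""
    return math.sqrt(se_x ** 2 + se_y ** 2 - 2.0 * r * se_x * se_y)

# Standard errors from Illustration #3 (panel 2 and panel 4), r from Table C.
se_d = round(se_difference(0.33, 0.30, r=0.22), 2)
print(se_d)  # 0.39

# 90-percent confidence interval around the difference of 1.83 points.
difference = 10.41 - 8.58
low = difference - 1.645 * se_d
high = difference + 1.645 * se_d
print(f"{low:.2f} to {high:.2f}")  # 1.19 to 2.47
```

With r = 0 the same inputs give a standard error of about 0.45, so the choice of correlation coefficient noticeably changes the width of the interval.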


Table B. Parameters for Computation of Standard Errors for May 1995 Supplement

Characteristic                                                b parameter

Estimates Calculated Within a Panel for All Questions
except Self-Response Only Questions *                         7,214

Estimates Calculated Within a Panel or Combining Panels
for Self-Response Only Questions *                            2,975


Table C. Correlation Coefficients for Computation of Differences for May 1995 Supplement

Level of Estimate                                             r

Comparisons Within the Same Panel                             0.00

Comparisons of Panel Differences for
- Estimates Greater than 5% of Total Population               0.22
- Estimates Less than 5% of Total Population                  0.00

Comparisons Combining Panels (applies to
Self-Response Only Questions *)                               0.00

* The following variables were from self-response only questions: PRSHSPRA, PRSTERMA, PRSTERMB, PRSTERMH, PRSTERMM, PRSTERMW, PUSA7A, PUSA7C, PUSA7D, PUSA8A, PUSA8C, PUSA8E, PUSA8G, PUSB9A, PUSB10A, PUSB10B, PUSB11A, PUSB11C, PUSB11E, PUSB11G, PUSC2E, PUSC3F, PUSC7A, PUSC7C, PUSC7E, PUSC7G, PUSC7I, PUSD2E, PUSD3F, PUSD4A, PUSD8A, PUSD8C, PUSD8E, PUSD8G, and PUSD8I.
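Choosing a b parameter from Table B and a correlation coefficient from Table C can be expressed as two small lookups. This is a sketch only: the function names, the comparison labels, and the placeholder variable name NOT_A_LISTED_VARIABLE are hypothetical. Table C does not say how an estimate of exactly 5 percent should be treated, so the sketch assumes r = 0.00 in that case.

```python
# Self-response only variables, as listed in the footnote above.
SELF_RESPONSE_ONLY = frozenset("""
    PRSHSPRA PRSTERMA PRSTERMB PRSTERMH PRSTERMM PRSTERMW
    PUSA7A PUSA7C PUSA7D PUSA8A PUSA8C PUSA8E PUSA8G
    PUSB9A PUSB10A PUSB10B PUSB11A PUSB11C PUSB11E PUSB11G
    PUSC2E PUSC3F PUSC7A PUSC7C PUSC7E PUSC7G PUSC7I
    PUSD2E PUSD3F PUSD4A PUSD8A PUSD8C PUSD8E PUSD8G PUSD8I
""".split())

def b_parameter(variable):
    """b parameter from Table B for the characteristic in the numerator.

    For variables not in the self-response list, Table B gives a value
    only for estimates calculated within a single panel.
    """
    return 2_975 if variable.upper() in SELF_RESPONSE_ONLY else 7_214

def correlation(comparison, percent_of_population=None):
    """Correlation coefficient r from Table C.

    comparison -- "within_panel", "across_panels", or "combining_panels"
                  (labels invented for this sketch)
    """
    if comparison == "across_panels":
        if percent_of_population is None:
            raise ValueError("percent_of_population is required for panel differences")
        # Assumption: exactly 5 percent is treated as the lower category.
        return 0.22 if percent_of_population > 5.0 else 0.00
    if comparison in ("within_panel", "combining_panels"):
        return 0.00
    raise ValueError(f"unknown comparison: {comparison}")

print(b_parameter("PRSTERMB"))               # 2975
print(b_parameter("NOT_A_LISTED_VARIABLE"))  # 7214
print(correlation("across_panels", 10.41))   # 0.22
```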



Source: U.S. Census Bureau
Author: Thomas Moore III-Census/DSMD
Contact: (ask.census.gov) CPS Help-Census/DSD/CPSB
Last revised: July 16, 1997
URL: http://www.bls.census.gov/cps/racethn/1995/ssrcacc.htm