Advance Survey Group
Number of Establishments
Auxiliaries (NAICS 484, 4931, 551114)
Publishers (NAICS 5111, 51223)
Electronic shopping mail order establishments (NAICS 4541)
Support activities for printing (NAICS 323120)
Mines (NAICS 2121, 2122, 2123)
Certainty establishments from the 2012 CFS
Other Large establishments
For the first four groups in the table above (Auxiliaries, Publishers, Electronic shopping, and Support activities for printing), the purpose was to identify those establishments that actually conduct shipping activities. In these groups, surveyed establishments that reported that they did not conduct any shipping activity were excluded from the eventual CFS sample universe. For the other categories the objective was to obtain an accurate measure of their shipping activity as well as information.
The 2012 NAICS industries covered in the 2017 CFS are listed in the following table:
Mining (Except Oil and Gas)
Beverage and Tobacco Product Manufacturing
Textile Product Mills
Leather and Allied Product Manufacturing
Wood Product Manufacturing
Printing and Related Support Activities
Petroleum and Coal Products Manufacturing
Plastics and Rubber Products Manufacturing
Nonmetallic Mineral Product Manufacturing
Primary Metal Manufacturing
Fabricated Metal Product Manufacturing
Computer and Electronic Product Manufacturing
Electrical Equipment, Appliance, and Component Manufacturing
Transportation Equipment Manufacturing
Furniture and Related Product Manufacturing
Motor vehicle and parts merchant wholesalers
Furniture and home furnishing merchant wholesalers
Lumber and other construction materials merchant wholesalers
Professional and commercial equipment and supplies merchant wholesalers
Metal and mineral (except petroleum) merchant wholesalers
Electrical and electronic goods merchant wholesalers
Hardware and plumbing merchant wholesalers
Machinery, equipment, and supplies merchant wholesalers
Miscellaneous durable goods merchant wholesalers
Paper and paper product merchant wholesalers
Drugs and druggists' sundries merchant wholesalers
Apparel, piece goods, and notions merchant wholesalers
Grocery and related product merchant wholesalers
Farm product raw material merchant wholesalers
Chemical and allied products merchant wholesalers
Petroleum and petroleum products merchant wholesalers
Beer, wine, and distilled alcoholic beverage merchant wholesalers
Miscellaneous nondurable goods merchant wholesalers
Electronic Shopping and Mail-Order Houses
General Freight Trucking
Specialized Freight Trucking
Warehousing and Storage
Newspaper, Periodical, Book, and Directory Publishers
Corporate, Subsidiary, and Regional Managing Offices
Likewise, shipments traversing the United States from a foreign location to another foreign location (e.g., from Canada to Mexico) are not included, nor are shipments from a foreign location to an initial U.S. location. However, imported products are included in the CFS from the point that they leave the importer’s initial U.S. location for shipment to another location. Shipments that are shipped through a foreign territory with both the origin and destination in the United States are included in the CFS data. The mileages calculated for these shipments exclude the foreign country segments (e.g., shipments from New York to Michigan through Canada do not include any mileages for Canada). Export shipments are included, with the domestic destination defined as the U.S. port, airport, or border crossing of exit from the United States. See the Mileage Calculation section for additional detail on how mileage estimates were developed.
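The scope rules above reduce to a test on where a shipment starts, a rule for what counts as the domestic destination, and a rule that drops foreign route segments from the mileage. The following Python sketch illustrates them. The `Shipment` record, its field names, and the (country, miles) route-segment representation are hypothetical and do not reflect the Census Bureau's processing systems.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Shipment:
    # Hypothetical record layout, for illustration only.
    origin_country: str                    # "US" or a foreign country code
    destination_country: str               # final destination country
    us_destination: Optional[str] = None   # domestic destination (non-exports)
    us_exit_gateway: Optional[str] = None  # U.S. port, airport, or border crossing (exports)


def in_cfs_scope(s: Shipment) -> bool:
    """A shipment is in scope only if it starts at a U.S. location.

    This excludes foreign-to-foreign shipments that cross the United States and
    imports moving from abroad to their initial U.S. location. An import that is
    later shipped onward from the importer's initial U.S. location has a U.S.
    origin and is therefore in scope.
    """
    return s.origin_country == "US"


def domestic_destination(s: Shipment) -> Optional[str]:
    """Domestic destination used by the CFS: the exit gateway for exports."""
    if not in_cfs_scope(s):
        return None
    if s.destination_country != "US":
        return s.us_exit_gateway
    return s.us_destination


def domestic_miles(segments: List[Tuple[str, float]]) -> float:
    """Sum route mileage, dropping legs that pass through a foreign country."""
    return sum(miles for country, miles in segments if country == "US")
```

For example, with hypothetical leg lengths, a New York to Michigan shipment routed through Canada keeps only its U.S. legs: `domestic_miles([("US", 300.0), ("CA", 250.0), ("US", 120.0)])` returns 420.0.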
Establishments on the Frame
- Metropolitan area: The state part of a selected metropolitan statistical area (MSA) or combined statistical area (CSA).
- The remainder of the state (ROS): The portion of a state containing the counties that are not included in the metropolitan area type CFS Areas defined above.
- Whole state: An entire state where no metropolitan area type CFS Areas are defined within the state. (The remainder of the state is the whole state.)
Geographic Stratum (CFS Area) Type
Sampled CFS Areas
Metropolitan area (CSA or MSA) state part
Remainder of the state (ROS) (1)
Whole state (AK, AR, ID, IA, ME, MS, MT, NM, ND, SD, VT, WV, WY)
Total number of CFS Areas
The industry strata were defined as follows. Within each of the geographic strata, we defined 48 industry groups based on the 2012 NAICS codes:
- Three mining (four-digit NAICS).
- Twenty-one manufacturing (three-digit NAICS).
- Eighteen wholesale (four-digit NAICS).
- Two retail (NAICS 4541 and 45431).
- One services (NAICS 5111 and 51223 combined).
- Three auxiliary (combinations of NAICS 484, 4931 and 551114).
For auxiliaries that responded to the Advance Survey and were found to be shippers, 132 primary strata were created, one in each CFS Area, combining NAICS 484, 4931, and 551114. For auxiliary establishments that did not respond to the Advance Survey, two separate sets of strata were created as follows:
- Up to 132 strata (one per CFS Area) for nonresponding truck transportation establishments and warehousing and storage establishments (NAICS 484 and NAICS 4931).
- Up to 132 strata (one per CFS Area) for nonresponding establishments in corporate, subsidiary, and regional managing offices (NAICS 551114).
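The industry-group counts and the three sets of auxiliary strata can be expressed compactly. The sketch below is illustrative only: the stratum labels and the function signature are hypothetical, and real CFS Areas would be identified by codes rather than the placeholder strings a caller might pass in.

```python
from typing import Optional, Tuple

# Industry groups defined within each geographic stratum (2012 NAICS basis).
INDUSTRY_GROUP_COUNTS = {
    "mining": 3,
    "manufacturing": 21,
    "wholesale": 18,
    "retail": 2,
    "services": 1,
    "auxiliary": 3,
}
assert sum(INDUSTRY_GROUP_COUNTS.values()) == 48


def auxiliary_stratum(naics: str, cfs_area: str, responded: bool,
                      is_shipper: bool = True) -> Optional[Tuple[str, str]]:
    """Return a (hypothetical) primary stratum key for an auxiliary establishment.

    Returns None when an Advance Survey responder reported no shipping activity,
    since such establishments were excluded from the sample universe.
    """
    if responded:
        if not is_shipper:
            return None
        # NAICS 484, 4931, and 551114 responders share one stratum per CFS Area.
        return ("aux-responder", cfs_area)
    if naics.startswith("484") or naics.startswith("4931"):
        return ("aux-nonresp-484-4931", cfs_area)
    if naics == "551114":
        return ("aux-nonresp-551114", cfs_area)
    raise ValueError(f"{naics} is not an auxiliary industry")
```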
To produce reliable estimates of shipments of hazardous materials (HAZMAT), twenty-one 6-digit NAICS industries with high amounts of HAZMAT shipments were identified and used to form primary strata. These industries were identified from the 2012 CFS data and, in general, were chosen because:
- They had a large (weighted) total value or total tonnage of hazardous materials.
- A high percentage of their (unweighted) shipments were HAZMAT shipments.
Fifteen of the 21 industries were made certainty strata and the remaining six industries were made into primary strata defined by state and the 6-digit NAICS code.
The table below shows the number and types of primary strata for the main, auxiliary, HAZMAT and special certainty components of the sample. Note that these are the number of strata before they are further stratified by measure of size (MOS) size class.
Number of Primary Strata
Number of Sample Establishments
Main (NAICS x CFS Area)
Advance survey responders
Advance survey non-responders – NAICS 484 & 4931
Advance survey non-responders – NAICS 551114
Certainty (15 industries)
Sampled (6 industries x state)
Special Certainty Strata
Air or water shipper in prior CFS
Establishment specifically identified to be included
- A target coefficient of variation (CV) for estimated total MOS was assigned to each primary stratum (geography by industry cell).
- Within each primary stratum, substrata defined by MOS were developed to minimize the sample size needed to achieve the target CV. The establishments in the largest MOS size class were taken with certainty. For the noncertainty substrata, the sample was allocated using Neyman allocation, which minimizes the sample size needed to achieve a target CV.
- Once the minimum sample sizes for each primary stratum were determined, these were added together and compared to the desired total sample size of 100,000. If the total was not close enough to 100,000, we multiplied all of the target CVs by a fixed factor and repeated the process until the total sample size was close to 100,000.
- The establishments in the geography by industry by MOS size class substrata were selected by simple random sampling without replacement. The total sample size was 103,877 establishments of which 51,266 were selected with certainty (see the table below).
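The allocation steps above can be sketched in Python. This is a simplified illustration, not the Census Bureau's allocation program. It assumes each noncertainty MOS substratum is summarized by its frame count N and the standard deviation S of MOS within it, and it uses the textbook minimum sample size for Neyman allocation under simple random sampling without replacement, n = (sum of N_k S_k)^2 / (V + sum of N_k S_k^2), where V = (CV x total MOS)^2. The common CV multiplier is found by bisection, which is one way of carrying out the "multiply and repeat" step; the text does not say how the factor was chosen. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Substratum:
    N: int      # frame establishments in this noncertainty MOS class
    S: float    # standard deviation of MOS within the class


@dataclass
class PrimaryStratum:
    total_mos: float              # total frame MOS, certainty units included
    n_certainty: int              # largest MOS class, taken with certainty
    substrata: List[Substratum]   # noncertainty MOS classes
    target_cv: float              # target CV for the estimated total MOS


def neyman_sizes(st: PrimaryStratum, cv: float) -> List[int]:
    """Smallest noncertainty sample meeting `cv`, split by Neyman allocation."""
    v_target = (cv * st.total_mos) ** 2
    sum_ns = sum(s.N * s.S for s in st.substrata)
    sum_ns2 = sum(s.N * s.S ** 2 for s in st.substrata)
    if sum_ns == 0:
        return [0] * len(st.substrata)      # no variability: nothing to allocate
    n = sum_ns ** 2 / (v_target + sum_ns2)  # includes the finite population correction
    # n_k is proportional to N_k * S_k, rounded up and capped at the frame count.
    return [min(s.N, math.ceil(n * s.N * s.S / sum_ns)) for s in st.substrata]


def total_sample(strata: List[PrimaryStratum], scale: float) -> int:
    """Certainty plus noncertainty sample when every target CV is multiplied by `scale`."""
    return sum(st.n_certainty + sum(neyman_sizes(st, scale * st.target_cv))
               for st in strata)


def calibrate_scale(strata: List[PrimaryStratum], target_n: int,
                    lo: float = 0.01, hi: float = 100.0, iters: int = 60) -> float:
    """Bisection on the common CV multiplier so the total sample is at most target_n.

    Assumes the starting `hi` already yields a total at or below target_n.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total_sample(strata, mid) > target_n:
            lo = mid    # sample too large: loosen (raise) the CVs
        else:
            hi = mid    # sample within target: try tighter CVs
    return hi
```

Because the required sample size can only fall as the CVs are loosened, the total is monotone in the multiplier, which is what makes a simple bisection search adequate here.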
Primary Strata Type
Total MOS ($mil)
MOS of Sampled Estabs ($mil)
MOS of Certainty Estabs ($mil)
Total number of shipments in the reporting week
Minimum number of shipments to be reported
Maximum number of shipments to be reported
1 – 40
Report every shipment
41 – 600
Select (and report) a systematic sample
601 – 3,000
3,000 or more
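Systematic sampling of shipments within a reporting week can be sketched as follows. The minimum and maximum numbers of shipments from the table are not reproduced here, so the number of shipments to report, `n`, is a caller-supplied parameter, and the sketch assumes systematic selection for every week with more than 40 shipments. The function names are hypothetical. The sketch also returns the within-week shipment weight (total shipments divided by reported shipments) that each reported shipment would carry.

```python
import random
from typing import List, Sequence, Tuple


def systematic_sample(records: Sequence, n: int, rng: random.Random) -> List:
    """Select n records with a random start and a (possibly fractional) interval."""
    N = len(records)
    if n >= N:
        return list(records)
    k = N / n                          # sampling interval
    start = rng.random() * k           # random start in [0, k)
    # min() guards against a floating-point index landing on N.
    return [records[min(int(start + i * k), N - 1)] for i in range(n)]


def shipments_to_report(records: Sequence, n: int,
                        rng: random.Random) -> Tuple[List, float]:
    """Return the reported shipments and the weight each one represents."""
    if len(records) <= 40:             # 1 to 40 shipments: report every shipment
        return list(records), 1.0
    sample = systematic_sample(records, n, rng)
    return sample, len(records) / len(sample)
```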
- Shipment ID number.
- Shipment date (month, day).
- Shipment value.
- Shipment weight in pounds.
- Commodity code from Standard Classification of Transported Goods (SCTG) manual.
- Commodity description.
- An indication of whether the shipment was temperature controlled.
- United Nations or North American (UN/NA) number for hazardous material shipments.
- U.S. destination (city, state, ZIP code), or the U.S. gateway for an export shipment.
- Modes of transport.
- An indication of whether the shipment was an export.
- City and country of destination for exports.
- Export mode.
For a shipment that included more than one commodity, the respondent was instructed to report the commodity that made up the greatest percentage of the shipment’s weight.
Prior to the 2012 CFS, fats and oils were all classified under Commodity Code 07. In the 2012 CFS, fats and oils treated for use as biodiesel moved to Commodity Code 18 (Fuel Oils).
Prior to the 2012 CFS, fats and oils intended for use as biodiesel were not specifically identified, but were included in Commodity Code 074. In the 2012 CFS, fats and oils intended for use as biodiesel were specified and classified under Commodity Code 182 (biodiesel and blends of biodiesel).
Prior to the 2012 CFS, fats and oils intended for use as biodiesel were not specifically identified, but were included in Commodity Code 0743. In the 2012 CFS, fats and oils treated for use as biodiesel were specified and classified under Commodity Code 182.
Prior to the 2012 CFS, alcohols intended for use as fuel were not specifically identified, and were included under SCTG 08. In the 2012 CFS, ethanol for fuel moved to SCTG 17. Additionally, beverages and denatured alcohol were more clearly identified.
Prior to the 2012 CFS, denatured alcohol of more than 80% alcohol by volume was included in Commodity Code 083. In the 2012 CFS, denatured alcohol of more than 80% by volume was moved to Commodity Code 084, and ethanol for use as biofuel was moved to Commodity Codes 175 and 176.
Prior to the 2012 CFS, both denatured ethyl alcohol and undenatured ethyl alcohol of more than 80% alcohol by volume were included in Commodity Code 0831. In the 2012 CFS, denatured alcohol of more than 80% by volume was moved to Commodity Code 0841, and ethanol for use as biofuel was specified and moved to Commodity Codes 175 and 176.
Prior to the 2012 CFS, denatured ethyl alcohol and undenatured ethyl alcohol were all classified under SCTG 08. In the 2012 CFS, ethanol used for fuel was identified and moved from SCTG 08 to SCTG 17 (fuel alcohols). In addition, kerosene, which before the 2012 CFS was included in Commodity Code 19, was moved to Commodity Code 17.
Prior to the 2012 CFS, Commodity Code 171 included only gasoline, and blends of gasoline and ethanol were not identified. In the 2012 CFS, Commodity Code 171 includes gasoline and mixtures of gasoline with up to 10% ethanol.
Prior to the 2012 CFS, kerosene was included in Commodity Code 192, and type A jet fuel was classified under Commodity Code 172. In the 2012 CFS, all kerosene is classified under Commodity Code 172.
Prior to the 2012 CFS, kerosene was included in Commodity Code 192, and type A jet fuel was classified under Commodity Code 1720. In the 2012 CFS, all kerosene is classified under Commodity Code 1720.
Prior to the 2012 CFS, fats and oils intended for use as fuel were not identified as such, and were included in Commodity Code 07. In the 2012 CFS, such fats and oils were identified as biodiesel and were moved under Commodity Code 18.
CFS Shipment Value and Weight Imputation Cell Descriptions
Description of Donor Pool Shipments
From same establishment and in the same detailed shipment size class
From same company and in the same detailed shipment size class
From same geographic area and in the same detailed shipment size class
From same establishment and in the same broad shipment size class
From same company and in the same broad shipment size class
From same geographic area and in the same broad shipment size class
From same establishment (no restriction on shipment size)
From same company (no restriction on shipment size)
From same geographic area (no restriction on shipment size)
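The nine imputation cells form a search order: the tightest donor pool (same establishment, same detailed size class) is tried first, and the restrictions are relaxed step by step. A minimal sketch, assuming that a donor is drawn at random from the first nonempty pool (as in the ZIP code hot deck described below) and that size classes are available as labels on each record. The record layout and field names are hypothetical, and how the donor's data are then used to fill the missing value or weight is not shown.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class ShipmentRecord:
    # Hypothetical layout; size-class labels stand in for the CFS size classes.
    establishment: str
    company: str
    geo_area: str
    detailed_size_class: str
    broad_size_class: str
    value: Optional[float]       # None when the value item is missing


def donor_cells() -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (match level, size-class field) in cell order 1 through 9."""
    for size_field in ("detailed_size_class", "broad_size_class", None):
        for level in ("establishment", "company", "geo_area"):
            yield level, size_field


def select_donor(recipient: ShipmentRecord, records: List[ShipmentRecord],
                 rng: random.Random) -> Tuple[Optional[int], Optional[ShipmentRecord]]:
    """Return (cell number, donor) from the first nonempty donor pool."""
    complete = [r for r in records if r.value is not None]
    for cell, (level, size_field) in enumerate(donor_cells(), start=1):
        pool = [d for d in complete
                if getattr(d, level) == getattr(recipient, level)
                and (size_field is None
                     or getattr(d, size_field) == getattr(recipient, size_field))]
        if pool:
            return cell, rng.choice(pool)
    return None, None
```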
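If a reported destination ZIP code was not valid for the reported destination city and state, but could be made valid for that city by:
- Changing a single digit (other than the first one), or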
- Transposing two digits
then the ZIP code was changed to a valid one for the reported destination city. Approximately 72,700 destination ZIP codes were corrected in this process.
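The correction rule can be sketched as a search over all ZIP codes reachable by one permitted edit. The set of valid ZIP codes for a city (`valid_zips`) is a hypothetical input. The sketch allows any two positions to be transposed (the text does not say whether only adjacent digits were considered), and it applies a correction only when exactly one candidate is valid for the city; how ties were actually resolved is not described.

```python
from itertools import combinations
from typing import Optional, Set


def one_edit_candidates(zip5: str) -> Set[str]:
    """ZIP codes reachable by one single-digit change (not the first digit) or one transposition."""
    out = set()
    for i in range(1, 5):                       # the first digit is never changed
        for d in "0123456789":
            if d != zip5[i]:
                out.add(zip5[:i] + d + zip5[i + 1:])
    for i, j in combinations(range(5), 2):      # transpose digits at positions i and j
        if zip5[i] != zip5[j]:
            chars = list(zip5)
            chars[i], chars[j] = chars[j], chars[i]
            out.add("".join(chars))
    return out


def correct_zip(zip5: str, valid_zips: Set[str]) -> Optional[str]:
    """Return a corrected ZIP code for the reported city, or None if no unique fix exists."""
    if zip5 in valid_zips:
        return zip5
    matches = one_edit_candidates(zip5) & valid_zips
    return next(iter(matches)) if len(matches) == 1 else None
```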
For certain shipments with missing destination ZIP codes, a value was imputed using a two-stage hot-deck process. A shipment was considered a “recipient” if its destination city and state were valid but its destination ZIP code was missing. The recipient’s missing ZIP code was imputed as follows:
- In the first stage, the donor pool for each recipient consisted of all complete shipments with the same destination city and state as the recipient and also from the same establishment as the recipient. If this donor pool was not empty then one of the shipments in this donor pool was randomly selected and the destination ZIP code of this selected donor was assigned to the recipient.
- If the first-stage donor pool was empty (there was no matching shipment from the same establishment), then the donor pool was enlarged to include all complete shipments with the same destination city and state as the recipient, regardless of source. Then one of the shipments in this larger donor pool was randomly selected, and the destination ZIP code of the selected donor was assigned to the recipient.
Approximately 27,400 shipment destination ZIP codes were imputed in this process.
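A minimal sketch of the two-stage hot deck, assuming shipments are held as simple dictionaries (a hypothetical layout) and that the donor is drawn with equal probability from the pool, which is how we read "randomly selected". If neither stage yields a donor, the sketch leaves the ZIP code missing; the text does not say what was done in that case.

```python
import random
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]   # keys: "establishment", "city", "state", "zip"


def impute_zip(recipient: Record, shipments: List[Record],
               rng: random.Random) -> Optional[str]:
    """Two-stage hot deck for a missing destination ZIP code."""
    same_place = [s for s in shipments
                  if s["zip"] is not None
                  and s["city"] == recipient["city"]
                  and s["state"] == recipient["state"]]
    # Stage 1: complete shipments from the recipient's own establishment.
    pool = [s for s in same_place if s["establishment"] == recipient["establishment"]]
    # Stage 2: if stage 1 is empty, any complete shipment to the same city and state.
    if not pool:
        pool = same_place
    return rng.choice(pool)["zip"] if pool else None
```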
For intra-ZIP shipments (shipments with the origin and destination in the same ZIP code), the square root of the total ZIP code area in square miles was used as an estimate of the distance shipped.
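In code, the intra-ZIP rule is a one-line special case ahead of normal routing. The ZIP-area lookup and the `routed_miles` function below are hypothetical placeholders for the CFS mileage routines. For example, a ZIP code covering 25 square miles would be assigned a distance of 5 miles.

```python
import math
from typing import Callable, Dict


def shipment_distance(origin_zip: str, dest_zip: str,
                      zip_area_sq_miles: Dict[str, float],
                      routed_miles: Callable[[str, str], float]) -> float:
    """Distance shipped, in miles."""
    if origin_zip == dest_zip:
        # Intra-ZIP shipment: square root of the ZIP code's total area.
        return math.sqrt(zip_area_sq_miles[origin_zip])
    return routed_miles(origin_zip, dest_zip)
```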
The following types of methodological changes to mileage processing were incorporated in 2017:
The quarter weight inflates an establishment’s estimate for a particular reporting week to an estimate for the corresponding quarter. For noncertainty shipments, the quarter weight is equal to 13. The quarter weight for most certainty shipments is also equal to 13. However, if a respondent was able to provide information about all large (or certainty) shipments made in the quarter containing the reporting week, then the quarter weight for each of these shipments was set to one. For each establishment, the quarterly estimates were added to produce an estimate of the establishment’s value of shipments for the entire survey year. Whenever an establishment did not provide the Census Bureau with a response for each of its four reporting weeks, we computed a quarter nonresponse weight. The quarter nonresponse weight for a particular establishment is defined as the ratio of the number of quarters for which the establishment was in business in the survey year (usually four) to the total number of quarters (reporting weeks) for which we received usable shipment data from the establishment.
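A minimal sketch of the quarter weight and the quarter nonresponse weight, assuming each reported shipment already carries its within-week sampling weight. The record layout and function names are hypothetical, and the adjustment and establishment weights described next are not applied here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReportedShipment:
    value: float
    shipment_weight: float = 1.0   # within-week sampling weight (assumed given)
    certainty: bool = False        # large shipment flagged as certainty


def quarter_weight(s: ReportedShipment, all_certainty_in_quarter: bool) -> int:
    """13 weeks per quarter, or 1 when all certainty shipments for the quarter were reported."""
    return 1 if (s.certainty and all_certainty_in_quarter) else 13


def annual_value(quarters: List[Tuple[List[ReportedShipment], bool]],
                 quarters_in_business: int = 4) -> float:
    """Annual value estimate from the quarters with usable data.

    `quarters` holds one (shipments, all_certainty_in_quarter) pair per usable
    reporting week. The quarter nonresponse weight is quarters_in_business
    divided by the number of usable quarters.
    """
    if not quarters:
        raise ValueError("no usable quarters")
    total = sum(s.value * s.shipment_weight * quarter_weight(s, full)
                for shipments, full in quarters for s in shipments)
    return total * (quarters_in_business / len(quarters))
```

For example, an establishment in business all year that returned only two usable reporting weeks has its two-quarter total doubled by the quarter nonresponse weight (4 / 2).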
Using these four component weights and the reported (or imputed) shipment values, we computed an estimate of each establishment’s value of shipments for the entire survey year. We then multiplied this estimate by a factor that adjusts this estimated value to the measure of the establishment’s value of shipments or receipts used for sample stratification purposes. This weight, the establishment-level adjustment weight, attempts to correct for any sampling or nonsampling errors caused by the selection of specific reporting weeks or that occur during the sampling of shipments by the respondent.
The adjusted value of shipments estimate for an establishment was then weighted by the establishment weight. This weight is equal to the reciprocal of the establishment’s probability of being selected into the first-stage sample (see the Sample Design section).
For most industries, a final adjustment, the nonresponse post-stratification adjustment weight, adjusts the weighted shipment value (incorporating all prior weighting factors) to tabulated revenue data from other Census Bureau sources. This adjustment accounts for:
- Establishments that did not respond to the survey or from which we did not receive any usable shipment data.
- Changes in the universe of establishments between the time the first-stage sampling frame was constructed (2016) and the year in which the data were collected (2017).
For the preliminary 2017 CFS estimates, the nonresponse post-stratification cells were defined by industry categories, typically by 3-digit NAICS codes (for Manufacturing) or 4-digit NAICS codes (all other industries). There were approximately 45 nonresponse post-stratification cells. The other Census Bureau sources for the adjustment data were:
- 2016 County Business Patterns
- 2017 Manufacturers’ Shipments, Inventories, and Orders
- 2017 Monthly Wholesale Trade Survey
- 2016 Annual Wholesale Trade Survey
- 2017 Monthly Retail Trade Survey
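The remaining steps (the establishment-level adjustment, the establishment weight, and the nonresponse post-stratification adjustment) can be sketched together. This is an illustration under simplifying assumptions: each establishment is reduced to a hypothetical record holding its annual estimate from the shipment and quarter weights, its frame MOS, its first-stage selection probability, and its post-stratification cell, and the benchmark totals stand in for the tabulated revenue from the sources listed above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EstabEstimate:
    cell: str              # nonresponse post-stratification cell (e.g., a NAICS group)
    annual_est: float      # annual value estimate from the shipment and quarter weights
    frame_mos: float       # value of shipments used for sample stratification
    selection_prob: float  # probability of selection into the first-stage sample


def poststrat_factors(estabs: List[EstabEstimate],
                      benchmarks: Dict[str, float]) -> Dict[str, float]:
    """Cell factors that scale the weighted estimates to the benchmark totals."""
    weighted: Dict[str, float] = defaultdict(float)
    for e in estabs:
        adjustment = e.frame_mos / e.annual_est   # establishment-level adjustment weight
        estab_weight = 1.0 / e.selection_prob    # reciprocal of selection probability
        weighted[e.cell] += e.annual_est * adjustment * estab_weight
    return {cell: benchmarks[cell] / total for cell, total in weighted.items()}
```

Each shipment's weighted value would then be multiplied by its cell's factor, so that the weighted total for every cell matches its benchmark while the distribution across commodities, modes, and geography is preserved.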
The sampling error of the estimates in this publication can be estimated from the selected sample because the sample was selected using probability sampling. Common measures related to sampling error are the sampling variance, the standard error, and the coefficient of variation (CV). The sampling variance is the squared difference, averaged over all possible samples of the same size and design, between the estimator and its average value. The standard error is the square root of the sampling variance. The CV expresses the standard error as a percentage of the estimate to which it refers. For percentage estimates, such as percentage change or percentage of a total, the standard error of the estimate is provided.
Nonsampling errors are difficult to measure and can be introduced through inadequacies in the questionnaire, nonresponse, inaccurate reporting by respondents, errors in the application of survey procedures, incorrect recording of answers, and errors in data entry and processing. In conducting the 2017 CFS, every effort has been made to minimize the effect of nonsampling errors on the estimates. Data users should take into account both the measures of sampling error and the potential effects of nonsampling error when using these estimates.
The particular sample of shipments used in this survey is one of a large number of samples of the same size that could have been selected using the same design. If all possible samples had been surveyed under the same conditions, an estimate of a population parameter of interest could have been obtained from each sample. These samples give rise to a distribution of estimates for the population parameter. A statistical measure of the variability among these estimates is the standard error, which can be estimated from any one sample. The standard error is defined as the square root of the variance. The coefficient of variation (or relative standard error) of an estimator is the standard error of the estimator divided by the estimator. For the CFS, the coefficient of variation also incorporates the effect of the noise infusion disclosure avoidance method (see Disclosure Avoidance below). Note that measures of sampling variability, such as the standard error and coefficient of variation, are estimated from the sample and are also subject to sampling variability. Technically, therefore, we should refer to the estimated standard error or the estimated coefficient of variation of an estimator; for the sake of brevity, we have omitted this detail. It is important to note that the standard error only measures sampling variability. It does not measure systematic biases of the sample. The Census Bureau recommends that individuals using estimates contained in this report incorporate this information into their analyses, as sampling error could affect the conclusions drawn from these estimates.
An estimate from a particular sample and the standard error associated with the estimate can be used to construct a confidence interval. A confidence interval is a range about a given estimator that has a specified probability of containing the result of a complete enumeration of the sampling frame conducted under the same survey conditions. Associated with each interval is a percentage of confidence, which is interpreted as follows. If, for each possible sample, an estimate of a population parameter and its approximate standard error were obtained, then:
- For approximately 90 percent of the possible samples, the interval from 1.833 standard errors below to 1.833 standard errors above the estimate would include the result as obtained from a complete enumeration of the sampling frame conducted under the same survey conditions.
- For approximately 95 percent of the possible samples, the interval from 2.262 standard errors below to 2.262 standard errors above the estimate would include the result as obtained from a complete enumeration of the sampling frame conducted under the same survey conditions. The 1.833 and 2.262 values, used to compute the 90% and 95% confidence intervals, are taken from the t-distribution with nine degrees of freedom. This takes into account the uncertainty in the estimates of the CVs produced using the random group method with ten random groups.
To illustrate the computation of a confidence interval for an estimate of total value of shipments, assume that an estimate of total value is $10,750 million and the coefficient of variation for this estimate is 1.8 percent, or 0.018. First, obtain the standard error of the estimate by multiplying the value of shipments estimate by its coefficient of variation. For this example, multiply $10,750 million by 0.018, which yields a standard error of $193.5 million. The upper and lower bounds of the 90-percent confidence interval are computed as $10,750 million plus or minus 1.833 times $193.5 million, that is, plus or minus $354.7 million. Consequently, the 90-percent confidence interval is $10,395 million to $11,105 million. If corresponding confidence intervals were constructed for all possible samples of the same size and design, approximately 9 out of 10 (90 percent) of these intervals would contain the result obtained from a complete enumeration.
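The calculation above, together with the random group variance estimate mentioned earlier, can be sketched as follows. The random group formula used is the standard one for G groups, v = sum over g of (y_g - y_bar)^2 / (G(G - 1)), where y_g is the estimate from random group g and y_bar is their mean; whether the CFS centers on the group mean or on the full-sample estimate is not stated, and this sketch omits the noise-infusion component that the published CVs also include. The t multipliers are the ones given in the text.

```python
import math
import statistics
from typing import Sequence, Tuple

# t-distribution multipliers with 9 degrees of freedom (ten random groups).
T_MULTIPLIER = {90: 1.833, 95: 2.262}


def random_group_se(group_estimates: Sequence[float]) -> float:
    """Standard error from G random-group estimates of the same total."""
    g = len(group_estimates)
    mean = statistics.fmean(group_estimates)
    return math.sqrt(sum((y - mean) ** 2 for y in group_estimates) / (g * (g - 1)))


def confidence_interval(estimate: float, cv: float, level: int = 90) -> Tuple[float, float]:
    """Interval estimate +/- t * SE, where SE = estimate * CV."""
    half_width = T_MULTIPLIER[level] * estimate * cv
    return estimate - half_width, estimate + half_width


low, high = confidence_interval(10_750.0, 0.018)   # $ millions, from the example above
print(f"90% CI: {low:,.0f} to {high:,.0f}")         # 90% CI: 10,395 to 11,105
```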
- Response errors.
- Differences in the interpretation of the questions.
- Mistakes in coding or keying the data obtained.
- Other errors of collection, response, coverage, and processing.
Although no direct measurement of the potential biases due to nonsampling error has been obtained, precautionary steps were taken in all phases of the collection, processing, and tabulation of the data in an effort to minimize their influence. The Census Bureau recommends that individuals using estimates in this report incorporate this information into their analyses, as nonsampling error could affect the conclusions drawn from these estimates.
Some possible sources of bias that are attributed to respondent-conducted sampling include:
- Constructing an incomplete frame of shipments from which to sample.
- Ordering the shipment sampling frame by selected shipment characteristics.
- Selecting shipment records by a method other than the one specified in the questionnaire’s instructions.
Respondents who reported a shipment with an unusually large value or weight compared with the rest of their reported shipments were often contacted for verification. In such cases, if we were able to collect information on all of the large shipments a respondent had made, either for a particular reporting week or for the entire quarter, we identified those large shipments as certainty shipments.
- Quarter (reporting week).
Item nonresponse occurs either when a particular shipment data item is unanswered or the response to the question fails computer or analyst edits. Nonresponse to the shipment value or weight items is corrected by imputation, which is the procedure by which a missing value is replaced by a predicted value obtained from an appropriate model. (See above for a description of the imputation procedure.)
Shipment, quarter, and establishment nonresponse describe the inability to obtain any of the substantive measurements about a sampled shipment, quarter, or establishment, respectively. Shipment and quarter nonresponse are corrected by reweighting (see the descriptions of the shipment and quarter nonresponse weights in the Estimation section above). Reweighting allocates characteristics to the nonrespondents in proportion to the characteristics observed for the respondents. The amount of bias introduced by this nonresponse adjustment procedure depends on the extent to which the nonrespondents differ, characteristically, from the respondents.
Establishment nonresponse is corrected during the estimation procedure by the nonresponse post-stratification adjustment weight. In most cases of establishment nonresponse, none of the four questionnaires had been returned to the Census Bureau after several attempts to elicit a response.
Table 1: 2017 CFS Preliminary Unit Response Rates
Type of Response Rate
Participation Response Rate (PRR) - The Participation Response Rate is the total number of unweighted establishments that provided usable data divided by the total number of establishments in the sample (103,877) (expressed as a percentage).
Unit Response Rate (URR) - The Unit Response Rate is defined as the ratio (expressed as a percentage) of the total unweighted number of establishments that provided usable data to the total number of establishments that were eligible (or potentially eligible) for data collection. URRs are indicators of the performance of the data collection process in obtaining usable responses.
Weighted Unit Response Rate (WRR) - The Weighted Unit Response Rate is defined as the percentage of the total weighted sampling measure of size of the establishments that provided usable data to the total weighted sampling measure of size of all establishments that were eligible (or potentially eligible) for data collection. This incorporates the size of the establishment as well as its establishment (first-stage sample) weight into the measure of response.
The fourth rate is based on the quality of the individual shipment data reported by the responding establishments. These total quantity response rates for the 2017 CFS are shown in Table 2 below (along with the final values from the 2012 survey).
Table 2: 2017 CFS Preliminary Total Quantity Response Rates
Total Quantity Response Rate (TQRR) - The Total Quantity Response Rate is defined as the percentage of the estimated (weighted) total of a given data item (VALUE, TONS, or TON-MILES) that is based on reported shipment data or on data from sources determined to be of equivalent quality to reported data. The TQRR is an item-level indicator of the “quality” of each estimate. In contrast to the URR, these weighted response rates are computed for individual data items, so the CFS produces several TQRRs.
The TQRR is the weighted proportion of the key estimates reported by responding establishments or obtained from equivalent quality sources. This measure incorporates the value of the individual shipment data items and the associated sampling and weighting factors.
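The four rates defined above can be computed from unit-level and shipment-level records as in the following sketch. The record layouts are hypothetical. "Eligible" here includes potentially eligible units, and each TQRR input is a shipment's fully weighted amount of one data item (value, tons, or ton-miles) flagged by whether it came from reported or equivalent-quality data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SampleUnit:
    eligible: bool    # eligible or potentially eligible for data collection
    usable: bool      # provided usable data
    weight: float     # establishment (first-stage sample) weight
    mos: float        # sampling measure of size


def prr(units: List[SampleUnit]) -> float:
    """Participation Response Rate: usable units over all sampled units."""
    return 100 * sum(u.usable for u in units) / len(units)


def urr(units: List[SampleUnit]) -> float:
    """Unit Response Rate: usable units over eligible units (unweighted)."""
    eligible = [u for u in units if u.eligible]
    return 100 * sum(u.usable for u in eligible) / len(eligible)


def wrr(units: List[SampleUnit]) -> float:
    """Weighted Unit Response Rate: weighted MOS of usable units over that of eligible units."""
    eligible = [u for u in units if u.eligible]
    return 100 * (sum(u.weight * u.mos for u in eligible if u.usable)
                  / sum(u.weight * u.mos for u in eligible))


def tqrr(items: List[Tuple[float, bool]]) -> float:
    """Total Quantity Response Rate for one data item: (weighted amount, reported-or-equivalent)."""
    return 100 * sum(v for v, ok in items if ok) / sum(v for v, _ in items)
```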
Using disclosure avoidance procedures, the Census Bureau modifies or removes the characteristics that put confidential information at risk of disclosure. Although it may appear that a table shows information about a specific individual or business, the Census Bureau has taken steps to disguise or suppress the original data while making sure the results are still useful. The techniques used by the Census Bureau to protect confidentiality in tabulations vary, depending on the type of data.
For the CFS, the primary method of disclosure avoidance is noise infusion. Noise infusion is a method of disclosure avoidance in which the weighted values for each shipment are perturbed prior to tabulation by applying a random noise multiplier to shipment value and weight. Disclosure protection is accomplished in a manner that causes the vast majority of cell values to be perturbed by at most a few percentage points. For sample-based tabulations, such as the CFS, the estimated relative standard error for a published cell includes both the estimated sampling error and the amount of perturbation in the estimated cell value due to noise. Other cells in the table may be suppressed because the quality of the data does not meet publication standards. By far, the most common reason for suppressing a cell is a high coefficient of variation (greater than 50 percent). These suppressed cells are shown with an “S” in the tables.
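Noise infusion and the CV-based suppression rule can be illustrated with a short sketch. The multiplier distribution used here (a uniform perturbation between `a` and `b`, applied upward or downward with equal probability) and the values of `a` and `b` are assumptions chosen only to match the "few percentage points" description; the actual distribution and parameters are not given in this document. The sketch draws one multiplier per shipment record and applies it to both value and weight.

```python
import random
from typing import List, Tuple, Union


def noise_multiplier(rng: random.Random, a: float = 0.01, b: float = 0.05) -> float:
    """Hypothetical multiplier in [1 - b, 1 - a] or [1 + a, 1 + b]."""
    delta = rng.uniform(a, b)
    return 1 + delta if rng.random() < 0.5 else 1 - delta


def noisy_totals(shipments: List[Tuple[float, float]],
                 rng: random.Random) -> Tuple[float, float]:
    """Perturb (weighted value, weighted tons) for each shipment, then tabulate."""
    value_total = tons_total = 0.0
    for value, tons in shipments:
        m = noise_multiplier(rng)
        value_total += value * m
        tons_total += tons * m
    return value_total, tons_total


def publish(estimate: float, cv: float) -> Union[float, str]:
    """Suppress a cell ("S") when its CV exceeds 50 percent."""
    return "S" if cv > 0.50 else estimate
```

Because individual perturbations go in both directions, they tend to offset when many shipments are added into one cell, which is consistent with most published cells moving by only a few percentage points.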
The Census Bureau’s Disclosure Review Board (DRB) approved the methodology used to protect the confidentiality of the statistics provided in this release (Approval CBDRB-FY18-349).
2017 CFS Area
2012 CFS Area
Description of Change (1)
Tallapoosa County, AL added to the CFS Area
Dallas-Fort Worth, TX-OK (TX Part)
Dallas-Fort Worth, TX
Fannin County, TX added to the CFS Area
Lake Charles-Jennings, LA
Lake Charles, LA
Jefferson Davis Parish, LA added to the CFS Area
- NAICS 484 was included as an in-scope auxiliary industry in 2017 and 2012 but not in any prior survey.
- NAICS 51223 (Music publishers) was included as an in-scope publishing industry in 2017 but not in 2012.
- In 2012 and prior surveys, Prepress Services establishments (2007 NAICS 323122) were excluded from the CFS. However, the 2012 NAICS revision eliminated Prepress Services as a separate industry and grouped it with Trade Binding and Related Work (2007 NAICS 323121) into NAICS 323120 (Support Activities for Printing). For 2017, all of NAICS 323120 was considered to be in scope.
The 2012 estimates were based on the industry classification of the sample establishments at the time those estimates were produced (Dec 2014). The 2012 and earlier estimates are never revised to account for subsequent industry classification changes to the sample establishments.
CFS Water Mode Codes
In 2012, certain export shipments that traveled by truck to the port of embarkation and then by ship to the foreign destination were classified as single-mode truck shipments, and their domestic water mileage to the U.S. border was not included. In 2017, these shipments are classified as multi-mode truck and water shipments, and their domestic water mileage to the U.S. border is included.
For 2017, the mode category, “Private Truck” has been renamed “Company-owned Truck”.
The following methodological changes to mileage processing, implemented in 2012 and carried over to 2017, also affected mode assignment (and the shipment distance calculations).
- The maximum weight of a parcel shipment was limited to 150 pounds in 2012 and 2017. In 2007 the limit was 1000 pounds. Shipments with weights above the maximum were re-assigned to a non-Parcel mode, usually a truck mode.
- For 2012 and 2017, there was no minimum restriction on the weight of an air shipment. In 2007 air shipments with a weight of less than 100 pounds were reclassified as Parcel.
- Company-owned truck shipments (called “Private truck” in 2012) were not routed more than 500 miles in the 2012 and 2017 mileage calculations. In 2007 there was no mileage limit.
- In 2012 and 2017, there were major efforts to re-code shipments for which a respondent reported a mode of Other or Unknown to one of the more descriptive codes. In 2007, “Other” and “Unknown” modes were generally accepted for such shipments. During the 2012 and 2017 CFS mileage calculation operations, these “Other mode” shipments were reviewed. This review found a few truly “Other mode” shipments, which were often transported via conveyor belts. The table below compares the value and tonnage estimates for the Other-type modes in the 2007, 2012, and preliminary 2017 releases.
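The weight and mileage rules listed above can be written as a small rule table keyed by survey year. This is a sketch under stated assumptions: the mode labels are simplified, an overweight Parcel shipment is always sent to a generic "Truck" mode (the text says only "usually a truck mode"), and the route check only flags Company-owned truck routes over 500 miles, since the text does not say how such shipments were then handled.

```python
from typing import Dict, Optional

# Mode-processing rules by survey year, as summarized in the list above.
RULES: Dict[int, Dict[str, Optional[int]]] = {
    2007: {"parcel_max_lb": 1000, "air_min_lb": 100, "company_truck_max_mi": None},
    2012: {"parcel_max_lb": 150, "air_min_lb": None, "company_truck_max_mi": 500},
    2017: {"parcel_max_lb": 150, "air_min_lb": None, "company_truck_max_mi": 500},
}


def reassign_mode(mode: str, weight_lb: float, year: int) -> str:
    """Apply the parcel and air weight rules for the given survey year."""
    rule = RULES[year]
    if mode == "Parcel" and weight_lb > rule["parcel_max_lb"]:
        return "Truck"      # the text says "usually a truck mode"; simplified here
    air_min = rule["air_min_lb"]
    if mode == "Air" and air_min is not None and weight_lb < air_min:
        return "Parcel"     # 2007 only: light air shipments became Parcel
    return mode


def company_truck_route_ok(route_miles: float, year: int) -> bool:
    """False when a Company-owned truck route exceeds that year's mileage limit."""
    limit = RULES[year]["company_truck_max_mi"]
    return limit is None or route_miles <= limit
```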
Other multiple modes
More details about mileage calculation and related processing can be found in the Mileage Calculation section of the survey methodology.
- The 2012 ZIP codes were replaced with 2017 ZIP codes.
- Other changes are as described in Methodological changes to Mileage Calculation for the 2017 CFS above.
For 2017, the CFS used a machine learning process to code some shipments where the respondent provided a description of the product but not an SCTG code. In particular, we developed a model using the 6.2 million records that respondents had coded themselves. This model output the highest-likelihood SCTG code using two input variables: first, the NAICS code of the establishment from which the shipment record came, and second, the description (as a “bag-of-words”) from each record. Using the model’s reported prediction probability as a guide, we selected a sample of 750 records that did not have an SCTG code and had expert analysts validate the model’s predictions on these records. From this validation exercise, we were able to assign an SCTG code to approximately 106,000 shipments with a high degree of confidence using the model’s output.
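The text does not name the model used, so the following is only an illustration of the general approach: a multinomial naive Bayes classifier (our substitution, not necessarily the Census Bureau's method) whose features are the establishment's NAICS code plus a bag of words from the description. A hypothetical probability threshold decides whether a prediction is accepted or left for analyst review; in practice a threshold like this would be set from a validation exercise such as the 750-record check. The toy training records and the threshold value are made up.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def features(naics: str, description: str) -> List[str]:
    """NAICS code as one token, plus a bag of lowercase words from the description."""
    return [f"naics:{naics}"] + re.findall(r"[a-z0-9]+", description.lower())


class NaiveBayesSCTG:
    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, records: Iterable[Tuple[str, str, str]]) -> "NaiveBayesSCTG":
        """Train on (naics, description, sctg) records coded by respondents."""
        for naics, description, sctg in records:
            tokens = features(naics, description)
            self.class_counts[sctg] += 1
            self.token_counts[sctg].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, naics: str, description: str) -> Dict[str, float]:
        """Posterior probability of each SCTG code, normalized to sum to 1."""
        tokens = features(naics, description)
        n_records = sum(self.class_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for sctg, count in self.class_counts.items():
            total = sum(self.token_counts[sctg].values())
            score = math.log(count / n_records)
            for t in tokens:
                score += math.log((self.token_counts[sctg][t] + self.alpha)
                                  / (total + self.alpha * v))
            log_scores[sctg] = score
        top = max(log_scores.values())
        z = sum(math.exp(s - top) for s in log_scores.values())
        return {sctg: math.exp(s - top) / z for sctg, s in log_scores.items()}

    def predict(self, naics: str, description: str,
                threshold: float = 0.9) -> Optional[str]:
        """Most likely SCTG code, or None if its probability is below the threshold."""
        probs = self.predict_proba(naics, description)
        best = max(probs, key=probs.get)
        return best if probs[best] >= threshold else None
```

Returning None below the threshold mirrors the idea of assigning codes only where the model is confident and leaving the remaining records uncoded.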
A particular combination of origin, destination, commodity, and mode (for example) may be common one year but rare or non-existent in the next survey. While this may reflect true changes in economic activity, it may also result from:
- Failing to include the establishments making these shipments in the CFS sample, or
- If included, the sampled establishments failing to respond, or
- If responding, failing to include shipments with this particular combination of characteristics in the sample of shipments provided to the Census Bureau.