## Data-Driven Risk Models Could Help Target Pipeline Safety Inspections

by Rick Kowalewski, Pipeline and Hazardous Materials Safety Administration, and Peg Young, Ph.D., Bureau of Transportation Statistics

Federal safety agencies share a common problem—the need to
target resources effectively to reduce risk. One way this targeting is commonly done is with a risk model that uses
safety data along with expert judgment to identify and weight risk
factors. In a joint effort, the U.S.
Department of Transportation's Bureau of Transportation Statistics (BTS) and
Pipeline and Hazardous Materials Safety Administration (PHMSA) sought to
develop a new statistical approach for modeling risk by *letting the data weight the data*—by using
the statistical relationships among the data, not expert opinion, to develop
the weights.

Some key findings:

- Weighting data through statistical procedures was superior to judgment-weighting in predicting (targeting) relative risk.
- Statistical modeling can help target not only *which* operators to inspect but also *what* to inspect, based on a set of risk factors.
- Pipeline infrastructure, operator performance, and incident history appear to be about equally useful in predicting future risk.

### Program Background

PHMSA's mission is to protect people and the environment from the risks inherent in the transportation of hazardous materials by pipeline and other modes of transportation. Each year the pipeline safety program inspects several hundred thousand miles of interstate pipelines carrying natural gas and hazardous liquids across the United States. These pipelines are operated by over 1,000 operators who manage systems ranging from a few miles to tens of thousands of miles. While a pipeline might seem to be a very simple system, in fact these systems are very complex, and each system has some unique characteristics.

The general approach for conducting standard inspections
until now has been to inspect each major part of each system every 3
years. In 2006, PHMSA initiated a research/pilot project to
integrate the various kinds of inspections it conducted, to re-examine the
3-year inspection interval for standard inspections, and to focus the scope of
its inspections based on operator risk. Changing inspection intervals from a *periodic* basis
to a *risk* basis and changing from
comprehensive to focused inspections reflect a significant change in
approach. Program managers understood
from the outset that the new approach would require a better risk model.

### The Current Risk Model

For more than a decade, PHMSA has used the Pipeline Inspection Prioritization Program (PIPP) to schedule inspections and allocate resources. PIPP is a data-based model using 10 to 12 data variables (depending on type of pipeline) that are transformed into 9 indexes, which are added together for an overall risk score. The data variables for both hazardous liquid and gas transmission pipelines are listed in table 1.

Each input variable is transformed into another variable (an individual PIPP score) worth from 0 to 9 points, depending on the input variable, and the individual scores are then combined into the final total PIPP score. Expert judgment was used both to select the variables and to set the transformations that determine the weight for each variable. PIPP results are used with other information to help set scheduling priorities for inspections.

PIPP has been shown to be 3 to 4 times better than random selection in
identifying ("predicting") future risk as reflected in the number of pipeline
incidents.^{1} However, PIPP tends to underestimate risk
(substantially) where the actual number of incidents is high, and overestimate
risk (somewhat) where the number of incidents is low. This difference is illustrated in the two
PIPP score scatterplots in figure 1 for hazardous liquid pipelines and for
natural gas pipelines, respectively.

### The New Model

The new model predicts the number of pipeline incidents and the incident rate per mile of pipeline for each pipeline operator. To develop predictions, researchers took several years of historical data to run simulations—using, for example, data from 2002 to 2004 to "predict" 2005. The data were organized conceptually into three sets, each using different data; the results are reflected in the six remaining "risk" scatterplots in figure 1:

- The *inherent risk* associated with the pipeline, represented by physical and operating characteristics such as age, materials and coatings, diameter, location, and throughput, is estimated using annual reports submitted by each pipeline operator.^{2} Inherent risk should be independent of how the pipeline is managed and maintained.
- The *performance risk* associated with the operator (i.e., the company), represented by safety deficiencies, is estimated using the results of past safety inspections, particularly those with the broadest scope, known as Integrity Management (or IM) inspections.^{3} Performance risk should be independent of the pipeline characteristics.
- The *historical risk* associated with past incidents is estimated from incident data reported to PHMSA by operators.^{4} Historical risk is assumed to reflect the combination of both inherent risk of the pipe and performance risk of the operator.
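The back-testing setup described above (several years of history used to "predict" a following year) can be sketched roughly as follows. This is a minimal illustration, not the researchers' procedure: the function name `split_backtest`, the record layout, and the sample values are all hypothetical.

```python
from collections import defaultdict


def split_backtest(records, target_year, window=3):
    """Split (operator, year, incidents) records into history and target.

    History covers the `window` years before `target_year`; the target
    year holds the counts the model is asked to "predict".
    """
    history = defaultdict(dict)  # operator -> {year: incidents}
    actual = {}                  # operator -> incidents in target_year
    first_year = target_year - window
    for operator, year, incidents in records:
        if first_year <= year < target_year:
            history[operator][year] = incidents
        elif year == target_year:
            actual[operator] = incidents
        # Records outside the window are ignored.
    return dict(history), actual


# Made-up records, for illustration only.
records = [
    ("A", 2001, 5),
    ("A", 2002, 1), ("A", 2003, 0), ("A", 2004, 2), ("A", 2005, 1),
    ("B", 2003, 0), ("B", 2004, 0), ("B", 2005, 0),
]
history, actual = split_backtest(records, target_year=2005)
```

With `target_year=2005` and the default 3-year window, the 2002 to 2004 counts become inputs and the 2005 counts are held back for comparison.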

Each set of data generated separate predictions of future incidents that were also combined into a single prediction for each operator. The diagonal line in each graph in figure 1 represents perfect prediction in which the predicted number of incidents equals the actual number of incidents. The further the data points are from the diagonal line, the poorer the performance of the predictive model. Gas transmission operators were separated from hazardous liquid operators, as they are in PIPP, because they present very different system profiles, different risks, different data, and different numbers of incidents (see table 2). Other breakouts might also make sense (e.g., by product for liquid pipelines, or onshore v. offshore pipeline) but the research has not explored these.

For presentation purposes, small operators (with less than 500 miles of pipeline) were separated from large operators because their operating environment tends to be different and the relatively lower number of incidents makes the results somewhat less reliable. The analysis behind all the models was performed in the statistical software package SAS 9.1.

### Statistical approaches

Three key characteristics of the data influenced the choice of statistical models:

- Incidents occur infrequently, so the models would have to deal well with small numbers.
- The number of incidents is a count value, with no fractional or negative values.
- The number of incidents per operator is highly skewed, with a large number of operators having zero incidents in any given year.

Traditional linear regression, which relies on the assumption
of normally distributed data, is inappropriate for count data that are highly
skewed towards zero. Two other
models, Poisson regression and negative binomial regression,^{5} can
handle such data. Another important
quality of these two models is their ability to control for exposure variables,
such as miles of pipeline. The negative
binomial is the more general model, and this was used to detect and weight risk
variables for both inherent risk and performance risk.^{6}
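As a sketch of what such a model looks like, a count regression with a log link and mileage as the exposure variable can be written as below. The symbols and the specific variance form (often called "NB2") are assumptions chosen for illustration; the article does not state the exact parameterization that was used.

```latex
\begin{aligned}
\log \mu_i &= \log m_i + \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}, \\
\mathrm{E}[y_i] &= \mu_i = m_i \exp\!\left(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}\right), \\
\mathrm{Var}[y_i] &= \mu_i + \alpha\,\mu_i^{2}, \qquad \alpha \ge 0 .
\end{aligned}
```

Here $y_i$ is the incident count for operator $i$, $m_i$ is that operator's miles of pipeline (the exposure), $\mathbf{x}_i$ holds the risk factors, and $\boldsymbol{\beta}$ holds the data-derived weights. Setting $\alpha = 0$ makes the variance equal to the mean, which is the Poisson case; this is the sense in which the negative binomial is the more general model.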

The analysis of the historical risk associated with past
incidents presented a different set of conditions. The past 3 years of incidents and the next
(to-be-predicted) year of incidents most likely are not independent from one
another, so the data were transformed to create an "orthogonal" regression
model that would allow modeling the 3 years of incidents together to estimate
future risk.^{7}

Each of these major outputs (inherent risk, performance risk,
and historical risk) provides a separate prediction of risk, but they can also be
combined to present a single estimate. The approach taken here was to take the average of the three results.^{8} Other possibilities not examined here might use another model to weight these
three as inputs to an overall risk score, again *letting the data weight the data*, or develop an equation
that might relate any one output to the other two. Figure 1 provides a graphical synopsis of the
predictive accuracy for estimating the number of accidents per operator based
on PIPP scores, inherent risk, operator risk, and historical risk.
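The equal-weight combination of the three predictions can be sketched in a few lines. The function name `combine_equal_weight` and the sample predictions are hypothetical; only the idea of a simple average comes from the text.

```python
def combine_equal_weight(inherent, performance, historical):
    """Average three per-operator predictions using equal weights.

    Each argument maps operator -> predicted incidents. Only operators
    present in all three inputs receive a combined prediction.
    """
    shared = inherent.keys() & performance.keys() & historical.keys()
    return {
        operator: (
            inherent[operator] + performance[operator] + historical[operator]
        ) / 3
        for operator in shared
    }


# Made-up predictions, for illustration only.
inherent = {"A": 3.0, "B": 0.5}
performance = {"A": 1.5, "B": 0.25}
historical = {"A": 4.5, "B": 0.0}
combined = combine_equal_weight(inherent, performance, historical)
```

A data-weighted alternative would replace the fixed divisor with fitted coefficients, which the research had not yet examined.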

The predictive quality of each model tested was compared using a standard statistical measure of error—the mean absolute deviation (MAD)—which averages the absolute difference between the predicted value and the actual value for each operator (see table 3). For example, when the model predicts 7.5 incidents and 5 actually occur, the error is 2.5; when the model predicts 4 incidents and 5 actually occur, the error is 1. MAD provides a sense of "how far off" the model predictions are from the actual values.
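The MAD calculation can be illustrated with the two cases from the example above. This is a minimal sketch; the function name `mean_absolute_deviation` is chosen here for illustration.

```python
def mean_absolute_deviation(predicted, actual):
    """Average of |predicted - actual| across operators."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total_error / len(predicted)


# The two cases from the text: errors of 2.5 and 1.
predicted = [7.5, 4.0]
actual = [5, 5]
mad = mean_absolute_deviation(predicted, actual)  # (2.5 + 1.0) / 2 = 1.75
```

Because the errors are taken in absolute value, over-prediction and under-prediction do not cancel each other out.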

### Testing Inputs to the Model

A key indicator for the effectiveness of any new model was its ability to predict risk better than the existing judgment-weighted model (PIPP ranking). In practice, this should be fairly easy because a statistical model could simply reweight the 10 input variables in PIPP or the 9 transformed variables for a better prediction using data-weighting. Other obvious inputs to test included:

- the naive model (which says that what happened last year is likely to happen again next year);
- mileage alone (which suggests that the extent of the system might be the most important indicator of the risk of incidents);
- the input variables into PIPP—reweighted using the new statistical procedures;
- the output variables (L-scores) from PIPP before the PIPP ranking is calculated—reweighted using the new statistical procedures; and
- each of the new indicators of risk—estimating inherent risk associated with the pipeline, performance risk associated with the operator, and historical risk associated with past incidents.

The results demonstrate that PIPP performs the worst in targeting risk, and that reweighting the PIPP variables can improve the predictive quality (reduce the error). Surprisingly, mileage alone and the naive model both were better (smaller error) than PIPP in predicting future risk, but such simple models offer little guidance in selecting appropriate sites to inspect. The new model performed well (with a MAD of 1.0), although the analysis indicated noticeable differences between gas transmission operators and hazardous liquid operators. Hazardous liquid pipeline incidents are more prevalent and more concentrated (fewer operators), so the data provide a better basis for prediction.

The three main components of the new model—inherent risk, performance risk, and historical risk—performed about equally well in predicting future incidents.
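The two simplest baselines in the list above, the naive model and mileage alone, can be sketched as follows. The article does not say how the mileage-only model was fit, so the pooled rate per mile-year used here is an assumption, as are the function names and the sample numbers.

```python
def naive_forecast(last_year_incidents):
    """Naive model: next year's count equals last year's count."""
    return dict(last_year_incidents)


def mileage_forecast(miles, history_incidents, history_years):
    """Mileage-only model (assumed form): one pooled incident rate per
    mile-year across all operators, scaled by each operator's mileage."""
    total_incidents = sum(history_incidents.values())
    total_mile_years = sum(miles.values()) * history_years
    rate = total_incidents / total_mile_years
    return {operator: rate * m for operator, m in miles.items()}


# Made-up inputs, for illustration only.
miles = {"A": 1000, "B": 3000}
incidents_3yr = {"A": 3, "B": 9}   # totals over a 3-year history
last_year = {"A": 2, "B": 1}

naive = naive_forecast(last_year)
by_mileage = mileage_forecast(miles, incidents_3yr, history_years=3)
```

Neither baseline uses any risk factor beyond size or recent history, which is why such models say little about what to inspect.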

### Findings From the Modeling Research

Modeling inherent risk associated with the pipeline demonstrated that mileage, throughput (barrel-miles per year), date of installation, and pipeline diameter were significant risk factors. Six variables were significant in predicting future incidents for gas transmission systems, and 14 variables were significant for hazardous liquid systems. About half of these variables were negatively correlated with risk, meaning that they had a "protective effect." (Table 4 provides the listing of the significant variables for both models.)

Modeling performance risk associated
with the operator demonstrated that a few key inspection areas from Integrity
Management^{9} inspections were most highly correlated with future risk. One area (*integrity
assessment review*) was negatively correlated, suggesting that
finding deficiencies in this area helped an operator rapidly improve its safety
program. The most significant risk
factor was in the area of *continual
evaluation and assessment*—which inspection staff have suggested
might be a critical indicator of an operator's safety program.

Modeling historical risk associated with past incidents demonstrated that the passage of time rapidly degrades the utility of the data. After 2 years, past incidents do not appear to be useful in predicting future risk. The most recent year is most important, and the model weights this year most heavily.

### Significant Data and Modeling Issues

While the model demonstrates the general effectiveness of statistical tools as an alternative to judgment-weighting, several important data limitations and modeling issues remain to be addressed. Some of the more important issues are listed here:

- Data on operators' systems and operator relationships reflect a snapshot in time; changes might not be captured for up to a year, so some data are outdated.
- Deficiency data from inspections are largely limited to one major type of inspection—Integrity Management inspections—representing only a small portion of the inspections conducted.
- The model does not differentiate more serious incidents (the focus of the agency's performance goals) from those with less severe consequences (actual or potential).
- The model introduces an exponential function that can dramatically over-predict incidents when new data are outside the historical range.
- Small numbers of incidents each year limit the ability to isolate combinations of factors that might be statistically significant.
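The over-prediction issue tied to the exponential function can be illustrated with a small sketch. It assumes a log-link form (predicted incidents = miles × exp(intercept + slope × x)); the coefficients and input values are hypothetical and are not taken from the research.

```python
import math


def predicted_incidents(miles, x, intercept, slope):
    """Log-link prediction: miles * exp(intercept + slope * x)."""
    return miles * math.exp(intercept + slope * x)


# Hypothetical coefficients, not taken from the article.
INTERCEPT = -7.0
SLOPE = 0.05

# Suppose x = 40 is the largest value seen in the historical data.
inside = predicted_incidents(miles=1000, x=40, intercept=INTERCEPT, slope=SLOPE)
# A new operator reports x = 80, twice the historical maximum.
outside = predicted_incidents(miles=1000, x=80, intercept=INTERCEPT, slope=SLOPE)

ratio = outside / inside  # equals exp(SLOPE * 40) = exp(2), roughly 7.4
```

Doubling the input multiplies the prediction by about 7.4 rather than 2, so values outside the range used to fit the model deserve extra scrutiny.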

### Continuing Research

The first line of research, currently
underway, is to refine the incident measures to reflect the *consequences* of incidents—to weight
incidents by potential severity in terms of harm to people and/or the
environment. Using conditional
probabilities, we have found so far that three variables help explain whether
an incident is likely to be serious: fire/explosion (indicating a violent incident), whether the incident
occurred in a high consequence area (indicating proximity to people), and
incident cause (e.g., corrosion or excavation damage).
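A conditional probability of this kind can be estimated as a simple ratio of counts. The sketch below is illustrative only: the function name, the record fields, and the incident records are made up and do not reflect PHMSA data or the researchers' actual method.

```python
def conditional_probability(incidents, outcome, condition):
    """Estimate P(outcome | condition) as a ratio of counts.

    Returns None when no incident satisfies the condition.
    """
    matching = [i for i in incidents if condition(i)]
    if not matching:
        return None
    return sum(1 for i in matching if outcome(i)) / len(matching)


# Made-up incident records, for illustration only.
incidents = [
    {"fire": True, "hca": True, "cause": "excavation", "serious": True},
    {"fire": True, "hca": False, "cause": "corrosion", "serious": True},
    {"fire": False, "hca": True, "cause": "corrosion", "serious": False},
    {"fire": False, "hca": False, "cause": "corrosion", "serious": False},
    {"fire": True, "hca": False, "cause": "excavation", "serious": False},
]

p_serious_given_fire = conditional_probability(
    incidents, lambda i: i["serious"], lambda i: i["fire"]
)
p_serious_given_no_fire = conditional_probability(
    incidents, lambda i: i["serious"], lambda i: not i["fire"]
)
```

Comparing the two estimates shows how much a single variable shifts the chance that an incident is serious; the same function works for the high consequence area flag or for a given cause.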

Some general model improvements are planned as well. These would separate out onshore v. offshore systems, interstate v. intrastate operators, and certain commodities that have special risk characteristics. The relationship between inherent risk, performance risk, and historical risk needs to be further explored and modeled. The issue of total number of incidents v. the rate of incidents per mile needs to be addressed; it is not clear which is more important in targeting inspections. And operator relationships—where some operators are part of a larger group of operators that share certain plans and management—need to be addressed because some inspections are targeted at this higher corporate level.

There are several areas where the measures for inherent risk, performance risk, and historical risk could be enhanced. Improvement would include targeted analyses of certain key variables to better understand why they are or aren't significant risk factors, adding more inspection data, and testing the time-sensitivity of inspection data.

After refinements are made, the model needs to be validated with data from other years, uncertainty should be incorporated into the results, and PHMSA program staff need to be involved in formulating the best presentation of results for the intended use—targeting and focusing inspections.

A parallel effort will extend the concepts from this modeling effort to another safety program—hazardous materials transportation safety—which cuts across four other modes of transportation. The model might be more generally applicable in other federal safety programs as well.

^{1} By scaling PIPP scores to the number of actual incidents, predictive quality
was measured by the correct "hits" to determine the percent correct. This was compared to a random selection model
where each operator was simply assigned an equal share of points.

^{2} See for access to annual reports filed by pipeline operators.

^{3} Deficiency data are captured at the point of inspection for Integrity
Management (IM) inspections of pipeline operators. Where deficiencies are serious, PHMSA pursues
enforcement action. Data on these
actions are available at .

^{4} Incident data are available at .

^{5} In a recent review of the Motor Carrier Safety Status Measurement System, or
SAFESTAT, model used by the Federal Motor Carrier Safety Administration, the
Government Accountability Office (GAO) recommended a negative binomial
regression in place of expert opinion to weight the risk factors used in
targeting motor carrier safety inspections. This work by GAO was a strong factor in the risk modeling effort by BTS
and PHMSA. See *Motor Carrier Safety: A Statistical Approach Will
Better Identify Commercial Carriers That Pose High Crash Risks Than Does the
Current Federal Approach*, June 2007 (GAO-07-585).

^{6} For a good explanation of the Poisson and negative binomial models and how they
are estimated in SAS, see *Logistic
Regression Using SAS: Theory and Application*, by Paul D. Allison,
1999 (SAS Institute Inc.).

^{7} "Orthogonal variables" are linearly independent. For details on orthogonal
regression, see A. Stuart, J.K. Ord, and S.F. Arnold. 1999. *Kendall's Advanced
Theory of Statistics*, 6^{th} ed. London: Edward Arnold,
pp. 764-766.

^{8} Although historical risk—using incident data—might reflect the nexus of the
inherent risk associated with the pipeline and the performance risk associated
with the operator, using equal weights to average provides a simple
approximation of overall risk. Other
statistical methods might provide a better way to combine these factors.

^{9} The Integrity Management program was introduced over the last several years,
first for hazardous liquid pipelines then later for gas transmission
pipelines. This program requires
pipeline operators to identify and understand the risks in their systems,
identify high consequence geographic areas, establish programs for inspecting
and repairing pipelines, and continuously monitor their systems.

This report is the result of joint research by Rick Kowalewski, Senior Advisor of the Pipeline and Hazardous Materials Safety Administration (PHMSA), and Peg Young, Statistician for the Bureau of Transportation Statistics (BTS).
For questions about this or other BTS reports, call 1-800-111-1111, email [email protected], or visit emedjimurje.info.