While catastrophe models follow a similar overall approach to risk assessment, there can be wide variations in their results, leaving users to question which model is the most appropriate.
By Atul Khanduri

CATASTROPHE MODELS are ubiquitous. Insurers and reinsurers now routinely use several commercial models for a wide range of risk management applications, from pricing and risk selection to developing capital management and reinsurance or retrocession strategies.

While the models follow a similar overall approach to catastrophe risk assessment, each uses different assumptions, data inputs and complex computational algorithms. The inherent randomness of the underlying meteorological and geological processes, compounded by incomplete data and an incomplete understanding of the hazards, adds uncertainty to the modelling.

Given this uncertainty, as well as the differences in modelling methodologies, some variability in loss estimates is to be expected. However, although competing models often agree in broad terms, their outputs can vary significantly (see Figure 1), raising questions about the validity of the models and complicating pricing and underwriting decisions.

Which model to use?

Using multiple models may help avoid individual model biases as well as help assess the range of modelling uncertainty, but the big question on many a risk manager’s mind is which model provides the most reasonable view of risk? Alternatively, for a company using two or more models, how should the data from multiple models be combined to reach the best estimate of probable maximum loss?

A prevalent view in the industry, in such cases, is to treat each of the models as equally acceptable, implying that the average may provide the best answer. When two models produce seemingly close results, underwriters may be persuaded to use either result, or their average, and to place less credibility on a third model that suggests a far greater or smaller exposure. Alternatively, divergent results may sometimes spur detailed investigation of the models to establish the cause of the variation.

While using multiple models is one way of addressing some of the practical issues caused by differences between models, it places significant demands on resources and time. A common alternative is to employ a single “best” model, or a single primary model with a secondary model for comparison and benchmarking. However, problems arise when model selection is based solely on subjective criteria, such as ease of use and attractive licensing fees, rather than on hard technical considerations. Whatever the criteria, given the potentially significant impact on a company’s bottom line, an objective and thorough understanding of the models and the results they produce is essential.

The science and craft of catastrophe modelling

Catastrophe models are complex systems that rely on a range of assumptions and expertise across scientific, engineering and actuarial fields for assessing catastrophe risk. The three basic components of a catastrophe model are hazard, vulnerability and loss. The hazard module provides the frequency and magnitude of potential catastrophic events, including the intensity of the event at a given location. For a given intensity of hazard, the vulnerability component determines the damage to property.
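To make the hand-off from hazard to vulnerability concrete, the sketch below takes a hazard intensity at a location (here a peak wind speed, as a hazard module might supply it) and converts it into a ground-up damage estimate through a vulnerability curve. The curve values, the linear interpolation and the function names are hypothetical assumptions for illustration only, not taken from any commercial model.

```python
from bisect import bisect_left

# Hypothetical vulnerability curve for one building type:
# (peak wind speed in m/s, mean damage ratio as a fraction of replacement value).
VULNERABILITY_CURVE = [
    (20.0, 0.00),
    (30.0, 0.02),
    (40.0, 0.10),
    (50.0, 0.30),
    (60.0, 0.60),
    (70.0, 0.90),
]


def damage_ratio(intensity, curve=VULNERABILITY_CURVE):
    """Vulnerability step: linearly interpolate the mean damage ratio."""
    speeds = [speed for speed, _ in curve]
    if intensity <= speeds[0]:
        return curve[0][1]  # at or below the damage threshold
    if intensity >= speeds[-1]:
        return curve[-1][1]  # cap at the last tabulated value
    i = bisect_left(speeds, intensity)
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (intensity - x0) / (x1 - x0)


def ground_up_loss(intensity, replacement_value):
    """Damage to property = mean damage ratio x replacement value."""
    return damage_ratio(intensity) * replacement_value


# The hazard module would supply the intensity at the site; 45 m/s is assumed here.
print(ground_up_loss(45.0, 300_000))
```

In this toy case, 45 m/s falls halfway between the 40 and 50 m/s points, giving a damage ratio of about 20% and a loss of roughly 60,000 on a 300,000 building. Real vulnerability modules use far richer curves by construction type, occupancy and other attributes, usually with uncertainty around the mean damage ratio.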

Insured losses are calculated by applying the policy conditions, such as limits, deductibles and risk-specific reinsurance terms, to the total damage estimates. Catastrophe models present long-term catastrophe risk through exceedance probability (EP) curves, which give the probability that losses will exceed a given amount, either from a single event (the occurrence EP curve) or from multiple events combined (the aggregate EP curve).
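As a minimal sketch of the loss step and of an occurrence EP curve, assume a simple per-occurrence deductible and limit, and a hypothetical event loss table in which each stochastic event carries an annual rate. If events arrive as independent Poisson processes, the probability that at least one event exceeds a loss x in a year is 1 - exp(-λ(x)), where λ(x) is the total annual rate of events whose insured loss exceeds x. The Poisson assumption, the function names and all figures are illustrative; aggregate EP curves are usually derived by simulating many years of events instead.

```python
import math


def insured_loss(ground_up, deductible, limit):
    """Apply a per-occurrence deductible, then a limit, to a ground-up loss."""
    return min(max(ground_up - deductible, 0.0), limit)


def occurrence_ep(event_losses, thresholds):
    """Occurrence exceedance probabilities from an event loss table.

    event_losses: (annual_rate, loss) pairs, one per stochastic event.
    Assumes events arrive as independent Poisson processes, so
    P(at least one event with loss > x in a year) = 1 - exp(-total rate).
    """
    curve = []
    for x in thresholds:
        rate = sum(r for r, loss in event_losses if loss > x)
        curve.append((x, 1.0 - math.exp(-rate)))
    return curve


# Hypothetical event loss table: (annual rate, ground-up loss in USD).
elt = [(0.02, 5e6), (0.01, 12e6), (0.004, 30e6)]
insured = [(rate, insured_loss(gu, deductible=1e6, limit=20e6)) for rate, gu in elt]
for x, p in occurrence_ep(insured, [0.0, 10e6]):
    print(f"P(occurrence loss > {x:,.0f}) = {p:.4f}")
```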

Although each model component is grounded in sound theoretical and physical concepts, a good deal of calibration also takes place to fine-tune each component within the overall loss assessment framework. This calibration relies on experience and subjective judgment, making catastrophe modelling as much a craft as a science.

Evaluating catastrophe models

Over the years, catastrophe models have been constantly updated and refined to incorporate the latest technologies, data and research. Modellers have also taken advantage of growing computing power to make models increasingly granular. The incorporation of more refined modelling parameters, such as high-resolution land-use and land-cover data, detailed soil information and an expanded range of building types, has extended the models’ predictive capabilities and so improved them, but it has also made them more complex and more difficult to understand and evaluate.

Moreover, the models are updated frequently, so the model assessments tend to become outdated quickly, necessitating frequent evaluations. Therefore, a systematic and objective approach to model evaluation can help companies keep on top of the model complexities and changes, allowing for both qualitative and quantitative assessment of the models.

Model evaluation or validation encompasses a range of activities undertaken to analyse and verify the overall performance and accuracy of the models. Model evaluation can be performed on two fronts: a) outcome analysis and b) assessment of the conceptual and theoretical soundness of the model. Outcome analysis means validating the modelled loss estimates against predefined benchmarks. Evaluation of conceptual and theoretical soundness involves assessing the main components of catastrophe models: the hazard and vulnerability modules.

The key to successful loss validation lies in establishing suitable benchmarks. This includes collecting data on actual historical losses, trending them to current values and comparing them with modelled losses to measure how far the modelled results diverge. For example, for the United States, historical industry insurance loss data are available from a variety of sources, such as Property Claim Services (PCS), the National Oceanic and Atmospheric Administration (NOAA) and research publications.

The historical data are trended to current values, taking into account changes over time in population or housing units, wealth, inflation and insurance penetration. Table 1 shows the 10 largest loss-generating hurricanes and their associated insured losses; these storms have been responsible for over 60% of total hurricane losses since 1950. Such loss benchmarks for individual events can be compared against the modelled loss estimates. For each known event, the deviation between modelled and actual losses provides a quantitative measure of the models’ loss assessment skill. It also helps reveal biases, such as over- or under-estimation, in modelled losses.
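One simple way to trend a historical loss, sketched below under the assumption that each adjustment can be expressed as a ratio of its current value to its event-year value, is to multiply the original loss by the product of those ratios. The class, field names and multiplicative form are illustrative assumptions; published normalisation methods differ in which factors they include and how they measure them.

```python
from dataclasses import dataclass


@dataclass
class TrendRatios:
    """Current value / event-year value for each adjustment (hypothetical inputs)."""
    inflation: float          # price index ratio
    real_wealth: float        # real wealth per housing unit ratio
    housing_units: float      # housing units in the affected area ratio
    penetration: float = 1.0  # insurance penetration ratio


def trend_loss(historical_loss, ratios):
    """Trend a historical insured loss to current values multiplicatively."""
    factor = (ratios.inflation * ratios.real_wealth
              * ratios.housing_units * ratios.penetration)
    return historical_loss * factor


# Hypothetical ratios for an older storm with a USD 1bn insured loss.
print(trend_loss(1e9, TrendRatios(inflation=2.0, real_wealth=1.5,
                                  housing_units=1.8, penetration=1.1)))
```

With these hypothetical ratios the combined factor is 5.94, so the USD 1bn historical loss trends to about USD 5.94bn in current terms.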

Figure 2 shows comparisons between modelled and actual industry losses for a sample of US hurricanes. The inclined line represents exact agreement between modelled and observed data; the closer the dots lie to the inclined line, the better the correspondence between modelled and observed losses. Note that in this example Model A and Model B show a general tendency toward over-estimation of losses.
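The deviations can be summarised numerically as well as plotted. The sketch below assumes modelled and trended actual industry losses are available as pairs of positive numbers, and uses the logarithm of the modelled-to-actual ratio so that over- and under-estimation by the same factor count equally; a point on the inclined line has a ratio of 1. The function name, the sample figures and the choice of summary statistics are illustrative assumptions.

```python
import math
from statistics import mean


def loss_bias(pairs):
    """Summarise modelled-versus-actual deviations across historical events.

    pairs: (modelled_loss, actual_trended_loss) tuples, both positive.
    Returns (geometric-mean modelled/actual ratio, share of events over-estimated).
    A ratio above 1 suggests systematic over-estimation; a point on the
    inclined line of a modelled-versus-actual plot has a ratio of exactly 1.
    """
    log_ratios = [math.log(modelled / actual) for modelled, actual in pairs]
    geometric_ratio = math.exp(mean(log_ratios))
    share_over = sum(1 for r in log_ratios if r > 0) / len(log_ratios)
    return geometric_ratio, share_over


# Hypothetical (modelled, actual) industry losses in USD bn for four events.
print(loss_bias([(12.0, 10.0), (6.5, 5.0), (2.2, 2.4), (31.0, 25.0)]))
```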

It is important to note that the modellers also use information on industry losses for a given event to calibrate or “tweak” their model parameters to match the historical losses. Therefore, a better and more independent test is to validate event-level losses for individual companies with a distinct mix of businesses and regional concentrations.

Validating losses by line of business may provide insights into each model’s ability to estimate losses for, say, residential and commercial lines. A model may sometimes perform well for overall losses but show significant discrepancies for individual lines of business, because positive and negative biases in different lines can cancel each other out at the aggregate level.
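A toy example with hypothetical figures shows how this cancellation can hide line-level errors. In the sketch below, a model over-estimates residential losses by 30% and under-estimates commercial losses by 30%; because the two lines carry equal actual losses, the aggregate modelled loss matches the company’s total exactly. The function name and numbers are illustrative assumptions.

```python
def ratios_by_line(modelled, actual):
    """Modelled/actual loss ratio for each line of business and in aggregate.

    modelled, actual: dicts mapping line of business -> loss for one event,
    with the same keys in both.
    """
    per_line = {line: modelled[line] / actual[line] for line in actual}
    aggregate = sum(modelled[line] for line in actual) / sum(actual.values())
    return per_line, aggregate


# Hypothetical company losses (USD m) for a single historical event.
modelled = {"residential": 130.0, "commercial": 70.0}
actual = {"residential": 100.0, "commercial": 100.0}
per_line, aggregate = ratios_by_line(modelled, actual)
print(per_line, aggregate)
```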

While event-level historical loss validations are invaluable for assessing a model’s performance for individual historical events, and give a general idea of its strength across events of different magnitudes, they do not give a complete picture of its quality and robustness, especially from the point of view of long-term risk. For example, well-calibrated modelled losses for one event do not necessarily translate into reasonable losses for another event, or for the entire catalogue of potential events.

Therefore, it is important to review the stochastic model and the resulting EP curves for a more complete assessment of a catastrophe model. The stochastic loss validation includes comparisons of the modelled loss distributions with established benchmarks. For example, approximately 100 years of historical US hurricane loss information is available, which can be used to develop a limited benchmark EP curve for hurricane risk in the United States.
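A limited benchmark EP curve can be built by ranking the trended annual losses and assigning each an empirical exceedance probability. The sketch below assumes one trended loss per year of record and uses the Weibull plotting position rank/(n + 1), one of several common conventions. Using annual aggregate losses gives a benchmark for the aggregate EP curve, while using each year’s largest event loss gives one for the occurrence curve. Function names are illustrative.

```python
def empirical_ep(annual_losses):
    """Empirical exceedance-probability points from a record of annual losses.

    annual_losses: one trended loss per year of record, including zeros for
    years without a loss. Uses the Weibull plotting position rank / (n + 1).
    Returns (loss, exceedance probability) pairs, largest loss first.
    """
    n = len(annual_losses)
    ranked = sorted(annual_losses, reverse=True)
    return [(loss, rank / (n + 1)) for rank, loss in enumerate(ranked, start=1)]


def return_period(probability):
    """Return period in years implied by an annual exceedance probability."""
    return 1.0 / probability


# Hypothetical four-year record of trended annual losses (USD bn).
for loss, p in empirical_ep([0.0, 5.0, 0.0, 20.0]):
    print(loss, p, return_period(p))
```

With about 100 years of record, the largest observed loss sits at an empirical exceedance probability of about 1/101, a return period of roughly 100 years, so such a benchmark says little on its own about the rarer return periods in the modelled EP curve.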

The hazard and vulnerability components of the model do not lend themselves to easy validation and quantification. The difficulties lie in developing reliable benchmarks and in obtaining data and information that are deeply embedded within the models. The hazard components can be benchmarked using historical data from meteorological or geological organisations. For example, baseline regional hurricane landfall probabilities can be developed using the past 150 years of hurricane data available from NOAA. Comparing this baseline with each model’s corresponding landfall probability estimates can help quantify discrepancies in the modelled hazard.
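Assuming landfalls in a region follow a homogeneous Poisson process over the record, a baseline annual rate is the landfall count divided by the years of record, and the probability of at least one landfall in a year is 1 - exp(-rate). The sketch below computes this baseline, the relative discrepancy of a model’s rate from it and the approximate sampling standard error of the baseline. The Poisson assumption, the function names and the counts are illustrative.

```python
import math


def landfall_baseline(landfall_count, years):
    """Baseline annual landfall rate and P(at least one landfall in a year).

    Assumes landfalls follow a homogeneous Poisson process over the record.
    """
    rate = landfall_count / years
    return rate, 1.0 - math.exp(-rate)


def rate_standard_error(landfall_count, years):
    """Approximate sampling standard error of the Poisson rate estimate."""
    return math.sqrt(landfall_count) / years


def rate_discrepancy(model_rate, baseline_rate):
    """Relative difference of a model's landfall rate from the baseline."""
    return (model_rate - baseline_rate) / baseline_rate


rate, p_any = landfall_baseline(30, 150)  # hypothetical regional count
print(rate, p_any, rate_standard_error(30, 150), rate_discrepancy(0.24, rate))
```

For a hypothetical 30 landfalls in 150 years, the baseline rate is 0.2 per year, about an 18% chance of at least one landfall in a year. A model rate of 0.24 would be 20% higher, but the gap of 0.04 is comparable to the baseline’s standard error of roughly 0.037, so the discrepancy alone would not be conclusive.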

Similarly, comparing the hazard footprints produced by the model for historical events with observed hazard intensities can help assess deviations from actual data. Validating the vulnerability functions is difficult, since they are deeply embedded within the models and hence do not lend themselves to easy scrutiny. Some limited information is available through public resources, such as those from the state of Florida, that can be used to validate individual model components, including vulnerability.
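For a single historical event, the footprint comparison can be reduced to a few statistics over the sites where both modelled and observed intensities exist. The sketch below assumes both are available as dictionaries keyed by a hypothetical site identifier (for example, peak gusts at weather stations). It reports the mean error, which indicates bias, and the root-mean-square error, which indicates overall scatter.

```python
import math


def footprint_errors(modelled, observed):
    """Compare modelled and observed hazard intensity at common sites.

    modelled, observed: dicts mapping site id -> intensity (e.g. peak gust, m/s).
    Only sites present in both are compared. Returns (mean error, RMSE);
    a positive mean error means the footprint over-states intensity on average.
    """
    sites = sorted(set(modelled) & set(observed))
    if not sites:
        raise ValueError("no common sites to compare")
    errors = [modelled[s] - observed[s] for s in sites]
    mean_error = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rmse


# Hypothetical station readings (m/s); only sites A and B appear in both sets.
print(footprint_errors({"A": 50.0, "B": 42.0, "C": 38.0},
                       {"A": 46.0, "B": 44.0, "D": 30.0}))
```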

A systematic and objective model evaluation quantifies, for each model, how far its event-level industry and company loss estimates, as well as its probabilistic losses, deviate from the benchmarks. Combining the results of all the model evaluation criteria can help rank or weight the models. One or more models can then be selected or blended based on the final aggregate weights given to each model.
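One possible weighting scheme, sketched below purely as an illustration, scores each model by the importance-weighted inverse of its error on each evaluation criterion, normalises the scores into weights and blends the models’ losses at a given return period. The inverse-error scoring, the criterion names, the importance values and the loss figures are all assumptions, not an established industry standard.

```python
def model_weights(errors, importance):
    """Turn per-criterion error scores into normalised model weights.

    errors: {model: {criterion: error}}; lower error is better, all > 0.
    importance: {criterion: relative importance} (hypothetical values).
    Each model's score is the importance-weighted sum of 1 / error;
    scores are then normalised so that the weights sum to 1.
    """
    scores = {
        model: sum(importance[c] / err for c, err in errs.items())
        for model, errs in errors.items()
    }
    total = sum(scores.values())
    return {model: s / total for model, s in scores.items()}


def blend_losses(losses, weights):
    """Weighted average of each model's loss at the same return period."""
    return sum(weights[m] * losses[m] for m in weights)


# Hypothetical error scores for two models on two evaluation criteria.
errors = {
    "Model A": {"event_losses": 0.20, "ep_curve": 0.30},
    "Model B": {"event_losses": 0.40, "ep_curve": 0.15},
}
importance = {"event_losses": 0.5, "ep_curve": 0.5}
weights = model_weights(errors, importance)
blended_100yr = blend_losses({"Model A": 12.0e9, "Model B": 15.0e9}, weights)
print(weights, blended_100yr)
```

When every model has the same error scores, the weights are equal and the blend reduces to the simple average mentioned earlier. Averaging losses at a fixed return period, as here, generally gives a different blended curve from averaging exceedance probabilities at a fixed loss level, so the blending convention is itself a choice worth making deliberately.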