
The Failure of Financial Econometrics: “Stir-Fry” Regressions as a Con Job

Journal 34: Cass-Capco Institute Paper Series on Risk

Imad Moosa

This paper demonstrates the hazard of “stir-fry” regressions, which are used extensively in financial research to produce desired results by reporting only one or a small number of regressions out of the tens or hundreds that are typically estimated. Using data on the capital structure of some Chinese shareholding companies, it is shown that the sign and significance of an estimated coefficient change with the set of explanatory variables, and that adding more explanatory variables to the regression equation changes the sign and significance of the coefficient on a variable that is already included in the model. It is demonstrated that coefficients can be changed from significantly positive to significantly negative and vice versa, and that the desired results can be obtained by introducing various forms of nonlinearity. Finally, it is shown that either the trade-off theory or the pecking order theory can be supported simply by changing the model specification.

In 1983 Edward Leamer published his provocative article, “Let’s take the con out of econometrics,” in which he justifiably criticized a practice in which economists tend to engage – that of estimating 1,000 regressions and reporting the one or few that they like [Leamer (1983)]. Almost thirty years later, this practice is still highly popular. In fact, it has become more widespread because of the growth in the power of computing. It is particularly widespread in corporate finance, where testable models are assembled by combining various hypotheses to come up with a cross-sectional regression equation that has no corresponding theoretical model. The regression equation is subsequently twisted and turned until it produces the results that make a dream come true.

The problem with cross-sectional regressions is that theory is rarely explicit about the variables that should appear in the “true” model. A true model would be available if, for example, the final specification were derived by solving a theoretical optimization problem. In the absence of such a model, the regression equation is constructed haphazardly, by specifying the dependent variable, y, to be a function of several explanatory variables, xj, where j = 1, …, n. The results typically turn out to be difficult to interpret – for example, x1 is significant when the regression includes x2 and x3, but not when x4 is included. So, which combination of all available xjs is to be chosen?
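The instability described here is a textbook consequence of omitted-variable bias: when a regressor of interest is correlated with an excluded variable, its estimated coefficient absorbs the excluded variable’s effect and can even take the wrong sign. The following is a minimal simulation (not from the paper; the data-generating process and variable names are illustrative) showing how the coefficient on the same variable flips from significantly positive to significantly negative depending on the regressor set:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process: z is a confounder that is
# correlated with x1 and also drives y. The true effect of x1 on y is -1.
z = rng.normal(size=n)
x1 = z + rng.normal(size=n)
y = -1.0 * x1 + 3.0 * z + rng.normal(size=n)

def ols(y, regressors):
    """OLS with an intercept; returns coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_short = ols(y, [x1])      # z omitted: coefficient on x1 is biased upward
b_long = ols(y, [x1, z])    # z included: coefficient on x1 is near its true value

print(f"coef on x1, z omitted:  {b_short[1]:+.2f}")  # positive, wrong sign
print(f"coef on x1, z included: {b_long[1]:+.2f}")   # close to the true -1
```

With the confounder omitted, the probability limit of the slope is cov(y, x1)/var(x1) = +0.5 here, even though the true coefficient is -1 — so a researcher free to pick the regressor set can report whichever sign suits the preconceived idea.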

It is a common practice to report the most “appealing” or convenient regression or regressions after an extensive search and data mining (given that we do not know what the “true” model is). While scientific research should be based on a quest for the “truth,” this practice is motivated by the desire to prove a preconceived idea, which is particularly alarming if the idea is driven by ideology. Gilbert (1986) casts significant doubt on the validity of the practice of assigning 999 regressions to the waste bin because they do not produce the anticipated results. Because of this problem, Leamer (1983) suggested that “econometricians confine themselves to publishing mappings from prior to posterior distributions rather than actually making statements about the economy.” Leamer and Leonard (1983) argued strongly against the conventional reporting of empirical results, stating that “the reported results are widely regarded to overstate the precision of the estimates, and probably to distort them as well.” As a consequence, they pointed out, “statistical analyses are either greatly discounted or completely ignored.” They further argued that the conventional econometric methodology (or “technology” as they called it) “generates inference only if a precisely defined model were available, and which can be used to explore the sensitivity of inferences only to discrete changes in assumptions.” Hussain and Brookins (2001) point out that the usual practice of reporting a preferred model with its diagnostic tests need not be sufficient to convey the degree of reliability of the determinants (explanatory variables).

The objective of this paper is to demonstrate the hazard of stir-fry regressions, using a dataset (obtained from the OSIRIS database) on the capital structure of 343 Chinese shareholding companies and its determinants. Specifically, five propositions are presented to demonstrate that (i) the sign and significance of an estimated coefficient change with the set of explanatory variables, (ii) adding more explanatory variables to the regression equation changes the sign and significance of the coefficient on a variable that is already included in the equation, (iii) it is possible to change coefficients from significantly positive to significantly negative and vice versa, (iv) the desired results can be obtained by introducing various forms of nonlinearity, and (v) it is possible to support either of two competing theories by changing the model specification.
