Friday, 13 June 2008

Spotting panel data problems

The last couple of posts have pointed out difficulties with specifications of growth models using panel data. A common method for estimating the parameters in these specifications is GMM System. GMM System uses lagged values of the variables as instruments, which are terms assumed to be uncorrelated with the model's error terms. It then chooses parameter values so that the sample versions of these zero-correlation conditions (the moment conditions) hold as closely as possible. It is an application of the Generalised Method of Moments discussed in the "disproportionately useful theories" series here at Great Lakes Economics.
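The core method-of-moments idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not GMM System. It uses a single AR(1) series with no intercept and one moment condition, E[y(t-1) e(t)] = 0, so there is exactly one equation to solve for b. The AR coefficient of 0.5, the seed and the series length are arbitrary choices for the example.

```python
import random


def mm_ar1(y):
    """Method-of-moments estimate of b in y(t) = b*y(t-1) + e(t).

    The population condition E[y(t-1) * e(t)] = 0 is replaced by its sample
    analogue, sum of y(t-1) * (y(t) - b*y(t-1)) = 0, which is solved for b.
    """
    num = sum(y_lag * y_now for y_lag, y_now in zip(y[:-1], y[1:]))
    den = sum(y_lag * y_lag for y_lag in y[:-1])
    return num / den


# Simulate an AR(1) series with b = 0.5 and standard normal errors
rng = random.Random(0)
y = [0.0]
for _ in range(20_000):
    y.append(0.5 * y[-1] + rng.gauss(0.0, 1.0))

print(f"estimated b: {mm_ar1(y):.3f}")
```

When there are more moment conditions than parameters, as in GMM System, the sample conditions cannot all hold exactly. GMM then minimizes a weighted sum of squares of them instead.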

GMM System has earned plenty of plaudits because it has good estimation properties when estimating the parameters of a correctly specified model whose parameters are constant across the panel, in both the time and country dimensions. Its principal statistical tests are concerned with making it work well with the specified model. They indicate whether the instruments used are reasonable and whether the standard errors should allow for changes in the variance of the data over time.

To check whether the usual reported tests could indicate a problem in a misspecified model, I generated thirty periods of data from a model

y(t) = a + b * y(t-1) + e(t), where e(t) is a Normal(0,1) error term.

The parameters a and b varied across three countries. Country 1 had an autoregressive parameter of 0.2, country 2 had an AR parameter of 0.5, and country 3 had an AR parameter of 0.8. GMM System was used to estimate a single common lag parameter. (Technical note: after some trial, the selected instruments were the third lag of y, y(t-3), for the first-differenced equation, and the differenced second lag, y(t-2) - y(t-3), for the levels equation. The one-step GMM System estimator was used.)
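Below is a minimal Python sketch of the experiment's estimation step. It is not the code behind the reported results, and it rests on several assumptions. The intercepts a (set to 1.0 here) are hypothetical, since only the AR parameters are given. A 50-period burn-in from the stationary mean is also assumed. The instruments follow the technical note, with a constant added to the levels equation, and each instrument enters as a single column rather than one column per period. The weighting matrix is (Z'Z)^-1, a 2SLS-style simplification rather than the standard one-step System GMM weighting matrix. No standard errors or diagnostic tests are computed.

```python
import random


def simulate_country(a, b, rng, periods=30, burn_in=50):
    """Simulate y(t) = a + b*y(t-1) + e(t), e(t) ~ Normal(0, 1).

    Starts at the stationary mean a / (1 - b) (needs |b| < 1), discards
    `burn_in` draws and returns the last `periods` values.
    """
    y = a / (1.0 - b)
    out = []
    for t in range(burn_in + periods):
        y = a + b * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(y)
    return out


def cross(a, b):
    """Return a'b for row-major matrices a (n x p) and b (n x q)."""
    return [[sum(r[i] * s[j] for r, s in zip(a, b)) for j in range(len(b[0]))]
            for i in range(len(a[0]))]


def solve(m, v):
    """Solve the square system m x = v by Gauss-Jordan elimination with pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def system_gmm_one_step(panel):
    """Estimate (c, b) in y(t) = c + b*y(t-1) + u(t), with b common to all countries.

    Differenced rows: dy(t) = b*dy(t-1) + de(t), instrument y(t-3).
    Levels rows:      y(t) = c + b*y(t-1) + u(t), instruments 1 and y(t-2) - y(t-3).
    Weighting matrix W = (Z'Z)^-1 (a simplification, see lead-in).
    """
    X, Z, Y = [], [], []
    for y in panel:
        for t in range(3, len(y)):
            # First-differenced equation: no constant, so its column is 0
            X.append([0.0, y[t - 1] - y[t - 2]])
            Z.append([y[t - 3], 0.0, 0.0])
            Y.append([y[t] - y[t - 1]])
            # Levels equation
            X.append([1.0, y[t - 1]])
            Z.append([0.0, 1.0, y[t - 2] - y[t - 3]])
            Y.append([y[t]])
    zz, zx, zy = cross(Z, Z), cross(Z, X), cross(Z, Y)
    # W Z'X and W Z'y, computed by solving with Z'Z instead of inverting it
    wzx = [solve(zz, [row[k] for row in zx]) for k in range(2)]  # one list per column
    wzy = solve(zz, [row[0] for row in zy])
    # Normal equations: (X'Z W Z'X) theta = X'Z W Z'y
    lhs = [[sum(zx[r][j] * wzx[k][r] for r in range(3)) for k in range(2)]
           for j in range(2)]
    rhs = [sum(zx[r][j] * wzy[r] for r in range(3)) for j in range(2)]
    return solve(lhs, rhs)  # [c_hat, b_hat]


rng = random.Random(2008)
params = [(1.0, 0.2), (1.0, 0.5), (1.0, 0.8)]  # (a, b); the a values are hypothetical
panel = [simulate_country(a, b, rng) for a, b in params]
c_hat, b_hat = system_gmm_one_step(panel)
print(f"common lag estimate: {b_hat:.3f}")
```

With identical parameters in every country the moment conditions hold, and the estimate settles near the true b as the number of countries grows. With the heterogeneous parameters used here, there is no single true value for it to settle on.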

Across twenty or so simulated samples, the estimated common lag parameter averaged around 0.63 and varied between about 0.5 and 0.75, although no single true value exists (the three country parameters average 0.5). The coefficients were significant. The Arellano-Bond serial correlation tests were significant for AR(1) and insignificant for AR(2), which is the pattern expected of a well-specified model. The Sargan test of overidentifying restrictions (instrument validity) was also insignificant. These are the tests commonly reported in panel data growth empirics, and the results would be considered good, even though the true generating parameters varied widely across countries.
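The Arellano-Bond tests look at serial correlation in the first-differenced residuals. Errors that are serially uncorrelated in levels produce differenced errors with a first-order autocorrelation of -0.5 and no second-order autocorrelation. This is why a significant AR(1) and an insignificant AR(2) count as a pass. The sketch below computes plain sample autocorrelations on simulated i.i.d. Normal(0,1) errors to show that pattern. It is not the Arellano-Bond m1/m2 statistic itself, and the sample size and seed are arbitrary. Nothing in the check refers to whether b is the same across countries.

```python
import random


def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / var


rng = random.Random(0)
e = [rng.gauss(0.0, 1.0) for _ in range(100_000)]   # serially uncorrelated errors
de = [e[t] - e[t - 1] for t in range(1, len(e))]    # first-differenced errors

print(f"lag 1: {autocorr(de, 1):+.3f}")  # theory: -0.5
print(f"lag 2: {autocorr(de, 2):+.3f}")  # theory: 0
```

The theoretical values follow from Cov(de(t), de(t-1)) = -Var(e) and Var(de(t)) = 2 Var(e), while de(t) and de(t-2) share no error terms.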

The problem is that these tests were not designed to detect model misspecification. If they cannot flag a problem in such well-behaved simulated data, there is little reason to be confident that they will catch one in real-world data.
