Thursday, 26 June 2008

Time periods for averaging

It is common in growth empirics to use data which has been averaged over five year periods. So growth is measured as average annual growth over the periods 1980-4, 1985-9, 1990-4, and so on.

The idea is to remove some of the volatility from the data. I have my doubts that five year averaging should be used as widely as it is, in preference to shorter averaging over say three years or no averaging at all, for the following reasons:

1. Many interesting variables which could affect growth have only been collected in recent years, and empirical analysis often follows quickly after data collection. If five year averaging is used, there may be too few periods for any time-series or panel data analysis.

2. Many estimation methods have biases which decrease with the number of time periods used. Using five year averaging for a wide set of countries means that only eight periods are used at most, which makes the data vulnerable to large biases.

3. Five year averaging is not a panacea for fluctuation removal. Longer term fluctuations remain in the data.
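A minimal sketch of the second point in Python, assuming forty years of annual growth rates for one country. The helper name `period_averages` and the random numbers are purely illustrative, not real data. It shows how the number of usable periods falls as the averaging window lengthens.

```python
import numpy as np

def period_averages(annual_growth, window):
    """Average annual growth rates over non-overlapping windows of `window` years."""
    values = np.asarray(annual_growth, dtype=float)
    n_periods = len(values) // window        # incomplete trailing years are dropped
    trimmed = values[: n_periods * window]
    return trimmed.reshape(n_periods, window).mean(axis=1)

# Hypothetical annual growth rates for forty years (illustrative numbers only)
rng = np.random.default_rng(0)
annual = 0.02 + 0.03 * rng.standard_normal(40)

three_year = period_averages(annual, 3)
five_year = period_averages(annual, 5)
print(len(annual), len(three_year), len(five_year))
```

Forty annual observations shrink to thirteen three-year periods or eight five-year periods, which is where the "eight periods at most" figure bites.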

Technology gaps and transfers: their effect on growth

Here are some early conclusions from my estimates on the effects of technology gaps and transfers on economic growth:

* Imports of technical goods (except oxygen for some reason) are associated with lower income growth, even in the presence of log per capita income as a determinant variable. There is probably an estimation problem here – lags of foreign direct investment are used as instruments in the equations, which don't work well with trade variables because of endogeneity. The Sargan tests for exogeneity in the trade equations have tiny p-values. The effect will be to decrease the coefficient on trade, since more growth leads to more trade, so any initial positive effect of trade looks lower, over and above the fact that growth slows down in any case as people get richer.

* Openness and foreign direct investment per person are associated with higher growth.

* Large technology gaps lead to increased growth, and are highly significant, in both variable-minimal MRW/Solow regressions and expanded ones.

* Introduction of technology gaps as a determinant improves the fit of all Solow variables in the basic model – the significance and impact of education increase threefold (highly educated countries also have smaller gaps, so in the basic model education has a direct positive effect and a negative effect as a proxy for the gap), and there is a moderate improvement in the investment effects too (for the same reason). However, slightly unexpectedly, and from my perspective unfortunately, the coefficient on previous period income increases in size and significance. Possibly the introduction clarifies the various relations, and one effect (via the education and investment effects) is to improve the coefficient measurement on lagged income too.

* Adding a cross term (gap*transfer) generally gives a term with far higher significance than individual transfer terms, and reduces their significance considerably. The effect is large with openness and foreign direct investment. The effect is much less noticeable with trade variables, although the estimation problem noted above may apply. To an extent the cross term makes the trade term more negative, indicating, as one of a few likely explanations, that it is capturing some of the positive effect that trade brings and leaving the negative. The cross term is also highly significant in models with a gap term, although the gap term retains some significance and the cross term does not always have the expected, positive, sign.

Monday, 23 June 2008

The outcomes of the Zimbabwean government's actions

The opposition Movement for Democratic Change in Zimbabwe pulled out of the presidential election yesterday, citing violence against its supporters by government linked forces. They have good reasons to be concerned about it, as the risk factors for civil war are very high in the country - major economic contraction, a history of ethnic division, and above all political mobilisation for conflict. These factors are primarily the responsibility of the Zimbabwean government. The MDC's decision and South African shipworkers' inspirational choice not to offload an arms shipment for Zimbabwe may have lowered the risk for the time being.

The economic contraction - possibly unique in a world of growing countries - means that incomes are far lower than they would have been otherwise in an already poor country. The worst damage of war is often indirect, through hunger or disease rather than armaments. Economic activity stops people falling into natural disasters, and war prevents that activity. Starting a war is like shooting the wings off a plane.

The excess mortality in Zimbabwe due to the economic contraction could be calculated. A change in under-five year old child mortality of fifty per thousand is consistent with the corresponding decline in income. The extra deaths may number in the tens of thousands from this source alone.

I hope that the current Zimbabwean government and military realise that history will remember them in the same way as the world's worst leaders if the country continues on its course for much longer.

Technology, growth, and causality

I've run some Granger causality tests on various technological measures and other variables across different countries. Granger causality examines whether a change in something precedes a change in something else. So we might want to test whether b is not zero in the following model
technological change in year t =
constant + a.techno change in year t-1 + b.techno gap in year t-1 + an error term

VAR models like this produce coefficients which are specific to the country, and since they may not be stable across countries, one has to find a way to produce combined statistics for the significance of b across countries. I took simple averages of the significance level of the t-statistic on b, and frequency counts of significance at the five percent level.

Results vary across measures, and the measures of the variables are imperfect, but broadly speaking:
* gaps in technology Granger-cause technology transfers, with a frequency of 60 percent
* technology gaps g-cause technology change, 65 percent
* transfers g-cause change, 55 percent
* income levels g-cause transfers, 55 percent
* income levels g-cause change, 35 percent
* transfers g-cause growth, 35 percent
* change g-causes growth, 35 percent
* national savings g-cause transfers, 40 percent
* government expenditure g-causes transfers, 45 percent
* national savings g-cause change, 35 percent
* government expenditure g-causes change, 50 percent

Observations on unit roots and the GMM

Hamilton in his textbook "Time Series Analysis" makes two points relevant to my recent comments in this blog.

The first (page 409 of the hardback 1994 edition) is that the GMM does not use all information in a dataset, but only selected moment conditions. The point is salient in observing the failure to identify misspecification in the AR1 models discussed previously, since OLS, instrumental variables, Arellano-Bond, and of course the various GMM estimation methods can all be cast as GMM models.

The second (page 516) is that "the goal of unit root tests is to find a parsimonious representation that gives a reasonable approximation to the true process". The issue is that if the true process has, say, an AR parameter of 0.95, working under the assumption of a unit root (AR=1) can sometimes give more accurate results than under the assumption that there is not a unit root. The point partially responds to the criticism reported in an earlier post that econometrics in pure and applied form has focussed excessively on unit root analysis. My only caveat would be that because unit roots are usually specified as a null and testing is usually conservative in rejecting the null, the unit root hypothesis might be accepted principally because of the advantage that the null has, and so the statistical benefits of acceptance would be diluted.

It is encouraging when a writer anticipates where a reader's logic will pass through, not least because one thinks that one is on the right track if someone else has passed through already.

GMM Sys behaviour under the misspecified AR1

More evidence on the behaviour of the GMM System estimates under the misspecified AR1:

The GMM System seems to select the highest AR parameter in the groups of data as its estimate for the misspecified AR parameter. The GMM Difference is usually a little lower in its estimates, but still close to the top end of the range. Arellano-Bond and OLS are usually around the GMM Dif level. It is probable that the OLS estimates are the product of two factors - a downward bias in estimates because of the OLS estimate of a correctly specified AR1, then a selection of a high parameter among the set of estimates.

The results are robust to the small changes in GMM estimation method between the software packages DPD for Ox and Stata.

Friday, 20 June 2008

Technology imports in Africa

I've been trying to apply the comments in Tuesday's post on technology imports to an African setting. The aim would be to promote commitment to internal technological development and usage of foreign knowledge in all circumstances. I am not sure how to achieve such a goal, but South East Asia and communist countries' experiences suggest that it is possible.

Bias in the misspecified AR model under common estimation techniques

Growth data split by country is commonly used in estimating parameters in growth models. The estimation techniques often assume that the model takes a particular form, so it is interesting to find out what happens when the assumed model is incorrect.

I've been running Monte Carlo tests on a misspecified AR model. The true generating process is lny(t)=a(i)*lny(t-1) + b(i) +N(0,V) for a and b fixed in time but varying by countries with starting incomes separated by 1000 from 1000 upwards. a is tested for various values - by turns, uniform over (0,1) for each country, uniform over (-1,1), evenly spread by country between 0 and 1, and correlated with starting income. b is adjusted either to give a starting growth rate of six percent or to give a growth rate in U(0%,10%). The normal variance V is adjusted to give less than an extra 10% variation in the data most of the time (so V equals 0.047). The misspecified model is lny(t)=a*lny(t-1)+b+c(i)+error, with c time invariant and country variant, and a and b both constants.

Repeated Monte Carlo data were generated for different numbers of countries and time periods, and the estimates were made for each panel. The estimates were made by GMM-SYS, Arellano-Bond, and within group OLS. The model looks good according to the statistics commonly used in each estimation technique. The estimates of a are high. For the U(0,1) data, they are near 1 in GMM-SYS, around 0.7-0.8 in A-B, and between 0.8 and 0.85 in within group.

What happens is that countries with low AR parameter approach their maximum income quickly, and their main variation is random fluctuation, so after a while their data looks like it could be generated by a pure AR=1 process with no constant. The high AR data continues to display variation over a longer period. I suspect this is why the pooled data AR estimates are close to one for all the estimation methods.
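The mechanism can be illustrated on the deterministic part of the process alone. The sketch below is in Python, with illustrative parameter values of my choosing rather than the ones in the Monte Carlo runs. Without noise, lny(t) closes the gap to its long-run level b/(1-a) by a factor of a each period, so a low-AR country runs out of systematic variation almost immediately.

```python
import math

def deterministic_path(a, b, ln_y0, periods):
    """ln y(t) = a * ln y(t-1) + b with the random term switched off."""
    path = [ln_y0]
    for _ in range(periods):
        path.append(a * path[-1] + b)
    return path

def periods_to_converge(a, tol=0.05):
    """Periods until the gap to the long-run level falls below `tol` of its initial size."""
    gap, n = 1.0, 0
    while gap > tol:
        gap *= a          # the gap shrinks by a factor of a each period (0 <= a < 1)
        n += 1
    return n

long_run = 8.0                       # hypothetical long-run ln income
ln_y0 = math.log(1000)               # starting income of 1000
path_low = deterministic_path(0.2, long_run * (1 - 0.2), ln_y0, 30)
path_high = deterministic_path(0.9, long_run * (1 - 0.9), ln_y0, 30)

print(periods_to_converge(0.2), periods_to_converge(0.9))
```

After the first couple of periods the low-AR country's data would be almost entirely noise around a fixed level, while the high-AR country is still visibly converging thirty periods on.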

Since the standard capital + education Solow model tends to display little variation in its key parameters and comparatively low explanatory power, models based on specifications like lny(t)=a*lny(t-1)+d*country education rate+e*country saving rate+c(i) are actually quite similar to the original misspecified model with the two rate terms taking the place of the constant term. High growth countries with much variation in the data will often have high a parameters (being the source of identified growth) and the overall panel data estimate of AR should be near 1.

The standard ARIMA estimates gave reasonable estimates of parameters for each individual country. Checking ARIMA values for AR stability before using panel data is sensible.

Tuesday, 17 June 2008

The Cobb Douglas technology

In growth theory it is often assumed that countries have a common technology describing national output. The technology frequently takes the Cobb Douglas form Y=A.L^alpha.K^(1-alpha), where L is labour, K is capital, and A and alpha are constants.

The Cobb Douglas has some properties which make it reasonable, such as constant returns to scale: if you increase the labour and capital by a certain percentage, the output increases by the same percentage. I've never been convinced that this is its main selling point. Rather, taking logs gives

ln Y = ln A + alpha ln L + (1-alpha) ln K

So it could be painted as the first order approximation to any function expressing ln Y as a function of ln L and ln K, plus a condition on the coefficients. I'd drop the condition (and thus sacrifice the Cobb-Douglas form whilst keeping the multiplicative shape), replacing (1-alpha) by a free constant since there are many circumstances where the coefficients would not sum to unity. Any production function which uses only capital and labour omits other important factors influencing output - such as education or the existence of market networks - and mobilising labour and capital may influence them to such an extent that better than proportionate returns are obtained. When I was looking at the expansion of economies after conflict and attempting to explain their commonly unparalleled rapidity of growth, I considered Y=A.L.K as a possible short term production function.
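A small Python check of the scaling argument, with arbitrary illustrative input values. The exponent on capital is called beta here (my label), so that Cobb Douglas is the special case beta = 1 - alpha and Y=A.L.K is the case alpha = beta = 1.

```python
import math

def output(A, L, K, alpha, beta):
    """Y = A * L**alpha * K**beta; Cobb Douglas is the case beta = 1 - alpha."""
    return A * L**alpha * K**beta

def scale_effect(alpha, beta, factor=2.0, A=1.0, L=100.0, K=50.0):
    """Ratio of output after multiplying both inputs by `factor` to output before."""
    return output(A, factor * L, factor * K, alpha, beta) / output(A, L, K, alpha, beta)

cobb_douglas = scale_effect(0.7, 0.3)   # exponents sum to one
free_form = scale_effect(1.0, 1.0)      # Y = A.L.K, exponents sum to two

print(cobb_douglas, free_form)
```

Doubling both inputs doubles output under Cobb Douglas and quadruples it under Y=A.L.K, which is the better than proportionate return described above.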

Technology gaps, transfers, and narrowing

One of my pet theories is that technology is a leading driver of economic growth. At the moment it remains a pet because there is not a vast amount of published evidence on its effect.

I've been running some preliminary studies on the interactions between technology gaps (measured by things like the number of telephones per person in a technological leader minus the figure in a developing country), transfers (measured by things like foreign direct investment per person to a country), and changes in the gaps. One might think that a high gap might lead to increased transfers and quicker narrowing, with slowing over time. That would be a classical interpretation, with an alternative interpretation belonging to endogenous theory where a leader in technology can pull away from followers.

There are issues about who is the leader, and what measures of gap and transfer to use. For example, if there is a really technologically advanced country, but it has few people in it then its contribution to world technology might be less than a large but on average less technologically advanced country. In my preliminary work, I used the US as a leader because it is big and advanced, although the problem is not avoided. Examining correlations between the various variables gave some interesting propositions, notably on the relative performances of Sub-Saharan Africa and South East Asia.

For the overall data (Penn World Table, plus UN sourced data mostly), a large gap relative to the US seems to result in lower transfers and changes. However, the negative correlations are lower for changes than transfers, suggesting technology catch-up may be less hindered by poverty than trade is. This may help to explain why countries which have a reputation for high valuation of knowledge seem to have high growth. Whether it is the transfers which are responsible for the catch-up is open for discussion.

For a restricted subset of the data, a large gap from the US for eg computers or internet tends to be associated with higher changes in SE Asia and Southern and Eastern Europe, but the opposite in SSA. However the large gap seems to result in a decrease in FDI and imports of technological items (like precision engineering tools) almost as strong in SE Asia as in Africa. The correlation is also negative in S and E Europe, but less strong.

All regions have positive correlations between income and imports of dictionaries and encyclopaedias, but the correlation is less strong in SE Asia than in other regions, ie low income does not seem to lower this form of knowledge transfer to the same extent there as elsewhere.

All regions have positive correlations between students sent to the UK and Australia and the dictionary/encyclopaedia trade, but it is less strong in SE Asia than in other regions. Possibly if a country in SE Asia is less engaged with the international community through person exchanges, it does not prevent acceptance of their knowledge to the same extent that it does elsewhere.

Friday, 13 June 2008

Disproportionately useful theories #7: financial symmetries

Most of the disproportionately useful series has looked at theories which have already demonstrated their utility. This time it's prospective: a theory which has made some impact, but whose possibilities have not yet been fully investigated. A declaration of interest must be made - one of my colleagues has taken a lead in developing the analysis, and I have worked on it too.

At the heart of the theory is identification of quantities in a financial market, such as the value of a share or a company, which do not change when a certain event happens. The event is known as a symmetry, because real-world symmetries do not change something when they act on it. Any event which leaves a quantity unchanged qualifies.

There is a piece of mathematics called Noether's theorem which shows that, generally speaking, one can find another separate quantity which does not change over time whenever you have a symmetry for the first quantity. So if you can observe a quantity that is unchanged by an event, you can find another permanent relationship involving a market quantity which may be much less obvious.

The approach has already been applied to find company values, and there is recent work applying it in areas related to financial derivatives. A major potential success would be application to asset pricing theories more widely, and work is ongoing to link the theories.

Financial symmetries crop up everywhere, so investigating them and their related theory can be extraordinarily fruitful.

Spotting panel data problems

The last couple of posts have pointed out difficulties with specifications of growth models using panel data. A common method for estimating parameters in the specifications is the GMM System. GMM System takes panel data terms which are not correlated with each other and then equates expected and actual quantities based on them to calculate parameter values. It is an application of the Generalised Method of Moments discussed in the "disproportionately useful theories" series here at Great Lakes Economics.

GMM Sys has had lots of plaudits because it has good estimation properties when estimating the parameters of a correctly specified model where the parameters are constant across the panel, both in time and country dimensions. Its principal statistical tests are concerned with getting it to work well with the specified model, so they indicate whether the instruments used are reasonable and whether standard errors should make allowance for changes in data variation over time.

To check whether the usual reported tests could indicate a problem in a misspecified model, I generated thirty periods of data from a model

y at time t = a + b * y at time t-1 + Normal (0,1) error term.

a and b varied across three countries: country 1 had an autoregressive parameter of 0.2, country 2 had an AR parameter of 0.5, and country 3 had a parameter of 0.8. GMM System was used to estimate the results. (Technical note: after trial, the selected instruments were the third lags of y for the first difference equation, and the differenced second lag (y(t-2) - y(t-3)) for the levels equation. First step GMM System was used.)

The method estimated the common lag parameter at an average of around 0.63 after running twenty or so samples, with minor variations between 0.5 and 0.75. Coefficients were significant, the tests on serial correlation (Arellano-Bond AR(1) and AR(2)) had high significance for AR(1) and low significance for AR(2), and the Sargan test on instrument endogeneity had low significance. These are the common reported tests in panel data growth empirics, and the results would be considered good despite the true generating parameters being highly variable across countries.
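The experiment can be sketched in Python, with one important substitution: the code below uses a pooled within-group (fixed effects) OLS estimator as a simple stand-in, not GMM System, and it fixes the constant a at 1 for every country because the values used are not given here. The function names are hypothetical.

```python
import numpy as np

def simulate_country(a, b, periods, rng, burn_in=50):
    """y(t) = a + b*y(t-1) + N(0,1), keeping `periods` observations after a burn-in."""
    y = a / (1 - b)                      # start at the long-run mean
    path = []
    for t in range(periods + burn_in):
        y = a + b * y + rng.standard_normal()
        if t >= burn_in:
            path.append(y)
    return np.array(path)

def within_group_ar1(panel):
    """Single pooled AR(1) slope after removing each country's own means."""
    num = den = 0.0
    for y in panel:
        lag, cur = y[:-1], y[1:]
        lag_d, cur_d = lag - lag.mean(), cur - cur.mean()
        num += lag_d @ cur_d
        den += lag_d @ lag_d
    return num / den

def mean_pooled_estimate(ar_params, periods=30, n_samples=200, seed=0):
    """Average the pooled estimate over repeated simulated panels."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_samples):
        panel = [simulate_country(1.0, b, periods, rng) for b in ar_params]
        estimates.append(within_group_ar1(panel))
    return float(np.mean(estimates))

pooled = mean_pooled_estimate([0.2, 0.5, 0.8])
print(pooled)
```

Whatever the estimator, a single number is returned for a parameter that takes three different values in the generating process, and nothing in the output of such a routine flags that.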

The problem is that the tests were not designed to test model misspecification. If they cannot handle such well-behaved data, one would not be confident that they can handle real world data.

Common technology and AR(1) instability

In panel data estimation of growth models, data for different countries is collected over a number of time periods. It is different from cross-sectional estimation which uses a single time period, and time series analysis which uses a single country. The estimation methods are different, too.

For analysis of panel data to be useful, there has to be some similarity across countries between the way in which the time series are generated. For example, two countries may each have their national income estimated as

capital in country^a * labour in the country^b

where a and b are parameters specific to each country. In many growth models, it is usual to assume that the parameters are also constant across countries. The assumption is known as the common technology assumption.

The last post argued that the autoregressive parameter in growth estimations is not stable across countries. Economists should be very cautious about specifying models where the autoregressive parameter cannot vary across countries. A common specification does just that however, taking the form

ln income per person = a.ln income last period + other terms assuming common parameters + country specific constant.

a is the same across countries and the country specific constant varies.

Suppose that the constant a is estimated as an average across countries, and that a particular country is found to have a positive specific constant. During the period of measurement, its real a was higher than estimated a, and the specific constant corrects for it. The constant corrects for an average discrepancy in income during the period, so roughly (since the average is over ln income, not income)

specific constant = a(real).lnaverageincome - a(estimated).lnaverageincome
= (a(real) - a(estimated)).lnaverageincome

Now, if country income is lower than its period average, the discrepancy is

(a(real) - a(estimated)).(lnaverageincome - something positive),

so that

specific constant - (a(real) - a(estimated)).(lnaverageincome - something positive)
= (a(real) - a(estimated)).something positive

The specific constant overestimates the income growth at small incomes. Similarly, it underestimates at large incomes. Out-of-sample predictions will usually occur for future time periods where countries are richer, so that the model will underestimate by increasing amounts over time.
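A numerical illustration of the algebra in Python, using made-up values (a real a of 0.9, an estimated a of 0.8, and an average ln income of 8). The function name is my own label for the difference between the specific constant and the discrepancy it is meant to correct.

```python
def prediction_error(a_real, a_estimated, ln_average_income, ln_income):
    """Specific constant minus the true discrepancy at a given ln income.
    Positive means the model overestimates, negative means it underestimates."""
    specific_constant = (a_real - a_estimated) * ln_average_income
    true_discrepancy = (a_real - a_estimated) * ln_income
    return specific_constant - true_discrepancy

below_average = prediction_error(0.9, 0.8, 8.0, 7.0)    # poorer than the period average
above_average = prediction_error(0.9, 0.8, 8.0, 9.0)    # richer than the period average
further_out = prediction_error(0.9, 0.8, 8.0, 10.0)     # richer still, as in a later forecast

print(below_average, above_average, further_out)
```

The error is about +0.1 below the average, -0.1 above it, and -0.2 further out, so the underestimate grows as the country moves away from the sample average.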

The presence of time period specific constants corrects the problem a tiny bit, but the main difficulty is that the model remains misspecified. Time constants are moreover useless for future prediction unless they are subject to their own modelling.

AR(1) growth model parameter stability

One of my earlier posts looked at autoregressive parameter stability in the AR(1) growth model for a few countries. This post investigates the stability for all countries in the Penn World Table dataset over 1950-2000. The model used was augmented Solow under an MRW specification, tested at five year intervals, for each country individually using an AR(1) specification rather than the usual panel estimation:

ln income in year t = a + b.ln income in year t-5 + c.ln savings rate average between t-5 and t + d.ln years of education among population aged 15 or older at t + e.ln (population growth in previous five years + 0.05).

The 0.05 is an adjustment for technology growth and depreciation. a,b,c,d,e are constants for estimation. Estimation was by maximum likelihood.

99 countries have sufficient data to allow estimation of the autoregressive parameter, b. The mean was -0.28, and the standard deviation was 0.55. The parameter is not stable across countries, nor does it seem to exhibit any clear distribution. Here is a histogram of the parameters:

Tuesday, 10 June 2008

Women's education and growth

One of the spin-offs from my research on technology's effect on economic growth is a computer program for generating estimates of the effect of almost any variable on growth, using multiple models, and quickly. Anyone want to know how much having a good football team is linked to economic growth?

Not many people, but the link between discrimination against women in education and economic growth is of interest, what with there being a lot of women and their having a large effect on economies. I took women's and men's literacy rates by country from the UN statistical database, and calculated five year averages since 1980. The base model for adding the literacy variable was Solow using Penn World Table data, and the alternative was education augmented Solow with population and technological growth dropped as variables to avoid over identification. Data limitations and a wish to avoid complication meant that a within group OLS was used with robust standard errors.

Increasing equality in literacy was associated with increased growth in both models, though as usual with regressions causality cannot be established. The variable was 1% significant in the first model, though not even 10% significant in the second. The other variables did what was expected of them, with the coefficient magnitudes anticipated on the lag terms.

Not much estimation here, but the couple of results could form the basis for a more substantial work. The question of whether women's equality is good for capitalist growth, and in what forms, is or used to be discussed in feminist works looking at the merits of the liberal and left-wing strands of the theory.

Correlations between technological measures

My research on technological transfers between countries is throwing up some predictable results and some more interesting ones. At the moment, it is looking at correlations between various quantities measuring technology across countries. It is looking at levels of technology, flows of technology, and changes in technological levels.

One may expect that low levels of technology might lead to large changes in those levels. The changes would be mediated by the flows of technological knowledge carried by students, investment, and trade. As these flows require finance and integration in the world economy, the large changes might occur among those countries which do not have the very lowest levels of technology, as they could be too disadvantaged to compete well in the global economy.

On the obvious front, the preliminary results find that different per-capita measures of technological level correlate positively and highly, as do the per-capita measures of trade and investment associated with technological knowledge transfers (such as trade in scientific instruments and foreign direct investment). Student movements correlate less well with them, but where the correlations are large, they are positive.

Among the less obvious results is the finding that large per-capita technology gaps do not correlate highly with the corresponding transfers. I think this may point to support for the earlier hypothesis on very disadvantaged countries not being able to catch up quickly with advantaged countries. There are other explanations, which more detailed analyses will hopefully dig out.

Friday, 6 June 2008

Aid, inefficiency, and transparency

International development aid is criticised for its inefficiency. It is argued that aid is often stolen or wasted. It is suggested that business is better for reducing poverty than aid.

The critics have a point. Unlike most weak businesses, weak aid distributors do not have to go bankrupt. They have to fight for comparatively small amounts of money with much competition, but if they can persuade people to keep giving them money that is what matters for their survival.

The aid equivalent of market pressure for efficiency can only come through oversight by donors. One of my academic colleagues advocated that increased governmental aid should be made conditional on recipient governments opening their accounts for auditing. It's a good idea.

Bias in MLE estimates of AR(1)

I have been running some Monte Carlo simulations to see the bias produced by estimating an AR(1) model (x(t)=a+p.x(t-1)+N(0,1), a and p constants) using MLE. The MLE estimates coincide with the OLS estimates, so there is a small sample bias in the estimate for p, and the bias is downwards near p=1. The proof follows quickly from the OLS formula.

In the simulations, there is clear downwards bias for short time periods near p=1, even with large numbers of simulations. For large time periods (1000 or so for quite small differences), the theoretically predicted convergence was observed. Short time periods are not that short - the simulations at 50 time periods gave a clearly biased p parameter. And everything was very sensitive to the starting values of x(0) and a, with higher values tending to increase the estimate of p. For a hundred simulations, 30 periods, a=4, parameter of 0.7, and seed of 8, the mean MLE estimate from all of the simulations was 0.83, with a range of (0.42, 0.95). With the same parameters except a=3, the mean was 0.62 with range (0.19, 0.93). These are the sort of parameter values which occur in economic growth models, so the biases are relevant for reported estimates.
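A minimal Monte Carlo sketch in Python of the small sample bias, under assumptions that differ from my runs: each series starts at its long-run mean a/(1-p), which removes the sensitivity to starting values noted above, and the numbers of simulations are chosen for speed. The function names are illustrative.

```python
import numpy as np

def ols_ar1(x):
    """OLS slope from regressing x(t) on a constant and x(t-1)."""
    lag, cur = x[:-1], x[1:]
    lag_d = lag - lag.mean()
    return (lag_d @ (cur - cur.mean())) / (lag_d @ lag_d)

def mean_estimate(p, a, periods, n_sims, seed):
    """Mean OLS estimate of p over simulated series x(t) = a + p*x(t-1) + N(0,1)."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_sims):
        x = np.empty(periods + 1)
        x[0] = a / (1 - p)               # assumption: start at the long-run mean
        for t in range(1, periods + 1):
            x[t] = a + p * x[t - 1] + rng.standard_normal()
        estimates.append(ols_ar1(x))
    return float(np.mean(estimates))

short_sample = mean_estimate(p=0.9, a=0.0, periods=30, n_sims=500, seed=8)
long_sample = mean_estimate(p=0.9, a=0.0, periods=1000, n_sims=50, seed=8)
print(short_sample, long_sample)
```

With thirty periods the mean estimate sits clearly below the true 0.9, while with a thousand periods it is close to it.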

Monday, 2 June 2008

Economic software 2.0

Web 2.0 is discussed by lots of people lots of the time (I'm not an expert here as you can see but please bear with me until the topic meanders on to economic software which it does in just over a sentence's time so really it wasn't necessary to put in this solicitation but who knows it may bring a Tristram Shandy-esque quality to the blog) and is about users generating and interacting with content. Some people think that it could be important for e-commerce.

I'm a fan of interactivity, of course. This whole site is interactive. In economic software, the interactivity is enhanced big-time by the presence of an embedded language controlling the software and which the user is intended to access. An example: software which lets you find estimates from some data, say by putting the data in a spreadsheet and pressing an "estimate" button. The software also has a language which lets the user write a program like:

1. Estimate
2. Store the results
3. Get rid of the most recent year's data
4. Loop over 1-3 until all the data has gone

The utility of the programming language is far higher than the sum of the parts, because the parts are usually too numerous to sum together and often programs can do unexpected things unlike any rigid provided option. Hence the name Economic software 2.0; similar interactivity to Web 2.0, similar improvements in user experience.
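The four-step program might look something like this in Python, as a sketch only: the sample mean stands in for whatever "Estimate" does in a real package, and the function name and data are hypothetical.

```python
from statistics import fmean

def shrinking_sample_estimates(years, values, estimate=fmean):
    """Estimate, store, drop the most recent year, and repeat until no data is left."""
    years, values = list(years), list(values)   # copy so the caller's data is untouched
    stored = {}
    while values:                                # step 4: loop until all the data has gone
        stored[years[-1]] = estimate(values)     # steps 1 and 2: estimate and store
        years.pop()                              # step 3: get rid of the most recent year
        values.pop()
    return stored

results = shrinking_sample_estimates([2000, 2001, 2002], [1.0, 2.0, 3.0])
print(results)
```

Each stored result is keyed by the last year included in that estimate, and any other estimator taking a list of values can be passed in place of the mean.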

The home countries of UK foreign students

I've been looking at the home countries of the UK's foreign students. There is some data from 2001 and earlier at

Some of the 2000-1 figures are perhaps surprising. Greece for example sent most students to the UK by a long way (31,000), followed by Ireland (14,000) and China (12,000).

All of the top twenty countries are either developed or rapidly growing developing states. The first African country is Nigeria (3,000) in 21st position. Possibly countries get rich then send students to the UK with their new wealth, but there is also the reverse causality, whereby countries which are outward looking and send students abroad benefit from the added knowledge they have. The first causality direction does not exclude the second, which raises the possibility that the feedback gives rise to endogenous growth.

Granger causality tests examine which of two time series tends to lead the other, so this question looks like it should be answerable. However, the data availability of country of origin is limited (back to the mid 1990s, and the recent data is charged) so one would have to assume that the observed cross-sectional data reveals a persistent characteristic of national outward-lookingness, or make some other assumption capable of moving from cross-section to time series, or just use a cross-sectional regression.