Editor’s Note: Stimulated by Deary, Brett, Anthony, and Lawn’s (2025) paper in this issue, the editors invited Gavan Tredoux to comment on the historically relevant dispute between Thomson and Spearman regarding g.
Godfrey Thomson (1881–1955) thought, on looking back at his long life, that he “learned a great deal from Charles Spearman” (1863–1945), “but only by crossing swords with him, not as a pupil” (Thomson, 1952). The published record shows that this disagreement—sometimes described as a “long debate” about the claims of factor analysis, e.g. (Bartholomew et al., 2009)—was rather one-sided. It was not so much a debate as a fusillade directed from Thomson and his supporters in Spearman’s general direction. It spanned a great many papers, several addresses, and multiple chapters of multiple editions of Thomson’s books, consisting of reformulations and reiterations of what he thought was a decisive argument, his “sampling theory”.
The theory starts with dice experiments, aiming to show that overlapping group factors can induce a hierarchy of correlations (Thomson, 1916). It reaches maturity, stripped of correlated group factors, featuring “sampled bonds” with overlapping “richnesses”, in Thomson (1950a, 1951). (For a more comprehensive idea of Thomson’s prodigious output about the “sampling theory”, see the references, by no means complete.) Spearman seems to have returned fire in print far less frequently: a little irately, in a short note responding to Thomson’s first paper about it (Spearman, 1916); in a general summary of factor theories (Spearman, 1920), supplemented by a couple of pages in his major work The Abilities of Man (Spearman, 1927b, pp. 96–97); and in an exasperated rejoinder that same year (Spearman, 1927a) to “yet another paper” by Thomson. There were other mentions along the way but Spearman seldom made Thomson the center of attention in any of his output.
Thomson claimed to have shown through the evolving iterations of his sampling theory that “the whole underlying complex of causes of what are called ‘factors’ is an inextricable tangle”, and that Spearman’s general factor of intelligence was merely “a mathematical, not a psychological phenomenon”. He cautioned that “it is very dangerous to base psychological hypotheses on it”. (Perhaps this is not baldly calling Spearman’s g theory false, as Bartholomew, Allerhand, & Deary (2013) propose, but it is most of the way there.) Chance could explain it all. “Hierarchical order is the natural order to be found among any set of correlation coefficients, however the correlations may be caused, provided only that the bonds which cause the correlations are left, as we say, to chance: that is, provided the phenomenon is of that complex nature which, in default of analysis, we call random” (Thomson, 1920).
It is true that initially Thomson gave the impression that he was just constructing a feasible counterexample to g, to deflate the importance of the observed hierarchy of correlations. But later he made it clear that he believed his own theory a more likely explanation, drawing broad conclusions from it which strongly suggest that these were prior beliefs: “when we attack some task, some test, our ability to solve it depends upon a large number of things — genes we have inherited, pieces of information we have acquired, skills we have practiced, little habits of thought we have formed, all and sundry influences from past and present — then the correlation coefficients between performances in tests will show exactly the same relationships with one another as they would have done had our ability depended on our possession of a small number of common ‘factors’ (plus specifics)” (Thomson, 1952, p. 283). He was also drawn to variations of the Argument from Complexity that has become so familiar in modern times: “the mind is very much more complex, and also very much more an integrated whole, than any naive interpretation of any one mathematical analysis might lead a reader to suppose. Far from being divided up into ‘unitary factors,’ the mind is a rich, comparatively undifferentiated complex of innumerable influences—on the physiological side an intricate network of possibilities of intercommunication” (Thomson, 1950a, p. 303). He does not address the predictive power of g in the face of this “complexity”.
Supporters of both sides have emerged over the years. From Spearman’s point of view, J. C. M. Garnett (1920) argued at length that you can always rewrite a factor theory in terms of a gaggle of other variables if you want to, as a transformation of coordinates, but only if you want to. Extensions and simplifications of Thomson’s ideas were proposed by Mackie (1928), Bartlett (1953), and Maxwell (1972). More recently the theory has been exhumed for further examination. Eysenck (1987) promotes it to the “Binet-Thomson-Guilford” theory and finds it unlikely given reaction-time and other data from elementary cognitive tasks. Jensen (1998) considers it too vague to be directly falsifiable. It has also been speculated that it may be extended to become a viable explanation of how the mind actually works, e.g. (Bartholomew et al., 2009, 2013; Deary et al., 2016), the opposite of Eysenck and Jensen’s earlier conclusions. Thomson (1952) himself claimed that “only a handful of people have understood it”. If so, it was not for lack of exposition and adumbration on his part. To understand Spearman’s reaction to it, we need a closer look at the details.
Since Thomson did not reject factor analysis as a descriptive method—as used for example by Cyril Burt (1940)—but was dubious only about causal claims that might be based on it, its language will be used freely here. What we observe and seek to explain are test item score outcomes and their hierarchical correlation pattern, as first identified by Charles Spearman (1904a; Spearman & Krueger, 1906). Following one of Thomson’s own simple illustrations, suppose that we have just 4 test items and N hypothetical “bonds” in the mind, where “bond” is left as an undefined cause, for which a physical interpretation was invited by Thomson but never committed to. Each “bond” is hypothesized to be able in principle to contribute to the points obtained on any test item. As such, any “bond” is immediately some piece of general ability, or something yielding it (though Thomson seems to have resisted this straightforward conclusion). A “bond” may be thought of as situated in a physical entity but it is really just a label to which a point contribution can be attributed. The “bonds” are often treated as identical, though Thomson and his supporters are not consistent about this, sometimes implying that distinguishing them is useful.
The sum of contributions by “bonds” will vary by test item depending on the “richness” of the item, which Thomson helpfully states is analogous to its g-loading. However there is additional indirection. If for any bond we ask, “will this bond contribute to the richness of test item $k$?”, we must assign it a uniform probability $p_{k}$ of doing so, where $p_{k}N$ is the expected number of bonds representing the item “richness”. This induces a Binomial distribution for the “richness bonds”. Although this is supposed to be a reflection of how “complex” the test item is, that complexity is never defined separately and spoken of only as “how much of the mind the item requires or induces”. Effort becomes a proxy for complexity. Though Thomson appears to speak of these “bonds” as exhausting the mind, that commitment is certainly not necessary, and we need only think of the set of “bonds” as some special reservoir within it containing identical components of general ability.

It follows immediately, if these assumptions are true, that the scores obtained on two test items will correlate corresponding to the overlap between their “richnesses”. For factor analysis that is expressed as a fraction of a common general ability in each item. For Thomson it is shared “bonds” (unit general abilities) in each item, where the chances of a particular bond being included for multiple test items are assumed to be independent. Recall that the Pearson correlation between two variables is exactly equivalent to the proportion of overall variance that they share in standardized form (see (Jensen, 1980, ch. 6, pp. 252–6) for minutely detailed explanations, carefully distinguishing this variance shared by the variables from the variance in one variable predicted or explained by its correlate, which is the square of their correlation).

Thomson speaks of a test item inducing the use of a set of bonds as if this is an active process of random “sampling” by the mind. For test item $k$ we may write the score as the sum of the sampled bond contributions plus an error term $e_{k}$, where we do not seek to explain the error. The expected number of bonds in the sum is $p_{k}N$ and the expected proportion $p_{k}$ is the g-loading in factor analytic terms. Given the Binomial distribution, there would be many possible sets of these bonds with sizes centered around $p_{k}N$. A great deal of energy is expended pursuing the myriad correlations, in terms of overlap, between an arbitrary set for one test item and an arbitrary set for another test item.

If $p_{k}$ and $p_{j}$ are the supposed fractions of the bonds of the mind sampled for each test item, based on their respective “richnesses”, then the expected overlap between the two sets is $p_{k}p_{j}N$ and the expected correlation is

\[r_{kj} = \frac{p_{k}p_{j}N}{\sqrt{p_{k}N}\sqrt{p_{j}N}} = \sqrt{p_{k}}\sqrt{p_{j}}\]

Here the denominator, the geometric mean of the two expected sample sizes, is best thought of through the area of the rectangle containing the combined total. The other pairs have the same form. Note that this correlation is just the product of the roots of the g-loadings familiar from factor analysis, $\sqrt{p_{k}}\sqrt{p_{j}}$. The expectation has collapsed the blurring introduced by the “richnesses”. Recall that $r_{kj}^{2} = p_{k}p_{j}$ is the amount of variance in one test item explained by the other.

The “richness” does not comprehend any hard threshold effects. Luck can lasso up to all the bonds or down to none at all—those extreme probabilities are nearly zero, but repeating this process enough times will eventually yield examples. Each repetition may occur under exactly the same conditions and (somehow) the test item will yield more, or less, points to identical ministrations. This is not a decision to eschew modeling other sources of variation but is supposed to be an intrinsic feature of the “bond” mechanism itself. This curious idea is revisited below.
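The expected-correlation result above is easy to check numerically. The sketch below is a minimal Monte Carlo rendering of the sampling scheme as reconstructed here, not Thomson’s own procedure: the reservoir size, the number of simulated examinees, the four “richnesses”, and the use of normally distributed bond strengths to supply individual differences are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000          # hypothetical "bond" reservoir size (illustrative)
SUBJECTS = 4000   # simulated examinees (illustrative)
p = np.array([0.8, 0.6, 0.4, 0.2])   # assumed "richnesses" of four test items

# Each test item "samples" a fixed subset of bonds: bond b is called on by
# item k with probability p[k], so the sample size is centred on p[k] * N.
sampled = (rng.random((len(p), N)) < p[:, None]).astype(float)   # (items, bonds)

# Individual differences: each examinee possesses each bond to an iid degree
# ("bond strengths"; Thomson left their interpretation open).
strengths = rng.normal(size=(SUBJECTS, N))

# Score on item k = sum of the examinee's strengths over the bonds item k samples.
scores = strengths @ sampled.T                                   # (subjects, items)

observed = np.corrcoef(scores, rowvar=False)
expected = np.sqrt(np.outer(p, p))   # off-diagonal: r_kj = sqrt(p_k) * sqrt(p_j)
np.fill_diagonal(expected, 1.0)

print(np.round(observed, 2))
print(np.round(expected, 2))
```

With these settings the observed off-diagonal correlations land close to $\sqrt{p_{k}}\sqrt{p_{j}}$; changing the seed merely jitters them, which is the sense in which the expectation collapses the blur of the “richnesses”.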
Spearman made much use of (now archaic) “tetrads” to test goodness of fit for his favored general-factor-plus-specifics model, and Thomson paid minute attention to the ramifications of that in his “sampling” model. With a single general factor, the columns of correlations are multiples of each other (ignoring the diagonal elements), which must be approximately true in the observed data. As a result, we expect the observed “tetrad” expressions, which are 2 × 2 minor determinants, again excluding the diagonal, to be approximately zero; Spearman used this as a test for the presence of the general factor by calculating the tetrads all over his matrix of observed correlations. Given the above assumptions, the expected correlations for items 1 and 2 against items 3 and 4 are as stated below
\[\begin{matrix} & 3 & 4 \\ 1 & \left( p_{1}p_{3} \right)^{\frac{1}{2}} & \left( p_{1}p_{4} \right)^{\frac{1}{2}} \\ 2 & \left( p_{2}p_{3} \right)^{\frac{1}{2}} & \left( p_{2}p_{4} \right)^{\frac{1}{2}} \end{matrix}\]
and the tetrad difference is expected to be zero,
\[\left( p_{1}p_{3}p_{2}p_{4} \right)^{\frac{1}{2}} - \left( p_{1}p_{4}p_{2}p_{3} \right)^{\frac{1}{2}} = 0\]
which one can see is just the divisibility idea
\[\left( p_{1}p_{3} \right)^{\frac{1}{2}}/\left( p_{1}p_{4} \right)^{\frac{1}{2}} = \left( p_{2}p_{3} \right)^{\frac{1}{2}}/\left( p_{2}p_{4} \right)^{\frac{1}{2}}.\]
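Continuing the same illustrative setup (the four “richnesses” are assumed for the sketch, not taken from Thomson), a few lines confirm that the expected correlations yield a vanishing tetrad difference and proportional columns:

```python
import numpy as np

p = np.array([0.8, 0.6, 0.4, 0.2])   # assumed "richnesses", as in the earlier sketch
r = np.sqrt(np.outer(p, p))          # expected correlations r_kj = sqrt(p_k * p_j)

# Tetrad difference for items 1, 2 against items 3, 4 (zero-based indices):
tetrad = r[0, 2] * r[1, 3] - r[0, 3] * r[1, 2]
print(round(tetrad, 12))             # 0.0, up to floating-point error

# Equivalently, the columns of the off-diagonal block are proportional:
print(r[0, 2] / r[0, 3], r[1, 2] / r[1, 3])   # identical ratios
```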
It might appear from this that the general factor and its hierarchical ordering are bound to result regardless of the actual values of the proportions, but the reader should keep the assumptions in clear view at all times and not be misled into thinking that the proportions $p_{k}$ are arbitrary and can be altered at will merely by “sampling” more bonds from the set of $N$ for each item. Those proportions are constrained by the “richnesses”, which inhere in the test items. Thomson imports them casually by saying that the tests elicit the “sample size”, but schooled as they may be in the arts of passive resistance to attempts at their solution, the test items surely cannot organize the “bonds”. The mind must do that.

In practice there cannot be a one-shot “sample” of the “bonds” with the right “sampling probability” unless we posit some oracle to reveal what that probability is. It would have to be the product of an active encounter between mind and test item over some time period, prone to error in terms of operational effort. In that effort, all bonds must somehow be invoked and some succeed according to a probability not visible to the mind in advance. But then the fact that a bond succeeds, contributing to the total of points obtained, is a statement after-the-fact about observed success (see below). It would be absurd to claim that additional “bonds” could be enrolled at will to drive scores up on items with low “richness”, so this must be a property of the items themselves. Mere fecklessness cannot explain why Spearman’s pupils demonstrated lower g-loadings on their liminal pitch discrimination scores than they did on classics (Spearman, 1904a). Moreover, the success we observe, since we have the truth, is not directly visible to the subjects, who are necessarily inaccurate in their own assessments as they are examined. The very use of the word “sampling” here is a hindrance. Most practitioners will associate “sample sizes” with an arbitrary limit imposed by time, money or the opportunity afforded by something that fell off the back of a truck.
Thomson concluded from the elementary derivations above that “the laws of probability alone will cause a tendency to zero tetrad-differences among correlation coefficients” (Thomson, 1950a). Yet those laws, which are very definite, are certainly not sufficient “alone” to induce zero tetrad differences and therefore a general factor. We need the assumption that there are “bonds” that will yield returns for all problems, but have different “richnesses” for different items, even if each richness is only a distribution. It is the existence of these “bonds” as unit general abilities that is key. They don’t have to exist. They are a conjectured reservoir of general intelligence sliced and diced into pieces. Without that property there would be no correlation merely because “bonds” are “shared”, since what is shared would do nothing. The zero tetrad difference above, which must yield a general factor (Spearman, 1927b, pp. 73–75), involves the expected correlations only. None of those expectations depends on chance; they are determined by the so-called “richness” of the test items, as the formula derived above shows. Chance has been integrated out. Nor do we say that the “laws of arithmetic” determine that a person has ten fingers, not because it is not true, given two hands with five fingers on each, but because it is not helpful.
The concomitant claim that factor structure emerges automatically from an “unstructured” melange is not accurate either. The test items have structure imposed by the “richnesses” that Thomson has introduced by assumption as blurred g-loadings. The factor structure is an immediate consequence in the form of the expected correlations, which collapse the distributions Thomson has conjectured. This is not just a hypothetical example of how the correlations and resulting factor structure might be “random”. Thomson positively declared his faith in his “sampling” model, early and often, and expended a great deal of energy promoting it. “My position is that hierarchical order is the natural order among correlation coefficients, that it only expresses the well-known fact that correlation coefficients are themselves correlated, and that the degree of perfection of hierarchical order found among psychological correlation coefficients is merely that which occurs by chance, and not, as Professor Spearman has been led to believe, extraordinarily high” (Brown & Thomson, 1925, p. vi). Many more examples of these positive assertions could be given.
At root, Thomson’s “richness” interpretation of g-loading is no more than the commonplace that hard things require more brains to solve. However this understands complexity purely in terms of immediate effort. Complex problems frustrate attempts to solve them using naive tactics like “enrol more bonds”. Where that tactic works, the problem may be solved in constant time, assuming enough “bonds” are available, since they are assumed to be independent and can do their work in parallel. From the point of view of time, such an item is simple rather than complex. The need for multiple steps with sub-task dependencies—a graph of work to do—is a hallmark of complexity. That requires orchestration, which must itself form part of an operational explanation of general intelligence.
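To make the dependency point concrete, here is a small sketch; the task graph is entirely hypothetical rather than drawn from any real test item. It compares the total work in a decomposed task with the length of its longest dependency chain: no quantity of independent, parallel “bonds” can finish sooner than that chain allows.

```python
from functools import lru_cache

# A hypothetical decomposition of one test item into sub-tasks, each mapped to
# the sub-tasks it depends on.  Every sub-task costs one "bond invocation".
deps = {
    "parse_problem": [],
    "recall_facts": [],
    "set_up_relation": ["parse_problem", "recall_facts"],
    "carry_out_steps": ["set_up_relation"],
    "check_answer": ["carry_out_steps"],
}

total_work = len(deps)   # what a pool of independent bonds could share in parallel

@lru_cache(maxsize=None)
def chain_length(task: str) -> int:
    """Longest dependency chain ending at `task`."""
    return 1 + max((chain_length(d) for d in deps[task]), default=0)

critical_path = max(chain_length(t) for t in deps)
print(total_work, critical_path)   # 5 units of work, but at least 4 serial steps
```

Adding bonds can spread the five units of work more thinly but cannot shorten the four-step chain; that orchestration is what the naive “enrol more bonds” tactic leaves unexplained.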
It is difficult to understand the industry Thomson expended on this theory over so many years. It extended to showing the ramifications of a 12-“bond” causal model of the mind yielding 1,078,110 correlations. “They have, nevertheless, all been calculated” he noted grimly. By hand. One hopes his wife and students helped. Yet many psychologists, and even some factor analysts, were disturbed by the operational lustre of the “sampling theory”, and lured into speculating about the effects of “sampling” this or that overlapping physical substrate of the brain. Consider Red Devon’s preeminent factor analyst and Spearman’s own pupil, Raymond B. Cattell. “I first remember experiencing something near panic over the basic concepts when Thomson showed that Spearman’s results could be accounted for by a model of random factors. This passed (partly, I may say, in my case, through the reassuring findings with Dickman (1962), Jaspers (1967), and Gorsuch (1963) on plasmodes)” (Cattell, 1937, 1974) (see also (Cattell & Dickman, 1962; Cattell & Gorsuch, 1963; Cattell & Jaspers, 1967), where a plasmode is “a set of numerical values fitting a mathematico-theoretical model”). This “panic” may have been due to the loose use of statistical language in talking about the model, especially the word “sampling” and the subtle way that constraints on it are imported via the “item richnesses”, stated as bond inclusion probabilities. Inducing correlations from overlapping richness has impressed many, even Jensen (1980), but we have already seen that, properly interpreted, this is just a version of the well-known relationship between shared variance and correlation.
It is worth considering in more detail the idea, expressed for example by Maxwell (1972), that the “sampling” theory might correspond to how the mind actually works, in a physical sense. The notion that the “bonds” model represents what is actually done by the mind operationally is a peculiar goal for a statistical model in the first place. By their nature statistical models work best when we want a way to approximate or predict the results we observe. This leads to schemes in which the model’s features embody a decision not to care about whatever is inessential for the outcomes but must happen in some unknown way. As John Maynard Smith once observed, just because you can use differential equations to accurately model a person catching a ball does not commit you to supposing that the mind really does solve differential equations or use Laplace transforms reflexively. Thomson himself repeatedly disavowed any formal commitment whatsoever to anything that specific.
“The ‘bonds’ spoken of may be identified by different readers with different entities. All a ‘bond’ means, is some very simple aspect of the causal background. Some of them may be inherited, some may be due to education. There is no implication that the combined action of a number of them is the mere sum of their separate actions. … In this mathematical treatment, bonds have been spoken of as though they were separate atoms of the mind, and, moreover, were all equally important. It is probably quite unnecessary to make the former assumption, which may or may not agree with the actual facts of the mind, or of the brain. Suitable mathematical treatment could probably be devised to examine the case where the causal background is, as it were, a continuum, different proportions of it forming tests of different degrees of richness. And as for the second assumption, it is in all likelihood merely formal. Let the continuum be divided into parts of equal importance, and then the number of these increased and their extent reduced, keeping their importance equal” (Thomson, 1950a, pp. 307–312).
If a bond really is (the result of) some component of the brain, disavowals aside, nothing prevents that component from being reused in serial order, producing a set of “bonds” bounded only by time, to deliver the entire set of realized gains on a test item. A “bond” could even be the same whole brain used in serial order, yielding one unit in each go until it runs out of opportunity or time. Or it could be a quantum of activity over a range of energy inputs, even over a multivariate combination of the above parameters, integrated piecewise rather than discretely summed, and so on. Anything goes in the “causal background”.
If an operational understanding is to be based on this, we must go beyond a too-convenient after-the-fact probability of a “bond” contributing to a score to ask how it is, exactly, that the mind would conduct such an exercise. What would ensure that all the bonds do not set futilely to work on the same part of the problem? Lassoing all of them to see which ones work—on which parts?—is a wasteful strategy if only a few are really needed. If the mind can solve a problem with a proportion $p_{k}$ of its parts and doubles its bond count overnight through genetic engineering, are we to suppose that it would now solve exactly the same problem using a much smaller probability per bond? How a task would be decomposed into bond-sized parts and the part-solutions recomposed, a synchronization step, must be explained if we are to have anything like a theory of how the mind actually works.

Fixing on definite mechanisms for answering all these questions would show that the “bonds” must actually be dependent, not independent, since their contribution to a score is constrained by demands of the task in a way that “richness”, with its “all must have prizes” probability, only disguises (with the added side-effect of mysterious variation between invocations in exactly the same circumstances on exactly the same problem within exactly the same mind using exactly the same pool of bonds). Putting that aside, the supposed statistical independence of the bonds also implies that parallelism is always available—it would surely be very obliging of the universe of arbitrarily complex mental test items to organize itself in this fashion. As it stands the model even implies that total ability would scale linearly with the size of the “bond” reservoir. Yet there is only a middling correlation between brain size and intelligence, about 0.4 among adults (Haier, 2023). Diminishing returns are inescapable. Information grows not as the sample size $n$ but rather as $\sqrt{n}$ (Stigler, 1986).
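The scaling point can be restated as a worked equation under the Binomial reading above, in which each of the $N$ bonds contributes to item $k$ with probability $p_{k}$:

\[E\left( \text{score}_{k} \right) = p_{k}N,\qquad SD\left( \text{score}_{k} \right) = \sqrt{Np_{k}\left( 1 - p_{k} \right)},\]

so the expected total rises linearly with the reservoir, while the signal-to-noise ratio of the bond sum, $E/SD = \sqrt{N}\sqrt{p_{k}/\left( 1 - p_{k} \right)}$, improves only as $\sqrt{N}$, which is the square-root growth of information just cited.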
It was never Thomson’s intent to provide any sort of definite operational physical description of “how the mind works”, so he did not concern himself overly with the obvious difficulties above. He needed a discrete mechanism as a simple illustration of his theories about “chance”, requiring only arithmetic. The details he assumed were for expediency and not important in themselves, which is why he would not commit to them. Yet he repeatedly invited some such interpretation, even as he protested about hypostatizing factors.
Spearman’s reactions to Thomson showed a progressive exasperation at (what he perceived to be) the loose use of language to describe unremarkable results which did not require dice. The back and forth between them and their supporters produced a lot of inconclusive diversions, which seem in retrospect to present the vaguest of metaphors in mathematical fancy dress, and which we need not concern ourselves with here—the curious can consult Dodd (1928a, 1928b) for a largely neutral blow-by-blow account up to late 1927. By that year things had come to a head, and the following passage fairly represents Spearman’s side of the debate.
“Yet another paper has come from Professor Thomson based on Weldon’s device of obtaining correlations by means of dice. Again he tries out the effect of arranging these dice according to chance. And once more he devotes himself to attacking the school of research to which I belong. But although not insensible to his compliment in so often noticing our work, still I would entreat him to pause a while and consider a little more whether he has rightly understood it. He complains himself that he is sick of polemics (though without letting this deter him from going on with them); and we for our part, too, are none too happy at being as we think time after time misrepresented” (C. Spearman (1927), a reply to Thomson (1927)).
That Spearman only paid intermittent attention to the “bond sampling” model is seen from his argument that it precludes individual differences (Spearman, 1927b, pp. 96–97). Thomson was quick to deny this by supposing that there would be differences between individuals in the central location and even variance of their personal bond distributions, or somehow in the “quality” of their bonds. But here he never extended beyond general hand-waving (Thomson, 1950a, pp. 53–54).
Dice and distributions aside, the real difference between Spearman and Thomson appears to lie at the philosophical level. It is no surprise to discover that the latter was attracted to Karl Pearson, or that Pearson tried to recruit him in 1919, though without success (Thomson, 1952, 1969). Pearson’s rigid scientific positivism, with its hostility to “ghostly” entities and (pretensions to) causal explanations, catalyzed by his belated enthusiasm for correlation, might have informed Thomson’s world-view, though it is not easy to square the later emergence of “bond sampling” with all this (Pearson, 1900). Pearson was after all an old and very bitter foe of Spearman, locking horns with him as early as 1905 over the correction of attenuation by measurement error. The pugnacious mathematician insisted on drowning measurement error inefficiently through gargantuan samples without learning from the instrument itself, and roundly declared that those who disagreed simply lacked mathematical nous (Tredoux, 2023a). Cyril Burt’s pupil Charlotte Banks recalled legends about awkward, possibly apocryphal, moments at University College London. “The building which housed Statistics and the Galton laboratory was roughly diagonally across the quadrangle from the entrance to the psychology department. The story was that if, by accident, Pearson and Spearman emerged at the same time, each, on seeing the other, would hastily retreat and wait to come out only when the coast was clear” (Banks, 1983).
Thomson was even inclined to view statistics as only dubiously real. In the fourth edition of the Factorial Analysis of Human Ability (1950a) he reiterated this when expressing optimism that opposing positions might be reconciled. “The advance of the science of factorial analysis of the mind to its present position has not taken place without opposition … objections which have been frequently raised by the present writer … and which indeed he still holds to, although there has been of late years a considerable change of emphasis in the interpretations placed upon factors by the factorists themselves, which have tended to remove his objections. Briefly, the opposition between the two points of view would disappear if factors were admitted to be only statistical coefficients, possibly without any more ‘reality’ than an average, or an index of the cost of living, or a standard deviation, or a correlation coefficient though, on the other hand, it may be admitted that some of them, Spearman’s for example, may come to have a very real existence in the sense of being both useful and influential in the lives of men” (Thomson, 1950a, pp. 303–4). This optimism—which echoes his concession in his Royal Society obituary of Spearman that “Probably there is a general factor of intelligence” (but likely will prove to be non-physical in nature)—was dashed just a year later when the fifth and final edition silently dropped these and other concessions (Thomson, 1947, 1951).
If in 1950 Thomson detected concessions by “factorists”, it is hard to see from the published record how Spearman contributed to them. Spearman was strongly inclined to view observations as valuable but ephemeral, thoroughly contaminated at the individual level by measurement error which only statistical treatment can control. This was reinforced when R. A. Fisher showed that under modest conditions an average is “sufficient” on its own to represent all the information in the sample (Casella & Berger, 2002; Stigler, 2016). Spearman looked from the very start to reach behind observations to latent factors. His invention of factor analysis itself was prompted by an attempt to correct for attenuation by measurement error, broadly conceived. Spearman always thought of “specific factors” in a test item as a non-stochastic error term (Spearman, 1904b, 1904a; Tredoux, 2023b).
Thomson was sensitive to this line of argument—that his model was just a reformulation of the same ideas, “compatible” with the test score observations because, so far as it can be definitely understood, it is the same model—bottom-up rather than top-down, involving commitments, however hedged, which Spearman preferred not to make. In his address to the British Association of 1949, Thomson expressed some irritation at that point of view. “A common form of misunderstanding is to suppose that all it means is that ‘g’ is not simple but complex or composite. It means a great deal more than that; it means that the whole underlying complex of causes of what are called ‘factors’ is an inextricable tangle” (Thomson, 1949, 1950b). It certainly meant a great deal to Thomson. It is harder to explain how much it has meant to others, and why pre-emptively reifying multiple latent “bonds” bottom-up, a gaggle of little gs, is to be preferred over reifying a single factor top down. The offence in reification is obscure (Carroll, 1993, p. 22), but if it be an offence, repetition must surely count as an aggravating circumstance.
Above all, the “bond” model leaves the central mystery of g untouched by just assuming it exists piecewise. Keeping the mind constant we observe different loadings over the test items, which must inhere, as noted before, in the items themselves. Instead of time and space complexity per se we can only observe greater or lesser resistance to solution, a lack of purchase not exhausted by complexity. With the correct wherewithal, it appears that problems which seem different on the surface can be transformed into the same problem, which is exactly what we observe with machine intelligence, through embeddings and transforms between spaces. Hence Spearman’s “indifference of the indicator”. This is a matter of degree, depending on both the problems themselves and the wherewithal brought to bear on those transformations. In between helplessness and mastery there are many Babylonian waypoints involving crutches, empirical approximations, lookups, imitation and increasingly sophisticated heuristics. This universality of mental tasks—the fact that Raven’s Progressive Matrices and Vocabulary items are both heavily g-loaded—must explain how it is that we can do algebraic topology even though the capacity to do so was certainly not evolved for that purpose.