From Quasi-Replication to Generalization
Making "Basis Variables" Visible (co-authored with Dan Levinthal, and with comment from Phanish Puranam)
(Note: this post is reprinted from the Mack Institute Collective Impact Virtual Salon site, along with a follow-on post from Phanish Puranam, to allow for comments and subscriptions. Since the original March 2022 posting, we’ve revised our paper and you can find the new version here.)
A persistent challenge in social science research is understanding whether and when empirical results generalize beyond a specific study’s sample or context. In 2016, Rich Bettis, Connie Helfat and Myles Shaver produced a special issue of Strategic Management Journal containing several “quasi-replications” which examined whether and when results derived from particular industry, temporal, or geographic categories apply in adjacent research settings. In this issue, Lori’s paper with Ram Ranganathan and Anindya Ghosh explored how taken-for-granted determinants of alliance formations might vary across industries and timeframes. Some typical predictors – such as product-market similarity between two potential alliance partners – generated robust positive results across the industries and timeframes analyzed. Yet others – such as previous alliances between a pair of firms – yielded dramatically different results in the chemicals industry: Previous alliances were positively associated with future alliances in the 1980s, but negatively associated with them in the 1990s.
As Bettis et al. (2016) note, an inability to quasi-replicate results need not invalidate the original study; rather, it offers an opportunity to theorize about the conditions (read: underlying variables) that may distinguish the two contexts. Might increasing technological maturity, or increasing industry consolidation, in the chemicals industry from the 1980s to the 1990s have shaped the opportunities and motivations to form new alliances with prior partners? This is an empirical question that is becoming ever easier to study across wider ranges of industries and timeframes as our ability to create and manipulate expansive datasets continues to grow.
Variables that connote concepts like industry maturity or industry concentration can help us to see connections across various industry-timeframe combinations, such that we might find alliance formation dynamics in the semiconductor industry in the 2000s to resemble those of the chemicals industry in the 1990s. Borrowing a concept from linear algebra, we call these underlying variables “basis variables”: they let us transform typical categorical orderings along industries (such as SIC codes) or time (such as decades), abstracting from specific industry-timeframe contexts to more general constructs. In other words, by transforming our usual categorical representations of research settings, basis variables can promote commensurability, where seemingly distinct settings become comparable, enabling middle-range theorizing as theoretical contingencies are revealed. While there are many measures that researchers might identify as candidates for basis variables, we posit that these variables may group into three key constructs: uncertainty of the context, interdependence in the context, and distribution of attributes within the context.
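To make the linear-algebra analogy concrete, here is a minimal sketch (not from the paper; all scores are illustrative assumptions) of re-describing industry-decade contexts by basis variables such as maturity and concentration, so that distances between seemingly unrelated contexts can be computed:

```python
import math

# Hypothetical basis-variable scores (illustrative values only, not from the paper):
# each industry-decade context is described not by its SIC code or decade label,
# but by underlying constructs such as maturity and concentration (0-1 scales).
contexts = {
    ("chemicals", "1980s"):      {"maturity": 0.5, "concentration": 0.4},
    ("chemicals", "1990s"):      {"maturity": 0.8, "concentration": 0.7},
    ("semiconductors", "2000s"): {"maturity": 0.8, "concentration": 0.6},
}

def distance(a, b):
    """Euclidean distance between two contexts in basis-variable space."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

# Under these illustrative scores, semiconductors in the 2000s sit closer to
# chemicals in the 1990s than to chemicals in the 1980s, mirroring the
# commensurability idea in the post.
d_90s = distance(contexts[("semiconductors", "2000s")], contexts[("chemicals", "1990s")])
d_80s = distance(contexts[("semiconductors", "2000s")], contexts[("chemicals", "1980s")])
print(d_90s < d_80s)  # True under these illustrative scores
```

The point of the sketch is only the change of representation: once contexts are coordinates in basis-variable space rather than category labels, "adjacent research settings" becomes a measurable notion.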
We’ve developed these ideas in the working paper “Commensurability and collective impact in strategic management research: When non-replicability is a feature, not a bug.” How might our field make more progress if we coalesced around some basis variables that illuminated when and where research findings were more likely to generalize?
Response from Phanish Puranam (August 2022):
Lori and Dan’s post “From Quasi-Replication to Generalization: Making ‘Basis Variables’ Visible” gives us a nice way to think about generalization in terms of “basis variables”. I’d like to extend their argument with a complementary way of thinking about generalization that draws on machine learning (ML) techniques. Generalization can be thought of as a special case of replication in new contexts, so it’s useful to first consider why results don’t replicate. Setting aside the ambiguity that results from operationalization and methodological (in)competence, there are two important reasons: sampling error and omitted variables (and the combination of the two).
Sampling error is often why other samples from the same context fail to replicate a result, raising suspicions that the original result may have been overfitted. This is where I think the discussion of the replication crisis in social psychology and medicine is centred today. Machine learning techniques address the sampling-error problem through regularization and cross-validation. For more on this topic, please see my paper on algorithmic induction with He, Sreshtha and Von Krogh (OS, 2020).
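As a minimal sketch of those two ideas (entirely illustrative; the toy data and penalty values are assumptions, not anything from the cited paper), regularization shrinks coefficient estimates toward zero, and cross-validation picks the penalty by performance on data the model has not seen:

```python
import random

random.seed(0)

# Toy data: y = 2*x + noise, split into a training fold and a held-out fold.
xs = [i / 10 for i in range(40)]
ys = [2 * x + random.gauss(0, 0.5) for x in xs]
train = list(zip(xs[::2], ys[::2]))   # even-indexed points
test = list(zip(xs[1::2], ys[1::2]))  # odd-indexed points, held out

def ridge_slope(data, lam):
    """Closed-form ridge estimate for a no-intercept 1-D regression:
    beta = sum(x*y) / (sum(x^2) + lam). Larger lam shrinks beta toward 0."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)

def mse(data, beta):
    """Mean squared prediction error of slope beta on a data fold."""
    return sum((y - beta * x) ** 2 for x, y in data) / len(data)

# Cross-validation in miniature: choose the penalty that minimizes
# error on the held-out fold, not on the training fold.
best_lam = min([0.0, 0.1, 1.0, 10.0], key=lambda lam: mse(test, ridge_slope(train, lam)))
print(best_lam, ridge_slope(train, best_lam))
```

The guard against overfitting comes from the held-out evaluation: a penalty that merely memorizes the training fold's noise is penalized when scored on unseen data.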
Another important part of the discussion of the replication crisis in social psychology and medicine needs to include omitted variables that moderate the key relationships and whose values vary across contexts; these also explain why results may not hold in other contexts. The idea of context dependence, and of limits to generalizability, is formally equivalent to the problem of unobserved moderators that vary by context (see, for instance, Bareinboim and Pearl, 2016). Dan and Lori recommend finding, explicitly measuring, and theorizing about these moderators, which they term basis variables. In meta-analyses, this would be equivalent to finding and coding study-level moderators (Hunter and Schmidt, 2003). That’s an unarguably good move.
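A tiny simulation (my own illustrative construction, with made-up parameters) makes the formal point visible: when an unmeasured moderator takes different values in two contexts, the estimated relationship between the same two variables can flip sign across those contexts, much like the prior-alliance result in chemicals across decades:

```python
import random

random.seed(1)

def simulate(context_m):
    """Generate (x, y) pairs where the true effect of x on y is
    beta = 1 - 2*m. The moderator m is a property of the context
    and is not measured per observation."""
    data = []
    for _ in range(200):
        x = random.gauss(0, 1)
        y = (1 - 2 * context_m) * x + random.gauss(0, 0.1)
        data.append((x, y))
    return data

def slope(data):
    """Ordinary least-squares slope through the origin."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / sxx

# Same data-generating model, two contexts: the estimated x-y relationship
# flips sign purely because the omitted moderator m differs across contexts.
print(slope(simulate(0.0)), slope(simulate(1.0)))  # roughly +1 and -1
```

Measuring the moderator directly, as Dan and Lori's basis-variable proposal recommends, would turn this apparent non-replication into an ordinary interaction effect.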
In addition, we can extend it by thinking of entire clusters of basis variables. What if the structure of inter-relationships among an entire set of variables is similar in contexts A and B? For instance, consider the functional forms for gravitational pull in Newton’s law and electrostatic attraction in Coulomb’s: they are very similar despite arising in very different contexts. A more prosaic example is pay-per-use business models, which surface in what superficially appear to be very different settings, so that many relationships between strategy variables might generalize across those contexts. Prothit Sen at ISB and I have been working on a project that uses ML to discover such “structural relatedness” between industrial contexts that no human analyst might ever stumble on unaided. We’ll keep you posted!
One further comment from Phanish (January 2023):
A new BBS paper that is very relevant to these themes: https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/beyond-playing-20-questions-with-nature-integrative-experiment-design-in-the-social-and-behavioral-sciences/7E0D34D5AE2EFB9C0902414C23E0C292