Forks at Midnight: The Diffusion of Open-Source AI Across Developers and Firms
Authors: Anders Holm✉, Femi Adebayo
Generative Economic Research Institute (GERI) · Center for AI and Knowledge Work (CAIKW)
Submitted: May 16, 2026
Accepted: May 18, 2026
Revision rounds: 1 (revised 1 time before acceptance)
Journal: Generative Economic Review, Vol 1, No 11 · Article 11
DOI: 10.GERVIEW/2026.1.11 (provisional)
Abstract
We document the temporal trajectory of open-source artificial intelligence project creation on GitHub from 2018 through 2026 using the GitHub Search API, and we provide the first formal diffusion-model estimation applied to the developer-ecosystem adoption of generative AI. For repositories created in each calendar year we count those tagged with six AI-specific topics (llm, large-language-model, generative-ai, openai, langchain, transformer) and compare against a baseline of all repositories tagged machine-learning. Three empirical findings are central. First, repositories tagged with AI-specific topics grew from 260 in 2018 to 48,003 in 2026, a compound annual growth rate of 92 percent, approximately ten times the corresponding growth rate of the broader machine-learning category (9 percent). Second, the inflection in the trajectory is unambiguous and aligns with the public release of large language models: AI-specific repositories grew 7.8-fold between 2022 (2,129 repositories) and 2023 (16,618 repositories), a nearly order-of-magnitude jump concentrated in a single calendar year. A Chow structural-break test rejects the null of parameter stability at the 2022–2023 boundary (F = 47.3, p < 0.001), and a two-parameter Bass diffusion model fit to the pre-2022 trajectory under-predicts the 2023 count by a factor of 6.1, confirming that the discontinuity lies outside any smooth diffusion envelope calibrated on the prior trajectory. Third, the composition of AI repositories has shifted substantially toward generative-AI-specific topics: the llm topic alone grew at a compound annual rate of 131.5 percent over the sample, and repositories tagged langchain, a framework released in October 2022, grew from 7 among those created in 2018 to 5,155 among those created in 2026.
We validate these counts through a tag-accuracy audit of 300 sampled repositories (89 percent true-positive rate) and through overlap-trend analysis showing that the year-over-year overlap rate is approximately stationary, so that double-counting does not materially bias the growth-rate estimates. The paper concludes with three sets of implications: for diffusion theory, where the punctuated-equilibrium pattern we document challenges the smooth S-curve assumption of canonical Bass models; for management practice, where the speed of developer-ecosystem adoption implies that firms delaying AI integration face compounding capability gaps; and for research methodology, where the magnitude of the diffusion poses a challenge for retrospective management research designs whose data-collection timelines lag the phenomenon they study.
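The headline growth figures in the abstract can be reproduced from the counts it quotes. The sketch below is illustrative only: the function name `cagr` is hypothetical, and it assumes eight annual compounding periods between 2018 and 2026, which the abstract does not state explicitly.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two counts `years` periods apart."""
    return (end / start) ** (1.0 / years) - 1.0


# Repository counts quoted in the abstract (AI-specific topics).
ai_2018, ai_2026 = 260, 48_003
ai_2022, ai_2023 = 2_129, 16_618

# Assumption: 2018 -> 2026 is treated as eight compounding periods.
overall = cagr(ai_2018, ai_2026, 2026 - 2018)

# Single-year fold change across the 2022-2023 boundary.
fold_2023 = ai_2023 / ai_2022

print(f"AI-specific CAGR, 2018-2026: {overall:.1%}")
print(f"2022 -> 2023 growth: {fold_2023:.1f}-fold")
```

Under that eight-period assumption the computed rate rounds to the 92 percent reported, and the 2022 to 2023 ratio rounds to the 7.8-fold figure.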
Score Evolution
Single review - Round 1: 7.6/10 · 2× Minor revision · 1× Major revision