Sources & references
Every essay in this series is built from the primary literature. This page collects the 60 citations across the six essays, grouped by the essay that cites them, with links to arXiv and proceedings.
On attribution
Each work below is cited to its authors and linked to its canonical source, either an arXiv page or the publishing venue's proceedings. The papers themselves remain the work and property of their respective authors; they are referenced here for scholarly commentary under normal academic citation practice. The selection draws on a curated corpus of several hundred papers; only those actually cited by an essay appear here. This list reflects the literature as cited through June 2026. If you spot a citation that is wrong or incomplete, let me know at dattgoswami@gmail.com.
References by essay
The papers behind each of the six essays, in the order they are cited.
1 The Selection Lens: How to Bet on Papers
- Stanley, K. O., & Miikkulainen, R. (2002). Evolving Neural Networks through Augmenting Topologies. Evolutionary Computation, 10(2), 99–127. MIT Press.
- Finn, C., Abbeel, P., & Levine, S. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. Proceedings of the 34th International Conference on Machine Learning (ICML), PMLR 70. arXiv:1703.03400.
- Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., et al. (2017). Overcoming Catastrophic Forgetting in Neural Networks. Proceedings of the National Academy of Sciences (PNAS), 114(13), 3521–3526. arXiv:1612.00796.
- Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems 33 (NeurIPS). arXiv:2005.14165.
- Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., et al. (2020). Scaling Laws for Neural Language Models. arXiv preprint. arXiv:2001.08361.
- Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning (ICML), PMLR 139. arXiv:2103.00020.
- Jiang, M., Rocktäschel, T., & Grefenstette, E. (2022). General Intelligence Requires Rethinking Exploration. arXiv preprint. arXiv:2211.07819.
- Kunin, D., Atanasov, A., Boix-Adserà, E., Bordelon, B., Cohen, J., Ghosh, N., et al. (2026). There Will Be a Scientific Theory of Deep Learning. arXiv preprint. arXiv:2604.21691.
2 Paradigm Bets: The Ten-Year Tier
- Bruce, J., Dennis, M., Edwards, A., Parker-Holder, J., Shi, Y., et al. (2024). Genie: Generative Interactive Environments. Proceedings of the 41st International Conference on Machine Learning (ICML). arXiv:2402.15391.
- Chang, E., Le Lan, G., Fei, J., Zhang, W., Sun, Y., Cai, Z., Liu, Z., Xiong, Y., Yang, Y., Tian, Y., Shi, Y., Chandra, V., & Schmidhuber, J. (2026). Neural Computers. arXiv:2604.06425.
- Maes, L., Le Lidec, Q., Scieur, D., LeCun, Y., & Balestriero, R. (2026). LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels. arXiv:2603.19312.
- Behrouz, A., Zhong, P., & Mirrokni, V. (2025). Titans: Learning to Memorize at Test Time. arXiv:2501.00663.
- Hafner, D., Yan, W., & Lillicrap, T. (2025). Training Agents Inside of Scalable World Models (Dreamer 4). arXiv:2509.24527.
- Valevski, D., Leviathan, Y., Arar, M., & Fruchter, S. (2024). Diffusion Models Are Real-Time Game Engines (GameNGen). arXiv:2408.14837.
- Jiang, M., Rocktäschel, T., & Grefenstette, E. (2022). General Intelligence Requires Rethinking Exploration. arXiv:2211.07819.
- NVIDIA. (2025). Cosmos World Foundation Model Platform for Physical AI. arXiv:2501.03575.
3 Recursion: The Third Scaling Axis
- Li, J., Liang, C., & Lao, N. (2026). Training-Free Looped Transformers. Preprint. arXiv:2605.23872.
- Deng, C., Zhang, Y., Zhu, R., Xu, Y., Liu, J., Ng, T. S. E., & Chen, H. (2026). LT2: Linear-Time Looped Transformers. Preprint. arXiv:2605.20670.
- Lee, S., Hong, C., Kim, S., Lee, J., Park, J., & Park, D. (2026). Looped Diffusion Language Models. Preprint. arXiv:2605.26106.
- Fein-Ashley, J., & Rashidinejad, P. (2026). Solve the Loop: Attractor Models for Language and Reasoning. Preprint. arXiv:2605.12466.
- Sghaier, A., Parviz, A., & Jolicoeur-Martineau, A. (2026). Probabilistic Tiny Recursive Model. Preprint. arXiv:2605.19943.
- Jo, M., Kim, M., & Ren, M. (2026). Generative Recursive Reasoning. Preprint. arXiv:2605.19376.
- Tu, G., Fu, X., Yu, S., Tang, Y., Kang, H., Qin, L., Zhang, Y., & Gu, J. (2026). Latent Reasoning with Normalizing Flows. Preprint. arXiv:2606.06447.
- Gandhi, A., Chakraborty, S., Wang, X., Kumar, A., & Neubig, G. (2026). Recursive Agent Optimization. Preprint. arXiv:2605.06639.
- Tong, H., Zhang, T., Buehler, M. J., He, J., & Zou, J. (2026). Recursive Multi-Agent Systems. Preprint. arXiv:2604.25917.
- Zhou, S., Chai, W., Liu, K., Mao, H., Mang, Q., & Shang, J. (2026). OpenDeepThink: Parallel Reasoning via Bradley–Terry Aggregation. Preprint. arXiv:2605.15177.
4 On-Policy Distillation Quietly Ate Post-Training
- Li, Y., Zhao, G., Shi, Q., Sun, L., Zhang, X., & Yang, T. (2026). A Primer in Post-Training Reasoning Data: What We Know About How It Works. Preprint. arXiv:2606.02113.
- Yang, W., Liu, W., Xie, R., Yang, K., Yang, S., & Lin, Y. (2026). Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation. Preprint. arXiv:2602.12125.
- Fu, Y., Huang, H., Jiang, K., Liu, J., Jiang, Z., Zhu, Y., & Zhao, D. (2026). Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes. Preprint. arXiv:2603.25562.
- Abdali, S., Kim, Y. J., Chen, T., & Cameron, P. (2026). Scaling Reasoning Efficiently via Relaxed On-Policy Distillation. Preprint. arXiv:2603.11137.
- Luo, F., Chuang, Y.-N., Wang, G., Xu, Z., Han, X., Zhang, T., & Braverman, V. (2026). Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models. Preprint. arXiv:2604.08527.
- Li, Y., Zuo, Y., He, B., Zhang, J., Xiao, C., Qian, C., Yu, T., Yang, W., Liu, Z., & Ding, N. (2026). Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe. Preprint. arXiv:2604.13016.
- Hao, G., Shang, Y., Long, Y., Zhao, Z., & Liang, H. (2026). Self-Policy Distillation via Capability-Selective Subspace Projection. Preprint. arXiv:2605.22675.
- Lu, T., & Liu, Z. (2026). Strong Teacher Not Needed? On Distillation in LLM Pretraining. Preprint. arXiv:2605.23857.
- Zhou, Y., Zhang, L., Wu, Y., Wang, M., Bo, P., Liu, J., Fan, X., & Zhao, Z. (2026). OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification. Preprint. arXiv:2606.01476.
- Lambert, N. (2026). Reinforcement Learning from Human Feedback. Online book.
- Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. Preprint. arXiv:2212.08073.
- Bansal, R., Mohri, C., Qin, T., Alvarez-Melis, D., & Kakade, S. (2026). RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM Training. Preprint. arXiv:2606.04272.
- Wu, C. H., & Raghunathan, A. (2026). Self-Trained Verification for Training- and Test-Time Self-Improvement. Preprint. arXiv:2605.30290.
- Zhu, G., Song, B., Wang, H., Xia, M., Zheng, X., Ma, Y., Chen, Z., Wang, W., & Chen, G. (2026). OPRD: On-Policy Representation Distillation. Preprint. arXiv:2606.06021.
- Lu, Z., Yao, Z., Han, Z., Wang, Z., Wu, J., Gu, Q., Cai, X., Lu, W., Xiao, J., Zhuang, Y., & Shen, Y. (2026). Self-Distilled Agentic Reinforcement Learning. Preprint. arXiv:2605.15155.
- Lyu, Y., Wang, C., Zheng, H., Yue, Y., Yan, J., Wang, M., & Huang, J. (2026). AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use. Preprint. arXiv:2604.21590.
- Qwen Team. (2025). Qwen3 Technical Report. Preprint. arXiv:2505.09388.
- Kimi Team. (2025). Kimi K2: Open Agentic Intelligence. Preprint. arXiv:2507.20534.
- GLM-4.5 Team. (2025). GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models. Preprint. arXiv:2508.06471.
5 When AI Did Mathematics
- Alon, N., Bloom, T. F., Gowers, W. T., Litt, D., Sawin, W., Shankar, A., Tsimerman, J., Wang, V., & Matchett Wood, M. (2026). Remarks on the Disproof of the Unit Distance Conjecture. Mathematical note (manuscript).
- Tsoukalas, G., Kovsharov, A., Shirobokov, S., Surina, A., Firsching, M., Bérczi, G., Ruiz, F. J. R., Suggala, A., Wagner, A. Z., Wieser, E., Yu, L., Huang, A., Horváth, M. Z., Ferrauiolo, A., Michalewski, H., Grosu, C., Hubert, T., Balog, M., Kohli, P., & Chaudhuri, S. (2026). Advancing Mathematics Research with AI-Driven Formal Proof Search. Preprint, Google DeepMind. arXiv:2605.22763.
- Kung, P.-N., Song, L., Hwang, D., Yoon, J., Li, C.-L., Severini, S., Olšák, M., Lockhart, E., Le, Q. V., Gokturk, B., Luong, T., Pfister, T., & Peng, N. (2026). LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks. Preprint, Google. arXiv:2606.03303.
- Wiemann, M. L., Smith, L. M., Melchior, P., Mishra-Sharma, S., Wilson, A. G., Izmailov, P., & Cuesta-Lázaro, C. (2026). DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking. Preprint, Princeton University et al. arXiv:2605.26087.
6 The Continual Agent
- Goswami, D. (2026). cl-agent: A Continual-Learning Substrate for Coding Agents — Episode Capture, Replay, and Rule-Based Distillation for Cross-Session Improvement Without Fine-Tuning. Preprint, Independent Research, April 2026. PDF · github.com/dattgoswami/cl-agent
- Goswami, D. (2026). cl-agent (earlier draft). Preprint, Independent Research, April 2026. [Superseded by the current draft; cited as revision history in §3.]
- Wang, Y., Chen, X., Jin, X., Wang, M., & Yang, L. (2026). OpenClaw-RL: Train Any Agent Simply by Talking. Preprint, Princeton University. arXiv:2603.10165.
- Xue, T., Liao, Z., Shi, T., Wang, Z., Zhang, K., Song, D., Su, Y., & Sun, H. (2026). Autonomous Continual Learning for Environment Adaptation of Computer-Use Agents (ACuRL). Preprint, The Ohio State University. arXiv:2602.10356.
- Memento-Team. (2026). Memento-Skills: Let Agents Design Agents. Preprint. arXiv:2603.18743.
- Tiwari, R., Sareen, K., Agrawal, L. A., Gonzalez, J. E., Zaharia, M., Keutzer, K., Dhillon, I. S., Agarwal, R., & Khatri, D. (2026). Learning, Fast and Slow: Towards LLMs That Adapt Continually. Preprint, UC Berkeley / Mila / UT Austin. arXiv:2605.12484.
- Asawa, P., Glaze, C. M., Orlanski, G., Ramakrishnan, R., Xu, B., Biswal, A., Chen, V. S., Sala, F., Zaharia, M., & Gonzalez, J. E. (2026). Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments. Preprint, UC Berkeley / Snorkel AI / University of Wisconsin–Madison. arXiv:2606.05661.
- Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., & Anandkumar, A. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. Transactions on Machine Learning Research. arXiv:2305.16291.
- Dohare, S., Sutton, R. S., & Mahmood, A. R. (2021). Continual Backprop: Stochastic Gradient Descent with Persistent Randomness. Preprint, University of Alberta / Amii. arXiv:2108.06325.
- Mitchell, T., Cohen, W., Hruschka, E., Talukdar, P., Betteridge, J., Carlson, A., et al. (2015). Never-Ending Learning. Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI).
- Sutton, R. S., Koop, A., & Silver, D. (2007). On the Role of Tracking in Stationary Environments. Proceedings of the 24th International Conference on Machine Learning (ICML).