Returning to the Era of Research

After the Scaling Laws

Introduction

It has been a long time since I last updated this blog!

The Scaling Law laid out a concrete route for large language models to be deployed quickly in engineering practice and deliver real results, and over the past two years it has been proved right. But soon, the growing consumption of data and compute became one of the bottlenecks limiting further improvement of base models. With pre-training data becoming scarce, much of the field's attention has shifted to post-training fine-tuning. A thorny question now stands before us: what comes after the scaling laws?

From a critical perspective, the Scaling Law has on the one hand provided a practical path that dramatically improved model capabilities in the short term, but on the other hand it has locked current LLM development into severe path dependence: architectures with real potential may not outperform today's models at today's scale, and such research rarely attracts much attention. The road of scientific research is always rugged.

This post therefore shares my rough notes on two forward-looking talks, together with some of my own thoughts on what comes after scaling.

Maybe this is the future beyond scaling.

Rich Sutton - Oak Architecture

The Definition of Intelligence

  • Core View: The essence of intelligence lies in machines learning from experience (a concept tracing back to Alan Turing).
  • Constituent Elements of Experience:
    1. Observations received at runtime
    2. Reward signals as feedback
    3. Autonomous action decisions
  • Division of Labor Between Design Time and Runtime:
    1. Design time: embed domain-specific knowledge.
    2. Runtime: carry out dynamic learning from the stream of experience.
  • Limitation: Relying solely on design-time knowledge cannot adapt to complex real-world scenarios.
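The observation–reward–action loop described above can be sketched as a minimal agent–environment interface. This is an illustrative toy (a two-armed bandit with an epsilon-greedy learner), not anything from the talk itself; all names here are my own:

```python
import random

random.seed(0)  # deterministic for reproducibility

class Environment:
    """A toy two-armed bandit: the agent must discover at runtime which arm pays."""
    def __init__(self):
        self.payouts = [0.2, 0.8]  # hidden from the agent (no design-time knowledge)

    def step(self, action):
        reward = 1.0 if random.random() < self.payouts[action] else 0.0
        observation = None  # bandits are stateless, so there is nothing to observe
        return observation, reward

class Agent:
    """Learns purely from experience: its own actions and the rewards they return."""
    def __init__(self, n_actions=2, epsilon=0.1):
        self.values = [0.0] * n_actions   # running value estimate per action
        self.counts = [0] * n_actions
        self.epsilon = epsilon

    def act(self):
        if random.random() < self.epsilon:                 # explore occasionally
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def learn(self, action, reward):
        self.counts[action] += 1
        # incremental mean update: V <- V + (r - V) / n
        self.values[action] += (reward - self.values[action]) / self.counts[action]

env, agent = Environment(), Agent()
for _ in range(5000):
    a = agent.act()
    _, r = env.step(a)
    agent.learn(a, r)
print(agent.values)  # the estimate for arm 1 should approach its true payout, 0.8
```

The design-time/runtime split is visible in the code: the environment's payout rates are fixed at design time, but the agent's value estimates exist only at runtime and come entirely from experience.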

Ilya Sutskever – We’re moving from the age of scaling to the age of research

Evals for Models

  1. Disconnect Between Evaluation and Real-World Performance
    • Models perform excellently in standard evaluation tasks but yield poor results in real-world scenarios.
    • Core Cause: Reinforcement Learning (RL) training relies on narrow datasets, which easily leads to overfitting.
  2. Human-Centric “Reward Hacking”
    • The real reward hacking does not come from the models themselves, but from human researchers who overly focus on evaluation metrics.
    • Derived Issue: A large number of incorrect priors are introduced during the training phase, and the training process itself becomes a “reward hacking” behavior, making it difficult to verify the model’s generalization ability solely through evaluations.
  3. Specificity of Value Functions
    • Dilemma of DeepSeek R1: The trajectory space is too vast to establish a mapping between intermediate trajectories and values (i.e., limitations of token-level reward signals).
    • Characteristics of Human Value Functions: Influenced by emotions, some perceptions are hard-coded by evolution, and overall, they exhibit strong robustness except in addiction-related cases.
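The credit-assignment problem behind the R1-style dilemma can be made concrete: when only a terminal reward exists, every intermediate step receives essentially the same learning signal, whereas a value function over intermediate states would differentiate them. The sketch below is a simplified illustration of that contrast, not DeepSeek's actual pipeline:

```python
def trajectory_level_returns(tokens, final_reward, gamma=1.0):
    """With only a terminal reward, each token's return is just the discounted
    final reward -- with gamma = 1 there is no signal to rank intermediate
    reasoning steps against each other."""
    T = len(tokens)
    return [final_reward * gamma ** (T - 1 - t) for t in range(T)]

def token_level_returns(tokens, per_token_rewards, gamma=1.0):
    """If we could assign rewards to intermediate states, returns would
    differ per step; producing such rewards over a vast trajectory space
    is the hard part."""
    T = len(tokens)
    returns = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):          # standard backward return computation
        running = per_token_rewards[t] + gamma * running
        returns[t] = running
    return returns

traj = ["step1", "step2", "step3"]
print(trajectory_level_returns(traj, final_reward=1.0))   # [1.0, 1.0, 1.0]
print(token_level_returns(traj, [0.0, 0.5, 1.0]))         # [1.5, 1.5, 1.0]
```

The flat first output is the point: a single end-of-trajectory reward cannot tell a good intermediate step from a bad one.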

Scaling Laws

  1. Scaling Logic for Pre-Training: There are clear scaling laws in the pre-training phase, and model performance can be improved by expanding the scale of data, computing power, and parameters.
  2. Transition of Development Phases: The AI field moved from an Age of Research (2012-2020) into an Age of Scaling (2020-2025), and is now returning to an Age of Research, this time supported by a far larger base of computing resources.
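The "clear scaling laws" of pre-training are usually written in the Chinchilla form L(N, D) = E + A/N^α + B/D^β, where N is parameter count and D is training tokens. The constants below are the fits reported by Hoffmann et al. (2022); treat them as illustrative, not exact:

```python
def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted pre-training loss for N parameters trained on D tokens.
    E is the irreducible loss; the two power-law terms shrink as model
    size and data grow. Constants are the published Chinchilla fits."""
    return E + A / N**alpha + B / D**beta

# Scaling either axis monotonically lowers the predicted loss:
small = chinchilla_loss(N=1e9,  D=20e9)     # ~1B params, 20B tokens
large = chinchilla_loss(N=70e9, D=1.4e12)   # ~70B params, 1.4T tokens
print(small, large)
```

The formula also shows why the bottleneck described in this post bites: the loss floor E is approached only by growing both N and D together, so once data runs short, one of the two power-law terms stops shrinking.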

Generalizations

  1. Two Core Bottlenecks
    • Sample Efficiency: Humans require far fewer samples for learning than AI models, indicating an inherent difference in capabilities.
    • Teaching Efficiency: AI agents need verifiable reward signals, yet the cost of verifying rewards for long trajectories is extremely high, highlighting the necessity of continual learning.
  2. Importance of Evolutionary Priors: From the perspective of human evolution, prior knowledge is crucial for improving learning efficiency and generalization ability, a mechanism that AI models have not yet replicated.

The Era of Research

  1. Drawbacks of Scaling: The trend of scaling has led to convergence in research directions across the industry, with all institutions focusing on the same paths and suppressing the diversity of innovation.
  2. Core Bottlenecks in Research
    • Idea Bottleneck: Difficulty in generating groundbreaking and innovative research ideas.
    • Implementation Bottleneck: Limitations in engineering capabilities and computing resources required to translate ideas into practical outcomes.

Alignment

  1. Perceptual Dilemma of AGI: The capabilities and impacts of AGI are difficult to perceive intuitively in practice, and its future potential is also hard to predict due to the lack of concrete references.
  2. Core Directions for AI Safety
    • Safety is strongly correlated with model capabilities; as capabilities grow, safety risks and the level of attention to safety increase simultaneously.
    • Opposes the industry’s over-focus on “self-improving AI” and proposes a better direction: building AI that is robustly aligned and cares about sentient life. Since AI itself may be sentient, this direction is more achievable than merely aligning with human values.
  3. Deployment and Equilibrium of Superintelligence
    • Definition of Superintelligence: Not a “finished mind” capable of doing all jobs, but an intelligent agent with the ability to learn any job. Its deployment logic is similar to that of human workers joining an organization.
    • Long-Term Equilibrium Vision: Humans may need to become “part AI” through technologies like “Neuralink++” to maintain understanding of and participation in an AI-dominated society. Meanwhile, it is necessary to reasonably limit the capabilities of superintelligence to avoid extreme execution of a single goal.

SSI’s Differentiated Path

  1. Core Advantage: Focuses on research into reliable generalization through a unique technical route, rather than following the industry’s scaling path.
  2. Resources and Goals: SSI has raised $3 billion in funding, and its research computing power is competitive as it does not need to be allocated to inference and product-related tasks. Its goal is to verify forward-looking ideas related to generalization and become a frontier participant in the safe implementation of superintelligence.

Industry Future Forecast

  1. Technological Convergence: As AI capabilities improve, the technical strategies and safety solutions of frontier companies will gradually converge, forming a unified industry consensus.
  2. Time Dimension: Superintelligence with human-like learning capabilities is expected to be realized in the next 5 to 20 years; existing scaling paths will stagnate in growth, but related companies will still achieve considerable revenue.
  3. Benefit Distribution: The benefits of human-like learning models will not be monopolized by a single company. Market competition will drive technology diffusion and price reduction, eventually forming a pattern where multiple companies occupy different professional fields.

Author: Xiyuan Yang
Posted on December 8, 2025 · Updated on December 8, 2025
https://xiyuanyang-code.github.io/posts/Returning-to-the-era-of-research/