0xFunky
0xFunky|Nov 29, 2025 09:28
[Technical Introduction] Nested Learning

I have spent a lot of time lately studying Google's latest papers on Nested Learning (and the related Titans architecture). The deeper I read, the stronger my feeling becomes: we may be standing at the crossroads of the most significant paradigm shift in AI architecture since 2017.

To explain why this paper is so important, let me first talk about how AI has developed since 2017, and why memory-related stocks (such as HBM) have risen so sharply in the market recently. All of this can be explained in three key stages (I will try to use plain language):

==Origin and Current Status: Attention Is All You Need (Transformer)==

The paper published by Google Brain in 2017 is definitely one of the most important papers in the history of AI. The birth of the Transformer laid the foundation for all of today's LLMs, such as GPT-4 and Gemini.

In plain language, today's AI is like a forgetful genius taking an open-book exam. When you ask it a question, it is smart, but it has not memorized the book. It must spread out all the reference materials (the context you provide) on the desk and scan them with its eyes (Attention) at every step to find the answer.

Technical limitation: this architecture is "static". The AI's brain (its parameters) is frozen after training, and it relies only on the "desk" (context memory) to hold information temporarily.

==Bottlenecks and Market Phenomena: Scaling Laws==

In the past few years, giants such as OpenAI have embraced Scaling Laws, and the same bigger-is-better logic has been applied to context: "The bigger the desk and the more books on it, the better you do on the exam."

Using GPT as the main example, here is how the Context Window has grown in recent years:

GPT-3 (2020): Only 2k-4k tokens. The desk is very small, and after a few exchanges the model forgets the earlier setup.
GPT-4 (2023): Expanded to as much as 32k tokens. It can barely hold a single financial report.
GPT-4 Turbo (late 2023): Boosted to 128k tokens (approximately 300 pages of a book).
Gemini 1.5 Pro (2024): Surged to 1 million+ tokens (on the order of the entire Harry Potter series).

It looks great, but that is exactly the problem. To maintain this "infinitely large desk", we need an extremely large KV Cache, which grows with every token kept in context. This has turned the demand for HBM (high bandwidth memory) in current AI chips such as H100/Blackwell into a bottomless pit, which is a big part of why memory stocks and NVIDIA have skyrocketed.

Simply put, modern AI is too forgetful to remember things, so it can only rely on brute-force hardware (buying oversized desks) to solve problems. This path works, but its cost and cost-performance ratio are approaching their limits.

==Perhaps the answer for the next generation: Nested Learning==

It was amid this anxiety about the "memory wall" and the cost of computing power that Google's Nested Learning paper emerged in November, the same month Google released Gemini 3 Pro and Nano Banana Pro. It attempts to move AI from "relying on hardware" to "relying on the brain".

The foundation of all of this is a key technique: Test Time Training (TTT). In the past, we assumed that once an AI was trained, it could not be changed (parameters frozen). TTT allows a model to modify its own neural connections during the "exam" itself, that is, at inference time.

Based on TTT, Google has proposed two major architectural innovations:

1. Infrastructure: Titans (neural memory)
Google introduced an architecture called Titans earlier this year, giving AI an independent "Neural Memory". It no longer relies only on the open-book exam. When it reads new information, it does not just place it on the desk; it uses gradient descent to modify its neural connections directly (updating parameters). It is like memorizing the content of the book.

2. Evolutionary Core: HOPE (Self-Referential Learning)
This is the real breakthrough of this paper. Google proposed HOPE on top of Titans. If Titans is "able to take notes", then HOPE is "able to improve its own way of taking notes".
HOPE implements self-referential learning: the model not only learns knowledge but also adjusts its own learning algorithm in real time. This is a "nested" system: the inner layer learns knowledge, and the outer layer learns "how to learn knowledge faster".

3. Dimensionality reduction strike
The result is very low memory consumption. No matter how many books it reads, it does not need a larger desk (memory), because the knowledge has already been compressed and internalized into model parameters by the HOPE modules. If this holds up at scale, it would be a complete "dimensionality reduction strike" (a blow from a higher level) against the current hardware approach that relies on HBM.

Plain-language metaphor: a Nested Learning model is a master who "cultivates inner strength" instead of relying on open-book exams. When it reads new information, it does not just place it on the desk; it modifies its neural connections directly (updating parameters) and commits the content of the book to memory. No matter how many books it reads, it does not need a larger desk, because the knowledge has been internalized into its intuition.

==Summary==

If the Transformer taught AI how to "see" the key points, Nested Learning (combining HOPE and Titans) uses TTT to teach AI how to "remember" the key points and "evolve by itself". This Google paper shows us a possible next era: an era of "dynamic intelligence" in which models can update themselves and no longer rely solely on brute-force stacking of memory.

Nested Learning still has difficulties to overcome in training stability and in the complexity of dual-loop optimization. Even so, the establishment of this direction raises a serious question for the capital market: "Will this impact the current AI concept stocks?"

My opinion is that it may not happen in the short term, but the long-term rules of the game have changed.
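
For a rough sense of why the open-book approach is so memory-hungry, the KV Cache size described earlier can be estimated with simple arithmetic. The sketch below assumes a hypothetical dense model (80 layers, 8 key/value heads of dimension 128, 16-bit values); these numbers are illustrative assumptions, not the configuration of any model named in this post, and `kv_cache_bytes` is a made-up helper name.

```python
def kv_cache_bytes(seq_len, n_layers=80, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2, batch_size=1):
    """Estimate KV Cache size for a decoder-only Transformer.

    Each layer stores one key vector and one value vector per token per
    KV head, hence the leading factor of 2. All defaults are hypothetical.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


GIB = 2 ** 30
for tokens in (4_096, 32_768, 131_072, 1_048_576):
    size = kv_cache_bytes(tokens) / GIB
    print(f"{tokens:>9} tokens -> {size:8.2f} GiB of KV Cache")
```

Under these assumptions the cache grows linearly with context length: about 40 GiB at 128k tokens and 320 GiB at one million tokens for a single sequence, which is more than one 80 GB H100 can hold.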

The current stock market boom (HBM in short supply) reflects the dividend that hardware (memory) has collected in the Transformer era because of low algorithmic efficiency. However, once Nested Learning matures and becomes widespread, AI's dependence on memory capacity could drop significantly. That means the infinite-growth story of HBM may have a ceiling, and competition may shift from "who has the most memory" to "whose chip is fastest (at supporting efficient TTT computation)" and "whose algorithm is smartest".

To be honest, this paper deserves the attention of every AI engineer, and it is a key signal for everyone who follows the AI capital market to re-examine whether the "hardware super cycle" can continue.

The links to the relevant papers have been left in the comments.
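
For readers who want a feel for what "updating parameters at test time" means, here is a minimal sketch in plain Python. It is not the Titans or HOPE update rule from the papers; it assumes a simple linear associative memory trained with plain gradient descent on a squared error, and the names `memory_step` and `facts` are hypothetical.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def memory_step(M, key, value, lr=0.25):
    """One gradient-descent step on loss = ||M @ key - value||^2, in place.

    Returns the loss measured before the update, i.e. how "surprising"
    the (key, value) pair was to the memory.
    """
    error = [p - v for p, v in zip(matvec(M, key), value)]
    for i, row in enumerate(M):
        for j in range(len(row)):
            # d(loss)/dM[i][j] = 2 * error[i] * key[j]
            row[j] -= lr * 2.0 * error[i] * key[j]
    return sum(e * e for e in error)


DIM = 4
memory = [[0.0] * DIM for _ in range(DIM)]  # fixed-size "brain": DIM x DIM

# Two made-up facts arriving in the input stream as (key, value) pairs.
facts = [
    ([1.0, 0.0, 0.0, 0.0], [0.5, -1.0, 2.0, 0.0]),
    ([0.0, 1.0, 0.0, 0.0], [3.0, 0.0, -2.0, 1.0]),
]

losses = []
for _ in range(20):  # the same facts are seen repeatedly while "reading"
    for key, value in facts:
        losses.append(memory_step(memory, key, value))

recalled = matvec(memory, facts[0][0])
print("first loss:", losses[0], "last loss:", losses[-1])
print("recalled:", [round(x, 3) for x in recalled])
print("parameters stored:", DIM * DIM)
```

The loss shrinks as the pairs are replayed, and the memory stays at DIM x DIM numbers no matter how many pairs stream past. That is the contrast with a KV Cache, which grows with every token kept on the desk.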