I think it’s incredibly naïve to think that, because we’ve hit a boundary on one particular aspect of LLMs, the technology has peaked as a whole. There are lots of ways to improve LLMs that aren’t just increasing the parameter count. For example, there’s been an uptick in smaller models that are optimized to run on client devices without large GPUs. There is probably a future where we have small 3-7B models that are competitive with today’s best 70B models but can run in real time on any smartphone. We’ll have larger context windows, allowing LLMs to work on larger problems. And we’ll have better techniques for getting high-quality information out of LLMs: there are already adversarial methods, where two LLMs hold a debate on a subject, that have shown more accurate and comprehensive output is possible. They’ll also continue to be embedded into different places in software that make them more useful, not just as a chatbot that lives in its own world.
Improvements are made all the time. You can’t feed a very large SVM the same data as transformer networks and expect it to perform the same. Transformers are used because they can more easily learn complicated patterns with less data.
I think I’ve read somewhere that neural networks with only one hidden layer can theoretically approximate any continuous function (if the hidden layer is large enough), but an incredible amount of data is required for it to do so, so it’s not practical.
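To make the "large enough hidden layer" part concrete, here is a minimal sketch in plain Python. It does not train anything: it hand-builds the weights of a one-hidden-layer ReLU network so that it matches a target function at evenly spaced points, then measures the worst-case error as the layer gets wider. The function names (`build_one_hidden_layer`, `predict`, `max_error`) and the choice of `sin` on `[0, 2π]` are my own assumptions for illustration, and the sketch says nothing about how much data real training would need.

```python
import math

def relu(z):
    return max(0.0, z)

def build_one_hidden_layer(f, a, b, width):
    """Hand-pick weights so the net matches f at width + 1 evenly spaced points.

    The net computes: bias + sum_i coeffs[i] * relu(x - knots[i]).
    """
    step = (b - a) / width
    knots = [a + i * step for i in range(width)]
    values = [f(a + i * step) for i in range(width + 1)]
    # Slope of the straight segment between consecutive sample points.
    slopes = [(values[i + 1] - values[i]) / step for i in range(width)]
    # Each hidden unit contributes the change in slope at its knot.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, width)]
    return values[0], knots, coeffs

def predict(net, x):
    bias, knots, coeffs = net
    return bias + sum(c * relu(x - k) for k, c in zip(knots, coeffs))

def max_error(f, net, a, b, samples=2000):
    """Worst absolute error on a dense grid over [a, b]."""
    grid = (a + (b - a) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - predict(net, x)) for x in grid)

if __name__ == "__main__":
    a, b = 0.0, 2 * math.pi
    for width in (4, 16, 64, 256):
        net = build_one_hidden_layer(math.sin, a, b, width)
        print(f"width={width:>3}  max error={max_error(math.sin, net, a, b):.5f}")
```

The error keeps shrinking as the hidden layer grows, which is the "can approximate anything" half of the claim. The "not practical" half is that a smooth one-dimensional curve already needs dozens of units, and this construction gets to pick its weights by hand instead of learning them from data.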
Over time other models will be discovered that can make better use of the training data.
What you mentioned is assumed in the video and paper in question.
The main argument is that, no matter our computational techniques, diminishing returns in predictive precision are reached far sooner than we achieve general intelligence.
No, the argument is that current techniques give logarithmic returns in data size, which is bad. But it said nothing about other potential techniques, nor did it suggest that this was a general result.
My personal take is that the current generation of generative models has peaked, for the reasons stated in the video (diminishing returns). This current gen will be useful, but progress-wise it'll be a dead end.
In the future, however, I believe that models with a different architecture will cause a breakthrough, being able to perform better with less training. And probably with lower energy requirements, too.
I've long thought that, in terms of major progress, AI peaked as early as 2022, when ChatGPT and various diffusion models were all hyped up. It was kinda obvious, since our silicon tech is already basically maxed out. There are lots of potential optimizations, but they are minor advancements compared to the raw compute power growth we had until recently.
And in order to make the next revolution in the AI field, those moneybags will have to spend a colossal amount of money to basically reinvent either computers themselves or the ML architecture.
I don't think that reinventing computers will do any good. The issue that I see is not hardware, but software - the current generative models are basically brute force, you throw enough data and processing power at the problem until it becomes smaller, but at the end of the day you're still relying too much on statistical patterns behind the wrong entities.
Instead I think that the ML architecture will change. And this won't be done by those tech bros full of money burning effigies, who have a nasty/stupid/disgraceful tendency to confuse symbolic representations with the things being represented. Instead it'll be done by researchers in some random compsci or robotics lab, in a random place of the world. They'll be doing some weird stuff like emulating the brain of a fruit fly, and someone will point out "hey, you see this feature? It has ML applications". And that'll be when they actually add some intelligence to those systems, i.e. the missing piece of the puzzle. It won't be AGI but it'll be better than now, at least.
...and aren't making progress on that front: A linear increase in generalisation still requires a more than linear increase in amount of data.
Also, btw, it's not as if we don't know that our current architectures won't lead to proper intelligence. tl;dr: While current architectures can learn and represent information, they cannot develop learning strategies or decide smartly on how to represent a particular bit of information. All the improvements that are happening are in that "how to learn better" area; we have no idea whatsoever how to make the jump to teaching an AI to learn how to learn. AlphaZero is able to learn the rules of a game, yes, but it can't learn arbitrary information -- once you throw something other than a game at it, it has no idea how to make sense of anything.
i think OpenAI more than anyone knows the challenges with scaling data and training. anyone working on AI knows the line: “a baby can learn to recognize elephants from a single instance”. reducing training data and time is fundamental to advancement. don’t get me wrong, it’s great to put numbers to these things. i just don’t think this paper is super groundbreaking or profound. a bit clickbaity and sensational for Computerphile
I am just waiting until it makes the leap to 3D. With that, you will start seeing 3D assets in videogames become cheaper and quicker to make, VR rigging will soon follow, and when the tech reaches its peak - automated design for 3D printing.
On the other hand, if we move from larger and larger models trained on as much data as they can gather to less generic, more specific, high-quality datasets, I have a feeling there's still a lot to gain. But quality over quantity takes a lot more effort to maintain.
The video is more about the diminishing returns when it comes to increasing the size of the training set. It’s following a logarithmic curve. At some point, just “adding more data” won’t do much because the cost will be too high compared to the gain in accuracy.
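To put rough numbers on that, here is a toy sketch of a logarithmic learning curve. The constants are made up for illustration (50% accuracy at one sample, plus 5 points for every 10x more data) and are not taken from the video or the paper. The point is only the shape: the same batch of extra data buys less and less as the training set grows.

```python
import math

# Hypothetical constants, chosen only to illustrate the shape of the curve.
BASE = 0.50             # accuracy with a single sample
GAIN_PER_DECADE = 0.05  # accuracy gained per 10x increase in data

def accuracy(n_samples):
    """Toy logarithmic learning curve, capped at 100%."""
    return min(1.0, BASE + GAIN_PER_DECADE * math.log10(n_samples))

def gain_from_extra(n_samples, extra):
    """Accuracy gained by adding `extra` samples to a set of `n_samples`."""
    return accuracy(n_samples + extra) - accuracy(n_samples)

if __name__ == "__main__":
    for n in (10**6, 10**7, 10**8, 10**9):
        points = gain_from_extra(n, 10**6) * 100
        print(f"{n:>13,} samples: +{points:.3f} points from 1M more")
```

Under these assumed numbers, every extra 5 points of accuracy costs ten times as much data as the previous 5 did, so the cost per point keeps climbing while the gain per sample keeps falling.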