Some in the AI industry have proposed concepts similar to Moore's Law to describe the rapid growth of AI capabilities.
Although there is no universally accepted law or principle akin to Moore's Law for AI, people often refer to trends that describe the doubling of model sizes or capabilities over a specific time frame.
For instance, OpenAI has previously described a trend in which the amount of compute used to train the largest AI models doubled roughly every 3.4 months starting in 2012.
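For a sense of what that doubling time implies, here is a quick back-of-the-envelope sketch in plain Python; the 3.4-month figure is OpenAI's published estimate, and everything else is simple arithmetic under that assumption rather than measured data.

```python
# Rough compounding implied by a 3.4-month doubling time in training compute.
# The doubling time is OpenAI's published estimate; the outputs below are just
# arithmetic consequences of that assumption, not measured figures.

DOUBLING_MONTHS = 3.4

def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative increase in training compute after `months`."""
    return 2 ** (months / doubling_months)

if __name__ == "__main__":
    print(f"Growth over 1 year:  ~{growth_factor(12):,.0f}x")
    print(f"Growth over 6 years: ~{growth_factor(72):,.0f}x")
```

Compounding at that rate works out to roughly an order of magnitude per year, far faster than the roughly two-year doubling usually quoted for the original Moore's Law.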
Early AI work dates back to around 1945, and what we are seeing now is the third AI renaissance. The problem with AI until now was that it showed great potential but kept running into problems we didn't have the technology to solve, which effectively killed the field for decades at a time.
CPU development was steady and predictable; AI development was not. There were decades with very little AI research, and decades with explosive progress.
Advances in this space have moved so fast that it's hard to build a predictive model of where we'll end up or how quickly we'll get there.
Meta's release of LLaMA produced a ton of open-source innovation, showing that models performing at nearly ChatGPT's level could run with fewer parameters on smaller and smaller hardware. At the same time, almost every large company you can think of has made integrating generative AI a high strategic priority, with blank-cheque budgets. Whole industries (also deeply funded) are popping up around solving the context-window memory deficiencies, prompt stuffing for better steerability, and better summarization and embedding of your personal or corporate data.
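To make the prompt-stuffing and embedding idea concrete, here is a minimal sketch of the pattern those tools follow: embed your documents, retrieve the chunks most similar to the question, and stuff them into the prompt. The `embed()` function is a stand-in for whatever embedding model you would actually call; the toy keyword-count version just keeps the sketch self-contained.

```python
import math

def embed(text: str) -> list[float]:
    """Placeholder: in practice you would call an embedding model here.
    This toy version counts a few keywords so the sketch runs standalone."""
    vocab = ["context", "memory", "model", "data", "prompt"]
    return [float(text.lower().count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def stuff_prompt(question: str, documents: list[str], top_k: int = 2) -> str:
    """Retrieve the top_k most similar chunks and prepend them to the question."""
    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q_vec), reverse=True)
    context = "\n".join(f"- {doc}" for doc in ranked[:top_k])
    return f"Use the following notes to answer:\n{context}\n\nQuestion: {question}"

docs = [
    "Context size acts like working memory for a model.",
    "Model weights encode long term memory from training data.",
    "Shoes are generally not conversational.",
]
print(stuff_prompt("Why does context size matter for a model?", docs))
```

Real systems swap the keyword counter for a learned embedding model and a vector database, but the shape of the trick is the same: whatever you retrieve has to fit inside the context window along with the question.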
We're going to see LLM tech everywhere in everything, even if it makes no sense and becomes annoying. After a few years, maybe it'll seem normal to have a conversation with your shoes?
Model sizes have already grown far beyond practical necessity for anything like Moore's Law to apply there. That is, models have become so huge that they are performing at 99% of the capability they will ever be able to reach.
Context size, however, has a lot farther to go. You can think of context size as “working memory”, whereas model size is more akin to “long-term memory”. The larger the context, the more a model can take in beyond the scope of its original training in one go.
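To put a rough number on the working-memory framing, here is a toy sketch of whether a document fits into a given context window. The four-characters-per-token figure is a common rule of thumb, not an exact property of any particular tokenizer.

```python
# Toy estimate of how much of a document fits into a context window.
# Assumes roughly 4 characters per token, a common rule of thumb rather than
# an exact figure for any specific model or tokenizer.

CHARS_PER_TOKEN = 4

def fits_in_context(document: str, context_tokens: int, reserved_for_answer: int = 500) -> bool:
    """Return True if the document (plus room for a reply) fits in the window."""
    estimated_tokens = len(document) / CHARS_PER_TOKEN
    return estimated_tokens <= context_tokens - reserved_for_answer

report = "word " * 6000  # ~30,000 characters, ~7,500 tokens by this estimate
print("Fits in a 4k window: ", fits_in_context(report, 4_096))   # False
print("Fits in a 32k window:", fits_in_context(report, 32_768))  # True
```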
That model sizes are already performing at 99% of what they ever will is a pretty wild assumption. There's absolutely no reason why a larger model wouldn't produce drastically better results. Maybe not next month, maybe not with this architecture, but it's almost certain that models will keep growing.