The Evolving Landscape of AI: Contrasting Views on Models, Training and Inference
Two contrasting views of models, training, and inference are shaping the way we understand and utilize AI. This blogpost examines the traditional view and the foundation model era view, and their implications for accuracy, cost, enterprise-readiness, and governance.
Introduction
Models, Training, and Inference are all terms that have history in Statistics, Machine Learning, and Applied Probability, but AI has brought them into the mainstream. In this era of rapid AI innovation, there are two contrasting views coming together that are shaping the way we understand and utilize models, training, and inference. In this blogpost, I will examine these two contrasting perspectives and their implications for accuracy, cost, enterprise-readiness, and governance.
View #1: Traditional View
For years, data scientists and academics have taken a view of models, training, and inference that can be described as follows:
- Models are representations.
- Training involves algorithms searching for the best representation using available data.
- Inference refers to the appropriate extrapolation using the found/learnt representation.
If one takes this view of the world, cleverness / innovation / research are focused on two aspects:
- The ability to come up with "representations" that are "universal enough" so that when data in a given context is presented, algorithms are able to find the right representation that both "fits" the data well (aka training) and "extrapolates" it well (aka inference). This is where having a formal understanding of what context one is trying to "model" matters.
- The ability to come up with efficient, performant training and inference algorithms. In this paradigm, algorithms are efficient if they utilize minimal computation and work with as little data as possible in the presence of noise; they are performant if they extrapolate well, i.e., provide accurate inference. A minimal sketch of this workflow follows the list.
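To make the traditional view concrete, here is a minimal sketch assuming a simple least-squares linear model as the chosen "representation"; the data and model family are illustrative assumptions, not examples from the discussion above.

```python
import numpy as np

# "Model": assume the representation is a line, y = a * x + b.
# Training data: a small, possibly noisy sample from a context we understand.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Training: an algorithm (least squares) searches for the best
# representation, i.e., the coefficients (a, b), given the available data.
a, b = np.polyfit(x, y, deg=1)

# Inference: extrapolate using the found/learnt representation.
x_new = 10.0
y_pred = a * x_new + b
print(f"learnt representation: y = {a:.2f} * x + {b:.2f}")
print(f"inference at x = {x_new}: y = {y_pred:.2f}")
```

The representation here is explicit and compact: two coefficients fully describe what was learnt and how any prediction is produced.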
View #2: Foundation Model Era View
With the advent of foundation models, the perspectives on models, training, and inference have been altered.
- As before, models encode all the possible, relevant representations that matter, but now those representations are not available formally or succinctly; the model simply encodes all the data it is trained on.
- Training involves encoding all of the available data using clever system architectures.
- The task of inference is about using the new data at hand (typically limited and noisy) to extrapolate findings via the “pre-trained” model.
In this view of the world, cleverness / innovation / research comes in two flavors as well:
- The ability to come up with system architectures that enable the seamless encoding and decoding of the existing, massive repository of heterogeneous data, ensuring that selective decoding can be done using the data at hand. This is where very exciting progress has been made, especially in the last decade, and it continues to fuel rapid innovation with much more to come.
- The ability to perform accurate inference with as little data (aka tokens) as possible by leveraging the power of giant, pre-trained models. Here, the importance is placed on performing inference in a computationally efficient and cost-effective manner. A minimal sketch of this workflow follows the list.
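As a minimal sketch of this workflow, the snippet below uses the Hugging Face transformers library with GPT-2 as a small, publicly available stand-in for the giant pre-trained models discussed above; the choice of library, model, and prompt are illustrative assumptions, not recommendations from this post.

```python
from transformers import pipeline

# "Training" already happened elsewhere: the pre-trained model encodes the
# massive data repository it was trained on.
generator = pipeline("text-generation", model="gpt2")

# Inference: a handful of tokens at hand steer selective decoding
# from the pre-trained model.
prompt = "In enterprise data, time series forecasting is used to"
result = generator(prompt, max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])
```

Note how little of the work happens at inference time in the user's code; the heavy lifting is baked into the pre-trained weights.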
Traditional View versus Foundation Model Era View: Which One?
Now that I’ve outlined the competing views, let’s assess the merits of these two paradigms by considering a simple assignment: multiplying 7 by 9.
In View #1, we will say that multiplication is described through a formal arithmetic operation (as we currently do).
In View #2, we will say the multiplication results for every possible pair of numbers are pre-recorded (as we memorized multiplication tables when we were very young).
To multiply 7 and 9 using the first approach, we will run the multiplication algorithm to find 63 as the answer. To multiply 7 and 9 with the second approach, we will retrieve 63 as the answer by recalling the memorized table.
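Here is a toy sketch of the two approaches; the table range (1 through 12) is an illustrative assumption.

```python
# View #1: compute the answer with an algorithm (a formal representation
# of multiplication).
def multiply_compute(a: int, b: int) -> int:
    return a * b

# View #2: pre-record every answer once, then answer by retrieval
# (the memorized multiplication table).
MULTIPLICATION_TABLE = {(a, b): a * b for a in range(1, 13) for b in range(1, 13)}

def multiply_lookup(a: int, b: int) -> int:
    return MULTIPLICATION_TABLE[(a, b)]

print(multiply_compute(7, 9))  # 63, computed on demand
print(multiply_lookup(7, 9))   # 63, retrieved from the pre-computed table
```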
Using this example, let’s evaluate View #1 and View #2 for accuracy, cost, enterprise-readiness, and governance.
Accuracy:
If we know the desired operation well, as we do with multiplication, then View #1 should be the option of choice.
That is, if we know the context well, such as in time series and tabular data, and there are good formal representations available with efficient inference and learning algorithms, use View #1, not View #2. Indeed, Large Graphical Models (LGMs) are an example of View #1 for tabular and time series data.
But when we have limited understanding of formal representations, such as in the context of natural language processing or computer vision, it might be better to use View #2, not View #1.
Cost:
If the cost of arithmetic computation is a lot less than the cost of retrieving pre-computed answers from a massive data repository, View #1 is the way to go. In other words, if training and inference can be performed efficiently on the available data, it’s better to leverage traditional methods that don’t require massive infrastructure to simply retrieve inference from a pre-trained model.
On the other hand, when accuracy becomes the issue, it may make sense to spend more on View #2, especially if View #1 proves inadequate for the data at hand (e.g., images or text).
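A rough back-of-the-envelope sketch of this trade-off for the multiplication example follows; the entry and operation counts are illustrative approximations, not measurements.

```python
# The cost of pre-recording answers grows much faster than the cost of
# computing them with an algorithm.
def table_entries(n_digits: int) -> int:
    # One pre-computed answer for every pair of n-digit numbers.
    numbers = 10 ** n_digits
    return numbers * numbers

def compute_steps(n_digits: int) -> int:
    # Schoolbook multiplication touches roughly n * n digit pairs.
    return n_digits * n_digits

for n in (1, 2, 4, 8):
    print(f"{n}-digit operands: ~{table_entries(n):,} stored entries "
          f"vs ~{compute_steps(n)} digit operations")
```

When a compact algorithm exists, paying for massive storage and retrieval infrastructure is hard to justify; when it doesn't, the retrieval cost may be the price of accuracy.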
Enterprise-readiness:
When looking to solve problems in the enterprise, understanding and being able to explain the outputs of models (and how results were achieved) is just as important as their accuracy and the observability around the results (i.e., continual monitoring).
With View #1, it is typically easier to achieve explainability since the formal representations guide the training and inference process.
In View #2, it is difficult to explain the reasons for an inference without maintaining a complete lineage of data transformation. Indeed, this challenge is a hot topic for everyone bringing AI to enterprises and consumers alike.
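To illustrate why explainability tends to come more easily in View #1, the sketch below revisits the illustrative linear model from the earlier snippet; it is an assumption for exposition, not an example from the original discussion.

```python
import numpy as np

# With a formal representation (here, an assumed linear model), the learnt
# parameters themselves are the explanation: every prediction decomposes
# into named, inspectable contributions.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
slope, intercept = np.polyfit(x, y, deg=1)

x_new = 10.0
prediction = slope * x_new + intercept
print(f"prediction = slope * x + intercept "
      f"= {slope:.2f} * {x_new} + {intercept:.2f} = {prediction:.2f}")
# In View #2, no comparably compact decomposition exists; explaining an
# output means tracing how it was decoded from the pre-trained model.
```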
Governance:
In an enterprise, the data at hand is unique to the enterprise. Therefore, inference needs to be done using the enterprise environment as the primary context.
Typically, in View #1, both training and inference utilize only the necessary data, simplifying the governance of data, and mitigating data leakage risks.
View #2 introduces data leakage and governance challenges, if not nightmares, unless the enterprise has access to all of the data required and is able to bring the pre-trained model fully within its environment. While Retrieval-Augmented Generation (RAG) style hybrid architectures attempt to solve this, they are bandages at best unless the entire pre-trained model is controlled and managed within the enterprise's ecosystem.
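To make the governance boundary concrete, here is a schematic sketch of a RAG-style flow; `search_internal_docs` and `call_pretrained_model` are hypothetical placeholders, not real APIs.

```python
# Schematic RAG-style flow. Both functions below are hypothetical
# placeholders used only to mark where the governance boundary sits.

def search_internal_docs(question: str) -> list[str]:
    # Retrieval runs inside the enterprise environment, over governed data.
    return ["<relevant internal passage 1>", "<relevant internal passage 2>"]

def call_pretrained_model(prompt: str) -> str:
    # Generation typically runs on a pre-trained model the enterprise does
    # not fully control; retrieved passages leave the governed boundary
    # unless the model itself is hosted in-house.
    return "<model answer>"

question = "What drove last quarter's revenue variance?"
context = "\n".join(search_internal_docs(question))
answer = call_pretrained_model(f"Context:\n{context}\n\nQuestion: {question}")
print(answer)
```

The retrieval step can be governed tightly, but the generation step is where enterprise data meets the pre-trained model, which is exactly the bandage the paragraph above describes.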
Conclusions
The choice between View #1 and View #2 depends on various factors, including the specific context, available data, and the relative importance of accuracy, cost, enterprise-readiness, and governance.
In summary, View #1 may be better when:
- The desired operation or context is well understood and formal representations are available, such as tabular and time series data modeled with LGMs
- Training and inference can be performed efficiently on the available data, for example with LGMs
- Explainability is crucial, as formal representations guide the training and inference process
In contrast, View #2 may be more suitable when:
- There is a limited understanding of formal representations, such as in natural language processing or computer vision
- View #1 proves inadequate for the data at hand (e.g., images or text), justifying the more expensive approach
- The enterprise has access to all the required data and can fully control the pre-trained model within its environment
As both approaches are being used and considered in the current AI landscape, understanding these contrasting views and their implications will set builders and their business counterparts up for the best outcomes and successes with AI.