Making Data Simple
Hosted by Al Martin, WW VP Technical Sales at IBM. Making Data Simple provides the latest thinking on leadership, big data, A.I., and the implications for the enterprise from a range of experts.
All Things AI: Exploring Large Language Models with Manav Gupta, VP & CTO of IBM Canada
The podcast welcomes Manav Gupta, VP and CTO of IBM Canada. A frequent collaborator on client visits, Manav joins Al to discuss all things AI and large language models.
- 02:04 Jumping Right into AI!
- 02:59 Meet Manav Gupta
- 08:17 Let's Talk All Things Models
- 27:20 How to Choose the Right Models
- 31:48 Where are the Models Going???
- 46:01 How to Learn AI
LinkedIn: linkedin.com/in/mgupta76
Website: https://www.ibm.com/granite
Want to be featured as a guest on Making Data Simple? Reach out to us at
almartintalksdata@gmail.com and tell us why you should be next. Making Data Simple is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technology, business innovation, and leadership ... while keeping it simple & fun.
#makingdatasimplepodcast #AI #LargeLanguageModels #TechLeadership #ArtificialIntelligence #IBMCanada #ManavGupta #AIInnovation #TechPodcast #AIModels #LearnAI
So you've already got your mic. You're ready to go. This is easy. Right? Oh, we'll see.
We'll see. As we get started, we'll see how it goes. I'm ready to go. But you gotta get that energy up. See, I mean, there are certain things you just cannot fake.
When you're in front of a client, there is just, you know, energy. All that energy. Sure. Yeah. Well, she don't have as much energy for me.
She's like, oh, let's go. You're listening to Making Data Simple, where we make the world of data effortless, relevant, and, yes, even fun. Hey, podcast listeners. Al Martin here.
Thank you for joining us. Hopefully you had a happy holiday, whatever you celebrate, and happy new year. Get all your new year's resolutions set up and ready to go because, you know, that's how it starts. This year's new year's resolution is gonna be doing transformers by hand. So I'm building this... Transformers by hand?
What's that mean? Yeah. Yeah. Or AI by hand. So I'm building this Excel file to teach some of the university students.
And the idea there is that, you know, everything from embeddings to creating a vector database to even understanding the transformer architecture, I'm doing it all through Excel. Nice. So the kids can learn. I mean, IBMers can learn. The world can learn.
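For listeners curious what "transformers by hand" looks like, here is a minimal NumPy sketch of one scaled dot-product attention head. All the matrices are made-up illustrative numbers, and the same arithmetic could be reproduced cell by cell in a spreadsheet, which is the spirit of the Excel exercise described here.

```python
import numpy as np

# Toy attention head over 3 token embeddings; every number is illustrative.
np.random.seed(0)
d = 4                                 # embedding dimension
X = np.random.randn(3, d)             # 3 tokens, each a d-dim embedding

Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv      # queries, keys, values

scores = Q @ K.T / np.sqrt(d)         # scaled dot-product similarity
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row softmax
output = weights @ V                  # each token: weighted mix of all values

print(weights.round(2))               # each row sums to 1
```

Each row of `weights` says how much a token "attends" to every other token, and that is the whole trick the rest of the architecture is built around.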
Right? It'll be fun. It'll be a fun little project to do. We were on a client visit recently together, and you were talking about how you have an interest in teaching. Is this part of your teaching endeavors?
Yes. Partly. So, I mean, look, with the rate and pace of innovation that's going on in AI, I sometimes feel that I'm just not running fast enough. But I'm sure many people can attest to that. So this is my attempt to, maybe to some extent, go back to the basics, but also add some of the more upcoming stuff. And we'll talk about some of this.
Like, as an example, this week, I came across the Model Context Protocol, or MCP, by Anthropic. So let me take a step back. Part of the thing that I kept wondering is, well, when agentic AI takes off, and, you know, the next couple of years is all gonna be about agents. If you really sit back and think about it, well, how are we really going to consume these agents? How are we going to configure these agents?
Right? Are we just gonna consume agents that one or a few providers provide? What if I don't want their agents? What if I wanna create my own agent? And by the way, your agent might be different than mine.
And how am I gonna give that agent context? And so this seems to be a really promising way of how we can have agentic AI made real for everybody. So we'll talk about that in a bit more detail. Alright. Alright.
Like, yeah, we're getting way, way ahead of ourselves. And we gotta set this up a little bit. In traditional fashion, I wanna make sure I give you the opportunity to introduce yourself. Give us a little of your experience, your background, and what brings you to today. Thanks, Al.
First of all, thank you for having me on. It's a pleasure. I've heard a lot about this podcast. I've heard many of the episodes. I know this goes far and wide.
So thank you for having me on this. I'm Manav Gupta, the CTO for IBM Canada. I think that's an over-inflated title to some extent. I call myself the chief troublemaker because that's really who I am. I've got a 25-odd-year background in distributed computing, all the way to my master's, which I did in India, and I followed it up with a couple of other diploma courses I did in the UK.
Ironically, one of my courses was in AI. This was back in 2006. You know, it's interesting. Most of the AI projects, they used to die on the vine searching for a business case. I've been in the tech industry doing distributed computing, did a lot of service assurance and service optimization, especially in the telecom industry, for almost a decade.
I was employed with Quallaby, which got acquired by Micromuse, and came to IBM. And then I did cloud, lots of cloud native stuff for many clients globally. And I'm a technologist, or at least I would like to believe I am. So very hands on, all the way from Cloud Foundry through to Kubernetes to OpenShift to Terraform, Ansible. And then, yeah, the last 3 years is all about AI.
We just can't get enough of it. So that's me. I've written a few books. I've got a few patents. My kids think that I'm still cool, so that's all that really matters, really.
Wow. That's the most impressive thing right there. How many kids do you have? Three boys. 7, 11, and 13.
So as a family, we are learning Python, and we are into 3D printing a lot. So we keep busy. So they're all techies too? Well, I'm hoping they will be. As you guys can hear on the podcast, I think Manav and I are probably too comfortable with one another.
I did say that he's in IBM. We're gonna do IBM today. I'm gonna start the podcast this way. I'm a fan of Manav. And Manav, why do you think I'm a fan of you?
I'm a troublemaker. I think that's part of it. I think that's part of it. I am a fan of a book that I just kind of like. It's a sales book.
I lead tech sales now, as you listeners know, and it's The Challenger Sale, which talks about teaching, tailoring, and taking control. I think those are easily said, hard to do. You've got to know the technology, and you've got to also have a business sense. You also have to have a comfort level in front of clients.
I don't know if I've said this before on the podcast, but I try to stay as technical as I can too. By the way, I don't know how you ever know if you can stay ahead of the game at this point in time. You can stay, at least me, you can stay up to speed in a lot of different areas, but, I mean, staying ahead, I don't know how that's possible. You have to tell me. But, anyway, I think the best way to describe Manav, and I've said this before to the team here, is I'm at a conference.
You know where I'm going with this. Right? We could see one another. Yeah. So I'm at a conference.
I'm walking back to my room because I gotta get ready for the next client meeting I have. Manav stops me, speaking of chief troublemaker. He stops me and says, hey. Hey. Have you tried the new Granite 3.0?
These are IBM models, large language models. I say, no. I haven't tried these. They just got announced today, and I'm with clients all day. It's a conference, Manav. And he's like, hey.
Unacceptable. Unacceptable. You should be trying these. And he's, like, at this cafe going crazy looking at these new models. And I'm like, Manav.
And he's like, you need to try it. And so I said, okay. Fine. I'll meet you this afternoon. We'll try these and, you know, I'll get started.
I know I hadn't had time to try it, but that's the kind of person he is. He's also that way with our clients. You know, sometimes, don't get me wrong, I think you need to listen. But he really has the confidence, and the thing I like about it is, look, he'll tell you if our technology is not where it should be in a certain area or that's not our strength, but he'll also go bullish in areas where we need to go bullish and be more forthright.
IBM tends to be a company, I think, that's not very good at touting ourselves. We're very humble. And I think sometimes we've gotta tell it like it is. People want that teach, tailor, take control. I'd call it a trusted technical adviser, and I think that's what Manav is.
So what I wanted to do today... I'm not selling IBM technology in terms of marketing necessarily, but I wanna talk about technology. I'm gonna break a little bit of the rule today. I wanna talk about IBM technology. I want you to talk to me like I'm a client. I'm going to push back at you, and I want to have that conversation.
So let's have the conversation. We'll start it there, and I'll bring in some personal questions as well. I want you to start like you're going to give me a pitch on IBM Technology as it relates to our AI ambitions. We are a hybrid cloud and AI company. You'll start it off that way, and let's go from there.
Does that make sense? Is that fair? Yep. I think that makes sense. But before we start, did you try out Granite 3.1 yet?
No. I have not, Manav. 3.0, done. No. I have not done 3.1.
So thanks for challenging. Not acceptable. Right? So listen.
Let me start there. Right? So let me use that as a segue to get in. I'll give you sort of my perspective on where I think, at least for enterprises, this tech is going and why I would encourage everybody to think about adoption of AI in a certain way. At a macro level, you know, it's no secret that the world is getting divided when it comes to AI into at least 2 major buckets.
Right? So you have the, let's call them, state of the art models. They also happen to be, or are sometimes called, frontier models. They are generally in the closed space, the closed ecosystem. You obviously see OpenAI, Anthropic, some of those models.
Some of the other models show up there too. Like, Google Gemini shows up there, and so on and so forth. If you look at their business model as well, the model itself is a product. Right? And it's being monetized through API calls, priced in multiples of 1,000 tokens.
And it's being embedded into, whether it's the productivity suite, and so on and so forth. So that's one scenario. Quite candidly, most of these models are also delivered by certain cloud providers. Again, from an enterprise perspective, or actually even from a consumer perspective, the growing realization is that the AI is going to be only as good as the data. So if the enterprise or if the hyperscaler has your data, you're going to be encouraged, or I don't wanna say forced.
But it probably makes sense for you to use the AI that's embedded there, that's available on that cloud. Then there is the other end of the spectrum, which is the open community. And really, let's take a step back to January and February of 2023. So ChatGPT came out November 2022, or really that's when the world heard about ChatGPT. By the time 3.5 came out and the world knew about it, OpenAI was already working on and testing 4.
So in the open community, Meta released, leaked, you know, pick a word, what became the original Llama. And then the open community really got started on the possibilities and the things they could do. Right? And then all of a sudden, we started seeing this rapid advancement in large language models. And we are at what?
Llama 3.2 at this point, 2 years in. So on the open side, you're beginning to now see the open models be very close to, and in some cases beating, these closed state of the art or frontier models. But the other interesting thing that's going on is that sometimes it's a matter of the open community of developers actually pushing the boundaries, with the limited resources that they have at their disposal, on where the larger models are going. So for a brief period of time, the industry was enamored, and they thought that we're going to have the one model to rule them all. Right?
Get the model as big as you can, and then everybody's gonna use it. And then there is, you know, increasing realization that maybe that's not the case. So then you have the community veering towards smaller fit-for-purpose models that are trained on datasets that an enterprise cares about. So, I mean, the classic example is if you're in the financial services industry, do you really need your AI assistant, your large language model, to write prose like Shakespeare? I mean, I would submit not.
Right? I would rather have that thing know all there is to know about me, what kind of services I'm using, and what they can do for me. Right? I mean, that's what I'm going there for. But hold on.
I think if you're one of these providers of the very large language models, and by the way, I got on stage with one of them at one point in time, and I basically said what you're saying. And he all but said I didn't know what I was talking about. He said it much nicer than that after he was on stage, but, I'm with you, we're gonna reduce the size of the models for the reasons you mentioned.
But he would say, yeah. No. Got it. But the greater the model, the more, how do I wanna say this?
The greater breadth of answers you can get. So why not go with that larger model if you can go with that larger model? That was his point of view. Right. I think there is some truth to that.
So let's double click on that. We certainly want the model to have as much generality as possible. So what do I mean by that? If you look at the most recent models, whether it's the Llama 3.1 or 3.2 7- or 8-billion-parameter models, or the Granite 3.1 8-billion-parameter models.
These are all models that are trained on over 10 trillion tokens of data. That's trillion with a t. At that point, the models have sufficient representation of general human knowledge in them that you can ask them any question, whether it's a fundamental reasoning question, whether it is a general knowledge trivia question, or writing prose, and so on and so forth. At that point, it's less important how dense and big the model is. It becomes more a matter of, does the model give you everything you require?
So certainly performance becomes one element. But remember, for a business, there is more than one. Right? There is cost. There is trust.
There is, what footprint is it gonna require to run on? Right? Whether you're running it on premises or, you know, wherever else, in whatever VPC. So I think there is some truth that, okay, you do want the model to have general enough knowledge. But for the predominant use case, whether you are in the financial industry or in the insurance industry or whether you're a retailer, and so on and so forth, you want to harness the generality of that knowledge to serve your clients for the products and services that you offer.
So that offers a very different lens to it. Fair enough. Fair enough. Okay. But I don't know.
How do you know when you've met that balance? Okay. Now we're onto the interesting part, I think. So how do you know when we've met the balance? So now let's talk about how our enterprises... actually, let's start with how consumers are using it, and then we'll get to the enterprises.
So over the course of the last 2 years, we've all learned how to use these LLMs, and we've all come to a realization that the limits of our imagination are the limits of the things we can ask these things. Right? So there are cottage industries that have come up with things like prompt engineering, etcetera, which is really a clever way of writing a prompt. So now I can make it write recipes. I can write, you know, Python notebooks.
I can do text to image, image to text, so on and so forth. For an enterprise context, if you are a business, small or large, delivering any service whatsoever, what do you truly require? You want this thing to know something about your clients, something about your policies, something about your products. So in other words, you want to marry the general knowledge that the model has with the data that you have in the enterprise. And if your CSO has done their job, less than 1% of your enterprise data should be available on the public Internet, which is the source that all of these models are trained on.
So then we get into how enterprises are using these models. We are then doing things like creating vector representations, and things like RAG, or retrieval augmented generation, which in a way is a suboptimal way of using these LLMs. Because what's going on really is the neural network that's powering this LLM, which has all this general knowledge, is going to waste. Because what you're really doing is, the bulk of your application is doing this vector proximity search to find corpus that's closest to the user's question. And you're using that LLM in the last mile to basically take that smaller knowledge space and create an answer that's beneficial, that can be human readable.
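The RAG flow being described, where the application does the vector-proximity search and the LLM only writes the final answer, can be sketched in a few lines. The `embed` function below is a crude stand-in (a bag of character codes), and the final LLM call is only hinted at in a comment; both are hypothetical, just to make the retrieval step concrete and runnable.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: bag of character codes, normalized to unit length.
    # A real system would call an embedding model here.
    v = np.zeros(128)
    for ch in text.lower():
        v[ord(ch) % 128] += 1
    return v / (np.linalg.norm(v) + 1e-9)

corpus = [
    "Our refund policy allows returns within 30 days.",
    "Shakespeare wrote 154 sonnets.",
    "Premium accounts include fraud monitoring.",
]
vectors = np.stack([embed(doc) for doc in corpus])

def retrieve(question: str, k: int = 1) -> list[str]:
    sims = vectors @ embed(question)           # cosine similarity (unit vectors)
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

question = "What is the refund policy?"
context = retrieve(question)
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
# The LLM is only the "last mile": answer = generate(prompt)  <- hypothetical call
print(context[0])
```

Notice that almost all the work happens before the model is ever invoked, which is exactly the "neural network going to waste" point made above.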
So the other approach that the world is using is something called fine tuning. Right? It's a technology that's existed for at least 30 years. Now fine tuning comes with its own side effects. Right?
So one of the fundamental drawbacks of fine tuning, or side effects of fine tuning, is the models can suffer from what's called catastrophic forgetting. So all this generality, these 10 or 12 trillion tokens of data that the model's trained on, it's gonna forget if you were to train that model on, say, customer acquisition or fraud. So then that leads to the problem of the 1,000 llamas. Right? So every time you fine tune a llama, you give birth to another one.
So now you need to figure out, right? I mean, that's kind of where we are. If you go to Hugging Face today, there are 800,000 llamas there. How do you know which one to use? How do you know which one to trust?
Like you do. Yeah. Right? So then you need to figure out how you are going to take a model that's of a reasonable size, whose supply chain of data you can trust, that hopefully has the right license that you want for your business case, that allows you some visibility, or as much visibility as possible, into the dataset it was trained on, that offers you some level of indemnification. And then you need to figure out how you can take that base model and add your enterprise data into it.
I mean, that's the magic. So, again, IBM came out with something called InstructLab. Right? We open sourced that, which is this taxonomy-based approach for adding knowledge incrementally into a model. I mean, at this point, it supports the Granite models.
Hopefully, over time, it'll support other models as well. I know Amazon's trying to do something similar around model distillation, which they announced at re:Invent a couple of weeks ago. And I think this is where we see the world going. Right? I mean, I think we're going to see in every company a small number of really large models being used.
That might be Copilot, as an example. That might be Anthropic's Sonnet, etcetera. And then they're going to have a large number of small models that are trained, that are using clever techniques to incrementally add enterprise knowledge into those models for specific use cases. That's how I see the world going. And then if you allow me a couple more minutes, I think the other area that we should talk about is how I think we are now beginning to hit the wall of what we can do with these models.
Continue. But let me ask a question. I want you to continue and take those 2 minutes, Manav. But when you're talking about InstructLab that uses the taxonomy, it uses essentially made-up, synthetic data to change the base model. Do you use that in and of itself, or would you use that with RAG?
I wanna make sure you talk to, are there any downsides? Is there more expense or cost, or does that not come into play? So InstructLab today has this taxonomy-based approach, and it comes with what's called a synthetic data generation pipeline. Yeah. So the idea behind InstructLab is that you give it a handful of labeled questions or example questions.
So let's call that labeled data. So let's say you have some corpus about your policies. Let's say it's your operations policy. And you give it a handful of questions from your policy, and then you can give it a pointer to your policy documents. So that could be a link; the policy might be in a PDF, could be in SharePoint, etcetera.
So what this synthetic data generation pipeline is gonna do is take your exemplar questions, go to the document or the corpus that you have provided it, and it will then synthetically, artificially, create more questions out of that. So it is all dependent upon the corpus that you have provided it rather than it making these things up by itself. So in effect, you're grounding the model in some truth. And then there is a critic model. The idea behind the critic model is that it keeps checking to see how the training is going.
Right? So it ensures that the knowledge that you're adding is incremental and additive, and that the model is not degrading or not beginning to hallucinate. So at the end of this entire cycle, out comes a model that now has its generality retained, plus your enterprise knowledge added to it. The model footprint remains the same, and we are offering InstructLab as a service, I think, starting with IBM Cloud and soon to be on other hyperscalers. If you're using RHEL AI, Red Hat Enterprise Linux AI, today, you can run it on a single host with GPUs, and you can do that inferencing yourselves.
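The generate-then-critique loop described here can be sketched in miniature. To be clear, this is a hypothetical illustration of the pattern, not the actual InstructLab code: `teacher_generate` and `critic_score` are stand-ins for real teacher and critic model calls, and the policy text is invented.

```python
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str

def teacher_generate(seed: QAPair, passage: str) -> QAPair:
    # Stand-in: a real teacher model would paraphrase the seed question
    # against the passage and draft an answer grounded in that passage.
    return QAPair(f"Rephrased: {seed.question}", passage[:60])

def critic_score(pair: QAPair, passage: str) -> float:
    # Stand-in: a real critic model would score groundedness in [0, 1].
    return 1.0 if pair.answer and pair.answer in passage else 0.0

def synthesize(seeds: list[QAPair], passages: list[str],
               threshold: float = 0.5) -> list[QAPair]:
    accepted = []
    for seed in seeds:
        for passage in passages:
            candidate = teacher_generate(seed, passage)
            # Keep only candidates the critic judges grounded in the corpus.
            if critic_score(candidate, passage) >= threshold:
                accepted.append(candidate)
    return accepted

seeds = [QAPair("What is the travel reimbursement limit?", "$75/day")]
passages = ["Employees may claim up to $75 per day for meals while travelling."]
print(len(synthesize(seeds, passages)))   # prints 1
```

The key property, as described above, is that every synthetic pair is derived from and checked against the provided corpus, which is what keeps the generated training data grounded.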
Amazon's coming out with a similar offer? So we are in the alpha/beta phase. So we're using that as a playground, and then we'll make that available on Amazon next. What Amazon's doing with SageMaker is trying to do something similar around model distillation. Right now, you know, this is unique to IBM.
I guess there are 2 parts to that question. Do you see it as not being unique shortly? I mean, in other words, competition and others are gonna use the same kind of approach or methodology. And is this better? Is InstructLab, as you just described it, better with RAG, or do you not even need RAG at all when you're using InstructLab?
I think that research is still happening in this space, quite candidly. So yours was a 2 part question. InstructLab is something that we at IBM have been working on for almost a year now. Right? Maybe just over a year.
It was originally in alpha/beta internally, then we open sourced it, and now Red Hat is building a community around it. So I think this is where the world is going: smaller, more cost effective models that are distilled or trained for performance and knowledge based upon enterprise data. Right now, I think InstructLab is well ahead of anything else that's in the market, including the model distillation capability that Amazon provides on Bedrock. So again, it's a similar idea. Right?
The idea with Bedrock model distillation is clients can use synthetic data generation and distill the output using a teacher model whose accuracy they want to achieve. And then they can select a student model that they can fine tune. So you're effectively using the teacher to fine tune the student. So it's a similar idea, and I think this is where we'll see a lot more innovation happening. But did you answer whether it's best used with RAG, or unneeded?
I think we are divided on that right now, to be quite candid. I have seen use cases where the distilled or the tuned model... because what ends up happening during tuning is you have taken what was your own data, and you have now added it into the model. The model gets so much more context about your dataset that in many a case, you don't need RAG. In fact, in a couple of projects that we did, we even struggled to monitor the accuracy of the performance of the RAG. So what ends up happening is, if you look at a typical RAG use case today, you do some retrieval.
You augment, you take that data, you give it to the model, and then you use methods like ROUGE scores or other metrics, like from the Ragas framework as an example, to validate how your model is doing. With an InstructLab technique, it turns out that that metric itself doesn't work. Because what ends up happening is the model now has more knowledge than just the context you're sending the model. So sometimes the metrics penalize the trained model.
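A toy version of a context-overlap score makes the penalty being described concrete. This is a crude stand-in for ROUGE/Ragas-style scoring, not any specific library's metric: it measures the fraction of answer tokens that appear in the retrieved context, so a tuned model that correctly draws on knowledge outside the context scores worse.

```python
def context_overlap(answer: str, context: str) -> float:
    # Fraction of answer tokens that also appear in the retrieved context.
    a = answer.lower().split()
    c = set(context.lower().split())
    return sum(w in c for w in a) / len(a)

context = "refunds are accepted within 30 days"
grounded = "refunds are accepted within 30 days"
tuned = "our policy accepts refunds within 30 days and covers shipping"

print(context_overlap(grounded, context))  # 1.0 -- answer fully inside the context
print(context_overlap(tuned, context))     # lower, even if the answer is correct
```

The second answer may be the better one, but any metric that only rewards overlap with the retrieved context will mark it down, which is why the projects described here fell back to manual evaluation.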
So we ended up doing things like manual evaluation to make sure that, you know, we were being fair to the model. So in use cases where you're doing segmentation or classification, you don't need RAG at all in the case of a trained model. Even in some cases where we have used InstructLab for things like KYC, know-your-customer scenarios, we found that it was outperforming RAG. I think the challenge with RAG, though, or the difference rather, not challenge, I would say, is, and again, look, this is still an evolving area.
Clients are building out their RAG architectures, and I mean true architectures, not the Mickey Mouse program that we can all write in 20 minutes. The RAG architectures, they are brittle. They are difficult to maintain. They have lots of moving parts.
So model distillation or the InstructLab-based approach actually simplifies the RAG architecture. We may still need RAG just to handle the data that keeps changing over a period of time. So if it's a customer policy and you're changing the policy on a weekly basis, maybe you don't want to retrain the model on a weekly basis. So there might be edge scenarios where you still need RAG. But by itself, InstructLab is proving that the trained model ends up getting a lot more context within the model itself, and you don't require RAG. So you might require RAG only for putting some controls and guardrails around what answers the model is giving.
Any downside to InstructLab? I mean, is there an inferencing cost that you gotta consider? Why would you not use InstructLab then today? Almost everybody's using RAG. And you're saying InstructLab, hey.
It takes the place of, like, 95% of all RAG, and then you just have one model that has your data essentially embedded. It's your unique model. What's the downside? I think a year from now, every enterprise is going to be using InstructLab or a similar technique. I think this thing is... Is it hard to use?
It's not hard to use. Is that in doubt? No. No. It's just new.
So we saw a tremendous explosion, or the industry saw a tremendous explosion, in RAG. Right? So the first year, or 6 months, of GPT availability, everybody was enamored with, you know, prompt engineering. Then we tried to figure out, okay, how are we gonna use this in our organization? So RAG came about as one technique.
InstructLab was developed pretty much hot on the heels of RAG. So I think it's just a matter of timing. So once the world at large begins to understand the power of things like InstructLab, I think they will all use it, and I think InstructLab will become the bare minimum. It'll be table stakes. How do you know what model to use to begin with?
I mean, IBM has its own models, very purpose driven. Look. I use them. The reason I like our models is because, well, they're our models. Let's just be honest.
But I also download them to my PC. I use them on a daily basis, and I don't know many other models that you can actually put on your laptop and then go. And I don't have any GPUs whatsoever. It's fantastic. But, you know, there's Gemini, you know, from Google.
You've got Llama from Meta. I mean, you could go on and on. If you're a customer out there, how do I know which one to use, who to trust? What do I do? I think it goes back to first principles.
Right? So in order to use any model, you probably want a model that you can trust. And how do you trust? So one dimension of trust is openness. Is the model open or not?
Do you rely on a vendor saying to you, hey, you know what? Go use this model. Don't worry about what dataset it was trained on. Or do you look at a model's pipeline and try and understand what dataset the model was trained on, what was filtered in and out, what kind of guardrails they put into it, what indemnification they are offering?
So I think that's one element of trust. I think the second one is around, okay, what kind of license is that model available under? So you could have the best model that might be open source, but maybe the license itself is restrictive. So you're not allowed to use it commercially. So the licensing becomes important.
Then it's indemnification. Well, what if, even though the model provider did their best, it turns out that the model ended up being trained on some dataset somewhere, somehow it slipped in, that they do not have rights to? So indemnification becomes another point. As a starting point, in terms of trust, those would be the bare minimum three. Then it would be things like cost.
Right? So how much infrastructure do I require to run this model? And what's the throughput of the model? Right? How fast is this model going to respond?
And look, I think the way the world is going, the model itself is going to be commoditized. And they almost are a commodity already today. Right? There are over half a million models available on Hugging Face, as I've said. So the world has no shortage of these LLMs.
I think the challenge now is, who do you trust? Do you trust the bloke who may have the best-performing model on the Hugging Face Open LLM leaderboard? Or do you trust an organization that has existed for over 100 years and has a proven history and track record in working with financial services and other clients in regulated industries? I get you. I get you.
I often ask that question. Do you know where the data is coming from? It's one thing I'm proud of with IBM, of course. I mean, you know exactly what data, how it was trained, fully transparent, open source, top to bottom. Many don't match that.
But isn't everybody saying they're indemnified now? How do you know what's really fully indemnified? Yes. Yeah. So there is different levels of indemnification and different types of indemnification.
So I think of indemnification really as a insurance policy. Right? So should something happen. And I think when you start reading the fine print, and believe you me, at IBM, we did, what we found was and I'm not gonna name vendors. But there are differences in indemnification itself.
So there are indemnifications with clauses around only this and only that, or only after the fact, or only if you get litigated, or only if you lose, and so on and so forth. Quite frankly, if I'm the CSO or the chief compliance officer of an organization, why would I want to have to worry about, well, let's think about this problem only when we get sued? I would rather go with a technology, with a supply chain for that technology, that I can trust. It's as simple as that. We may have to do chats with Manav or something every so often.
That's what we'll do. We'll have different topics. But my question is this. Before we walked into a customer visit the other day Mhmm. You know, I had to try to keep you focused because you were very passionate about where you think the business, or the industry, is going around models.
It used to be about getting all the data in and this and that, and now you say, no, I think the models are gonna go here. Where do you think the models and all this is gonna go? Like, if we're a year or 2 from now, I want you to kinda repeat the lecture that you gave me before we walked in. So what was that lecture?
That's fair. Sure. So listen. Anybody that's following AI can attest to this. For at least the last 3 years, maybe longer than that, there is this arms race going on. I mean, one can trace it all the way back to 2016 or 2017, but let's just focus on recent memory.
And there are all sorts of arguments around this. We had the CTO of Microsoft posit that these models are going to get 10x more data and 10x more compute thrown at them every year for the next 5 years. So that's certainly one measure of it. Then in 2022, DeepMind came out with a paper, and, you know, it was almost a bit too technical. And it was around the scaling laws of these LLMs.
And the theory that they were positing was this. They did a comparison of a bunch of models available at the time, and their conclusion was that the world at large, Google and others, were just not training these models with enough data. So they came out with what became known as the Chinchilla scaling laws. Chinchilla was the name of the model that they came out with at that point. And their theory was that for every parameter, you need to train the model on 1.7x the number of tokens.
So if you have a, you know, 100-billion-parameter model, you should be training it on 170 billion tokens of data, and so on and so forth. And then they had some affinity to compute, etcetera. Fast forward to now: every single model, from IBM and OpenAI and Google and everybody else, has followed the same trajectory. Right? So we started to see more and more compute-rich environments.
So these are the GPU farms, as well as data-rich models that were created. Right? So if you look at the Granite family of models, they're trained on 10 trillion tokens of data. Right? Because we are still within the limits of what this paper talked about.
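As a quick sanity check of the tokens-per-parameter arithmetic just described, here is a minimal sketch in Python. The 1.7x ratio and the 100-billion-parameter example follow the numbers used in this conversation; note that the widely cited Chinchilla heuristic is closer to 20 tokens per parameter.

```python
# Quick arithmetic for the tokens-per-parameter rule of thumb discussed above.
# The 1.7x ratio follows the conversation; the commonly cited Chinchilla
# heuristic is closer to 20 tokens per parameter.

def training_tokens(n_params: float, tokens_per_param: float = 1.7) -> float:
    """Suggested training-set size in tokens for a model with n_params parameters."""
    return n_params * tokens_per_param

# A 100-billion-parameter model at 1.7 tokens/parameter -> ~170 billion tokens
print(f"{training_tokens(100e9):.3e}")  # -> 1.700e+11
```

Swapping in a ratio of 20 instead of 1.7 reproduces the more common compute-optimal guidance.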
In the last, I'll say, month or so, there are lots of interesting debates happening across the AI industry that the models are no longer scaling, that they're hitting walls. So there are walls of scalability. And basically, the discourse is that there just is not enough data available anymore. All the data that was publicly available has already been consumed. It's already been collected, deduped, and so on and so forth.
So now, if the world is hitting that wall of scaling, and you can't just scale organically or horizontally under that previous law, what do you do next? What you see happening now is scaling being considered in different terms. One element of scaling that you can already see happening is around the context window. So with Granite 3 and 3.1, which just came out, all models are now at a 128,000-token context length. So that's more like a 300-page book.
If you look at the Anthropic Sonnet model, that's at a 200,000 context length. GPT-4o is approximately the same, and so on and so forth. And you'll see this context window becoming almost infinite, if not infinite. So you'll see long-term memory being added to it. So if you can't scale horizontally in terms of how big the model can be, maybe you think about scaling it differently.
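The "300-page book" figure mentioned above checks out with some back-of-the-envelope arithmetic. The conversion rates here (about 0.75 English words per token, about 320 words per printed page) are common rules of thumb, not figures from the episode:

```python
# Back-of-the-envelope: how many printed pages fit in a 128,000-token context?
# Assumed rules of thumb: ~0.75 words per token, ~320 words per page.
tokens = 128_000
words = tokens * 0.75      # 96,000 words
pages = words / 320        # 300 pages
print(round(pages))        # -> 300
```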
So at some point, I'll be able to ingest in a prompt a whole library to get a summary Exactly. Alright. So that's one. Actually, let's call it dimension 4.
Right? Because we already talked about three: we talked about compute, we talked about data, we talked about the number of model parameters.
So now this is memory. The fifth thing that's come out recently is, when OpenAI came out with o1, or o1 mini and 4o mini, and o1 is the name of the model, if you notice, they're actually giving more compute time to the model. They call it giving the model more thinking time. I think that's an interesting element.
So back to your point, Al, take a model that has all this context. You throw, say, an entire library or an entire book at it. But we as humans, we obviously want a quicker answer. And if you want the answer quicker, the model has to make certain choices in what areas it picks and what areas it perhaps ignores. So the other dimension that we're trying to scale is this: we allow the model more compute time, or more thinking time, so that the mixture of experts in the model that are thinking about the task you have given it to do can actually do the task.
So I think that's another dimension where the world is gonna go. So is it gonna be like a toggle where you can, like, say, hey, I'll give it 3 minutes. I'm willing to wait because I wanna see what the heck it comes back with Well, I think that'll be one way. I think the other way would be just a different model variant.
So we will have, for example, variants for critical use cases, for critical scenarios. So let's say it's an AI assistant, a human augmentation of an AI assistant, for, say, health care, or credit card fraud, or any type of scenario where we want an answer but we can wait a few minutes. There, we want real diligence being done by this technology. Right? So I don't need the model to respond quickly for everything.
Like, let's say I'm planning my vacation, so I have an AI assistant that's doing my vacation planning. I don't need this thing to answer quickly. I want it to go through all possible combinations of all of the trips, all of the capabilities that are there, and make the best package for the best experience I can get with the money that I have. So I think that will be another dimension. And I think the final dimension, that at least I can see right now, is really going to be around function calling.
I think we have barely scratched the surface. The world has barely scratched the surface on agentic AI. So at the start, I referenced a little bit this whole MCP, or Model Context Protocol, that Anthropic came out with. Man, that is amazing. The idea behind it is this: if you think about any LLM that you're using today, it doesn't matter, IBM or third party and so on, what do you need?
What do you need for that LLM to be truly effective? You need to be able to deploy it on infrastructure that, you know, you have access to. So it has to be small enough, fast enough. We talked about other things: it has to be open, I can trust it.
Blah blah blah. But for this LLM to be truly beneficial, I want it to do more than just generate text. I want it to do things for me. Right? I want it to execute certain things.
I want this thing to have some agency. Right? Hence, agentic AI. It's gonna have some level of autonomy. Go run a task for me.
Go run this file, if I'm a programmer, and debug it. Right? Deploy it on, say, a cloud, on Amazon as an example. So create a VPC, deploy my program, execute the task, collect the logs, see if the logs are okay.
If not, go fix those errors. Now that is a complex, multi step agent orchestration. So what do you need for that agent to run? Well, you need certainly a long context window. Guess what?
We already have it. But what else do you need? You need a way, an easy way to give that agent or agents context. Right? Stuff for that agent to understand and do.
Because if an agent doesn't know, doesn't understand, what you want it to do, and you don't have an easy way of giving that to the agent, well, you're gonna have a tough time building hundreds of thousands of agents. So what Anthropic came out with is this Model Context Protocol. I think it's an ingenious way of creating, quote unquote, what they call a server. And the idea is that you can run this in, for example, a Visual Studio Code extension, and you can create, or use existing, servers that are available that give it added capability. And I know this is a complicated way of understanding it, but I swear there is method behind this madness.
So I can now create, quote, unquote, a GitHub server. So that's basically running in Visual Studio Code on my laptop, or on anybody's laptop. You can run this capability that will make your GitHub context available to any LLM that you're using: IBM LLMs, models from Google, from Meta, OpenAI, and so on and so forth. Okay.
So GitHub is one. Then I can do other things. Like, hey, you know what? Writing code, writing automation with Puppeteer, or creating Docker files, or deploying on Kubernetes, or doing copywriting. Like, any knowledge task that we do, I can now have an agent do something for me.
So here's a little experiment that you can do. Pick any LLM, it doesn't matter which. If you go ask the LLM, hey, can you tell me how many GitHub repos I have?
Of course, it doesn't know. It has no connection to your GitHub repo. This is an orchestration engine that allows any LLM to connect to an endpoint that you provide, and then the magic happens. Look. I think that's amazing.
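The GitHub-repos experiment just described is the essence of tool (function) calling. Here is a toy sketch of the idea in Python; this is not the real MCP wire protocol, and every name in it (the tool registry, the canned answer, the keyword routing standing in for the model's tool choice) is purely illustrative:

```python
from typing import Callable

# Registry of "tools" an orchestrator exposes to the model. In a real MCP
# setup a server advertises these; here it's just a dict.
TOOLS: dict[str, Callable[[], str]] = {}

def tool(name: str):
    """Decorator registering a function the model is allowed to call."""
    def wrap(fn: Callable[[], str]) -> Callable[[], str]:
        TOOLS[name] = fn
        return fn
    return wrap

@tool("count_github_repos")
def count_github_repos() -> str:
    # Stand-in for a real call to the GitHub API with the user's credentials.
    return "You have 12 repositories."

def answer(question: str) -> str:
    # A real orchestrator lets the LLM decide which tool to invoke;
    # keyword routing stands in for that decision here.
    if "repos" in question.lower():
        return TOOLS["count_github_repos"]()
    return "No tool matches; the bare LLM cannot answer this."

print(answer("How many GitHub repos do I have?"))  # -> You have 12 repositories.
```

The point is the separation of concerns: the model never holds your credentials or your data; the orchestration layer does, and the protocol just standardizes how tools and context are advertised to any model.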
Look. I had, like, a patent attorney on, that I haven't even posted as of our recording, but it will be posted here before too long. Like, whether you're an attorney or not, this is about to get crazy. I mean, like, with the guy before that, that has been posted, we talked about whether this is hype or not. And, yeah, I think there is some hype, particularly as it relates to stocks and stuff like that.
Man, this stuff is real. It's gonna change the game. But where I was going with this is, I was asking a couple questions of the patent attorney, who couldn't answer them. For example, I said, hey, what happens, particularly in agentic AI, when you have autonomous actions being taken? Who's responsible?
The best answer he could give me was, like, well, it has to be with malicious intent. If they created this with malicious intent and then it took action that had a negative effect on somebody, an individual, or it happened with malice, then they could be held accountable, but otherwise, probably not. Man, this is crazy. Pretty soon we're gonna have AI patenting AI. And I haven't read Yuval Noah Harari's book yet, but I hear he tackles some of this stuff.
It's Nexus, so that's on my list. But, man, I'm scratching my head and I'm like, wow. Even for me, and I'm in the business. You and I are doing this every day, but I'm like, this is about to get crazy. And I don't know the answers to all this stuff, man.
We don't have a playbook for this. You know? I think it's too early. So one view I have of this: let's talk about what agentic AI is.
I think we as technologists are enamored by it. But really, it's automation, fundamentally. It's a piece of software that's doing a bunch of tasks on behalf of a human. That's really what it does. So can that be patented?
I mean, I'm sure for the mechanics of what it's doing and how it's doing it, you can create a utility patent. But I think the broader question is, as this thing evolves, and many of these AIs are claiming, or, you know, certainly passing, the Turing test, I think now we're entering into the realm of the uncanny valley, right, where you can no longer decipher whether you're talking to a human or to an AI bot. So for example So you think we're gonna get there soon? Are we gonna be there soon?
I think we're almost there. So I was reading this article, and I haven't tested it, but apparently, OpenAI came out with a number. You can now call ChatGPT, and you can WhatsApp ChatGPT. Now you're already having conversations with AI.
So I think that's one mechanism. Now in enterprises, we're already seeing use cases. I'll give you a practical example. It's a Canadian client, a Canadian customer. They were working with IBM to create an AI-based system to review the RFPs and the contracts. So the RFP, you know, the response, comes back to them.
Yeah. And, of course, every vendor has their own I mean, even though they have a template, the vendors give them a response, but a human has to read it and decipher it and so on and so forth. So they were looking for a way to do that. I was like, great. We can certainly help you with that.
Now it turns out at IBM, we are doing the opposite. Right? Like, we want to simplify and automate the way we respond to RFPs. So now you have, you know, AI talking to AI, back to your point. So I think People haven't realized it yet, but it's here.
Alright. So look. I gotta end now because, we've already been talking for quite a while. We'll have you on again. See, I knew this was gonna be good.
Right? That's why we spent most of the time on models. I'll probably have to name this one Models. But before I go, I always ask, like, a personal question or 2. Maybe I'll ask 2 real quick, but I don't know if they're easy.
The first one's easy. And that's how do you learn, man? Like you talked about earlier, how it is difficult to catch up. And one way I learn is I do these podcasts. I hear from different voices.
That's how this whole thing got started, and that's why I haven't been able to give it up yet, because it's a self-serving practice. But how do you learn? Where do you go to learn? Give me your cheats. So I've tried it all.
Right? Reading, podcasts, videos, and hands on doing. I finally settled on on the last 2. So I like to experiment. I'll write something.
I'll write code every single day. It is nonnegotiable, man. It took me a long time to get to this stage, but once I set up my IDE and so on, I am writing code every single day. That's cool. So that's one.
And the second thing is, I mean, YouTube is an amazing resource. So I'll go cherry-pick specific talks or specific sections where, you know, you're going to have the world's leading thinkers. Those are my 2 tips there. No. That's great.
Alright. So here's the tougher question, but I'm curious what your answer is gonna be. It's my favorite question. It'll continue to be, until I get a new question someday, but: what's true, but nobody agrees with you on? That's a problem.
What's true, but nobody agrees with me on? I think maybe I'll give you a very banal answer around AI. What nobody agrees on, in my opinion, but which I think is true, is that we are invariably going to create conditions where we leave behind lots of humans, unless politicians and policymakers intervene. Are you an advocate for more bureaucracy? Is that what you're trying to tell me?
I think we need light-touch regulation. So you and I, and those that are listening to this, are probably exceptions to the rule. Right? In the global world population of, what, 8 billion or so, how many people truly know about AI? Look, I grew up in India, you know, middle-class India, however you want to call it.
I've seen really poor people around the globe, not just in India, but in other parts of the world. What always terrifies me is, if this tech and the power of this tech is not made available to those humans, is it going to help them, or is it going to hinder them? And I'm not convinced yet that it's going to truly help them unless some regulatory framework is put in place. I do think we've got a real risk of the haves and the have-nots. Yep.
You said it better than I did. Perfect. Well, it's a concern. It's a huge, huge concern, because I take it for granted now. You and I work on this every day, and we're getting excited about this or that. To your point, I don't even consider that there are people out there that don't even consider AI.
They don't even know what exists. Yep. They don't even know what's coming. Alright. That's a scary thought to end on.
Thank you for agreeing. I asked Manav to be on just on a whim, and he thankfully said, yeah, let's do it. So there was no prep, guys. Honestly, zero. No notes.
Not like I have notes anyway, but no prep whatsoever. So thank you for just having a chat with me today. Manav, I think it's gonna be good, and I think we're gonna have to have you on more often just to have a chat. We'll have a different topic, and we'll go from there. So thank you for being here, man.
Thank you for having me. Glad to be back. Alright. And listeners, thank you. I always thank you, and I'm always gonna thank you.
Hit us up at almartintalksdata@gmail.com if you have comments, questions, concerns. Rate us wherever you are. Other than that, look, I hope you had happy holidays. Be good. We'll see you on the podcast.
Bye bye.