In this episode of The Business of Tech, host Dave Sobel engages in a deep conversation with Avi Perez, co-founder and CTO of Pyramid Analytics, who brings over 16 years of experience in the field of artificial intelligence and business intelligence. The discussion begins with a historical perspective on the evolution of AI, particularly in the realm of business intelligence, highlighting how AI has been a central component of BI for many years, long before the recent surge in popularity due to tools like ChatGPT. Avi emphasizes that AI encompasses a broad range of functionalities designed to assist users in navigating complex software, and he distinguishes between AI, data science, and machine learning.
As the conversation progresses, Avi addresses the current landscape of AI in business intelligence, particularly the integration of large language models (LLMs) and their transformative potential. He notes that while LLMs have significantly enhanced natural language querying capabilities, the real challenge lies in bridging the gap between these advanced technologies and their practical application in business contexts. Avi argues that the responsibility for this integration falls on software vendors and service providers, who must ensure that quality data is fed into these systems to avoid the pitfalls of "garbage in, garbage out."
Avi further elaborates on the emerging role of service providers in the context of vectorization, a process that combines public and proprietary data to enhance the functionality of LLMs. He explains that vectorization allows organizations to query their unique datasets without compromising data security or performance. This process requires skilled professionals who can effectively manage and optimize the data, ensuring that it is structured and semantically rich enough to yield meaningful insights when queried through LLMs.
The episode concludes with a discussion on the future of data governance and the critical role service providers will play in helping organizations navigate the complexities of data quality and management. Avi highlights the need for a thoughtful approach to data structuring and the importance of understanding business use cases to optimize the vectorization process. This evolving landscape presents a significant opportunity for service providers to deliver value by facilitating the integration of advanced AI technologies into business intelligence frameworks, ultimately enabling organizations to make more informed decisions.
💼 All Our Sponsors
Support the vendors who support the show:
👉 https://businessof.tech/sponsors/
🚀 Join Business of Tech Plus
Get exclusive access to investigative reports, vendor analysis, leadership briefings, and more.
👉 https://businessof.tech/plus
🎧 Subscribe to the Business of Tech
Want the show on your favorite podcast app or prefer the written versions of each story?
📲 https://www.businessof.tech/subscribe
📰 Story Links & Sources
Looking for the links from today’s stories?
Every episode script — with full source links — is posted at:
🎙 Want to Be a Guest?
Pitch your story or appear on Business of Tech: Daily 10-Minute IT Services Insights:
💬 https://www.podmatch.com/hostdetailpreview/businessoftech
🔗 Follow Business of Tech
LinkedIn: https://www.linkedin.com/company/28908079
YouTube: https://youtube.com/mspradio
Bluesky: https://bsky.app/profile/businessof.tech
Instagram: https://www.instagram.com/mspradio
TikTok: https://www.tiktok.com/@businessoftech
Facebook: https://www.facebook.com/mspradionews
Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
[00:00:02] Obviously we're all talking about AI, and I love learning more about it, particularly from people that have been focused on it a long time. Avi Perez has been doing this for 16 years, and we dive into a bunch of different topics, including vectorization. Don't know what that is? You probably should, because it may be part of the next big trend, on this bonus episode of the Business of Tech.
[00:00:26] Today's episode is supported by Huntress. You want to focus on your clients and are always looking for ways to get more time. Use Huntress' fully managed cybersecurity platform to fight off cyber threats. Huntress is more than cybersecurity software for endpoints and identities. It's a 24 by 7 security operations center. It's security awareness training, community engagement, and dedicated partner support with an average CSAT score
[00:00:54] of 99.3%. Technology can only get you so far. Human expertise is what's needed to truly elevate and protect small businesses, and you get that with Huntress. Secure your clients and help them thrive with the number one rated EDR for SMBs on G2. Visit Huntress.com slash MSP Radio to find out more.
[00:01:18] Well Avi, thanks for joining me today.
[00:01:21] Avi Perez- Thank you. Thank you. Very nice to meet you today, Dave.
[00:01:24] Now, I got to start with the kind of historical perspective, because you've been at the helm of Pyramid Analytics for 16 years now. Some of us feel like AI just exploded onto the scene. That's mostly because of what OpenAI did with ChatGPT. Give me a sense of your view of what's happened with AI, specifically around business intelligence, over the past 16 years.
[00:01:48] So, it's an interesting question, because I can't think of too many generic technology industries that have had AI as a central component for that long.
[00:02:02] And BI happens to be one of them.
[00:02:05] And Pyramid itself has had AI embedded inside its product for ages. And AI is a broad category. It's not just the cool new ChatGPT stuff. We're talking any kind of software or heuristic that can help a user to use software.
[00:02:20] And we differentiate AI from data science and machine learning. I mean, AI is really based on DSML, to use the abbreviation. But we separate it out because DSML is a slightly different part of the world.
[00:02:33] But AI itself, which is functionality in the product to help use the product, kind of like your, you know, your map, your GPS map is a tool to help you navigate. That's a form of AI and a very good example of it. We take it for granted today.
[00:02:48] In BI, AI has been around for years. It's been anywhere from cool, slick wizardry that helps you use very complicated software all the way through to, you know, heuristics that can work out how to build a clever formula.
[00:03:03] And in the more modern era, the cool thing is the automated insight. And we arrive at the current moment in time, which is the use of large language models to sort of superpower natural language querying, which is, at this point in time, the cutting edge of the whole process.
[00:03:23] But I must add, you know, natural language querying has been around in BI for years.
[00:03:28] Pyramid has had it in its product for five, six years, and that predates the current version of LLMs that people know and love.
[00:03:39] But it's been around for a long time.
[00:03:41] It's just we're in a different era now.
[00:03:43] Sorry, go ahead.
[00:03:44] But it feels like something is a little different about now that we've supercharged them with LLMs.
[00:03:48] I feel like that's different.
[00:03:50] And in particular, the area I wanted to get your insight into is that I think a lot about how we make sure clients implement this responsibly and effectively.
[00:04:01] And it feels like there's a distinct gap between those product developers that are making the LLM empowered and supercharged products and the end customer.
[00:04:12] Do you think about the role that those service providers are going to play in making sure that that is done responsibly and correctly, and the specific services that you think are important there?
[00:04:26] So there's no doubt.
[00:04:28] And I think to start at the beginning of your question, is there a gap between – I don't want to call it hype because we've all seen the functionality.
[00:04:38] We've all played with it.
[00:04:39] It's not hype.
[00:04:40] It really does what it looks to do.
[00:04:42] But making that work, certainly in a business context, there's definitely a gap there.
[00:04:46] It's still a bit of a chasm to cross.
[00:04:51] But interestingly enough, at least in our opinion, and this is from the Pyramid perspective, it's not necessarily an issue of the LLMs themselves.
[00:05:00] It's all the software that wants to incorporate the LLMs.
[00:05:02] That's the chasm that needs to close.
[00:05:04] It looks a lot harder and it is a lot harder because there's that gap.
[00:05:12] Then continuing that sort of thought process, who's going to close the gap?
[00:05:16] Well, part of it is the software vendor.
[00:05:18] Again, it's the Pyramid story.
[00:05:20] But the entire success of Pyramid's gap-closing strategy is the implementation of the data strategy, which brings us all the way back to the service provider.
[00:05:33] Exactly.
[00:05:34] So to bring it all together, it'll work something like this.
[00:05:37] The LLMs are absolutely phenomenal in terms of what they can do.
[00:05:40] I mean, their interpretive capability is mind-boggling.
[00:05:43] It's just amazing.
[00:05:44] It's a game-changer for what was there, let's call it, two years ago or three years ago.
[00:05:49] Then you need the vendor and the vendor software to use them properly and to make good use of them.
[00:05:54] This is a case of Pyramid and in our industry of all the other vendors.
[00:05:58] But none of it matters.
[00:06:00] None of it will really go anywhere without somebody feeding Pyramid and ultimately the LLMs of good quality structure, good quality data,
[00:06:10] and a deployment that is meaningful and reasonable.
[00:06:14] And from that perspective, it all sits on top of, to your point, service providers to make that happen.
[00:06:21] It's otherwise garbage in, garbage out.
[00:06:23] And that's ultimately the story.
[00:06:25] And that's ironically the same problem that we had last year and five years ago and a decade ago.
[00:06:31] All the cool tools in the market are useless without the right people knowing which buttons to push to turn them on.
[00:06:39] It's seldom random people in a company.
[00:06:43] It requires a degree of expertise.
[00:06:45] It could be very sophisticated users in the company.
[00:06:49] And by and large, it's normally service providers and consultants who come in to make that happen.
[00:06:54] That's pretty much the story.
[00:06:56] It's been like that for years.
[00:06:57] And now we have the next era.
[00:07:00] It still requires that kind of lifting to make it work.
[00:07:05] All right.
[00:07:05] So that's the opportunity, right?
[00:07:07] That's for this audience.
[00:07:08] That is exactly the highlight of the opportunity.
[00:07:11] Correct.
[00:07:11] Define for me what you think that looks like.
[00:07:13] What does that service offering look like?
[00:07:15] What's required to pull it together?
[00:07:17] Like, you know, if you were building a managed services provider or a services provider today, what would that look like?
[00:07:26] So I'll tell you what it is right now.
[00:07:28] And I'll tell you what I think it will be next.
[00:07:31] I don't want to give a time frame because this thing is moving so quickly.
[00:07:33] If I gave you a date, that would be stupid, because it would be moot.
[00:07:38] So let's call it the immediate to short term.
[00:07:41] Right now, the real lift, again, is in the data space and turning on quality data and exposing it, which means good quality structures, semantics, well-thought-out designs that are good for analysis.
[00:07:58] In our space, that's a requirement.
[00:07:59] Once you have that, then Pyramid itself, and though I'm talking about our own product, I would say this is probably a generic feature for any vendor that's in the equivalent space,
[00:08:09] can come along, attach its own semantic capabilities to the data structure and then expose them to the LLM, and boom.
[00:08:18] The end user is going to talk to the LLM to get their answers.
[00:08:22] We'll see a very seamless experience.
[00:08:24] So right this moment in time, the core services offering is that.
[00:08:28] And like I said a second ago, it's very similar to what it was yesterday prior to the LLM.
[00:08:32] And if you were doing LLM-centric stuff, it's the same for you.
[00:08:36] However, here's what's coming next.
[00:08:40] What's coming next is the next upgrade in the LLM deployment.
[00:08:47] And it requires the next skill set, which is around two things.
[00:08:52] The first is vectorization of data.
[00:08:55] So I don't know if any of your listeners have heard of something called RAG.
[00:09:00] RAG is basically, to ultra-simplify, the business of taking proprietary corporate datasets, let's call it information, and being able to query it and use it as part of the LLM flow without taking that information and putting it in the LLM itself.
[00:09:16] So let's assume I wanted to ask the question about my company, Acme Corporation, you know, give me five points about Acme's success.
[00:09:25] It's not like the LLM has the details on Acme.
[00:09:29] What's probably happened is someone's taken all of Acme's documents, put them into some kind of a data lake, typically something like a graph database, and then using RAG techniques, when I ask the question as the inquirer, it takes a piece of the database,
[00:09:44] which has been vectorized, hands it to the LLM, merges it together, and gives me an intelligent answer.
[00:09:49] So that was probably a super oversimplification of the story.
[00:09:53] But roughly speaking, it's blending these two things together.
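The RAG flow Avi is describing, in which private documents are vectorized up front, the pieces most similar to the question are retrieved, and only those pieces are handed to the LLM alongside the question, can be sketched roughly like this. This is a toy illustration, not Pyramid's implementation: the bag-of-words "embedding" stands in for a real embedding model, and the final prompt would go to an actual LLM call that is omitted here.

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Vectorize the proprietary documents once, up front.
documents = [
    "Acme Q3 revenue grew 12 percent on strong enterprise sales.",
    "Acme opened a new Berlin office in March.",
    "The cafeteria menu now includes pizza on Mondays.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(question, k=2):
    """2. Return the k stored documents most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question):
    """3. Merge the retrieved private context with the user's question."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

# The prompt, not the model's weights, carries the private data.
prompt = build_prompt("How did Acme revenue perform?")
```

The point of the sketch is the blending Avi mentions: the model's trained-in public knowledge plus a retrieved slice of the private dataset, combined at query time rather than by retraining the model.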
[00:09:56] So vectorization, in the broader sense, is the grand gluing together of public open source information that the LLM has been trained on, that it knows about.
[00:10:08] You know, what is the cost of pizza on Fifth Avenue on a Monday?
[00:10:12] It knew that because some website somewhere captured that, and so did 10,000 other websites.
[00:10:18] And then something specific about my business, which is not public.
[00:10:22] And I ask a question like, you know, can you tell me my top five performing salespeople?
[00:10:28] Well, the LLM has no idea who that is.
[00:10:31] What does it know about my sales team?
[00:10:33] It's not a public document.
[00:10:34] And the vectorization is going to be the glue that brings in that private and public thing together.
[00:10:40] And that's where the service providers kick in.
[00:10:42] So it's being part of the process to support the vectorization of the data, creating a very, very high-quality semantic layering on top of that to make their vectorization work.
[00:10:54] And then ultimately using that to, well, people are using the phrase "fine-tune the LLM," but I don't think it's really that idea of fine-tuning; it's more about fine-tuning the vectorization process.
[00:11:05] So that your end-users can ask really, really idiosyncratic, specific, bespoke questions about themselves and use the power of the LLM to get an answer.
[00:11:17] I don't know if that was confusing what I just said, but this is the next era.
[00:11:22] And that's where the service providers are going to find a huge cornucopia of activity.
[00:11:28] Because making all those bits and pieces work is going to be quite a mission.
[00:11:32] And it's not something software solves on its own; it's going to require people to sit down and make it work.
[00:11:38] So I'm going to repeat this back to you, because I think I get it, and I want to make sure that I've done a good job of processing this.
[00:11:44] So the idea here is that the LLMs themselves, whether open source and available to everyone, or closed source, like something provided by OpenAI, are trained on publicly available information.
[00:11:56] And we need to combine that with the private information that an organization has.
[00:12:00] A super simple example of that is when I'm interacting with a chatbot, and I say, hey, I want to talk about these three documents, and I've now provided them to you.
[00:12:12] That puts some context for the chatbot so that it's combining its public knowledge with the private one.
[00:12:17] And the vectorization is just this at a much larger scale.
[00:12:21] Have I gotten it?
[00:12:22] Yeah, you nailed it.
[00:12:24] And I would tell you the real problem: vectorizing your idea of three documents is one problem.
[00:12:32] What would you do if I handed you my corporate operations database with a billion rows of data in the fact tables? It would just be impossible.
[00:12:41] And you can't really fine-tune your LLMs.
[00:12:44] People have misunderstood what fine-tuning means.
[00:12:46] So yeah, this is exactly what needs to happen now.
[00:12:48] You got it.
[00:12:49] Cool.
[00:12:50] Now, I like complex problems because in mystery, there is margin, and we can make money here, right?
[00:12:55] So I want to ask, where do you think there's a role here?
[00:13:00] Is it about getting the data operation first?
[00:13:03] What's the role, given all the problems we have with data quality and governance, what's the role of the service provider helping customers get to the point of being able to do vectorizing?
[00:13:15] So, again, it's early days yet.
[00:13:19] Pyramid, for example, is working on prototypes right now to solve this headache, and it's a tremendously big headache.
[00:13:25] There are lots of pieces of the technology equation that are still not clear yet.
[00:13:31] But let's talk about the productionized capabilities in this space, especially around RAG for documents.
[00:13:37] And we're not talking about documents.
[00:13:38] We're talking about databases.
[00:13:39] It's a slightly different problem.
[00:13:42] As powerful and as clever as they are, they don't really solve one of the issues you just mentioned, governance, data security, performance on such a vast scale.
[00:13:54] So all of those issues have yet to be solved.
[00:13:57] And I'm speaking for ourselves here; I'm sure somebody out there has got some magical capability already in the oven, cooking away.
[00:14:06] But as it stands now, there's going to be a tremendous amount of input required on how to optimize the vectorization.
[00:14:15] Let me give you some examples.
[00:14:18] Now, it's not unusual for us, as I mentioned a few minutes ago, to come to a customer with a gigantic database.
[00:14:25] It's a data lake.
[00:14:26] It's a data warehouse.
[00:14:27] It's a data something.
[00:14:28] It's what I like to call somewhere in the data estate.
[00:14:31] It could be notionally 1,000 tables, and each table's got, I don't know, 10 columns.
[00:14:37] Well, that could be 10,000 columns of data, regardless of depth.
[00:14:41] That's a lot of different things to consider.
[00:14:44] If I said to you we're going to vectorize the whole thing, it could be a massive, massive process, a huge job.
[00:14:50] And vectorization is not cheap.
[00:14:53] It's not necessarily easy.
[00:14:54] And more importantly, you almost need a doubling of it, literally, in a functional sense.
[00:15:00] So you're going to, first of all, need someone who can come in and pare things back and cut the noise out.
[00:15:07] Someone who really understands the business problem and can see the technological answer to the business problem without giving up on the cool technological leap that this would offer.
[00:15:18] The second thing would be, for example, an ability to sit down and work with the vectorization process, and I'm projecting here what that would be like, to optimize the way it works so that we combine different fields and facts and whatnot into intelligent groupings such that they're most useful to the use case.
[00:15:41] An example could be this: let's assume you have a customer table that glues onto your sales data warehouse.
[00:15:52] And the customer table's got 100 columns.
[00:15:55] And then there's another 100 in the adjunct table here and 100 in another adjunct table there.
[00:16:00] And you're capturing every last detail about the customer, their address, their gender, their children, their this, their that, and these thousands of facts.
[00:16:09] It could be impossible to load that up into a vector database and get the performance and the scale you want at the right cost.
[00:16:17] It's not going to be cheap.
[00:16:19] There's going to be something to it.
[00:16:21] So someone has to come along and pare it back intelligently, understand the use case, and give not only just a prescription, but also enact that prescription to make it work.
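That paring-back step, selecting only the fields a given use case needs and collapsing them into one semantically rich chunk before anything is vectorized, can be sketched like this. Everything here is invented for illustration: the column names, the `USE_CASE_FIELDS` mapping, and the `to_chunk` helper are hypothetical, not part of any vendor's product.

```python
# Hypothetical customer record from a wide (100-plus-column) table.
customer = {
    "customer_id": "C-1001",
    "name": "Northwind Traders",
    "region": "EMEA",
    "segment": "Enterprise",
    "lifetime_value": 184000,
    "address": "12 Harbor St",   # noise for a sales-analysis use case
    "children": 2,               # noise for a sales-analysis use case
    "favorite_color": "blue",    # noise for a sales-analysis use case
}

# The judgment call the service provider makes: which fields matter
# for which business use case.
USE_CASE_FIELDS = {
    "sales_analysis": ["customer_id", "name", "region", "segment", "lifetime_value"],
}

def to_chunk(record, use_case):
    """Collapse only the relevant fields into one text chunk for embedding.

    One pared-back chunk per customer, instead of a vector per raw column,
    keeps the vector database smaller, cheaper, and faster to query, and
    makes the retrieved context meaningful for the stated use case.
    """
    fields = USE_CASE_FIELDS[use_case]
    return "; ".join(f"{k}: {record[k]}" for k in fields)

chunk = to_chunk(customer, "sales_analysis")
# Only this chunk would be embedded and loaded into the vector store.
```

The design choice being illustrated is exactly the prescription Avi describes: cut the noise per use case first, then vectorize, rather than loading every column of every adjunct table.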
[00:16:31] And I think it's going to require heavy lifting or a lot of very skilled, intelligent people to make that work.
[00:16:37] And in some respects, it's no different to me coming to you two years ago and saying, how about I build a reporting solution or an analytic solution or a company data warehouse?
[00:16:46] And someone paring back the noise to something that is functional and usable to you.
[00:16:50] If I gave you everything, you get overwhelmed with just too much junk.
[00:16:55] Same idea, but just a completely different league of functionality.
[00:16:59] And I would argue with a different implication, because if it's too big, too fat, it can be too slow.
[00:17:07] And no one wants to wait three minutes for an answer after asking the question.
[00:17:11] You want the answer right away.
[00:17:13] So I'm not asking you for a timeline because I understand what you're getting at, that this is where it's going.
[00:17:18] But I think you've just properly identified the next portion of what data governance is going to look like for organizations.
[00:17:25] And I want to end there because that's the opportunity.
[00:17:28] Avi Perez is the co-founder and CTO of Pyramid Analytics, where he's led the integration of AI-driven business intelligence solutions for over 16 years.
[00:17:37] His work focuses on combining generative AI and business intelligence to simplify corporate decision making, pioneering the concept of GenBI to deliver actionable insights from natural language queries.
[00:17:49] Avi, this has been great.
[00:17:51] Thanks for joining me.
[00:17:52] Thank you, Dave.
[00:17:53] Much appreciated.
[00:17:56] Looking to reach an audience of thousands of MSPs and IT service providers?
[00:18:01] Put your ad right here on The Business of Tech and be on the show that 64% of MSPs report having listened to.
[00:18:09] A recurring top 50 tech news podcast.
[00:18:12] There are affordable options for you to reach our audience, and we can support any budget.
[00:18:17] Podcast listeners are more engaged, have a higher level of brand retention, and are more willing to listen to ads here than any other avenues.
[00:18:28] Want to know more?
[00:18:29] There's information at mspradio.com slash engage, including a button to book a time to talk.
[00:18:36] I'm looking forward to that discussion.
[00:18:40] The Business of Tech is written and produced by me, Dave Sobel, under ethics guidelines, posted at businessof.tech.
[00:18:48] If you like the content, please make sure to hit that like button and follow or subscribe.
[00:18:53] It's free and easy and the best way to support the show and help us grow.
[00:18:58] You can also check out our Patreon, where you can join the Business of Tech community at patreon.com slash mspradio or buy our Why Do We Care merch at businessof.tech.
[00:19:11] Finally, if you're interested in advertising on this show, visit mspradio.com slash engage.
[00:19:18] Once again, thanks for listening to me, and I will talk to you again on our next episode of The Business of Tech.
[00:19:27] Part of the MSP Radio Network.

