Data Security, AI Governance, and Data Management in SaaS and AI Applications with Yasir Ali

One of the big topics discussed on The Business of Tech is data management and its relationship to data security, particularly in the context of AI technology. The guest, Yasir Ali, founder and CEO of Polymer, shares his journey from Wall Street to founding a platform focused on data loss prevention for SaaS and AI applications. The episode's sponsor segment covers the complexity of data migrations and how Movebot aims to streamline them.

Ali delves into the evolving landscape of data security, focusing on the shift towards tracking data in cloud SaaS applications and addressing potential data leakage. He emphasizes the need for organizations to adapt their security measures to protect valuable information residing within various business workflows and applications. The discussion highlights the challenges posed by the changing nature of data storage and sharing in modern cloud environments. The conversation then shifts to the role of AI in data governance and security, with Ali underscoring the significant impact of AI on data management practices. He explains how AI technology can enhance data classification, monitoring, and governance, particularly in the context of unstructured datasets and diverse data sources. The discussion underscores the critical importance of AI governance in safeguarding sensitive information and mitigating risks associated with AI adoption.

Ali concludes by emphasizing the need for organizations to understand and classify their data assets effectively to leverage AI technologies successfully. He highlights the role of managed service providers in assisting organizations with data scans, risk assessments, and data classification to enhance security and enable AI-driven solutions. The episode provides valuable insights into the intersection of data security, AI governance, and the evolving data management landscape.
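
The "scan and classify" first step described in the episode can be illustrated with a very small sketch. This is not Polymer's method, which the episode does not detail; it is a rule-based stand-in that assumes two hypothetical regex patterns (a US Social Security number format and an email address) and hypothetical function names `classify` and `scan`. A real classifier would need many more categories and, as Ali notes, careful handling of false positives.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def classify(text):
    """Return a dict of label -> number of matches found in the text."""
    found = {}
    for label, pattern in PATTERNS.items():
        count = len(pattern.findall(text))
        if count:
            found[label] = count
    return found


def scan(documents):
    """Classify every document in a name -> body mapping."""
    return {name: classify(body) for name, body in documents.items()}


if __name__ == "__main__":
    # Made-up sample data standing in for tickets, chats, or files.
    sample = {
        "ticket-1": "Patient SSN 123-45-6789, contact jane@example.com",
        "note": "Lunch at noon",
    }
    for name, labels in scan(sample).items():
        print(name, labels)
```

Running the scan over a representative slice of an environment (file storage, ticketing, email, chat) gives a rough inventory of which systems hold which kinds of sensitive data, which is the starting point the episode recommends.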

Supported by: https://movebot.io/

All our Sponsors: https://businessof.tech/sponsors/

🚀 Join Business of Tech Plus

Get exclusive access to investigative reports, vendor analysis, leadership briefings, and more.

👉 https://businessof.tech/plus

🎧 Subscribe to the Business of Tech

Want the show on your favorite podcast app or prefer the written versions of each story?

📲 https://www.businessof.tech/subscribe

📰 Story Links & Sources

Looking for the links from today’s stories?

Every episode script — with full source links — is posted at:

🌐 https://www.businessof.tech

🎙 Want to Be a Guest?

Pitch your story or appear on Business of Tech: Daily 10-Minute IT Services Insights:

💬 https://www.podmatch.com/hostdetailpreview/businessoftech

🔗 Follow Business of Tech

LinkedIn: https://www.linkedin.com/company/28908079

YouTube: https://youtube.com/mspradio

Bluesky: https://bsky.app/profile/businessof.tech

Instagram: https://www.instagram.com/mspradio

TikTok: https://www.tiktok.com/@businessoftech

Facebook: https://www.facebook.com/mspradionews

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

[00:00:02] One of the big topics on the show, data management. How is that a service offering? How does it relate to data security? And how does AI change the whole thing? Well, Yasir Ali joins me today. He's the founder and CEO of Polymer, which is a platform focused

[00:00:17] on data loss prevention for SaaS and AI applications on this bonus episode of the Business of Tech. Data migrations are complex and irritating, creating days of frustration from setup to cutover. Movebot was built from the ground up to fix that.

[00:00:35] Movebot is the simplest and fastest data moving tool there is, fully hosted with no infrastructure, no virtual machines, none of that. Sign up, connect, scan, and you'll be moving data in minutes. Techs of all levels can now move terabytes per day with Movebot.

[00:00:52] The magic lies in how Movebot simplifies and autoscales your migration with modern cloud technology, handling proprietary doc types, file name sanitization, permissions, and cutover with detailed reporting and alerting at every step. Start moving data like a pro at movebot.io. Well, Yasir, thanks for joining me today.

[00:01:17] Thank you for having me, David. So I'm going to start with a little bit about the journey because you've gotten to get to Polymer. It was a bit of an unusual journey, and I want to understand that a little bit before we

[00:01:29] dive into the data security pieces. Tell me about starting on Wall Street to going to founding Polymer. What motivated you to move from finance to data security? Well, I think the decision was forced on me by the great financial crisis, but essentially,

[00:01:45] I started off as a quant developer writing C++ code at Bear Stearns Mortgage Group, moved on to trading pretty quickly. I was a mortgage-backed securities trader all the way through financial crisis, hedge funds, and various prop desks.

[00:02:03] And after that, kind of like, okay, this trading gig, there are less of them around. What's there to do? I love tech. I was always kind of involved in it during my trading days also

[00:02:14] to measure risk. I was like, okay, let's kind of start some consulting work and see where that goes. This is circa 2010-11, right around when Facebook was gaining traction and startup was a thing

[00:02:26] again, and tech was kind of coming in vogue. And didn't know any better and just got into some consulting work, building, architecting out data management solutions at large banks, master data management programs, cloud migrations, touching upon governance, security,

[00:02:45] and privacy, and saw a problem which kind of led to getting started with building Polymer with my co-founder. Now, with the conversion to everybody being online and most moving application data from what felt like the secure walls of a closed LAN to living out

[00:03:04] in the world, I'm intentionally noting that security is a relative thing. But by moving outside the organization, you focus really on sort of this idea of tracking the data more in cloud SaaS apps and looking at leakage. What's the guiding philosophy there in terms of what you're

[00:03:20] looking for to enforce data security? So since the days of Oracle when first started databases, the idea for technology vendors, partners, CIOs, CTOs is anything valuable will get transformed and saved into my database or my analytics layer. However, what I was seeing in reality was

[00:03:46] there was a lot of information which was sitting within business workflows itself. And it just stayed there within the systems, within these applications. And more and more with the cloud, like your human resource system will just stay within Workday. Your ticketing system,

[00:04:04] your CRM will stay within Salesforce. You might be dumping into analytics layer, but you might not necessarily be even dumping this into a database of any sort. So when kind of more and more workflows are residing end to end within this application

[00:04:19] stack, my chat is in Slack or Teams. My files are in OneDrive, SharePoint, all my files. That's how I'm going to enable share files directly from there. My NDAs, my customer agreements, they're all sitting in these platforms. So the world has changed, but the way it's being protected

[00:04:38] still seems it's a little bit lagging. People think about building firewalls, securing firewalls or endpoints. But the most porous endpoints are sitting within this application stack by organizations and companies and employees of those organizations to share

[00:04:57] information with a link, download it locally. That's kind of where most of the exfiltration is going on unnoticed and not many technology teams or security teams know what's even within these business applications sitting and at risk.

[00:05:15] Now, I would think that AI kind of supercharges the problem a little bit in terms of the fact that because you've got data that employees are using in various chatbots, generative tools, they could also potentially be used in models. The value of AI for most organizations comes in

[00:05:34] organizing. So tell me a little bit about how you're thinking about AI governance and how it relates to data security. Oh, it's massive. I mean, AI is... So when we move from on-prem to the cloud,

[00:05:48] the amount of endpoints probably went 100x, I would say. It used to be only endpoint and firewall and that's it. Everything else was stored within our environment. With the cloud, it kind of went 100x. And with AI, it's becoming 100x, maybe even 1000x problem from there on.

[00:06:07] So it's a massive increase. And how companies, when we talk to our clients, are adopting AI is right now it's a very slow adoption cycle. Maybe a chatbot for help desk, your customers to interact with your Zendesk environment within your ticketing system maybe,

[00:06:26] or your customer service agents having a bar on the right to help them guide around the history with this employee. But reality is most of the information, what was sitting within your database environments is already fairly easy to extract out with a very well-defined query or a button

[00:06:44] basically. But the real power of AI is going to be of data, which is not quite homogenous. It's just sitting within your ticketing systems, within your file system, within email systems. Your company knowledge really sits within SaaS applications. And how you extract out and enable

[00:07:02] AI in terms of making that usable is going to be a seismic shift for most organizations, not just for internal use cases, but external use cases. And then AI governance starts becoming a very relevant topic around what information classification do you have that is going into

[00:07:22] third-party tools at one point. On the other side, how the prompt is being used by the end user to get that information when they ask a question, are we at risk of disclosing more than we should

[00:07:35] have? And what we've seen with GPT prompt engineering, prompt attacks, that's not necessarily a safe bet that once you have a prompt, it's safe. There are a lot of hidden risks available and it'll

[00:07:50] take some time for those to be resolved. But even without those, understanding what information, what people have access to from a business context or risk context or sensitivity context is a problem where data security and DLP is going to be forefront of any organizations

[00:08:08] looking to adopt AI. Now, it feels like that in a way that we're almost having a conversation that most organizations aren't even ready for the step before. It's my general experience that most organizations are still almost struggling with data management alone, much less being ready with

[00:08:26] data organized in a managed way that can be applied with AI. Give me a little bit of a sense of the data journey that you think about, what customers have to do in what order to be ready to

[00:08:37] actually maximize on their data? So I ran or architected out multiple master data management programs in my past life, which were basically constructing data lakes, data warehouses. So this feels very, very similar to that. And that was a journey that most

[00:08:54] organizations never finished, to be honest. Partly because challenges around legacy systems, extracting information, building ETLs that don't break every other day, schema changes day two after putting a data governance program out. And they were literally these programs run under

[00:09:12] chief data officer programs really were glorified spreadsheets in terms of what information sits and have a lookup value or dictionary value in terms of being able to access that if you want. Like a glossary. We saw some modern kind of takes on master data management programs where

[00:09:30] it was somewhat more machinified in terms of your logical information with the physical location of the information, maybe that link is available. So organization looking to look for AI, what do

[00:09:44] they need to do? First of all, they need to label the information. They need to have uses of their case study in terms of what questions to ask and what information is best suited for it. Forget AI,

[00:09:55] you need as a human to have a general idea where to even point the AI. So there, what information is sitting within your database, you might be able to extract out much more easily. But unstructured

[00:10:06] data sets, raw files, your backups, your legacy information, your SaaS data, your app data, all that data is unclassified right now. So getting a handle on that is going to be the

[00:10:19] first step before even building any kind of AI model. Now, I want to layer on the other piece of this is we have to be thinking about privacy regulation and compliance on top of that. What

[00:10:30] does that add to the process that particularly those implementers are going to need to think about? So, I mean, we see that within the SaaS application workflow, even currently SaaS applications have followed a very similar framework method of how data is moving around. So in a chatbot,

[00:10:49] like Slack, for example, where people are interacting with the bot or themselves, it's a real-time information movement going on. You can't stop anyone from posting a patient data file, for example, but you can have observability and governance around like if

[00:11:04] something is posted, maybe remove it right away, redact it right away, warn the user not to do it. So there's a training program related to awareness, which is not there right now in most organizations. There is a concept of governance and real-time remediation that's

[00:11:19] not there. So all these pieces need to be available with any kind of AI construct in terms of observability, data loss prevention frameworks, and then awareness framework. And that's kind of the three pieces which we built our product around when we were trying to solve the SaaS

[00:11:39] field are very apt for AI use cases. And also tell me a little bit how you think that that match works. So first thing, for example, you don't know what you're going to protect or what needs to be

[00:11:53] private if you don't know what it is. So understanding the data assets, understanding the classification of what information is sitting within your OneDrive, SharePoint, my S3 buckets, or so on and so forth, or my email systems, and classifying that information.

[00:12:13] What do you even consider to be sensitive? Most organizations might not even have an answer to that to be honest. Yeah, names, social security number, anything else. There could be 10 other things which they don't even know articulation for. So getting the observability out,

[00:12:27] classifying the information is the first step towards that. And then monitoring how that data is moving around by the users, by the systems, in and out of the systems. That's a second step in terms of your any governance layer you're going to build over AI. Are there particular

[00:12:45] industries or customer profiles that are really great matches for this need? Like they really lean into this and adopt it quickly? Top of mind, obviously, healthcare, financial services is kind of where that comes in. Financial services though, the top 100 banks, top 50 banks, generally have

[00:13:05] decent frameworks already in place. Either there are Chinese walls because of SEC, FINRA, OCC. I joke around sometimes, the idea of this product came around from sitting in a finance land where data was so restricted. You can't have like Chinese wall issues around customer data moving

[00:13:27] to a trading desk, for example. So I feel that finance in a way, at least the investment banks and the more regulated banks are already there in some ways. But healthcare obviously is a little

[00:13:39] bit behind technologically and data privacy, data governance side in spite of HIPAA. But we're seeing from an AI perspective, the risk of leaking information is so high that this is a cross vertical problem. Everyone is worried at a certain level, anyone who's collecting customer data,

[00:13:57] patient data or partner information. Now, how many understand, I've got a bit of a premise and I'd like you to tell me either where I've got it right or where I'm horribly wrong. Because it feels

[00:14:09] like we have a lot of organizations with a lot of data that isn't well classified as you've described. But in a way, AI and machine learning feel like the perfect solution to help classify that. But at

[00:14:20] the same time, we know that the data has to be classified for AI to be useful. Am I wrong in thinking that we can use AI and machine learning to properly classify the data so it's ready for

[00:14:33] other kinds of AI usage? Oh, for sure. I mean, we use AI to classify the information ourselves. Usually where this situation starts becoming hairy is in unstructured data sets. NLP has been around for 10 years, extracting the entities. There are very well-defined methods

[00:14:52] in terms of what can be considered to be a person name. If David shows up, it's probably a person name. If JFK Boulevard shows up, it's probably a street name. Those constructs AI already understands

[00:15:04] quite well and there's machine learning techniques already doing that. But the issue ends up becoming these programs fail historically is a lot of false positives. How do you make sense from noise to signal? How do you keep construction on that? But at the basic premise, you're thinking about

[00:15:24] the right way. Okay. Well, that's good to know. And everything we've talked about, of course, is super great opportunities for solution providers and particularly managed services providers to dig in and help their customers be ready to take advantage of that. So if you

[00:15:38] were to isolate the one big customer need in this space, what would you say that is? Just understand what you have. Get a scan done, get a risk scan done, get something done to understand, at least get a basic classification done of your given environment, which is a

[00:15:56] representative of your entire organization's workflow. Typically a file storage system, ticketing system, email system, chat system. They're pretty good proxy for what overall your organization is dealing with from types of information. That's a good step in terms of

[00:16:13] managed service providers and others to offer to customers in terms of just, hey, what do you have? And that could not only be from a security angle, that could be an AI enablement angle also. Hey,

[00:16:25] you might be able to use this for AI chatbot later. Let's kind of help the CIO, not just the CISO, in terms of getting a sense of where information resides and both from a secure governance

[00:16:37] perspective, but make the CTO, CIO's job easier. How do you think about what questions AI should be answering and how do you build a case around that within the organization? So you could be a

[00:16:47] revenue generating kind of party in this. Well, I think you've just outlined what the opportunity is. Yasir Ali is the founder and CEO of Polymer, which is a cutting edge data loss prevention platform for SaaS and AI applications. And he brings a rich background as a former Wall

[00:17:03] Street trader and a data security expert to this evolving landscape of data security. Yasir, thanks for joining me today. Thanks for having me, Dave. This was fun. The Business of Tech is written and produced by me, Dave Sobel, under ethics guidelines,

[00:17:18] posted at businessof.tech. If you like the content, please make sure to hit that like button and follow or subscribe. It's free and easy and the best way to support the show and help us grow.

[00:17:31] You can also check out our Patreon where you can join the Business of Tech community at patreon.com slash MSP Radio, or buy our Why Do We Care merch at businessof.tech. Finally, if you're interested in advertising on the show, visit mspradio.com slash engage.

[00:17:50] Once again, thanks for listening to me. I will talk to you again on our next episode of the Business of Tech. Part of the MSP Radio network.