Data protection is tough when you don't know where your data is or who might have access to it. Join me as I sit down with Purandar Das, co-founder of Sotero, to discuss the challenges and the opportunities that AI and LLMs bring as we continue to look for better ways to protect data. Stick around for the four tips to follow on your own journey to protect your data.
[00:00:00] Welcome to MSP 1337, a show dedicated to cybersecurity challenges and solutions, a journey together, not alone. I'm your host, Chris Johnson.
[00:00:17] Welcome everybody to another episode of MSP 1337. I'm joined this week by Purandar Das with Sotero. Welcome to the show.
[00:00:32] Thank you Chris, my pleasure.
[00:00:35] I appreciate you taking the time out for our audience. One of the things we always do anytime we have a guest, especially someone that hasn't been on the show before, is ask if you could tell the audience a little bit about yourself and your path to where you are today.
[00:00:50] I would love to hear you talk a little bit about Sotero. I'm very interested in what you guys are doing over there, and then we'll jump into your topic and let you share that with the audience.
[00:01:01] Thanks Chris. As you said, I'm Purandar Das, co-founder and CEO of Sotero. We are a Boston-based startup, and I like to tell people that we are broadly in the cybersecurity space but more specifically focused on data security.
[00:01:19] Obviously, the question is: how did I get here? How did we get here? The bulk of my career has been in marketing and marketing technology.
[00:01:30] I have been the CTO at two of the largest marketing services providers in this country, and one of them is actually among the largest on the planet.
[00:01:39] What that did for me was expose me to two things. One was data management at scale: when you talk about building marketing solutions for the largest brands in the country or the world, you're talking about tens of billions of transactions or records that store sensitive information.
[00:01:59] The flip side of that is: how do you collect and store but also protect this information while the company is using it to drive marketing and customer relationship building? When you put those two things together, you're talking about data security and privacy management at massive volumes.
[00:02:16] That challenge is what got me and my co-founder started in the data security space, because what we saw were limitations around how to achieve security and privacy at scale in a way that was meaningful but didn't become an obstacle to companies operating on and accessing the data.
[00:02:36] So the business can still access and use the data. Exactly.
[00:02:41] But with all the regulations coming on board, and regulators and privacy groups becoming more stringent and focused on how to protect the consumer, addressing that challenge was what got us started. We said, hey, there's got to be a better way of doing this.
[00:03:00] So we should be able to figure out a better way of achieving data security and privacy. That's what got us started on this thing.
[00:03:08] So we looked at this problem at Sotero and said: most security, if not all of it, is focused on the outside in, meaning locks on the network, locks on access, which is kind of ironic because the one thing it didn't really protect was the data, the information itself.
[00:03:27] So we said, why don't we look at it as an inside-out problem and come up with a solution? We ended up with a patented ability to query encrypted attributes without the need to decrypt them.
[00:03:43] It sounds very technical and complex, but what it does is elevate data protection from at rest, meaning protected only while nobody's using it, to protecting the data while it's being used.
[00:03:57] So that was our initial foray into this space.
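The idea of querying encrypted attributes without decrypting them can be illustrated with a common building block, a keyed blind index. To be clear, this is a minimal sketch of the general technique, not Sotero's patented method, and the key and sample data are hypothetical.

```python
import hashlib
import hmac

KEY = b"hypothetical-index-key"  # in practice this would come from a KMS

def blind_index(value: str) -> str:
    """Deterministic keyed digest: equal plaintexts produce equal tokens,
    so equality lookups can run without decrypting the stored records."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

# Store keyed by blind index; the record itself would be ciphertext.
store: dict[str, list[str]] = {}
for ssn in ["123-45-6789", "987-65-4321"]:
    store.setdefault(blind_index(ssn), []).append("<ciphertext>")

# Query by computing the same token; the plaintext at rest is never exposed.
matches = store.get(blind_index("123-45-6789"), [])
print(len(matches))  # 1
```

Equality matching is the easy case; range and substring queries over encrypted data need considerably more machinery, which is where products in this space differentiate.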
[00:04:02] I mean, we could go in so many different directions with this, but here's the one that pops out at me.
[00:04:08] You're going to have to tell the audience what this topic is, but I'm going to throw this out there.
[00:04:14] I think one of the challenges we have in protecting data is that so many people have become almost desensitized to what sensitive data is. They understand what data is.
[00:04:27] And they understand that, you know, a medical record should probably be protected.
[00:04:33] But we still see it all the time: sign this document, put your Social Security number on it, and then scan and email it back to me. Well, how do I ensure that I'm sending you something that's going to get there securely, let alone
[00:04:47] why are so many steps being taken to do something that shouldn't be happening in the first place?
[00:04:51] Yeah, the hacker aspect of it, the data theft aspect of it, has become extremely complex because the talent and the skills being deployed on that side match the talent and skills
[00:05:06] in traditional product development or the data management space. So it's not just one data asset that has your critical numbers.
[00:05:15] They are collating data from a broad base of sources, which makes the attack, the criminal attempt, so much more sophisticated. They're crafting programs built on the ability to link your data, not just your credit card number but potentially a medical history or travel history, and building attempts so sophisticated that they continue to fool people.
[00:05:44] But it's all based on the fact that they have access to data and the ability to profile. So they're not just saying, hey, I got your credit card number, that's a win.
[00:05:54] We would have said that's a win if I was a bad guy having someone's credit card number without them knowing about it.
[00:05:59] That's at least a win, right, because I can do something with it. But to your point, that's just part of the breadcrumb trail they're using to build something that allows for a much larger impact when they finally deploy their attack. Exactly.
[00:06:13] Even the monetization of it: there's the whole data collation aspect, then the sales aspect, then the people building platforms to leverage those things, providing people code, even the phishing kits, which are the starting point.
[00:06:28] But it is the reality we're facing. What's the one consistent thing every day when you listen to or read the news? It's new hack attempts or new breaches
[00:06:42] that have occurred. The ones that are worse than the actual breaches, to me, right now, are the doom and gloom of what's going to happen if we don't do something. But then no one says what we should do; they just leave it out there, like, yeah,
[00:06:57] if we don't protect critical infrastructure, bad things are going to happen. That's probably true, so what's your suggestion? Give me something to do so I can at least take action, let me be tactical.
[00:07:10] So that's where we looked at this and said: hey, there's a multitude of point solutions, network access management, endpoint monitoring, log monitoring. If you look at them, the one thing still not protected is the data. That's really an operational challenge, because no company wants to introduce a data-level product if they believe it slows down commerce or process.
[00:07:38] The moment there's friction in the process, companies get scared, because that means an impact on revenue, and they go: oh, we're not going to do anything at the data level because that's going to slow things down.
[00:07:48] We said whatever we come up with has to operate at the speed of business, meaning no impact on the business process and no additional friction, so that the business can continue to function while still getting the level of security and protection that data truly needs.
[00:08:07] That's one perspective. The second perspective is to look at it as the ultimate backstop: everything else fails, your endpoint monitoring fails, they find a new API or a third-party software vehicle to piggyback on.
[00:08:21] You still want to be able to protect your data in the event something bad happens. We took that approach as an inside-out requirement, and what we've done is provide a layer of protection at the data asset level. Very simply, think about all the things that have been done,
[00:08:36] all the things that happen at the network level: we achieve all of those at the data level with no impact to the business. That's what we do.
[00:08:43] And no, I didn't tell him to say that. So this topic is all about protecting your data, is that fair? Pretty much. It's all about protecting your data: what can you do to keep your data protected without impacting your business? And as you said, that can go in a multitude of directions, because there is not one single data asset.
[00:09:05] I think we struggle with that a lot. Look at anybody that's been in business for more than a few years and ask: do you know where all your data is? Of course the answer is, yeah, it's all on the server. And it's like, what about the data
[00:09:16] you sent to other people? What about the data you shared with fill-in-the-blank? What about... and all of a sudden it's in a few places. Well, are you on any social media platforms? The information's out there, and it's probably everywhere.
[00:09:30] So you want to know where it is, and then maybe knowing where it is is less important than knowing how to protect it where it is. It would probably be a more comfortable feeling to say: I don't necessarily care where it is, I just care that it's protected there.
[00:09:45] Yeah, you actually bring up a good point. It's also a two-dimensional issue: knowing where the data is and knowing what form the data is in, because you have databases, you have files, you have cloud stores, you have small
[00:09:59] data assets all over the place. So there's the discovery aspect, and then providing a single platform for protecting data in all of its forms, everywhere, all the time. If you look at our tagline, and this is a plug, but it's also relevant, we say we protect data everywhere, all the time.
[00:10:19] But it raises a question, back to what you said earlier, about how bad actors are not just getting a credit card and then they're done; it's all the other pieces. It made me think about the different types of data. Not all data is equal. A credit card, that's kind of a big deal. On the other hand, "I drive a blue Charger," well, there are lots of blue Chargers,
[00:10:48] but the blue Charger is a data point attached to my car that's not really that important until you start using it to build that bigger package, because now you've pieced together information about me. And we were talking about this before the show: when you can get access to that information very rapidly...
[00:11:10] And I think that's why we're seeing this new explosion of doom and gloom, why things like critical infrastructure are suddenly being talked about in the news along with the potential for bad things to happen: it's because of the way they're collecting information to create the attack. It's not just one little thing; it's all the little things adding up to a big problem. Exactly. The question is: how do security and privacy keep up with innovation? Companies and organizations are focused on
[00:11:40] monetization, on bringing to market new products that drive new revenue. I cite this all the time: a simple thing like a sprinkler controller. We're all excited because it minimizes your water bill and gives you control; we can turn it on and off from wherever. But think about the potential data that a small device like that could be collecting. It could essentially predict your routine.
[00:12:05] It could tell them when they think you're going to be at home, when they think you're going to be vulnerable. Attach that to the fact that you drive a certain car, and it gives them a level of credibility when somebody has a conversation with you and says, hey, I know the Charger that you drive. It makes you more amenable to giving them more information than you would if the context wasn't there.
[00:12:31] I don't want to say IoT is bad, it's not inherently bad, but I think about devices like a thermostat or, to your point, the sprinkler system. You start piecing those together and all of a sudden I know an awful lot about your
[00:12:45] patterns. You watch the movies where someone gets kidnapped because they've been followed for weeks: same path, same coffee shop, same routine every time. Well, what if it doesn't require any physical tracking? We're just literally taking the data
[00:13:01] that's coming in and saying: oh, we know they leave the house every day at this time, because that's when the lock is armed; we can see through the Ring cameras that no one's home when that happens. Suddenly you've built out a pretty extensive profile, and you haven't done anything or gone anywhere.
[00:13:16] Yeah, and to that point, the company selling that information doesn't realize the potential challenge it's going to raise, because they're selling it, monetizing it, to another business that says, hey, I'm going to leverage it.
[00:13:31] I'm going to leverage this data to sell them more services, and you get a cut of the revenue, fine, ship me the data. Then somebody gets their hands on the data, collates it with something else, and immediately, like you said, is able to develop a profile that tells them what you do, when you do it, and when you're vulnerable.
[00:13:51] And if we take this to scale, this isn't really about attacking an individual or a household; residential or commercial doesn't necessarily matter. What if you were to monitor that level of data for an entire neighborhood, a subdivision, a small city or a large city, and start recognizing patterns in things like water or electricity consumption? Then you can start going: well, if we can manipulate those things,
[00:14:19] then we can control the commerce happening in that city. Hey, the cost of water just went through the roof; why is everybody talking about a water shortage? Well, there isn't one, we're just making people think there's one so they panic.
[00:14:35] It's funny you bring that up, because it's so relevant to this notion of charging price based on demand. To your point, with the right data set and the right analytics, you can create artificial demand and drive up prices.
[00:14:58] Absolutely, or at least create confusion in the market so that people don't rely on it anymore, and that can be just as bad.
[00:15:05] I mean, it's crazy, but that's where this starts, right?
[00:15:11] So going back in time, thinking about how we would have protected data: I remember I had, I think it was the co-founder of Tenable, on the show, and we were talking about security back in the late '90s and early 2000s, and how security was basically "do what you're told," and if you get to do some security, that's better than no security. So yes, the marketing department says you're absolutely going to expose that folder
[00:15:35] to the public internet, because otherwise you've created friction. Fast forward to today: a lot of those problems are minimized, and we tend to do some of those things better than before, but we've now also introduced things like AI and large language models that allow for really fast data collection and joining of data sets. In fact, in some cases, and I'd love your thoughts on this, what data sets are they talking to?
[00:16:05] You use an LLM, maybe it's ChatGPT, and I'm just using that as an example, and you're like: what other data sets is that talking to that it should or shouldn't be using to help generate the answer it gives me back?
[00:16:20] I think we're at the very, very starting point of data sets with an LLM. You mentioned that LLMs, or models in general, have been around for a long time; what's given them their power is the availability of computing.
[00:16:35] That became the foundation for collating, digesting, and processing a massive amount of data, which wasn't even possible a couple of years ago, and suddenly it's there. And that has already started concerns about what's in there,
[00:16:55] in that data set that's already publicly available. As these evolve to serve more organization-specific needs, we're going to see the mingling of truly sensitive data about a consumer's interaction with a company with the behavior they've exhibited in the broader open community.
[00:17:19] So think about all the conclusions and inferences that can be drawn when you start mixing all this data together with really no safeguards, no controls, no rules. Yeah.
[00:17:32] Which also means garbage in, garbage out, and if you don't have any control over the data sets, then you have literally no control over which ones are garbage and which ones are not as they work their way back into the output.
[00:17:46] And unfortunately, in today's world, garbage out also has a massive impact, usually a negative impact, on the individual or organization being referenced in the garbage output.
[00:18:02] So we've talked a lot about the negativity, well, I shouldn't say negativity: the power of AI, the power of compute, to take data and get answers.
[00:18:12] What we really haven't talked about is whether there's a path forward that isn't just us speculating, you know, "if we were to do a better job protecting critical infrastructure, we'd be less worried about bad things happening to it."
[00:18:25] Absolutely, there is certainly a path forward, and we're happy to be part of it. We're not the only ones; there are multiple organizations trying to solve this problem.
[00:18:37] The first thing is: keep data secure. Stop the hackers, the criminals, from accessing your data. That's the number one thing, and it relates not just to data but also to the infrastructure challenge: making sure infrastructure is secure, meaning preventing it from being compromised or an attack from being accomplished.
[00:18:58] That's a question of real-time monitoring, threat detection, and prevention. It involves really quick detection of threats on even the smallest devices, whether it's IoT or otherwise, because those often become the gateway for somebody getting into a network.
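As a toy illustration of the kind of real-time detection being described here, the sketch below flags readings that fall far outside a rolling baseline. The window size, threshold, and readings are all arbitrary assumptions for the example, not anything from a real product.

```python
from collections import deque
from statistics import mean, stdev

class AnomalyMonitor:
    """Flag readings far outside the recent baseline: a minimal stand-in
    for real-time anomaly detection on a device or network metric."""
    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.history = deque(maxlen=window)  # rolling baseline
        self.threshold = threshold           # z-score cutoff

    def observe(self, value: float) -> bool:
        anomalous = False
        if len(self.history) >= 5:  # need a few samples before judging
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        self.history.append(value)
        return anomalous

mon = AnomalyMonitor()
readings = [10, 11, 10, 12, 11, 10, 11, 500]   # last one: sudden spike
flags = [mon.observe(r) for r in readings]
print(flags[-1])  # True: the spike stands out against the baseline
```

Real systems layer far more on top (protocol awareness, behavioral models, correlation across devices), but the shape is the same: establish a baseline, then act on what deviates from it.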
[00:19:20] So to rephrase what you just said: we've got to monitor everything and act on the things that are out of the ordinary, which, as you know, Sotero is doing, but like you said, there are others. I think part of this is getting organizations to recognize that you have to turn monitoring on. There are a lot still not doing it, just not doing it at all, or they don't know where their infrastructure is spread out, so they're monitoring one network but they're not
[00:19:49] Monitoring all the networks in the organization.
[00:19:51] Monitoring, and we can relate that to saying: monitor all of the points of access to a network. Don't just focus on the laptops you give to your employees.
[00:20:01] There are more entry points to your network than you're thinking about, whether it's API gateways or third-party software that has access. It's not just the laptops and devices. People say, "we've done what we need to do for ransomware protection because we've got our endpoints protected," and "endpoints" usually just means the laptops.
[00:20:18] There's a lot more. Monitor all of those in real time. From a vendor perspective, I think it's equally critical that vendors realize that creating a new software product that requires organizations to completely upgrade their environments to be effective is not a solution.
[00:20:37] Oh, it's very cost-prohibitive.
[00:20:39] Yeah, and it's not practical or feasible to expect an organization to completely rip everything apart, so you've got to build products and solutions that work in their existing environments.
[00:20:52] I think a good example of that would be DICOM imaging systems in hospitals, where they're running legacy operating systems but the technology still works just like it did when they bought it. It's still doing what it's supposed to do, but the ability to secure it
[00:21:07] obviously requires things to change. We always called it compensating controls, but compensating controls can get extremely expensive with that legacy gear, too. To your point, you can't just go and uproot a $15 million solution because the OS is antiquated. Hospitals would be going out of business if they had to replace everything every three to five years.
[00:21:34] I'll give you a more practical example. We talk to enterprises as part of our outreach, and we've had enterprises, obviously I won't mention names, these are multi-billion-dollar companies, tell us: half of our processing is still done on mainframes; does your product work on the mainframe?
[00:21:53] And we're like: yeah, absolutely, that's part of what we're about, because we come from a world where organizations use a lot of different technologies to achieve their business outcomes, and we understand that.
[00:22:05] The platform we build will always work with everything an organization has, and not expect you to either protect only a fraction of what you have or upgrade everything for it to be of benefit.
[00:22:19] I'll say this about some of the critical infrastructure stuff: I was listening to, I think it's the guys behind runZero, and he was talking about seeing something in critical infrastructure that's now connected to the internet.
[00:22:34] The only thing you can see is that you can make it do this or that, but you don't know what either of those things are, and you don't know what the device is. But now you can see it, so you can toggle this switch that opens a dam or closes a dam, but you have no idea which.
[00:22:47] So in those cases we're talking about operational technology that's older than most of the IT or IoT stuff we use today; it was created before the idea of the internet, 50-plus years ago.
[00:22:59] Do you think we need to upgrade that, or do we need to do a better job of providing better naming conventions? I think there are people out there who end up causing bad things to happen out of curiosity, because
[00:23:16] we see kids get into systems and go, "oh, this is cool," click, with no idea what the click did; they just wanted to find out what would happen. Had it said what it actually was, had there been a little bit more information attached to it... Sometimes we just don't have enough information
[00:23:32] at our fingertips to make better decisions, because what we are monitoring isn't telling us very much.
[00:23:39] So we spoke about the negative aspects of AI and LLMs; this is where they become so much more useful. It's going to be hard to come up with standard naming conventions and enforce them; we're talking about a very long process.
[00:23:56] Part of what we do for our discovery and classification is leverage LLMs, because they're so much more powerful at looking at the data asset and its context and saying: hey,
[00:24:08] this is what this most likely is. So there's certainly an opportunity to do that at scale, on a more reasonable timeframe, using technology to achieve what we need.
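For context, here is what a very simple rule-based discovery pass looks like. The pattern list is purely illustrative; the point of the LLM-assisted classification described above is precisely to go beyond what fixed patterns like these can recognize by using the surrounding context.

```python
import re

# Hypothetical rule-based pass: fixed patterns for a few sensitive types.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(text: str) -> set[str]:
    """Return the sensitive-data categories detected in a free-text field."""
    return {label for label, rx in PATTERNS.items() if rx.search(text)}

print(sorted(classify("Contact jane@example.com, SSN 123-45-6789")))
# ['email', 'ssn']
```

Rules like these miss anything they weren't written for (a column named "acct" full of account numbers, say), which is where a model that reads the data asset in context can classify what a regex cannot.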
[00:24:22] So you're getting into LLMs being built in such a way that they maintain integrity.
[00:24:30] Yeah, and again, people can attempt to focus on just one aspect of LLMs, but if you think about the entire process end to end, it starts with data collection, data processing, data storage, data persistence,
[00:24:46] the cleansing and sanitization of a prompt, the sanitization of the output, validation that there's no copyrighted or sensitive confidential information in there, and then pushing that out for consumption.
[00:24:58] So we're talking about an end-to-end process, like everything else.
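One of those stages, prompt sanitization, can be sketched as a redaction pass that strips recognizable sensitive values before a prompt ever leaves the organization. The patterns and placeholder tokens here are illustrative assumptions, not a complete sanitizer.

```python
import re

# Redaction rules applied in order; each pair is (pattern, placeholder).
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"), "[CARD]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]

def sanitize_prompt(prompt: str) -> str:
    """Replace detected sensitive values before the prompt reaches the model."""
    for rx, token in REDACTIONS:
        prompt = rx.sub(token, prompt)
    return prompt

print(sanitize_prompt("Refund card 4111 1111 1111 1111 for bob@corp.com"))
# Refund card [CARD] for [EMAIL]
```

The mirror-image check runs on the output side: validating that nothing copyrighted or confidential comes back out, which is harder because there is no fixed pattern for "someone else's paywalled article."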
[00:25:04] We're happy to say we're out there early, building that into a platform that enables what we refer to as trusted interactions with LLMs.
[00:25:13] So you have to teach people to think differently, because if I'm a great Python programmer, that doesn't mean I understand what's going to happen when I log into my ChatGPT prompt and start asking questions.
[00:25:28] It may not know how to process that, but you keep asking questions, and suddenly it's like: oh, that does make sense, I can totally give you an answer for that. And the reality is it shouldn't have; there should have been alarms going off saying
[00:25:39] what is being asked is not okay for anybody to ask. Exactly. And if it's allowed to continue, well, now it's also corrupting the model.
[00:25:48] Corrupting the model, yeah. Think about this: we trained an entire generation of developers and coders not to worry about security and privacy, because everything was automated. It just became simpler: I just need this answer, I'm going to get it, I'm not going to think about it. So in that context, you want to make sure all of this is automated,
[00:26:09] operating within an environment that takes care of all of that, because expecting every developer or coder out there to be cognizant in that way is not practical.
[00:26:21] But remember, if you go back in time, the days when we used C# and C++: you wrote it wrong and it caused the computer to crash.
[00:26:29] That doesn't happen anymore, right? Exactly. At worst things slow down; you might notice, wow, this is taking a really long time to give me the answer I'm looking for, but you don't see a reboot, you don't see...
[00:26:41] Yeah, computers don't crash anymore, not like that. And we're running computers at scale, so 10 computers can fail and the other 90 are still
[00:26:50] up and running, generating either the wrong output or a potentially right but dangerous output.
[00:26:56] So we have just a few minutes left. We've talked a lot about how we should go about doing this, differently, I think, than many of us have, and I can tell you:
[00:27:07] I'm doing a workshop for a group this month on how to approach AI and the risks of AI, and one part of it is the questions you ask when you work with any vendor. One question would be: just because
[00:27:23] the vendor doesn't call out that they're an AI company... With ChatGPT and OpenAI, we know those are AI companies; that's their model, it's their business. But if you say Microsoft, you don't automatically say Microsoft
[00:27:37] and AI in the same statement. The idea is, with risk profiling and things along those lines, you might ask questions like: do you use AI in your products? How does it interact with
[00:27:51] fill in the blank? What data of mine are you storing?
[00:27:55] Give me some ideas, some suggestions; maybe you've got something to share that would help.
[00:28:00] There are a few basic questions to ask yourself.
[00:28:03] If I'm going to interact with an LLM model,
[00:28:05] what is my objective?
[00:28:07] Am I just trying to get a general or a generic response
[00:28:10] from the data that already exists there?
[00:28:14] If that's the case, that's one thing.
[00:28:15] Then am I worried when I do that,
[00:28:17] that I'm going to potentially extract and consume
[00:28:20] copyrighted or confidential information
[00:28:23] that could put me in legal jeopardy
[00:28:24] because I inadvertently consumed it.
[00:28:27] The first one is, say I'm asking about an FAQ; it could be a knowledge base I'm asking questions of. The results will be much better than just trying to search for the text phrase in a database, right?
[00:28:41] In a database, yeah.
[00:28:42] Using its engine to understand the question, yeah.
[00:28:44] So then the second one: you're talking about making sure that when you're using these prompts, you understand where it's potentially pulling data from, so that you don't end up in legal jeopardy.
[00:28:55] In legal jeopardy?
[00:28:56] For example, you ask for some images,
[00:28:58] it gives you a copyrighted image from somebody
[00:29:00] that you're not aware of, you use it.
[00:29:02] Hey, guess what?
[00:29:03] You just potentially violated a copyright that you weren't even aware of, right?
[00:29:07] That's similar to what we just saw recently with the New York Times, where we had a chat, a language model, that literally went behind their paywall and scraped all of that data.
[00:29:18] And of course they found a way to be able to do that.
[00:29:21] Yeah, yeah.
[00:29:23] But I mean, to your point, they have to be upset, because they spent millions of dollars paying for and sourcing this information that now anybody can consume, right?
[00:29:32] Well, not only that, but we're consuming it
[00:29:34] without knowing where it came from either.
[00:29:36] It's not like it says "copyright, New York Times" at the bottom.
[00:29:41] The third aspect of it is potentially saying: okay, that's fine for the generic or publicly available information. What do I need to do if I want to leverage an LLM for driving or optimizing a business process within my organization?
[00:29:57] Does that include exposing my sensitive data?
[00:30:00] How do I prevent my sensitive data
[00:30:02] or confidential information from being integrated
[00:30:06] with the open source information
[00:30:08] or the publicly available information?
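One common guard against the concern raised here is to mask sensitive fields before a prompt ever leaves the organization. A minimal sketch follows; the regex patterns and placeholder labels are illustrative only, not a complete PII detector, and a real deployment would use a proper data classification engine.

```python
import re

# Illustrative patterns only; real deployments need far more
# robust detection than a handful of regexes.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),
]

def redact(prompt: str) -> str:
    """Replace sensitive tokens before a prompt goes to an external LLM."""
    for pattern, placeholder in REDACTIONS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

raw = "Summarize the dispute for jane.doe@example.com, SSN 123-45-6789."
safe = redact(raw)
print(safe)  # Summarize the dispute for [EMAIL], SSN [SSN].
```

The point is architectural rather than the specific patterns: sensitive values are stripped at the boundary, so the external model only ever sees placeholders.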
[00:30:10] And we see this a lot in the IP space
[00:30:13] where maybe you're building software,
[00:30:14] you ask a chat prompt for help
[00:30:17] because you're struggling, your code's not working right.
[00:30:20] And then of course suddenly it works great.
[00:30:23] And you're like, okay, but then what happened
[00:30:25] to what you put in there and what you did?
[00:30:26] Well, now has that entire piece of code
[00:30:30] been forfeited, free to end up in the wrong hands?
[00:30:31] And that's not even just about your software,
[00:30:33] but about someone being able to use that code to come after
[00:30:36] your customers in the future,
[00:30:38] because now they've got this piece of code
[00:30:39] and understand the software you're writing.
[00:30:41] Yeah, you throw all your data assets
[00:30:43] in there to generate some kind of report
[00:30:45] or an output, and guess what?
[00:30:47] All of that's in there, right?
[00:30:49] So this really gets into what it means
[00:30:51] to have private LLMs that you can leverage
[00:30:54] for the compute function.
[00:30:57] And in a lot of cases, if you're a big organization,
[00:30:59] you don't necessarily need to go check
[00:31:01] with the rest of the world and the LLMs
[00:31:03] that might be out there;
[00:31:04] you just need to compute well.
[00:31:06] Compute what you have.
[00:31:07] Yeah.
[00:31:08] And so that leads to the fourth issue here,
[00:31:11] which is bias and skewness in data.
[00:31:15] How good is your data to give you
[00:31:17] an accurate statistically valid answer?
[00:31:20] If you throw all the data in there
[00:31:21] and have it train your model,
[00:31:24] the proprietary model for you,
[00:31:26] that doesn't mean that the data set is clean
[00:31:28] from a bias perspective.
[00:31:31] Yeah, that's too true.
[00:31:32] So we saw this in Time and Newsweek,
[00:31:34] where they show a bar chart or a pie chart
[00:31:36] and the percentages are actually very close,
[00:31:38] but they made the difference look really big.
[00:31:39] So when it's blown up in scale,
[00:31:41] it looks like it's way different.
[00:31:43] Yeah, I mean just because your prior practices
[00:31:48] have resulted in you skewing your interactions
[00:31:50] towards a specific audience segment
[00:31:52] that doesn't mean that you want to do it in the future
[00:31:54] but if you rely on your data,
[00:31:56] that's where you're going to end up.
[00:31:57] So there's a statistics aspect of it
[00:31:59] that also needs to be implemented
[00:32:01] while you're training or preparing data sets
[00:32:04] to make sure that at least that you're aware
[00:32:07] that there's a bias in the data
[00:32:08] so you can account for it
[00:32:09] or be cognizant that your data is biased.
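The "at least be aware your data is biased" point can be made concrete with even a crude balance check over a training set. The audience-segment labels and the 50% dominance threshold below are made up for illustration; real bias audits use proper statistical tests, but the idea is the same.

```python
from collections import Counter

# Hypothetical audience segments in historical interaction data.
records = ["urban"] * 800 + ["suburban"] * 150 + ["rural"] * 50

counts = Counter(records)
total = sum(counts.values())
shares = {segment: count / total for segment, count in counts.items()}

# Flag any segment that dominates the training set.
threshold = 0.5
skewed = [s for s, share in shares.items() if share > threshold]

for segment, share in sorted(shares.items()):
    print(f"{segment}: {share:.0%}")
if skewed:
    print("Warning: training data is skewed toward", skewed)
```

A check like this doesn't remove the bias, but it surfaces it, which is exactly the awareness the speaker is arguing for before a model is trained on the data.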
[00:32:12] To some degree it's kind of like saying
[00:32:14] you have to do maintenance,
[00:32:15] just like you would on a car.
[00:32:16] Like you can't just let the data
[00:32:17] just continue to evolve on its own.
[00:32:20] Without somebody doing an integrity check,
[00:32:22] it's very hard to get right.
[00:32:22] Exactly.
[00:32:23] And what's a good data set for one organization
[00:32:25] or for one object is not necessarily
[00:32:27] a good data set for another.
[00:32:29] Sure.
[00:32:31] And in fact one wouldn't necessarily know
[00:32:32] unless they understand what's actually
[00:32:34] the makeup of that data set too.
[00:32:37] It's very easy to get excited about it.
[00:32:39] And so here's the biggest problem
[00:32:41] that underlies all of this stuff, right?
[00:32:43] The resources and skills needed to make this happen.
[00:32:46] You're taking what an organization already has
[00:32:49] within it and saying,
[00:32:50] now we've got a new set of objectives,
[00:32:53] technologies and capabilities; can they adapt to it?
[00:32:57] Or do you go look for a platform
[00:32:59] that normalizes all of this
[00:33:01] and puts the information in a consumable form
[00:33:04] in front of the people that are sitting there
[00:33:06] and saying,
[00:33:07] this is what the data says it is.
[00:33:08] Here's the potential bias in there.
[00:33:10] Here's the security problems.
[00:33:12] Here's how you protect yourself.
[00:33:14] That's very good.
[00:33:15] I think what you just said is kind of our reality, right?
[00:33:17] Like how many people could make a great pie chart
[00:33:19] in an Excel spreadsheet yesterday?
[00:33:21] Now they have co-pilot
[00:33:22] and they can make pie charts all day long.
[00:33:24] Hey, so they should be making pie charts.
[00:33:26] Exactly.
[00:33:28] Wow, this was great.
[00:33:30] He really opened my eyes to a few things
[00:33:32] on the risks of AI and LLMs
[00:33:35] and also the potential that it brings with it.
[00:33:37] So for those of you listening,
[00:33:39] this has been an episode of MSP1337.
[00:33:42] Thanks and have a great week.
[00:33:44] [Music]

