Host Dave Sobel welcomes Ori Rafael, CEO and co-founder of Upsolver, to discuss the emerging concept of lake house architecture in data management. The conversation begins with an exploration of how lake houses compare to traditional data warehouses and data lakes. Ori explains that a lake house is essentially a modern data warehouse architecture in which customers manage their own data layer, giving them flexibility and control over their data storage and processing.
Ori delves into the evolution of data management architectures, highlighting the transition from on-premise data warehouses to cloud-managed solutions. He discusses the challenges faced by database administrators (DBAs) in the past, such as vendor lock-in and the limitations of traditional data warehouses. The lake house model addresses these issues by decoupling storage and compute, enabling organizations to utilize multiple query engines and platforms without being tied to a single vendor.
The discussion also touches on the significant advantages of lake house architecture, particularly in terms of cost reduction and operational efficiency. Ori emphasizes that organizations can save a substantial portion of their data warehouse budgets by eliminating the need for expensive ETL processes tied to specific warehouse vendors. Additionally, the ability to leverage various engines for analytics and AI applications empowers businesses to innovate without the constraints of traditional data management systems.
As the conversation progresses, Ori highlights the importance of optimizing storage for improved query performance and efficiency. He explains how Upsolver manages the file system layer to ensure that organizations can achieve performance levels comparable to traditional warehouses while maintaining high storage efficiency. The episode concludes with a discussion on the evolving role of data engineers, emphasizing the need for them to transition from developers to platform managers, enabling greater independence and efficiency in data operations.
💼 All Our Sponsors
Support the vendors who support the show:
👉 https://businessof.tech/sponsors/
🚀 Join Business of Tech Plus
Get exclusive access to investigative reports, vendor analysis, leadership briefings, and more.
👉 https://businessof.tech/plus
🎧 Subscribe to the Business of Tech
Want the show on your favorite podcast app or prefer the written versions of each story?
📲 https://www.businessof.tech/subscribe
📰 Story Links & Sources
Looking for the links from today’s stories?
Every episode script — with full source links — is posted at:
🎙 Want to Be a Guest?
Pitch your story or appear on Business of Tech: Daily 10-Minute IT Services Insights:
💬 https://www.podmatch.com/hostdetailpreview/businessoftech
🔗 Follow Business of Tech
LinkedIn: https://www.linkedin.com/company/28908079
YouTube: https://youtube.com/mspradio
Bluesky: https://bsky.app/profile/businessof.tech
Instagram: https://www.instagram.com/mspradio
TikTok: https://www.tiktok.com/@businessoftech
Facebook: https://www.facebook.com/mspradionews
Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
[00:00:02] This one's new to me: what's a lake house?
[00:00:05] Let's talk a little bit about lake house architecture and how it compares against data lakes and data warehouses,
[00:00:12] and learn a little bit more about what it means for AI and how to connect this all to our customers.
[00:00:18] Ori Rafael is the CEO and co-founder of Upsolver and joins me today on this bonus episode of the Business of Tech.
[00:00:26] Thread has declared death to the ticket.
[00:00:29] Did you know tickets date back to the 1800s, and customers hate tickets?
[00:00:35] Thread uses an approach for connecting, communicating, and ultimately collaborating.
[00:00:40] Thread allows people to come together around a topic.
[00:00:44] They can discuss and make decisions.
[00:00:46] They can share and invite the right people.
[00:00:48] The future of service is collaborative.
[00:00:51] Supercharge your service experience by seamlessly integrating AI and automation to meet your customers
[00:00:56] where they are: in Teams, Slack, and desktop, not in tickets.
[00:01:02] With instant updates to and from ConnectWise, Autotask, and HaloPSA, keep all communication
[00:01:07] in one place, where it should be: with people.
[00:01:11] Decrease time to resolution by 30% with chat-based support by visiting GetThread.com/MSPRadio
[00:01:19] to declare death to the ticket.
[00:01:24] Well, Ori, thanks for joining me today.
[00:01:27] Thank you very much, I'm excited to be here.
[00:01:29] Well, I'm going to dive right in, because you started out with a concept that I'm not familiar with.
[00:01:35] This is the idea of the lake house concept.
[00:01:37] You know, we often hear about data warehouses, but you're introducing the idea of the lake house.
[00:01:43] Talk to me about what that is and how it fits into the overall theme of data management.
[00:01:48] Sure.
[00:01:48] So the short answer is that the lake house is a type of data warehouse architecture.
[00:01:55] That's the one-liner that explains it. And if you look at the previous architectures,
[00:02:00] you had the enterprise data warehouse, the on-prem data warehouse that people were using:
[00:02:05] the Oracles, the Teradatas of the world. Then came the cloud.
[00:02:10] And at first you just did the same thing you did on premises, but on a cloud-managed instance.
[00:02:14] That would be Redshift. But you still had a lot of the challenges that DBAs had.
[00:02:20] For example, you could run out of space in your database because it was using local disk.
[00:02:25] Then came the next generation, which was the decoupled data warehouse.
[00:02:29] Storage and compute were separated, and you paid for them separately.
[00:02:33] Data warehouses became easier to use and more scalable compared to what they were before.
[00:02:38] A lot of the talk about elasticity came with the decoupled data warehouse.
[00:02:44] But there was still a problem, because the decoupling was inside the data warehouse vendor.
[00:02:50] That was the problem: if you were a DBA, and I was a DBA, you always hated being locked into Oracle.
[00:02:55] You hated it because it cost you a lot of money, and you hated it because you needed to use Oracle for things it wasn't really built for.
[00:03:02] I think the modern version of this is a company or customer saying, hey, I want to do BI with Snowflake and I want to do AI with Databricks, for example.
[00:03:15] So I want to use different products, lots of them, but once I've locked my data into a specific format or a specific provider, I can't really do that.
[00:03:23] And then came the idea of the lake house. With a lake house, you, Mr. or Mrs. Customer, are in charge of building and maintaining your data layer.
[00:03:32] That means not just your files; it's actually your files and your metadata.
[00:03:37] So basically you create the tables, and then you use your data warehouse more as pure compute; it really only does reads.
[00:03:47] Well, mostly read-only queries. And the idea is that you can use any number of query engines or warehouses on top of the same data that you, as the customer, are sharing.
[00:04:01] It's stored in your account, and you can share it with more than one engine, but you manage the security for that layer.
[00:04:08] Now, the concept of the lake house was introduced a few years ago. I can't say that I invented it, but the entire market is adopting lake houses. The latest survey I've seen says that 70% of enterprises are going to want at least 50% of their analytics on the lake house within the next three years. That's kind of a massive shift.
[00:04:28] A lot of people call this the open table format: the format in which you're going to store this data. Three formats came to light: Hudi, Delta Lake, and Iceberg. And going into opinion grounds for a second,
[00:04:48] Iceberg seems to be the clear winner. Certain things caused the market to decide that Iceberg is going to be the format, and right now every warehouse on earth that I know of is going to support Apache Iceberg as a format. You can write an Apache Iceberg-based lake house and read that data from pretty much everywhere else you want.
[00:05:11] That's the idea of the lake house.
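To make Ori's description concrete, here is a minimal Python sketch of the lake house idea. The customer owns both the data files and the metadata that says which files make up the table. Two different "engines" then read the same table without copying it. Everything here is a hypothetical simplification for illustration and not the Apache Iceberg specification: the `DataFile`/`TableMetadata` classes, the bucket paths, and the snapshot-as-file-list layout.

```python
"""Toy model of a lake house table: the customer owns the data files AND the
metadata, and any engine that understands the layout can read it.
Hypothetical structure for illustration only; NOT the Apache Iceberg spec."""
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    path: str      # hypothetical object-store key in the customer's own bucket
    rows: tuple    # stand-in for the contents of a columnar file


@dataclass
class TableMetadata:
    # Each snapshot is the full list of data files visible at that point in time.
    snapshots: list = field(default_factory=list)

    def current_files(self):
        return self.snapshots[-1] if self.snapshots else []

    def append(self, new_file: DataFile):
        # Writers never modify existing files; they publish a new snapshot.
        self.snapshots.append(self.current_files() + [new_file])


def bi_engine_row_count(table: TableMetadata) -> int:
    """One 'engine': counts rows for a dashboard."""
    return sum(len(f.rows) for f in table.current_files())


def ai_engine_features(table: TableMetadata) -> list:
    """Another 'engine': pulls raw rows for feature extraction."""
    return [row for f in table.current_files() for row in f.rows]


orders = TableMetadata()
orders.append(DataFile("s3://customer-bucket/orders/part-0001", ((1, 9.5), (2, 3.0))))
orders.append(DataFile("s3://customer-bucket/orders/part-0002", ((3, 12.25),)))

print(bi_engine_row_count(orders))   # 3
print(ai_engine_features(orders))    # [(1, 9.5), (2, 3.0), (3, 12.25)]
print(len(orders.snapshots))         # 2: earlier versions stay readable
```

Writers publish a new snapshot instead of editing files in place, which is what lets several readers share one copy safely. Real table formats layer schemas, partition specs, and manifest files on top of this basic idea.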
[00:05:13] Okay, so help me understand: what do you consider the real competitive advantage here? Why is the lake house architecture going to be the big change in data management?
[00:05:24] Number one, I think there are two very strong reasons. The first is cost reduction. Cost reduction happens partly because you're not using an engine you don't want for a use case: you're not locked in, so you have more negotiating power. But in addition to that, and I come from the
[00:05:42] ETL space, the data movement space: when you move data into the warehouse, you have to communicate with the warehouse, so you need warehouse clusters running in order to do your ingestion. That's going to cost you a lot of money; it's expensive
[00:05:56] warehouse compute being used for it. Here, when I write, I don't need to communicate with a specific warehouse; I can just write Iceberg. So all of that budget that you're currently paying to whatever warehouse vendor
[00:06:09] for ingesting data into it, and in some cases also transforming data on top of it, all of that budget just goes away. From what I see in my customer base, in many cases more than 50% of your warehouse budget would go away.
[00:06:22] So there is a substantial cost reduction. This is why the early adopters we are seeing are customers with data at high scale, because they really feel that cost reduction.
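Ori's cost argument comes down to simple arithmetic. Suppose a share of the warehouse bill pays for warehouse compute that is used only to load and transform data. Writing Iceberg directly removes that share. The sketch below assumes that the entire ingestion/transformation share disappears, and it ignores whatever the replacement ingestion tooling costs. The 60% share and the budget figure are purely illustrative, chosen to echo his "more than 50%" remark; they are not measured data.

```python
def remaining_warehouse_spend(total_budget: float, ingest_transform_share: float) -> float:
    """Warehouse spend left once ingestion/transformation stops using warehouse compute.

    Assumption: the whole ingest/transform share of the bill goes away; the cost
    of whatever now writes Iceberg directly is deliberately NOT modeled here.
    """
    if not 0.0 <= ingest_transform_share <= 1.0:
        raise ValueError("ingest_transform_share must be between 0 and 1")
    return total_budget * (1.0 - ingest_transform_share)


# Hypothetical inputs, for illustration only.
annual_budget = 1_000_000.0
share = 0.60
left = remaining_warehouse_spend(annual_budget, share)
print(f"remaining: ${left:,.0f}  saved: ${annual_budget - left:,.0f}")
```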
[00:06:32] That would be benefit number one. Benefit number two is the ability to use multiple engines. Many enterprises you talk to are going to come with the agenda: I don't want to marry a specific warehouse vendor.
[00:06:46] I want to enable the AI revolution by creating an open data layer and using it. So that would be the business strategy reason, and the other would be the cost reason.
[00:07:01] And one of the things you've been really focused on is optimizing storage to improve speed and efficiency around data queries. Tell me a little bit more about what you've been doing there to make the difference, and why it matters.
[00:07:17] Yes, I can do that. Imagine that you want to move from a warehouse to a lake house. You're doing your POC, you built your own lake house layer, you built your own storage, and then you go and run your queries and find out that all your queries are two times slower.
[00:07:33] So there is a lot of objection from within your organization to actually moving to the lake house, because you've hurt the user experience.
[00:07:40] In addition, there is quite a lot of variation in your storage utilization, and you can start finding that you're paying a lot more for storage, in some cases more than double the storage budget, just because you moved from a warehouse into a lake house.
[00:07:56] The reason these things happen is that you own the data layer, so you need to manage the file system. That's an additional burden on you as the customer, and that burden is very familiar: every company that has ever built a data lake knows it. In many cases that's the reason companies say, hey, I have a data swamp, it's too complicated, I can't really get any value out of it.
[00:08:21] So we take the role of what you would call an Iceberg manager. We manage your file system layer, making sure you get query performance similar to what you get in the warehouse, and we make sure your storage efficiency is very good.
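One of the file-system chores Ori alludes to is compaction. Frequent writes leave behind many small files, and engines then have to open and plan over every one of them, which slows queries. The following Python sketch plans a bin-packing compaction under two assumed parameters: a 512 MiB target file size and first-fit-decreasing packing. It illustrates the general technique only; it is not how Upsolver implements its Iceberg management.

```python
"""Sketch of small-file compaction planning, a chore a lake house file-system
manager takes on. Generic, hypothetical planner; not Upsolver's algorithm."""
from dataclasses import dataclass

TARGET_BYTES = 512 * 1024 * 1024   # assumed target output file size (512 MiB)


@dataclass(frozen=True)
class FileInfo:
    path: str
    size_bytes: int


def plan_compaction(files, target=TARGET_BYTES, min_group=2):
    """Group small files so each rewritten file lands near `target` bytes.

    Files already >= target are left alone. Uses first-fit-decreasing bin
    packing. Groups with fewer than `min_group` inputs are dropped, because
    rewriting a single file on its own does not reduce the file count.
    """
    small = sorted((f for f in files if f.size_bytes < target),
                   key=lambda f: f.size_bytes, reverse=True)
    bins = []  # each bin: [total_size, [files]]
    for f in small:
        for b in bins:
            if b[0] + f.size_bytes <= target:
                b[0] += f.size_bytes
                b[1].append(f)
                break
        else:  # no existing bin had room: open a new one
            bins.append([f.size_bytes, [f]])
    return [b[1] for b in bins if len(b[1]) >= min_group]


MiB = 1024 * 1024
files = [FileInfo(f"part-{i:04d}.parquet", 8 * MiB) for i in range(200)]
files.append(FileInfo("big.parquet", 600 * MiB))   # already large: untouched
groups = plan_compaction(files)
print(len(groups))   # 4: 200 small files become 4 rewritten files
```

First-fit decreasing is chosen here only because it is short and packs well in practice. A production planner would also weigh partitions, delete files, and how recently data was written.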
[00:08:35] And then maybe the last thing we are adding: we're also making your warehouse real-time compatible.
[00:08:40] We're allowing you to write updates and deletes into the warehouse at high velocity, which would otherwise be impossible on top of open table formats.
[00:08:52] When you combine real time with deletes and updates, those things don't mix together very well, but we manage the file system in a way that you are actually able to do it.
[00:09:02] The most common example of that is CDC, change data capture: the concept of copying data from a transactional database into your analytical database. Whenever you have this use case, you need to apply updates and deletes to your analytical warehouse.
[00:09:16] That's going to be a challenge, and doing it only once a day diminishes a lot of the business value of connecting that data into your warehouse.
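Applying CDC means replaying a stream of inserts, updates, and deletes against the analytical copy of each table, in source order. This Python sketch shows only that merge logic, on an in-memory dictionary keyed by primary key. The `ChangeEvent` shape and the LSN ordering field are assumptions for illustration. A real lake house would persist the result as new data files or delete files rather than mutating a dict. Running the merge on small, frequent batches instead of once a day is what keeps the analytical copy close to real time.

```python
"""Applying a CDC stream (inserts, updates, deletes) to an analytical table
keyed by primary key. In-memory sketch; event shape and field names are
hypothetical."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                     # source log sequence number: defines ordering
    op: str                      # "insert" | "update" | "delete"
    key: int                     # primary key of the source row
    row: Optional[dict] = None   # new row image; None for deletes


def apply_changes(table: dict, events: list) -> dict:
    """Return a new table state after applying events in LSN order.

    Events may arrive out of order within a micro-batch, so sort first; the
    last event per key wins. The input table is not modified.
    """
    state = dict(table)
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.op == "delete":
            state.pop(ev.key, None)
        elif ev.op in ("insert", "update"):
            state[ev.key] = ev.row
        else:
            raise ValueError(f"unknown op {ev.op!r}")
    return state


customers = {1: {"name": "Ada", "tier": "free"}, 2: {"name": "Lin", "tier": "pro"}}
batch = [
    ChangeEvent(lsn=12, op="delete", key=2),
    ChangeEvent(lsn=10, op="update", key=1, row={"name": "Ada", "tier": "pro"}),
    ChangeEvent(lsn=11, op="insert", key=3, row={"name": "Sam", "tier": "free"}),
]
print(apply_changes(customers, batch))
# {1: {'name': 'Ada', 'tier': 'pro'}, 3: {'name': 'Sam', 'tier': 'free'}}
```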
[00:09:24] So those are the main three things we add: performance, storage efficiency, and the ability to do real time on your warehouse. That would be Upsolver's Iceberg management capabilities. We also have ingestion capabilities, but right now we're focusing more on the lake house, and we are finishing the piece that helps the customer, the piece that will really give your lake house the performance of a warehouse.
[00:09:46] Interesting. So one of the other things Upsolver has done is focus on the idea of making data engineering skills more optional, particularly as they relate to data lakes. Tell me a little bit about how you would transform that, and what you view as the important skills that a data engineering team should be thinking about.
[00:10:09] Well, with data engineering teams there is quite a lot of variety, but I'd very loosely separate data engineering teams into the traditional warehouse people, the SQL people of the world, and the Spark people of the world. Those are two different profiles. Regardless of whether you come from group A or group B, we can accelerate and improve the efficiency of your warehouse.
[00:10:34] Either way, it's going to be hard work. The difference is that the Spark people would be able to try to craft some kind of DIY solution. I've asked some of the companies I talk to that eventually did it themselves how long it took, and they said they think it's a six-to-nine-month journey to really learn how to do it well, monitor it, and go to production.
[00:10:55] So we're going to save you that six-to-nine-month journey. If you're coming from the SQL space, it's not very likely that you're actually going to do it yourself.
[00:11:04] To do it well, you could buy it from a different vendor, maybe. If I compare DIY and what we do, we're going to save you the six to nine months, and you're going to get better performance right off. Upsolver has the best performance that I know of in the market
[00:11:21] for managing a lake house. We benchmarked that and released the benchmark; the results are on our website.
[00:11:28] So as platforms like yours become more and more part of the way solutions are rolled out from a data perspective, how do you see the role of data engineers evolving over the next few years?
[00:11:39] They're not going anywhere. Data engineers are definitely here to stay. And I can tell you my perception of a data engineer: I was a DBA and a data engineer, so I know the job well.
[00:11:53] I think the data engineer would like to be a platform manager and not a developer for R&D.
[00:12:01] If the R&D team needs to come to the DBA or to the data engineer and ask them to write code to support a workload, then basically everything is being funneled through data engineering.
[00:12:13] So you're creating this bottleneck, and that's exactly the reason people didn't like data lakes, for example. This is why we have agendas in the world like data mesh today, talking about freeing up all the people who are domain experts, not necessarily data engineers, to build their own workloads.
[00:12:29] So I think platforms like Upsolver, and the lake house in general, are helping data engineers be more independent from software engineers. They're going to give them the platform, they're going to support it, they're going to monitor it.
[00:12:42] They're going to make sure security is maintained, and that data usage and data privacy are maintained. So data engineers are the chaperones.
[00:12:51] But they don't need to be the developers, and I think we've been seeing that very strongly in the market over the last few years. It's not something an enterprise would tell you they've never heard of; everyone knows it.
[00:13:05] So you're a guy who spends a lot of time thinking about this and has thought a lot about these various roles. What's the next big trend, do you think, in data integration and database management that business owners should be thinking about and preparing for?
[00:13:18] Well, I'm going to repeat the same one again, because if you look at it
[00:13:25] practically, the way it's being covered in the market, open table formats are going to be number one. What does that mean for you as a manager in an enterprise?
[00:13:34] You need to understand what a lake house looks like. It's no longer one box, as it was with a warehouse; it's as if you took the database and ripped it apart.
[00:13:44] So now you have a catalog, you have an open table format, and you're maybe using an Iceberg manager, a file system manager like Upsolver. So you have three pieces to what used to be one.
[00:14:00] You need to understand which catalog you want to work with. Do I want to buy it from a warehouse vendor, or do I want to buy it from the cloud provider?
[00:14:09] So all of that education is something that managers in an enterprise working with data need to do.
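Of the three pieces Ori lists, the catalog is the one that ties a table name to its current metadata, and it is where concurrent writers are serialized. The toy Python catalog below illustrates that role with an optimistic compare-and-swap commit: a writer succeeds only if the table still points at the metadata version it started from. The class, the method names, and the paths are all hypothetical. Real catalogs, whether from a warehouse vendor or a cloud provider, expose their own APIs and persist the pointer durably.

```python
"""Toy catalog: maps table names to the location of the table's current
metadata file and commits new versions with compare-and-swap (optimistic
concurrency). Hypothetical names and paths; not any real catalog's API."""
import threading


class CommitConflict(Exception):
    pass


class ToyCatalog:
    def __init__(self):
        self._pointers = {}            # table name -> current metadata location
        self._lock = threading.Lock()  # makes each swap atomic in this process

    def create(self, table, metadata_location):
        with self._lock:
            if table in self._pointers:
                raise ValueError(f"{table} already exists")
            self._pointers[table] = metadata_location

    def current(self, table):
        return self._pointers[table]

    def commit(self, table, expected, new):
        """Point `table` at `new` only if it still points at `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                raise CommitConflict(f"{table} changed; re-read and retry")
            self._pointers[table] = new


catalog = ToyCatalog()
catalog.create("sales.orders", "s3://lake/orders/metadata/v1.json")

base = catalog.current("sales.orders")   # writers A and B both start from v1
catalog.commit("sales.orders", base, "s3://lake/orders/metadata/v2.json")        # A wins
try:
    catalog.commit("sales.orders", base, "s3://lake/orders/metadata/v2b.json")   # B is stale
except CommitConflict as err:
    print("writer B must retry:", err)
print(catalog.current("sales.orders"))   # s3://lake/orders/metadata/v2.json
```

Because every engine resolves tables through the same catalog, choosing where the catalog lives is effectively choosing the source of truth for the whole lake house.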
[00:14:15] I'd say the second thing they need to do, talking from a data engineering perspective, is figure out how to support AI in a better way. Open table formats are not enough; you want to be able to
[00:14:27] bring additional tools around AI. Vector databases are a great example: a new category of database that didn't exist in the past but is very relevant today and in the GenAI future. A lot of people are also talking about simplifying data modeling with AI. I haven't seen that succeed yet; I've only seen AI support the process. Even today, I don't see AI able
[00:14:49] to replace data modeling, because it's a job that requires a lot of accuracy and is very sensitive to hallucinations. But I think we're going to see more copilot-type behavior there. And I think the last one would be governance.
[00:15:04] That's also related to the data layer: how are you going to manage your governance across multiple engines? The number of data engines you're going to use is going to increase.
[00:15:14] How is governance going to be maintained? That's a question the market is still answering, and there is a lot of innovation happening in the market this year, so you should catch up, understand where things are going, and choose what you want.
[00:15:28] Interesting, because that's exactly where my last question goes. You brought up AI; we managed to get through most of the interview without bringing it up. I think a lot right now about governance, and particularly as we get into the smaller end of the market,
[00:15:41] there are a lot of organizations that have not done a good job of thinking about their data governance, particularly before they can even take advantage of artificial intelligence. What kind of guidance and direction would you, as somebody who thinks about data a lot,
[00:15:55] give to organizations as they put together their data governance frameworks?
[00:16:00] Well, first of all, there is the answer for AI and there is the answer for data in general. For data in general, you used to work with something like Snowflake, and your access control policies were in Snowflake. But now, if you're using two other engines for AI, those policies would not translate.
[00:16:19] So the open data layer is composed of files, metadata, and governance and access control. That piece of access control is getting solved right now in the market: Snowflake released an open-source catalog called Polaris, and AWS and other companies, other cloud providers, supported it.
[00:16:42] Databricks has also released its Unity Catalog as open source, and Amazon is doing work around that. So you need to choose where your catalog is going to be, because now you're actually going to have the option of having policies that translate across multiple engines.
[00:16:57] That would be piece number one: who actually has access to the data. With AI there are additional concerns, and maybe I can't tell you the full story, but how are you preventing
[00:17:09] your sensitive data from becoming part of a model that might expose your organization inadvertently? That's not my area. It's something you should definitely think about, but it's not my area.
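Ori's point about policies translating across engines can be sketched in a few lines. Grants and column masking are defined once in a shared catalog, and every engine consults that catalog before returning rows. The `PolicyCatalog` class, the `select`/`read_pii` action names, and the engine names are hypothetical. Real catalogs such as Polaris or Unity Catalog have their own, richer permission models and APIs.

```python
"""Sketch: access policies defined once in a shared catalog and checked by
every engine, instead of being re-created inside each warehouse.
Policy shape, principals, actions, and engine names are hypothetical."""
from dataclasses import dataclass, field


@dataclass
class PolicyCatalog:
    # (principal, table) -> set of allowed actions
    grants: dict = field(default_factory=dict)
    # table -> columns masked for principals without the "read_pii" action
    pii_columns: dict = field(default_factory=dict)

    def grant(self, principal, table, *actions):
        self.grants.setdefault((principal, table), set()).update(actions)

    def allowed(self, principal, table, action):
        return action in self.grants.get((principal, table), set())


def engine_query(engine_name, catalog, principal, table, rows):
    """Every engine calls the same catalog before returning data."""
    if not catalog.allowed(principal, table, "select"):
        raise PermissionError(f"{engine_name}: {principal} may not read {table}")
    masked = set()
    if not catalog.allowed(principal, table, "read_pii"):
        masked = catalog.pii_columns.get(table, set())
    return [{k: ("***" if k in masked else v) for k, v in r.items()} for r in rows]


catalog = PolicyCatalog(pii_columns={"crm.customers": {"email"}})
catalog.grant("analyst", "crm.customers", "select")
rows = [{"id": 1, "email": "a@example.com", "tier": "pro"}]

# The BI warehouse and the AI engine get the same policy outcome.
print(engine_query("bi-warehouse", catalog, "analyst", "crm.customers", rows))
print(engine_query("ml-engine", catalog, "analyst", "crm.customers", rows))
# both print: [{'id': 1, 'email': '***', 'tier': 'pro'}]
```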
[00:17:23] Well, a perfect place to end, because that is exactly what I want our listeners to focus on: that's the area that listeners should fill in for their customers.
[00:17:32] Ori Rafael is the CEO and co-founder of Upsolver, a pioneering company in the lake house data management space. With nearly 20 years of experience, Ori has focused on revolutionizing how businesses handle complex data to drive efficiency and cost savings. Ori, thanks for joining me today.
[00:17:48] Thank you very much.
[00:17:51] Looking to reach an audience of thousands of MSPs and IT service providers? Put your ad right here on the Business of Tech, the show that 64% of MSPs report having listened to, a recurring top-50 tech news podcast.
[00:18:06] There are affordable options for you to reach our audience, and we can support any budget. Podcast listeners are more engaged, have a higher level of brand retention, and are more willing to listen to ads here than in any other avenue.
[00:18:22] Want to know more?
[00:18:24] There's information at MSPradio.com/engage, including a button to book a time to talk.
[00:18:31] I'm looking forward to that discussion.
[00:18:35] The Business of Tech is written and produced by me, Dave Sobel, under ethics guidelines posted at businessof.tech.
[00:18:43] If you like the content, please make sure to hit that like button and follow or subscribe.
[00:18:48] It's free and easy and the best way to support the show and help us grow.
[00:18:53] You can also check out our Patreon, where you can join the Business of Tech community at patreon.com/MSPradio, or buy our Why Do We Care? merch at businessof.tech.
[00:19:06] Finally, if you're interested in advertising on this show, visit MSPradio.com/engage. Once again, thanks for listening, and I will talk to you again on our next episode of the Business of Tech.
[00:19:21] Part of the MSPradio Network.

