What Is a Data Lake, and How Does This Work with Big Data, and Where Is My Burger?

OK, believe me, these three items are connected, and here is why:

Last week, I was in one of our training classes in Crotonville, New York, and was lucky enough to talk with folks from our GE Software Center of Excellence. Now, I get the Big Data principle: it’s all about collating and archiving inputs from multiple sources to get a big picture, and then applying smart analytics to it. That way you get smarter about how different inputs affect outputs (cause and effect), and you can also use pattern recognition to anticipate outputs before they occur. In other words, predictive analytics.

Now, the downside is that as you add more and more inputs, the archive of data keeps growing, which drives up storage costs and slows down the analytics running against the dataset. So, how do you avoid this?

That’s where the Data Lake comes in. In a lake, you have water flowing in, a reservoir of water at its core, and water flowing out. In a data lake, the flow in comes from multiple data archives (historians, databases, spreadsheets, etc.), the core is the current dataset you want to run analytics on, and the flow out is the data leaving once you have run your analytics.

Using this process, you’re copying small chunks of decentralized data to the core, centralizing them for a time to run your analytics, and then deleting the copy once you’re done. Thus, you get all the benefits of Big Data analytics without the issues of storage costs or performance loss. Pretty clever, eh?
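For the hands-on folks, here is a rough sketch of that copy, analyze, delete cycle in Python. It is only an illustration under my own assumptions: the `SOURCES` dictionary stands in for real historians and spreadsheets, the readings are made up, and `run_on_staged_copy` is a hypothetical helper name, not part of any GE product.

```python
import csv
import shutil
import statistics
import tempfile
from pathlib import Path

# Hypothetical stand-ins for decentralized archives (made-up readings).
SOURCES = {
    "historian": [("t1", 10.0), ("t2", 12.0)],
    "spreadsheet": [("t3", 14.0)],
}


def run_on_staged_copy(sources, analyze):
    """Copy small chunks into a temporary core, analyze them, delete the copy."""
    core = Path(tempfile.mkdtemp(prefix="core_"))
    try:
        # Flow in: copy each source's chunk into the core as a CSV file.
        for name, rows in sources.items():
            with open(core / f"{name}.csv", "w", newline="") as f:
                csv.writer(f).writerows(rows)

        # Core: read the centralized copies back and run the analytics.
        values = []
        for path in sorted(core.glob("*.csv")):
            with open(path, newline="") as f:
                values.extend(float(value) for _, value in csv.reader(f))
        return analyze(values), core
    finally:
        # Flow out: remove the temporary copy; the sources are untouched.
        shutil.rmtree(core)


result, core_path = run_on_staged_copy(SOURCES, statistics.mean)
print(result, core_path.exists())
```

The original archives are never modified here. Only the temporary copy in the core is created and removed, which is the point of the approach described above.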

So, how does this relate to my burger? Well, I flew through LaGuardia Airport Terminal 3 on my way home, and they have adopted a big data structure to optimize their productivity and reduce costs. Classically, at an airport, you have multiple eating places, each with its own kitchen and staff. LaGuardia has turned to a centralized Big Data model where each eating location has an iPad® at the seat and a credit card swipe.

You order food and drinks through the system from centralized kitchens around the terminal. You can also order items from any of the shops in the terminal, and they will deliver them to your seat within 15 minutes. Not only are they optimizing costs, but they are also learning about me, my eating and buying habits, from one Big Data system. And they now know I like blue cheese on my burger.

If you want to chat with me or some of our experts on the Industrial Internet, Big Data and Analytics, Data Lakes or just which cheese you prefer on your burger, come say hello at our User Summit in Orlando in October or at PACK EXPO 2014 in Chicago in November.


Barry Lynch

Barry, Global Marketing Director – Automation Hardware at GE's Automation & Controls business, passionately believes that connected machines, mobile data analytics and workforce enablement don’t have to be hurdles in business today. He leads the strategic direction of the company’s automation and information systems programs to help customers apply the power of the Industrial Internet to their businesses. By connecting machines, data, insights and people, our technology solutions deliver critical insight for greater operational efficiency, effectiveness and optimization. Learn more about how Barry works at GE on LinkedIn or follow him on Twitter at @BarryLynchGE.
