“‘Big Data’ and ‘data lake’ only have meaning to an organisation’s vision when they solve business problems by enabling data democratisation, re-use, exploration, and analytics.” – Carlos Maroto: Technical Manager at Search Technologies.
A data lake is a storage repository that acts as the central source of all your organisation’s current and historical data, both structured and unstructured. This data is transformed as it moves through the pipeline for tasks such as analysis, quarterly and annual reporting, machine learning and data visualisation. The information contained in a data lake can be a highly valuable asset; however, without the right structure, your data lake could turn into a data swamp.
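To make the idea concrete, here is a deliberately minimal sketch of that raw-to-curated flow. The source names (`sales_raw`, `logs_raw`) and the transform are hypothetical; a real lake would land data in object storage and run transforms in a processing engine, but the shape of the work is the same: land everything as-is, then curate it for a specific use such as reporting.

```python
import json
from collections import defaultdict

# Hypothetical in-memory "lake": raw records landed as-is from
# several source systems, structured and semi-structured alike.
lake = {
    "sales_raw": [
        {"region": "EU", "amount": 120.0},
        {"region": "US", "amount": 90.5},
        {"region": "EU", "amount": 30.0},
    ],
    # Unstructured text lands untouched and is only parsed on read.
    "logs_raw": ['{"event": "login", "user": "a"}'],
}

def transform_sales(records):
    """Curate raw sales records into per-region totals for a report."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)

report = transform_sales(lake["sales_raw"])
print(report)  # {'EU': 150.0, 'US': 90.5}

events = [json.loads(line) for line in lake["logs_raw"]]
print(events[0]["event"])  # login
```

The point of the sketch is the separation: the raw zone keeps every record in its original form, and each downstream use (the quarterly report, a model's training set) gets its own transform over that shared source.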
Here are three strategies for getting the most value from your data lake.
#1 – BUSINESS STRATEGY & TECHNOLOGY ALIGNMENT
“It’s important to align goals for your data lake with the business strategy of the organisation you’re working to support.” – Bizcubed.
What are the business goals you’re trying to achieve with your data lake? Operational efficiency? Better understanding of your customers? Will your current infrastructure help you achieve this while also maximising your profits? Aligning your goals with the technology you’re planning to implement will not only help you articulate what problem you’re trying to solve, but also improve your chances of gaining executive buy-in and winning the support of your team. The better the plan, the easier it is to identify possible roadblocks and the higher the chance of success.
“As technology teams continue to be influenced by the hype and disruption of Big Data, most fail to step back and understand where and how it can be of maximum business value. Such radically disruptive new business processes can’t be implemented without knowledge gathering and understanding how Big Data technology can become a catalyst for organisation and cultural change.” – Thierry Roullier: Director of Product Management at Infogix, Inc.
#2 – INTEGRATION & ARCHITECTURE
“You need to be able to integrate your data lake with external tools that are part of your enterprise-wide data view. Only then will you be able to build a data lake that is open, extensible, and easy to integrate into your other business-critical platforms.” – O’Reilly.
Technology is moving at a rapid pace. The tools you use in your business may not cooperate well with your data lake, and may not support the data architectures of tomorrow. During the implementation process, one of the first things to look at is how adaptable your long-term technology investments are.
Big Data architectures are constantly evolving, and it’s important to select flexible data processing engines and tools that can handle changes to security, governance and structure without being too costly to the organisation. Before implementing anything, you need to have a clear vision of what you want the end technical platform to look like, and what components you will need to make that happen.
“Modern data onboarding is more than connecting and loading. The key is to enable and establish repeatable processes that simplify the process of getting data into the data lake, regardless of data type, data source or complexity – while maintaining an appropriate level of governance.” – Bizcubed.
#3 – DATA VIRTUALISATION & DEMOCRATISATION
“Data virtualisation involves abstracting, transforming, federating and delivering data from disparate sources. The main goal of data virtualisation technology is to provide a single point of access to the data by aggregating it from a wide range of data sources.” – TechTarget.
Data lakes and data virtualisation tools work well together to solve different problems and provide a layer of intelligence that results in more agility and adaptability to change.
“As an example, a virtual layer can be used to combine data from the data lake (where heavy processing of large datasets is pushed down) with golden records from the MDM that are more sensitive to stale copies. The advanced optimisers of modern data virtualisation tools like Denodo make sure that processing is done where it is more convenient, leveraging existing hardware and processing power in a transparent way for the end user. Security and governance in the virtual layer also add significant value to the combined solution.” – datavirtualizationblog.com.
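The push-down pattern described in that quote can be sketched with two toy stores. This is not Denodo's API; it is a hypothetical illustration using two in-memory SQLite databases, where the heavy aggregation runs inside the "lake" store and only the small result is joined with golden records from the "MDM" store.

```python
import sqlite3

# Hypothetical "lake" store holding high-volume order detail.
lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE orders (customer_id INT, amount REAL)")
lake.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5)])

# Hypothetical "MDM" store holding golden customer records.
mdm = sqlite3.connect(":memory:")
mdm.execute("CREATE TABLE customers (id INT, name TEXT)")
mdm.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme Ltd"), (2, "Globex")])

def virtual_view():
    """Push aggregation down to the lake, then enrich with MDM records."""
    # Heavy processing stays where the data lives: the lake computes
    # the per-customer totals and returns only the small result set.
    totals = lake.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    # Fresh golden records are read from the MDM at query time,
    # avoiding stale copies of the sensitive master data.
    names = dict(mdm.execute("SELECT id, name FROM customers").fetchall())
    return [(names[cid], total) for cid, total in totals]

print(virtual_view())  # [('Acme Ltd', 15.0), ('Globex', 7.5)]
```

The caller sees a single function (the "single point of access"), while each underlying system does the work it is best suited for.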
Data democratisation is the ability for information in a digital format to be accessible to the average end user. The goal of data democratisation is to allow non-specialists to be able to gather and analyse data without requiring outside help.
“Data must be freed from its silos. Today, it resides in a variety of independent business functions, such as HR, manufacturing, supply chain logistics, sales order management and marketing. To get a unified view of this data, businesses are engaging in a variety of ad-hoc, highly labor-intensive processes.” – Computer Weekly.
For more resources, please see below: