Harness the Value of Exploding Data Volumes
Data Lakes have emerged in recent years in response to organizations looking to economically harness and derive value from exploding data volumes. New data sources such as web, mobile, and connected devices, along with new forms of analytics such as text, graph, and path analysis, have necessitated a new Data Lake design pattern to augment traditional design patterns such as the Data Warehouse.
Companies are beginning to realize value from Data Lakes in the areas of:
- New Insights from Data of Unknown or Under-Appreciated Value
- New Forms of Analytics
- Corporate Memory Retention
- Data Integration Optimization
Yet in the absence of a large body of well-understood best practices, confusion regarding the definition of a data lake abounds. Drawing upon many sources, as well as on-site experience with leading data-driven customers, a data lake can be defined as a collection of long-term data containers that capture, refine, and explore any form of raw data at scale, enabled by low-cost technologies, from which multiple downstream facilities may draw.
Data Lake Design Pattern
A design pattern is an architecture and a set of corresponding requirements that have evolved to the point where there is agreement on best practices for implementation. How you implement it varies from workload to workload and from organization to organization. While technologies are critical to the outcome, a successful data lake needs a plan; a Data Lake design pattern is that plan.
The data lake definition does not prescribe a technology, only requirements. While Data Lakes are typically discussed synonymously with Hadoop (an excellent choice for many Data Lake workloads), a Data Lake can be built on multiple technologies, such as Hadoop, NoSQL databases, Amazon S3, an RDBMS, or combinations thereof.
"Data lakes can be based on HDFS, but are not limited to that environment; for example, object stores such as Amazon Simple Storage Service (S3)/Microsoft Azure or NoSQL DBMSs like HBase or Cassandra can also be environments for data lakes." — Gartner, 2015
Data Lake Architecture
As a trusted advisor to the world’s leading data-driven organizations, Teradata can help with the design, implementation, and support of your Data Lake initiative, applying critical capabilities and design principles based on best practices so your organization avoids the typical pitfalls and realizes maximum value.
Products & Services for Data Lakes
Expert implementation and customization services provided by Think Big help organizations successfully implement Data Lake initiatives and gain optimal business value, offering advanced implementation services and sophisticated integration of open-source technologies, including:
- Big Data Strategy
- Data Lake Implementation
- Data Engineering
- Analytics and Data Science
- Managed Services
- Big Data Training
The Teradata Appliance for Hadoop solves the performance hurdles, prolonged implementation periods, and reliability issues common to solutions that are not preconfigured. Teradata handles the hardware and software integration, plus extensive testing, so you don’t have to. The appliance is delivered ready to run and optimized for enterprise-class big data storage and discovery.
Presto is an open source SQL-on-Hadoop query engine designed for running interactive analytic queries against data sources of all sizes. Through a single query, Presto allows you to access data where it lives, including in Apache Hive™, Apache Cassandra™, relational databases or even proprietary data stores. Presto was created by Facebook for the analytics needs of extremely large data-driven organizations.
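To make the "access data where it lives" idea concrete, here is a minimal sketch of composing one Presto query that spans two catalogs (Hive and a relational database). The catalog, schema, and table names are illustrative assumptions, not from the original text; with the open-source `presto-python-client` package, a string like this could be executed via `prestodb.dbapi.connect(...).cursor().execute(sql)`.

```python
# Build a single federated SQL statement spanning two Presto catalogs.
# Presto addresses tables as catalog.schema.table, so one query can join
# Hive data with an RDBMS table without moving either dataset.
def federated_query(hive_table: str, rdbms_table: str) -> str:
    """Return one SQL statement joining a Hive table to an RDBMS table."""
    return f"""
        SELECT p.name, count(*) AS views
        FROM {hive_table} v
        JOIN {rdbms_table} p ON v.product_id = p.id
        GROUP BY p.name
        ORDER BY views DESC
        LIMIT 10
    """

# Hypothetical clickstream table in Hive, product table in MySQL:
sql = federated_query("hive.web.page_views", "mysql.shop.products")
```

The key design point is the three-part `catalog.schema.table` naming: the query engine, not the application, resolves which backing store each table lives in.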
Aster Analytics provides easy-to-use, multi-genre advanced analytics at scale, enabling business analysts and data scientists to quickly discover insights in their Hadoop data lake. Aster delivers over 100 pre-built parallel analytic functions that run natively on Hadoop to analyze data directly on HDFS. Aster Analytics is also YARN-integrated, supporting multiple Aster instances, from sandboxes to production use cases, in the same Hadoop cluster.
Get deep strategic insights from massive amounts of data with Teradata Database software and utilities. Analyzing multi-structured data began with Teradata Database 14.0, when name-value-pair functions and regular expressions enabled Teradata sites to process web logs using popular business intelligence tools. The Teradata Integrated Big Data Platform supports workloads such as deep history analytics, storage of massive amounts of multi-structured data, and a raw data landing zone for transformations.
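The web-log processing described above, turning a raw log line into name-value pairs with regular expressions, can be sketched outside the database as well. This is not Teradata SQL; it is a hedged Python illustration of the same extraction idea, assuming logs in the Common Log Format.

```python
# Sketch: extract name-value pairs from a Common Log Format web-log line
# with a regular expression, analogous to the regex-based web-log
# processing the text attributes to Teradata Database 14.0.
import re

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_log_line(line: str) -> dict:
    """Turn one raw log line into name-value pairs ready for analysis."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else {}

# Example line (fabricated for illustration):
line = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET /lake/index.html HTTP/1.1" 200 2326')
fields = parse_log_line(line)
# fields now maps names to values, e.g. fields["path"] == "/lake/index.html"
```

Once each line is reduced to named fields, downstream tools (BI dashboards, SQL tables) can treat the log as structured data, which is the point of the name-value-pair approach.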