Greenplum Technology Speeds Data Loading for Enterprise Data Warehouses

Mar 17, 2009, 19:45

Data warehousing vendor Greenplum claims its new technology raises data loading speeds to as much as four terabytes an hour.

Greenplum, a leading provider of database software for the next generation of data warehousing and analytics, today announced new technology designed to accelerate data loading for companies dealing with exponential data growth. Greenplum's new "MPP Scatter/Gather Streaming" technology eliminates the bottlenecks associated with other approaches to data loading, enabling lightning-fast flow of data into the Greenplum Database for large-scale analytics and data warehousing. Greenplum customers are achieving production loading speeds of over four terabytes per hour with negligible impact on concurrent database operations.

The technology is part of the company's bid to challenge players such as Teradata, Oracle and Netezza. Customers are running into cost and performance constraints with competing solutions, and are looking for scalable software solutions to meet their needs, opined Paul Salazar, vice president of marketing.

 

Greenplum's SG Streaming technology ensures parallelism by "scattering" data from all source systems across hundreds or thousands of parallel streams that flow simultaneously to all nodes of the Greenplum Database. Performance scales with the number of Greenplum Database nodes, and the technology supports both large batch and continuous near-real-time loading patterns with negligible impact on concurrent database operations. Data can be transformed and processed in-flight, using all nodes of the database in parallel, for extremely high-performance ELT and ETLT loading pipelines. Final "gathering" and storage of data to disk takes place on all nodes simultaneously, with data automatically partitioned across nodes and optionally compressed. The technology is exposed to the DBA through a flexible and programmable "external table" interface and a traditional command-line loading interface.
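The scatter, in-flight transform, and gather steps described above can be pictured with a small toy model. This is only an illustrative sketch and not Greenplum's implementation: the names `scatter`, `process_stream`, `gather`, and `load`, the node and stream counts, the CRC32 hash used for partitioning, and the uppercase transform are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_NODES = 4      # hypothetical number of database nodes
NUM_STREAMS = 8    # hypothetical number of parallel load streams


def scatter(rows, num_streams):
    """Split the source rows round-robin into num_streams chunks."""
    return [rows[i::num_streams] for i in range(num_streams)]


def node_for(key, num_nodes):
    """Choose a target node by hashing the row key (an assumed partitioning rule)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def process_stream(chunk):
    """Transform rows in flight and bucket them by target node."""
    buckets = defaultdict(list)
    for key, value in chunk:
        transformed = (key, value.strip().upper())  # toy in-flight transform
        buckets[node_for(key, NUM_NODES)].append(transformed)
    return buckets


def gather(stream_results, num_nodes):
    """Merge the per-stream buckets into per-node storage."""
    nodes = {n: [] for n in range(num_nodes)}
    for buckets in stream_results:
        for node, rows in buckets.items():
            nodes[node].extend(rows)
    return nodes


def load(rows):
    """Scatter rows across streams, process the streams in parallel, then gather."""
    chunks = scatter(rows, NUM_STREAMS)
    with ThreadPoolExecutor(max_workers=NUM_STREAMS) as pool:
        results = list(pool.map(process_stream, chunks))
    return gather(results, NUM_NODES)


if __name__ == "__main__":
    source = [(i, f" row-{i} ") for i in range(20)]
    for node, rows in sorted(load(source).items()):
        print(f"node {node}: {len(rows)} rows")
```

In this sketch every row ends up on exactly one node, and no single stream or node handles the whole data set, which is the property the article attributes to the scatter/gather approach.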

According to Greenplum, this differs from the traditional bulk loading technologies used by most mainstream database and MPP appliance vendors, which push data from a single source, often over a single channel or a small number of parallel channels. That approach can create bottlenecks and lengthen load times.
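A back-of-the-envelope model shows why the number of channels matters. The function below is a simplified sketch, not a Greenplum formula: it assumes load throughput is capped by whichever side is slower, the channels feeding the data or the nodes receiving it. All figures in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def load_hours(data_tb, channels, channel_tb_per_hour, nodes, node_tb_per_hour):
    """Estimate load time in hours under a simple bottleneck model.

    Throughput is the smaller of the total channel capacity and the
    total node ingest capacity (both in terabytes per hour).
    """
    throughput = min(channels * channel_tb_per_hour, nodes * node_tb_per_hour)
    return data_tb / throughput


if __name__ == "__main__":
    # Hypothetical figures: 8 TB to load, 16 nodes ingesting 0.25 TB/hour each,
    # and channels that each carry 0.5 TB/hour.
    single = load_hours(8, channels=1, channel_tb_per_hour=0.5,
                        nodes=16, node_tb_per_hour=0.25)
    parallel = load_hours(8, channels=16, channel_tb_per_hour=0.5,
                          nodes=16, node_tb_per_hour=0.25)
    print(f"single channel: {single} hours")    # channel-bound: 8 / 0.5
    print(f"16 channels:    {parallel} hours")  # node-bound: 8 / 4.0
```

With one channel, the channel is the bottleneck no matter how many nodes are waiting. Once there are enough channels, the limit moves to the nodes, so adding nodes shortens the load.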

"With our approach we hit fully linear parallelism because we take all the source systems and we essentially do what we call scatter the data," explained Ben Wether, director of product management at Greenplum. "We break it up into chunks that are sprayed across hundreds or thousands of parallel streams into the database and received…by all the nodes of the database in parallel. The essence of it is we eliminate all the bottlenecks."

Performance scales with the number of Greenplum Database nodes, and the technology supports both large batch and continuous near-real-time loading patterns, company officials said. Data can be transformed and processed in-flight, leveraging all nodes of the database in parallel.

Final gathering and storage of data to disk takes place on all nodes simultaneously, with data automatically partitioned across nodes and optionally compressed, Greenplum officials explained.



© 1999-2008 TamilStar.com. All Rights Reserved throughout the World.