Saturday, 4 June 2016

Netezza Architecture

Netezza is a data warehouse and analytics appliance owned by IBM. It uses the Asymmetric Massively Parallel Processing (AMPP) architecture, which combines an SMP front end with a shared-nothing MPP back end for query processing. It integrates the database, processing engine and storage in a single system. Netezza has four major components:

Fig 1: Netezza AMPP Architecture

Netezza hosts: The SMP hosts are high-performance Linux servers set up in an active-passive configuration for high availability. The passive host takes over the processing tasks if the active host fails. The active host acts as the interface to external tools and applications such as BI, ETL and JDBC tools. The host receives SQL requests from clients connected to Netezza via ODBC/JDBC, compiles them into executable code segments called snippets (C code), creates optimized query plans and distributes the snippets to all the nodes for execution.
Data distribution and query execution work in a broadly similar way in Cloudera Impala, part of the Cloudera Hadoop distribution.
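To make the host's role a little more concrete, here is a minimal sketch in Python of the scatter/gather pattern described above. It is not Netezza code: the function names (run_snippet, host_execute), the sample data and the use of threads to stand in for separate nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_snippet(data_slice, threshold):
    """Hypothetical snippet: each node filters and sums only its own slice."""
    return sum(amount for amount in data_slice if amount > threshold)


def host_execute(data_slices, threshold):
    """Hypothetical host: send the same snippet to every node, then combine."""
    # One worker per slice stands in for one processing node per slice.
    with ThreadPoolExecutor(max_workers=len(data_slices)) as pool:
        partials = list(pool.map(lambda s: run_snippet(s, threshold), data_slices))
    # The host only merges the small partial results returned by the nodes.
    return sum(partials)


# Illustrative data: four slices of one "amount" column.
data_slices = [[5, 20, 35], [10, 40], [50, 1, 2], [30]]
total = host_execute(data_slices, threshold=10)
print(total)
```

The point of the sketch is that every node does the same work on different data, so the host never has to touch the raw rows itself.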

Field programmable gate arrays (FPGA): The FPGA is Netezza's proprietary hardware component, developed to filter out unwanted data as early as possible in the data stream. Data is eliminated as early as the read from disk. This early elimination removes I/O bottlenecks and frees downstream components such as the CPU, memory and network from processing extra data. The FPGA makes use of zone maps to eliminate the unwanted data. Zone maps are created for the columns of a table during certain Netezza operations. I will explain all these possible scenarios in coming posts.
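The zone map idea can be sketched in a few lines of Python. This is only an illustration of the principle (keep a min/max per block of rows and skip blocks that cannot match), not how Netezza implements it. The Extent class, the extent size and the sample date values are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Extent:
    """A block of column values plus its min/max (the zone map entry)."""
    rows: List[int]
    min_value: int
    max_value: int


def build_extents(values: List[int], extent_size: int) -> List[Extent]:
    """Split a column into fixed-size blocks and record min/max for each."""
    extents = []
    for start in range(0, len(values), extent_size):
        chunk = values[start:start + extent_size]
        extents.append(Extent(chunk, min(chunk), max(chunk)))
    return extents


def scan_between(extents: List[Extent], low: int, high: int) -> Tuple[List[int], int]:
    """Return the matching values and how many extents were actually read."""
    matches = []
    extents_read = 0
    for extent in extents:
        # Zone map check: skip the block if its range cannot overlap the predicate.
        if extent.max_value < low or extent.min_value > high:
            continue
        extents_read += 1
        matches.extend(v for v in extent.rows if low <= v <= high)
    return matches, extents_read


# Illustrative column: 30 order dates stored as integers, in load order.
order_dates = list(range(20160101, 20160131))
extents = build_extents(order_dates, extent_size=10)
rows, extents_read = scan_between(extents, 20160112, 20160115)
print(len(extents), extents_read, rows)
```

Skipping works best when the data is stored roughly in order of the filtered column, because the min/max ranges of the blocks then overlap very little.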

Snippet Blades (S-Blades): S-Blades are intelligent processing nodes that make up the MPP engine of the Netezza data warehouse appliance. Each S-Blade is an independent server that contains powerful multi-core CPUs, multi-engine FPGAs and gigabytes of RAM, all working in parallel to deliver high performance.

Disk: The disk enclosures contain high-density, high-performance disks that are RAID protected. Each disk contains a slice of the data in a database table. The host distributes the data across all the disks using either a hash or a random algorithm. A mirror copy of each slice of data is maintained on a different disk drive. The disk enclosures are connected to the S-Blades via high-speed interconnects that allow all the disks to stream data simultaneously to the S-Blades at the maximum rate possible. Data distribution and storage are based on the distribution key specified when the table is created.
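Here is a rough Python sketch of the two distribution methods and the mirroring rule described above. It makes several assumptions that are not from Netezza itself: four data slices, CRC32 as the hash function, round-robin standing in for random distribution, and a mirror placed on the next slice. All function names are hypothetical.

```python
import zlib

NUM_SLICES = 4  # assumed number of data slices, for illustration only


def hash_slice(key: str, num_slices: int = NUM_SLICES) -> int:
    """Map a distribution key value to a slice; equal keys always land together."""
    return zlib.crc32(key.encode("utf-8")) % num_slices


def mirror_slice(primary: int, num_slices: int = NUM_SLICES) -> int:
    """Assumed mirroring rule: keep the copy on the next slice, never the same one."""
    return (primary + 1) % num_slices


def distribute_hash(rows, key_column):
    """Hash distribution: the slice is chosen from the distribution key value."""
    slices = {i: [] for i in range(NUM_SLICES)}
    for row in rows:
        slices[hash_slice(str(row[key_column]))].append(row)
    return slices


def distribute_random(rows):
    """Random distribution, approximated here by round-robin placement."""
    slices = {i: [] for i in range(NUM_SLICES)}
    for position, row in enumerate(rows):
        slices[position % NUM_SLICES].append(row)
    return slices


# Illustrative table: 20 rows spread over 5 customers.
rows = [{"customer_id": i % 5, "amount": i} for i in range(20)]
by_hash = distribute_hash(rows, "customer_id")
by_round_robin = distribute_random(rows)
for slice_id in range(NUM_SLICES):
    print(slice_id, len(by_hash[slice_id]), len(by_round_robin[slice_id]))
```

With hash distribution, all rows for a given key value sit on the same slice, which helps joins and aggregations on that key but can leave the slices unevenly filled if the key is skewed. Round-robin placement spreads the rows evenly but gives no such co-location.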


