1.
How nzload works in Netezza?
A query usually goes through
plan generation, optimization and transaction management. But Netezza bypass
all these steps, loads are done in terms of sets and is based on the underlying
table structures i.e. if the two tables are distributed on different columns
then loading time is different for these two tables. Data format is verified
and distribution of records calculated very quickly, fills out the set
structure and writes to storage structure.
While the nzload job is running, it sends binary records to the
SPUs along with the current transaction ID. When a SPU receives new binary records, it immediately allocates resources and
writes the records to the database or the table on the disks that the SPU owns.
Read more On Netezza here:
Read more On Netezza here:
- Netezza Data Loading Guide
- FPGA-Accelerated Streaming Technology (FAST) Framework in Netezza
- Netezza Architecture
- Slowly Changing Dimensions (SCD)
- Netezza Skew and how to avoid it
- Importance of right Distribution key in Netezza
- Design and Implement Slowly Changing Dimension Type 2 (SCD Type2) in SQL
- Netezza Zone Maps
2.
What are cluster base tables (CBT’s) in Netezza? And what are
benefits of CBTs?
A clustered base table
(CBT) is a user table that contains data which is organized using one to four
organizing keys. An organizing key is a column of the table that you specify
for clustering the table records.
Netezza uses the
organizing keys to group records within the table and save them in the same or
nearby extents. Netezza also creates zone maps for the organizing columns to
accelerate the performance of queries on that table that restrict using the organizing keys.
Benefits of CBTs
·
CBTs support
“multi-dimension” lookups where you can organize records by one, two, Three or
four lookup keys. if your queries commonly restrict on transaction type and
store ID, you can organize records using both of those keys to improve query
performance.
·
CBTs improve query
performance by adding more zone maps for a table because the Organizing key
columns are also zone mapped (if the organizing column data type supports zone
maps).
·
CBTs increase the
supported data types for zone-mapped columns, thus allowing you to Improve
performance for queries that restrict along multiple dimensions.
·
CBTs allow you to
incrementally organize data within your user tables in situations where data
cannot easily be accumulated in staging areas for pre-ordering before
insertions/loads. CBTs can help you to eliminate or reduce pre-sorting of new
table records prior to a load/insert operation.
·
CBTs save disk space.
Unlike indexes, materialized views and other auxiliary data structures, CBTs do
not replicate the base table data and do not allocate additional data
structures.
3.
What is FPGA and how it is useful for query performance in Netezza?
Field Programmable Gate
Array (FPGA) is Netezza proprietary hardware installed on each SPU. The FPGA on
each SPU filters out unwanted data, when it reads data from the disk. This
process of data elimination removes IO bottlenecks and frees up downstream
components such as the CPU, memory and network from processing extra data and
hence the query performance increases.
4.
What are zonemaps? How are they created and updated?
Zonemaps is internal
mapping structures to the extents that take advantage of the internal ordering
of data to eliminate extents that do not need to be scanned. An extent is the
smallest unit of disk allocation on a SPU. Zonemaps avoid scanning of
unreferenced rows. Zonemaps are created for every column in the table and
contain the minimum and maximum values for every extent. Zonemaps are also created
for the columns which are used in organizing on clause.
Zonemaps are created and refreshed for every SPU when you Generate statistics, Nzload operation, Insert, update operations, nzgroom operation.
Zonemaps are created and refreshed for every SPU when you Generate statistics, Nzload operation, Insert, update operations, nzgroom operation.
5.
What is the use of materialized views?
A materialized view
reduces the width (number of columns) of data being scanned in the base table
by creating a thin version (fewer columns) of the base table that contains a
small subset of frequently queried columns.
6.
What are the best ways of creating and using materialized views?
There are some best way
to make best use of materialized views
·
Create materialized views
with few columns which are frequently queried.
·
Specify order by clause
on the most restrictive columns i.e. columns used in where clause.
·
Periodically or manually
refresh the materialized views.
7.
What are the limitations of materialized views?
There are limitations
associated with the use of materialized views
·
You cannot insert,
update, delete or truncate a materialized view. Any changes on the base tables
will reflect into materialized views.
·
You can specify only one
base table in the from clause.
·
Base table can't be a
external table, system table or a temporary table.
·
You cannot use a where
clause in the materialized view.
·
Expressions are not
allowed as columns.
8.
What are the partitioning methods in Netezza?
There are two
partitioning methods available in Netezza
·
Random partitioning: Netezza used round
robin method and distributes data randomly
· Hash Partitioning: Netezza use hash algorithms
on key specified on distribution on clause and data is distributed on that
columns.
Thanks for sharing the information.
ReplyDeleteoracle sql interview questions experienced
Hello Mate,
ReplyDeleteLove it absolutely! So crystalline. No mumbo jumbo. No non-sense. Straight and simple. You guys need a standing ovation for your good work.
I have 2 servers and 2 tables with few similar columns between eachother.I have to build a tabular model with both as the source.The column called PDCAID is common between the two tables and unique to each table separately but may have some same IDs is both tables.Basically,the PDCAIDs are auto increment identity columns in each table.One table logs data for internal quality issues and other for external quality issues.
If I were to union the 2 tables ,how do i differentiate PDCAID between internal and external.
Also,pls suggest a best data model design in this case where:
• internal: columns depicting defects,open and closure dates of each PDCA,root cause,problems etc.
• external: columns depicting supplier issues,open/closure dates of each PDCA,root cause,counter measures etc.
• KPI: aggregates data for a region,site,per month,per year
the grain level expected by the user is Per PDCA Per day.
Super likes !!! for this amazing post. I think everyone should bookmark this.
Kind Regards,
I like your articles so much. Please keep writing . Do US people need a visa to Turkey from USA? Yes, of course, Because the US is not included in the list of those countries which do not require a visa to visit in turkey.
ReplyDelete