Netezza Data Warehouse: Netezza Interview Questions and Answers

1. How many columns you can specify in distribute on clause?

You can specify upto four columns in distribution on clause. Be careful when using multiple distribution keys, that may degrade the query performance.

Read More On Netezza here:

2. Is it possible to update distribution key column?

You cannot update column used in distribution on clause. You have to perform the re-distribute table on other column to update distribution column. Re-distribution can be done by using CREATE TABLE AS SELECT * FROM TABLE distribute on new_column;

3. If you miss to specify column in distribute on clause, what distribution Netezza uses?

If you miss to specify the distribution column while creating tables. Netezza distributes data on the first column using Hash partition.

4. What is collocated join in Netezza?

When you join two tables which are distributed on the same key and when used those key columns in the join condition, then each SPU in Netezza works 100% independent of the other, as the required data is available within itself. This type of joins is called collocated joins.

5. When Netezza redistributes and broadcasts a table?

Whenever it is not possible to do a collocated joins, Netezza will either redistributes or broadcasts the table. When the table is a small one, then netezza broadcasts the table. Otherwise redistributes the table. Broadcast is the process of sending copy of small table to each SPU in the Netezza and redistribute is the process of re-distributing the table on the column which is used in Netezza joins.

6. What are the different ways to load data from file to table?

There are couple of ways to load data from file to table

· Create external table and load Netezza table using that external table

· Use nzload command to load data to table

7. What are the different ways to unload data from table?

There are couple of ways to unload data from table

· Create external table

· Use nzsql command with –o option

8. What are the constraints on a table are enforced?

The only constraint Netezza supports is Not null and default. There are no primary key and foreign key constraints in Netezza. The default constraint is enforced when you add new column to the table with default value to inserted.

9. How the NOT NULL specification on a column improves the Netezza performance?

Netezza tracks the NULL values at rowheader level. Having NULL values results in storing references to NULL values in header. If all columns are NOT NULL, then there is no record header. hence specifying Not Null on each column in table results in better performance

10.Can you insert duplicate rows in Netezza tables?

Yes, we can insert duplicate rows. Remember!! There are no primary or foreign key on Netezza

11.How to remove logically deleted records?

Whenever you delete a row in a table, it is not physically deleted. It is logically deleted by flagging the deletexid field in the table with current transaction ID. Nzgroom command utility is used to remove the logically deleted records from Netezza.

12.How Netezza updates records? Explain with an example

In Netezza, records contains two slots, one for createxid and other is deletexid. Netezza does not update records in database tables, it marks current records with delete flag. i.e. deletexid is updated with current transaction ID. Deletexid allows us to mark record with current transaction for deletion, up to 31 transactions are allowed in Netezza for all tables. Only one update is allowed on table at any given point of time, concurrent updates are not allowed on same table. In Netezza, if the deletexid holds value ‘0’ then it is a new record. All records with non-zero value in deletexid are deleted records. FPGA takes only records with deletexid contains value zero before delivering data to client software’s.

For example:

[ROW id][Create xid][Delete xid] à Netezza record slots
[Rec1][Trx1][0] à First time a record is loaded, record Rec1

Let’s say we have updated above record

[Rec1][Trx1][Trx22]                         à Record Rec1 is updated. Current transaction id Trx22 is updated
[Rec22][Trx22][0]                             à New update record Rec22. Similar to a new record this has zero for Deletexid

13.What are data types most suited for zonemaps?

Zone maps are typically useful for integers, date and time, variations of this data type. Zone maps are useful for ordered that that are usually built into data that is loaded; for example, phone call logs

14.What is a snippet?

Snippet is smallest unit of work carried out in SPU. Snippet is a small block of database operation, typically three to four operations (based on type of Netezza box), that are carried out on all SPUs where data is location. If a query results in these snippets: Snippet A, Snippet B, Snippet C, …. Snippet Z; they are carried out in a sequential manner.

15.What integer data type is preferred in Netezza?

Below are some of reasons to this

· Better and efficient joins.

· Zonemaps are based on integer data types

· Netezza compression works best with integer data types

16.Is there any way to stop NZSQL command, if one of the SQL commands fail?

In this case, ON_ERROR_STOP=true on nzsql command so that other commands do not get executed. For instance, we like to create a new table (CTAS) and later drop old table. If a new table creation fails, we certainly do not want to drop old table. In this case, this option is very useful.

Netezza Interview Questions and Answers - Part-1

Netezza Interview Questions and Answers - Part-2

Netezza SQL Interview Questions and Answers - Part 1

Netezza Data Loading Guide
FPGA-Accelerated Streaming Technology (FAST) Framework in Netezza
Netezza Architecture

Slowly Changing Dimensions (SCD)
Netezza Skew and how to avoid it
Importance of right Distribution key in Netezza
Design and Implement Slowly Changing Dimension Type 2 (SCD Type2) in SQL
Netezza Zone Maps

Wednesday, 15 June 2016

Netezza Interview Questions and Answers - Part-3

No comments:

Post a Comment