Saturday 23 April 2016

Debug Hung Backup and Restore in IBM Netezza/ Hung NZBACKUP and NZRESTORE IBM Netezza


Debug Hung Backup and Restore in IBM Netezza/ Hung NZBACKUP and NZRESTORE IBM Netezza

########################################################################
Here is the step by step of handling hung session of backup and restore in IBM  netezza .
Hung /stuck NZBACKUP and NZRESTORE
###############################################################################

1. Check status of backup

[nz@sherlock-host-1 ~]$nzrestore -history -db PMFIDW |grep "2016-03-08"
PMFIDW     PMFIDW    20160206210001 32    INCR:RESTCOMPLETED Yes       2016-03-08 05:36:41 restoresvr.28184.2016-03-08.log
2. checked still restore is running 

[nz@sherlock-host-1 ~]$ ps -ef|grep nzrestore

nz       27981 31708  0 10:54 pts/1    00:00:00 grep nzrestore
nz       28173 28170  0 05:36 ?        00:00:00 /nz/kit/bin/nzrestore -db pmfidw -sourcedb "PMFIDW" -connector netbackup -connectorArgs DATASTORE_SERVER=edp-nbu-101-ap -npshost10.192.204.35 -increment REST -lockdb T

3. Check If there is any session of bnr in Type and i found it.
[nz@sherlock-host-1 ~]$ nzsession
ID      Type      User        Start Time              PID   Database  State  Priority Name Client IP      Client PID Command
------- --------- ----------- ----------------------- ----- --------- ------ ------------- -------------- ---------- ------------------------
5545200 sql-oledb MFIDSS      08-Mar-16, 02:21:04 PST 32265 PMFIDW    active normal         10.192.136.43       2552 select * from f_order_li
5545372 sql-oledb MFIDSS      08-Mar-16, 02:26:00 PST 32666 PMFIDW    active normal         10.192.136.43       2552 select * from f_order_li
5545440 sql-oledb MFIDSS      08-Mar-16, 02:28:35 PST 21821 PMFIDW    active normal         10.192.136.43       2552 select * from f_order_li
5545457 sql-oledb MFIDSS      08-Mar-16, 02:29:25 PST 24165 PMFIDW    active normal         10.192.136.43       2552 select * from f_order_li
5552655 bnr       ADM_USR     08-Mar-16, 05:36:02 PST 28184 PMFIDW    active normal         10.192.150.20          0
4. check log file for session and process id of restore.

[nz@sherlock-host-1 ~]$ vi /nz/kit/log/restoresvr/restoresvr.28184.2016-03-08.log
[nz@sherlock-host-1 postgres]$ cat /nz/kit/log/restoresvr/restoresvr.28184.2016-03-08.log|grep "Postgres client pid"
2016-03-08 05:36:41.368352 PST Info: Postgres client pid:32398, session 5552669

5. checked log file restore is still not committed.

[nz@sherlock-host-1 ~]$ tail -f /nz/kit/log/backupsvr/restoresvr.28184.2016-03-08.log
tail: cannot open `/nz/kit/log/backupsvr/restoresvr.28184.2016-03-08.log' for reading: No such file or directory
tail: no files remaining
[nz@sherlock-host-1 ~]$ tail -f /nz/kit/log/restoresvr/restoresvr.28184.2016-03-08.log
2016-03-08 05:42:59.634337 PST Info: Restoring ATL_ATG_PROMOTION
2016-03-08 05:43:00.265610 PST Info: Restoring ATL_ATG_PROMOTION_REJECTED
2016-03-08 05:43:00.731310 PST Info: Restoring L_PROMOTION
2016-03-08 05:43:01.149427 PST Info: Restoring F_ORDER_LIFECYCLE_EXP_DT
2016-03-08 05:43:01.266068 PST Info: Restoring ATF_OMS_ORDER_PRICE_DTL
2016-03-08 05:43:03.160407 PST Info: Restoring F_ORDER_PAYMENT
2016-03-08 05:43:05.662640 PST Info: Restoring F_ORDER_MISC_DISCOUNT_DTL
2016-03-08 05:43:06.262526 PST Info: Restoring F_OMS_ORDER_DTL
2016-03-08 05:43:12.022260 PST Info: Restoring ATF_OMS_ORDER_RMK
2016-03-08 05:43:14.350368 PST Info: Restoring views, users, groups, permissions
#################waiting for last line of success############################


6. check session of session id 5552669(restore session but not found)
[nz@sherlock-host-1 ~]$ nzsession|grep 5552669
##################### no result###############

7. check processes for process id of restore ( found its waiting on create view)
[nz@sherlock-host-1 ~]$ ps -efw --forest | grep ^nz | grep 32398
nz       32701 31708  0 11:13 pts/1    00:00:00  |                       \_ grep 32398
nz       32398 29810  0 05:36 ?        00:00:04  |   \_ postgres:ADMIN PMFIDW 127.0.0.1 CREATE VIEWwaiting
nz       32413 30302  0 05:36 ?        00:00:15  |   \_ dbos event C1047,U1277861,S5552669,[32398]


8. check pg.log for process id 32398
[nz@sherlock-host-1 ~]$ cd /nz/kit/log/postgres/
[nz@sherlock-host-1 postgres]$ vi pg.log


[nz@sherlock-host-1 postgres]$ grep "\[32398\]" pg.log > /tmp/32398.txt
[nz@sherlock-host-1 postgres]$ vi /tmp/32398.txt

##########checked log of this proc id.( its was at creating of views)########
9. check if there is any locks and found locks on the same views and belonging tables.
[nz@sherlock-host-1 postgres]$ nz_show_locks
User objects that currently have lock(s) associated with themwhich was used in view creations
 Database Name |        Object Name        | Object Type | Object ID
---------------+---------------------------+-------------+-----------
 PMFIDW        | F_ORDER_LIFECYCLE         | (table)     | 235686790
 PMFIDW        | F_ORDER_LIFECYCLE_ARCHIVE | (table)     | 235687516
 PMFIDW        | F_ORDER_LIFECYCLE_HT      | (view)      | 235700073
(3 rows)
For specific details, invoke
     nz_show_locks <database> <object>  [-schema <schema>]

10. check the session belong to those tables and views holding locks. and found four first session.

[nz@sherlock-host-1 postgres]$ nzsession
ID      Type      User        Start Time              PID   Database State  Priority Name Client IP      Client PID Command
------- --------- ----------- ----------------------- ----- -------- ------ ------------- -------------- ---------- ------------------------
5545200 sql-oledb MFIDSS      08-Mar-16, 02:21:04 PST 32265 PMFIDW   active normal         10.192.136.43       2552 select * from f_order_li
5545372 sql-oledb MFIDSS      08-Mar-16, 02:26:00 PST 32666 PMFIDW   active normal         10.192.136.43       2552 select * from f_order_li
5545440 sql-oledb MFIDSS      08-Mar-16, 02:28:35 PST 21821 PMFIDW   active normal         10.192.136.43       2552 select * from f_order_li
5545457 sql-oledb MFIDSS      08-Mar-16, 02:29:25 PST 24165 PMFIDW   active normal         10.192.136.43       2552 select * from f_order_li
5552655 bnr       ADM_USR     08-Mar-16, 05:36:02 PST 28184 PMFIDW   active normal         10.192.150.20          0
5563898 sql-odbc  BHAYDEN     08-Mar-16, 09:16:25 PST 28550 GCRTST   idle   normal          10.160.76.12       6636 select * from ( SELECT S
5564657 sql-odbc  QMACDWDS    08-Mar-16, 09:41:03 PST  6738 QMACDW   idle   normal        10.192.148.127       8124 select distinct DAX_GLAC

11. Kill those session of select statement.( after confirmation of users or App Team)

[nz@sherlock-host-1 postgres]$ nzsession abort -id 5545200
Are you sure you want to abort the session (y|n)? [n] y
You have new mail in /var/spool/mail/nz
[nz@sherlock-host-1 postgres]$ nzsession abort -id 5545372
Are you sure you want to abort the session (y|n)? [n] y
[nz@sherlock-host-1 postgres]$ nzsession abort -id 5545440
Are you sure you want to abort the session (y|n)? [n] y
[nz@sherlock-host-1 postgres]$ nzsession abort -id 5545457
Are you sure you want to abort the session (y|n)? [n] y

11. check session of them again and found no session.

[nz@sherlock-host-1 postgres]$ nzsession
ID      Type     User        Start Time              PID   Database State  Priority Name Client IP      Client PID Command
------- -------- ----------- ----------------------- ----- -------- ------ ------------- -------------- ---------- ------------------------
5563898 sql-odbc BHAYDEN     08-Mar-16, 09:16:25 PST 28550 GCRTST   idle   normal          10.160.76.12       6636 select * from PMFIDW..R_
5564657 sql-odbc QMACDWDS    08-Mar-16, 09:41:03 PST  6738 QMACDW   idle   normal        10.192.148.127       8124 select distinct DAX_GLAC
5564809 sql-odbc BWYATT      08-Mar-16, 09:46:34 PST  5458 GCRTST   idle   normal          10.160.72.94       5656 SELECT DATE_TIME_SCRAPED
5564844 sql-odbc AARAKELYAN  08-Mar-16, 09:48:17 PST 11903 GCRTST   idle   normal          10.160.76.27       2892 select identifier_case,
5564854 sql-odbc AARAKELYAN  08-Mar-16, 09:48:26 PST 11990 GCRTST   idle   normal          10.160.76.27       2892 Select SKU from threesix
5565295 sql-odbc QMACDWDS    08-Mar-16, 09:59:51 PST  5262 QMACDW   idle   normal         10.164.100.82       8124 SELECT * FROM ATL_ORGANI

12. check log file hopefully restore get done.

[nz@sherlock-host-1 postgres]$ vi /nz/kit/log/restoresvr/restoresvr.28184.2016-03-08.log

1 comment:

  1. Hi Abbas,

    This is Imad here, I have similar situation where my backup gets in hung state. Initially it was taking 6-7 hours but now after running with good throughput at start it will start slowing down, can you please suggest possible causes for this.

    ReplyDelete