Thursday, November 27, 2014

SYSAUX growing rapidly... What can be done..

A nice post from
https://aprakash.wordpress.com/2014/09/21/sysaux-growing-rapidly-what-can-be-done/
https://aprakash.wordpress.com/tag/vsysaux_occupants/

====================================================================
Recently I have been working on cleaning up the SYSAUX tablespace for a few clients, so I thought I would put down my steps, which might be helpful to some of you out there.
Why does the SYSAUX tablespace grow much larger than expected?
There could be a number of potential reasons:
1. ASH data has grown too large (SM/AWR)
2. High Retention Period
3. Segment Advisor has grown too large
4. Increase in older versions of Optimizer Statistics (SM/OPTSTAT)
5. Bugs Bugs Bugs!!!!!
How do we identify the SYSAUX space Usage?
There are basically 2 ways that I know of to identify it:
1. Running @?/rdbms/admin/awrinfo.sql –> detailed info like schema breakdown, SYSAUX occupants' space usage, etc.
2.
COLUMN "Item" FORMAT A25
COLUMN "Space Used (GB)" FORMAT 999.99
COLUMN "Schema" FORMAT A25
COLUMN "Move Procedure" FORMAT A40

SELECT  occupant_name "Item",
   space_usage_kbytes/1048576 "Space Used (GB)",
   schema_name "Schema",
   move_procedure "Move Procedure"
   FROM v$sysaux_occupants
   ORDER BY 2
   /

 col owner for a6
 col segment_name for a50
  select * from
 (select owner,segment_name||'~'||partition_name segment_name,bytes/(1024*1024) size_m
 from dba_segments
 where tablespace_name = 'SYSAUX' ORDER BY BLOCKS desc) where rownum < 11;

In my case, the below 2 were occupying most of the space :-
1. SM/AWR
2. SM/OPTSTAT
SM/AWR — Refers to the Automatic Workload Repository. Data in this section is retained for a certain amount of time (default 8 days). The setting can be checked through DBA_HIST_WR_CONTROL.
SM/OPTSTAT — Stores older versions of optimizer statistics. The setting can be checked through dbms_stats.get_stats_history_retention. This is not a part of AWR and is not controlled by the AWR retention.
When looking at the top segments, I saw WRH$_ACTIVE_SESSION_HISTORY occupying most of the space. Sometimes AWR tables are not purged according to the settings in sys.wrm$_wr_control.
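Both retention settings can be confirmed quickly from SQL*Plus (a small sketch; formatting is up to you):

```sql
-- AWR snapshot interval and retention (default retention is 8 days)
select dbid, snap_interval, retention from dba_hist_wr_control;

-- Optimizer statistics history retention in days (default 31)
select dbms_stats.get_stats_history_retention from dual;
```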
As per Oracle :-
Oracle decides what rows need to be purged based on the retention policy. There is a special mechanism which is used in the case of the large AWR tables where we store the snapshot data in partitions. One method of purging data from these tables is by removing partitions that only contain rows that have exceeded the retention criteria. During the nightly purge task, we only drop the partition if all the data in the partition has expired. If the partition contains at least one row which, according to the retention policy shouldn’t be removed, then the partition won’t be dropped and as such the table will contain old data.
If partition splits do not occur (for whatever reason), then we can end up with a situation where we have to wait for the latest entries to expire before the partition that they sit in can be removed. This can mean that some of the older entries can be retained significantly past their expiry date. The result of this is that the data is not purged as expected.
Diagnose and Reduce the Used Space of SYSAUX.
Once the major occupants and top segments are identified as discussed above, we can start with the steps to rectify it.
Expecting SM/AWR to be occupying most of the space, I think we can follow 3 methods. In this blog I will be posting only one of the methods :)
To check Orphaned ASH rows :-
 SELECT COUNT(1) Orphaned_ASH_Rows FROM wrh$_active_session_history a
  WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

Check the minimum snap_id in the ASH table and then compare it to the minimum snap_id in dba_hist_snapshot.
select min(snap_id) from WRH$_ACTIVE_SESSION_HISTORY;
select min(snap_id) from dba_hist_snapshot;
Example :-
select min(snap_id),MAX(snap_id) from dba_hist_snapshot;

MIN(SNAP_ID) MAX(SNAP_ID)
------------ ------------
       17754        18523

select min(snap_id),MAX(snap_id) from WRH$_ACTIVE_SESSION_HISTORY;

MIN(SNAP_ID) MAX(SNAP_ID)
------------ ------------
           1        18523
Above, as per the retention period we should have data from snap_id 17754 till 18523, but the WRH$_ASH table has data from snap_id 1.
From Oracle MOS Doc :-
A potential solution to this issue is to manually split the partitions of the partitioned AWR objects such that there is more chance of the split partition being purged. You will still have to wait for all the rows in the new partitions to reach their retention time, but with split partitions there is more chance of this happening. You can manually split the partitions using the following undocumented command:
alter session set "_swrf_test_action" = 72;
select table_name, count(*) from dba_tab_partitions where table_name like 'WRH$%' and table_owner = 'SYS'
group by table_name order by 1;

TABLE_NAME                                           COUNT(*)
-------------------------------------------------- ----------
WRH$_ACTIVE_SESSION_HISTORY                                 2
WRH$_DB_CACHE_ADVICE                                        2
WRH$_DLM_MISC                                               2
WRH$_EVENT_HISTOGRAM                                        2
WRH$_FILESTATXS                                            11
WRH$_INST_CACHE_TRANSFER                                    2
WRH$_INTERCONNECT_PINGS                                     2
........................
25 rows selected.

SQL>  alter session set "_swrf_test_action"=72; 

Session altered.

SQL>  select table_name,partition_name from dba_tab_partitions where table_name = 'WRH$_ACTIVE_SESSION_HISTORY';

TABLE_NAME                                         PARTITION_NAME
------------------------------  -------------------------------------------------------
WRH$_ACTIVE_SESSION_HISTORY                        WRH$_ACTIVE_1798927129_0
WRH$_ACTIVE_SESSION_HISTORY                        WRH$_ACTIVE_1798927129_18531  --> New Partition created 
WRH$_ACTIVE_SESSION_HISTORY                        WRH$_ACTIVE_SES_MXDB_MXSN

col table_name for a80
select table_name, count(*) from dba_tab_partitions where table_name like 'WRH$%' and table_owner = 'SYS' group by table_name order by 1

TABLE_NAME                                   COUNT(*)
------------------------------------------- ----------
WRH$_ACTIVE_SESSION_HISTORY                     3
WRH$_DB_CACHE_ADVICE                            3
WRH$_DLM_MISC                                   3
WRH$_EVENT_HISTOGRAM                            3
......................

25 rows selected.

In the above example, WRH$_ACTIVE_1798927129_18531 is the new partition created, where 1798927129 is the DBID and 18531 is the max(snap_id) when it was partitioned. So, now we can start dropping the snapshot range, which in my case is from 1 to 17753, as 17754 is the min(snap_id) in dba_hist_snapshot.
SQL> EXEC dbms_workload_repository.drop_snapshot_range(1,17753,1798927129);
It can generate a good amount of redo and use undo. So keep monitoring the undo tablespace and make sure you have sufficient space.
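While the drop is running, the undo consumed by active transactions can be watched with something like the below (a rough sketch; formatting and thresholds are a matter of taste):

```sql
-- Undo extents currently held by active transactions, in MB
select tablespace_name,
       round(sum(bytes)/1024/1024,2) active_undo_mb
from dba_undo_extents
where status = 'ACTIVE'
group by tablespace_name;
```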
So, what happens when we run the above :-
SQL> @sqlid ft7m07stk3dws
old   9:        sql_id = ('&1')
new   9:        sql_id = ('ft7m07stk3dws')

SQL_ID                                  HASH_VALUE SQL_TEXT
--------------------------------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
ft7m07stk3dws                            857847704 delete from WRH$_SYSTEM_EVENT tab where (:beg_snap <= tab.snap_id and         tab.snap_id = b.start_snap_id) and
                                                   (tab.snap_id <= b.end_snap_id))

SQL> @sqlid 854knbb15976z
old   9:        sql_id = ('&1')
new   9:        sql_id = ('854knbb15976z')

SQL_ID                                  HASH_VALUE SQL_TEXT
--------------------------------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
854knbb15976z                           3260325087 delete from WRH$_SQLSTAT tab where (:beg_snap <= tab.snap_id and         tab.snap_id = b.start_snap_id) and
                                                   (tab.snap_id <= b.end_snap_id))

So, internally Oracle runs delete commands, which cause high redo and undo generation :)
Once the procedure is completed successfully, check the min(snap_id) in WRH$_ACTIVE_SESSION_HISTORY and perform a shrink space cascade.
select owner,segment_name,round(sum(bytes/1024/1024),2) MB, tablespace_name from dba_segments where segment_name = upper('WRH$_ACTIVE_SESSION_HISTORY') group by owner,segment_name,tablespace_name;

OWNER       SEGMENT_NAME                      MB           TABLESPACE_NAME
-------  ---------------------------------- -----------  -------------------
SYS        WRH$_ACTIVE_SESSION_HISTORY        3538.06          SYSAUX

SQL> alter table WRH$_ACTIVE_SESSION_HISTORY shrink space cascade;

Table altered.


OWNER       SEGMENT_NAME                      MB           TABLESPACE_NAME
-------  ---------------------------------- -----------  -------------------
SYS        WRH$_ACTIVE_SESSION_HISTORY        46.75          SYSAUX

In similar fashion, other WRH$ tables can be shrunk to free up space in SYSAUX.
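Rather than typing each statement, the shrink commands for the remaining large WRH$ segments can be generated from the dictionary (a sketch; review the generated statements before running them — shrink also assumes the segments live in an ASSM tablespace, which SYSAUX normally is):

```sql
-- Generate shrink statements for the 10 largest WRH$ tables in SYSAUX
select 'alter table sys.' || segment_name || ' shrink space cascade;' cmd
from (select segment_name, sum(bytes) total_bytes
      from dba_segments
      where tablespace_name = 'SYSAUX'
        and segment_name like 'WRH$\_%' escape '\'
        and segment_type in ('TABLE', 'TABLE PARTITION')
      group by segment_name
      order by total_bytes desc)
where rownum <= 10;
```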
Hope this helps!!!
Reference :-
WRH$_ACTIVE_SESSION_HISTORY Does Not Get Purged Based Upon the Retention Policy (Doc ID 387914.1)
Suggestions if Your SYSAUX Tablespace Grows Rapidly or Too Large (Doc ID 1292724.1)
===================================================
The second method of purging SYSAUX.
Basically I tried to perform the steps as mentioned in the previous post, but drop_snapshot_range was taking too long (> 24hrs) and was still running on the test db. Again, WRH$_ACTIVE_SESSION_HISTORY was at the top of the list, occupying most of the SYSAUX space.
SYS01> EXEC dbms_workload_repository.drop_snapshot_range(25155,26155,3179571572);


From Another session after some time 

SYS01> @asw

USERNAME        SID    SERIAL# SPID EVENT          LAST_CALL_ET  WAIT_TIME SECONDS_IN_WAIT STATE     SQL_ID  PLAN_HASH_VALUE
------------------------- ------- ---------- ---------- ------------------------------ ------------ ---------- --------------- ------------------- ------------- ---------------
SYS        7654 8641 47879 db file sequential read    28     -1       0 WAITED SHORT TIME   fqq01wmb4hgt8       763705880

SYS01> @orax fqq01wmb4hgt8
old   7:       sql_id  = '&&1'
new   7:       sql_id  = 'fqq01wmb4hgt8'

SQL_ID
-------------
SQL_FULLTEXT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
fqq01wmb4hgt8
delete from WRH$_FILESTATXS tab where (:beg_snap <= tab.snap_id and     tab.snap_id = b.start_snap_id) and     (tab.snap_id <= b.end_snap_id))

SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('fqq01wmb4hgt8',NULL,'typical +peeked_binds allstats last'))

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID fqq01wmb4hgt8, child number 0
-------------------------------------
delete from WRH$_FILESTATXS tab where (:beg_snap <= tab.snap_id and
    tab.snap_id =
b.start_snap_id) and         (tab.snap_id <=
b.end_snap_id))

Plan hash value: 763705880

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation    | Name        | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |        |        |       |   325K(100)|        |       |       |       |       |   |
|   1 |  DELETE     | WRH$_FILESTATXS    |        |       |     |        |       |       |       |       |   |
|*  2 |   FILTER    |        |        |       |     |        |       |       |       |       |   |
|   3 |    MERGE JOIN ANTI   |        |    494M|    23G|   325K  (1)| 01:05:08 |       |       |       |       |   |
|   4 |     PARTITION RANGE ITERATOR  |        |    494M|  8957M|   325K  (1)| 01:05:08 |   KEY |   KEY |       |       |   |
|*  5 |      INDEX RANGE SCAN   | WRH$_FILESTATXS_PK |    494M|  8957M|   325K  (1)| 01:05:08 |   KEY |   KEY |       |       |   |
|*  6 |     FILTER    |        |        |       |     |        |       |       |       |       |   |
|*  7 |      SORT JOIN    |        |      1 |    33 |     2 (50)| 00:00:01 |       |       | 73728 | 73728 |   |
|*  8 |       TABLE ACCESS BY INDEX ROWID| WRM$_BASELINE      |      1 |    33 |     1  (0)| 00:00:01 |       |       |       |       |   |
|*  9 |        INDEX RANGE SCAN   | WRM$_BASELINE_PK   |      1 |       |     1  (0)| 00:00:01 |       |       |       |       |   |
---------------------------------------------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (NUMBER): 0
   2 - (NUMBER): 95781
   3 - (NUMBER): 3179571572

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(:BEG_SNAP=:BEG_SNAP AND "TAB"."SNAP_ID"="B"."START_SNAP_ID" AND "TAB"."SNAP_ID"=:BEG_SNAP AND "B"."START_SNAP_ID"<=:END_SNAP))
   9 - access("B"."DBID"=:DBID)

SYS01> col name format a10;
col VALUE_STRING format a30;
select name, position, datatype_string, was_captured, value_string,
anydata.accesstimestamp(value_anydata) from v$sql_bind_capture where sql_id = '&sqlid';
SYS01>   2  Enter value for sqlid: fqq01wmb4hgt8
old   2: anydata.accesstimestamp(value_anydata) from v$sql_bind_capture where sql_id = '&sqlid'
new   2: anydata.accesstimestamp(value_anydata) from v$sql_bind_capture where sql_id = 'fqq01wmb4hgt8'

NAME      POSITION DATATYPE_STRING         WAS VALUE_STRING
---------- ---------- ------------------------------------------------------------ --- ------------------------------
ANYDATA.ACCESSTIMESTAMP(VALUE_ANYDATA)
---------------------------------------------------------------------------
:BEG_SNAP     1 NUMBER          YES 0


:END_SNAP     2 NUMBER          YES 95781


:DBID      3 NUMBER          YES 3179571572

Interestingly, looking at the bind values shows value_string 0 and 95781 for BEG_SNAP and END_SNAP respectively, though the input range for the drop snapshot was between 25155 and 26155.
The database was refreshed by the client (so my session was no more), and so I thought not to take the drop_snapshot_range approach. After going through a few blogs and MOS documents, I thought we had 2 approaches :-
1. "Recreate the AWR tables as in the MOS note 782974.1", which would basically drop all WRH$* tables and then recreate them. The AWR tables contain a wealth of important performance data which can be very useful in performance tuning, trend analysis, and also in comparing performance between two separate periods of time. Hence recreating AWR, I believe, should be the last resort. The activity needs to be done in startup restrict mode, so it requires downtime.
And if you plan to go forward with it, I would recommend exporting the AWR snapshot data using @?/rdbms/admin/awrextr.sql and keeping the dump. In future it can be used by simply importing it into some other repository db to get the AWR data.
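The extract script is interactive; a typical run looks like the below sketch (the directory object and dump file name it prompts for are site-specific choices):

```sql
-- Extract AWR data for a snapshot range into a Data Pump dump file
SQL> @?/rdbms/admin/awrextr.sql
-- The script prompts for the DBID, the number of days of snapshots to
-- list, the begin/end snap ids, a directory object and a dump file name.
-- The resulting dump can later be loaded elsewhere with awrload.sql.
```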
2. Simply delete the orphaned rows from the WRH$_ACTIVE_SESSION_HISTORY table and perform a shrink space cascade.
I went ahead with the 2nd approach and performed the below steps (Note :- the DB was a single instance db).
SYS01> SELECT COUNT(1) Orphaned_ASH_Rows
FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

ORPHANED_ASH_ROWS
-----------------
        301206452

SYS01> alter table wrh$_active_session_history parallel 4;

Table altered.

SYS01> alter session force parallel dml;

Session altered.

SYS01> DELETE /*+ PARALLEL(a,4) */
FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );


From Another session :-

SYS01> @asw

USERNAME        SID    SERIAL# SPID EVENT          LAST_CALL_ET  WAIT_TIME SECONDS_IN_WAIT STATE
------------------------- ------- ---------- ---------- ------------------------------ ------------ ---------- --------------- -------------------
SQL_ID       PLAN_HASH_VALUE
------------- ---------------
SYS         921 1329 107213 db file sequential read    60      0       0 WAITING
144bpj4qg68m1    2217072169

SYS        1227  889 107215 db file sequential read    60      0       0 WAITING
144bpj4qg68m1    2217072169

SYS        9181 3277 107211 db file sequential read    60      1       0 WAITED KNOWN TIME
144bpj4qg68m1    2217072169

SYS        3370  455 107727 SQL*Net message to client    0     -1       0 WAITED SHORT TIME
8tfjp8cd2xtd1     193683216

SYS        1840  809 107217 PX Deq Credit: need buffer   60      0       0 WAITING
144bpj4qg68m1    2217072169

SYS        8875 3889 107209 db file sequential read    60      1       0 WAITED KNOWN TIME
144bpj4qg68m1    2217072169

SYS        8266 3139 90257 PX Deq: Execute Reply  60      0      60 WAITING
144bpj4qg68m1    2217072169


SYS01> @parallel_sess

Username     QC/Slave SlaveSet SID     Slave INS STATE    WAIT_EVENT     QC SID QC INS Req. DOP Actual DOP
------------ -------- -------- ---------------------------------------- --------- -------- ------------------------------ ------ ------ -------- ----------
SYS      QC         8266     1   WAIT    PX Deq: Execute Reply   8266
 - p000      (Slave)  1        8875     1   WAIT    db file sequential read   8266  1        4   4
 - p003      (Slave)  1        1227     1   WAIT    db file sequential read   8266  1        4   4
 - p001      (Slave)  1        9181     1   WAIT    db file sequential read   8266  1        4   4
 - p002      (Slave)  1        921     1   WAIT    db file sequential read   8266  1        4   4
 - p004      (Slave)  2        1840     1   WAIT    PX Deq Credit: send blkd   8266  1        4   4
 - p007      (Slave)  2        2757     1   WAIT    PX Deq: Execution Msg   8266  1        4   4
 - p006      (Slave)  2        2450     1   WAIT    PX Deq: Execution Msg   8266  1        4   4
 - p005      (Slave)  2        2147     1   WAIT    PX Deq: Execution Msg   8266  1        4   4

SYS01> SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('144bpj4qg68m1',NULL,'typical +peeked_binds allstats last'))

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 144bpj4qg68m1, child number 0
-------------------------------------
DELETE /*+ PARALLEL(a,4) */ FROM wrh$_active_session_history a WHERE
NOT EXISTS   (SELECT 1  FROM wrm$_snapshot   WHERE snap_id   =
a.snap_id   AND dbid  = a.dbid   AND instance_number =
a.instance_number   )

Plan hash value: 2217072169

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation    | Name      | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop | TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |      |    |    |   106K(100)|    |    |    |     |    |  |
|   1 |  PX COORDINATOR    |      |    |    |  |    |    |    |     |    |  |
|   2 |   PX SEND QC (RANDOM)   | :TQ10001     |    298M| 11G|   106K  (1)| 00:21:14 |    |    |  Q1,01 | P->S | QC (RAND) |
|   3 |    DELETE    | WRH$_ACTIVE_SESSION_HISTORY   |    |    |  |    |    |    |  Q1,01 | PCWP |  |
|   4 |     PX RECEIVE    |      |    298M| 11G|   106K  (1)| 00:21:14 |    |    |  Q1,01 | PCWP |  |
|   5 |      PX SEND HASH (BLOCK ADDRESS)| :TQ10000     |    298M| 11G|   106K  (1)| 00:21:14 |    |    |  Q1,00 | P->P | HASH (BLOCK|
|   6 |       NESTED LOOPS ANTI   |      |    298M| 11G|   106K  (1)| 00:21:14 |    |    |  Q1,00 | PCWP |  |
|   7 |        PX PARTITION RANGE ALL  |      |    298M|  7404M|   106K  (1)| 00:21:14 |  1 |  3 |  Q1,00 | PCWC |  |
|   8 |  INDEX FULL SCAN   | WRH$_ACTIVE_SESSION_HISTORY_PK |    298M|  7404M|   106K  (1)| 00:21:14 |  1 |  3 |  Q1,00 | PCWP |  |
|*  9 |        INDEX UNIQUE SCAN  | WRM$_SNAPSHOT_PK    |  1 | 15 |  0   (0)|    |    |    |  Q1,00 | PCWP |  |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   9 - access("DBID"="A"."DBID" AND "SNAP_ID"="A"."SNAP_ID" AND "INSTANCE_NUMBER"="A"."INSTANCE_NUMBER")

The deletion of rows (301,206,452 rows) completed with an elapsed time of 12:59:38.44.
301206452 rows deleted.

Elapsed: 12:59:38.44

SYS01> alter table wrh$_active_session_history noparallel ;

Table altered.

SYS01> select degree from dba_tables where table_name=upper('wrh$_active_session_history');

DEGREE
----------
         1

SYS01> SELECT COUNT(1) Orphaned_ASH_Rows FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

ORPHANED_ASH_ROWS
-----------------
    309984

SYS01> DELETE /*+ PARALLEL(a,4) */ FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );  

309984 rows deleted.

Elapsed: 00:00:19.08
SYS01> commit;

Commit complete.

Elapsed: 00:00:00.07

SYS01> SELECT COUNT(1) Orphaned_ASH_Rows FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

ORPHANED_ASH_ROWS
-----------------
  0

SYS01> alter table wrh$_active_session_history shrink space cascade;

Table altered.

Elapsed: 06:47:21.36

Before this activity SM/AWR was occupying 339GB, which reduced to 209GB. SM/OPTSTAT was also occupying 143GB of space, and after confirmation from the client I purged the stats, as it was a test db.
SYS01> exec DBMS_STATS.PURGE_STATS(DBMS_STATS.PURGE_ALL);

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.54
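To stop SM/OPTSTAT from growing back to the same size, the statistics history retention can also be lowered going forward (a sketch; 10 days is an arbitrary value, to be set per your requirements):

```sql
-- Lower the optimizer statistics history retention from the default 31 days
exec dbms_stats.alter_stats_history_retention(10);

-- Verify the new setting
select dbms_stats.get_stats_history_retention from dual;
```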

SYS01> COLUMN "Item" FORMAT A25
 COLUMN "Space Used (GB)" FORMAT 999.99
 COLUMN "Schema" FORMAT A25
 COLUMN "Move Procedure" FORMAT A40

    SELECT  occupant_name "Item",
    space_usage_kbytes/1048576 "Space Used (GB)",
    schema_name "Schema",
    move_procedure "Move Procedure"
    FROM v$sysaux_occupants
    WHERE occupant_name in  ('SM/AWR','SM/OPTSTAT')
    ORDER BY 1
    /

Item                      Space Used (GB) Schema                    Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                             209.16 SYS
SM/OPTSTAT                          19.72 SYS

Savings in SYSAUX :-
TABLESPACE_NAME                  TSP_SIZE USED_SPACE FREE_SPACE   PCT_FREE
------------------------------ ---------- ---------- ---------- ----------
SYSAUX                             505856     496310       9546       1.89  --> Before Size

SYSAUX                             505856     237833     268023      52.98  --> After Size
Hope this helps :)

