Tuesday, December 3, 2013

Undo Usage in Oracle



In this blog post I will talk about the basic workings of Automatic Undo Management, which can cause ORA-01555 and ORA-30036 issues.
The scope is Automatic Undo Management used in 10g and 11g, but has to be explicitly set for 9i (UNDO_MANAGEMENT = AUTO). Manual Undo Management is out of scope for this blog.
The Undo tablespace is a normal tablespace like any other, but only Oracle is controlling what is happening inside it.

Undo something

The Undo tablespace is used for several features: ROLLBACK, READ CONSISTENCY and FLASHBACK technology.

Rollback

Rollback is easy to understand, if you are not happy with some data modifications, you want to ‘undo’ it: Rollback.
The original (non modified) information within a transaction is stored in a separate Undo tablespace, because the database is designed for COMMIT to be fast, not rolling back.

Read Consistency

Another mechanism Undo information is used for is Read Consistency, which means if you run a query at 9:00 for 10 minutes, you want all the data to be from 9:00. You don’t want it to read data that has been modified at 9:02 and 9:06 or data that hasn’t been committed yet.
So, to support Read Consistency, Oracle must keep the original data (committed or not) for these 10 minutes until the query is finished.
The problem is, you actually don’t know how long the query will run for, so the general rule is to set this ‘keep-old-data-period’ to the longest running query. This is because you also want your longest running query to read consistent data.
This ‘keep-old-data-period’ is called ‘UNDO_RETENTION’ and defaults to 900 seconds, which means the database tries to keep all old changed information for 900 seconds.

Flashback

Some Oracle features are build based upon using Undo information, meaning undo is more utilized.
Because ‘old’ data is stored for a certain time (UNDO_RETENTION), one can access this information to have look at data back in time by using FLASHBACK features: ‘How did the contents of this table looked like ten minutes ago?’. This information can be used for recovery from user-errors.
Flashback features using Undo are:
  • Flashback Query (based on time)
  • Flashback Versions Query (based on SCN)
  • Flashback Transaction Query (based on period)
  • Flashback Table (based on time)
Flashback Drop and Flashback Database do not use Undo information. Flashback Drop is using ‘not yet recycled segment and extents’ and Flashback Database is a separate mechanism using the Flash/Fast Recovery Area, by taking ‘snapshots’ and redo information.

Undo Lifetime

Undo information has different states during it’s lifecycle, depending on running transactions and retention settings.
There are three states or types of extents in the Undo tablespace: ACTIVE, EXPIRED and UNEXPIRED. Oracle is still using Rollback segments, but with Automatic Undo Management these are completely controlled by Oracle.

ACTIVE

Active undo extents are used by transactions and will always be active, because they are needed for Rollback. The UNDO_RETENTION setting is not used here, because one can not say something like: ‘after 900 seconds you are not allowed to rollback anymore…’
You will get ‘ORA-30036 unable to extend segment in Undo tablespace‘ errors when no more space is left to store ACTIVE Undo. This will automatically rollback the transaction causing it. TheNOSPACEERRCNT column in V$UNDOSTAT is a good indication how many times this has occurred.

EXPIRED

Expired extents are not used by transactions, the data in these extends is committed and the UNDO_RETENTION time has passed, so it is not needed for Read Consistency.

UNEXPIRED

Unexpired extents are non-active extents that still honour UNDO_RETENTION. The transactions belonging to these undo extents are committed, but the retention time has not passed: You still want/need these for Read Consistency!
When the Undo mechanism requires more extents for ACTIVE extents, it is allowed to steal UNEXPIRED extents when there are no EXPIRED extents left for reuse and it can not allocate more free extents (autoextend maxsize reached or fixed tablespace size). One can check the steal-count inUNXPSTEALCNT in V$UNDOSTAT.
You will get ‘ORA-01555 snapshot too old‘ errors if no Read Consistency information for a query is available. The SSOLDERRCNT in V$UNDOSTAT will show a count of these errors.

Summary

  • Active undo is used by active transactions: rollback and read consistency.
  • Expired undo is old and can be reused.
  • Unexpired undo is used for read consistency, retention time has not passed yet.
  • Unexpired undo can be stolen for Active undo. If this is happening you can get ORA-01555 before Undo retention has passed.
  • Unexpired undo can be secured by setting the RETENTION GUARANTEE option when creating the Undo tablespace (see Undo Sizing).

Undo extent status examples

With the next query you go through the contents of the Undo tablespace and sum the extent types:
select status,
  round(sum_bytes / (1024*1024), 0) as MB,
  round((sum_bytes / undo_size) * 100, 0) as PERC
from
(
  select status, sum(bytes) sum_bytes
  from dba_undo_extents
  group by status
),
(
  select sum(a.bytes) undo_size
  from dba_tablespaces c
    join v$tablespace b on b.name = c.tablespace_name
    join v$datafile a on a.ts# = b.ts#
  where c.contents = 'UNDO'
    and c.status = 'ONLINE'
);
It will sum the three types of extents and shows the distribution of them within the Undo tablespace. ‘Free’ extents are not shown.

‘Normal’ operation

STATUS            MB       PERC
--------- ---------- ----------
ACTIVE            10          4
EXPIRED          110         43
UNEXPIRED         25         10
This is an example of ‘normal’ contents of the Undo tablespace. The system is using ACTIVE extents, some are UNEXPIRED used for read consistency and there are EXPIRED extents which can be reused.

Out of Free/EXPIRED extents

STATUS            MB       PERC
--------- ---------- ----------
ACTIVE           230         90
EXPIRED            0          0
UNEXPIRED         26         10
When the system is under load and the EXPIRED extents are near 0%, the total of ACTIVE and UNEXPIRED is near 100% and the Undo tablespace is not able to extend, Oracle will steal UNEXPIRED extents for ACTIVE extents. If this is the case you might expect ORA-01555 errors, because Undo retention can not be met.

Out of Undo space

STATUS            MB       PERC
--------- ---------- ----------
ACTIVE           255        100
EXPIRED            0          0
UNEXPIRED          1          0
When the system is under load and the ACTIVE extents are near 100%, the total of EXPIRED and UNEXPIRED is near 0% and the Undo Tablespace is not able to extend, Oracle is not able to allocate free extents or steal UNEXPIRED extents for ACTIVE extents. If this is the case you might expect ORA-30036 errors.

Retention to large or UNDO to small?

STATUS            MB       PERC
--------- ---------- ----------
ACTIVE             2          1
EXPIRED            0          0
UNEXPIRED        254         99
In this case, all undo extents are used for the retention period. It might be the retention is to large, or the UNDO tablespace is to small. A DBA must investigate this and take a decision!

Undo Sizing

Storing undo data for a certain amount of time will need space and based on the activity on the database system, it is written at a certain ‘rate’.
From this you can deduct an equation: RATE x RETENTION = SPACE. Some overhead must be added, but that varies between database versions used and data types stored.
If you look at the undo equation, the Undo tablespace size or the retention time can be fixed. A fixed rate can not be set, because it depends on database load.
Since Oracle 10g, the database will be more efficient if the same record is updated more than once in a transaction, it will re-use those ACTIVE extents.

Fixed Size

When the Undo tablespace size is fixed (datafile autoextend=NO), Oracle tunes the Retention Time for the amount of Undo data it is generating to fit into the Undo tablespace. The UNDO_RETENTION parameter will now be used as a minimum, but may automatically be tuned larger when enough space is available.
One can check the tuned Undo retention time in V$UNDOSTAT, using the TUNED_UNDORETENTIONcolumn.
In Oracle 9i, it seems Oracle is not actually tuning this, but is only trying to maintain the Undo retention time. Also the TUNED_UNDORETENTION column is absent in 9i.
When you choose the Undo tablespace to be fixed, you can use the Undo Advisor to estimate the needed sizing.

Fixed Size, out of UNEXPIRED extents? Check TUNED_UNDORETENTION!

STATUS            MB       PERC
--------- ---------- ----------
ACTIVE             2          1
EXPIRED            0          0
UNEXPIRED        254         99
Because Oracle is able to extend the retention time, more UNEXPIRED extents are created. In this case, if the Undo tablespace is full, check the TUNED_UNDORETENTION against UNDO_RETENTION. If the tuned retention is much larger, 99% full does not mean a problem!
Take a look at the following query, it will calculate the UNDO total with the following assumption: ACTIVE takes what is needs, EXPIRED ‘is empty’ and UNEXPIRED will be re-calculated against the division of UNDO_RETENTION/TUNED_UNDORETENTION.
BREAK ON REPORT
COMPUTE SUM OF MB ON REPORT
COMPUTE SUM OF PERC ON REPORT
COMPUTE SUM OF FULL ON REPORT

select status,
 round(sum_bytes / (1024*1024), 0) as MB,
 round((sum_bytes / undo_size) * 100, 0) as PERC,
 decode(status, 'UNEXPIRED', round((sum_bytes / undo_size * factor) * 100, 0),
                'EXPIRED',   0,
                             round((sum_bytes / undo_size) * 100, 0)) FULL
from
(
 select status, sum(bytes) sum_bytes
 from dba_undo_extents
 group by status
),
(
 select sum(a.bytes) undo_size
 from dba_tablespaces c
 join v$tablespace b on b.name = c.tablespace_name
 join v$datafile a on a.ts# = b.ts#
 where c.contents = 'UNDO'
 and c.status = 'ONLINE'
),
(
 select tuned_undoretention, u.value, u.value/tuned_undoretention factor
 from v$undostat us
 join (select max(end_time) end_time from v$undostat) usm
    on usm.end_time = us.end_time
 join (select name, value from v$parameter) u
    on u.name = 'undo_retention'
);
When running this query, the next result will show when UNDO_RETENTION = 900 and TUNED_UNDORETENTION is about 1800 seconds:
STATUS            MB       PERC       FULL
--------- ---------- ---------- ----------
ACTIVE             2          1          1
EXPIRED            0          0          0
UNEXPIRED        254         99         50
          ---------- ---------- ----------
sum              256        100         51
Unexpired at 99% is not really a problem here, because the tuned retention is twice as large as the desired retention!
Since 10gR2, a maximum retention is introduced. The longest period of tuned undo I have seen is 96 hours. Automatic tuning retention can also be turned off using the hidden ‘_undo_autotune=false’ parameter (don’t use until Oracle suggested this hidden parameter). See also My Oracle Support Note: Full UNDO Tablespace In 10gR2 [ID 413732.1].

Fixed/Auto Retention

If the Undo tablespace is configured with the autoextend option for the data files, Oracle sets the Retention Time to the time it takes for the longest-running query to run. This can result in a large Undo tablespace if there are un-tuned queries running on your system.
Again in 9i, even though it is called Automatic Undo Management, UNDO_RETENTION parameter seems always ‘fixed’, but it does mean you don’t have to bother about Rollback Segments.

Shrink Undo tablespace

The Undo tablespace can only grow larger, but it can not shrink by itself. If you want to shrink the Undo tablespace, create a new one and set the UNDO_TABLESPACE parameter to the new Undo tablespace.

Retention Guaranteed

When you create the Undo tablespace with the RETENTION GUARANTEE option, UNEXPIRED Undo information will never get stolen. Set this if you want to guarantee Read Consistency or when you want to use Flashback with a guaranteed point-in-time!
Beware that when this is set, the chance of ORA-30036 errors increases. It’s your choice: ORA-30036 or ORA-01555…

Setting the UNDO_RETENTION parameter to the longest running query

A good practice is to set the UNDO_RETENTION parameter to the longest running query, to avoid ORA-01555 (read consistency) errors. To get a good indication about the longest running query in the last 7 days, try:
select max(maxquerylen) from v$undostat;
One can also try V$SESSION_LONGOPS and V$TRANSACTION.
If you want to increase your Flashback period, take the largest of these two.

How much Undo will this generate?

Again take a look at V$UNDOSTAT and the UNDOBLKS column in particular.
Multiply these UNDOBLKS (per 10 minutes by default) times your BLOCKSIZE times the MAXQUERYLEN.
For a worst case scenario size you can calculate much undo would have been generated when you multiply the highest rate with the longest query:
select
  round(max(undoblks/600)*8192*max(maxquerylen)/(1024*1024)) as "UNDO in MB"
from v$undostat;
But, it could be your longest running query will not run when the most undo is generated…

Undo Advisor

The Undo Advisor can be found in the Oracle Enterprise Manager or by using the DBMS_ADVISOR package.
Undo Advisor
Undo Advisor
When opening the Undo advisor, it will show the current retention time and tablespace settings, but also shows analysis results, potential problems and recommendations.
These results are based on a 7 days period, analysing longest query or Flashback duration and Undo generation rates. This can be made visible thought the Undo graph.
Undo Retention Graph
Undo Retention Graph
New: When unchanged, this shows the current retention time setting. You can change the UNDO_RETENTION to a new value by selecting a dot on the line in the graph.
Auto-tuned Undo retention: This is the retention time Oracle can currently hold. It will use the UNDO_RETENTION as a minimum.
Best Possible Undo retention: With the current size (fixed) or maxsize (autoextend) of the Undo tablespace, this is the retention time it possibly could hold.
Oracle uses the statistics in the V$UNDOSTAT view to tune the Undo mechanism. A DBA can also use this view, together with V$ROLLSTAT to get a good indication of current workload. The DBA_HIST_UNDOSTAT view contains statistical snapshots of V$UNDOSTAT information.
All this information is based on the selected Analysis Time Period with the Undo generation rate within that period.

LOBs

Automatic Undo Management not supported for LOBs. Undo information for LOBs is not stored Undo tablespace, but in the segment itself. For LOBs, the database uses the UNDO_RETENTION as a minimum, but when space becomes a problem, the UNEXPIRED Undo information for the LOB may be reused.

Conclusion

Even though Automatic Undo Management is able to tune itself, one needs to keep an eye on transaction duration, retention time and space consumed.
With the addition of Flashback technology to the database, the Undo tablespace is now also used to recover from user errors. When Retention Guaranteed is used, more priority is given to support Read Consistency and Flashback operations, because Oracle will not steal UNEXPIRED extents.
Furthermore the Undo Advisor can be used to visualise retention time vs. space consumed, but in case of Undo related errors, a DBA still needs to analyse V$UNDOSTAT and related views to solve problems other than ‘just enlarge’ the Undo tablespace.

No comments:

Post a Comment