Reposted from: http://www.aixchina.net/Question/29969
A few days ago, while doing a routine health check on a customer's database, I found the following warnings in the alert log:
Quote:
WARNING: You are creating datafile /dev/rtbs_data01.
WARNING: Oracle recommends creating new datafiles on devices with zero offset. The command "/usr/sbin/mklv -y LVname -T O -w n -s n -r n VGname NumPPs" can be used. Please contact Oracle customer support for more details.
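On AIX, when a datafile is created on an LV that has a 4K offset, Oracle 10g prints the warning above and recommends the -T O flag. The AIX documentation describes -T O as follows:
Quote: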
-T O
For big vg format volume groups, the -T O option indicates that the logical volume control block will not occupy the first block of the logical volume.
Therefore, the space is available for application data. Applications can identify this type of logical volume with the IOCINFO ioctl. The logical volume
has a device subtype of DS_LVZ. A logical volume created without this option has a device subtype of DS_LV. This option is ignored for old and scalable
vg format volume groups.
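Let us expand on the AIX explanation.
AIX offers three VG types when you create a VG: Original Volume Group, Big Volume Group, and Scalable Volume Group.
In an Original VG, every LV is an ordinary DS_LV LV, no matter which command options you use to create it.
A Big VG is the only type in which both LV subtypes can coexist. If you specify -T O (note: this is the uppercase letter O, not zero), a DS_LVZ LV is created; otherwise an ordinary DS_LV LV is created. For example:
/usr/sbin/mklv -y LVname -T O -w n -s n -r n VGname NumPPs
In a Scalable VG, every LV is a DS_LVZ LV, no matter how you create it.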
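The three rules above can be summarized as a small lookup. This is only a sketch: lv_subtype is a hypothetical helper written for illustration, not an AIX command, and it encodes nothing beyond what the text states.

```shell
# Print the device subtype an LV would get under the rules described above.
# $1: VG type (original | big | scalable)
# $2: "yes" if mklv is invoked with -T O, anything else otherwise
# lv_subtype is a hypothetical name, not part of AIX.
lv_subtype() {
    case "$1" in
        original) echo DS_LV ;;
        big)      if [ "$2" = yes ]; then echo DS_LVZ; else echo DS_LV; fi ;;
        scalable) echo DS_LVZ ;;
        *)        echo "unknown VG type: $1" >&2; return 1 ;;
    esac
}

lv_subtype big yes       # DS_LVZ
lv_subtype big no        # DS_LV
lv_subtype original yes  # DS_LV, because -T O is ignored here
```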
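As the alert log shows, Oracle recommends LVs without the 4K offset when raw devices are used. That raises three questions:
(1) What is the 4K offset for?
(2) How can you tell whether an LV has the 4K offset?
(3) What harm does the 4K offset do?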
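The 4K area holds what AIX calls the LVCB (logical volume control block), which occupies the first 512 bytes of the 4K. Much like an Oracle datafile header, it records the LV's creation time, mirror copy information, file system mount point, and so on.
You can view the LVCB with the getlvcb command:
Quote: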
# getlvcb -AT fslv02
AIX LVCB
intrapolicy = m
copies = 1
interpolicy = m
lvid = 000b56cc00004c000000012d264b87e5.14
lvname = fslv02
label = /ora10g
machine id = B56CC4C00
number lps = 112
relocatable = y
strict = y
stripe width = 0
stripe size in exponent = 0
type = jfs2
upperbound = 32
fs = vfs=jfs2:log=/dev/loglv00:mount=true:options=rw:account=false
time created = Mon Apr 18 09:52:50 2011
time modified = Mon Apr 18 09:52:56 2011
There are two places to check whether an LV has the 4K offset.
1. Host level
Without the 4K offset:
Quote:
#lslv jfkdb_2G_044
LOGICAL VOLUME: jfkdb_2G_044 VOLUME GROUP: jfk_dbvg_01
LV IDENTIFIER: 00c3dff400004c00000001217a9d839e.84 PERMISSION: read/write
VG STATE: active/complete LV STATE: closed/syncd
TYPE: raw WRITE VERIFY: off
MAX LPs: 1024 PP SIZE: 32 megabyte(s)
COPIES: 1 SCHED POLICY: parallel
LPs: 64 PPs: 64
STALE PPs: 0 BB POLICY: relocatable
INTER-POLICY: maximum RELOCATABLE: yes
INTRA-POLICY: middle UPPER BOUND: 1024
MOUNT POINT: N/A LABEL: None
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?: NO
DEVICESUBTYPE : DS_LVZ
With the 4K offset (note that no DEVICESUBTYPE line is shown):
Quote:
# lslv jfkdb_2G_044
LOGICAL VOLUME: jfkdb_2G_044 VOLUME GROUP: jfk_db_vg01
LV IDENTIFIER: 00ce76de00004c00000001134ee6bc51.84 PERMISSION: read/write
VG STATE: active/complete LV STATE: opened/syncd
TYPE: raw WRITE VERIFY: off
MAX LPs: 1024 PP SIZE: 32 megabyte(s)
COPIES: 1 SCHED POLICY: parallel
LPs: 64 PPs: 64
STALE PPs: 0 BB POLICY: relocatable
INTER-POLICY: maximum RELOCATABLE: yes
INTRA-POLICY: middle UPPER BOUND: 16
MOUNT POINT: N/A LABEL: None
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?: NO
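The host-level check can be scripted by looking for the DEVICESUBTYPE line. The sketch below is hypothetical (lv_offset_kind is not an AIX command) and assumes that lslv output looks like the two listings above: a zero-offset LV shows "DEVICESUBTYPE : DS_LVZ", and an LV without that line is treated as DS_LV.

```shell
# Classify an LV from `lslv` output read on stdin.
# Assumption: only zero-offset LVs print "DEVICESUBTYPE : DS_LVZ",
# as in the sample listings; anything else is reported as DS_LV.
lv_offset_kind() {
    if grep -Eq 'DEVICESUBTYPE[[:space:]]*:[[:space:]]*DS_LVZ'; then
        echo "zero offset (DS_LVZ)"
    else
        echo "4K offset (DS_LV)"
    fi
}

# On an AIX host, using the LV name from the listing above:
#   lslv jfkdb_2G_044 | lv_offset_kind
```

Reading from stdin keeps the function independent of lslv itself, so it can also be run against saved output.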
2. Oracle level
Oracle ships a small utility, dbfsize (in $ORACLE_HOME/bin), that reports whether an LV has the 4K offset.
Without the 4K offset:
Quote:
$ dbfsize /dev/rlvsysaux_1g
Database file: /dev/rlvsysaux_1g
Database file type: raw device without 4K starting offset
Database file size: 40960 8192 byte blocks
With the 4K offset:
Quote:
$ dbfsize /dev/rjfkdb_2G_054
Database file: /dev/rjfkdb_2G_054
Database file type: raw device
Database file size: 262016 8192 byte blocks
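The same check can be made on the Oracle side by reading the "Database file type" line. This is a sketch only: dbf_offset_kind is a hypothetical helper, and it assumes dbfsize prints the line exactly as in the two samples above.

```shell
# Classify a raw device from `dbfsize` output read on stdin.
# Assumption: the "Database file type" wording matches the samples above.
dbf_offset_kind() {
    type_line=$(grep 'Database file type:')
    case "$type_line" in
        *"without 4K starting offset"*) echo "zero offset" ;;
        *"raw device"*)                 echo "4K offset" ;;
        *)                              echo "unknown" ;;
    esac
}

# On a host with Oracle installed, using the device from the sample above:
#   dbfsize /dev/rlvsysaux_1g | dbf_offset_kind
```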
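Suppose the database uses a 16K block size and the LV is striped across PVs with a 64K stripe size and a 4K offset. Because striping counts the 4K LVCB area as part of the first stripe unit, the fourth Oracle block lands across a stripe boundary and therefore across two PVs.
Performance aside, if the system or the storage crashes unexpectedly, a block that spans disks like this can very easily be corrupted and raise ORA-01578 (MetaLink ID 261460.1).
Quote: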
$ oerr ora 01578
01578, 00000, "ORACLE data block corrupted (file # %s, block # %s)"
// *Cause: The data block indicated was corrupted, mostly due to software
// errors.
// *Action: Try to restore the segment containing the block indicated. This
// may involve dropping the segment and recreating it. If there
// is a trace file, report the errors in it to your ORACLE
// representative.
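The stripe arithmetic described before the oerr quote can be checked with a short shell sketch. The function name straddling_blocks is hypothetical. The model assumes, as the text describes, that Oracle block k starts at byte (offset + k * block size) within the LV and that stripe units begin at multiples of the stripe size.

```shell
# Print the 1-based numbers of the blocks that straddle a boundary.
# $1: starting offset in bytes   $2: Oracle block size in bytes
# $3: boundary interval in bytes (here: the stripe size)
# $4: how many blocks to examine
# straddling_blocks is a hypothetical helper for illustration only.
straddling_blocks() {
    offset=$1; block=$2; boundary=$3; count=$4
    out=""
    i=0
    while [ "$i" -lt "$count" ]; do
        first=$(( offset + i * block ))
        last=$(( first + block - 1 ))
        # First and last byte in different stripe units: the block straddles.
        if [ $(( first / boundary )) -ne $(( last / boundary )) ]; then
            out="$out${out:+ }$(( i + 1 ))"
        fi
        i=$(( i + 1 ))
    done
    echo "$out"
}

straddling_blocks 4096 16384 65536 8   # 4K offset: prints "4 8"
straddling_blocks 0    16384 65536 8   # zero offset: prints an empty line
```

With the 4K offset, every fourth 16K block crosses a 64K stripe boundary; with a zero offset, none does.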
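So is everything fine if the LV is not striped but keeps the 4K offset?
The answer is still no. If the LV spans PVs and the PP size is 64M, then (64M - 4K) / 16K is still not a whole number, so the problem remains.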
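The division above can be worked out directly. In this sketch, pp_leftover is a hypothetical helper that prints how many bytes of the first PP are left over after filling it with whole blocks; a non-zero result means the next block continues into the following PP.

```shell
# Bytes left in the first PP after the offset and all whole blocks.
# $1: PP size in bytes   $2: offset in bytes   $3: block size in bytes
# pp_leftover is a hypothetical name used only for this illustration.
pp_leftover() {
    echo $(( ($1 - $2) % $3 ))
}

pp_leftover $(( 64 * 1024 * 1024 )) 4096 16384   # prints 12288
pp_leftover $(( 64 * 1024 * 1024 )) 0    16384   # prints 0
```

With a 4K offset, 4095 whole 16K blocks fit in the first 64M PP and 12K remains, so the next block has 12K in one PP and 4K in the next.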
Oracle has been able to recognize LVs without the 4K offset since 9.2.0.3. So does creating zero-offset LVs solve everything? Unfortunately not; a bug duly appears:
https://www-304.ibm.com/support/docview.wss?uid=isg1IY94343
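In other words, after a reboot or after a command such as chlv, the DS_LVZ flag disappears. Oracle then treats the LV as having a 4K offset, which again makes it possible for an Oracle block to span PVs.
With bad luck, ORA-01578 turns up again, and the nightmare begins.
Quote: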
IY94343: MKLV -TO ON BIG VOLUME GROUPS FAILS TO PUT SOME LV INFORMATION APPLIES TO AIX 5300-07
****************************************************************
* USERS AFFECTED:
* Users of BIG volume groups with the bos.rte.lvm fileset at
* the 5.3.0.53 or 5.3.0.54 level.
****************************************************************
* PROBLEM DESCRIPTION:
* When creating a logical volume with a device type of DS_LVZ
* using the '-TO' flag, lslv reports a DEVICESUBTYPE of DS_LV
* rather than DS_LVZ. The problem shows up only after a reboot
* or any subsequent chlv or other LVM command that can update
* the VGDA on disk.
* This problem can cause some applications, such as Oracle, to
* fail to start, and could result in database corruption.
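Even without this bug, that is, with no 4K offset, the problem still exists if db_block_size is larger than the stripe size, because blocks will again span PVs. This is something to watch for when setting up striping, although in practice I have never seen an environment where the stripe size is smaller than the block size.
This leads to a further question. If the storage array already spreads data across all of its physical disks, stripes them, and presents virtual disks to the host, the analysis becomes much more complicated: a block that spans disks from the operating system's point of view may not actually span physical disks.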