TimesTen Database Replication Study Notes: 15. Monitoring the Replication System

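Replication is monitored mainly with the ttStatus and ttRepAdmin utilities; the same information can also be read from the system tables.

The output below assumes cachedb1 is the replication source and cachedb2 is the replication target.

Displaying the replication agent status

Using ttStatus to display the replication agent status

Note the line "Replication agent is running."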

$ ttstatus
TimesTen status report as of Wed Jun 29 18:04:26 2016

Daemon pid 2644 port 53392 instance tt1122
TimesTen server pid 2653 started on port 53393
------------------------------------------------------------------------
Data store /home/oracle/TimesTen/tt1122/info/DemoDataStore/repdb1_1122
There are no connections to the data store
Replication policy  : Manual
Cache Agent policy  : Manual
PL/SQL enabled.
------------------------------------------------------------------------
Data store /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1
There are 17 connections to the data store
Shared Memory KEY 0x2c00c901 ID 3375121
PL/SQL Memory KEY 0x2d00c901 ID 3407890 Address 0x7fa0000000
Type            PID     Context             Connection Name              ConnID
Replication     6382    0x00000000035ab270  REPHOLD:1109059904              130
Replication     6382    0x00000000036000c0  REPLISTENER:1093531968          128
Replication     6382    0x000000000361e360  TRANSMITTER(M):1099831616       127
Replication     6382    0x0000000003672f90  LOGFORCE:1096681792             126
Replication     6382    0x00007fb7480009e0  FAILOVER:1103178048             131
Replication     6382    0x00007fb7480155d0  XLA_PARENT:1081403712           129
Subdaemon       2649    0x0000000000e82360  Manager                         142
Subdaemon       2649    0x0000000000ef9430  Rollback                        141
Subdaemon       2649    0x0000000000fcef50  Flusher                         140
Subdaemon       2649    0x0000000001008980  Checkpoint                      132
Subdaemon       2649    0x000000000101d570  Deadlock Detector               137
Subdaemon       2649    0x00000000010721a0  AsyncMV                         136
Subdaemon       2649    0x00000000010df8d0  IndexGC                         135
Subdaemon       2649    0x0000000001134500  Aging                           134
Subdaemon       2649    0x0000000001189130  Monitor                         133
Subdaemon       2649    0x00000000012b6130  HistGC                          139
Subdaemon       2649    0x00000000013473e0  Log Marker                      138
Replication policy  : Manual
Replication agent is running.
Cache Agent policy  : Manual
PL/SQL enabled.
------------------------------------------------------------------------
Accessible by group oracle
End of report
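
For scripted monitoring, the check for the "Replication agent is running." line can be automated. The following is a minimal Python sketch, assuming the ttStatus report has already been captured as text and that each data store section starts with a "Data store <path>" line and is delimited by dashed lines, as in the report above. The function name replication_agent_running is hypothetical, not part of TimesTen.

```python
import re

def replication_agent_running(ttstatus_text: str, store_name: str) -> bool:
    """Return True if the section for store_name reports a running replication agent."""
    # Sections of the report are separated by lines made only of dashes.
    sections = re.split(r"^-{10,}[ \t]*$", ttstatus_text, flags=re.MULTILINE)
    for section in sections:
        lines = [ln.strip() for ln in section.strip().splitlines()]
        if not lines or not lines[0].startswith("Data store "):
            continue
        path = lines[0][len("Data store "):]
        # Compare only the last path component with the requested store name.
        if path.rsplit("/", 1)[-1] != store_name:
            continue
        return "Replication agent is running." in lines
    return False

# Abbreviated copy of the report shown above.
sample = """\
Daemon pid 2644 port 53392 instance tt1122
------------------------------------------------------------------------
Data store /home/oracle/TimesTen/tt1122/info/DemoDataStore/repdb1_1122
There are no connections to the data store
Replication policy  : Manual
------------------------------------------------------------------------
Data store /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1
There are 17 connections to the data store
Replication policy  : Manual
Replication agent is running.
------------------------------------------------------------------------
Accessible by group oracle
End of report
"""

print(replication_agent_running(sample, "cachedb1"))
print(replication_agent_running(sample, "repdb1_1122"))
```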

Using ttAdmin -query to display the agent start policy

$ ttAdmin -query cachedb1
......
Replication Agent Policy        : manual
Replication Manually Started    : True
......

Using ttDataStoreStatus to display the replication agent status

cachedb1> call ttDataStoreStatus;
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 6422, 000000000064FAD0, application     , 2C00C901, cachedb1                      , 1 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 6382, 00000000035AB270, replication     , 2C00C901, REPHOLD:1109059904            , 130 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 6382, 00000000036000C0, replication     , 2C00C901, REPLISTENER:1093531968        , 128 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 6382, 000000000361E360, replication     , 2C00C901, TRANSMITTER(M):1099831616     , 127 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 6382, 0000000003672F90, replication     , 2C00C901, LOGFORCE:1096681792           , 126 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 6382, 00007FB7480009E0, replication     , 2C00C901, FAILOVER:1103178048           , 131 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 6382, 00007FB7480155D0, replication     , 2C00C901, XLA_PARENT:1081403712         , 129 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 2649, 0000000001008980, subdaemon       , 2C00C901, Checkpoint                    , 132 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 2649, 000000000101D570, subdaemon       , 2C00C901, Deadlock Detector             , 137 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 2649, 00000000010721A0, subdaemon       , 2C00C901, AsyncMV                       , 136 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 2649, 00000000010DF8D0, subdaemon       , 2C00C901, IndexGC                       , 135 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 2649, 0000000001134500, subdaemon       , 2C00C901, Aging                         , 134 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 2649, 0000000001189130, subdaemon       , 2C00C901, Monitor                       , 133 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 2649, 00000000012B6130, subdaemon       , 2C00C901, HistGC                        , 139 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 2649, 00000000013473E0, subdaemon       , 2C00C901, Log Marker                    , 138 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 2649, 0000000000E82360, subdaemon       , 2C00C901, Manager                       , 142 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 2649, 0000000000EF9430, subdaemon       , 2C00C901, Rollback                      , 141 >
< /home/oracle/TimesTen/tt1122/info/DemoDataStore/cachedb1, 2649, 0000000000FCEF50, subdaemon       , 2C00C901, Flusher                       , 140 >

Displaying master (replication source) database information

Use ttRepAdmin or query the system tables.

$ ttRepAdmin -dsn cachedb1 -self -list
Self host "TIMESTEN-HOL", port 12306, name "CACHEDB1", LSN 3/22901000
Operation successful

Here LSN 3/22901000 means that the oldest untransmitted log record is in transaction log file 3, at offset 22901000.
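
Since LSNs appear throughout these notes in two spellings ("3/22901000" from -list and -bookmark, "3.32696584" from -showstatus), a small helper makes them comparable. This is a Python sketch only; the LSN type and parse_lsn are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple

class LSN(NamedTuple):
    log_file: int   # transaction log file number
    offset: int     # byte offset within that log file

def parse_lsn(text: str) -> LSN:
    """Parse 'file/offset' or 'file.offset' into an LSN tuple."""
    m = re.fullmatch(r"\s*(-?\d+)[/.](-?\d+)\s*", text)
    if m is None:
        raise ValueError(f"not an LSN: {text!r}")
    return LSN(int(m.group(1)), int(m.group(2)))

lsn = parse_lsn("3/22901000")
print(lsn.log_file, lsn.offset)

# Tuples compare field by field, so a later log file always sorts higher.
print(parse_lsn("4.3856648") > parse_lsn("3/33009928"))
```

Because the tuple orders by log file first and offset second, ordinary comparison operators give the expected "older/newer" ordering.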

The system tables give similar information:

cachedb1> SELECT t.host_name, t.rep_port_number, t.tt_store_name
        > FROM ttrep.ttstores t, ttrep.repstores s
        > WHERE t.is_local_store = 0x01
        > AND t.tt_store_id = s.tt_store_id;
< TIMESTEN-HOL, 12306, CACHEDB1 >

Displaying subscriber (replication target) database information

Active standby pair (ASP), normal operation

Using ttRepAdmin

$ ttRepAdmin -dsn cachedb2 -receiver -list
Peer name         Host name                 Port    State  Proto Track
----------------  ------------------------ ------  ------- ----- -----
CACHEDB1          TIMESTEN-HOL              Auto   Start      36     0

Last Msg Sent Last Msg Recv Latency TPS     RecordsPS Logs
------------- ------------- ------- ------- --------- ----
00:00:04      00:00:13        -1.00      -1        -1    1

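The first row of output describes the replication definition: the source database of cachedb2 is CACHEDB1 on TIMESTEN-HOL, the replication port is negotiated automatically (Auto), and the state Start means replication is operating normally.

The second row shows latency and rate information. Note Latency: a value of -1 appears to mean that no measurement is available (for example, no transactions have been replicated recently), rather than a measured latency of zero. Last Msg Sent and Last Msg Recv are also very useful.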

ttReplicationStatus

cachedb2> call ttReplicationStatus('cachedb1', 'timesten-hol');
< CACHEDB1, TIMESTEN-HOL, 0, start     , 1, 2, _ACTIVESTANDBY                , TTREP                          >
cachedb2> call ttReplicationStatus('cachedb1', 'timesten-hol');
< CACHEDB1, TIMESTEN-HOL, 0, start     , 1, 0, _ACTIVESTANDBY                , TTREP                          >
cachedb2> call ttReplicationStatus;
< CACHEDB1, TIMESTEN-HOL, 0, start     , 1, 1, _ACTIVESTANDBY                , TTREP                          >

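The third column from the end (lastMsg) is the number of seconds since the last interaction with the peer: 2 means 2 seconds ago, 0 means just now. The column before it (logs) is the number of transaction log files held for that peer.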
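
To make the positional columns easier to read, a row can be mapped onto named fields. The Python sketch below assumes the column order subscriber, hostName, port, pState, logs, lastMsg, replicationName, replicationOwner; that order is recalled from the ttReplicationStatus reference and should be verified against your release. RepStatus and parse_rep_status are hypothetical names.

```python
from typing import NamedTuple, Optional

class RepStatus(NamedTuple):
    subscriber: str
    host_name: str
    port: int
    p_state: str
    logs: int                 # log files held for this peer
    last_msg: Optional[int]   # seconds since last interaction, None if not numeric
    replication_name: str
    replication_owner: str

def parse_rep_status(row: str) -> RepStatus:
    """Parse one '< ..., ... >' row as printed by ttIsql."""
    body = row.strip()
    if body.startswith("<"):
        body = body[1:]
    if body.endswith(">"):
        body = body[:-1]
    f = [x.strip() for x in body.split(",")]
    if len(f) != 8:
        raise ValueError(f"expected 8 columns, got {len(f)}")
    last_msg = int(f[5]) if f[5].lstrip("-").isdigit() else None
    return RepStatus(f[0], f[1], int(f[2]), f[3], int(f[4]), last_msg, f[6], f[7])

row = "< CACHEDB1, TIMESTEN-HOL, 0, start     , 1, 2, _ACTIVESTANDBY                , TTREP                          >"
status = parse_rep_status(row)
print(status.p_state, status.logs, status.last_msg)
```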

Querying the system tables

cachedb1>
SELECT t1.tt_store_name, t1.host_name, t1.rep_port_number,
p.state, p.protocol, p.timesend, p.timerecv, p.latency,
p.tps, p.recspersec, t3.last_log_file - p.sendlsnhigh + 1
  FROM ttrep.reppeers p, ttrep.ttstores t1, ttrep.ttstores t2, sys.monitor t3
  WHERE p.tt_store_id = t1.tt_store_id
    AND t2.is_local_store = 0X01
    AND p.subscriber_id = t2.tt_store_id
    AND p.replication_name = '_ACTIVESTANDBY'
    AND p.replication_owner = 'TTREP'
    AND (p.state = 0 OR p.state = 1);
< CACHEDB2, TIMESTEN-HOL, 0, 0, 36, 0, 1467250954, -1.00000000000000, -1, -1, 5 >

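This SQL statement is less intuitive than ttRepAdmin -dsn cachedb2 -receiver -list.

Note that for an active standby pair, replication_name is always _ACTIVESTANDBY and replication_owner is always TTREP.

Displaying the configuration of replicated databases

Using the ttIsql repschemes command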

cachedb2> repschemes;

Replication Scheme Active Standby:

  Master Store: CACHEDB1 on TIMESTEN-HOL
  Master Store: CACHEDB2 on TIMESTEN-HOL

  Excluded Tables:
    None

  Excluded Cache Groups:
    None

  Excluded sequences:
    None

  Store: CACHEDB1 on TIMESTEN-HOL
    Port: (auto)
    Log Fail Threshold: (none)
    Retry Timeout: 120 seconds
    Compress Traffic: Disabled

  Store: CACHEDB2 on TIMESTEN-HOL
    Port: (auto)
    Log Fail Threshold: (none)
    Retry Timeout: 120 seconds
    Compress Traffic: Disabled

Using ttRepAdmin -showconfig

$ ttRepAdmin -showconfig -dsn cachedb1

Self host "TIMESTEN-HOL", port auto, name "CACHEDB1", LSN 3/31297800, timeout 120, threshold 0

List of subscribers
-------------------

Peer name         Host name                 Port    State  Proto Track
----------------  ------------------------ ------  ------- ----- -----
CACHEDB2          TIMESTEN-HOL              Auto   Start      36     0

Last Msg Sent Last Msg Recv Latency TPS     RecordsPS
------------- ------------- ------- ------- ---------
00:00:04      00:00:08        -1.00      -1        -1 

List of objects and subscriptions
---------------------------------

Table details
-------------
Table : TTHR.A   Timestamp updates : -  

Master Name               Subscriber name
-----------               ---------------
CACHEDB1                  CACHEDB2                

Table details
-------------
Table : TTHR.A   Timestamp updates : -  

Master Name               Subscriber name
-----------               ---------------
CACHEDB2                  CACHEDB1                

Table details
-------------
Table : TTHR.A1   Timestamp updates : -  

Master Name               Subscriber name
-----------               ---------------
CACHEDB1                  CACHEDB2                

Table details
-------------
Table : TTHR.A1   Timestamp updates : -  

Master Name               Subscriber name
-----------               ---------------
CACHEDB2                  CACHEDB1                

Table details
-------------
Table : TTHR.A2   Timestamp updates : -  

Master Name               Subscriber name
-----------               ---------------
CACHEDB1                  CACHEDB2                

Table details
-------------
Table : TTHR.A2   Timestamp updates : -  

Master Name               Subscriber name
-----------               ---------------
CACHEDB2                  CACHEDB1                

Datastore details
-----------------
Master Name               Subscriber name
-----------               ---------------
CACHEDB1                  CACHEDB2                

Datastore details
-----------------
Master Name               Subscriber name
-----------               ---------------
CACHEDB2                  CACHEDB1

Querying the system tables

SELECT t.host_name, t.rep_port_number, t.tt_store_name, s.peer_timeout,
s.fail_threshold
  FROM ttrep.ttstores t, ttrep.repstores s
    WHERE t.is_local_store = 0X01
      AND t.tt_store_id = s.tt_store_id;

< TIMESTEN-HOL, 0, CACHEDB1, 120, 0 >

SELECT t1.tt_store_name, t1.host_name, t1.rep_port_number,
       p.state, p.protocol, p.timesend, p.timerecv, p.latency,
       p.tps, p.recspersec, t3.last_log_file - p.sendlsnhigh + 1
  FROM ttrep.reppeers p, ttrep.ttstores t1, ttrep.ttstores t2, sys.monitor t3
    WHERE p.tt_store_id = t2.tt_store_id
      AND t2.is_local_store = 0X01
      AND p.subscriber_id = t1.tt_store_id
      AND (p.state = 0 OR p.state = 1);

< CACHEDB1, TIMESTEN-HOL, 0, 0, 36, 1467270082, 0, -1.00000000000000, -1, -1, 1 >

SELECT ds_obj_owner, DS_OBJ_NAME, t1.tt_store_name,t2.tt_store_name
  FROM ttrep.repelements e, ttrep.repsubscriptions s,
      ttrep.ttstores t1, ttrep.ttstores t2
    WHERE s.element_name = e.element_name
      AND e.master_id = t1.tt_store_id
      AND s.subscriber_id = t2.tt_store_id
    ORDER BY ds_obj_owner, ds_obj_name;

< TTHR                           , A                              , CACHEDB1, CACHEDB2 >
< TTHR                           , A                              , CACHEDB2, CACHEDB1 >
< TTHR                           , A1                             , CACHEDB1, CACHEDB2 >
< TTHR                           , A1                             , CACHEDB2, CACHEDB1 >
< TTHR                           , A2                             , CACHEDB1, CACHEDB2 >
< TTHR                           , A2                             , CACHEDB2, CACHEDB1 >
< TTHR                           , A3                             , CACHEDB1, CACHEDB2 >
< TTHR                           , A3                             , CACHEDB2, CACHEDB1 >
< TTHR                           , __DATASTORE                    , CACHEDB1, CACHEDB2 >
< TTHR                           , __DATASTORE                    , CACHEDB2, CACHEDB1 >
< TTREP                          , CLIENTFAILOVER                 , CACHEDB1, CACHEDB2 >
< TTREP                          , CLIENTFAILOVER                 , CACHEDB2, CACHEDB1 >
cachedb1>

Displaying replicated log records

First, an important concept:

In a replicated database, transactions remain in the transaction log buffer and transaction log files until the master replication agent confirms they have been fully processed by the subscriber. In an active standby pair replication scheme that contains subscribers, transactions remain in the transaction logs until the active master confirms that they are processed by both the standby master and any subscribers. Only then can the active master consider purging them from the log buffer and transaction log files. When the log space is exhausted, subsequent updates on the master database are aborted.

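The point is that the active master can purge log records only after every replica, including the standby and any read-only subscribers, has fully processed them. So when there are many replication tiers or targets, a network failure can have a serious impact on the system.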

Transactions are stored in the log in the form of log records. You can use bookmarks to detect which log records have or have not been replicated by a master database. A bookmark consists of log sequence numbers (LSNs) that identify the location of particular records in the transaction log that you can use to gauge replication performance. The LSNs associated with a bookmark are: hold LSN, last written LSN, and last LSN forced to disk. The hold LSN describes the location of the lowest (or oldest) record held in the log for possible transmission to a subscriber. You can compare the hold LSN with the last written LSN to determine the amount of data in the transaction log that have not yet been transmitted to the subscribers. The last LSN forced to disk describes the last records saved in a transaction log file on disk.

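Viewing through the TTREP.REPPEERS system table

This is only meaningful when viewed from the target side (standby or subscriber).

Here SENDLSNHIGH is actually the log file number, the high-order part of the highest LSN sent, and SENDLSNLOW is the offset within that file.

Even when there is no data to transmit, SENDLSNLOW keeps increasing, presumably because heartbeat acknowledgements are being sent.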

cachedb2> select replication_name, SENDLSNHIGH, SENDLSNLOW, LATENCY, TPS from  TTREP.REPPEERS;
< _ACTIVESTANDBY                 , 3, 23111944, -1.00000000000000, -1 >
< _ACTIVESTANDBY                 , 3, 32391432, -1.00000000000000, -1 >

Viewing through the ttLogHolds procedure

cachedb1> call ttLogHolds;
< 3, 31778816, Checkpoint                    , cachedb1.ds1 >
< 3, 31784960, Checkpoint                    , cachedb1.ds0 >
< 3, 31930632, Replication                   , TIMESTEN-HOL:CACHEDB2 >

cachedb2> call ttLogHolds;
< 3, 31920392, Replication                   , TIMESTEN-HOL:CACHEDB1 >
< 3, 31948800, Checkpoint                    , cachedb2.ds0 >
< 3, 31952896, Checkpoint                    , cachedb2.ds1 >

The first two columns of the output are:

HoldLFN: transaction log file number of the hold

HoldLFO: transaction log file offset of the hold
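
The oldest hold determines how far back the transaction log must be kept, so it is often the one worth finding. The Python sketch below parses ttLogHolds rows as printed by ttIsql and picks the hold with the lowest (HoldLFN, HoldLFO). The names LogHold, parse_log_holds, and oldest_hold are hypothetical, and the labels used for the third and fourth columns are an assumption.

```python
from typing import List, NamedTuple

class LogHold(NamedTuple):
    hold_lfn: int      # transaction log file number of the hold
    hold_lfo: int      # transaction log file offset of the hold
    kind: str          # e.g. Checkpoint or Replication
    description: str   # checkpoint file or host:store of the peer

def parse_log_holds(output: str) -> List[LogHold]:
    """Parse '< lfn, lfo, type, description >' rows; other lines are ignored."""
    holds = []
    for line in output.splitlines():
        line = line.strip()
        if not (line.startswith("<") and line.endswith(">")):
            continue
        fields = [f.strip() for f in line[1:-1].split(",")]
        if len(fields) != 4:
            continue
        holds.append(LogHold(int(fields[0]), int(fields[1]), fields[2], fields[3]))
    return holds

def oldest_hold(holds: List[LogHold]) -> LogHold:
    """Return the hold that pins the oldest log record."""
    return min(holds, key=lambda h: (h.hold_lfn, h.hold_lfo))

# Rows copied from the cachedb1 output above.
output = """\
< 3, 31778816, Checkpoint                    , cachedb1.ds1 >
< 3, 31784960, Checkpoint                    , cachedb1.ds0 >
< 3, 31930632, Replication                   , TIMESTEN-HOL:CACHEDB2 >
"""
holds = parse_log_holds(output)
print(oldest_hold(holds))
print([h for h in holds if h.kind == "Replication"])
```

In this sample the oldest hold is a checkpoint hold, not the replication hold, which suggests replication is not what is pinning the log.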

Viewing through the ttRepAdmin utility

$ ttRepAdmin -dsn cachedb1 -bookmark
Replication hold LSN ...... 3/31697160
Last written LSN .......... 3/31764888
Last LSN forced to disk ... 3/31764480

The following output is more detailed and more complete:
$ ttRepAdmin -showstatus cachedb1

Replication Agent Status as of: 2016-06-30 02:59:44

DSN                         : cachedb1
Process ID                  : 8132 (Started)
Replication Agent Policy    : manual
Host                        : TIMESTEN-HOL
RepListener Port            : 59317 (AUTO)
Last write LSN              : 3.32696584
Last LSN forced to disk     : 3.32696320
Replication hold LSN        : 3.32688392

Replication Peers:
   Name                     : CACHEDB2
   Host                     : TIMESTEN-HOL
   Port                     : 42280 (AUTO) (Connected)
   Replication State        : STARTED
   Communication Protocol   : 36

TRANSMITTER thread(s):
 For                     : CACHEDB2 (track 0)
   Start/Restart count   : 1
   Send LSN              : 3.32694536
   Transactions sent     : 0
   Total packets sent    : 228
   Tick packets sent     : 218
   MIN sent packet size  : 64
   MAX sent packet size  : 155
   AVG sent packet size  : 65
   Last packet sent at   : 02:59:40
   Total Packets received: 227
   MIN rcvd packet size  : 64
   MAX rcvd packet size  : 128
   AVG rcvd packet size  : 118
   Last packet rcvd'd at : 02:59:40
   TXNs Allocated        : 1
   TXNs In Use           : 0
   ACTs Allocated        : 0
   ACTs In Use           : 0
   ACTs Data Allocated   : 0
   Most recent errors (max 5):
     TT16025 in repagent.c (line 1227) at 02:42:55 on 06-30-2016
     TT16285 in transmitter.c (line 1020) at 02:42:55 on 06-30-2016
     TT16999 in transmitter.c (line 1340) at 02:42:55 on 06-30-2016

RECEIVER thread(s):
 For                     : CACHEDB2 (track 0)
   Start/Restart count   : 1
   Transactions received : 0
   Total packets sent    : 221
   Tick packets sent     : 0
   MIN sent packet size  : 64
   MAX sent packet size  : 120
   AVG sent packet size  : 119
   Last packet sent at   : 02:59:43
   Total Packets received: 222
   MIN rcvd packet size  : 64
   MAX rcvd packet size  : 155
   AVG rcvd packet size  : 64
   Last packet rcvd'd at : 02:59:43
   rxWaitCTN             : 0.0
   prevCTN               : 0.0
   STA Blk Data Allocated: 0
   STA Data Allocated    : 0
   Most recent errors (max 5):
     TT16025 in repagent.c (line 1227) at 02:42:56 on 06-30-2016

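Comparing the earlier SENDLSNLOW value (31697160) with the hold LSN here shows that they match, which means the data is fully synchronized.

However, SENDLSNLOW is usually larger than the hold LSN, because acknowledgements are asynchronous. Even when the transmission mode is synchronous, replication acknowledgements are still asynchronous.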

Because replication acknowledgements are asynchronous for better performance, the send LSN can also be some distance behind. Nonetheless, the send LSN for a subscriber is the most accurate value available and is always ahead of the hold LSN.
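
The relationship stated above, that the send LSN should be at or ahead of the hold LSN, can be checked against captured -showstatus text. This Python sketch assumes the "label : file.offset" layout shown in the -showstatus output earlier; find_lsn is a hypothetical helper, and it only looks at the first matching line, so with several TRANSMITTER threads it would need extending.

```python
import re
from typing import Optional, Tuple

def find_lsn(text: str, label: str) -> Optional[Tuple[int, int]]:
    """Return (log file, offset) from the first 'label : file.offset' line, or None."""
    pattern = rf"^\s*{re.escape(label)}\s*:\s*(\d+)\.(\d+)"
    m = re.search(pattern, text, flags=re.MULTILINE)
    if m is None:
        return None
    return (int(m.group(1)), int(m.group(2)))

# Lines copied from the ttRepAdmin -showstatus cachedb1 output above.
showstatus = """\
Last write LSN              : 3.32696584
Last LSN forced to disk     : 3.32696320
Replication hold LSN        : 3.32688392

TRANSMITTER thread(s):
 For                     : CACHEDB2 (track 0)
   Start/Restart count   : 1
   Send LSN              : 3.32694536
"""

hold = find_lsn(showstatus, "Replication hold LSN")
send = find_lsn(showstatus, "Send LSN")
print(hold, send, send >= hold)
```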

The output below is not really meaningful, because cachedb2 has no downstream subscriber.

$ ttRepAdmin -dsn cachedb2 -bookmark
Replication hold LSN ...... 3/32033032
Last written LSN .......... 3/32041224
Last LSN forced to disk ... 3/32040960

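Viewing through the ttBookmark procedure

A bookmark lets you determine whether a log record has been transmitted. It has three elements:

last written LSN: the most recent log record written to the log buffer. The difference between this and the hold LSN is the amount of log not yet transmitted.

last LSN forced to disk: the most recent log record persisted to disk, that is, durably committed.

hold LSN: the oldest log record still held in the log for possible transmission to a subscriber.

The English documentation says: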

Last write LSN - The location of the most recently generated transaction log record for the database

Last LSN forced to disk - The location of the most recent transaction log record written to the disk.

Replication hold LSN - The location of the lowest (or oldest) record held in the log for possible transmission to a subscriber. A value of -1/-1 indicates replication is in the stop state with respect to all subscribers

Note that the Replication hold LSN, the Last write LSN and the Last LSN forced to disk are very close, which indicates that replication is operating satisfactorily. If the Replication hold LSN falls behind the Last write LSN and the Last LSN, then replication is not keeping up with updates to the master.

You can compare the hold LSN with the last written LSN to determine the amount of data in the transaction log that have not yet been transmitted to the subscribers. The last LSN forced to disk describes the last records saved in a transaction log file on disk.

The closer these three values are to each other, the healthier replication is.

cachedb1> call ttBookMark();
< 3, 31758744, 3, 31758336, 3, 31697160 >

cachedb1> call ttBookMark();
< 3, 31910152, 3, 31909888, 3, 31899912 >

cachedb2> call ttBookMark();
< 3, 31957256, 3, 31956992, 3, 31920392 >
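
The six columns are positional, which makes the rows hard to read at a glance. The Python sketch below assumes the column order writeLFN, writeLFO, forceLFN, forceLFO, holdLFN, holdLFO (recalled from the ttBookmark reference; verify for your release) and computes the distance between the last written LSN and the hold LSN. Bookmark, parse_bookmark_row, and untransmitted_bytes are hypothetical names.

```python
from typing import NamedTuple, Optional

class Bookmark(NamedTuple):
    write_lfn: int
    write_lfo: int
    force_lfn: int
    force_lfo: int
    hold_lfn: int
    hold_lfo: int

def parse_bookmark_row(row: str) -> Bookmark:
    """Parse one '< a, b, c, d, e, f >' row printed by ttIsql."""
    fields = [int(f) for f in row.strip().strip("<>").split(",")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    return Bookmark(*fields)

def untransmitted_bytes(b: Bookmark) -> Optional[int]:
    """Offset distance from hold LSN to last written LSN.

    Only defined here when both LSNs are in the same log file; spanning
    files would require the log file size, which this sketch does not know.
    """
    if b.write_lfn != b.hold_lfn:
        return None
    return b.write_lfo - b.hold_lfo

for row in ("< 3, 31758744, 3, 31758336, 3, 31697160 >",
            "< 3, 31910152, 3, 31909888, 3, 31899912 >",
            "< 3, 31957256, 3, 31956992, 3, 31920392 >"):
    print(untransmitted_bytes(parse_bookmark_row(row)))
```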

Now an example of abnormal transmission.

Stop the replication agent on the standby, then observe what happens to replication.

cachedb2> call ttrepstop;
cachedb1> select * from a;
< 1, beijing >
< 2, shanghai >
cachedb1> call ttLogHolds;
< 3, 32315392, Checkpoint                    , cachedb1.ds1 >
< 3, 32823296, Checkpoint                    , cachedb1.ds0 >
< 3, 32993544, Replication                   , TIMESTEN-HOL:CACHEDB2 >

cachedb1> insert into a values(3, 'guangzhou');
cachedb1> call ttLogHolds;
< 3, 32315392, Checkpoint                    , cachedb1.ds1 >
< 3, 32823296, Checkpoint                    , cachedb1.ds0 >
< 3, 32993544, Replication                   , TIMESTEN-HOL:CACHEDB2 >

$ ttRepAdmin -dsn cachedb1 -bookmark
Replication hold LSN ...... 3/32993544
Last written LSN .......... 3/33005832
Last LSN forced to disk ... 3/33005568

cachedb1> insert into a values(4, 'nanjing');
$ ttRepAdmin -dsn cachedb1 -bookmark
Replication hold LSN ...... 3/32993544
Last written LSN .......... 3/33009928
Last LSN forced to disk ... 3/33009664

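As you can see, when replication cannot proceed, the hold LSN stays at 3/32993544. Because the source keeps committing data, the last written LSN keeps growing and its distance from the hold LSN widens, which signals a problem.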
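
The stall pattern described above (hold LSN unchanged while the last written LSN advances) can be detected from two successive ttRepAdmin -bookmark outputs. This is a Python sketch under the assumption that the output keeps the three-line "label ..... file/offset" layout shown here; parse_bookmark and is_stalled are hypothetical names.

```python
import re
from typing import Dict, Tuple

_LINE = re.compile(
    r"^(Replication hold LSN|Last written LSN|Last LSN forced to disk)"
    r"\s*\.+\s*(-?\d+)/(-?\d+)\s*$",
    re.MULTILINE,
)

def parse_bookmark(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each label to its (log file, offset) pair."""
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in _LINE.finditer(text)}

def is_stalled(before: Dict[str, Tuple[int, int]],
               after: Dict[str, Tuple[int, int]]) -> bool:
    """True if the hold LSN did not move while the last written LSN advanced."""
    return (after["Replication hold LSN"] == before["Replication hold LSN"]
            and after["Last written LSN"] > before["Last written LSN"])

# The two samples taken above while the standby agent was stopped.
first = parse_bookmark("""\
Replication hold LSN ...... 3/32993544
Last written LSN .......... 3/33005832
Last LSN forced to disk ... 3/33005568
""")
second = parse_bookmark("""\
Replication hold LSN ...... 3/32993544
Last written LSN .......... 3/33009928
Last LSN forced to disk ... 3/33009664
""")

print(is_stalled(first, second))
# Backlog within log file 3, in bytes of log offset.
print(second["Last written LSN"][1] - second["Replication hold LSN"][1])
```

Two samples are the minimum; in practice one would probably sample at a fixed interval and alert only after several consecutive stalled readings, since a briefly unchanged hold LSN can be normal.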

Using ttRepAdmin to display replication status

$ ttRepAdmin -showstatus cachedb1

Replication Agent Status as of: 2016-06-30 03:30:44

DSN                         : cachedb1
Process ID                  : 8132 (Started)
Replication Agent Policy    : manual
Host                        : TIMESTEN-HOL
RepListener Port            : 59317 (AUTO)
Last write LSN              : 3.33026312
Last LSN forced to disk     : 3.33026048
Replication hold LSN        : 3.32993544

Replication Peers: <- replication target
   Name                     : CACHEDB2
   Host                     : TIMESTEN-HOL
   Port                     : 38045 (AUTO) (Connected)
   Replication State        : STARTED
   Communication Protocol   : 36

TRANSMITTER thread(s): <- on the source database
 For                     : CACHEDB2 (track 0)
   Start/Restart count   : 2
   Send LSN              : 3.32999688 <- important!
   Transactions sent     : 2
   Total packets sent    : 315 <- packets include both heartbeats and transactions
   Tick packets sent     : 294
   MIN sent packet size  : 64
   MAX sent packet size  : 1194
   AVG sent packet size  : 69
   Last packet sent at   : 03:30:43 <- important!
   Total Packets received: 312 <- packets received back from the target database
   MIN rcvd packet size  : 64
   MAX rcvd packet size  : 128
   AVG rcvd packet size  : 117
   Last packet rcvd'd at : 03:30:43
   TXNs Allocated        : 4
   TXNs In Use           : 2
   ACTs Allocated        : 2
   ACTs In Use           : 2
   ACTs Data Allocated   : 416
   Most recent errors (max 5):
     TT16290 in transmitter.c (line 8411) at 03:05:35 on 06-30-2016
     TT16999 in repagent.c (line 1276) at 03:05:35 on 06-30-2016
     TT16025 in repagent.c (line 1227) at 03:05:38 on 06-30-2016
     TT16285 in transmitter.c (line 1020) at 03:05:38 on 06-30-2016
     TT16999 in transmitter.c (line 1340) at 03:05:38 on 06-30-2016

RECEIVER thread(s): <- on the target database
 For                     : CACHEDB2 (track 0)
   Start/Restart count   : 1
   Transactions received : 0
   Total packets sent    : 7
   Tick packets sent     : 0
   MIN sent packet size  : 64
   MAX sent packet size  : 120
   AVG sent packet size  : 98
   Last packet sent at   : 03:30:44
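   Total Packets received: 8 <- acknowledgements received by the source database; this count differs from the TRANSMITTER thread's count because the latter includes heartbeat packets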
   MIN rcvd packet size  : 64
   MAX rcvd packet size  : 155
   AVG rcvd packet size  : 77
   Last packet rcvd'd at : 03:30:44
   rxWaitCTN             : 0.0
   prevCTN               : 0.0
   STA Blk Data Allocated: 0
   STA Data Allocated    : 0
   Most recent errors (max 5):

$ ttRepAdmin -showstatus cachedb2

Replication Agent Status as of: 2016-06-30 03:32:55

DSN                         : cachedb2
Process ID                  : 8626 (Started)
Replication Agent Policy    : manual
Host                        : TIMESTEN-HOL
RepListener Port            : 38045 (AUTO)
Last write LSN              : 3.33278216
Last LSN forced to disk     : 3.33277952
Replication hold LSN        : 3.33274120

Replication Peers:
   Name                     : CACHEDB1
   Host                     : TIMESTEN-HOL
   Port                     : 59317 (AUTO) (Connected)
   Replication State        : STARTED
   Communication Protocol   : 36

TRANSMITTER thread(s): <- on the source database
 For                     : CACHEDB1 (track 0)
   Start/Restart count   : 1
   Send LSN              : 3.33274120
   Transactions sent     : 0
   Total packets sent    : 35
   Tick packets sent     : 31
   MIN sent packet size  : 64
   MAX sent packet size  : 155
   AVG sent packet size  : 67
   Last packet sent at   : 03:32:50
   Total Packets received: 34
   MIN rcvd packet size  : 64
   MAX rcvd packet size  : 120
   AVG rcvd packet size  : 115
   Last packet rcvd'd at : 03:32:50
   TXNs Allocated        : 1
   TXNs In Use           : 0
   ACTs Allocated        : 0
   ACTs In Use           : 0
   ACTs Data Allocated   : 0
   Most recent errors (max 5):
     TT16025 in repagent.c (line 1227) at 03:30:41 on 06-30-2016
     TT16285 in transmitter.c (line 1020) at 03:30:41 on 06-30-2016
     TT16999 in transmitter.c (line 1340) at 03:30:41 on 06-30-2016

RECEIVER thread(s):
 For                     : CACHEDB1 (track 0)
   Start/Restart count   : 1
   Transactions received : 2
   Total packets sent    : 39
   Tick packets sent     : 0
   MIN sent packet size  : 64
   MAX sent packet size  : 128
   AVG sent packet size  : 110
   Last packet sent at   : 03:32:53
   Total Packets received: 47
   MIN rcvd packet size  : 64
   MAX rcvd packet size  : 298
   AVG rcvd packet size  : 83
   Last packet rcvd'd at : 03:32:53
   rxWaitCTN             : 0.0
   prevCTN               : 0.0
   STA Blk Data Allocated: 64
   STA Data Allocated    : 8192
   Most recent errors (max 5):
     TT16025 in repagent.c (line 1227) at 03:30:43 on 06-30-2016

Checking the status of the return service

Checking whether the return service has been disabled

cachedb1> CALL ttRepSyncSubscriberStatus ('cachedb2');
< 0 >
cachedb2> call ttrepstop;
cachedb1> CALL ttRepSyncSubscriberStatus ('cachedb2');
< 0 >

cachedb1> call ttrepstop;
cachedb1> alter active standby pair alter store cachedb2 set RETURN SERVICES OFF WHEN REPLICATION STOPPED;
cachedb1> call ttrepstart;
cachedb1> CALL ttRepSyncSubscriberStatus ('cachedb2');
< 0 >

$ ttRepAdmin -receiver -name cachedb2 -state stop cachedb1
Cannot set a receiver state to STOP in an Active Standby scheme

CALL ttRepSubscriberStateSet( , , , , 1 );
cachedb1> CALL ttRepSubscriberStateSet( , , , , 2 );
17037: The receiver state in an ACTIVE STANDBY scheme cannot be set to STOP

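See the Replication Guide, p. 65, section "Setting the transaction log failure threshold".

A result of 0 means the return service has not been disabled. Even when the replication agent is stopped, the return service is not necessarily disabled.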

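Checking the status of the most recent return service transaction

First obtain a token with ttRepXactTokenGet, then pass the token to ttRepXactStatus.

Because the scheme was previously created with NO RETURN, the call fails:

cachedb1> call ttRepXactTokenGet('RR'); <- RR means return receipt, R2 means return twosafe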

8187: A Return receipt transaction has not been executed on this connection

So the active standby pair has to be re-created with RETURN RECEIPT:

cachedb1> create active standby pair cachedb1, cachedb2 return receipt;

cachedb1> call ttrepstart;

cachedb1> call ttrepstateget;

< IDLE, NO GRID >

cachedb1> call ttrepstateset('active');

$ ttRepAdmin -duplicate -from cachedb1 -host timesten-hol -uid repadmin -pwd timesten cachedb2
$ ttisql -v1 -e "set prompt 'cachedb2> '" "dsn=cachedb2;uid=tthr;pwd=timesten;oraclepwd=oracle"

cachedb2> call ttrepstart;

12026: The agent is already running for the data store.

cachedb2> call ttrepstateget;

< STANDBY, NO GRID >

cachedb1> insert into a values(1, 'beijing');

cachedb1> call ttRepXactTokenGet('RR');

< 7EEAF21F7BC2E405D312D1B96D877A88C80100000100000000000000000000009CE97457000000000A0B0000000000009CE97457000000000A0B00000000000080467447436D6CCA00000000000000000000000000000000 >

The token looks odd, and it is not obvious how to use it directly.

cachedb1> call ttRepXactStatus;

< [email protected] , AP, >

Meaning of the returned states:

'NS' - Transaction not sent to the subscriber.

'RC' - Transaction received by the subscriber agent.

'CT' - Transaction applied at the subscriber store. (Does not convey whether the transaction ran into an error when being applied.)

'AP' - Transaction has been durably applied on the subscriber.
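
In a monitoring script these two-letter codes are easier to handle as a lookup table. The Python sketch below simply restates the four states listed above; RETURN_STATES and describe_return_state are hypothetical names.

```python
# State codes and meanings as listed above for ttRepXactStatus.
RETURN_STATES = {
    "NS": "Transaction not sent to the subscriber.",
    "RC": "Transaction received by the subscriber agent.",
    "CT": "Transaction applied at the subscriber store.",
    "AP": "Transaction has been durably applied on the subscriber.",
}

def describe_return_state(code: str) -> str:
    """Translate a state code into its description; unknown codes are reported as such."""
    return RETURN_STATES.get(code.strip().upper(), "unknown state")

print(describe_return_state("AP"))
print(describe_return_state(" ns "))
```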

Analyzing untransmitted transactions in the log

When replication is abnormal:

cachedb2> call ttrepstop;
cachedb1> insert into a values(2, 'shanghai');
Warning  8170: Receipt or commit acknowledgement not returned in the specified timeout interval for XID:1.234

Note this XID: 1.234

$ ttXactLog -v1 -logAnalyze cachedb1
Summary:
Total transactions left to replicate: 1
Total rows left to replicate: 1
Size of transactions left to replicate: 520.00 B
Size of rows left to replicate: 166.00 B
Total inserts remaining: 1

Start LSN = 4.3856648
End   LSN = 4.3868936

$ ttXactLog -v2 -logAnalyze cachedb1
Track analysis for track number: 0
Transactions left to replicate: 1
Rows left to replicate: 1
Size of transactions left to replicate: 520.00 B
Size of rows left to replicate: 166.00 B
Total inserts remaining: 1

Summary:
Total transactions left to replicate: 1
Total rows left to replicate: 1
Size of transactions left to replicate: 520.00 B
Size of rows left to replicate: 166.00 B
Total inserts remaining: 1

Start LSN = 4.3856648
End   LSN = 4.3868936

$ ttXactLog -v3 -logAnalyze cachedb1
Transaction id: 1.234
Track for this xid: 0
Logmarker before this xid: 3003
Rows left to replicate: 1
Transaction size: 520.00 B
Size of rows left: 166.00 B
Total inserts remaining: 1

Track analysis for track number: 0
Transactions left to replicate: 1
Rows left to replicate: 1
Size of transactions left to replicate: 520.00 B
Size of rows left to replicate: 166.00 B
Total inserts remaining: 1

Summary:
Total transactions left to replicate: 1
Total rows left to replicate: 1
Size of transactions left to replicate: 520.00 B
Size of rows left to replicate: 166.00 B
Total inserts remaining: 1

Start LSN = 4.3856648
End   LSN = 4.3868936

$ ttXactLog -logAnalyze -xid 1.234 cachedb1
Summary:
Total transactions left to replicate: 1
Total rows left to replicate: 1
Size of transactions left to replicate: 520.00 B
Size of rows left to replicate: 166.00 B
Total inserts remaining: 1

Start LSN = 4.3856648
End   LSN = 4.3868936

When replication is normal:

cachedb2> call ttrepstart;

$ ttXactLog -logAnalyze -xid 1.234 cachedb1
Summary:
Total transactions left to replicate: 0
Total rows left to replicate: 0
Size of transactions left to replicate: 0.00 B
Size of rows left to replicate: 0.00 B

Start LSN = 4.3877128
End   LSN = 4.3881224

$ ttXactLog -logAnalyze -xid 1.234 cachedb1
Summary:
Total transactions left to replicate: 0
Total rows left to replicate: 0
Size of transactions left to replicate: 0.00 B
Size of rows left to replicate: 0.00 B

Start LSN = 4.3885600
End   LSN = 4.3889416

$ ttXactLog -v3 -logAnalyze cachedb1
Track analysis for track number: 0
Transactions left to replicate: 0
Rows left to replicate: 0
Size of transactions left to replicate: 0.00 B
Size of rows left to replicate: 0.00 B

Summary:
Total transactions left to replicate: 0
Total rows left to replicate: 0
Size of transactions left to replicate: 0.00 B
Size of rows left to replicate: 0.00 B

Start LSN = 4.3920136
End   LSN = 4.3922184
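
The difference between the abnormal and normal cases comes down to the counters in the "Summary:" block. The Python sketch below extracts them from captured ttXactLog -logAnalyze output, assuming the "Total ...: N" line format shown above; parse_log_analyze_summary and fully_replicated are hypothetical names.

```python
import re
from typing import Dict

def parse_log_analyze_summary(text: str) -> Dict[str, int]:
    """Return the integer 'Total ...' counters from the Summary block."""
    # Only look at what follows 'Summary:' so per-track lines are ignored.
    summary = text.split("Summary:", 1)[1] if "Summary:" in text else text
    counters = {}
    for line in summary.splitlines():
        m = re.match(r"\s*(Total [\w ]+?):\s*(\d+)\s*$", line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters

def fully_replicated(counters: Dict[str, int]) -> bool:
    """True when the summary reports no transactions left to replicate."""
    return counters.get("Total transactions left to replicate") == 0

abnormal = parse_log_analyze_summary("""\
Summary:
Total transactions left to replicate: 1
Total rows left to replicate: 1
Size of transactions left to replicate: 520.00 B
Size of rows left to replicate: 166.00 B
Total inserts remaining: 1
""")
normal = parse_log_analyze_summary("""\
Summary:
Total transactions left to replicate: 0
Total rows left to replicate: 0
Size of transactions left to replicate: 0.00 B
Size of rows left to replicate: 0.00 B
""")

print(abnormal, fully_replicated(abnormal))
print(normal, fully_replicated(normal))
```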

Time: 2024-10-18 03:54:14
