The original XML text is as follows:
<?xml version="1.0" encoding="utf-8"?>
<Message>
<Header>
<Version>2000000</Version>
<MessageClass>5</MessageClass>
<MessageType>7</MessageType>
<SenderId>9999999964020001</SenderId>
<ReceiverId>9999999964011001</ReceiverId>
<MessageId>3280260</MessageId>
</Header>
<Body ContentType="1">
<ClearTargetDate>2017-03-22</ClearTargetDate>
<ServiceProviderId>9999999934030001</ServiceProviderId>
<IssuerId>9999999964011001</IssuerId>
<MessageId>406843026</MessageId>
<Count>1</Count>
<Amount>110.00</Amount>
<Transaction>
<TransId>1</TransId>
<Time>2017-03-21T20:40:36</Time>
<Fee>110.00</Fee>
<Service>
<ServiceType>1</ServiceType>
<Description>曹庄|宿州</Description>
<Detail>1|04|3401|804|33|20170321 204036|03|3401|1105|1|20170321 182056</Detail>
</Service>
<ICCard>
<CardType>22</CardType>
<NetNo>6401</NetNo>
<CardId>1638220100098530</CardId>
<License>宁B63222</License>
<TransNo>104</TransNo>
<PreBalance>2157.60</PreBalance>
<PostBalance>2047.60</PostBalance>
</ICCard>
<Validation>
<TAC>9439DAD2</TAC>
<TransType>09</TransType>
<TerminalNo>0134000030BC</TerminalNo>
<TerminalTransNo>0018002D</TerminalTransNo>
</Validation>
<OBU>
<NetNo>C4FE</NetNo>
<OBUId>0000000200031918</OBUId>
<OBEState>0001</OBEState>
<License>宁B63222</License>
</OBU>
</Transaction>
</Body>
</Message>
The values inside the Transaction tag now need to be converted into the delimiter-separated format below:
1|||2017-03-21T20:40:36|||110.00|||1|||曹庄|宿州|||1|04|3401|804|33|20170321204036|03|3401|1105|1|20170321182056||||||22|||6401|||1638220100098530|||宁B63222|||104|||2157.60|||2047.60||||||9439DAD2|||09|||0134000030BC|||0018002D||||||C4FE|||0000000200031918|||0001|||宁B63222|||
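These are the steps I carried out.
1. Replace the newlines so that the whole XML file becomes a single line of text, and redirect the output to a file named 1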
cat ***.xml | tr "\n" " " > 1
The result is as follows:
<?xml version="1.0" encoding="utf-8"?> <Message> <Header> <Version>2000000</Version> <MessageClass>5</MessageClass> <MessageType>7</MessageType> <SenderId>9999999964020001</SenderId> <ReceiverId>9999999964011001</ReceiverId> <MessageId>3280260</MessageId> </Header> <Body ContentType="1"> <ClearTargetDate>2017-03-22</ClearTargetDate> <ServiceProviderId>9999999934030001</ServiceProviderId> <IssuerId>9999999964011001</IssuerId> <MessageId>406843026</MessageId> <Count>1</Count> <Amount>110.00</Amount> <Transaction> <TransId>1</TransId> <Time>2017-03-21T20:40:36</Time> <Fee>110.00</Fee> <Service> <ServiceType>1</ServiceType> <Description>曹庄|宿州</Description> <Detail>1|04|3401|804|33|20170321 204036|03|3401|1105|1|20170321 182056</Detail> </Service> <ICCard> <CardType>22</CardType> <NetNo>6401</NetNo> <CardId>1638220100098530</CardId> <License>宁B63222</License> <TransNo>104</TransNo> <PreBalance>2157.60</PreBalance> <PostBalance>2047.60</PostBalance> </ICCard> <Validation> <TAC>9439DAD2</TAC> <TransType>09</TransType> <TerminalNo>0134000030BC</TerminalNo> <TerminalTransNo>0018002D</TerminalTransNo> </Validation> <OBU> <NetNo>C4FE</NetNo> <OBUId>0000000200031918</OBUId> <OBEState>0001</OBEState> <License>宁B63222</License> </OBU> </Transaction> </Body> </Message>
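The flattening step can be tried on a tiny input first. The sketch below is only an illustration: the three-line sample and the variable names `sample` and `flat` are made up, and `mktemp` is assumed to be available. It shows that `tr` turns each newline into one space and leaves any indentation in place, which is why a space-removal step is needed afterwards.

```shell
# Hypothetical three-line sample standing in for the real XML file.
sample=$(mktemp)
printf '<Transaction>\n  <TransId>1</TransId>\n</Transaction>\n' > "$sample"

# tr maps every newline to a space; reading through a redirection
# gives the same result as piping the file in with cat.
flat=$(tr '\n' ' ' < "$sample")

# The two indentation spaces survive next to the converted newline,
# and the final newline becomes a trailing space.
printf '[%s]\n' "$flat"

rm -f "$sample"
```

The brackets in the last `printf` are there only to make the trailing space visible.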
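2. Remove all spaces. This also strips the spaces inside values and tags, so 20170321 204036 in Detail becomes 20170321204036, as the target format requires, and <Body ContentType="1"> collapses to <BodyContentType="1">
sed 's/ //g' 1 > 2
The result is as follows: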
<?xmlversion="1.0"encoding="utf-8"?><Message><Header><Version>2000000</Version><MessageClass>5</MessageClass><MessageType>7</MessageType><SenderId>9999999964020001</SenderId><ReceiverId>9999999964011001</ReceiverId><MessageId>3280260</MessageId></Header><BodyContentType="1"><ClearTargetDate>2017-03-22</ClearTargetDate><ServiceProviderId>9999999934030001</ServiceProviderId><IssuerId>9999999964011001</IssuerId><MessageId>406843026</MessageId><Count>1</Count><Amount>110.00</Amount><Transaction><TransId>1</TransId><Time>2017-03-21T20:40:36</Time><Fee>110.00</Fee><Service><ServiceType>1</ServiceType><Description>曹庄|宿州</Description><Detail>1|04|3401|804|33|20170321204036|03|3401|1105|1|20170321182056</Detail></Service><ICCard><CardType>22</CardType><NetNo>6401</NetNo><CardId>1638220100098530</CardId><License>宁B63222</License><TransNo>104</TransNo><PreBalance>2157.60</PreBalance><PostBalance>2047.60</PostBalance></ICCard><Validation><TAC>9439DAD2</TAC><TransType>09</TransType><TerminalNo>0134000030BC</TerminalNo><TerminalTransNo>0018002D</TerminalTransNo></Validation><OBU><NetNo>C4FE</NetNo><OBUId>0000000200031918</OBUId><OBEState>0001</OBEState><License>宁B63222</License></OBU></Transaction></Body></Message>
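Deleting every space is what this conversion wants, but it is a blunt tool. As a rough comparison, the sketch below runs the same command on a shortened, made-up line and then shows an alternative that removes only the spaces sitting between tags. The alternative is not part of the original steps; it is an assumption about what one might want if spaces inside values had to be preserved.

```shell
# Shortened, hypothetical line in the shape produced by step 1.
line='<Body ContentType="1"> <Detail>20170321 204036</Detail> </Body>'

# The approach used in step 2: every space goes, including those inside values.
all=$(printf '%s\n' "$line" | sed 's/ //g')

# Alternative sketch: delete only runs of spaces between a '>' and the next '<'.
between=$(printf '%s\n' "$line" | sed 's/> *</></g')

printf '%s\n%s\n' "$all" "$between"
# <BodyContentType="1"><Detail>20170321204036</Detail></Body>
# <Body ContentType="1"><Detail>20170321 204036</Detail></Body>
```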
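3. Remove the unneeded XML at the head and tail, keeping only the content inside the Transaction tag (the closing </OBU> is dropped along with the tail)
sed 's/.*<Transaction>//g;s/<\/OBU>.*<\/Message>//g' 2 > 3
The result is as follows: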
<TransId>1</TransId><Time>2017-03-21T20:40:36</Time><Fee>110.00</Fee><Service><ServiceType>1</ServiceType><Description>曹庄|宿州</Description><Detail>1|04|3401|804|33|20170321204036|03|3401|1105|1|20170321182056</Detail></Service><ICCard><CardType>22</CardType><NetNo>6401</NetNo><CardId>1638220100098530</CardId><License>宁B63222</License><TransNo>104</TransNo><PreBalance>2157.60</PreBalance><PostBalance>2047.60</PostBalance></ICCard><Validation><TAC>9439DAD2</TAC><TransType>09</TransType><TerminalNo>0134000030BC</TerminalNo><TerminalTransNo>0018002D</TerminalTransNo></Validation><OBU><NetNo>C4FE</NetNo><OBUId>0000000200031918</OBUId><OBEState>0001</OBEState><License>宁B63222</License>
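One property of this step is worth checking on a small input: `.*` in sed is greedy, so `.*<Transaction>` consumes everything up to the last `<Transaction>` on the line. The message here has a Count of 1, so that is harmless. The sketch below uses made-up, shortened lines (the names `one`, `two`, `inner` and `last_only` are hypothetical) to show what would be kept if a message carried two transactions.

```shell
# Shortened single-transaction line: behaves as in step 3.
one='<Message><Body><Transaction><TransId>1</TransId><OBU><OBEState>0001</OBEState></OBU></Transaction></Body></Message>'
inner=$(printf '%s\n' "$one" | sed 's/.*<Transaction>//;s/<\/OBU>.*<\/Message>//')
printf '%s\n' "$inner"
# <TransId>1</TransId><OBU><OBEState>0001</OBEState>

# Two transactions on one line: the greedy match discards the first one entirely.
two='<Transaction><TransId>1</TransId></Transaction><Transaction><TransId>2</TransId></Transaction>'
last_only=$(printf '%s\n' "$two" | sed 's/.*<Transaction>//')
printf '%s\n' "$last_only"
# <TransId>2</TransId></Transaction>
```

For files with more than one Transaction element, the steps in this post would therefore need adjusting, for example by splitting the line per transaction before trimming.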
4. Replace each closing tag </***> with |||
sed 's/<\/[^>]*>/|||/g' 3 > 4
The result is as follows:
<TransId>1|||<Time>2017-03-21T20:40:36|||<Fee>110.00|||<Service><ServiceType>1|||<Description>曹庄|宿州|||<Detail>1|04|3401|804|33|20170321204036|03|3401|1105|1|20170321182056||||||<ICCard><CardType>22|||<NetNo>6401|||<CardId>1638220100098530|||<License>宁B63222|||<TransNo>104|||<PreBalance>2157.60|||<PostBalance>2047.60||||||<Validation><TAC>9439DAD2|||<TransType>09|||<TerminalNo>0134000030BC|||<TerminalTransNo>0018002D||||||<OBU><NetNo>C4FE|||<OBUId>0000000200031918|||<OBEState>0001|||<License>宁B63222|||
5. Remove the opening tags <***>
sed 's/<[^>]*>//g' 4 > 5
The result is as follows:
1|||2017-03-21T20:40:36|||110.00|||1|||曹庄|宿州|||1|04|3401|804|33|20170321204036|03|3401|1105|1|20170321182056||||||22|||6401|||1638220100098530|||宁B63222|||104|||2157.60|||2047.60||||||9439DAD2|||09|||0134000030BC|||0018002D||||||C4FE|||0000000200031918|||0001|||宁B63222|||
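The order of steps 4 and 5 matters, because the pattern `<[^>]*>` also matches closing tags. The sketch below, on a shortened made-up line, runs the two substitutions in both orders; the variable names `right` and `wrong` are only for illustration.

```shell
# Shortened, hypothetical line in the shape produced by step 3.
line='<Fee>110.00</Fee><Service><ServiceType>1</ServiceType></Service>'

# Closing tags first, then whatever tags remain (the order used in steps 4 and 5).
right=$(printf '%s\n' "$line" | sed 's/<\/[^>]*>/|||/g;s/<[^>]*>//g')

# Reversed order: the generic pattern removes closing tags too,
# so the second expression finds nothing to turn into a delimiter.
wrong=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g;s/<\/[^>]*>/|||/g')

printf '%s\n%s\n' "$right" "$wrong"
# 110.00|||1||||||
# 110.001
```

The six bars at the end of the first output come from two adjacent closing tags, `</ServiceType></Service>`, which is also why the full result contains `||||||` at the end of each nested group.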
That completes the conversion.
Putting all the commands together:
cat ***.xml | tr "\n" " " > 1
sed 's/ //g;s/.*<Transaction>//g;s/<\/OBU>.*<\/Message>//g;s/<\/[^>]*>/|||/g;s/<[^>]*>//g' 1 > 2
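The intermediate file can also be avoided by piping `tr` straight into `sed`. What follows is a minimal sketch under a few assumptions: `xml_to_fields` is a hypothetical helper name, the sample file is a cut-down stand-in for the real message, and `mktemp` is assumed to be available. The sed expressions are the same as above, minus the `g` flags that are not needed on patterns that can match only once per line.

```shell
# Hypothetical helper: flatten the file given as $1 and print the delimited fields.
xml_to_fields() {
  tr '\n' ' ' < "$1" |
    sed 's/ //g;s/.*<Transaction>//;s/<\/OBU>.*<\/Message>//;s/<\/[^>]*>/|||/g;s/<[^>]*>//g'
}

# Cut-down sample message with the same nesting as the real one.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<Message>
  <Body ContentType="1">
    <Transaction>
      <TransId>1</TransId>
      <Fee>110.00</Fee>
      <OBU>
        <OBEState>0001</OBEState>
      </OBU>
    </Transaction>
  </Body>
</Message>
EOF

out=$(xml_to_fields "$sample")
printf '%s\n' "$out"
# 1|||110.00|||0001|||

rm -f "$sample"
```

For anything beyond a fixed, single-transaction layout like this one, a real XML parser is likely the safer choice, since regular expressions do not track nesting.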