How do you index a Microsoft (Office) Word 2007 document?
Source: How To Index a Microsoft (Office) Word Document 2007 ? (Doc ID 752710.1)
Applies to:
Oracle Text - Version: 11.1.0.7 to 11.2.0.3 - Release: 11.1 to 11.2
Information in this document applies to any platform.
Goal
This note explains how to index a BLOB column in a table when that column holds Microsoft Word 2007 documents (the new Microsoft format, DOCX).
Starting with Oracle Database 11.1.0.7, Oracle Text uses Oracle Outside In HTML Export technology for document filtering. (Side note: Oracle Outside In HTML Export belongs to the Oracle product line Middleware > Content Management > Oracle Outside In Technology.) It replaces the filtering technology that Autonomy, Inc. licensed to Oracle.
As a result, Microsoft (Office) Word 2007 documents can be indexed from Oracle Database 11.1.0.7 onward.
Refer to Appendix B of the Oracle Text Reference for the complete list of document formats supported by the filter in 11.1.0.7:
Oracle Text Reference 11g Release 1 (11.1)
Part Number B28304-03
http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/afilsupt.htm#i634493
B.2 Supported Document Formats
Solution:
Follow the steps below to make a Microsoft Word 2007 document searchable.
Step 1 - Place all the files used in this note in the /tmp directory.
docx1.sql docx2.sql test.txt test.docx
-- The four files above have been uploaded to CSDN resources at the following address:
http://download.csdn.net/download/msdnchina/9480052
Step 2 - Create the necessary schema and privileges
connect system/manager (or connect as any other privileged user)
create user testdocx identified by testdocx;
grant connect, resource, create any directory to testdocx;
connect testdocx/testdocx
Step 3 - Create the necessary objects (refer to the docx1.sql script)...
SQL> @/tmp/docx1.sql
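The contents of docx1.sql are not reproduced in this note. As a rough sketch only, a script of this kind would typically create a directory object, a table with a BLOB column, load the two files, and build a CONTEXT index. The names docx_dir, docx_tab and docx_idx are hypothetical, and the sketch assumes the files sit in /tmp on the database server.

```sql
-- Hypothetical sketch of what docx1.sql might do; not the original script.
-- Assumption: test.txt and test.docx are readable in /tmp on the DB server.

CREATE OR REPLACE DIRECTORY docx_dir AS '/tmp';

CREATE TABLE docx_tab (
  id       NUMBER PRIMARY KEY,
  filename VARCHAR2(100),
  doc      BLOB
);

DECLARE
  -- Load one file from the directory object into a new row's BLOB column.
  PROCEDURE load_file(p_id NUMBER, p_name VARCHAR2) IS
    l_bfile BFILE := BFILENAME('DOCX_DIR', p_name);
    l_blob  BLOB;
  BEGIN
    INSERT INTO docx_tab (id, filename, doc)
    VALUES (p_id, p_name, EMPTY_BLOB())
    RETURNING doc INTO l_blob;

    DBMS_LOB.FILEOPEN(l_bfile, DBMS_LOB.FILE_READONLY);
    DBMS_LOB.LOADFROMFILE(l_blob, l_bfile, DBMS_LOB.GETLENGTH(l_bfile));
    DBMS_LOB.FILECLOSE(l_bfile);
  END load_file;
BEGIN
  load_file(1, 'test.txt');
  load_file(2, 'test.docx');
  COMMIT;
END;
/

-- AUTO_FILTER detects the document format and converts it to indexable text.
CREATE INDEX docx_idx ON docx_tab (doc)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER CTXSYS.AUTO_FILTER');
```

Naming the filter explicitly is a readability choice here: for a BLOB column Oracle Text already defaults to AUTO_FILTER, so the PARAMETERS clause could probably be omitted.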
Step 4 - Check a couple of terms inside the documents (refer to the docx2.sql script) ...
SQL> @/tmp/docx2.sql
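The contents of docx2.sql are likewise not shown here. A check of this kind usually comes down to a CONTAINS query against the indexed column, as in the sketch below. It assumes a hypothetical table docx_tab (id, filename, doc BLOB) with a CONTEXT index on doc, and the search term 'oracle' is a placeholder for a word actually present in the test files.

```sql
-- Hypothetical sketch of what docx2.sql might do; not the original script.
-- Assumption: docx_tab(id, filename, doc) has a CONTEXT index on doc.

-- Rows whose document contains the placeholder term, best matches first.
SELECT id, filename, SCORE(1) AS relevance
FROM   docx_tab
WHERE  CONTAINS(doc, 'oracle', 1) > 0
ORDER  BY SCORE(1) DESC;

-- Any document the filter could not process is reported here.
SELECT err_index_name, err_textkey, err_text
FROM   ctx_user_index_errors;
```

If the DOCX row is missing from the first result while the plain-text row is returned, the second query is a reasonable place to look for a filtering error.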