Android中解析读取复杂word,excel,ppt等的方法

前段时间在尝试做一个Android里的万能播放器,能播放各种格式的软件,其中就涉及到了最常用的office软件。查阅了下资料,发现Android中最传统的直接解析读取word,excel的方法主要用了java里第三方包,比如利用tm-extractors-0.4.jar和jxl.jar等,下面附上代码和效果图。

读取word用了tm-extractors-0.4.jar包,代码如下:

package com.example.readword;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;

import org.textmining.text.extraction.WordExtractor;

import android.app.Activity;
import android.os.Bundle;
import android.os.Environment;
import android.widget.TextView;

public class MainActivity extends Activity {
    /** Called when the activity is first created. */

    private TextView text;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        text = (TextView) findViewById(R.id.text);

        String str = readWord("/storage/emulated/0/ArcGIS/localtilelayer/11.doc");
        text.setText(str.trim().replace("", ""));
    }

    public String readWord(String file){
        //创建输入流用来读取doc文件
        FileInputStream in;
        String text = null;
        try {
            in = new FileInputStream(new File(file));
            WordExtractor extractor = null;
            //创建WordExtractor
            extractor = new WordExtractor();
            //进行提取对doc文件
            text = extractor.extractText(in);
        }
        catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
        return text;
    }
}

效果图如下:

只是从网上随便下载的一个文档,我们可以看出,虽然能读取,但是格式的效果并不佳,而且只能读取doc,不能读取docx格式,也不能读取doc里的图片。另外就是加入使用WPF打开过这个doc的话,将无法再次读取(对于只安装WPF的我简直是个灾难)

然后是用jxl读取excel的代码,这个代码不是很齐,就写了个解析的,将excel里每行每列都解析了出来,然后自己可以重新再编辑,代码如下:

package com.readexl;

import java.io.FileInputStream;
import java.io.InputStream;

import android.os.Bundle;
import android.os.Environment;
import android.app.Activity;
import android.text.method.ScrollingMovementMethod;
import android.view.Menu;
import android.widget.TextView;

import jxl.*;

public class MainActivity extends Activity {
    TextView txt = null;
    public String filePath_xls = Environment.getExternalStorageDirectory()
            + "/case.xls";

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        txt = (TextView)findViewById(R.id.txt_show);
        txt.setMovementMethod(ScrollingMovementMethod.getInstance());
        readExcel();
    }

    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        // Inflate the menu; this adds items to the action bar if it is present.
        getMenuInflater().inflate(R.menu.main, menu);
        return true;
    }

    public void readExcel() {
        try {
            /**
             * 后续考虑问题,比如Excel里面的图片以及其他数据类型的读取
             **/

            InputStream is = new FileInputStream(filePath_xls);
            //Workbook book = Workbook.getWorkbook(new File("mnt/sdcard/test.xls"));
            Workbook book = Workbook.getWorkbook(is);

            int num = book.getNumberOfSheets();
            txt.setText("the num of sheets is " + num+ "\n");
            // 获得第一个工作表对象
            Sheet sheet = book.getSheet(0);
            int Rows = sheet.getRows();
            int Cols = sheet.getColumns();
            txt.append("the name of sheet is " + sheet.getName() + "\n");
            txt.append("total rows is " + Rows + "\n");
            txt.append("total cols is " + Cols + "\n");
            for (int i = 0; i < Cols; ++i) {
                for (int j = 0; j < Rows; ++j) {
                    // getCell(Col,Row)获得单元格的值
                    txt.append("contents:" + sheet.getCell(i,j).getContents() + "\n");
                }
            }
            book.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }

}

效果图如下:

好吧,这只是个半成品,不过,这个方法肯定是行得通的。

之前说了这么多,很明白的意思就是我对于这两种方法都不是很满意。在这里,我先说下doc和docx的区别(xls和xlsx,ppt和pptx等区别都和此类似)

众所周知的是doc是03及之前版本word所保存的格式,docx是07版本之后保存的格式,简单的说,在doc中,微软还是用二进制存储方式;在docx中微软开始用xml方式,docx实际上成了一个打包的ZIP压缩文件。doc解压得到的是没有扩展名的文件碎片,而docx解压可以得到一个XML和几个包含信息的文件夹。两者比较的结论就是docx更小,而且要读取图片更容易。(参考http://www.zhihu.com/question/21547795)

好吧,回到正题。如何才能解析各种word,excel等能保留原来格式并且解析里面的图片,表格或附件等内容呢。那当然就是html了!不得不承认html对于页面,表格等展示的效果确是是很强大的,原生很难写出这样的效果。在网上找了诸多的资料,以及各个大神的代码,自己又再此基础上修改了下,实现的效果还不错吧。

利用的包是POI(一堆很强大的包,可以解析几乎所有的office软件,这里以doc,docx,xls,xlsx为例)

读取文件后根据不同文件类型分别进行操作。

public void read() {
    if(!myFile.exists()){
        if (this.nameStr.endsWith(".doc")) {
            this.getRange();
            this.makeFile();
            this.readDOC();
        }
        if (this.nameStr.endsWith(".docx")) {
            this.makeFile();
            this.readDOCX();
        }
        if (this.nameStr.endsWith(".xls")) {
            try {
                this.makeFile();
                this.readXLS();
            } catch (Exception e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        if (this.nameStr.endsWith(".xlsx")) {
            try{
                this.makeFile();
                this.readXLSX();
            }catch (Exception e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
    returnPath = "file:///" + myFile;
    // this.view.loadUrl("file:///" + this.htmlPath);
    System.out.println("htmlPath" + this.htmlPath);

}

先贴上公用的方法,主要是设置生成的html文件保存地址:

public void makeFile() {
    String sdStateString = android.os.Environment.getExternalStorageState();// 获取外部存储状态
    if (sdStateString.equals(android.os.Environment.MEDIA_MOUNTED)) {// 确认sd卡存在,原理不知,媒体安装??
        try {
            File sdFile = android.os.Environment
                    .getExternalStorageDirectory();// 获取扩展设备的文件目录
            String path = sdFile.getAbsolutePath() + File.separator
                    + "library";// 得到sd卡(扩展设备)的绝对路径+"/"+xiao
            File dirFile = new File(path);// 获取xiao文件夹地址
            if (!dirFile.exists()) {// 如果不存在
                dirFile.mkdir();// 创建目录
            }
            File myFile = new File(path + File.separator +filename+ ".html");// 获取my.html的地址
            if (!myFile.exists()) {// 如果不存在
                myFile.createNewFile();// 创建文件
            }
            this.htmlPath = myFile.getAbsolutePath();// 返回路径
        } catch (Exception e) {
        }
    }
}

然后是读取doc:

private void getRange() {
    FileInputStream in = null;
    POIFSFileSystem pfs = null;

    try {
        in = new FileInputStream(nameStr);
        pfs = new POIFSFileSystem(in);
        hwpf = new HWPFDocument(pfs);
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    range = hwpf.getRange();

    pictures = hwpf.getPicturesTable().getAllPictures();

    tableIterator = new TableIterator(range);

}
public void readDOC() {

    try {
        myFile = new File(htmlPath);
        output = new FileOutputStream(myFile);
        presentPicture=0;
        String head = "<html><meta charset=\"utf-8\"><body>";
        String tagBegin = "<p>";
        String tagEnd = "</p>";
        output.write(head.getBytes());
        int numParagraphs = range.numParagraphs();// 得到页面所有的段落数
        for (int i = 0; i < numParagraphs; i++) { // 遍历段落数
            Paragraph p = range.getParagraph(i); // 得到文档中的每一个段落
            if (p.isInTable()) {
                int temp = i;
                if (tableIterator.hasNext()) {
                    String tableBegin = "<table style=\"border-collapse:collapse\" border=1 bordercolor=\"black\">";
                    String tableEnd = "</table>";
                    String rowBegin = "<tr>";
                    String rowEnd = "</tr>";
                    String colBegin = "<td>";
                    String colEnd = "</td>";
                    Table table = tableIterator.next();
                    output.write(tableBegin.getBytes());
                    int rows = table.numRows();
                    for (int r = 0; r < rows; r++) {
                        output.write(rowBegin.getBytes());
                        TableRow row = table.getRow(r);
                        int cols = row.numCells();
                        int rowNumParagraphs = row.numParagraphs();
                        int colsNumParagraphs = 0;
                        for (int c = 0; c < cols; c++) {
                            output.write(colBegin.getBytes());
                            TableCell cell = row.getCell(c);
                            int max = temp + cell.numParagraphs();
                            colsNumParagraphs = colsNumParagraphs
                                    + cell.numParagraphs();
                            for (int cp = temp; cp < max; cp++) {
                                Paragraph p1 = range.getParagraph(cp);
                                output.write(tagBegin.getBytes());
                                writeParagraphContent(p1);
                                output.write(tagEnd.getBytes());
                                temp++;
                            }
                            output.write(colEnd.getBytes());
                        }
                        int max1 = temp + rowNumParagraphs;
                        for (int m = temp + colsNumParagraphs; m < max1; m++) {
                            temp++;
                        }
                        output.write(rowEnd.getBytes());
                    }
                    output.write(tableEnd.getBytes());
                }
                i = temp;
            } else {
                output.write(tagBegin.getBytes());
                writeParagraphContent(p);
                output.write(tagEnd.getBytes());
            }
        }
        String end = "</body></html>";
        output.write(end.getBytes());
        output.close();
    } catch (Exception e) {

        System.out.println("readAndWrite Exception:"+e.getMessage());
        e.printStackTrace();
    }
}

读取docx

public void readDOCX() {
    String river = "";
    try {
        this.myFile = new File(this.htmlPath);// new一个File,路径为html文件
        this.output = new FileOutputStream(this.myFile);// new一个流,目标为html文件
        presentPicture=0;
        String head = "<!DOCTYPE><html><meta charset=\"utf-8\"><body>";// 定义头文件,我在这里加了utf-8,不然会出现乱码
        String end = "</body></html>";
        String tagBegin = "<p>";// 段落开始,标记开始?
        String tagEnd = "</p>";// 段落结束
        String tableBegin = "<table style=\"border-collapse:collapse\" border=1 bordercolor=\"black\">";
        String tableEnd = "</table>";
        String rowBegin = "<tr>";
        String rowEnd = "</tr>";
        String colBegin = "<td>";
        String colEnd = "</td>";
        String style = "style=\"";
        this.output.write(head.getBytes());// 写如头部
        ZipFile xlsxFile = new ZipFile(new File(this.nameStr));
        ZipEntry sharedStringXML = xlsxFile.getEntry("word/document.xml");
        InputStream inputStream = xlsxFile.getInputStream(sharedStringXML);
        XmlPullParser xmlParser = Xml.newPullParser();
        xmlParser.setInput(inputStream, "utf-8");
        int evtType = xmlParser.getEventType();
        boolean isTable = false; // 是表格 用来统计 列 行 数
        boolean isSize = false; // 大小状态
        boolean isColor = false; // 颜色状态
        boolean isCenter = false; // 居中状态
        boolean isRight = false; // 居右状态
        boolean isItalic = false; // 是斜体
        boolean isUnderline = false; // 是下划线
        boolean isBold = false; // 加粗
        boolean isR = false; // 在那个r中
        boolean isStyle = false;
        int pictureIndex = 1; // docx 压缩包中的图片名 iamge1 开始 所以索引从1开始
        while (evtType != XmlPullParser.END_DOCUMENT) {
            switch (evtType) {

                // 开始标签
                case XmlPullParser.START_TAG:
                    String tag = xmlParser.getName();

                    if (tag.equalsIgnoreCase("r")) {
                        isR = true;
                    }
                    if (tag.equalsIgnoreCase("u")) { // 判断下划线
                        isUnderline = true;
                    }
                    if (tag.equalsIgnoreCase("jc")) { // 判断对齐方式
                        String align = xmlParser.getAttributeValue(0);
                        if (align.equals("center")) {
                            this.output.write("<center>".getBytes());
                            isCenter = true;
                        }
                        if (align.equals("right")) {
                            this.output.write("<div align=\"right\">"
                                    .getBytes());
                            isRight = true;
                        }
                    }

                    if (tag.equalsIgnoreCase("color")) { // 判断颜色

                        String color = xmlParser.getAttributeValue(0);

                        this.output
                                .write(("<span style=\"color:" + color + ";\">")
                                        .getBytes());
                        isColor = true;
                    }
                    if (tag.equalsIgnoreCase("sz")) { // 判断大小
                        if (isR == true) {
                            int size = decideSize(Integer.valueOf(xmlParser
                                    .getAttributeValue(0)));
                            this.output.write(("<font size=" + size + ">")
                                    .getBytes());
                            isSize = true;
                        }
                    }
                    // 下面是表格处理
                    if (tag.equalsIgnoreCase("tbl")) { // 检测到tbl 表格开始
                        this.output.write(tableBegin.getBytes());
                        isTable = true;
                    }
                    if (tag.equalsIgnoreCase("tr")) { // 行
                        this.output.write(rowBegin.getBytes());
                    }
                    if (tag.equalsIgnoreCase("tc")) { // 列
                        this.output.write(colBegin.getBytes());
                    }

                    if (tag.equalsIgnoreCase("pic")) { // 检测到标签 pic 图片
                        String entryName_jpeg = "word/media/image"
                                + pictureIndex + ".jpeg";
                        String entryName_png = "word/media/image"
                                + pictureIndex + ".png";
                        String entryName_gif = "word/media/image"
                                + pictureIndex + ".gif";
                        String entryName_wmf = "word/media/image"
                                + pictureIndex + ".wmf";
                        ZipEntry sharePicture = null;
                        InputStream pictIS = null;
                        sharePicture = xlsxFile.getEntry(entryName_jpeg);
                        // 一下为读取docx的图片 转化为流数组
                        if (sharePicture == null) {
                            sharePicture = xlsxFile.getEntry(entryName_png);
                        }
                        if(sharePicture == null){
                            sharePicture = xlsxFile.getEntry(entryName_gif);
                        }
                        if(sharePicture == null){
                            sharePicture = xlsxFile.getEntry(entryName_wmf);
                        }

                        if(sharePicture != null){
                            pictIS = xlsxFile.getInputStream(sharePicture);
                            ByteArrayOutputStream pOut = new ByteArrayOutputStream();
                            byte[] bt = null;
                            byte[] b = new byte[1000];
                            int len = 0;
                            while ((len = pictIS.read(b)) != -1) {
                                pOut.write(b, 0, len);
                            }
                            pictIS.close();
                            pOut.close();
                            bt = pOut.toByteArray();
                            Log.i("byteArray", "" + bt);
                            if (pictIS != null)
                                pictIS.close();
                            if (pOut != null)
                                pOut.close();
                            writeDOCXPicture(bt);
                        }

                        pictureIndex++; // 转换一张后 索引+1
                    }

                    if (tag.equalsIgnoreCase("b")) { // 检测到加粗标签
                        isBold = true;
                    }
                    if (tag.equalsIgnoreCase("p")) {// 检测到 p 标签
                        if (isTable == false) { // 如果在表格中 就无视
                            this.output.write(tagBegin.getBytes());
                        }
                    }
                    if (tag.equalsIgnoreCase("i")) { // 斜体
                        isItalic = true;
                    }
                    // 检测到值 标签
                    if (tag.equalsIgnoreCase("t")) {
                        if (isBold == true) { // 加粗
                            this.output.write("<b>".getBytes());
                        }
                        if (isUnderline == true) { // 检测到下划线标签,输入<u>
                            this.output.write("<u>".getBytes());
                        }
                        if (isItalic == true) { // 检测到斜体标签,输入<i>
                            output.write("<i>".getBytes());
                        }
                        river = xmlParser.nextText();
                        this.output.write(river.getBytes()); // 写入数值
                        if (isItalic == true) { // 检测到斜体标签,在输入值之后,输入</i>,并且斜体状态=false
                            this.output.write("</i>".getBytes());
                            isItalic = false;
                        }
                        if (isUnderline == true) {// 检测到下划线标签,在输入值之后,输入</u>,并且下划线状态=false
                            this.output.write("</u>".getBytes());
                            isUnderline = false;
                        }
                        if (isBold == true) { // 加粗
                            this.output.write("</b>".getBytes());
                            isBold = false;
                        }
                        if (isSize == true) { // 检测到大小设置,输入结束标签
                            this.output.write("</font>".getBytes());
                            isSize = false;
                        }
                        if (isColor == true) { // 检测到颜色设置存在,输入结束标签
                            this.output.write("</span>".getBytes());
                            isColor = false;
                        }
                        if (isCenter == true) { // 检测到居中,输入结束标签
                            this.output.write("</center>".getBytes());
                            isCenter = false;
                        }
                        if (isRight == true) { // 居右不能使用<right></right>,使用div可能会有状况,先用着
                            this.output.write("</div>".getBytes());
                            isRight = false;
                        }
                    }
                    break;
                // 结束标签
                case XmlPullParser.END_TAG:
                    String tag2 = xmlParser.getName();
                    if (tag2.equalsIgnoreCase("tbl")) { // 检测到表格结束,更改表格状态
                        this.output.write(tableEnd.getBytes());
                        isTable = false;
                    }
                    if (tag2.equalsIgnoreCase("tr")) { // 行结束
                        this.output.write(rowEnd.getBytes());
                    }
                    if (tag2.equalsIgnoreCase("tc")) { // 列结束
                        this.output.write(colEnd.getBytes());
                    }
                    if (tag2.equalsIgnoreCase("p")) { // p结束,如果在表格中就无视
                        if (isTable == false) {
                            this.output.write(tagEnd.getBytes());
                        }
                    }
                    if (tag2.equalsIgnoreCase("r")) {
                        isR = false;
                    }
                    break;
                default:
                    break;
            }
            evtType = xmlParser.next();
        }
        this.output.write(end.getBytes());
    } catch (ZipException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (XmlPullParserException e) {
        e.printStackTrace();
    }
    if (river == null) {
        river = "解析文件出现问题";
    }
}

读取xls:

public StringBuffer readXLS() throws Exception {

        myFile = new File(htmlPath);
        output = new FileOutputStream(myFile);
        lsb.append("<html xmlns:o=‘urn:schemas-microsoft-com:office:office‘ xmlns:x=‘urn:schemas-microsoft-com:office:excel‘ xmlns=‘http://www.w3.org/TR/REC-html40‘>");
        lsb.append("<head><meta http-equiv=Content-Type content=‘text/html; charset=utf-8‘><meta name=ProgId content=Excel.Sheet>");
        HSSFSheet sheet = null;

        String excelFileName = nameStr;
        try {
        HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream(
        excelFileName)); // 获整个Excel

        for (int sheetIndex = 0; sheetIndex < workbook.getNumberOfSheets(); sheetIndex++) {
        sheet = workbook.getSheetAt(sheetIndex);// 获所有的sheet
        String sheetName = workbook.getSheetName(sheetIndex); // sheetName
        if (workbook.getSheetAt(sheetIndex) != null) {
        sheet = workbook.getSheetAt(sheetIndex);// 获得不为空的这个sheet
        if (sheet != null) {
        int firstRowNum = sheet.getFirstRowNum(); // 第一行
        int lastRowNum = sheet.getLastRowNum(); // 最后一行
        lsb.append("<table width=\"100%\" style=\"border:1px solid #000;border-width:1px 0 0 1px;margin:2px 0 2px 0;border-collapse:collapse;\">");

        for (int rowNum = firstRowNum; rowNum <= lastRowNum; rowNum++) {
        if (sheet.getRow(rowNum) != null) {// 如果行不为空,
        HSSFRow row = sheet.getRow(rowNum);
        short firstCellNum = row.getFirstCellNum(); // 该行的第一个单元格
        short lastCellNum = row.getLastCellNum(); // 该行的最后一个单元格
        int height = (int) (row.getHeight() / 15.625); // 行的高度
        lsb.append("<tr height=\""
        + height
        + "\" style=\"border:1px solid #000;border-width:0 1px 1px 0;margin:2px 0 2px 0;\">");
        for (short cellNum = firstCellNum; cellNum <= lastCellNum; cellNum++) { // 循环该行的每一个单元格
        HSSFCell cell = row.getCell(cellNum);
        if (cell != null) {
        if (cell.getCellType() == HSSFCell.CELL_TYPE_BLANK) {
        continue;
        } else {
        StringBuffer tdStyle = new StringBuffer(
        "<td style=\"border:1px solid #000; border-width:0 1px 1px 0;margin:2px 0 2px 0; ");
        HSSFCellStyle cellStyle = cell
        .getCellStyle();
        HSSFPalette palette = workbook
        .getCustomPalette(); // 类HSSFPalette用于求颜色的国际标准形式
        HSSFColor hColor = palette
        .getColor(cellStyle
        .getFillForegroundColor());
        HSSFColor hColor2 = palette
        .getColor(cellStyle
        .getFont(workbook)
        .getColor());

        String bgColor = convertToStardColor(hColor);// 背景颜色
        short boldWeight = cellStyle
        .getFont(workbook)
        .getBoldweight(); // 字体粗细
        short fontHeight = (short) (cellStyle
        .getFont(workbook)
        .getFontHeight() / 2); // 字体大小
        String fontColor = convertToStardColor(hColor2); // 字体颜色
        if (bgColor != null
        && !"".equals(bgColor
        .trim())) {
        tdStyle.append(" background-color:"
        + bgColor + "; ");
        }
        if (fontColor != null
        && !"".equals(fontColor
        .trim())) {
        tdStyle.append(" color:"
        + fontColor + "; ");
        }
        tdStyle.append(" font-weight:"
        + boldWeight + "; ");
        tdStyle.append(" font-size: "
        + fontHeight + "%;");
        lsb.append(tdStyle + "\"");

        int width = (int) (sheet.getColumnWidth(cellNum) / 35.7);

        int cellReginCol = getMergerCellRegionCol(
        sheet, rowNum, cellNum); // 合并的列(solspan)
        int cellReginRow = getMergerCellRegionRow(
        sheet, rowNum, cellNum);// 合并的行(rowspan)
        String align = convertVerticalAlignToHtml(cellStyle
        .getAlignment()); //
        String vAlign = convertVerticalAlignToHtml(cellStyle
        .getVerticalAlignment());

        lsb.append(" align=\"" + align
        + "\" valign=\"" + vAlign
        + "\" width=\"" + width
        + "\" ");

        lsb.append(" colspan=\""
        + cellReginCol
        + "\" rowspan=\""
        + cellReginRow + "\"");
        lsb.append(">" + getCellValue(cell)
        + "</td>");
        }
        }
        }
        lsb.append("</tr>");
        }
        }
        }

        }

        }
        output.write(lsb.toString().getBytes());
        } catch (FileNotFoundException e) {
        throw new Exception("文件" + excelFileName + " 没有找到!");
        } catch (IOException e) {
        throw new Exception("文件" + excelFileName + " 处理错误("
        + e.getMessage() + ")!");
        }
        return lsb;
        }

读取xlsx:

public void readXLSX() {
        try {
        this.myFile = new File(this.htmlPath);// new一个File,路径为html文件
        this.output = new FileOutputStream(this.myFile);// new一个流,目标为html文件
        String head = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"\"http://www.w3.org/TR/html4/loose.dtd\"><html><meta charset=\"utf-8\"><head></head><body>";// 定义头文件,我在这里加了utf-8,不然会出现乱码
        String tableBegin = "<table style=\"border-collapse:collapse\" border=1 bordercolor=\"black\">";
        String tableEnd = "</table>";
        String rowBegin = "<tr>";
        String rowEnd = "</tr>";
        String colBegin = "<td>";
        String colEnd = "</td>";
        String end = "</body></html>";
        this.output.write(head.getBytes());
        this.output.write(tableBegin.getBytes());
        String str = "";
        String v = null;
        boolean flat = false;
        List<String> ls = new ArrayList<String>();
        try {
        ZipFile xlsxFile = new ZipFile(new File(this.nameStr));// 地址
        ZipEntry sharedStringXML = xlsxFile
        .getEntry("xl/sharedStrings.xml");// 共享字符串
        InputStream inputStream = xlsxFile
        .getInputStream(sharedStringXML);// 输入流 目标上面的共享字符串
        XmlPullParser xmlParser = Xml.newPullParser();// new 解析器
        xmlParser.setInput(inputStream, "utf-8");// 设置解析器类型
        int evtType = xmlParser.getEventType();// 获取解析器的事件类型
        while (evtType != XmlPullParser.END_DOCUMENT) {// 如果不等于 文档结束
        switch (evtType) {
        case XmlPullParser.START_TAG: // 标签开始
        String tag = xmlParser.getName();
        if (tag.equalsIgnoreCase("t")) {
        ls.add(xmlParser.nextText());
        }
        break;
        case XmlPullParser.END_TAG: // 标签结束
        break;
default:
        break;
        }
        evtType = xmlParser.next();
        }
        ZipEntry sheetXML = xlsxFile
        .getEntry("xl/worksheets/sheet1.xml");
        InputStream inputStreamsheet = xlsxFile
        .getInputStream(sheetXML);
        XmlPullParser xmlParsersheet = Xml.newPullParser();
        xmlParsersheet.setInput(inputStreamsheet, "utf-8");
        int evtTypesheet = xmlParsersheet.getEventType();
        this.output.write(rowBegin.getBytes());
        int i = -1;
        while (evtTypesheet != XmlPullParser.END_DOCUMENT) {
        switch (evtTypesheet) {
        case XmlPullParser.START_TAG: // 标签开始
        String tag = xmlParsersheet.getName();
        if (tag.equalsIgnoreCase("row")) {
        } else {
        if (tag.equalsIgnoreCase("c")) {
        String t = xmlParsersheet.getAttributeValue(
        null, "t");
        if (t != null) {
        flat = true;
        System.out.println(flat + "有");
        } else {// 没有数据时 找了我n年,终于找到了 输入<td></td> 表示空格
        this.output.write(colBegin.getBytes());
        this.output.write(colEnd.getBytes());
        System.out.println(flat + "没有");
        flat = false;
        }
        } else {
        if (tag.equalsIgnoreCase("v")) {
        v = xmlParsersheet.nextText();
        this.output.write(colBegin.getBytes());
        if (v != null) {
        if (flat) {
        str = ls.get(Integer.parseInt(v));
        } else {
        str = v;
        }
        this.output.write(str.getBytes());
        this.output.write(colEnd.getBytes());
        }
        }
        }
        }
        break;
        case XmlPullParser.END_TAG:
        if (xmlParsersheet.getName().equalsIgnoreCase("row")
        && v != null) {
        if (i == 1) {
        this.output.write(rowEnd.getBytes());
        this.output.write(rowBegin.getBytes());
        i = 1;
        } else {
        this.output.write(rowBegin.getBytes());
        }
        }
        break;
        }
        evtTypesheet = xmlParsersheet.next();
        }
        System.out.println(str);
        } catch (ZipException e) {
        e.printStackTrace();
        } catch (IOException e) {
        e.printStackTrace();
        } catch (XmlPullParserException e) {
        e.printStackTrace();
        }
        if (str == null) {
        str = "解析文件出现问题";
        }
        this.output.write(rowEnd.getBytes());
        this.output.write(tableEnd.getBytes());
        this.output.write(end.getBytes());
        } catch (Exception e) {
        System.out.println("readAndWrite Exception");
        }
        }

单元格的操作:

/**
 * 取得单元格的值
 *
 * @param cell
 * @return
 * @throws IOException
 */
private static Object getCellValue(HSSFCell cell) throws IOException {
        Object value = "";
        if (cell.getCellType() == HSSFCell.CELL_TYPE_STRING) {
        value = cell.getRichStringCellValue().toString();
        } else if (cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC) {
        if (HSSFDateUtil.isCellDateFormatted(cell)) {
        Date date = (Date) cell.getDateCellValue();
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        value = sdf.format(date);
        } else {
        double value_temp = (double) cell.getNumericCellValue();
        BigDecimal bd = new BigDecimal(value_temp);
        BigDecimal bd1 = bd.setScale(3, bd.ROUND_HALF_UP);
        value = bd1.doubleValue();

        DecimalFormat format = new DecimalFormat("#0.###");
        value = format.format(cell.getNumericCellValue());

        }
        }
        if (cell.getCellType() == HSSFCell.CELL_TYPE_BLANK) {
        value = "";
        }
        return value;
        }

/**
 * 判断单元格在不在合并单元格范围内,如果是,获取其合并的列数。
 *
 * @param sheet
 *            工作表
 * @param cellRow
 *            被判断的单元格的行号
 * @param cellCol
 *            被判断的单元格的列号
 * @return
 * @throws IOException
 */
private static int getMergerCellRegionCol(HSSFSheet sheet, int cellRow,
        int cellCol) throws IOException {
        int retVal = 0;
        int sheetMergerCount = sheet.getNumMergedRegions();
        for (int i = 0; i < sheetMergerCount; i++) {
        CellRangeAddress cra = (CellRangeAddress) sheet.getMergedRegion(i);
        int firstRow = cra.getFirstRow(); // 合并单元格CELL起始行
        int firstCol = cra.getFirstColumn(); // 合并单元格CELL起始列
        int lastRow = cra.getLastRow(); // 合并单元格CELL结束行
        int lastCol = cra.getLastColumn(); // 合并单元格CELL结束列
        if (cellRow >= firstRow && cellRow <= lastRow) { // 判断该单元格是否是在合并单元格中
        if (cellCol >= firstCol && cellCol <= lastCol) {
        retVal = lastCol - firstCol + 1; // 得到合并的列数
        break;
        }
        }
        }
        return retVal;
        }

/**
 * 判断单元格是否是合并的单格,如果是,获取其合并的行数。
 *
 * @param sheet
 *            表单
 * @param cellRow
 *            被判断的单元格的行号
 * @param cellCol
 *            被判断的单元格的列号
 * @return
 * @throws IOException
 */
private static int getMergerCellRegionRow(HSSFSheet sheet, int cellRow,
        int cellCol) throws IOException {
        int retVal = 0;
        int sheetMergerCount = sheet.getNumMergedRegions();
        for (int i = 0; i < sheetMergerCount; i++) {
        CellRangeAddress cra = (CellRangeAddress) sheet.getMergedRegion(i);
        int firstRow = cra.getFirstRow(); // 合并单元格CELL起始行
        int firstCol = cra.getFirstColumn(); // 合并单元格CELL起始列
        int lastRow = cra.getLastRow(); // 合并单元格CELL结束行
        int lastCol = cra.getLastColumn(); // 合并单元格CELL结束列
        if (cellRow >= firstRow && cellRow <= lastRow) { // 判断该单元格是否是在合并单元格中
        if (cellCol >= firstCol && cellCol <= lastCol) {
        retVal = lastRow - firstRow + 1; // 得到合并的行数
        break;
        }
        }
        }
        return 0;
        }

/**
 * 单元格背景色转换
 *
 * @param hc
 * @return
 */
private String convertToStardColor(HSSFColor hc) {
        StringBuffer sb = new StringBuffer("");
        if (hc != null) {
        int a = HSSFColor.AUTOMATIC.index;
        int b = hc.getIndex();
        if (a == b) {
        return null;
        }
        sb.append("#");
        for (int i = 0; i < hc.getTriplet().length; i++) {
        String str;
        String str_tmp = Integer.toHexString(hc.getTriplet()[i]);
        if (str_tmp != null && str_tmp.length() < 2) {
        str = "0" + str_tmp;
        } else {
        str = str_tmp;
        }
        sb.append(str);
        }
        }
        return sb.toString();
        }

/**
 * 单元格小平对齐
 *
 * @param alignment
 * @return
 */
private String convertAlignToHtml(short alignment) {
        String align = "left";
        switch (alignment) {
        case HSSFCellStyle.ALIGN_LEFT:
        align = "left";
        break;
        case HSSFCellStyle.ALIGN_CENTER:
        align = "center";
        break;
        case HSSFCellStyle.ALIGN_RIGHT:
        align = "right";
        break;
default:
        break;
        }
        return align;
        }

/**
 * 单元格垂直对齐
 *
 * @param verticalAlignment
 * @return
 */
private String convertVerticalAlignToHtml(short verticalAlignment) {
        String valign = "middle";
        switch (verticalAlignment) {
        case HSSFCellStyle.VERTICAL_BOTTOM:
        valign = "bottom";
        break;
        case HSSFCellStyle.VERTICAL_CENTER:
        valign = "center";
        break;
        case HSSFCellStyle.VERTICAL_TOP:
        valign = "top";
        break;
default:
        break;
        }
        return valign;
        }

public void makeFile() {
        String sdStateString = android.os.Environment.getExternalStorageState();// 获取外部存储状态
        if (sdStateString.equals(android.os.Environment.MEDIA_MOUNTED)) {// 确认sd卡存在,原理不知,媒体安装??
        try {
        File sdFile = android.os.Environment
        .getExternalStorageDirectory();// 获取扩展设备的文件目录
        String path = sdFile.getAbsolutePath() + File.separator
        + "library";// 得到sd卡(扩展设备)的绝对路径+"/"+xiao
        File dirFile = new File(path);// 获取xiao文件夹地址
        if (!dirFile.exists()) {// 如果不存在
        dirFile.mkdir();// 创建目录
        }
        File myFile = new File(path + File.separator +filename+ ".html");// 获取my.html的地址
        if (!myFile.exists()) {// 如果不存在
        myFile.createNewFile();// 创建文件
        }
        this.htmlPath = myFile.getAbsolutePath();// 返回路径
        } catch (Exception e) {
        }
        }
        }

保存和读取图片:

/* 用来在sdcard上创建图片 */
public void makePictureFile() {
        String sdString = android.os.Environment.getExternalStorageState();// 获取外部存储状态
        if (sdString.equals(android.os.Environment.MEDIA_MOUNTED)) {// 确认sd卡存在,原理不知
        try {
        File picFile = android.os.Environment
        .getExternalStorageDirectory();// 获取sd卡目录
        String picPath = picFile.getAbsolutePath() + File.separator
        + "library";// 创建目录,上面有解释
        File picDirFile = new File(picPath);
        if (!picDirFile.exists()) {
        picDirFile.mkdir();
        }
        File pictureFile = new File(picPath + File.separator
        +getFileName(nameStr)+ presentPicture + ".jpg");// 创建jpg文件,方法与html相同
        if (!pictureFile.exists()) {
        pictureFile.createNewFile();
        }
        this.picturePath = pictureFile.getAbsolutePath();// 获取jpg文件绝对路径
        } catch (Exception e) {
        System.out.println("PictureFile Catch Exception");
        }
        }
        }

public String getFileName(String pathandname){

        int start=pathandname.lastIndexOf("/");
        int end=pathandname.lastIndexOf(".");
        if(start!=-1 && end!=-1){
        return pathandname.substring(start+1,end);
        }else{
        return null;
        }

        }

public void writePicture() {
        Picture picture = (Picture) pictures.get(presentPicture);

        byte[] pictureBytes = picture.getContent();

        Bitmap bitmap = BitmapFactory.decodeByteArray(pictureBytes, 0,
        pictureBytes.length);

        makePictureFile();
        presentPicture++;

        File myPicture = new File(picturePath);

        try {

        FileOutputStream outputPicture = new FileOutputStream(myPicture);

        outputPicture.write(pictureBytes);

        outputPicture.close();
        } catch (Exception e) {
        System.out.println("outputPicture Exception");
        }

        String imageString = "<img src=\"" + picturePath + "\"";
        imageString = imageString + ">";

        try {
        output.write(imageString.getBytes());
        } catch (Exception e) {
        System.out.println("output Exception");
        }
        }
public void writeDOCXPicture(byte[] pictureBytes) {
        Bitmap bitmap = BitmapFactory.decodeByteArray(pictureBytes, 0,
        pictureBytes.length);
        makePictureFile();
        this.presentPicture++;
        File myPicture = new File(this.picturePath);
        try {
        FileOutputStream outputPicture = new FileOutputStream(myPicture);
        outputPicture.write(pictureBytes);
        outputPicture.close();
        } catch (Exception e) {
        System.out.println("outputPicture Exception");
        }
        String imageString = "<img src=\"" + this.picturePath + "\"";

        imageString = imageString + ">";
        try {
        this.output.write(imageString.getBytes());
        } catch (Exception e) {
        System.out.println("output Exception");
        }
        }

public void writeParagraphContent(Paragraph paragraph) {
        Paragraph p = paragraph;
        int pnumCharacterRuns = p.numCharacterRuns();

        for (int j = 0; j < pnumCharacterRuns; j++) {

        CharacterRun run = p.getCharacterRun(j);

        if (run.getPicOffset() == 0 || run.getPicOffset() >= 1000) {
        if (presentPicture < pictures.size()) {
        writePicture();
        }
        } else {
        try {
        String text = run.text();
        if (text.length() >= 2 && pnumCharacterRuns < 2) {
        output.write(text.getBytes());
        } else {
        int size = run.getFontSize();
        int color = run.getColor();
        String fontSizeBegin = "<font size=\""
        + decideSize(size) + "\">";
        String fontColorBegin = "<font color=\""
        + decideColor(color) + "\">";
        String fontEnd = "</font>";
        String boldBegin = "<b>";
        String boldEnd = "</b>";
        String islaBegin = "<i>";
        String islaEnd = "</i>";

        output.write(fontSizeBegin.getBytes());
        output.write(fontColorBegin.getBytes());

        if (run.isBold()) {
        output.write(boldBegin.getBytes());
        }
        if (run.isItalic()) {
        output.write(islaBegin.getBytes());
        }

        output.write(text.getBytes());

        if (run.isBold()) {
        output.write(boldEnd.getBytes());
        }
        if (run.isItalic()) {
        output.write(islaEnd.getBytes());
        }
        output.write(fontEnd.getBytes());
        output.write(fontEnd.getBytes());
        }
        } catch (Exception e) {
        System.out.println("Write File Exception");
        }
        }
        }
        }

一些格式:

public int decideSize(int size) {

        if (size >= 1 && size <= 8) {
        return 1;
        }
        if (size >= 9 && size <= 11) {
        return 2;
        }
        if (size >= 12 && size <= 14) {
        return 3;
        }
        if (size >= 15 && size <= 19) {
        return 4;
        }
        if (size >= 20 && size <= 29) {
        return 5;
        }
        if (size >= 30 && size <= 39) {
        return 6;
        }
        if (size >= 40) {
        return 7;
        }
        return 3;
        }

private String decideColor(int a) {
        int color = a;
        switch (color) {
        case 1:
        return "#000000";
        case 2:
        return "#0000FF";
        case 3:
        case 4:
        return "#00FF00";
        case 5:
        case 6:
        return "#FF0000";
        case 7:
        return "#FFFF00";
        case 8:
        return "#FFFFFF";
        case 9:
        return "#CCCCCC";
        case 10:
        case 11:
        return "#00FF00";
        case 12:
        return "#080808";
        case 13:
        case 14:
        return "#FFFF00";
        case 15:
        return "#CCCCCC";
        case 16:
        return "#080808";
default:
        return "#000000";
        }
        }

好吧,大功告成,看看效果。

我设置了个dialog式的activity来展现下,可以放大缩小:

读取doc:

读取docx:

读取xls和xlsx:

代码demo已经上传:http://download.csdn.net/detail/bit_kaki/9676592 需要两积分。如果积分不够的可以私信我或者留言地址,我看到也会回复

时间: 2024-08-05 15:25:16

Android中解析读取复杂word,excel,ppt等的方法的相关文章

lucent检索技术之创建索引:使用POI读取txt/word/excel/ppt/pdf内容

在使用lucent检索文档时,必须先为各文档创建索引.索引的创建即读出文档信息(如文档名称.上传时间.文档内容等),然后再经过分词建索引写入到索引文件里.这里主要是总结下读取各类文档内容这一步. 一.之前做过一个小工具也涉及到读取word和excel内容,采用的是com组件的方式来读取.即导入COM库,引入命名空间(using Microsoft.Office.Interop.Word;using Microsoft.Office.Interop.Excel;),然后读代码如下: 读取word

Aspose是一个很强大的控件,可以用来操作word,excel,ppt等文件

Aspose是一个很强大的控件,可以用来操作word,excel,ppt等文件,用这个控件来导入.导出数据非常方便.其中Aspose.Cells就是用来操作Excel的,功能有很多.我所用的是最基本的功能,读取Excel的数据并导入到Dataset或数据库中.读取Excel表格数据的代码如下: 首先要引入命名空间:using Aspose.Cells; Workbook workbook = new Workbook(); workbook.Open("C:\\test.xlsx");

java 如果将 word,excel,ppt如何转pdf --openoffice (1)

承上启下,可折叠 上一篇说的是:服务器是windows server时,用jacob将msoffice(指的是word,excel,ppt)转换成pdf. 若被部署项目的服务器是centOS等linux server时,就不能用之前的上述说的那种方式了. 在上一篇说到openoffice将msoffice转成pdf的时候会存在排版错位的问题,或者有的内容消失了,这是因为msoffice中的一些特有格式,openoffice不识别解析不了导致的.当然大部分的普通msoffice文档转换成pdf时,

Atitit.office&#160;word&#160;&#160;excel&#160;&#160;ppt&#160;pdf&#160;的web在线预览方案与html转换方案&#160;attilax&#160;总结

Atitit.office word  excel  ppt pdf 的web在线预览方案与html转换方案 attilax 总结 1. office word  excel pdf 的web预览要求1 1.1. 显示效果要好1 1.2. 可以自定义显示界面1 1.3. 不需要控件,兼容性好1 1.4. 支持编辑操作1 2. 纯html预览解决之道(自由的格式)1 3. 转换swf flash方案2 4. 转换pdf方式..更多的浏览器已经直接支持pdf格式查看2 5. 控件方式2 6. Hyb

Android中写入读取XML

获取XML文件的基本思路是,通过getResources().getXml()获的XML原始文件,得到XmlResourceParser对象,通过该对象来判断是文档的开头还是结尾,是某个标签的开始还是结尾,并通过一些获取属性的方法来遍历XML文件,从而访问XML文件的内容,下面是一个访问XML文件内容的例子,并将内容更显示在一个TextView上 数据写入xml: ReadXMLTest.java [java] view plaincopy //xml数据生成 private String Wr

Android中解析Json数据

在开发中经常会遇到解析json的问题 在这里总结几种解析的方式: 方式一: json数据: private String jsonData = "[{\"name\":\"Michael\",\"age\":20},{\"name\":\"Mike\",\"age\":21}]"; 解析jsonData的方法 try { //如果需要解析Json数据,首先要生成一个J

Android中解析与创建XML文件

Android中解析与创建XML文件 在Android中对XML的操作有多种方式,常见的有三种方式:SAX.DOM和PULL方式. DOM方式会把整个XML文件加载到内存中,在PC上常使用DOM的方式. 但是在性能敏感的设备上,主要采用的是SAX的方式,但是缺点是嵌套多个分支的时候处理不是很方便. 而PULL的方式类似SAX方式,同样很节省内存. 因此,本文章中只提供PULL的方式解析与创建XML文件. 基础类 本例中使用的实体类的定义如下: public class CAddress impl

Android中 与数据库有关的两个废除的方法

占没有查到替换的方法,先记着! The method startManagingCursor(Cursor) from the type Activity is deprecated The constructor SimpleCursorAdapter(Context, int, Cursor, String[], int[]) is deprecated Android中 与数据库有关的两个废除的方法,布布扣,bubuko.com

Android中退出多个Activity的两个经典方法

这里介绍两种方法:一种把每个activity记住,然后逐一干掉:另一种思路是使用广播. 方法一.用list保存activity实例,然后逐一干掉 上代码: import java.util.LinkedList; import java.util.List; import android.app.Activity; import android.app.AlertDialog; import android.app.Application; import android.content.Dial