
Java POI: how to convert Word to…

Author: 乔山办公网



Use POI to read the xlsx or docx file, then write it out as an HTML file.

protected void processHyperlink(HWPFDocumentCore wordDocument,
org.w3c.dom.Element currentBlock,
Range textRange,
int currentTableLevel,
java.lang.String hyperlink)
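
To make the role of `currentBlock` and `hyperlink` concrete, here is a minimal sketch of what a hook with this shape might do: create an `<a>` element and append it to the current block. It uses only the JDK's DOM API. The class `HyperlinkSketch` and the helpers `appendLink` and `render` are hypothetical names for illustration, not part of POI, and POI's own implementation may differ.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class HyperlinkSketch {

    // Hypothetical helper: appends <a href="hyperlink">text</a> to currentBlock.
    static Element appendLink(Element currentBlock, String hyperlink, String text) {
        Document doc = currentBlock.getOwnerDocument();
        Element link = doc.createElement("a");
        link.setAttribute("href", hyperlink);
        link.setTextContent(text);
        currentBlock.appendChild(link);
        return link;
    }

    // Builds a one-paragraph document, adds a link, and returns the serialized markup.
    static String render(String hyperlink, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element paragraph = doc.createElement("p");
            doc.appendChild(paragraph);
            appendLink(paragraph, hyperlink, text);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Checked parser/transformer exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("https://example.com/", "Example"));
    }
}
```

In a real converter the link text would come from `textRange` rather than from a plain string argument.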

The implementation is as follows:

import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.w3c.dom.Document;

public class Word2Html {

    public static void main(String[] argv) {
        try {
            // Word input path, HTML output path
            convert2Html("D:/doctohtml/1.doc", "D:/doctohtml/1.html");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void writeFile(String content, String path) {
        // try-with-resources closes the writer and the underlying stream, even on failure
        try (BufferedWriter bw = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), "utf-8"))) {
            bw.write(content);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

    public static void convert2Html(String fileName, String outPutFile)
            throws TransformerException, IOException,
            ParserConfigurationException {
        HWPFDocument wordDocument;
        try (InputStream in = new FileInputStream(fileName)) {
            wordDocument = new HWPFDocument(in); // alternative: WordToHtmlUtils.loadDoc(in)
        }
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .newDocument());
        wordToHtmlConverter.setPicturesManager(new PicturesManager() {
            @Override
            public String savePicture(byte[] content,
                    PictureType pictureType, String suggestedName,
                    float widthInches, float heightInches) {
                // Path written into the HTML image tag, e.g. <img src="d:/test/0.jpg"/>
                return "d:/doctohtml/" + suggestedName;
            }
        });
        wordToHtmlConverter.processDocument(wordDocument);

        // Save the pictures embedded in the Word file
        List<Picture> pics = wordDocument.getPicturesTable().getAllPictures();
        if (pics != null) {
            for (Picture pic : pics) {
                // Where each extracted picture is stored on disk
                try (OutputStream picOut = new FileOutputStream(
                        "D:/doctohtml/" + pic.suggestFullFileName())) {
                    pic.writeImageContent(picOut);
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }
            }
        }

        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        out.close();
        // Decode with the charset the serializer wrote; the platform default may differ
        writeFile(new String(out.toByteArray(), "utf-8"), outPutFile);
    }
}
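
The last step of `convert2Html` turns the serialized bytes back into a `String`, and the charset used for that decoding has to match the one the serializer wrote. A minimal, JDK-only sketch of that round trip follows. The class name `SerializeSketch` and the methods `toHtml` and `demo` are hypothetical, and the small hand-built DOM merely stands in for the document POI would produce.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SerializeSketch {

    // Serializes a DOM as UTF-8 HTML and decodes the bytes with the same charset.
    static String toHtml(Document htmlDocument) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));
            // Decoding with the platform default here could garble non-ASCII text.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <html><body><p>bodyText</p></body></html> and serializes it.
    static String demo(String bodyText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent(bodyText);
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return toHtml(doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("Word 文档"));
    }
}
```

Writing the serializer output straight to a `FileOutputStream` would avoid the intermediate `String` altogether; the sketch keeps the byte array only to mirror the structure of the code above.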


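Java can use this open-source framework to read, merge, and otherwise process Word files. Apache POI is an open-source project for reading and writing Excel, Word, and other Microsoft OLE2 documents in Java. Version 3.5, the latest at the time of writing, brought many improvements, including support for the OOXML formats of Office 2007 such as xlsx, docx, and pptx. Examples follow. For .doc files:

import java.io.FileInputStream;
import org.apache.poi.hwpf.extractor.WordExtractor;

// Get a text extractor for the .doc file (filePath is the path to the document)
WordExtractor doc = new WordExtractor(new FileInputStream(filePath));
// Extract the body text of the .doc
String text = doc.getText();
// Extract the comments of the .doc
String[] comments = doc.getCommentsText();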

For Office 2007 (.docx) files:

import org.apache.poi.POIXMLDocument; // in POI 4.x the class is org.apache.poi.ooxml.POIXMLDocument
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFComment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Get a text extractor for the .docx file (filePath is the path to the document)
XWPFWordExtractor docx = new XWPFWordExtractor(POIXMLDocument.openPackage(filePath));
// Extract the body text of the .docx
String text = docx.getText();
// Extract the comments of the .docx
XWPFComment[] comments = ((XWPFDocument) docx.getDocument()).getComments();
for (XWPFComment comment : comments) {
    comment.getId();     // comment ID
    comment.getAuthor(); // comment author
    comment.getText();   // comment text
}
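
The two snippets differ only in which extractor handles the file, so a caller needs some way to tell the formats apart. One option, sketched here with JDK classes only, is to look at the first bytes of the file: OLE2 containers (used by .doc) and ZIP containers (used by .docx) start with different signatures. The class `WordFormatSketch`, its `Format` enum, and the `detect` methods are hypothetical names, not POI APIs. The signature identifies only the container, so an .xls or .xlsx file would match as well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class WordFormatSketch {

    enum Format { OLE2, ZIP, UNKNOWN }

    // Leading bytes of an OLE2 compound file and of a ZIP local file header.
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies a file by its first bytes.
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.ZIP;
        }
        return Format.UNKNOWN;
    }

    // Reads up to 8 bytes from the file and classifies them.
    static Format detect(String filePath) throws IOException {
        byte[] header = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(Paths.get(filePath))) {
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        return detect(Arrays.copyOf(header, n));
    }
}
```

A caller could then use `WordExtractor` when `detect(filePath)` returns `Format.OLE2` and `XWPFWordExtractor` when it returns `Format.ZIP`.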
