How to Convert a PDF File to HTML Web Page Format - 乔山办公网

Author: 乔山办公网  Date: 2020-10-24 05:59:21


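The UIPanel.clipRange property is a four-dimensional vector (Vector4): the first two components are the center and the last two are the size. To change it, read the Vector4 out, modify it, and assign the whole value back.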

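How to convert a PDF file to HTML web page format
Method: you need a PDF-to-HTML converter (迅捷PDF在线转换器). Search for "PDF转HTML" in your browser to find it, then proceed as follows:
1. Double-click to open the PDF-to-HTML converter. To convert PDF to HTML, select the "文件转html" (file to HTML) mode.
2. With the mode selected, click the "添加文件" (Add Files) button and add the PDF files you want to convert. Several files can be added at once for batch conversion.
3. Once the PDF files are added, set the format parameters for the output file, then click the button that starts the conversion.
4. After the conversion starts, wait about 10 seconds. When the progress bar completes, you can save the files to the desktop.
Feel free to follow up if you have any other questions!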
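You can rename the file: replace the HTML extension with an Excel one. For example, a file named 1.html can be renamed to 1.xls, and it will then open as a spreadsheet (the content itself is still HTML; only the extension changes). Another way: select the file, right-click it, and choose the appropriate program under "Open with".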
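The rename approach above could be sketched in Java roughly as follows. This is only an illustration: the class and method names (HtmlToXlsCopy, copyWithXlsExtension) are made up, and it copies rather than renames, on the assumption that you want to keep the original .html file. It changes the extension only; the bytes stay HTML, so how the result opens depends on the program that opens it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library mentioned in this thread.
final class HtmlToXlsCopy {

    /** Copies e.g. 1.html to 1.xls in the same directory and returns the new path. */
    static Path copyWithXlsExtension(Path htmlFile) throws IOException {
        String name = htmlFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Drop the old extension if there is one
        String base = (dot > 0) ? name.substring(0, dot) : name;
        Path target = htmlFile.resolveSibling(base + ".xls");
        return Files.copy(htmlFile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway file so the sketch runs anywhere
        Path html = Files.createTempFile("demo", ".html");
        Files.write(html, "<table><tr><td>1</td></tr></table>".getBytes(StandardCharsets.UTF_8));
        System.out.println("Copied to " + copyWithXlsExtension(html));
    }
}
```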

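1. Download OpenOffice.
2. Download JODConverter, a third-party jar that drives a running OpenOffice instance to do format conversion.
3. Wait for the downloads to finish.
4. Install OpenOffice. When the installation is done, open cmd and start OpenOffice as a service: C:\Program Files (x86)\OpenOffice.org 3\program>soffice -headless -accept="socket,port=8100;urp;"
5. Open Eclipse.
6. Have a cup of hot tea while Eclipse starts.
7. Create a new Eclipse project and import the jars under Jodconverter/lib.
8. Coding...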
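Before running the conversion code, it can help to confirm that the service started in step 4 is really listening on port 8100. A minimal sketch using only the JDK follows; the class name OfficePortCheck, the host 127.0.0.1, and the one-second timeout are assumptions for illustration, not something JODConverter provides.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: a plain TCP probe, independent of OpenOffice and JODConverter.
final class OfficePortCheck {

    /** Returns true if something accepts TCP connections on host:port within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable
            return false;
        }
    }

    public static void main(String[] args) {
        boolean up = isListening("127.0.0.1", 8100, 1000);
        System.out.println(up
                ? "Port 8100 is accepting connections"
                : "Nothing is listening on port 8100; start soffice first");
    }
}
```

This only shows that some process accepts connections on the port, not that it is OpenOffice, so treat it as a quick sanity check.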


package com.mzule.doc2html.util;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ConnectException;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

/**
 * Utility class that converts a Word document into an HTML string.
 *
 * @author MZULE
 */
public class Doc2Html {

    public static void main(String[] args) {
        System.out.println(toHtmlString(new File("C:/test/test.doc"), "C:/test"));
    }

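    /**
     * Converts a Word document into an HTML file.
     *
     * @param docFile  the Word document to convert
     * @param filepath the directory in which to store the generated HTML
     * @return the generated HTML file
     */
    public static File convert(File docFile, String filepath) {
        // Create the target HTML file, named after the current timestamp
        File htmlFile = new File(filepath + "/" + new Date().getTime() + ".html");
        // Create the OpenOffice connection (the service must be listening on port 8100)
        OpenOfficeConnection con = new SocketOpenOfficeConnection(8100);
        try {
            // Connect
            con.connect();
        } catch (ConnectException e) {
            // Without a connection the conversion cannot proceed, so fail here
            throw new IllegalStateException("Failed to connect to OpenOffice on port 8100", e);
        }
        try {
            // Create the converter and convert the document to HTML
            DocumentConverter converter = new OpenOfficeDocumentConverter(con);
            converter.convert(docFile, htmlFile);
        } finally {
            // Always close the OpenOffice connection
            con.disconnect();
        }
        return htmlFile;
    }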

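    /**
     * Converts a Word document to HTML and returns the cleaned-up HTML source.
     *
     * @param docFile  the document to convert
     * @param filepath the directory where the document's images are saved
     * @return the converted HTML source
     */
    public static String toHtmlString(File docFile, String filepath) {
        // Convert the Word document
        File htmlFile = convert(docFile, filepath);
        // Read the HTML file (decoded with the platform default charset)
        StringBuilder htmlSb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream(htmlFile)))) {
            String line;
            // readLine() returns null at end of file; ready() is not a reliable end-of-file test
            while ((line = br.readLine()) != null) {
                // Join lines with a space so words on adjacent lines do not run together
                htmlSb.append(line).append(' ');
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Delete the temporary file
            htmlFile.delete();
        }
        // Return the cleaned-up HTML text
        return clearFormat(htmlSb.toString(), filepath);
    }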

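    /**
     * Strips unwanted HTML markup.
     *
     * @param htmlStr    HTML source containing complex markup
     * @param docImgPath directory prepended to image paths
     * @return the HTML with the unwanted markup removed
     */
    protected static String clearFormat(String htmlStr, String docImgPath) {
        // Regex that captures the body; (?s) lets '.' match line breaks too
        String bodyReg = "(?s)<BODY\\b.*</BODY>";
        Pattern bodyPattern = Pattern.compile(bodyReg);
        Matcher bodyMatcher = bodyPattern.matcher(htmlStr);
        if (bodyMatcher.find()) {
            // Take the BODY content and turn the BODY tags into DIV tags
            htmlStr = bodyMatcher.group().replaceFirst("<BODY", "<DIV")
                    .replaceAll("</BODY>", "</DIV>");
        }
        // Prefix image paths; replace() is literal, so the path needs no escaping
        htmlStr = htmlStr.replace("<IMG SRC=\"", "<IMG SRC=\"" + docImgPath + "/");
        // Alternative: turn <P>...</P> into <div>...</div> and keep the styles
        // htmlStr = htmlStr.replaceAll("(<P)([^>]*>.*?)(<\\/P>)", "<div$2</div>");
        // Turn <P ...>...</P> into <p>...</p>, dropping the attributes
        htmlStr = htmlStr.replaceAll("(?s)<P\\b[^>]*(>.*?)</P>", "<p$1</p>");
        // Remove unwanted tags
        htmlStr = htmlStr.replaceAll(
                "<[/]?(font|FONT|span|SPAN|xml|XML|del|DEL|ins|INS|meta|META|[ovwxpOVWXP]:\\w+)[^>]*?>",
                "");
        // Remove unwanted attributes. Each pass removes at most one attribute
        // per tag, so repeat until nothing changes.
        String attrReg = "<([^>]*?)\\s+(?:lang|LANG|class|CLASS|style|STYLE|size|SIZE|face|FACE|[ovwxpOVWXP]:\\w+)=(?:'[^']*'|\"[^\"]*\"|[^\\s>]+)([^>]*)>";
        String previous;
        do {
            previous = htmlStr;
            htmlStr = htmlStr.replaceAll(attrReg, "<$1$2>");
        } while (!htmlStr.equals(previous));
        return htmlStr;
    }

}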
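To see what the regex-based cleanup does without installing OpenOffice, here is a small standalone sketch of the same idea: drop FONT/SPAN tags, then strip presentational attributes one at a time until none are left. The class name HtmlCleanupDemo and the shortened tag and attribute lists are assumptions for illustration; regular expressions are a rough tool for HTML and this is not a full sanitizer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it mirrors the cleanup idea, not the exact code above.
final class HtmlCleanupDemo {

    // Opening and closing FONT/SPAN tags; the text between them is kept
    private static final Pattern UNWANTED_TAGS =
            Pattern.compile("</?(?:font|span)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // One presentational attribute inside a tag: name, '=', then a quoted or bare value
    private static final Pattern UNWANTED_ATTR = Pattern.compile(
            "<([^>]*?)\\s+(?:lang|class|style|size|face)=(?:'[^']*'|\"[^\"]*\"|[^\\s>]+)([^>]*)>",
            Pattern.CASE_INSENSITIVE);

    static String clean(String html) {
        String result = UNWANTED_TAGS.matcher(html).replaceAll("");
        String previous;
        // Each pass removes at most one attribute per tag, so loop until stable
        do {
            previous = result;
            result = UNWANTED_ATTR.matcher(result).replaceAll("<$1$2>");
        } while (!result.equals(previous));
        return result;
    }

    public static void main(String[] args) {
        String input = "<P CLASS=\"western\" STYLE=\"margin: 0\" ALIGN=CENTER>"
                + "<FONT SIZE=3>Hello <SPAN LANG=\"en-US\">world</SPAN></FONT></P>";
        // prints: <P ALIGN=CENTER>Hello world</P>
        System.out.println(clean(input));
    }
}
```

Attributes that are not on the list, such as ALIGN here, are left alone, which is why the loop ends after two attribute passes on this input.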
