C#: removing HTML tags from text

// Requires: using System.Text.RegularExpressions;
public static string ContentReplace(string input)
{
    // Strip tags, whitespace that follows a line break, and leftover comment fragments.
    input = Regex.Replace(input, @"<(.[^>]*)>", "", RegexOptions.IgnoreCase);
    input = Regex.Replace(input, @"([\r\n])[\s]+", "", RegexOptions.IgnoreCase);
    input = Regex.Replace(input, @"-->", "", RegexOptions.IgnoreCase);
    input = Regex.Replace(input, @"<!--.*", "", RegexOptions.IgnoreCase);

    // Remove stray angle brackets and line breaks. Strings are immutable, so the
    // result of Replace must be assigned back. This runs before entity decoding
    // so that characters decoded from &lt; and &gt; are kept.
    input = input.Replace("<", "");
    input = input.Replace(">", "");
    input = input.Replace("\r\n", "");

    // Decode common named and numeric entities.
    input = Regex.Replace(input, @"&(quot|#34);", "\"", RegexOptions.IgnoreCase);
    input = Regex.Replace(input, @"&(lt|#60);", "<", RegexOptions.IgnoreCase);
    input = Regex.Replace(input, @"&(gt|#62);", ">", RegexOptions.IgnoreCase);
    input = Regex.Replace(input, @"&(nbsp|#160);", " ", RegexOptions.IgnoreCase);
    input = Regex.Replace(input, @"&(iexcl|#161);", "\xa1", RegexOptions.IgnoreCase);
    input = Regex.Replace(input, @"&(cent|#162);", "\xa2", RegexOptions.IgnoreCase);
    input = Regex.Replace(input, @"&(pound|#163);", "\xa3", RegexOptions.IgnoreCase);
    input = Regex.Replace(input, @"&(copy|#169);", "\xa9", RegexOptions.IgnoreCase);
    // Drop any other numeric entity, keeping &#38; for the next step.
    input = Regex.Replace(input, @"&#(?!38;)(\d+);", "", RegexOptions.IgnoreCase);
    // Decode &amp; last so that text such as "&amp;lt;" is not decoded twice.
    input = Regex.Replace(input, @"&(amp|#38);", "&", RegexOptions.IgnoreCase);

    // Trim both ends and collapse runs of whitespace into a single space.
    input = Regex.Replace(input.Trim(), "\\s+", " ");
    return input;
}
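The entity table above covers only a handful of entities. As a hedged alternative, here is a minimal sketch that assumes `System.Net.WebUtility.HtmlDecode` is available on the target framework and lets it handle named and numeric entities in one call. The class name `HtmlTextSketch` and the method name `StripTags` are hypothetical and do not come from the original post. Regex-based stripping remains a rough approach for simple markup, not a full HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextSketch
{
    // Hypothetical helper (not from the original post): strips tags with
    // regular expressions and decodes entities with WebUtility.HtmlDecode.
    public static string StripTags(string html)
    {
        // Remove comments first; Singleline lets '.' match across line breaks.
        string text = Regex.Replace(html, @"<!--.*?-->", "", RegexOptions.Singleline);
        // Remove anything that looks like a tag.
        text = Regex.Replace(text, @"<[^>]+>", "");
        // Decode named and numeric entities in a single pass.
        text = WebUtility.HtmlDecode(text);
        // Collapse runs of whitespace and trim both ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(StripTags("<p>Tom &amp; Jerry<!-- note --> and friends</p>"));
    }
}
```

Decoding after the tags are removed means that an escaped sequence such as `&lt;b&gt;` ends up as the literal text `<b>` in the output instead of being treated as a tag.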

Time: 2024-10-10 03:04:24
