Thorough HTML Filtering in .NET Made Easy with SafeHelper

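Rich-text editors are now common on web pages, because users need to insert images, video, styles, and other HTML content, which makes pages more informative. This also causes developers a good deal of trouble: submitted HTML will inevitably contain unsafe or illegal markup, such as script elements or unknown tags. We then have to write a lot of code to check that user-submitted HTML is both safe and well-formed.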

Method 1:

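Today I want to recommend a component that analyzes HTML, finds the offending parts, and removes them, with fairly simple configuration. It is called SafeHelper. It detects and removes every tag that the configuration file does not explicitly allow. Usage is very simple: you only need to call a static method.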

Step 1: Create an XML file named "wuxiu.HtmlAnalyserConfig.xml" in the website root directory and add the following content:

<?xml version="1.0" encoding="utf-8" ?>
<HtmlAnylyser >
  <AllowTags>
    <div attrs="class|style"/>
    <ul attrs="class"/>
    <li/>
    <table attrs="class|cellpadding|cellspacing|border|width"/>
    <tr attrs="class"/>
    <th attrs="class"/>
    <td attrs="class"/>
    <span attrs="style|class"/>
    <object attrs="classid|codebase|width|height"/>
    <param attrs="name|value"/>
    <embed attrs="src|width|height|quality|pluginspage|type|wmode"/>
    <a attrs="href|target|title"/>
    <h1 attrs="class"/>
    <h2 attrs="class"/>
    <h3 attrs="class"/>
    <h4 attrs="class"/>
    <h5 attrs="class"/>
    <h6 attrs="class"/>
    <strong attrs="class"/>
    <b attrs="class"/>
    <i attrs="class"/>
    <em attrs="class"/>
    <u attrs="class"/>
    <hr attrs="class"/>
    <br attrs="class"/>
    <img attrs="class|src|width|height|alt"/>
    <p attrs="class"/>
    <ol attrs="class"/>
    <dl attrs="class"/>
    <dt attrs="class"/>
    <dd attrs="class"/>
  </AllowTags>
</HtmlAnylyser>
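To make the shape of this configuration concrete, here is a minimal sketch of how an allowlist like this could be read into a lookup of tag name to permitted attribute names. It is not SafeHelper's own code: the class `AllowListSketch` and its `Parse` method are hypothetical names used for illustration, and the sketch assumes that a tag without an `attrs` attribute (such as `<li/>`) permits no attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper, not part of SafeHelper: reads a config of the shape
// shown above into a map of tag name -> allowed attribute names.
public static class AllowListSketch
{
    public static Dictionary<string, HashSet<string>> Parse(string xml)
    {
        var map = new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);
        XElement allowTags = XDocument.Parse(xml).Root.Element("AllowTags");
        foreach (XElement tag in allowTags.Elements())
        {
            // Assumption: a missing attrs attribute means no attributes are allowed.
            string attrs = (string)tag.Attribute("attrs") ?? "";
            map[tag.Name.LocalName] = new HashSet<string>(
                attrs.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries),
                StringComparer.OrdinalIgnoreCase);
        }
        return map;
    }

    public static void Main()
    {
        string xml = @"<?xml version=""1.0"" encoding=""utf-8"" ?>
<HtmlAnylyser>
  <AllowTags>
    <div attrs=""class|style""/>
    <li/>
  </AllowTags>
</HtmlAnylyser>";

        Dictionary<string, HashSet<string>> allowed = Parse(xml);
        Console.WriteLine(allowed["div"].Contains("style"));  // True
        Console.WriteLine(allowed["li"].Count);               // 0
        Console.WriteLine(allowed.ContainsKey("script"));     // False
    }
}
```

With a lookup like this, a filter only has to ask two questions for each element it meets: is the tag name a key in the map, and is each attribute in that tag's set.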

Step 2: Add a reference to the DLL. SafeHelper official site: http://www.wuxiu.org/downloads.html

Step 3: Call the following code to strip unknown tags from the HTML (that is, every tag not defined in wuxiu.HtmlAnalyserConfig.xml):

string html = "<script>alert('yes');</script><p>content</p>";
html = wuxiu.SafeHelper.HtmlSafer.HtmlSaferAnalyser.ToSafeHtml(html);
Response.Write(html);

Or list all unknown tags:

string html = "<script>alert('yes');</script><p>myhtmlcontent</p>";
string[] dangers = wuxiu.SafeHelper.HtmlSafer.HtmlSaferAnalyser.ValidHtml(html, false);
foreach (string danger_tag in dangers)
{
    Response.Write(danger_tag + "<br/>");
}

Method 2: Use regular expressions to match dangerous tags such as script:

// Requires: using System.Text.RegularExpressions;
public static string StripHTML(string strHtml)
{
    // Patterns are applied in order; aryRep[i] is the replacement for aryReg[i].
    string[] aryReg =
    {
        @"<script[^>]*?>[\s\S]*?</script>", // script blocks, including multi-line ones
        @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>", // any other tag
        @"([\r\n])[\s]+", // line breaks followed by whitespace
        @"&(quot|#34);",
        @"&(amp|#38);",
        @"&(lt|#60);",
        @"&(gt|#62);",
        @"&(nbsp|#160);",
        @"&(iexcl|#161);",
        @"&(cent|#162);",
        @"&(pound|#163);",
        @"&(copy|#169);",
        @"&#(\d+);", // any remaining numeric entity
        @"-->",
        @"<!--.*\n"
    };
    string[] aryRep =
    {
        "", "", "", "\"", "&", "<", ">", "   ",
        "\xa1", // chr(161)
        "\xa2", // chr(162)
        "\xa3", // chr(163)
        "\xa9", // chr(169)
        "", "\r\n", ""
    };

    string strOutput = strHtml;
    for (int i = 0; i < aryReg.Length; i++)
    {
        Regex regex = new Regex(aryReg[i], RegexOptions.IgnoreCase);
        strOutput = regex.Replace(strOutput, aryRep[i]);
    }

    // String.Replace returns a new string, so the result must be assigned.
    strOutput = strOutput.Replace("<", "");
    strOutput = strOutput.Replace(">", "");
    strOutput = strOutput.Replace("\r\n", "");
    return strOutput;
}
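The first pattern in the array targets script blocks. As a minimal standalone sketch of just that step, the following example removes script elements and leaves all other markup in place. The class name `ScriptStripSketch` and the method `RemoveScripts` are hypothetical, and the pattern is written with `[\s\S]` on the assumption that scripts may span several lines, which a plain `.` would not match without `RegexOptions.Singleline`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical illustration of the script-removal step only.
public static class ScriptStripSketch
{
    // [\s\S] matches any character including newlines, so a script block
    // spread over several lines is matched as a whole.
    private static readonly Regex ScriptBlock = new Regex(
        @"<script[^>]*>[\s\S]*?</script\s*>", RegexOptions.IgnoreCase);

    public static string RemoveScripts(string html)
    {
        // Regex.Replace returns a new string; the input is left unchanged.
        return ScriptBlock.Replace(html, "");
    }

    public static void Main()
    {
        string html = "<p>before</p><SCRIPT type=\"text/javascript\">\nalert('yes');\n</SCRIPT><p>after</p>";
        Console.WriteLine(RemoveScripts(html)); // prints <p>before</p><p>after</p>
    }
}
```

A pattern like this only catches script elements with a matching closing tag; event-handler attributes and other script-bearing markup are outside its reach, which is where the allowlist approach of Method 1 covers more cases.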

Time: 2024-10-13 11:52:43
