Filtering Special Input Characters (Java)

Arbitrary text placed in an HTML tag often needs to be altered, to ensure that the resulting HTML remains valid.

Problem characters can include:

  • <
  • >
  • "
  • \
  • &

These characters can be replaced with HTML character entities. For example, < can be replaced with &lt;.
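As a minimal sketch of this replacement, the five characters listed above can be mapped one at a time. The class name `EntitySketch` and the method `escapeBasic` are made up for illustration and belong to no library; the backslash has no named entity, so a numeric one is assumed here.

```java
final class EntitySketch {

  /** Replaces <, >, ", \ and & with HTML character entities. */
  static String escapeBasic(String text) {
    StringBuilder result = new StringBuilder(text.length());
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      switch (c) {
        case '<':  result.append("&lt;");   break;
        case '>':  result.append("&gt;");   break;
        case '"':  result.append("&quot;"); break;
        case '\\': result.append("&#092;"); break; // numeric entity, an assumption
        case '&':  result.append("&amp;");  break;
        default:   result.append(c);               // not special, copy as is
      }
    }
    return result.toString();
  }

  public static void main(String[] args) {
    // prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
    System.out.println(escapeBasic("<a href=\"x\">Tom & Jerry</a>"));
  }
}
```

Working character by character, rather than with chained `String.replace` calls, means the ampersands inside entities that were just produced are never escaped a second time.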

Query strings (Blah=1&Name=Bob) often need to be escaped as well. If the query string contains special characters, it will need to be "URL encoded". (See the javadoc for the URLEncoder class for further information.) This will ensure the query string conforms with valid HTTP.
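A small sketch of URL encoding with `URLEncoder` follows. The class name `QueryStringSketch` and the sample value are illustrative assumptions. Each parameter value is encoded on its own, because encoding the whole query string at once would also encode the `=` and `&` separators.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class QueryStringSketch {

  /** URL-encodes a single parameter name or value as UTF-8. */
  static String encode(String fragment) {
    try {
      return URLEncoder.encode(fragment, "UTF-8");
    } catch (UnsupportedEncodingException ex) {
      // UTF-8 is required on every Java platform, so this should not happen
      throw new IllegalStateException("UTF-8 not supported", ex);
    }
  }

  public static void main(String[] args) {
    String query = "Blah=" + encode("1") + "&Name=" + encode("Bob & Alice");
    // prints: Blah=1&Name=Bob+%26+Alice
    System.out.println(query);
  }
}
```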

There's often a second issue, however, with regard to query strings. If a query string is placed in an HREF attribute, then even a URL encoded query string is often not of valid form. This is because URLEncoder produces valid HTTP, but it doesn't in general produce text which is a valid HTML attribute - the ampersand character needs to be replaced by the corresponding character entity &amp;.
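The two steps can be sketched together as below. The class `HrefSketch`, its method names, and the `search` page are hypothetical. The order matters: values are URL encoded first, so any ampersand inside a value is already `%26`, and only the separators between parameters are turned into `&amp;`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class HrefSketch {

  /** Step 1: make a parameter value valid for a URL. */
  static String encode(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException ex) {
      throw new IllegalStateException("UTF-8 not supported", ex);
    }
  }

  /** Step 2: make the finished URL valid inside an HTML attribute. */
  static String escapeAmpersands(String url) {
    return url.replace("&", "&amp;");
  }

  /** Builds an anchor tag for a hypothetical "search" page. */
  static String link(String blah, String name) {
    String url = "search?Blah=" + encode(blah) + "&Name=" + encode(name);
    return "<a href=\"" + escapeAmpersands(url) + "\">results</a>";
  }

  public static void main(String[] args) {
    // prints: <a href="search?Blah=1&amp;Name=Bob+Smith">results</a>
    System.out.println(link("1", "Bob Smith"));
  }
}
```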

Here is an example of a utility class which escapes special characters for HTML, XML, regular expressions, and so on.

package hirondelle.web4j.util;

import java.net.URLEncoder;
import java.io.UnsupportedEncodingException;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

import hirondelle.web4j.security.SafeText;
import hirondelle.web4j.ui.translate.Text;
import hirondelle.web4j.ui.translate.Tooltips;
import hirondelle.web4j.ui.translate.TextFlow;
import hirondelle.web4j.ui.tag.Populate;
import hirondelle.web4j.database.Report;

/**
 Convenience methods for escaping special characters related to HTML, XML,
 and regular expressions.

 <P>To keep you safe by default, WEB4J goes to some effort to escape
 characters in your data when appropriate, such that you <em>usually</em>
 don't need to think too much about escaping special characters. Thus, you
 shouldn't need to <em>directly</em> use the services of this class very often.

 <P><span class='highlight'>For Model Objects containing free form user input,
 it is highly recommended that you use {@link SafeText}, not <tt>String</tt></span>.
 Free form user input is open to malicious use, such as
 <a href='http://www.owasp.org/index.php/Cross_Site_Scripting'>Cross Site Scripting</a>
 attacks.
 Using <tt>SafeText</tt> will protect you from such attacks, by always escaping
 special characters automatically in its <tt>toString()</tt> method.   

 <P>The following WEB4J classes will automatically escape special characters
 for you, when needed :
 <ul>
 <li>the {@link SafeText} class, used as a building block class for your
 application's Model Objects, for modeling all free form user input
 <li>the {@link Populate} tag used with forms
 <li>the {@link Report} class used for creating quick reports
 <li>the {@link Text}, {@link TextFlow}, and {@link Tooltips} custom tags used
 for translation
 </ul>
*/
public final class EscapeChars {

  /**
    Escape characters for text appearing in HTML markup.

    <P>This method exists as a defence against Cross Site Scripting (XSS) hacks.
    The idea is to neutralize control characters commonly used by scripts, such that
    they will not be executed by the browser. This is done by replacing the control
    characters with their escaped equivalents.
    See {@link hirondelle.web4j.security.SafeText} as well.

    <P>The following characters are replaced with corresponding
    HTML character entities :
    <table border='1' cellpadding='3' cellspacing='0'>
    <tr><th> Character </th><th>Replacement</th></tr>
    <tr><td> < </td><td> &lt; </td></tr>
    <tr><td> > </td><td> &gt; </td></tr>
    <tr><td> & </td><td> &amp; </td></tr>
    <tr><td> " </td><td> &quot;</td></tr>
    <tr><td> \t </td><td> &#009;</td></tr>
    <tr><td> ! </td><td> &#033;</td></tr>
    <tr><td> # </td><td> &#035;</td></tr>
    <tr><td> $ </td><td> &#036;</td></tr>
    <tr><td> % </td><td> &#037;</td></tr>
    <tr><td> ' </td><td> &#039;</td></tr>
    <tr><td> ( </td><td> &#040;</td></tr>
    <tr><td> ) </td><td> &#041;</td></tr>
    <tr><td> * </td><td> &#042;</td></tr>
    <tr><td> + </td><td> &#043;</td></tr>
    <tr><td> , </td><td> &#044;</td></tr>
    <tr><td> - </td><td> &#045;</td></tr>
    <tr><td> . </td><td> &#046;</td></tr>
    <tr><td> / </td><td> &#047;</td></tr>
    <tr><td> : </td><td> &#058;</td></tr>
    <tr><td> ; </td><td> &#059;</td></tr>
    <tr><td> = </td><td> &#061;</td></tr>
    <tr><td> ? </td><td> &#063;</td></tr>
    <tr><td> @ </td><td> &#064;</td></tr>
    <tr><td> [ </td><td> &#091;</td></tr>
    <tr><td> \ </td><td> &#092;</td></tr>
    <tr><td> ] </td><td> &#093;</td></tr>
    <tr><td> ^ </td><td> &#094;</td></tr>
    <tr><td> _ </td><td> &#095;</td></tr>
    <tr><td> ` </td><td> &#096;</td></tr>
    <tr><td> { </td><td> &#123;</td></tr>
    <tr><td> | </td><td> &#124;</td></tr>
    <tr><td> } </td><td> &#125;</td></tr>
    <tr><td> ~ </td><td> &#126;</td></tr>
    </table>
    </table>

    <P>Note that JSTL's {@code <c:out>} escapes <em>only five</em> of the
    above characters: {@code <}, {@code >}, {@code &}, {@code "}, and {@code '}.
   */
   public static String forHTML(String aText){
     final StringBuilder result = new StringBuilder();
     final StringCharacterIterator iterator = new StringCharacterIterator(aText);
     char character =  iterator.current();
     while (character != CharacterIterator.DONE ){
       if (character == '<') {
         result.append("&lt;");
       }
       else if (character == '>') {
         result.append("&gt;");
       }
       else if (character == '&') {
         result.append("&amp;");
       }
       else if (character == '\"') {
         result.append("&quot;");
       }
       else if (character == '\t') {
         addCharEntity(9, result);
       }
       else if (character == '!') {
         addCharEntity(33, result);
       }
       else if (character == '#') {
         addCharEntity(35, result);
       }
       else if (character == '$') {
         addCharEntity(36, result);
       }
       else if (character == '%') {
         addCharEntity(37, result);
       }
       else if (character == '\'') {
         addCharEntity(39, result);
       }
       else if (character == '(') {
         addCharEntity(40, result);
       }
       else if (character == ')') {
         addCharEntity(41, result);
       }
       else if (character == '*') {
         addCharEntity(42, result);
       }
       else if (character == '+') {
         addCharEntity(43, result);
       }
       else if (character == ',') {
         addCharEntity(44, result);
       }
       else if (character == '-') {
         addCharEntity(45, result);
       }
       else if (character == '.') {
         addCharEntity(46, result);
       }
       else if (character == '/') {
         addCharEntity(47, result);
       }
       else if (character == ':') {
         addCharEntity(58, result);
       }
       else if (character == ';') {
         addCharEntity(59, result);
       }
       else if (character == '=') {
         addCharEntity(61, result);
       }
       else if (character == '?') {
         addCharEntity(63, result);
       }
       else if (character == '@') {
         addCharEntity(64, result);
       }
       else if (character == '[') {
         addCharEntity(91, result);
       }
       else if (character == '\\') {
         addCharEntity(92, result);
       }
       else if (character == ']') {
         addCharEntity(93, result);
       }
       else if (character == '^') {
         addCharEntity(94, result);
       }
       else if (character == '_') {
         addCharEntity(95, result);
       }
       else if (character == '`') {
         addCharEntity(96, result);
       }
       else if (character == '{') {
         addCharEntity(123, result);
       }
       else if (character == '|') {
         addCharEntity(124, result);
       }
       else if (character == '}') {
         addCharEntity(125, result);
       }
       else if (character == '~') {
         addCharEntity(126, result);
       }
       else {
         //the char is not a special one
         //add it to the result as is
         result.append(character);
       }
       character = iterator.next();
     }
     return result.toString();
  }

  /**
   Escape all ampersand characters in a URL. 

   <P>Replaces all <tt>'&'</tt> characters with <tt>'&amp;'</tt>.

  <P>An ampersand character may appear in the query string of a URL.
   The ampersand character is indeed valid in a URL.
   <em>However, URLs usually appear as an <tt>HREF</tt> attribute, and
   such attributes have the additional constraint that ampersands
   must be escaped.</em>

   <P>The JSTL <c:url> tag does indeed perform proper URL encoding of
   query parameters. But it does not, in general, produce text which
   is valid as an <tt>HREF</tt> attribute, simply because it does
   not escape the ampersand character. This is a nuisance when
   multiple query parameters appear in the URL, since it requires a little
   extra work.
  */
  public static String forHrefAmpersand(String aURL){
    return aURL.replace("&", "&amp;");
  }

  /**
    Synonym for <tt>URLEncoder.encode(String, "UTF-8")</tt>.

    <P>Used to ensure that HTTP query strings are in proper form, by escaping
    special characters such as spaces.

    <P>It is important to note that if a query string appears in an <tt>HREF</tt>
    attribute, then there are two issues - ensuring the query string is valid HTTP
    (it is URL-encoded), and ensuring it is valid HTML (ensuring the
    ampersand is escaped).
   */
   public static String forURL(String aURLFragment){
     String result = null;
     try {
       result = URLEncoder.encode(aURLFragment, "UTF-8");
     }
     catch (UnsupportedEncodingException ex){
       throw new RuntimeException("UTF-8 not supported", ex);
     }
     return result;
   }

  /**
   Escape characters for text appearing as XML data, between tags.

   <P>The following characters are replaced with corresponding character entities :
   <table border='1' cellpadding='3' cellspacing='0'>
   <tr><th> Character </th><th> Encoding </th></tr>
   <tr><td> < </td><td> &lt; </td></tr>
   <tr><td> > </td><td> &gt; </td></tr>
   <tr><td> & </td><td> &amp; </td></tr>
   <tr><td> " </td><td> &quot;</td></tr>
   <tr><td> ' </td><td> &#039;</td></tr>
   </table>

   <P>Note that JSTL's {@code <c:out>} escapes the exact same set of
   characters as this method. <span class='highlight'>That is, {@code <c:out>}
    is good for escaping to produce valid XML, but not for producing safe
    HTML.</span>
  */
  public static String forXML(String aText){
    final StringBuilder result = new StringBuilder();
    final StringCharacterIterator iterator = new StringCharacterIterator(aText);
    char character =  iterator.current();
    while (character != CharacterIterator.DONE ){
      if (character == '<') {
        result.append("&lt;");
      }
      else if (character == '>') {
        result.append("&gt;");
      }
      else if (character == '\"') {
        result.append("&quot;");
      }
      else if (character == '\'') {
        //numeric entity for the apostrophe, as listed in the table above
        result.append("&#039;");
      }
      else if (character == '&') {
        result.append("&amp;");
      }
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();
  }

  /**
   Escapes characters for text appearing as data in the
   <a href='http://www.json.org/'>Javascript Object Notation</a>
   (JSON) data interchange format.

   <P>The following commonly used control characters are escaped :
   <table border='1' cellpadding='3' cellspacing='0'>
   <tr><th> Character </th><th> Escaped As </th></tr>
   <tr><td> " </td><td> \" </td></tr>
   <tr><td> \ </td><td> \\ </td></tr>
   <tr><td> / </td><td> \/ </td></tr>
   <tr><td> back space </td><td> \b </td></tr>
   <tr><td> form feed </td><td> \f </td></tr>
   <tr><td> line feed </td><td> \n </td></tr>
   <tr><td> carriage return </td><td> \r </td></tr>
   <tr><td> tab </td><td> \t </td></tr>
   </table>

   <P>See <a href='http://www.ietf.org/rfc/rfc4627.txt'>RFC 4627</a> for more information.
  */
  public static String forJSON(String aText){
    final StringBuilder result = new StringBuilder();
    StringCharacterIterator iterator = new StringCharacterIterator(aText);
    char character = iterator.current();
    while (character != StringCharacterIterator.DONE){
      if( character == '\"' ){
        result.append("\\\"");
      }
      else if(character == '\\'){
        result.append("\\\\");
      }
      else if(character == '/'){
        result.append("\\/");
      }
      else if(character == '\b'){
        result.append("\\b");
      }
      else if(character == '\f'){
        result.append("\\f");
      }
      else if(character == '\n'){
        result.append("\\n");
      }
      else if(character == '\r'){
        result.append("\\r");
      }
      else if(character == '\t'){
        result.append("\\t");
      }
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();
  }

  /**
   Return <tt>aText</tt> with all <tt>'<'</tt> and <tt>'>'</tt> characters
   replaced by their escaped equivalents.
  */
  public static String toDisableTags(String aText){
    final StringBuilder result = new StringBuilder();
    final StringCharacterIterator iterator = new StringCharacterIterator(aText);
    char character =  iterator.current();
    while (character != CharacterIterator.DONE ){
      if (character == '<') {
        result.append("&lt;");
      }
      else if (character == '>') {
        result.append("&gt;");
      }
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();
  }

  /**
   Replace characters having special meaning in regular expressions
   with their escaped equivalents, preceded by a '\' character.

   <P>The escaped characters include :
  <ul>
  <li>.
  <li>\
  <li>?, * , and +
  <li>&
  <li>:
  <li>{ and }
  <li>[ and ]
  <li>( and )
  <li>^ and $
  </ul>
  */
  public static String forRegex(String aRegexFragment){
    final StringBuilder result = new StringBuilder();

    final StringCharacterIterator iterator =
      new StringCharacterIterator(aRegexFragment)
    ;
    char character =  iterator.current();
    while (character != CharacterIterator.DONE ){
      /*
       All literals need to have backslashes doubled.
      */
      if (character == '.') {
        result.append("\\.");
      }
      else if (character == '\\') {
        result.append("\\\\");
      }
      else if (character == '?') {
        result.append("\\?");
      }
      else if (character == '*') {
        result.append("\\*");
      }
      else if (character == '+') {
        result.append("\\+");
      }
      else if (character == '&') {
        result.append("\\&");
      }
      else if (character == ':') {
        result.append("\\:");
      }
      else if (character == '{') {
        result.append("\\{");
      }
      else if (character == '}') {
        result.append("\\}");
      }
      else if (character == '[') {
        result.append("\\[");
      }
      else if (character == ']') {
        result.append("\\]");
      }
      else if (character == '(') {
        result.append("\\(");
      }
      else if (character == ')') {
        result.append("\\)");
      }
      else if (character == '^') {
        result.append("\\^");
      }
      else if (character == '$') {
        result.append("\\$");
      }
      else {
        //the char is not a special one
        //add it to the result as is
        result.append(character);
      }
      character = iterator.next();
    }
    return result.toString();
  }

  /**
   Escape <tt>'$'</tt> and <tt>'\'</tt> characters in replacement strings.

   <P>Synonym for <tt>Matcher.quoteReplacement(String)</tt>.

   <P>The following methods use replacement strings which treat
   <tt>'$'</tt> and <tt>'\'</tt> as special characters:
   <ul>
   <li><tt>String.replaceAll(String, String)</tt>
   <li><tt>String.replaceFirst(String, String)</tt>
   <li><tt>Matcher.appendReplacement(StringBuffer, String)</tt>
   </ul>

   <P>If replacement text can contain arbitrary characters, then you
   will usually need to escape that text, to ensure special characters
   are interpreted literally.
  */
  public static String forReplacementString(String aInput){
    return Matcher.quoteReplacement(aInput);
  }

  /**
   Disable all <tt><SCRIPT></tt> tags in <tt>aText</tt>.

   <P>Insensitive to case.
  */
  public static String forScriptTagsOnly(String aText){
    String result = null;
    Matcher matcher = SCRIPT.matcher(aText);
    result = matcher.replaceAll("&lt;SCRIPT>");
    matcher = SCRIPT_END.matcher(result);
    result = matcher.replaceAll("&lt;/SCRIPT>");
    return result;
  }

  // PRIVATE //

  private EscapeChars(){
    //empty - prevent construction
  }

  private static final Pattern SCRIPT = Pattern.compile(
    "<SCRIPT>", Pattern.CASE_INSENSITIVE
   );
  private static final Pattern SCRIPT_END = Pattern.compile(
    "</SCRIPT>", Pattern.CASE_INSENSITIVE
  );

  private static void addCharEntity(Integer aIdx, StringBuilder aBuilder){
    String padding = "";
    if( aIdx <= 9 ){
       padding = "00";
    }
    else if( aIdx <= 99 ){
      padding = "0";
    }
    else {
      //no prefix
    }
    String number = padding + aIdx.toString();
    aBuilder.append("&#" + number + ";");
  }
}
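
A short sketch of why the regular expression and replacement string escaping matter, using only JDK calls. The class `RegexEscapeSketch` and the method `replaceLiteral` are illustrative names. Note that `Pattern.quote` is the JDK's own way to treat a fragment literally, which is a different mechanism from the character-by-character approach of `forRegex`, while `Matcher.quoteReplacement` is the call that `forReplacementString` delegates to.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexEscapeSketch {

  /** Replaces every literal occurrence of target with replacement. */
  static String replaceLiteral(String text, String target, String replacement) {
    // Without quoting, "." in target would match any character, and
    // "$1" in replacement would be read as a reference to capture group 1.
    return text.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
  }

  public static void main(String[] args) {
    // prints: price: $1.75
    System.out.println(replaceLiteral("price: 1.50", "1.50", "$1.75"));
  }
}
```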
 
Time: 2024-10-14 07:12:37
