Solving the URL-encoding problem in Java

First, look at the difference between the encodeURI and encodeURIComponent functions in JavaScript.

encodeURI: does not encode ASCII letters and digits, nor these ASCII punctuation marks: - _ . ! ~ * ' ( ). It also leaves unescaped the following ASCII punctuation marks, which have special meaning in a URI: ; / ? : @ & = + $ , #

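encodeURIComponent: does not encode ASCII letters and digits, nor these ASCII punctuation marks: - _ . ! ~ * ' ( ). Unlike encodeURI, it does escape the URI-reserved punctuation marks ; / ? : @ & = + $ , #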

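In Java, by contrast, the URLEncoder.encode(String s, String enc) method:

does not encode ASCII letters and digits, nor these ASCII punctuation marks: - _ . * (a space is converted to +). Everything else, including the URI-reserved punctuation, is percent-encoded.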

The relevant code, from the static initializer of java.net.URLEncoder, is as follows:

        dontNeedEncoding = new BitSet(256);
        int i;
        for (i = 'a'; i <= 'z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = 'A'; i <= 'Z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = '0'; i <= '9'; i++) {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(' '); /* encoding a space to a + is done
                                    * in the encode() method */
        dontNeedEncoding.set('-');
        dontNeedEncoding.set('_');
        dontNeedEncoding.set('.');
        dontNeedEncoding.set('*');
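
To make the effect of that character set concrete, here is a minimal sketch that passes a whole URL through URLEncoder.encode. The class name UrlEncoderDemo, the helper encodeWhole, and the example.com URL are made up for illustration; only URLEncoder.encode itself comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of the original article.
final class UrlEncoderDemo {

    // Encodes the entire string with URLEncoder, the way a naive caller might.
    static String encodeWhole(String url) {
        try {
            return URLEncoder.encode(url, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The reserved characters : / ? = & are all percent-encoded,
        // and the space becomes '+'.
        System.out.println(encodeWhole("http://example.com/a b?q=1&r=2"));
        // prints http%3A%2F%2Fexample.com%2Fa+b%3Fq%3D1%26r%3D2
    }
}
```

The result is no longer usable as a URL, which is why URLEncoder on its own behaves like encodeURIComponent rather than encodeURI.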

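If I want to encode a URL in Java without encoding the ASCII punctuation marks that have special meaning in a URI, those characters need to be added to dontNeedEncoding. To do that, create a custom encoder class, MyURIEncoder: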

package com.sitech.solr.util;

import java.io.CharArrayWriter;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.BitSet;

public class MyURIEncoder {
    static BitSet dontNeedEncoding;
    static final int caseDiff = ('a' - 'A');
    static String dfltEncName = null;

    static {

        /* The list of characters that are not encoded has been
         * determined as follows:
         *
         * RFC 2396 states:
         * -----
         * Data characters that are allowed in a URI but do not have a
         * reserved purpose are called unreserved.  These include upper
         * and lower case letters, decimal digits, and a limited set of
         * punctuation marks and symbols.
         *
         * unreserved  = alphanum | mark
         *
         * mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
         *
         * Unreserved characters can be escaped without changing the
         * semantics of the URI, but this should not be done unless the
         * URI is being used in a context that does not allow the
         * unescaped character to appear.
         * -----
         *
         * It appears that both Netscape and Internet Explorer escape
         * all special characters from this list with the exception
         * of "-", "_", ".", "*". While it is not clear why they are
         * escaping the other characters, perhaps it is safest to
         * assume that there might be contexts in which the others
         * are unsafe if not escaped. Therefore, we will use the same
         * list. It is also noteworthy that this is consistent with
         * O'Reilly's "HTML: The Definitive Guide" (page 164).
         *
         * As a last note, Internet Explorer does not encode the "@"
         * character which is clearly not unreserved according to the
         * RFC. We are being consistent with the RFC in this matter,
         * as is Netscape.
         *
         */

        dontNeedEncoding = new BitSet(256);
        int i;
        for (i = 'a'; i <= 'z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = 'A'; i <= 'Z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = '0'; i <= '9'; i++) {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(' '); /* encoding a space to a + is done
                                    * in the encode() method */
        dontNeedEncoding.set('-');
        dontNeedEncoding.set('_');
        dontNeedEncoding.set('.');
        dontNeedEncoding.set('*');

        // The following ASCII punctuation marks have special meaning in a
        // URI and must not be escaped:  ; / ? : @ & = + $ , #
        dontNeedEncoding.set(';');
        dontNeedEncoding.set('/');
        dontNeedEncoding.set('?');
        dontNeedEncoding.set(':');
        dontNeedEncoding.set('@');
        dontNeedEncoding.set('&');
        dontNeedEncoding.set('=');
        dontNeedEncoding.set('+');
        dontNeedEncoding.set('$');
        dontNeedEncoding.set(',');
        dontNeedEncoding.set('#');

        // Default platform encoding, read through the public System API.
        dfltEncName = System.getProperty("file.encoding");
    }

    /**
     * You can't call the constructor.
     */
    private MyURIEncoder() { }

    public static String encode(String s, String enc)
        throws UnsupportedEncodingException {

        boolean needToChange = false;
        StringBuffer out = new StringBuffer(s.length());
        Charset charset;
        CharArrayWriter charArrayWriter = new CharArrayWriter();

        if (enc == null)
            throw new NullPointerException("charsetName");

        try {
            charset = Charset.forName(enc);
        } catch (IllegalCharsetNameException e) {
            throw new UnsupportedEncodingException(enc);
        } catch (UnsupportedCharsetException e) {
            throw new UnsupportedEncodingException(enc);
        }

        for (int i = 0; i < s.length();) {
            int c = (int) s.charAt(i);
            //System.out.println("Examining character: " + c);
            if (dontNeedEncoding.get(c)) {
                if (c == ' ') {
                    c = '+';
                    needToChange = true;
                }
                //System.out.println("Storing: " + c);
                out.append((char)c);
                i++;
            } else {
                // convert to external encoding before hex conversion
                do {
                    charArrayWriter.write(c);
                    /*
                     * If this character represents the start of a Unicode
                     * surrogate pair, then pass in two characters. It's not
                     * clear what should be done if a byte reserved in the
                     * surrogate pairs range occurs outside of a legal
                     * surrogate pair. For now, just treat it as if it were
                     * any other character.
                     */
                    if (c >= 0xD800 && c <= 0xDBFF) {
                        /*
                          System.out.println(Integer.toHexString(c)
                          + " is high surrogate");
                        */
                        if ( (i+1) < s.length()) {
                            int d = (int) s.charAt(i+1);
                            /*
                              System.out.println("\tExamining "
                              + Integer.toHexString(d));
                            */
                            if (d >= 0xDC00 && d <= 0xDFFF) {
                                /*
                                  System.out.println("\t"
                                  + Integer.toHexString(d)
                                  + " is low surrogate");
                                */
                                charArrayWriter.write(d);
                                i++;
                            }
                        }
                    }
                    i++;
                } while (i < s.length() && !dontNeedEncoding.get((c = (int) s.charAt(i))));

                charArrayWriter.flush();
                String str = new String(charArrayWriter.toCharArray());
                byte[] ba = str.getBytes(charset);
                for (int j = 0; j < ba.length; j++) {
                    out.append('%');
                    char ch = Character.forDigit((ba[j] >> 4) & 0xF, 16);
                    // converting to use uppercase letter as part of
                    // the hex value if ch is a letter.
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                    ch = Character.forDigit(ba[j] & 0xF, 16);
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                }
                charArrayWriter.reset();
                needToChange = true;
            }
        }

        return (needToChange? out.toString() : s);
    }
}
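
As a quick check of the idea, the following is a compact sketch of the same approach rather than the class above: the name UriEncodeSketch, its encode method, and the sample URL are hypothetical, and it is hard-wired to UTF-8. It keeps the same set of unescaped characters and the same space-to-plus rule, so for UTF-8 input it should produce the same output as MyURIEncoder.encode(s, "UTF-8").

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical compact variant for illustration; assumes UTF-8 only.
final class UriEncodeSketch {

    // Characters copied through unchanged: alphanumerics, URLEncoder's
    // four marks, and the URI-reserved punctuation ; / ? : @ & = + $ , #
    private static final BitSet SAFE = new BitSet(128);

    static {
        for (char c = 'a'; c <= 'z'; c++) SAFE.set(c);
        for (char c = 'A'; c <= 'Z'; c++) SAFE.set(c);
        for (char c = '0'; c <= '9'; c++) SAFE.set(c);
        for (char c : "-_.*;/?:@&=+$,#".toCharArray()) SAFE.set(c);
    }

    static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            // Work on code points so surrogate pairs are encoded together.
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (SAFE.get(cp)) {
                out.append((char) cp);
            } else if (cp == ' ') {
                out.append('+'); // same space handling as URLEncoder
            } else {
                byte[] bytes = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append('%')
                       .append(Character.toUpperCase(Character.forDigit((b >> 4) & 0xF, 16)))
                       .append(Character.toUpperCase(Character.forDigit(b & 0xF, 16)));
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u4E2D\u6587 is a two-character Chinese query value.
        System.out.println(encode("http://example.com/a b?q=\u4E2D\u6587&r=2#top"));
        // prints http://example.com/a+b?q=%E4%B8%AD%E6%96%87&r=2#top
    }
}
```

One design caveat applies to both versions: a literal + in the input is left as + while a space also becomes +, so the two cannot be told apart when decoding. JavaScript's encodeURI avoids this by writing a space as %20.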

Time: 2024-10-21 16:46:49
