HtmlUnit: fixing JavaScript that cannot get the body object while the document is being parsed

login.html

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=GBK" /> 
    </head>
    <script>
    function getContent(){ 
        var url= "result.html"; 
        var xhr=new (window.XMLHttpRequest||window.ActiveXObject)("Microsoft.XMLHTTP");
        xhr.onreadystatechange = function() { 
            if (xhr.readyState == 4 && xhr.status == 200) {
                document.write(xhr.responseText); 
                document.close(); 
            }
        }; 
        xhr.open("GET",url,false); xhr.send(); 
    }
    getContent(); 
    </script>
 </html>

result.html

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=GBK"/>
        <title>login</title> 
    </head>
    <body>
        <script type="text/javascript"> 
            var d = document, b = d.body; 
            var n = d.createElement("div"); 
            n.innerHTML = "<div> I was appended... </div>"; 
            b.appendChild(n); 
        </script>
    </body>
</html>

Test.java

@Test
public void testExecScript() throws Exception {
    WebClient client = new WebClient(BrowserVersion.CHROME);
    client.getOptions().setUseInsecureSSL(true);
    client.getOptions().setJavaScriptEnabled(true); 
    client.getOptions().setCssEnabled(false); 
    client.getOptions().setThrowExceptionOnScriptError(false); 
    client.getOptions().setTimeout(10000); 
    String url = "http://localhost/login.html";
    HtmlPage loginPage = client.getPage(url); 
    logger.info("{}\n{}", loginPage.getTitleText(), loginPage.asXml()); 
}

Error output:

EcmaError: lineNumber=[1] column=[0] lineSource=[<no source>] name=[TypeError] sourceName=[script in http://localhost/login.html from (1, 1462) to (1, 1780)] message=[TypeError: Cannot call method "appendChild" of null (script in login.html from (1, 1462) to (1, 1780)#1)]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot call method "appendChild" of null (script in login.html from (1, 1462) to (1, 1780)#1)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:847)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:733)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:708)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:982)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:351)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:411)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:276)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:290)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:793)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:751)
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
    at org.cyberneko.html.HTMLScanner.evaluateInputSource(HTMLScanner.java:608)
    at org.cyberneko.html.HTMLConfiguration.evaluateInputSource(HTMLConfiguration.java:342)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.pushInputString(HTMLParser.java:420)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.writeInParsedStream(HtmlPage.java:2375)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.write(HTMLDocument.java:683)
    at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.write(HTMLDocument.java:569)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:153)
    at net.sourceforge.htmlunit.corejs.javascript.FunctionObject.call(FunctionObject.java:384)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1531)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:772)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:832)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:779)
    at com.gargoylesoftware.htmlunit.javascript.host.xml.XMLHttpRequest.setState(XMLHttpRequest.java:218)
    at com.gargoylesoftware.htmlunit.javascript.host.xml.XMLHttpRequest.doSend(XMLHttpRequest.java:762)
    at com.gargoylesoftware.htmlunit.javascript.host.xml.XMLHttpRequest.send(XMLHttpRequest.java:598)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:153)
    at net.sourceforge.htmlunit.corejs.javascript.FunctionObject.call(FunctionObject.java:448)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1531)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:724)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:832)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:733)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:708)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:982)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:351)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:411)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:276)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:290)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:793)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:751)
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1017)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:248)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:194)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:471)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:345)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:410)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:395)
    at com.aduan.study.test.web.crawler.HtmlUnitTest.testExecScript(HtmlUnitTest.java:186)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)


Solution 1:

Delete the head element from result.html:

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=GBK"/>
    <title>login</title> 
</head>

Or add a body to login.html and put the script inside it:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=GBK" />
    </head>
    <body>
        <script>
        function getContent(){
            var url= "result.html";
            var xhr=new (window.XMLHttpRequest||window.ActiveXObject)("Microsoft.XMLHTTP");
            xhr.onreadystatechange = function() {
                if (xhr.readyState == 4 && xhr.status == 200) {
                    document.write(xhr.responseText);
                    document.close();
                }
            };
            xhr.open("GET",url,false); xhr.send();
        }
        getContent();
        </script>
    </body>
</html>

To check the result, add these two lines to the script in result.html:

d.write("d:" + d + "<br/>");
d.write("b:" + b + "<br/>");

Output:

d:[object HTMLDocument]<br/>b:[object HTMLBodyElement]<br/><div>
  <div>
     I was appended... 
  </div></div>

Solution 2: have the WebClient post-process each response and add the missing body tags

client.setWebConnection(
        new WebConnectionWrapper(client) {
            @Override
            public WebResponse getResponse(WebRequest request) throws IOException {
                WebResponse response = super.getResponse(request);
                String content = response.getContentAsString("UTF-8");
                if (content == null) {
                    // Nothing to rewrite; without this check content.getBytes() below would throw a NullPointerException.
                    return response;
                }
                // The page has a head but no body: open a body right after </head> ...
                if (!content.contains("<body>") && content.contains("</head>")) {
                    content = content.replace("</head>", "</head>\n<body>");
                    // ... and close it before </html> if it is not closed yet.
                    if (!content.contains("</body>") && content.contains("</html>")) {
                        content = content.replace("</html>", "</body>\n</html>");
                    }
                }
                logger.info("response: {}", content);
                // Rebuild the response around the modified content, keeping the status and headers.
                WebResponseData data = new WebResponseData(content.getBytes("UTF-8"),
                        response.getStatusCode(), response.getStatusMessage(), response.getResponseHeaders());
                return new WebResponse(data, request, response.getLoadTime());
            }
        });
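
The tag-insertion step in the wrapper above is plain string handling, so it can be pulled out and checked without a web server or HtmlUnit on the classpath. Below is a minimal sketch of that idea. The class name BodyTagFixer and the method ensureBody are hypothetical and not part of HtmlUnit. Unlike the wrapper, this sketch matches tags case-insensitively, also recognizes a body tag that carries attributes, and inserts the closing tag only before the last </html>; those are my assumptions about what a more tolerant check might look like, not behavior taken from the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of HtmlUnit): adds missing body tags to an HTML string. */
public class BodyTagFixer {

    // "<body" followed by whitespace or ">", so <body>, <BODY> and <body class="x"> all match.
    private static final Pattern BODY_OPEN = Pattern.compile("<body[\\s>]", Pattern.CASE_INSENSITIVE);
    private static final Pattern BODY_CLOSE = Pattern.compile("</body>", Pattern.CASE_INSENSITIVE);
    private static final Pattern HEAD_CLOSE = Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);
    private static final Pattern HTML_CLOSE = Pattern.compile("</html>", Pattern.CASE_INSENSITIVE);

    /** Returns html unchanged unless it has a closing head tag and no opening body tag. */
    public static String ensureBody(String html) {
        if (html == null || BODY_OPEN.matcher(html).find()) {
            return html;
        }
        Matcher headClose = HEAD_CLOSE.matcher(html);
        if (!headClose.find()) {
            return html;
        }
        StringBuilder fixed = new StringBuilder(html);
        // Open the body right after the first </head>.
        fixed.insert(headClose.end(), "\n<body>");

        if (!BODY_CLOSE.matcher(fixed).find()) {
            // Locate the last </html> and close the body just before it.
            int lastHtmlClose = -1;
            Matcher htmlClose = HTML_CLOSE.matcher(fixed);
            while (htmlClose.find()) {
                lastHtmlClose = htmlClose.start();
            }
            if (lastHtmlClose >= 0) {
                fixed.insert(lastHtmlClose, "</body>\n");
            }
        }
        return fixed.toString();
    }

    public static void main(String[] args) {
        String page = "<html><head></head><script>getContent();</script></html>";
        System.out.println(ensureBody(page));
    }
}
```

Inside getResponse, the two nested if statements could then be replaced by a single call such as content = BodyTagFixer.ensureBody(content), again assuming the hypothetical helper is on the classpath.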
Time: 2024-11-05 20:49:41