u = "http://www.rateinflation.com/consumer-price-index/usa-historical-cpi"
The form on this page has three controls: a "from" field, a "to" field, and a submit button.
forms = getHTMLFormDescription(u)
[[1]]
HTML Form: http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php
start-year: [ 2005 ] 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015
end-year: [ 2015 ] 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015
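No submit button appears in the description, so dropButtons = FALSE has to be specified.
With that option the description gains an element whose name is NULL: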
.. ..$ NULL :List of 1
.. .. ..$ name: NULL
forms[[1]]$formAttributes
                                                                    method 
                                                                     "get" 
                                                                    action 
"http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php" 
attr(,"class")
[1] "HTMLFormAttributes"
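The form is submitted with GET rather than POST. Submit a query for 1990-2000, where getCPI is assumed to be the function generated from the form description (for example with createFunction(forms[[1]])):
doc = htmlParse(getCPI(`start-year` = 1990, `end-year` = 2000))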
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.rateinflation.com/consumer-price-index/usa-historical-cpi?start-year=1990&end-year=2000">here</a>.</p>
</body>
</html>
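Because the form uses GET, submitting it amounts to appending the field values to the URL as a query string, which is exactly the shape of the link in the redirect above. A minimal sketch in base R follows; build_query_url is a hypothetical helper written for illustration, not part of RHTMLForms, and the base URL is taken from the redirect target rather than from the form's action attribute.

```r
# Hypothetical helper: build a GET query URL from named form fields.
build_query_url <- function(action, ...) {
  params <- list(...)
  # Percent-encode names and values so special characters survive in the URL.
  keys <- vapply(names(params), URLencode, character(1), reserved = TRUE)
  vals <- vapply(params,
                 function(v) URLencode(as.character(v), reserved = TRUE),
                 character(1))
  paste0(action, "?", paste(keys, vals, sep = "=", collapse = "&"))
}

# Field names contain a hyphen, so they must be quoted with backticks.
query_url <- build_query_url(
  "http://www.rateinflation.com/consumer-price-index/usa-historical-cpi",
  `start-year` = 1990, `end-year` = 2000
)
print(query_url)
```

The resulting string has the same form as the href in the 301 response, so it could be passed to htmlParse directly instead of following the link from the redirect page.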
The response is a redirect to a newly generated query URL, so following that link
readHTMLTable(htmlParse(getHTMLLinks(doc)))
reads the tables on the page.
    V1    V2    V3    V4    V5    V6    V7    V8    V9   V10   V11   V12   V13   V14
1 Year   jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov   dec   ann
2 2000 168.8 169.8 171.2 171.3 171.5 172.4 172.8 172.8 173.7   174 174.1   174 172.2
3 1999 164.3 164.5   165 166.2 166.2 166.2 166.7 167.1 167.9 168.2 168.3 168.3 166.6
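In the output above the real column names (Year, jan, ..., ann) ended up in the first data row, and the columns carry the default names V1 to V14. One possible clean-up, sketched in base R under the assumption that every remaining cell is numeric: promote the first row to the header and convert the rest. tidy_cpi is a hypothetical helper name, and the sample data is just the three rows printed above.

```r
# Rebuild the three rows shown above as an all-character data frame (V1..V14).
txt <- "Year jan feb mar apr may jun jul aug sep oct nov dec ann
2000 168.8 169.8 171.2 171.3 171.5 172.4 172.8 172.8 173.7 174 174.1 174 172.2
1999 164.3 164.5 165 166.2 166.2 166.2 166.7 167.1 167.9 168.2 168.3 168.3 166.6"
raw <- read.table(text = txt, header = FALSE, colClasses = "character")

# Hypothetical helper: use row 1 as column names, convert the rest to numeric.
tidy_cpi <- function(raw) {
  header <- vapply(raw[1, ], as.character, character(1))
  cpi <- raw[-1, , drop = FALSE]
  names(cpi) <- unname(header)
  # as.character() first, so factor columns would also convert correctly.
  cpi[] <- lapply(cpi, function(x) as.numeric(as.character(x)))
  rownames(cpi) <- NULL
  cpi
}

cpi <- tidy_cpi(raw)
str(cpi)
```

After this step the values can be used directly, for example cpi$ann for the annual figures.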