Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


web scraping - What if I want to web scrape with R for a page with parameters?

The page I would like to scrape, http://stoptb.org/countries/tbteam/searchExperts.asp, only returns data after parameters are submitted through the form on http://stoptb.org/countries/tbteam/experts.asp. Since the parameters are not passed in the URL, I don't know how to supply them from R. Is there a way to do this in R?

(BTW, I know next to nothing about ASP, so maybe that's the component I'm missing.)



1 Answer


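You can use the RHTMLForms package. It reads the form definition from the page and generates an R function that submits the form for you.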

You may need to install it first:

# install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")

On Windows, you may need to install it from source:

# install.packages("RHTMLForms", repos = "http://www.omegahat.org/R", type = "source")


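 library(RHTMLForms)
 library(RCurl)
 library(XML)
 # read the form definitions from the page that hosts the search form
 forms <- getHTMLFormDescription("http://stoptb.org/countries/tbteam/experts.asp")
 # turn the form named "sExperts" into an R function; its arguments are the form fields
 fun <- createFunction(forms$sExperts)
 # find experts with expertise in "Infection control: Engineering Consultant"
 results <- fun(Expertise = "Infection control: Engineering Consultant")

 # parse the returned HTML and extract the results table
 tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
 readHTMLTable(tableData[[1]])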

#                              V1                   V2                     V3
#1                                                <NA>                   <NA>
#2                 Name of Expert Country of Residence                  Email
#3               Girmay, Desalegn             Ethiopia    [email protected]
#4            IVANCHENKO, VARVARA              Estonia [email protected]
#5                   JAUCOT, Alex              Belgium  [email protected]
#6 Mulder, Hans Johannes Henricus              Namibia        [email protected]
#7                    Walls, Neil            Australia        [email protected]
#8                 Zuccotti, Thea                Italy     [email protected]
#                  V4
#1               <NA>
#2 Number of Missions
#3                  0
#4                  3
#5                  0
#6                  0
#7                  0
#8                  1
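
In the output above, the first row is empty and the real column titles sit in the second row. A minimal sketch of tidying that up in base R follows. The helper name promote_header and the placeholder data frame are assumptions made for illustration (they are not part of RHTMLForms or XML, and the rows are not real records).

```r
# Promote the row that holds the real column titles to the header.
# `raw` is assumed to look like the readHTMLTable() output above:
# an empty first row, the titles in the second row, data afterwards.
promote_header <- function(raw, header_row = 2) {
  titles <- unname(vapply(raw[header_row, ], as.character, character(1)))
  body <- raw[-seq_len(header_row), , drop = FALSE]
  names(body) <- make.names(titles, unique = TRUE)
  rownames(body) <- NULL
  body
}

# Placeholder data standing in for a scraped result (hypothetical values).
raw <- data.frame(
  V1 = c("", "Name of Expert", "Expert A", "Expert B"),
  V2 = c(NA, "Country of Residence", "Country X", "Country Y"),
  V3 = c(NA, "Email", "a@example.org", "b@example.org"),
  V4 = c(NA, "Number of Missions", "0", "3"),
  stringsAsFactors = FALSE
)

experts <- promote_header(raw)
# The scraped values arrive as text, so convert the count column explicitly.
experts$Number.of.Missions <- as.integer(experts$Number.of.Missions)
```

With a real result you would call promote_header(readHTMLTable(tableData[[1]])) instead of using the placeholder data.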

Alternatively, pass a reader function to createFunction so that the generated function returns the table directly:

 returnTable <- function(results) {
   tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
   readHTMLTable(tableData[[1]])
 }
 fun <- createFunction(forms$sExperts, reader = returnTable)
 fun(CBased = "Bhutan") # find experts based in Bhutan
#                 V1                   V2                      V3
#1                                   <NA>                    <NA>
#2    Name of Expert Country of Residence                   Email
#3 Wangchuk, Lungten               Bhutan [email protected]
#                  V4
#1               <NA>
#2 Number of Missions
#3                  2
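
If RHTMLForms cannot be installed, the same idea can be sketched by hand. This is only a rough outline under stated assumptions: that the form is submitted with POST to searchExperts.asp, that the server accepts a request carrying only the fields you set, and that the httr package (swapped in here in place of RHTMLForms) is available. The helper names encode_form_body and submit_form are hypothetical. The first helper only illustrates what travels in the request body instead of the URL; httr performs an equivalent encoding itself.

```r
# Encode a named list of form fields the way a form POST body carries them
# (application/x-www-form-urlencoded, with spaces written as %20).
encode_form_body <- function(fields) {
  stopifnot(is.list(fields), !is.null(names(fields)), all(nzchar(names(fields))))
  enc <- function(x) URLencode(as.character(x), reserved = TRUE)
  pairs <- mapply(function(k, v) paste0(enc(k), "=", enc(v)),
                  names(fields), fields)
  paste(unname(pairs), collapse = "&")
}

# Hypothetical helper: POST the fields and return the response HTML as text.
# Assumes httr is installed and that `action_url` is the form's target.
submit_form <- function(action_url, fields) {
  if (!requireNamespace("httr", quietly = TRUE)) {
    stop("the httr package is required for submit_form()")
  }
  resp <- httr::POST(action_url, body = fields, encode = "form")
  httr::stop_for_status(resp)
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# What the request body would look like for the Bhutan query.
body <- encode_form_body(list(CBased = "Bhutan"))

# Not run here (needs network access):
# html <- submit_form("http://stoptb.org/countries/tbteam/searchExperts.asp",
#                     list(CBased = "Bhutan"))
```

The returned HTML text can then be passed to htmlParse and readHTMLTable in the same way as the results above.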
