curl - Automating the login to the UK Data Service website in R with RCurl or httr
I am in the process of writing a collection of freely-downloadable R scripts for http://asdfree.com/ to help people analyze the complex sample survey data hosted by the UK Data Service. In addition to providing lots of statistics tutorials for these data sets, I also want to automate the download and importation of the survey data. In order to do that, I need to figure out how to programmatically log into the UK Data Service website.
I have tried lots of different configurations of RCurl and httr to log in, but I'm making a mistake somewhere and I'm stuck. I have tried inspecting the elements as outlined in this post, but the websites jump around too fast in the browser for me to understand what's going on.
This website does require a login and password, but I believe I'm making a mistake before I even get to the login page.
Here's how the website works:
The starting page should be: https://www.esds.ac.uk/secure/ukdsregister_start.asp
This page will automatically re-direct your web browser to a long URL that starts with: https://wayf.ukfederation.org.uk/ds002/uk.ds?[blahblahblah]
(1) For some reason, the SSL certificate does not work on this website. Here's the question I posted regarding this. The workaround I've used is ignoring SSL verification:
library(httr)
set_config( config( ssl.verifypeer = 0L ) )
and then my first command on the starting website is:
z <- GET( "https://www.esds.ac.uk/secure/ukdsregister_start.asp" )
This gives me back a z$url that looks a lot like the https://wayf.ukfederation.org.uk/ds002/uk.ds?[blahblahblah] page that my browser also re-directs to.
In the browser, then, you're supposed to type in "uk data archive" and click the continue button. When I do that, it re-directs me to the web page https://shib.data-archive.ac.uk/idp/authn/userpassword
I think this is where I'm stuck, because I cannot figure out how to have cURL followlocation and land on this website. Note: no username/password has been entered yet.
When I use the httr GET command on the wayf.ukfederation.org.uk page like this:
y <- GET( z$url , query = list( combobox = "https://shib.data-archive.ac.uk/shibboleth-idp" ) )
the y$url string looks a lot like z$url (except it's got a combobox= on the end). Is there any way to get through to this uk data archive authentication page with RCurl or httr?
I can't tell if I'm just overlooking something, or if I absolutely must use the SSL certificate described in my previous post, or what?
(2) At the point I do make it through to that page, I believe the remainder of the code would just be:
values <- list( j_username = "your.username" , j_password = "your.password" )
POST( "https://shib.data-archive.ac.uk/idp/authn/userpassword" , body = values )
But I guess that page will have to wait...
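Since the scripts are meant to be freely downloadable, one option is to keep the real login out of the script. The following is only a minimal sketch: the helper name ukds_credentials and the environment variable names UKDS_USERNAME and UKDS_PASSWORD are made up for illustration and are not defined by the UK Data Service or by httr.

```r
# Hypothetical helper: reads the login from environment variables instead of
# hard-coding it in a shared script. The variable names are assumptions.
ukds_credentials <- function() {
  user <- Sys.getenv("UKDS_USERNAME")
  pass <- Sys.getenv("UKDS_PASSWORD")
  if (!nzchar(user) || !nzchar(pass)) {
    stop("set UKDS_USERNAME and UKDS_PASSWORD before running")
  }
  # Field names match the login form fields used in the POST call above.
  list(j_username = user, j_password = pass)
}
```

The returned list could then be passed as the body of the POST call in place of the hard-coded values.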
The relevant data variables returned by the form are action and origin, not combobox. Give action the value selection and origin the value from the relevant entry in combobox:
y <- GET( z$url, query = list( action="selection", origin = "https://shib.data-archive.ac.uk/shibboleth-idp") )
> y$url
[1] "https://shib.data-archive.ac.uk:443/idp/authn/userpassword"
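To check what is actually being requested, the URL can be assembled offline with httr's parse_url and build_url. This is only a sketch: wayf_url is a hypothetical stand-in for z$url, and its entityID=example query string is invented, since the real redirect URL carries a much longer one.

```r
library(httr)

# Hypothetical stand-in for z$url; the real query string is much longer.
wayf_url <- "https://wayf.ukfederation.org.uk/ds002/uk.ds?entityID=example"

# Split the URL into its parts, append the two form fields to the query
# that is already there, and put the URL back together.
parts <- parse_url(wayf_url)
parts$query <- c(
  parts$query,
  list(
    action = "selection",
    origin = "https://shib.data-archive.ac.uk/shibboleth-idp"
  )
)
selection_url <- build_url(parts)

print(selection_url)
```

Printing the result shows the existing query string with action and origin appended, which makes it easy to compare against what the browser sends when the continue button is clicked.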
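Edit
It looks as though the handle pool isn't keeping your session alive correctly. You therefore need to pass the handles directly rather than relying on them being picked up automatically. Also, for the POST command you need to set multipart=FALSE, as this is the default for HTML forms. The R command has a different default because it is designed for uploading files. So: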
y <- GET( handle=z$handle, query = list( action="selection", origin = "https://shib.data-archive.ac.uk/shibboleth-idp") )
POST(body=values,multipart=FALSE,handle=y$handle)
Response [https://www.esds.ac.uk/]
  Status: 200
  Content-type: text/html
...snipped...
<title> introduction esds </title>
<meta name="description" content="introduction esds, home page" />