Automating the login to the UK Data Service website in R with RCurl or httr


I am in the process of writing a collection of freely-downloadable R scripts for http://asdfree.com/ to help people analyze the complex sample survey data hosted by the UK Data Service. In addition to providing lots of statistics tutorials for these data sets, I also want to automate the download and importation of this survey data. In order to do that, I need to figure out how to programmatically log into the UK Data Service website.

I have tried lots of different configurations of RCurl and httr to log in, but I'm making a mistake somewhere and I'm stuck. I have tried inspecting the elements as outlined in this post, but the websites jump around too fast in the browser for me to understand what's going on.

This website does require a login and password, but I believe I'm making a mistake before I even get to the login page.

Here's how the website works:

The starting page should be: https://www.esds.ac.uk/secure/ukdsregister_start.asp

This page will automatically re-direct your web browser to a long URL that starts with: https://wayf.ukfederation.org.uk/ds002/uk.ds?[blahblahblah]

(1) For some reason, the SSL certificate does not work on this website. Here's the question I posted regarding this. The workaround I've used is ignoring SSL:

library(httr)
set_config( config( ssl.verifypeer = 0L ) )

And then my first command on the starting website is:

z <- GET( "https://www.esds.ac.uk/secure/ukdsregister_start.asp" )

This gives me back a z$url that looks a lot like the https://wayf.ukfederation.org.uk/ds002/uk.ds?[blahblahblah] page that my browser also re-directs to.

In the browser, then, you're supposed to type in "UK Data Archive" and click the Continue button. When I do that, it re-directs me to the web page https://shib.data-archive.ac.uk/idp/authn/userpassword

I think this is where I'm stuck, because I cannot figure out how to have cURL follow the redirect (followlocation) and land on this website. Note: no username/password has been entered yet.

When I use the httr GET command on the wayf.ukfederation.org.uk page like this:

y <- GET( z$url , query = list( combobox = "https://shib.data-archive.ac.uk/shibboleth-idp" ) )

The y$url string looks a lot like z$url (except it's got a combobox= on the end). Is there any way to get through to this UK Data Archive authentication page with RCurl or httr?

I can't tell if I'm just overlooking something, or if I absolutely must use the SSL certificate described in my previous post, or what?

(2) At the point I do make it through to that page, I believe the remainder of the code would just be:

values <- list( j_username = "your.username" , j_password = "your.password" )
POST( "https://shib.data-archive.ac.uk/idp/authn/userpassword" , body = values )

But I guess that page will have to wait...

The relevant data variables returned by the form are action and origin, not combobox. Give action the value selection, and give origin the value of the relevant entry in the combobox:

y <- GET( z$url, query = list( action="selection", origin = "https://shib.data-archive.ac.uk/shibboleth-idp") )
> y$url
[1] "https://shib.data-archive.ac.uk:443/idp/authn/userpassword"
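One way to find out which field names a form really submits is to read the page's HTML rather than guess from what is visible on screen. The sketch below is only an illustration: the HTML snippet is made up (it is not the actual markup of the wayf.ukfederation.org.uk page, and the id/name split shown is an assumption), and form_field_names is a hypothetical helper built on a simple base R regular expression. For real pages a proper HTML parser would be more robust.

```r
# Made-up HTML, loosely shaped like a discovery-service form.
# NOT the real WAYF page; the id/name split is an assumption for illustration.
sample_html <- paste(
  '<form action="/ds002/uk.ds" method="get">',
  '<input type="hidden" name="action" value="selection" />',
  '<select name="origin" id="combobox">',
  '<option value="https://shib.data-archive.ac.uk/shibboleth-idp">uk data archive</option>',
  '</select>',
  '</form>',
  sep = "\n"
)

# Hypothetical helper: list the name="..." attributes of <input> and <select> tags.
form_field_names <- function(html) {
  # Pull out each opening <input ...> or <select ...> tag.
  tags <- regmatches(
    html,
    gregexpr("<(input|select)[^>]*>", html, ignore.case = TRUE)
  )[[1]]
  # Keep only the name attribute from the tags that have one.
  hits <- regmatches(tags, regexpr('name="[^"]*"', tags))
  unique(sub('^name="([^"]*)"$', "\\1", hits))
}

fields <- form_field_names(sample_html)
print(fields)
```

In this made-up example the widget's id is combobox but the submitted name is origin, which is the kind of mismatch that makes a query built from the visible label fail silently.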

Edit

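It looks as though the handle pool isn't keeping your session alive correctly. You therefore need to pass the handles directly rather than letting httr pick them automatically. Also, for the POST command you need to set multipart=FALSE, as this is the default for HTML forms. The R command has a different default because it is mainly designed for uploading files. So: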

y <- GET( handle=z$handle, query = list( action="selection", origin = "https://shib.data-archive.ac.uk/shibboleth-idp") )
POST(body=values,multipart=FALSE,handle=y$handle)
Response [https://www.esds.ac.uk/]
  Status: 200
  Content-type: text/html
...snipped...
    <title>
        introduction esds
    </title>
    <meta name="description" content="introduction esds, home page" />
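To make the multipart=FALSE point a little more concrete: a plain HTML form is normally sent as application/x-www-form-urlencoded, that is, key=value pairs joined with &. The sketch below builds such a body by hand in base R so you can see what the server expects. urlencode_form is a hypothetical helper and the credentials are placeholders; httr does the equivalent for you, so this is for understanding rather than something to send yourself.

```r
# Hypothetical helper: encode a named list the way a plain HTML form would.
# Spaces come out as %20 here; browsers usually send "+", and servers
# generally accept either.
urlencode_form <- function(fields) {
  keys <- vapply(names(fields), utils::URLencode, character(1), reserved = TRUE)
  vals <- vapply(
    fields,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  paste(paste0(keys, "=", vals), collapse = "&")
}

# Placeholder credentials; the password contains characters that need escaping.
values <- list(j_username = "your.username", j_password = "p@ss word")
body <- urlencode_form(values)
print(body)
```

A multipart body, by contrast, wraps each field in its own MIME part with a boundary string, which a login form handler may not parse. In more recent httr releases the same choice is, as far as I know, expressed as encode = "form" rather than multipart = FALSE, so check the documentation for the version you have installed.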
