A typical task: log in to a page, download some information, and log out again.
HTML parsers can handle tasks like this.
For example, it could be a login script for a browser game, or for a mail account that doesn't offer
SMTP access or charges for it.
Take the browser game Travian: after you have played for a while
it becomes very boring, because the only thing you do is wait
while game events take too long. When you click to upgrade
something, for instance, you have to wait hours until it finishes.
Here we will build a login example.
We need two external libraries:
httplib2 http://code.google.com/p/httplib2/
lxml http://lxml.de/
The first thing we need is the page source.
import httplib2
conn = httplib2.Http("cache")  # "cache" is the directory used for HTTP caching
resp, cont = conn.request("http://travian.com")
Once we have the source, we look at the login form:
<form method="post" name="snd" action="dorf1.php">
  <input class="text" type="text" name="name" value="">
  <input class="text" type="password" name="password" value="" maxlength="20">
  <input type="image" value="login" name="s1" onclick="xy();" id="btn_login" class="dynamic_img">
  <input type="hidden" name="w" value="">
  <input type="hidden" name="login" value="1299937743">
</form>
As we can see, the form has several inputs.
Since there is only one form on the page, we don't check anything and simply take the first form from the list.
from lxml.html import parse, tostring, fromstring, submit_form

# name and password hold your account credentials
page = fromstring(cont)
form = page.forms[0]
for inp in form.inputs:
    if inp.type == "text":
        inp.value = name
    if inp.type == "password":
        inp.value = password
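The form-filling step can also be tried offline. The sketch below is only an illustration: it parses a copy of the login form shown earlier, and the credentials "player1" and "secret" as well as the helper name fill_login_form are made up. It sets the inputs by name through form.fields instead of looping over input types, then collects the values that would be submitted.

```python
from lxml.html import fromstring

# Copy of the login form from the page source (no network access needed).
LOGIN_PAGE = """
<html><body>
<form method="post" name="snd" action="dorf1.php">
<input class="text" type="text" name="name" value="">
<input class="text" type="password" name="password" value="" maxlength="20">
<input type="image" value="login" name="s1" onclick="xy();" id="btn_login" class="dynamic_img">
<input type="hidden" name="w" value="">
<input type="hidden" name="login" value="1299937743">
</form>
</body></html>
"""


def fill_login_form(html, name, password):
    """Hypothetical helper: fill the first form and return what to submit."""
    page = fromstring(html)
    form = page.forms[0]
    # form.fields acts like a dict keyed by each input's name attribute.
    form.fields["name"] = name
    form.fields["password"] = password
    # form_values() returns (name, value) pairs; hidden inputs are included,
    # the image button is left out.
    return form.method, form.action, form.form_values()


method, action, values = fill_login_form(LOGIN_PAGE, "player1", "secret")
print(method, action)
print(values)
```

The hidden inputs (w and login) come along automatically, which matters because the server may expect them in the request.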
Don't forget that the form uses method="post", so we send the data form-encoded:
headers = {'Content-type': 'application/x-www-form-urlencoded'}
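Now we are ready to send the data and receive the cookie that lets us
get inside the page. From here on the snippets live inside a class: self.conn is the Http
object created earlier and self.server is the site's base URL.
import urllib  # Python 2; on Python 3 use urllib.parse.urlencode instead
body = form.form_values()  # (name, value) pairs of the filled-in form
resp, cont = self.conn.request(self.server + "/" + form.action, "POST", body=urllib.urlencode(body), headers=headers)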
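To see what actually goes over the wire, here is a small sketch of how the POST request is assembled. It uses only the Python 3 standard library (urllib.parse instead of the Python 2 urllib.urlencode), and the helper name build_login_request and the sample credentials are assumptions for illustration.

```python
from urllib.parse import urlencode, urljoin


def build_login_request(server, action, values):
    """Hypothetical helper: return (url, body, headers) for the login POST."""
    # urljoin resolves the form's relative action against the server URL.
    url = urljoin(server + "/", action)
    # urlencode accepts a list of (name, value) pairs and keeps their order.
    body = urlencode(values)
    headers = {"Content-type": "application/x-www-form-urlencoded"}
    return url, body, headers


url, body, headers = build_login_request(
    "http://travian.com",
    "dorf1.php",
    [("name", "player1"), ("password", "secret"), ("w", ""), ("login", "1299937743")],
)
print(url)
print(body)
```

The three results can be passed straight to the request() call of an httplib2.Http object, as request(url, "POST", body=body, headers=headers).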
The response carries a cookie that we need to save if we want to keep working with the page later:
cookie = resp['set-cookie']
The cookie is also needed if we want to log out:
# Both entries go into one dict; assigning headers twice would drop the first one.
headers = {'Content-type': 'application/x-www-form-urlencoded', 'Cookie': self.cookie}
body = {}
resp, cont = self.conn.request(self.server + "/logout.php", body=urllib.urlencode(body), headers=headers)
As you can see, the cookie now goes inside the headers. You should always send the cookie
in the headers if you want to stay logged in, because the cookie you received at login
is the only thing that tells the server you are logged in and may see what is behind the wall.
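A Set-Cookie value usually contains more than the cookie itself (path, HttpOnly and so on), while the Cookie request header only needs the name=value part. The sketch below shows one possible way to strip the extras with the standard library. The cookie name T3E and both helper names are made up, and it assumes the header holds a single cookie; if the server sets several, the combined header may need more careful splitting.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie):
    """Hypothetical helper: turn a Set-Cookie value into a Cookie value."""
    jar = SimpleCookie()
    jar.load(set_cookie)
    # Keep only name=value; attributes such as path are not sent back.
    return "; ".join("%s=%s" % (key, morsel.value) for key, morsel in jar.items())


def auth_headers(set_cookie, extra=None):
    """Build request headers that carry the login cookie."""
    headers = {"Cookie": cookie_header(set_cookie)}
    headers.update(extra or {})
    return headers


# Made-up example of what resp['set-cookie'] might look like.
sample = "T3E=abc123; path=/; HttpOnly"
print(auth_headers(sample))
```

Calling auth_headers() before every request keeps the "always send the cookie" rule in one place.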
There is also an easy way to access DOM elements.
With your favorite browser you can easily get the DOM path to the tag you want in the HTML source.
tmp = page.xpath("/html//div//div//div//div//p//span")
You can find a tag by class name using find_class(),
or get the text content of a tag with text_content():
tmp = page.xpath("/html//div//div//div//div//p//span")[2].find_class("none")[0].text_content()
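These three calls are easy to try on a small page without any network access. The HTML below is invented for the example (it is not Travian's real markup), so the ids and class names are assumptions.

```python
from lxml.html import fromstring

# Invented sample page; the ids and class names are only for illustration.
SAMPLE = """
<html><body>
<div id="resources">
<p><span class="wood">120</span> <span class="clay none">0</span></p>
</div>
</body></html>
"""

page = fromstring(SAMPLE)

# xpath() returns a list of matching elements.
spans = page.xpath("//div[@id='resources']/p/span")

# find_class() matches one class name even when the tag has several classes.
wood = page.find_class("wood")[0].text_content()
empty = [el.get("class") for el in page.find_class("none")]

print(len(spans), wood, empty)
```

A shorter, attribute-based XPath like the one here tends to survive small layout changes better than a long chain of nested div steps.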
To write your own script that parses a page and extracts information, you only need
request() find_class() text_content() xpath() fromstring()
It is very easy. Now you know everything you need to write your first script that logs in to
your favorite page.