python - Parsing HTML form tags with Beautiful Soup


I am trying to parse HTML forms from websites. There is no problem if the page has one opening and one closing form tag. I first noticed the problem while parsing http://www.w3schools.com/html/html_forms.asp

If there are two or more form tags, I get strange behavior: the closing form tags are moved to the end of the document. Has anyone had the same problem?

Here is a basic example webpage:

<!doctype html>
<html lang="en-us">
<head>
  <title>html forms , input</title>
</head>
<body>
  <p>stuff , on</p>
  <form>
    first name: <input type="text" name="firstname" size="20"><br>
    last name: <input type="text" name="lastname" size="20">
  </form>
  <p>some text</p>
  <form>
    first name: <input type="text" name="firstname" size="20"><br>
    last name: <input type="text" name="lastname" size="20">
  </form>
</body>
</html>

Here is my code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import urllib2  # Python 2 only; not used below
from bs4 import BeautifulSoup

# Parse the local sample file and print the resulting tree
lsoup = BeautifulSoup(open("forms2.html"))
print lsoup

And this is what I got:

<!doctype html>
<html lang="en-us"><head>
<title>html forms , input</title>
</head>
<body>
<p>stuff , on</p>
<form>
first name: <input name="firstname" size="20" type="text"/><br/>
last name: <input name="lastname" size="20" type="text"/>
<p>some text</p>
<form>
first name: <input name="firstname" size="20" type="text"/><br/>
last name: <input name="lastname" size="20" type="text"/>
</form></form></body></html>
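For comparison, here is a minimal sketch that does not use Beautiful Soup at all. It relies only on the standard library's html.parser (Python 3, unlike the Python 2 script above) and records which inputs appear between each opening and closing form tag, in the order the tags occur in the source, with no tree repair involved. The class name FormCollector and the embedded SAMPLE string are made up for illustration; SAMPLE is a shortened copy of the example page. If this reports two separate forms, the source markup itself is probably fine and the nesting likely comes from whichever parser Beautiful Soup is using underneath.

```python
from html.parser import HTMLParser

# Shortened copy of the example page (assumption: same form structure).
SAMPLE = """<!doctype html>
<html lang="en-us">
<body>
<form>
first name: <input type="text" name="firstname" size="20"><br>
last name: <input type="text" name="lastname" size="20">
</form>
<p>some text</p>
<form>
first name: <input type="text" name="firstname" size="20"><br>
last name: <input type="text" name="lastname" size="20">
</form>
</body>
</html>"""


class FormCollector(HTMLParser):
    """Hypothetical helper: groups input names by the form they appear in."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one list of input names per <form> seen
        self._current = None  # inputs of the form that is currently open

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # Start a new group for this form.
            self._current = []
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            # Record the input under the form that is open right now.
            self._current.append(dict(attrs).get("name"))

    def handle_endtag(self, tag):
        if tag == "form":
            # The form ends exactly where the source says it ends.
            self._current = None


collector = FormCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.forms)
```

This only follows the tags as written, so it will not cope with genuinely broken markup the way a repairing parser does. It is meant as a quick cross-check, not a replacement for Beautiful Soup.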

Any ideas?

Thanks for your help!

