Python XML parsing example


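I need to simplify the data in an XML file so that I can read it as a single table, i.e. a CSV. I found some Python 2.7 examples using ElementTree, but so far I have not been able to tailor them to work further down the tree rather than only collecting the highest-level elements. I want to repeat the highest-level element's values on each of its rows, followed by the rest.

I know I could and should RTFM, but sadly I need to solve this problem ASAP.

Maybe the linked XSD file will help?

My data looks like this: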

    <!-- moneymate (tm) xmlperfs application version 1.0.1.1 - copyright © 2000 moneymate limited. rights reserved. moneymate ® -->
    <!-- discrete perfs 180 periods monthly frequency -->
    <moneymate_xml_feed xmlns:xsi="http://www.w3.org/2001/xmlschema-instance" xsi:nonamespaceschemalocation="http://mmia2.moneymate.com/xml/moneymatecomplete.xsd" version="1.0" calccurrency="sek">
      <types>
        <type typecountry="se" typeid="85" typename="string" calctodate="2013-07-16">
          <companies>
            <company companyid="25000068" companyname="string"/>
            …
          <categories>
            <category categoryid="1101" categoryname="aktie -- asien">
              <funds>
                <fund fundid="6201" fundname="string" fundcurrency="gbp" fundcompanyid="25000068"><performances><monthlyperfs><performancemonth perfendmonth="2006-05-31" perfmonth="-0.087670"/><performancemonth> … </performances></fund>
              </funds>
            </category>
            <category categoryid="13" categoryname="räntefonder">
              <funds></funds>
            </category>
          </categories>
        </type>
      </types>
    </moneymate_xml_feed>

So I hope to see a table with data for the funds only, like this:

    fundid   fundname   fundcurrency   fundcompanyid   perfendmonth   perfmonth
    …        …          …              …               …              …

etc.

And all of this should end up in a CSV file; I just did not want to break the formatting here.

Please also note that perfmonth is the key field; the code did not wrap in the box above in the data example.

I used lxml.

    import csv
    import lxml.etree

    # Sample feed from the question, with the tags balanced so that it parses.
    x = u'''<!-- moneymate (tm) xmlperfs application version 1.0.1.1 - copyright 2000 moneymate limited. rights reserved. moneymate -->
    <!-- discrete perfs 180 periods monthly frequency -->
    <moneymate_xml_feed xmlns:xsi="http://www.w3.org/2001/xmlschema-instance" xsi:nonamespaceschemalocation="http://mmia2.moneymate.com/xml/moneymatecomplete.xsd" version="1.0" calccurrency="sek">
        <types>
            <type typecountry="se" typeid="85" typename="string" calctodate="2013-07-16">
                <companies>
                    <company companyid="25000068" companyname="string"/>
                    <categories>
                        <category categoryid="1101" categoryname="aktie -- asien">
                            <funds>
                                <fund fundid="6201" fundname="string" fundcurrency="gbp" fundcompanyid="25000068">
                                    <performances>
                                        <monthlyperfs>
                                            <performancemonth perfendmonth="2006-05-31" perfmonth="-0.087670"/>
                                        </monthlyperfs>
                                    </performances>
                                </fund>
                            </funds>
                        </category>
                        <category categoryid="13" categoryname="rntefonder">
                            <funds></funds>
                        </category>
                    </categories>
                </companies>
            </type>
        </types>
    </moneymate_xml_feed>
    '''

    with open('output.csv', 'w') as f:
        writer = csv.writer(f)
        # Header row: fund attributes first, then the performance attributes.
        writer.writerow(('fundid', 'fundname', 'fundcurrency', 'fundcompanyid', 'perfendmonth', 'perfmonth'))
        root = lxml.etree.fromstring(x)
        # iter() walks the whole tree, so nested <fund> elements are found at any depth.
        for fund in root.iter('fund'):
            # find() returns only the first <performancemonth> below this fund.
            perf = fund.find('.//performancemonth')
            row = fund.get('fundid'), fund.get('fundname'), fund.get('fundcurrency'), fund.get('fundcompanyid'), perf.get('perfendmonth'), perf.get('perfmonth')
            writer.writerow(row)
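The code above takes only the first performancemonth under each fund. If the goal is one row per month with the fund's attributes repeated on each, a nested loop is one way to get there. The following is a minimal sketch, not the answer's own method: it assumes Python 3 and the standard-library xml.etree.ElementTree instead of lxml, the names SAMPLE, FUND_KEYS, PERF_KEYS, fund_rows and to_csv are hypothetical, and the second performancemonth in the sample is made up purely to show the repetition.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: trimmed from the feed in the question and made well-formed.
# The second <performancemonth> is invented for illustration only.
SAMPLE = """<moneymate_xml_feed version="1.0" calccurrency="sek">
  <types>
    <type typecountry="se" typeid="85">
      <categories>
        <category categoryid="1101" categoryname="aktie -- asien">
          <funds>
            <fund fundid="6201" fundname="string" fundcurrency="gbp" fundcompanyid="25000068">
              <performances>
                <monthlyperfs>
                  <performancemonth perfendmonth="2006-05-31" perfmonth="-0.087670"/>
                  <performancemonth perfendmonth="2006-06-30" perfmonth="0.012345"/>
                </monthlyperfs>
              </performances>
            </fund>
          </funds>
        </category>
        <category categoryid="13" categoryname="rntefonder">
          <funds></funds>
        </category>
      </categories>
    </type>
  </types>
</moneymate_xml_feed>"""

FUND_KEYS = ('fundid', 'fundname', 'fundcurrency', 'fundcompanyid')
PERF_KEYS = ('perfendmonth', 'perfmonth')


def fund_rows(root):
    """Yield one tuple per <performancemonth>, prefixed with its fund's attributes."""
    for fund in root.iter('fund'):
        fund_part = tuple(fund.get(key) for key in FUND_KEYS)
        # Funds without any <performancemonth> simply produce no rows.
        for perf in fund.iter('performancemonth'):
            yield fund_part + tuple(perf.get(key) for key in PERF_KEYS)


def to_csv(xml_text):
    """Parse the XML text and return the flattened table as CSV text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(FUND_KEYS + PERF_KEYS)
    writer.writerows(fund_rows(root))
    return out.getvalue()


if __name__ == '__main__':
    print(to_csv(SAMPLE))
```

To write a file instead of a string, the same writer calls can be pointed at a file opened with open('output.csv', 'w', newline=''). Attribute names are looked up exactly as written, so they need to match the case used in the real feed.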

Note

The XML given in the question has a mismatched tag. You may need to fix that first.
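To find where a feed is malformed before fixing it by hand, one option is to let the parser report the position of the first error. This is a rough sketch using the standard-library xml.etree.ElementTree under Python 3; BROKEN is a hypothetical fragment that imitates the unclosed performancemonth tag in the question, and find_parse_error is an illustrative helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment imitating the problem in the question:
# the second <performancemonth> is opened but never closed.
BROKEN = """<fund fundid="6201">
  <performances>
    <monthlyperfs>
      <performancemonth perfendmonth="2006-05-31" perfmonth="-0.087670"/>
      <performancemonth>
    </monthlyperfs>
  </performances>
</fund>"""


def find_parse_error(xml_text):
    """Return (line, column, message) for the first parse error, or None if the text parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple pointing at the error.
        line, column = err.position
        return line, column, str(err)
    return None


if __name__ == '__main__':
    print(find_parse_error(BROKEN))
```

The reported position is where the parser gave up, which is usually the closing tag that does not match, so the element that was left open sits somewhere before that line.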

