C# - How to read a JavaScript object with XPath/HtmlAgilityPack


For a crawler project, I need to get product details from a JavaScript object.

How can I get the object's details from the following JavaScript? I use XPath and HtmlAgilityPack.

<script type="text/javascript">
    var product = {
        identifier: '2051189775',     // product id
        fn: 'fit- whiskered dark wash skirt',
        category: ['sale'],
        brand: 'brand name',
        price: '22.90',   // discount price
        amount: '31.80',  // original price
        currency: 'usd',
        // the list can be longer.
    };
</script>

I haven't tried getting details from JavaScript objects before. For my other crawlers I was getting the details directly from the HTML.

Since HTML Agility Pack doesn't evaluate any of the contents of the HTML, the JavaScript code should be considered plain text. Use the SelectSingleNode method to find the piece of JavaScript, then grab its InnerHtml contents.
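A minimal sketch of that first step, assuming the page HTML is already in a string and that the script of interest can be recognised by the text `var product`. The class and method names (`ScriptFinder`, `FindProductScript`) are made up for illustration; only `HtmlDocument`, `LoadHtml`, `SelectSingleNode` and `InnerHtml` come from HTML Agility Pack.

```csharp
using System;
using HtmlAgilityPack;

public static class ScriptFinder
{
    // Hypothetical helper: returns the raw text of the first <script>
    // element whose text contains "var product", or null if none exists.
    public static string FindProductScript(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The script body is just text to HTML Agility Pack,
        // so the XPath filters on the text content of the element.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//script[contains(text(), 'var product')]");

        // SelectSingleNode returns null when nothing matches.
        return node?.InnerHtml;
    }
}
```

Calling `ScriptFinder.FindProductScript(html)` should hand back the JavaScript source as a plain string, ready for whatever parsing you choose next.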

Then either find a C# JavaScript parser (IronJS, for example) or write your own parser using standard text manipulation techniques (String.* or Regex) to extract the bits you're after.
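If you go the text manipulation route, something along these lines might be enough for the sample in the question. It is only a sketch: `ProductParser` and both patterns are my own assumptions, it picks up only simple `key: 'value'` pairs (array values such as `category` are skipped), and it will break on values that contain an apostrophe or a `};` sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ProductParser
{
    // Captures everything between "var product = {" and the first "};".
    private static readonly Regex ObjectBody = new Regex(
        @"var\s+product\s*=\s*\{(?<body>.*?)\}\s*;",
        RegexOptions.Singleline);

    // Matches simple  key: 'value'  pairs with single-quoted values.
    private static readonly Regex Pair = new Regex(
        @"(?<key>\w+)\s*:\s*'(?<value>[^']*)'");

    // Hypothetical helper: returns the string-valued properties of the
    // product object, or an empty dictionary if the object is not found.
    public static Dictionary<string, string> Parse(string script)
    {
        var result = new Dictionary<string, string>();

        Match body = ObjectBody.Match(script);
        if (!body.Success)
            return result;

        foreach (Match pair in Pair.Matches(body.Groups["body"].Value))
            result[pair.Groups["key"].Value] = pair.Groups["value"].Value;

        return result;
    }
}
```

With the script text from the page, `ProductParser.Parse(script)["price"]` would then give you the discount price as a string.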

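Once you have the bits between the curly brackets, you can parse them using the aforementioned parser or a library such as Json.NET, since the piece between the curly brackets looks close to valid JSON. Strictly speaking it isn't JSON (the keys are unquoted, the strings use single quotes and there are comments), so you need a parser that tolerates that.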
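Here is a rough sketch of the Json.NET variant. It assumes the script text still has its line breaks (otherwise the first `//` comment swallows the rest) and that the only braces in it belong to the product object. As far as I know, Json.NET's reader accepts unquoted property names, single-quoted strings, comments and a trailing comma, but please verify that against the version you use. `ProductJson` and `ParseObjectLiteral` are names I made up.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class ProductJson
{
    // Hypothetical helper: cuts out the text from the first '{' to the
    // last '}' and hands it to Json.NET.
    public static JObject ParseObjectLiteral(string script)
    {
        int start = script.IndexOf('{');
        int end = script.LastIndexOf('}');
        if (start < 0 || end <= start)
            throw new FormatException("No object literal found in script text.");

        string literal = script.Substring(start, end - start + 1);

        // Relies on Json.NET being lenient about JavaScript-style literals.
        return JObject.Parse(literal);
    }
}
```

You would then read values with the usual indexer, for example `(string)product["price"]` or `(string)product["category"][0]`.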

