I'm having trouble understanding the complexities of Lucene. Any help would be appreciated.
We're using a Windows Azure blob to store our Lucene index, with Lucene.Net and AzureDirectory. A WorkerRole contains the IndexWriter, and it adds 20,000 or more records a day and changes a small number (fewer than 100) of the existing documents. A WebRole on a different box is set up to take two snapshots of the index (into an AzureDirectory), alternating between the two, and telling the web service which directory to use as it becomes available.
The web service has two IndexSearchers that alternate, reloading as the next snapshot becomes ready. Only one IndexSearcher is supposed to handle client requests at a time (until the newer snapshot is ready). Sometimes the IndexSearcher takes a long time (minutes) to instantiate, and other times it's fast (a few seconds). Since the directory is physically on disk (we're not using the blob at this stage), we expected this to be a fast operation, so that's one confusing point.
We're at around 8 million records. The Lucene search used to be fast (it was great), but now it's slow. To try to improve this, we've started to call IndexWriter.Optimize on the index once a day after we back it up. Some resources online indicated that Optimize is not required for often-changing indexes, but other resources indicate that optimization is required, so we're not sure.
The big problem is that whenever our web site has more traffic than a single user, we're getting timeouts on the Lucene search. We're trying to figure out if there's a bottleneck at the IndexSearcher object. It's supposed to be thread-safe, but it seems like something is blocking the requests so that only a single search is performed at a time. The box is an Azure VM, set to Medium size, so it has lots of resources available.
Thanks for whatever insight you can provide. Obviously, I can provide more detail if you have further questions, but I think this is a start.
I have larger indexes (~100 million records) and have not run into these issues.
- Put the indexes in memory if you can (8 million records sounds like it should fit into memory, depending on the amount of analyzed fields etc.). You can use RAMDirectory as the cache directory.
IndexSearcher is thread-safe and is supposed to be re-used, but I'm not sure if that is the reality for you. In Lucene 3.5 (Java version) they have a SearcherManager class that manages the searcher across multiple threads for you. http://java.dzone.com/news/lucenes-searchermanager
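To illustrate the "one shared, re-used searcher" idea, here is a minimal sketch in plain Java, assuming a searcher that is safe for concurrent reads. `SharedSearcher` and its methods are hypothetical names made up for this example, and the type parameter `S` only stands in for something like IndexSearcher. Nothing here calls Lucene, and this is not how SearcherManager is implemented.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/**
 * Hypothetical holder for one shared, read-only searcher.
 * S stands in for a searcher type such as IndexSearcher (assumption).
 */
final class SharedSearcher<S> {
    private final AtomicReference<S> current;

    SharedSearcher(S initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Runs a query against whichever searcher is current; takes no lock. */
    <R> R search(Function<S, R> query) {
        return query.apply(current.get());
    }

    /** Publishes an already-opened searcher for a new snapshot, returns the old one. */
    S swap(S next) {
        return current.getAndSet(next);
    }
}

final class SharedSearcherDemo {
    public static void main(String[] args) {
        // Strings stand in for real searchers opened on each snapshot.
        SharedSearcher<String> shared = new SharedSearcher<>("snapshot-A");
        System.out.println(shared.search(s -> "hit from " + s));

        // Open the next searcher off to the side, then swap it in.
        String retired = shared.swap("snapshot-B");
        System.out.println("retired " + retired);
        System.out.println(shared.search(s -> "hit from " + s));
    }
}
```

The point of the sketch is that every request thread reads through the same instance, and the slow part (opening the new searcher) happens before the swap rather than on the request path. One thing it leaves out: the retired searcher may still be serving in-flight requests, so it should not be closed right away. As far as I know, SearcherManager covers that with its acquire/release pair.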
Also, on a non-Lucene note: if you are on an extra-large+ VM, make sure you are taking advantage of all of the cores. If you have a Web API/ASP.NET front-end for it, all calls should be asynchronous.
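As a rough sketch of what "asynchronous calls that use all of the cores" can look like, here is a plain Java version (the ASP.NET equivalent would use async/await, which I'm not showing). `AsyncSearch` and `searchAsync` are hypothetical names, and the `search` function is a placeholder for the real index lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class AsyncSearch {
    /** Submits one search to the pool and returns immediately with a future. */
    static <R> CompletableFuture<R> searchAsync(String query,
                                                Function<String, R> search,
                                                Executor pool) {
        return CompletableFuture.supplyAsync(() -> search.apply(query), pool);
    }

    public static void main(String[] args) {
        // Size the pool to the number of cores the VM reports.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            // Placeholder search function; a real one would query the shared searcher.
            Function<String, String> search = q -> "results for " + q;

            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String q : Arrays.asList("azure", "lucene", "index")) {
                pending.add(searchAsync(q, search, pool));
            }
            // Collect the results once every search has finished.
            for (CompletableFuture<String> f : pending) {
                System.out.println(f.join());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

This only helps if the searches themselves can run in parallel, which is why it goes together with sharing a single thread-safe searcher instead of serializing requests behind one.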