php - How can I extract metadata from 10 000 PDF documents and store it in my database?


I'm building a document-sharing platform, and to attract as many users as possible I want to add 10 000 documents to it. The documents are PDF files. I'm working with Symfony2, but I guess that doesn't change the problem: how can I automatically extract the metadata I need from these documents (for example, the title and the first 100 words as a description) and insert it into my database? (In my case that means hydrating entities, a part I already know how to do.)
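Once the text of a PDF has been extracted (for example with a PDF parsing library or the `pdftotext` command-line tool, neither of which is shown here), building the title and the 100-word description is plain string handling. A minimal sketch, assuming the extracted text and the PDF's Title field (possibly empty) are already available as strings; `firstWords` and `buildMetadata` are hypothetical helper names:

```php
<?php
// Sketch only: assumes the text was ALREADY extracted from the PDF
// elsewhere (e.g. by a PDF library or pdftotext). Helper names are hypothetical.

/**
 * Return the first $limit words of $text joined by single spaces.
 */
function firstWords(string $text, int $limit = 100): string
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    // preg_split returns false on invalid UTF-8, so fall back to [].
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    return implode(' ', array_slice($words, 0, $limit));
}

/**
 * Use the PDF's Title metadata if it is set, otherwise fall back to the
 * first non-empty line of the extracted text.
 */
function buildMetadata(?string $pdfTitle, string $text, int $limit = 100): array
{
    $title = trim((string) $pdfTitle);
    if ($title === '') {
        // \R matches any newline sequence (\n, \r\n, \r, ...)
        $lines = preg_split('/\R/u', $text) ?: [];
        foreach ($lines as $line) {
            if (trim($line) !== '') {
                $title = trim($line);
                break;
            }
        }
    }

    return [
        'title'       => $title,
        'description' => firstWords($text, $limit),
    ];
}
```

The returned array has `title` and `description` keys, which you can then copy onto your entity before persisting it.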

I guess a crawler is what I'm looking for, but I have no idea where to find one or how to make it work.
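Since the 10 000 files are your own, a web crawler is not really required: a recursive walk over the directory that holds them is enough. A minimal sketch using PHP's SPL iterators, assuming the PDFs sit somewhere under one local root directory; `findPdfFiles` is a hypothetical name:

```php
<?php
// Sketch only: assumes all PDFs live under one local directory tree,
// so a recursive directory walk replaces a crawler.

/**
 * Yield the path of every *.pdf file (extension matched case-insensitively)
 * found anywhere under $root.
 */
function findPdfFiles(string $root): Generator
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $file) {
        /** @var SplFileInfo $file */
        if ($file->isFile() && strtolower($file->getExtension()) === 'pdf') {
            yield $file->getPathname();
        }
    }
}
```

Usage: `foreach (findPdfFiles('/path/to/pdfs') as $path) { ... }`. Because the function is a generator, paths are produced one at a time rather than collected into one large array of 10 000 entries first.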

Thanks in advance!

Well, you don't have a real question yet. Here is what to do:

  • define which document types/formats you allow
  • google how to read each document type with PHP (PHP functions, libraries, code snippets)
  • determine the file type of the uploaded documents
  • read the files in PHP using the functions, libraries, etc. you found
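For the "determine the file type" step, one simple check is to look at the first bytes of the file: PDF files start with the `%PDF-` header, so this does not depend on the file extension or on the MIME type reported by the uploader. A minimal sketch (`isPdf` is a hypothetical name; PHP's `finfo_file()` with `FILEINFO_MIME_TYPE` is an alternative if the fileinfo extension is enabled):

```php
<?php
// Sketch only: detects PDFs by their "%PDF-" header bytes rather than
// trusting the file extension or a client-supplied MIME type.

function isPdf(string $path): bool
{
    $handle = @fopen($path, 'rb'); // suppress the warning for missing files
    if ($handle === false) {
        return false; // unreadable or missing file
    }

    $header = fread($handle, 5); // "%PDF-" is 5 bytes long
    fclose($handle);

    return $header === '%PDF-';
}
```

This is a quick sanity check, not a validator: a file can start with `%PDF-` and still be damaged, so the parsing step should handle errors on its own.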

When you have done that and run into a specific problem, ask a real question ;)

