I'm building a document-sharing platform and, to attract as many users as possible, I want to add 10,000 documents to the platform. The documents are PDF files. I'm working with Symfony2, but I guess that doesn't change the problem: how can I automatically extract the metadata I need from these documents (for example, the title, or the first 100 words as a description) and insert it into the database (in my case, hydrate my entities, but I know that part)?
I guess a crawler is what I'm looking for, but I have no idea where to find one nor how to make it work.
thanks in advance!
Well, you don't have a real question yet:
- Define which document types/formats you allow.
- Google how to read each document type in PHP (PHP functions, libraries, code snippets).
- Determine the file type of the uploaded documents.
- Read the files in PHP using the functions, libraries, etc. that you googled.
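To make the last two steps a bit more concrete, here is a minimal sketch, not a finished solution. It assumes PHP 7+ with the bundled fileinfo extension enabled. The function names `detectMimeType` and `firstWords` are made up for illustration, and the text passed to `firstWords` is assumed to have already been extracted from the PDF by whatever library you end up choosing.

```php
<?php
// Sketch only: both function names are hypothetical, not part of any library.

/**
 * Detect the MIME type from the file contents (not the file extension),
 * using PHP's fileinfo extension.
 */
function detectMimeType(string $path): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path);
    if ($mime === false) {
        throw new RuntimeException("Cannot detect MIME type of $path");
    }
    return $mime;
}

/**
 * Keep the first $limit whitespace-separated words of $text,
 * e.g. to build a short description from already-extracted text.
 */
function firstWords(string $text, int $limit = 100): string
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_slice($words, 0, $limit));
}
```

You could call `detectMimeType()` on each file, skip anything that is not `application/pdf`, and pass the text your PDF library returns to `firstWords()` before hydrating the entity. Getting the text out of the PDF is the part you still have to pick a library for.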
When you have done that and have a specific problem, ask a real question ;)