I'm still looking for just the 'right' combo of software for this.
I've currently settled on organizing everything (or almost everything) manually, i.e. by moving/sym-linking files in a directory tree.
For searching, I was using DocFetcher, but I'm (almost hypothetically) planning on (someday) building my own index/search tool based on the software that DocFetcher uses, i.e. Apache Lucene (or maybe Solr).
Are the PDF files scanned images or text? If they are text-based, DocFetcher is a small program that can index the files so you can search within multiple PDF files in a folder.
. . . . .well fuck - thank you!
Now I just gotta scan all of these and make a database of all words - which you can do with DocFetcher (http://docfetcher.sourceforge.net/en/index.html) - it's an offline database creator, sorta like Google but for your own documents.
I use Foxit Reader. I read the manual and it can't do what you want, but you can do it another way, and more efficiently.
Create a folder and put inside all the pdf files you want to be searchable. Download DocFetcher, it is free with a portable version available so you can try it without installing it.
Open it and create an index using as a source the folder you created in the previous step.
Now let's say a pdf file contains the paragraph below:
> In this chapter we’re going to walk through the installation of Asterisk from the source code. Many people shy away from this method, claiming that it is too difficult and time consuming.
If you search for "chapter * code" (note the double quotes) it will find a match (I tried it and it works). Double-click the result and it will open the file at that point.
Plus, you will notice that DocFetcher can index many file types, not just pdfs.
I'd recommend DocFetcher. It's a small but powerful document search engine that can work with docs, PDFs, TXT files and many other formats. You create a full-text index once, and after that the program will find all keyword/phrase occurrences in no time, with a preview and a link to the file, similar to Google.
It's very easy to use, and free:
DocFetcher. It works with docs, PDFs, epubs, TXT, HTML files and many other formats. You create a full-text index once, and after that the program will find all keyword/phrase occurrences in no time, with a preview snippet and a link to the file. It's based on Java, so it's portable and can run on Windows and almost any other modern OS.
While Calibre is great for managing an e-book collection and e-book metadata, I'd recommend DocFetcher for the indexing of the actual book contents.
It's a small but powerful document search engine that can work with ePubs, PDFs, text files and many other (e-book) formats. You create a full-text index once, and after that the program will find all keyword/phrase occurrences in no time, with a preview and a link to the file. It supports advanced search operators and wildcards.
Haven't tried it, so I can't vouch for it, but maybe it's useful to you: http://docfetcher.sourceforge.net/en/index.html
> Powerful query syntax: In addition to basic constructs like OR, AND and NOT DocFetcher also supports, among other things: Wildcards, phrase search, fuzzy search ("find words that are similar to..."), proximity search ("these two words should be at most 10 words away from each other"), boosting ("increase the score of documents containing...")
Sounds indeed quite powerful.
So when I had an intermittent internet connection, I exported all of my man pages to text, then fed all of those into a document searcher called DocFetcher. It's sorta like a Google search for documents. (I'm sure I could learn some more grep-fu -- but this was easier.) Any time I was offline but needed to figure something out, it would take a few hours to do, but I would hack my way there eventually.
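The man-page export described above could look roughly like the Python sketch below. This is an illustration, not what the poster actually ran: `render_man_page` and `export_pages` are made-up names, and it assumes the `man` command is on the PATH. The folder of `.txt` files it writes is what you would then point the indexer at.

```python
import re
import subprocess
from pathlib import Path


def render_man_page(name):
    """Return the formatted text of one man page, or None if it can't be rendered."""
    try:
        result = subprocess.run(["man", name], capture_output=True, text=True)
    except FileNotFoundError:
        # `man` itself is not installed.
        return None
    if result.returncode != 0:
        return None
    # Drop overstrike sequences (character + backspace) used for bold/underline.
    return re.sub(r".\x08", "", result.stdout)


def export_pages(names, out_dir, render=render_man_page):
    """Write one <name>.txt per page into out_dir; return the file names written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name in names:
        text = render(name)
        if text is None:
            # Skip pages that do not exist or failed to render.
            continue
        target = out / f"{name}.txt"
        target.write_text(text, encoding="utf-8")
        written.append(target.name)
    return written
```

Something like `export_pages(["ls", "grep", "tar"], "manpages_txt")` would be a typical call; the list of page names is up to you.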
Foxit Reader is quite fast, but not instant. It can also search through files in a folder.
DocFetcher can create an index out of your PDF files and provide instant search results. I suppose you could just feed it a single PDF. It's entirely free.
Answer will depend on how far you want to scale. Do you have a shit ton of files, shit ton of users, or both?
If you are just looking for a very simple solution to do content indexing for a few users I've used this. It's better than Windows Search's interface and can index network drives:
Sure there is. On macOS for instance, the built-in Spotlight search tool indexes ODT files, so you can just type words and it'll show ODT documents containing those words. But it's a matter for the operating system. On Linux, it looks like DocFetcher lets you search through ODT files. Or I saw this command, which should work (I'm not on Linux at the moment, so can't test though):
for file in *.odt; do unzip -p "$file" content.xml 2>/dev/null | grep -iq insertsearchtexthere && echo "$file"; done
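If you'd rather not depend on `unzip` being installed, the same idea can be sketched in Python with only the standard library. It relies on the fact that an ODT file is a zip archive with the document body in `content.xml`; the function names `odt_text` and `find_odt_files` are invented for this example, and the tag stripping is deliberately crude (a word split across formatting spans will not match).

```python
import html
import re
import zipfile
from pathlib import Path


def odt_text(path):
    """Return the text of an ODT file with XML tags stripped (crudely)."""
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("content.xml").decode("utf-8", errors="replace")
    # Replace every tag with a space, then decode entities such as &amp;.
    return html.unescape(re.sub(r"<[^>]+>", " ", xml))


def find_odt_files(folder, needle):
    """Return names of .odt files in folder whose text contains needle (case-insensitive)."""
    matches = []
    for path in sorted(Path(folder).glob("*.odt")):
        try:
            if needle.lower() in odt_text(path).lower():
                matches.append(path.name)
        except (zipfile.BadZipFile, KeyError):
            # Not a valid zip archive, or no content.xml inside; skip it.
            continue
    return matches
```

Calling `find_odt_files(".", "insertsearchtexthere")` mirrors the shell loop: it lists the matching file names in the current directory.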
YES!
One time I exported all of the man pages into text, and then used an indexer called DocFetcher to make them searchable:
http://docfetcher.sourceforge.net/en/index.html
It can also do the same thing with PDFs, so if you can find some "Linux bible" PDFs, it's a good idea to have it index those too.
DocFetcher. It's a small but powerful document search engine that can work with docs, PDFs, TXT files and many other formats. You create a full-text index once, and after that the program will find all keyword/phrase occurrences in no time, with a preview snippet and a link to the file.
And best of all - it's free...
If you're looking for a powerful yet free solution, I'd recommend DocFetcher.
DocFetcher is a document search engine. It works for PDFs, but also for text files, PST files, e-books and many other formats. You create a full-text index once, and the program finds all keyword occurrences, with a preview. In addition to advanced search operators, it also supports wildcards.
It seems you need more than a simple library tool, as most can only do simple searches (unless they support regex).
Another problem can be the scanned books. If by that you mean that the pages themselves are images instead of text, you would need an OCR tool to convert them before they can be searched.
A document management system might have all of these features and more, but I couldn't really give you a recommendation among the free ones (the pricey ones should all be able to do all of that). Alfresco Community could be worth a try, though I can't guarantee that it will meet all of your requirements, as I haven't used it in years.
If no OCR is needed, DocFetcher might be useful. It allows proximity searches, for example "extracted individual regressor"~30 searches for all three words with a maximum distance of 30 words between them.
If that isn't enough, Apache Solr might help, but is a bit more complicated. It should have a search UI included, but if that isn't the case, one should easily be found somewhere else. There should be a couple more tools based on Lucene and/or Solr, so you might have more luck searching there.
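To make the idea of a proximity search concrete, here is a small Python sketch of the check as described above: all terms must occur with at most N word positions between the first and the last. It is a simplification for illustration only, not how DocFetcher or Lucene actually score phrase slop, and `within_distance` is a made-up name.

```python
import re


def within_distance(text, terms, max_distance):
    """True if every term occurs in text inside a span of at most max_distance word positions."""
    words = re.findall(r"\w+", text.lower())
    wanted = {term.lower() for term in terms}
    # Positions of every occurrence of a wanted word, in text order.
    hits = [(pos, word) for pos, word in enumerate(words) if word in wanted]

    counts = {}  # how often each wanted word appears in the current window
    left = 0
    for pos_right, word_right in hits:
        counts[word_right] = counts.get(word_right, 0) + 1
        # Shrink the window from the left while it is wider than allowed.
        while pos_right - hits[left][0] > max_distance:
            word_left = hits[left][1]
            counts[word_left] -= 1
            if counts[word_left] == 0:
                del counts[word_left]
            left += 1
        if len(counts) == len(wanted):
            return True
    return False
```

For instance, `within_distance(page_text, ["extracted", "individual", "regressor"], 30)` approximates the query in the comment, where `page_text` stands for whatever text you have already pulled out of a document.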
I know this doesn't meet your requirement of 3rd party tool access, but I use DocFetcher (http://docfetcher.sourceforge.net/en/index.html). It works quite well at indexing files and will even search the contents in addition to the file name!
Where are your backups? I have never been without a backup since 1998, when I had my first major hard drive crash. I have been using an external hard drive since then. Now, with the Internet, I also have two online photo accounts: Photobucket and Flickr. Any time you mess with your hard drive in such a way, like messing around with partitions, it is wise to have a backup before you do anything.
For finding the locations of images do this in the Terminal.
I personally use DocFetcher (a portable full-text indexer) for this kind of thing. No tagging, but excellent Lucene-based full-text search. For scans/photos I just give them a meaningful name that can be searched for.
Puggle is another option.