In my last project I used Lucene to index and search documents. To parse the documents I used external libraries such as PDFBox, POI and JTidy, which parse each document and extract its text; Lucene then added that extracted text to the index.
But when searching, I could not find certain words that appeared near the bottom of a document. For almost a day I could not work out why.
Finally I found out that, by default, Lucene indexes only the first 10,000 terms of a field and silently ignores the rest. That is why I could not find those words. Because no error is reported, it is difficult for users and developers to identify the real problem.
So, to index the whole document, you have to set the maximum field length on the IndexWriter explicitly; otherwise Lucene will index only the first 10,000 terms.
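To make the effect concrete, here is a small toy model in plain Java. It is not Lucene code and does not use Lucene's analyzers; the class name FieldLimitDemo and the method indexedTerms are made up for this illustration, and splitting on whitespace is an assumption that stands in for real analysis. It only shows how a cap on the number of terms quietly drops everything after the cap, with no exception or warning.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of a per-field term limit. This is NOT Lucene code; the names
// here are hypothetical and exist only for this illustration.
class FieldLimitDemo {

    // Returns the distinct terms kept when only the first maxFieldLength
    // whitespace-separated terms of the text are "indexed".
    static Set<String> indexedTerms(String text, int maxFieldLength) {
        Set<String> terms = new LinkedHashSet<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return terms;
        }
        String[] tokens = trimmed.split("\\s+");
        int limit = Math.min(tokens.length, maxFieldLength);
        for (int i = 0; i < limit; i++) {
            terms.add(tokens[i]);
        }
        // Terms past the limit are simply never added; nothing is thrown.
        return terms;
    }

    public static void main(String[] args) {
        // Build a document with 10,000 filler terms followed by one more term.
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            doc.append("filler ");
        }
        doc.append("appendix"); // term number 10,001

        // With a limit of 10,000 the last term is not searchable.
        System.out.println(indexedTerms(doc.toString(), 10000).contains("appendix"));  // prints false
        // With a larger limit it is.
        System.out.println(indexedTerms(doc.toString(), 100000).contains("appendix")); // prints true
    }
}
```

The word at position 10,001 is exactly the kind of word I could not find: it is present in the document, but it never reaches the index.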
Sample code snippet:
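// Maximum number of terms indexed per field; user-configurable (Lucene's default is 10,000).
private static final int MAX_FIELD_LENGTH = 100000;
// LUCENE_DIR is the index directory; false appends to the existing index instead of creating a new one.
IndexWriter indexWriter = new IndexWriter(LUCENE_DIR, new StandardAnalyzer(), false);
// Raise the limit before adding documents so that long documents are indexed in full.
indexWriter.setMaxFieldLength(MAX_FIELD_LENGTH);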