lucene - Why is my index directory so small when fields are compressed?
The answer is probably in the question, but I would like to be sure :-)
I have indexed 10000 documents. Each has a field that stores text which is actually 100 KB large (it comes from a text file that uses UTF-8). When this field is not compressed, the index is 436 MB; when the field is compressed, it is only 11.4 MB. That would be a compression ratio of roughly 38 (436 / 11.4). Is that too good to be true? Or is it possible that some of the index data is stored somewhere else on my computer, outside the index directory?
When I retrieve the field there is no error and everything looks fine, but I know from experience that when something seems too good to be true, there is usually something wrong. :D
Here is the code:
    // raw, not searchable
    FieldType fieldType2 = new FieldType();
    fieldType2.setIndexed(false);
    fieldType2.setTokenized(false);
    fieldType2.setStored(true);
    fieldType2.setOmitNorms(true);
    fieldType2.setIndexOptions(FieldInfo.IndexOptions.DOCS_ONLY);
    fieldType2.freeze();
    // store the compressed bytes of the text as the field value
    Field raw = new Field("raw", CompressionTools.compressString(text), fieldType2);
    doc.add(raw);

The author of CompressionTools mentions 76m -> 1.7m, so your results may well be comparable.
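If you want to check whether a ratio in this range is plausible for your own text, you could compress a sample outside Lucene and compare. The following is only a minimal sketch: it assumes that CompressionTools is built on java.util.zip's Deflater at BEST_COMPRESSION (as far as I know it is), and the class name, method names and sample sentence are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionRatioSketch {

    // Deflate the input at the given level and return the compressed size in bytes.
    public static int deflatedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Ratio of the UTF-8 size of the text to its deflated size.
    public static double ratio(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        return (double) raw.length / deflatedSize(raw, Deflater.BEST_COMPRESSION);
    }

    public static void main(String[] args) {
        // Hypothetical sample: one sentence repeated until it is roughly 100 KB.
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 100 * 1024) {
            sb.append("The quick brown fox jumps over the lazy dog. ");
        }
        System.out.printf("ratio = %.1f%n", ratio(sb.toString()));
    }
}
```

Repetitive or low-entropy text compresses far better than typical prose, so feed it one of your real 100 KB documents rather than a synthetic sample before drawing conclusions.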
And of course Lucene does not write files outside the configured directory; that would be a serious bug.
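If you would rather measure the directory yourself than trust the file manager, you can sum the file sizes under the index path. This is a rough sketch using only the JDK; the class name and the default path "index" are placeholders, so pass your real index directory as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexDirSizeSketch {

    // Sum the sizes of all regular files under dir, including subdirectories.
    public static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder default; replace with the directory you gave to Lucene.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "index");
        double megabytes = totalBytes(indexDir) / (1024.0 * 1024.0);
        System.out.printf("%s: %.1f MB%n", indexDir, megabytes);
    }
}
```

Run it once against the index built with the compressed field and once against the uncompressed one; the two totals should match the sizes you quoted.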