I am currently running a performance test on our server for classifying Office files. We are trying to use FSRM (File Server Resource Manager) for the job, but so far we are not impressed and are in fact disappointed.
We set up two systems for this purpose and created 200,000 (2 lakh) sample files of approximately 1 MB each, about 200 GB of data in total.
The first system runs Windows Server 2012 R2 on an Intel Core i3 at 3.30 GHz (2 cores, 4 logical processors) with 16 GB RAM. With up to 7 regular expressions applied, it takes 24 hours to complete the job.
The same task took 11.5 hours on the second system: Windows Server 2012 R2 Standard on an Intel Core i7 at 3.40 GHz (4 cores, 8 logical processors) with 32 GB RAM.
At this rate, classifying a realistic 1 TB of data would take 57 to 60 hours on the faster system, or about 120 hours on the slower one. Rounding the faster run to 12 hours, that is only about 4.6 MB/s, or roughly 0.216 seconds per 1 MB file.
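For anyone who wants to check the numbers, here is a small sketch that derives throughput and a linear 1 TB projection from the two runs above. It assumes every file is exactly 1 MB, that 1 TB means 1,000 GB (using 1,024 GB adds roughly 2 to 3 hours on the i7), and that run time scales linearly with data volume, which FSRM does not guarantee.

```python
# Measured runs from our tests: 200,000 files of ~1 MB each (~200 GB).
# Assumption: every file is exactly 1 MB and time scales linearly with volume.
FILES = 200_000
MB_PER_FILE = 1.0
MEASURED_GB = 200.0

runs_hours = {
    "i3 3.30 GHz, 16 GB RAM": 24.0,
    "i7 3.40 GHz, 32 GB RAM": 11.5,
}


def throughput(hours, files=FILES, mb_per_file=MB_PER_FILE):
    """Return (files per second, MB per second, seconds per file) for one run."""
    seconds = hours * 3600
    files_per_s = files / seconds
    return files_per_s, files_per_s * mb_per_file, seconds / files


def hours_for(target_gb, hours, measured_gb=MEASURED_GB):
    """Linearly extrapolate a measured run time to a different data volume."""
    return hours * target_gb / measured_gb


for name, hours in runs_hours.items():
    fps, mbps, spf = throughput(hours)
    print(f"{name}: {fps:.2f} files/s, {mbps:.2f} MB/s, {spf:.3f} s/file, "
          f"~{hours_for(1000, hours):.0f} h for 1 TB")
```

With the unrounded 11.5-hour figure the i7 comes out at roughly 4.8 files (MB) per second, and the i3 at roughly 2.3.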
I found an article that gave figures for FSRM classification speed:
'Approximately 40 files are classified per second when the content classifier is used to classify Office 2007 documents that are under 1 MB.'
The article also said that this rate depends on the number of regular expressions run against the data.
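A quick comparison of our measured rates against the quoted 40 files per second is sketched below. Treat it as a rough indicator only: the article's figure depends on how many regular expressions were used, and we do not know that count, so this is not an apples-to-apples benchmark.

```python
# Quoted rate from the article (Office 2007 documents under 1 MB).
QUOTED_FILES_PER_S = 40
FILES = 200_000

measured_hours = {"i3": 24.0, "i7": 11.5}


def slowdown(hours, files=FILES, quoted=QUOTED_FILES_PER_S):
    """How many times slower a measured run is than the quoted rate."""
    measured_fps = files / (hours * 3600)
    return quoted / measured_fps


# Time our 200,000-file set would take at the quoted rate (~1.4 h).
expected_hours = FILES / QUOTED_FILES_PER_S / 3600

for name, hours in measured_hours.items():
    print(f"{name}: {slowdown(hours):.1f}x slower than quoted "
          f"({hours} h measured vs {expected_hours:.1f} h expected)")
```

By this estimate the i7 is about 8x and the i3 about 17x below the quoted figure.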
If we have to depend on FSRM, classifying all of our Office files will take far too long.
Has anyone tried this on their own systems? What speeds did you get? Is it really this slow? Is there any way to improve performance? Does FSRM classification speed depend on RAM, the number of cores, the number of files, or their content?