Efficiently Reduce Document Sets in a Defensible Manner
In typical document collections, 30% to 60% of all documents can be similar to one another, differing only in incremental changes to content or formatting. IDM's Near Duplicate Detection groups near duplicate documents by percentage of similarity so reviewers can quickly review and code similar documents for responsiveness or privilege.
IDM's process uses Context Triggered Piecewise Hashing (CTPH) to identify similar documents. Traditional de-duplication computes one hash value for an entire file and compares it against the hash values of other entire files, so it catches only exact copies. CTPH instead splits each file into segments at boundaries determined by the content itself, hashes each segment, and combines the results into a fuzzy hash value for the file. The fuzzy hash value of one file is then compared against those of other files to produce a similarity score. If the score meets a specified similarity threshold, the files are identified as duplicates or near duplicates.
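To make the idea concrete, here is a deliberately simplified sketch of context-triggered piecewise hashing in Python. It is not IDM's implementation, and it is not compatible with any existing fuzzy hashing tool. The window size, the trigger rule, the one-character-per-segment encoding, and the use of difflib for scoring are all assumptions chosen for illustration.

```python
import hashlib
import string
from difflib import SequenceMatcher

# All constants below are illustrative assumptions, not IDM's settings.
WINDOW = 7          # number of trailing bytes the rolling value looks at
TRIGGER_MOD = 16    # a boundary fires when rolling value % 16 == 15
ALPHABET = string.ascii_letters + string.digits + "+/"  # 64 symbols


def fuzzy_signature(data: bytes) -> str:
    """Return one character per content-defined segment of `data`."""
    signature = []
    segment_start = 0
    for i in range(len(data)):
        # Rolling value: sum of the last WINDOW bytes ending at position i.
        rolling = sum(data[max(0, i - WINDOW + 1): i + 1])
        if rolling % TRIGGER_MOD == TRIGGER_MOD - 1:
            # The content itself triggered a boundary: hash the segment.
            digest = hashlib.sha256(data[segment_start: i + 1]).digest()
            signature.append(ALPHABET[digest[0] % 64])
            segment_start = i + 1
    if segment_start < len(data):
        # Hash whatever is left after the last boundary.
        digest = hashlib.sha256(data[segment_start:]).digest()
        signature.append(ALPHABET[digest[0] % 64])
    return "".join(signature)


def similarity(sig_a: str, sig_b: str) -> float:
    """Score two signatures from 0 to 100 (difflib ratio as a stand-in)."""
    return 100.0 * SequenceMatcher(None, sig_a, sig_b).ratio()
```

Because segment boundaries depend only on nearby bytes, an edit changes the characters for the segments it touches and leaves the rest of the signature intact. For example, appending text to a file leaves the signature unchanged up to its last character, so the two signatures still score as highly similar, whereas a whole-file hash would differ completely.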
IDM's near duplicate detection solution can be loaded into any server-based or web-based review application so you can sort your documents into near duplicate groups for review. Your review team will quickly become familiar with the content of similar documents and can code them efficiently and consistently.
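As a rough illustration of sorting documents into near duplicate groups, the Python sketch below links any two documents whose similarity meets a threshold and reports the resulting groups. It is only a sketch under stated assumptions: the function names, the 80% default threshold, and the use of difflib text comparison in place of fuzzy hash comparison are hypothetical and do not describe IDM's product.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity_percent(a: str, b: str) -> float:
    """Stand-in similarity score from 0 to 100 (assumption: difflib ratio)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def group_near_duplicates(docs: dict[str, str], threshold: float = 80.0) -> list[list[str]]:
    """Group document IDs whose pairwise similarity meets the threshold."""
    parent = {doc_id: doc_id for doc_id in docs}

    def find(doc_id: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[doc_id] != doc_id:
            parent[doc_id] = parent[parent[doc_id]]
            doc_id = parent[doc_id]
        return doc_id

    for a, b in combinations(docs, 2):
        if similarity_percent(docs[a], docs[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for doc_id in docs:
        groups.setdefault(find(doc_id), []).append(doc_id)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


docs = {
    "a": "The quarterly report is attached for your review.",
    "b": "The quarterly report is attached for your review!",
    "c": "Lunch menu: soup and salad.",
}
print(group_near_duplicates(docs))  # [['a', 'b'], ['c']]
```

Comparing every pair of documents is fine for a small example but grows quadratically with collection size, so a production system would need a faster way to find candidate pairs.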
IDM's near duplicate detection is charged per file, at rates as much as 90% lower than those of other service providers: just pennies per file. Processing is extremely fast and will not hold up your time-sensitive review.

