Published: November 20, 2022

Fast Duplicate File Finder 11

22 But the workload grows steadily. Speed was already a requirement before the new wave of big data hit, and the number of cases to handle is now much larger. There is a new wave of backups, data privacy concerns, and digital misuse, and a shift toward using metadata rather than content to access, understand, and make sense of content or data sets. This includes an increased need to find duplicate content rather than merely duplicate files. Big data systems are able to address this problem by providing far better estimates, even though their costs are still higher than those of typical commercially available solutions.

23 As data consumers and producers become increasingly sophisticated, they are producing data sets that are exponentially larger and more disparate. The advent of the web has resulted in the production of almost unimaginable quantities of information to support a plethora of activities, which in aggregate would exceed the volume of the internet’s total traffic. It is the data’s invisibility that causes it to accumulate: the lack of organization and information prevents easy discernment. Even duplicates have reportedly increased threefold since the start of the 1990s, a rise attributed to the replication of documents for legal purposes and to overall volume growth. This makes eliminating duplicates a pressing problem.

24 Such a broad and diverse pool of data calls for a new generation of duplicate-file prevention solutions, with features that deliver the low overhead of the usual duplicate checks and the high precision of deep-seated duplicate detection. With that in mind, I have started a number of investigations, including: 1) the evaluation of commercial products, 2) deep-seated duplicate detection, and 3) the development of a new duplicate detection component that can be combined with existing duplicate prevention solutions.
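
One common way to combine a low-overhead check with a higher-precision one is a two-stage pass: group files by size using metadata only, then hash the contents of files that share a size. The sketch below illustrates that general idea and is not the component described above. The function names `file_digest` and `find_duplicates` are hypothetical, and the choice of SHA-256 as the content fingerprint is an assumption.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths under root that share a size and a SHA-256 digest."""
    # Stage 1 (cheap, metadata only): bucket regular files by size.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Stage 2 (expensive, content based): hash only files whose size collides.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Calling `find_duplicates("/some/directory")` returns a list of lists, each holding the paths of files with matching size and digest. Matching digests make identical content extremely likely but are not a formal proof, so a byte-by-byte comparison could be added as a final stage where that matters.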

