# Find Folder Duplicates

> Drop files to find exact duplicates by comparing cryptographic hashes, saving disk space

**Category:** Utility
**Keywords:** duplicate, files, folder, hash, compare, deduplicate, disk space, cleanup
**URL:** https://complete.tools/find-folder-duplicates

## How it works

When you drop files onto the tool, each file is read into memory as an ArrayBuffer using the browser's File API. The raw bytes are then passed to the Web Crypto API's `crypto.subtle.digest` function, which computes a SHA-256 hash. SHA-256 is a member of the SHA-2 family of cryptographic hash functions and produces a fixed 256-bit (32-byte) output regardless of input size. The probability of two different files producing the same hash (a collision) is astronomically small, approximately 1 in 2^128, making it a reliable method for content comparison.

Once all files have been hashed, the tool groups them by their hash value. Any group containing more than one file represents a set of exact duplicates. The wasted space for each group is calculated as the file size multiplied by the number of extra copies (total copies minus one).

A secondary analysis scans for files sharing the same name or the same byte size but having different hash values, surfacing potential version conflicts or near-misses that deserve manual review.

## Who should use this

This tool is valuable for anyone looking to reclaim storage space on their computer or external drives. Photographers who import batches of images from multiple memory cards often end up with the same photos scattered across several folders. Musicians and producers accumulate duplicate audio samples and project files over time. Office workers may have multiple copies of the same document saved under slightly different names after rounds of email attachments and shared-drive syncing. System administrators performing storage audits on shared network folders can use the tool to quickly quantify duplicate waste before running a cleanup.
Students organizing research papers and lecture notes across semesters can identify redundant downloads. Anyone performing a migration from one cloud service to another, or consolidating multiple backup drives into one, will find this tool indispensable for verifying which files are truly unique before deleting anything.

## Worked examples

**Example 1:** A photographer drops 200 vacation photos from two different import folders. The tool hashes all 200 files and discovers 45 exact duplicate pairs. Each photo averages 8 MB, so the tool reports 45 duplicate files totaling 360 MB of recoverable space. The photographer can now safely delete one copy from each pair.

**Example 2:** A software developer drops the contents of two project backup folders containing 150 files each. The tool finds 120 files that are identical across both backups, 5 files with the same name but different content (indicating code files that were modified between backups), and 3 pairs of files with the same size but different hashes (config files with different settings). The developer uses the name collision report to identify which files changed and the exact duplicate list to prune the redundant backup.

**Example 3:** A student drops 50 PDF lecture notes from three semesters into the tool. Seven files turn out to be exact duplicates downloaded multiple times, totaling 84 MB of wasted space. Two files share the name "syllabus.pdf" but have different hashes, revealing updated syllabi from different semesters that should both be kept.

## Limitations

This tool operates entirely in the browser, which means it is subject to the memory constraints of the browser tab. Scanning thousands of very large files (such as multi-gigabyte video files) may cause the browser to slow down or run out of memory. The tool compares files based on their full SHA-256 hash, so it will only detect exact byte-for-byte duplicates.
Two images that look visually identical but have been saved with different compression settings or metadata will not be flagged as duplicates. Similarly, two documents with the same text content but different formatting will produce different hashes.

The tool cannot compare files across separate drop operations; all files to be compared must be dropped in a single batch. Because browsers do not provide folder path information for security reasons, the tool cannot display the original folder location of each file.

## FAQs

**Q:** Is SHA-256 hashing reliable for finding duplicates?

**A:** SHA-256 is a cryptographic hash function designed so that even a single bit difference in the input produces a completely different hash output. The chance of two different files producing the same SHA-256 hash is approximately 1 in 2^128, which is effectively zero for practical purposes. Major version control systems like Git use SHA-based hashing for the same reason.

**Q:** Are my files uploaded to a server?

**A:** No. All file reading and hashing happens locally in your web browser using the File API and the Web Crypto API. Your files never leave your device. You can verify this by disconnecting from the internet before using the tool.

**Q:** Can I scan an entire folder at once?

**A:** You can select multiple files from a folder using your operating system's file picker (click the dropzone and use Ctrl+A or Cmd+A to select all files). Some browsers also support dragging an entire folder onto the dropzone, which will include all files within it.

**Q:** Why do some files with the same name show up as different?

**A:** Files can share a name but contain different data. This happens frequently with files like "readme.txt", "config.json", or "index.html" that exist in many projects. The tool flags these as name collisions so you can review whether they are truly different versions or accidental duplicates with modified content.
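The name-collision behavior from the last FAQ amounts to a second grouping pass: bucket files by name, then flag any name that covers more than one distinct content hash. This is an illustrative sketch, not the tool's source; `findNameCollisions` and its `{ name, hash }` input shape are hypothetical, with hashes assumed to have been computed already.

```javascript
// Flag filenames that appear more than once with differing content hashes.
// `files` is a hypothetical [{ name, hash }] array of already-hashed entries.
function findNameCollisions(files) {
  const byName = new Map();
  for (const file of files) {
    if (!byName.has(file.name)) byName.set(file.name, new Set());
    byName.get(file.name).add(file.hash);
  }
  // A name is a collision only when it maps to 2+ distinct hashes;
  // repeated names with identical hashes are exact duplicates instead.
  return [...byName.entries()]
    .filter(([, hashes]) => hashes.size > 1)
    .map(([name, hashes]) => ({ name, distinctVersions: hashes.size }));
}
```

The same pass keyed on byte size instead of name would surface the "same size, different hash" near-misses mentioned under "How it works".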
---

*Generated from [complete.tools/find-folder-duplicates](https://complete.tools/find-folder-duplicates)*