NAME
dupd - find duplicate files

SYNOPSIS
dupd COMMAND [OPTIONS]

DESCRIPTION
dupd scans all the files in the given path(s) to find files with duplicate content.

The sets of duplicate files are not displayed during a scan. Instead, the duplicate info is saved into a database which can be queried with subsequent commands without having to scan all the files again.

Even though dupd can be used as a simple duplicate reporting tool similar to how other duplicate finders work (by running dupd scan ; dupd report), the real power of dupd comes from interactively exploring the filesystem for duplicates after the scan has completed. See the file, ls, dups, uniques and refresh commands.

Additional documentation and examples are available under the docs directory in the source tree. If you don't have the source tree available, see https://github.com/jvirkki/dupd/blob/master/docs/index.md

COMMANDS
As noted in the synopsis, the first argument to dupd must be the command to run. The command is one of:

scan - scan files looking for duplicates
report - show duplicate report from last scan
file - check for duplicates of one file
ls - list info about every file
dups - list all duplicate files
uniques - list all unique files
refresh - remove deleted files from the database
validate - revalidate all duplicates in database
rmsh - create shell script to delete all duplicates (use with care!)
help - show brief usage info
usage - show this documentation
man - show this documentation
license - show license info
version - show version and exit

OPTIONS
scan - Perform the filesystem scan for duplicates.
report - Display the list of duplicates.
Note: The database format generated by scan is not guaranteed to be compatible with future versions. You should run report (and all the other commands below which access the database) using the same version of dupd that was used to generate the database.

file - Report duplicate status of one file.

To check whether one given file still has known duplicates, use the file operation. Note that this does not do a new scan, so it will not find new duplicates. It checks whether the duplicates identified during the previous scan still exist and verifies (by hash) whether they are still duplicates.
ls, uniques, dups - List matching files.

While the file command checks the duplicate status of a single file, these commands do the same for all the files in a given directory tree.

ls - list all files, showing whether or not they have duplicates
uniques - list all unique files
dups - list all files which have known duplicates
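The three listing commands above can be sketched as a short query session. This is a minimal sketch, assuming a prior scan has already populated the database, and that dups and uniques accept --path the same way ls does in the EXAMPLES section:

```shell
#!/bin/sh
# Minimal sketch: query duplicate status after a completed 'dupd scan'.
# The guard keeps this runnable even on systems where dupd is not installed.
if command -v dupd >/dev/null 2>&1; then
    dupd ls --path docs        # every file, with its duplicate status
    dupd dups --path docs      # only files that have known duplicates
    dupd uniques --path docs   # only files with no known duplicates
else
    echo "dupd not found; install it to run these queries"
fi
```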
refresh - Refresh the database.

As you remove duplicate files, they remain listed in the dupd database. Ideally you'd run the scan again to rebuild the database. Re-running the scan after deleting some duplicates can be very fast because the files are in the cache, so that is the best option. However, when dealing with a set of files large enough that they don't fit in the cache, re-running the scan may take a long time. For those cases the refresh command offers a much faster alternative.

The refresh command checks whether all the files in the dupd database still exist and removes those which do not. Be sure to consider the limitations of this approach: refresh does not re-verify whether all files listed as duplicates are still duplicates, and it does not, of course, detect any new duplicates which may have appeared since the last scan.

In summary, if you have only been deleting duplicates since the previous scan, run the refresh command. It will prune all the deleted files from the database and will be much faster than a scan. However, if you have been adding and/or modifying files since the last scan, it is best to run a new scan.

validate - Validate the database.

The validate operation is primarily for testing, but is documented here as it may be useful if you want to reconfirm that all duplicates in the database are still truly duplicates. In most cases you will be better off re-running the scan operation instead of using validate. Validate is fairly slow, as it fully hashes every file in the database.

rmsh - Create a shell script to remove duplicate files.

As a policy, dupd never modifies the filesystem! As a convenience for those times when it is desirable to automatically remove files, this operation can create a shell script to do so. The output is a shell script (written to stdout) which you can run to delete your files (if you're feeling lucky). Review the generated script carefully to confirm it truly does what you want!
Automated deletion is generally not very useful because it takes human intervention to decide which of the duplicates is the best one to keep in each case. While the content is the same, one of them may have a better file name and/or location. Optionally, the shell script can create either soft or hard links from each removed file to the copy being kept. The options are mutually exclusive.
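The refresh and rmsh steps described above can be combined into a short cleanup pass. This is only an illustrative sketch: the remove_dups.sh file name is an invented placeholder, not something dupd creates itself, and the generated script should always be reviewed before it is run:

```shell
#!/bin/sh
# Sketch of a cleanup pass: prune deleted files from the database,
# then write (but do not run) a removal script for review.
# remove_dups.sh is an illustrative name chosen here, not a dupd default.
if command -v dupd >/dev/null 2>&1; then
    dupd refresh                 # drop entries for files deleted since the scan
    dupd rmsh > remove_dups.sh   # dupd only writes the script to stdout
    cat remove_dups.sh           # review carefully before running anything!
    # sh remove_dups.sh          # uncomment only once you trust the script
else
    echo "dupd not found; install it to try this workflow"
fi
```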
Additional global options
HARD LINKS
Are hard links duplicates or not? The answer depends on what you mean by "duplicates" and what you are trying to do.

If your primary goal for removing duplicates is to save disk space, then it makes sense to ignore hard links. If, on the other hand, your primary goal is to reduce filesystem clutter, then it makes more sense to think of hard links as duplicates.

By default dupd considers hard links to be duplicates. You can switch this around with the --hardlink-is-unique option. This option can be given either during scan or to the interactive reporting commands (file, ls, uniques, dups).

EXAMPLES
Scan all files in your home directory and then show the sets of duplicates found:

% dupd scan --path $HOME
% dupd report

Show duplicate status (duplicate or unique) for all files in the docs subdirectory:

% dupd ls --path docs
I'm about to delete docs/old.doc, but want to check one last time that it is a duplicate and review where those duplicates are:

% dupd file --file docs/old.doc -v
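The hard-link behavior from the HARD LINKS section can be illustrated the same way. A sketch, assuming --hardlink-is-unique is accepted both at scan time and by the reporting commands, as that section describes:

```shell
#!/bin/sh
# Sketch: treat hard links to the same data as unique files, which is
# usually what you want when the goal is saving disk space.
if command -v dupd >/dev/null 2>&1; then
    dupd scan --path "$HOME" --hardlink-is-unique
    dupd dups --path "$HOME" --hardlink-is-unique
else
    echo "dupd not found; install it to run this scan"
fi
```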
Read the documentation in the dupd docs directory or the online documentation for more usage examples.

EXIT
dupd exits with status code 0 on success, non-zero on error.

SEE ALSO
sqlite3(1)

https://github.com/jvirkki/dupd/blob/master/docs/index.md