NAME
ddpt - copies data between files and storage devices. Support for devices that understand the SCSI command set.

SYNOPSIS
ddpt [bpt=BPT[,OBPC]] [bs=BS] [cdbsz=IO_CDBSZ] [cdl=CDL] [coe={0|1}] [coe_limit=CL] [conv=CONVS] [count=COUNT] [ddpt=VERS] [delay=MS[,W_MS]] [ibs=IBS] [id_usage=LIU] if=IFILE [iflag=FLAGS] [intio={0|1}] [iseek=SKIP] [ito=ITO] [list_id=LID] [obs=OBS] [of=OFILE] [of2=OFILE2] [oflag=FLAGS] [oseek=SEEK] [prio=PRIO] [protect=RDP[,WRP]] [retries=RETR] [rtf=RTF] [rtype=RTYPE] [seek=SEEK] [skip=SKIP] [status=STAT] [to=TO] [verbose=VERB] [--dry-run] [--flexible] [--help] [--job=JF] [--odx] [--prefetch] [--progress] [--quiet] [--verbose] [--verify] [--version] [--wscan] [--xcopy] [ddpt] [JF]

For comparison here is the synopsis for GNU's dd command:

dd [bs=BS] [cbs=CBS] [conv=CONVS] [count=COUNT] [ibs=IBS] [if=IFILE] [iflag=FLAGS] [obs=OBS] [of=OFILE] [oflag=FLAGS] [seek=SEEK] [skip=SKIP] [status=STAT] [--help] [--version]

DESCRIPTION
Copies data between files or simply reads data from a file. Alternatively, if the --verify option is given, the IFILE and OFILE contents are compared, stopping if an inequality is found. This utility is specialized for "files" that are storage devices, especially those that can use the SCSI command sets (e.g. SATA and SAS disks). It can issue SCSI commands in pass-through ("pt") mode. Its syntax and semantics are similar to those of the Unix dd(1) command.

For comparison, the SYNOPSIS section above shows both the ddpt command line operands and options followed by GNU's dd(1) command line operands and options. Broadly speaking ddpt can be considered a superset of dd. See the DD DIFFERENCES section for significant differences between ddpt and dd.

This utility does either direct copies, based on read-write sequences, or offloaded copies. In an offloaded copy the data being copied does not necessarily pass through the memory of the machine originating the copy operation; this can save a significant amount of time and reduce CPU usage.
When doing a direct copy, this utility breaks the copy into segments since computer RAM is typically a scarce resource. First it reads BPT*IBS bytes from IFILE (or fewer if near the end of the copy) into a copy buffer. In the absence of the various operands and flags that bypass the write operation, the copy buffer is then written out to OFILE. The copy process continues working its way along IFILE and OFILE until either COUNT is exhausted, an end of file is detected, or an error occurs. If IBS and OBS are different, ddpt restricts the value of OBS such that the copy buffer is an integral number of output blocks (i.e. ((IBS * BPT) % OBS) == 0).

In the following descriptions, "segment" refers to all or part of a copy buffer. The term "pt device" is used for a pass-through device to which SCSI commands like READ(10), WRITE(10) or POPULATE TOKEN may be sent. Some pt devices can only process SCSI commands, in which case the "pt" flag is assumed. The ability to recognize such a pt-only device may vary depending on the operating system (e.g. in Linux /dev/sg2 and /dev/bsg/3:0:1:0 are recognized). However if a device can process either normal Unix read()/write() calls or pass-through SCSI commands, then the default is to use Unix read()/write() calls. That default can be overridden by using the "pt" flag (e.g. "if=/dev/sdc iflag=pt"). When pt access is specified any partition information is ignored. So "if=/dev/sdc2 iflag=pt skip=3" will start at logical block address 3 of '/dev/sdc'. As a protection measure ddpt will only accept that if the force flag is also given (i.e. 'iflag=pt,force').

This utility supports two types of offloaded copies. Both are based on the EXTENDED COPY (XCOPY or xcopy) family of SCSI commands. The first uses the XCOPY(LID1) command to do a disk to disk copy. LID1 stands for List IDentifier length of 1 byte and the commands are described in the SPC-4 standard and the earlier SPC-3 and SPC-2 standards.
The SPC-4 standard (ANSI INCITS 513-2015) added the XCOPY(LID4) sub-family of offloaded copy commands. SPC-5 drafts have since dropped the LID1 variants and removed the LID4 suffix from the remaining XCOPY family of commands. To differentiate, this man page continues to use the LID1 and LID4 suffixes. A subset of XCOPY(LID4), specialized for offloaded disk to disk copies, is known by the market name ODX. In the descriptions below "xcopy" refers to copies based on XCOPY(LID1) while "odx" refers to either full or partial ODX copies. See the XCOPY and ODX sections below for more information.

The syntax of the dd command is unusual in Unix and ddpt follows in a similar fashion. Operands (i.e. those with the <name>=<something> structure) are shown in the OPERANDS section. The more familiar Unix options (i.e. those starting with one or two hyphens) are shown in the OPTIONS section. Then there are a few arguments, which are command line entities that are neither operands nor options; see the ARGUMENTS section.

OPERANDS
The operands are listed alphabetically (by <name>) below. The <name> is the part to the left of the equal sign. All <names> start with a lower case alphabetical character.
OPTIONS
Options are listed in alphabetical order, sorted by their long name.
ARGUMENTS
Arguments neither start with a hyphen nor contain a "=".
CONVERSIONS
One or more conversions can be given to the "conv=" option. If more than one is given, they should be comma separated. ddpt does not perform the traditional dd conversions (e.g. ASCII to EBCDIC). Recently added conversions inherited from GNU's dd overlap somewhat with some of ddpt's flags.
FLAGS
A list of flags and their meanings follows. The flag name is followed by one or two indications in square brackets. The first indication is either "[i]", "[o]" or "[io]", indicating that this flag is active for the IFILE, the OFILE, or both the IFILE and the OFILE. The second indication contains some combination of "reg", "blk", "pt", "odx", or "xcopy". These indicate whether the flag applies to a regular file, a block device (accessed via Unix read() and write() calls), a pass-through device, an ODX offloaded copy or an XCOPY(LID1) offloaded copy respectively. Other special file types that are sometimes referred to are "fifo" and "tape".
COUNT
When the count=COUNT option is not given (or COUNT is '-1') then an attempt is made to deduce COUNT as follows.

When either or both of IFILE and OFILE are block devices, then the minimum size, expressed in units of input blocks, is used. When either or both of IFILE and OFILE are pass-through devices, then the minimum size, expressed in units of input blocks, is used. If a regular file is used as input, its size, expressed in units of input blocks (and rounded up if necessary), is used. Note that rounding up the deduced COUNT may result in a partial read of the last input block and a corresponding partial write to OFILE if it is a regular file. After a regular file to regular file copy the length of OFILE will be the same as IFILE unless OFILE existed and its length was already greater than that of IFILE. To get a copy like the standard Unix cp command, use oflag=trunc with ddpt.

The size of pt devices is deduced from the SCSI READ CAPACITY command. Block device sizes (or their partition sizes) are obtained from the operating system, if available. If skip=SKIP or seek=SEEK are given and COUNT is deduced (i.e. not explicitly given) then that size is scaled back so that the copy will not overrun the file or device.

If COUNT is not given and IFILE is a fifo (stdin is treated as a fifo) then IFILE is read until an EOF is detected. If COUNT is not given and IFILE is /dev/zero (or equivalent) then zeros are read until an error occurs (e.g. file system full). If COUNT is not given and cannot be deduced then an error message is issued and no copy takes place.

JOB FILES
Some operands can have long arguments (e.g. skip=SKIP and iflag=FLAGS) so the command line can become quite long. Also scatter gather lists can be arbitrarily long and may be generated by a program; then it would be tiresome and error-prone to re-type them on the command line.
So the job file was introduced to hold this utility's operands and options.

A job file is invoked either by the --job=JF option or by placing the job filename (JF) unadorned on the command line. The job filename must not contain a "=", start with a hyphen, or be called "ddpt". It is parsed when it is detected, in a left to right scan of the command line. The JF file must contain the string "ddpt" and may invoke other job files (to a maximum depth of 4). A job file should not invoke itself. Also the first line of the job file should not contain any characters (bytes) with their top bit set; in other words it should be restricted to 7 bit ASCII (otherwise sanity checks might consider it a binary file and reject it).

The operands and options within a job file are processed in the order they are found (i.e. parsing lines left to right, top (of file) to bottom). The operands and options may contradict (and cause a syntax error), override or accumulate with earlier ones, the same as if they appeared on the command line. For example '-v' on the command line followed by a job file containing '-vv' will result in a verbosity level of '-vvv' during the copy phase. Empty lines, lines only containing whitespace, and anything from and including a '#' in a job file line are ignored.

SCATTER GATHER LISTS
Each element of a scatter gather list (sgl, plural: sgl_s) is made up of a starting logical block address (LBA, plural: LBAs) and a number of blocks (NUM) to be accessed from that starting LBA.

The skip=SKIP and seek=SEEK options (and their aliases) can take scatter gather lists. These can be explicit on the command line, fed in through stdin, or in a file whose name is prefixed by "@" or "H@" on the command line. For large scatter gather lists, placing them in a file is the most practical as command lines are limited in length. Scatter gather list (sgl) is a collective term for either a scatter list or a gather list. The actual implementation of each sgl is an array.
Syntactically a scatter list and a gather list are the same. Conceptually these sgl_s refer to what happens at the "far end" (e.g. within a hard disk or SSD), not what happens in the computer's memory. So a gather list is associated with the read part of a copy (i.e. the first half), where a list of Logical Blocks (LBs, identified by their addresses, hence LBAs), each with a number of consecutive following blocks, is "gathered" from the medium (e.g. an SSD). These are formed into a linear sequence of bytes that is transferred into a segment in the computer's RAM. The second half of the copy, the write part, may use a scatter list. A scatter list starts with a linear sequence of bytes, taken from the segment, that is transferred to the device and then "scattered" on the medium as indicated by the list of LBA,NUM pairs.

In the simplest case a sgl is given on the command line and has the form: LBA1,NUM1[,LBA2,NUM2[,LBA3,NUM3...]]. There must be an even number of items (i.e. for every LBAn there should be a following NUMn) with one exception: when LBA1 alone is given, in which case the value 0 is assumed for NUM1. Comma is the simplest separator for the command line, but whitespace may also be used (it needs to be escaped or quoted because the shell usually interprets whitespace as an argument separator). In a file (or read from stdin or file redirection) more flexibility is permitted in the format. The LBA,NUM pairs could all appear on one line in a file but the line length is limited to 1024 characters (with a maximum of 256 parseable items on it). So for longer sgl_s one pair per line is recommended in file format. Also in file format everything from and including '#' to the end of that line is ignored, as are lines that are empty or only contain whitespace. Each pair becomes one element (or more, see below) of the sgl. By default all numbers given for LBA and NUM items are in decimal with optional suffix multipliers.
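The basic syntax above can be sketched in Python as follows. This is an illustrative sketch only, not ddpt's own parser: parse_sgl is a hypothetical name, numbers are treated as plain decimal (the hex forms and suffix multipliers are left out), and the 1024 character line limit is not enforced.

```python
from __future__ import annotations


def parse_sgl(text: str) -> list[tuple[int, int]]:
    """Parse 'LBA1,NUM1[,LBA2,NUM2...]' (or the file form) into (LBA, NUM) pairs.

    Hypothetical sketch: decimal numbers only, no hex and no suffix multipliers.
    """
    items: list[str] = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]                   # drop '#' comments (file form)
        items.extend(line.replace(",", " ").split())   # comma or whitespace separators
    nums = [int(tok, 10) for tok in items]
    if len(nums) == 1:
        return [(nums[0], 0)]                          # lone LBA1: NUM1 defaults to 0
    if len(nums) % 2:
        raise ValueError("sgl needs an even number of items (LBA,NUM pairs)")
    return list(zip(nums[0::2], nums[1::2]))
```

For example, parse_sgl("0,8,100,16") gives [(0, 8), (100, 16)], and a file holding one pair per line, with comments and blank lines, produces the same kind of list.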
Hex numbers use either a "0x" prefix or an 'h' suffix (hex notation and suffix multipliers cannot be mixed). In the case of a 'H@' lead-in to the filename on the command line, all numbers are interpreted as hex with no suffix multipliers permitted. Further, with the 'H@' lead-in the file may contain the string 'HEX' before any numbers are given. The 'HEX' is ignored. The point of this is to catch the case where a sgl file with default hexadecimal numbers is given without the 'H@' lead-in; in this case this utility will exit saying that the file is in the wrong format. This "wrong format" action can be bypassed with the --flexible option.

Allowing sgl_s brings lots of flexibility (including the possibility to use the SCSI WRITE SCATTERED command) but with that comes complexity. Every sgl is scanned to determine whether it is monotonic and whether it has overlapping elements. The term monotonic indicates that the LBAs are in strictly ascending order, with each LBA greater than the previous element's LBA. Overlapping refers to the situation when any element's LBA range intersects with any other element's range. Elements that have zero number of blocks (described here as "degenerate") are ignored when determining monotonicity and overlap (and the lowest LBA).

Overlapping elements are not ideal (but not necessarily fatal). The above mentioned WRITE SCATTERED command allows the medium's logic to write elements in any order it prefers. That means if elements overlap, the user doesn't know which one gets written last (overwriting the one written to the same LBA earlier). Determining whether element ranges overlap is difficult in the general case (so this utility doesn't do it) but easy in the case of a monotonic sgl (so this utility does do it). Warnings are issued in dangerous situations, with the force flag allowing the warning to be overridden.

A degenerate sgl element is one that has zero in its NUM field. Normally degenerate elements are ignored, with some exceptions.
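A minimal sketch of the monotonic and overlap scan, assuming a sgl is held as a Python list of (LBA, NUM) tuples; scan_sgl is a hypothetical helper, not part of ddpt. Degenerate elements are skipped, and the overlap test is only attempted for a monotonic sgl, mirroring the text.

```python
from __future__ import annotations


def scan_sgl(sgl: list[tuple[int, int]]) -> tuple[bool, bool | None, int | None]:
    """Return (monotonic, overlapping, lowest_lba), ignoring degenerate elements.

    overlapping is None when the sgl is not monotonic, since only the
    monotonic case is checked here.
    """
    live = [(lba, num) for lba, num in sgl if num > 0]   # skip NUM == 0 elements
    pairs = list(zip(live, live[1:]))
    monotonic = all(nxt[0] > cur[0] for cur, nxt in pairs)
    overlapping = None
    if monotonic:
        # With strictly ascending LBAs, comparing neighbours is sufficient.
        overlapping = any(cur[0] + cur[1] > nxt[0] for cur, nxt in pairs)
    lowest = min((lba for lba, _ in live), default=None)
    return monotonic, overlapping, lowest
```

Only neighbouring elements need comparing: when LBAs strictly ascend, an element that overlaps any later element must also overlap its immediate successor, because that successor starts no later.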
The definition of the SCSI WRITE SCATTERED command clearly states that degenerate elements are valid; they do not cause an error but cause no associated action. This utility uses the concept of a 'hard' and a 'soft' sgl: a 'soft' sgl is one in which the last element's NUM is zero (i.e. its last element is degenerate). A sgl with a non-zero NUM in its last element is considered 'hard'. In a 'soft' sgl the LBA of the last element should be greater than or equal to any LBA+NUM of earlier elements. Because this is hard to check it is not enforced, so whether a sgl is hard or soft is decided simply by checking the NUM of its last element.

The difference between a hard and a soft sgl is the way the sum of NUM of all elements is used by this utility. For a 'hard' sgl that sum is used for COUNT when the count=COUNT option is not given; and if count=COUNT is given and the counts differ then those two values are output and this utility exits with a syntax error. For a 'soft' sgl the degenerate last element is interpreted as "from the highest LBA in the list to the end of the copy", where COUNT is determined some other way. The "highest LBA" is calculated from all elements that have a non-zero number of blocks plus the LBA of the last element (regardless of whether it is degenerate or not).

The rules in the above paragraph mean that a one item skip or seek argument (e.g. skip=0x123) first becomes a one element sgl (e.g. containing the pair [0x123, 0x0]). Since this is the last element, it is a soft sgl and the transfer will start from the given LBA (i.e. 0x123) and continue for the number of blocks indicated by some other mechanism (e.g. an option such as count=COUNT or the length of IFILE). This mirrors what the classic dd command does with its skip= and seek= options.
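The hard/soft rules can be sketched in Python as below. resolve_sgl is a hypothetical helper, and the way it computes where the trailing degenerate element starts (the larger of the last LBA and the end of every non-degenerate element) is one reading of the "highest LBA" rule, not something taken from ddpt's source.

```python
from __future__ import annotations


def resolve_sgl(sgl: list[tuple[int, int]], count: int | None = None):
    """Classify a sgl as 'hard' or 'soft' and return (kind, count, tail_start).

    tail_start is only set for a soft sgl: it is where the trailing
    degenerate element begins (an assumed interpretation of "highest LBA").
    """
    if not sgl:
        raise ValueError("empty sgl")
    total = sum(num for _, num in sgl)
    last_lba, last_num = sgl[-1]
    if last_num != 0:                     # hard: the sum of NUMs is the COUNT
        if count is not None and count != total:
            raise ValueError(f"sgl sum {total} differs from count={count}")
        return "hard", total, None
    # soft: COUNT must come from elsewhere (count=COUNT, length of IFILE, ...)
    if count is None:
        raise ValueError("soft sgl: COUNT must be supplied another way")
    if count < total:
        raise ValueError("COUNT is smaller than the blocks already listed")
    ends = [lba + num for lba, num in sgl if num > 0]
    tail_start = max(ends + [last_lba])
    return "soft", count, tail_start
```

For example, resolve_sgl([(0x123, 0)], count=100) returns ("soft", 100, 0x123), which corresponds to the dd-like case of skip=0x123 with count=100.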
Some sgl implementation details: LBAs are stored in 64 bit integers, which is more than sufficient to span even the largest disk array behind a logical device, even if the block size is one byte (which is unlikely). The NUM field is a 32 bit integer and this is more problematic. The reason is that SCSI WRITE commands (and their variants) only allocate at most a 32 bit integer for this value. Further, modern operating systems do not allow any driver to get large amounts of contiguous system RAM, even if the machine has it available. A 32 bit integer for NUM with each block at 512 bytes covers around 2 TiB of storage. Linux also limits each read(2) and write(2) system call to at most 0x7ffff000 bytes (just under 2 GiB). The problem for this utility is that NUM can easily exceed 32 bits when a single scatter gather list element refers to the whole device. The action taken by this utility is to allow NUM values larger than 32 bits to be given on the command line (or in a scatter gather list file). However such a large element will be split into multiple elements internally. This will be visible to the user when the verbose=VERB option (or one of its variants) is used with an elevated value.

There is a helper utility called ddpt_sgl in this package for generating, manipulating and checking scatter gather lists. See its manpage.

SANITY CHECKS
With powerful data tools, the ability to accidentally overwrite and hence lose important data is ever present. So a significant portion of the code is dedicated to checking the input arguments for duplications and contradictions. Still, nothing is better than re-reading the command line (which can be quite long) before hitting the enter key.

Other useful possibilities are to use job files (see the JF argument and the --job=JF option) and the --dry-run option. The "dry run" option is becoming popular in modern command line utilities and more or less does what the user would expect.
Firstly it parses all the command line arguments, then opens IFILE, OFILE and OFILE2 as directed by the command line, and does any meta-data operations that it would typically do (e.g. check a pass-through or block device's logical block size and object if it differs from BS, IBS or OBS (whichever applies)). Then, just at the point where the code would commence the actual copy (or read), it exits prematurely. If the --dry-run option is given twice, the code continues into the copy logic and bypasses the low level read and write calls (and file repositioning). That inner level of "dry run" is useful for debugging and can be used with multiple verbose=VERB options.

The verbose=VERB option sends diagnostic messages to stderr. The higher the value of VERB (in verbose=VERB), or the more times that -v is used, the greater the volume of diagnostic messages. When the verbosity is three or more (e.g. -v is used three or more times), diagnostic messages are generated for each read into, and write from, the working copy buffer; so the volume of messages is proportional to the number of reads and writes that are done, and this can easily be in the megabyte range. At lower verbosity levels, the reads and writes associated with the copy do not generate diagnostic messages (unless abnormal situations are encountered). These diagnostic messages are mainly associated with command line parsing and fetching meta-data about the given files, plus messages from the cleanup at the end of the copy.

The following command line arguments are checked to ensure that they don't appear more than once: bpt=BPT[,OBPC], bs=BS, count=COUNT, ibs=IBS, if=IFILE, iseek=SKIP, obs=OBS, of=OFILE, of2=OFILE2, oseek=SEEK, seek=SEEK and skip=SKIP. On the other hand, some arguments are additive, for example iflag=FLAGS, oflag=FLAGS, status=STAT and --verbose, and may appear as many times as required.

XCOPY
This section describes XCOPY(LID1) support in this utility.
For ODX support (an XCOPY(LID4) subset) see the ODX section.

A device (logical unit (LU)) that supports XCOPY operations should set the 3PC field (3PC stands for Third Party Copy) in its standard INQUIRY response. That is not checked when this utility does an xcopy operation, but if it fails, that is one thing the user may want to check. If the xcopy starts and fails while underway, then 'sg_copy_results -s' may be useful to view the copy status. It might also be used from a different process with the same I_T nexus (i.e. the same machine) to check status during an xcopy operation.

The pad and cat flags control the handling of residual data. Since the data can be specified in terms of either the source or the target block size, and the two block sizes may differ, residual data is likely in such cases. If both block sizes are identical these bits have no effect as residual data will not occur. If neither of these flags is set, the EXTENDED COPY command will be aborted with additional sense 'UNEXPECTED INEXACT SEGMENT'. If only the cat flag is set, the residual data will be retained and made available for subsequent segment descriptors. Residual data will be discarded for the last segment descriptor. If the pad flag is set for the source descriptor only, any residual data for both source and destination will be discarded. If the pad flag is set for the target descriptor only, any residual source data will be handled as if the cat flag is set, but any residual destination data will be padded to make a whole block transfer. If the pad flag is set for both source and target, any residual source data will be discarded, and any residual destination data will be padded.

There is a web page discussing ddpt, XCOPY and ODX at https://sg.danny.cz/sg/ddpt_xcopy_odx.html

ODX
This section describes ODX support (an XCOPY(LID4) subset) in this utility.
ODX descriptions use the following command name abbreviations: PT for the POPULATE TOKEN command, RRTI for the READ ROD TOKEN INFORMATION command, and WUT for the WRITE USING TOKEN command.

A device (logical unit (LU)) that supports ODX operations is required to set the 3PC field (3PC stands for Third Party Copy) in its standard INQUIRY response and support the Third Party Copy VPD page. If this utility generates errors noting the absence of these then the device in question probably does not support ODX.

There are four variants of ODX supported by ddpt:

    full copy          : ddpt --odx if=/dev/sg3 bs=512 of=/dev/sg4
    zero output blocks : ddpt if=/dev/null rtype=zero bs=512 of=/dev/sg4
    read to tokens     : ddpt if=/dev/sg3 bs=512 skip=@gath.lst rtf=a.rt
    write from tokens  : ddpt rtf=a.rt bs=512 of=/dev/sg4 seek=@scat.lst

The full copy will call PT and WUT commands repeatedly until the copy is complete. More precisely, the full copy will make the largest single call to PT allowed by the input's Third Party Copy VPD page (and, if given, allowed by the BPT argument in the bpt=BPT[,OBPC] option). Then one or more WUT calls are made to write out from the ROD created by the PT step. The largest single WUT call is constrained by the output's Third Party Copy VPD page (and, if given, by the OBPC argument in the bpt=BPT[,OBPC] option). This sequence continues until the requested copy is complete.

The zero output blocks variant is a special case of the full copy in which only WUT calls are made. ODX defines a special ROD Token to zero blocks. That special ROD Token has a fixed pattern (shown in SBC-3) and does not need to be created by a PT command like normal ROD Tokens.

The read to tokens and the write from tokens variants are designed to be the read (input) and write (output) sides respectively of a network copy. The two sides can run on different machines, with the RTF file being sent from the machine doing the read to the machine doing the write.
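The full copy sequence described above amounts to nested chunking, sketched here in Python. plan_odx_full_copy is hypothetical; pt_max and wut_max stand in for the per-command limits that ddpt derives from the input and output Third Party Copy VPD pages, and the sketch assumes IBS equals OBS and that addresses are relative to the start of the copy. No SCSI commands are issued.

```python
from __future__ import annotations


def plan_odx_full_copy(count: int, pt_max: int, wut_max: int,
                       bpt: int | None = None, obpc: int | None = None):
    """Split a full ODX copy of `count` blocks into PT calls and WUT calls.

    pt_max / wut_max are hypothetical stand-ins (in blocks) for the limits
    read from the input and output Third Party Copy VPD pages.
    Returns a list of (pt_lba, pt_num, [(wut_lba, wut_num), ...]).
    """
    pt_step = min(pt_max, bpt) if bpt else pt_max        # BPT can lower the PT size
    wut_step = min(wut_max, obpc) if obpc else wut_max   # OBPC can lower the WUT size
    if pt_step <= 0 or wut_step <= 0:
        raise ValueError("limits must be positive")
    plan = []
    lba = 0
    while lba < count:
        pt_num = min(pt_step, count - lba)               # one POPULATE TOKEN
        wuts = []
        off = 0
        while off < pt_num:                              # drain that ROD with WUTs
            n = min(wut_step, pt_num - off)
            wuts.append((lba + off, n))
            off += n
        plan.append((lba, pt_num, wuts))
        lba += pt_num
    return plan
```

For instance, a 10 block copy with pt_max=4 and wut_max=3 yields three PT calls (4, 4 and 2 blocks), the first two each followed by WUT calls of 3 and 1 blocks.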
The read to tokens variant will make one or more PT calls and output the resulting ROD Tokens to the RTF file. RTF might be a regular file or a named pipe.

All four variants can have the immed flag set. Then the PT and/or WUT commands are issued with the IMMED bit set and the RRTI command is used to poll for completion. The delay between polls is as suggested by the RRTI command (or, if no suggestion is made, 500 milliseconds). Either iflag=immed, oflag=immed or both can be given, but they are only effective if the corresponding IFILE or OFILE sends a PT or WUT command.

Typically there is no need to give the list_id=LID option. If this option is not given then 257 is chosen. If that is busy then 258 is tried. That continues until a usable LID is found or 10 LIDs have been tried. In the latter case ddpt exits with a status of 55 (operation in progress). If the user gives the list_id=LID option and LID is busy then ddpt exits with exit status 55.

If the block sizes of the input and output are different (i.e. IBS is not equal to OBS) then one must be a multiple of the other. So an input block size of 512 bytes and an output block size of 4096 bytes (or vice versa) is acceptable.

The four ODX variants are distinguished as follows: if OFILE is a pass-through device, if=/dev/null (or equivalent) and rtype=zero, then the zero output blocks variant is selected. If both IFILE and OFILE are pass-through devices and there is some indication of an ODX request (e.g. the --odx option), then the full copy variant is selected. The read to tokens and the write from tokens variants are indicated by the absence of an of=OFILE or an if=IFILE option, respectively, plus the presence of an rtf=RTF option.

The helper utility ddptctl contains options to issue a single PT, RRTI, WUT or COPY OPERATION ABORT command. It can also issue a series of polling RRTI commands.
It can decode information in ROD Tokens (which is not as informative as it should be) and print the number of blocks and block size of a disk, plus protection information if available. See ddptctl.

There is a web page discussing ddpt, XCOPY and ODX at https://sg.danny.cz/sg/ddpt_xcopy_odx.html

SPARSE WRITES
Bypassing writes of blocks full of zeros can save a lot of IO. However with regular files, bypassed writes at the end of the copy can lead to an OFILE which is shorter than it would have been without sparse writes. This can lead to integrity checking programs like md5sum and sha1sum generating different values.

This utility has two ways of handling this file length problem: writing the last block (even if it is full of zeros) or using the ftruncate system call. A third approach is to ignore the problem (i.e. leave OFILE shorter). The ftruncate approach is used when "oflag=strunc" while the last block is written when "oflag=sparse". To ignore the file length issue use "oflag=sparse,sparse". Note that if OFILE's length is already correct or longer than required, no action is taken.

The support for sparse writing of regular files may depend on the OS, the file system and the settings of OFILE. POSIX makes few guarantees when the ftruncate system call is used to extend a file's length, as may occur when "oflag=strunc". Further, primitive file systems like VFAT may not accept sparse writes or may simulate the effect by writing blocks of zeros. The latter approach will defeat any sparse writing performance gain.

TRIM, UNMAP AND WRITE SAME
This is a new storage feature often associated with Solid State Disks (SSDs) or disk arrays with "thin provisioning". In the ATA command set (ACS-2) the relevant command is DATA SET MANAGEMENT with the TRIM bit set. In the SCSI command set (SBC-3) it is either the UNMAP or WRITE SAME command.
Note there is no TRIM command, although the term is frequently used in the technical press.

Trim is a way of telling a storage device that blocks are no longer needed. Keeping the pool of unwritten blocks large is important for the write performance of SSDs and the thrifty use of real storage in thin provisioned arrays. File systems in recent OSes may issue trims associated with file deletes. The trim option in ddpt may be useful when a partition or a whole SSD is to be "deleted". Note that ddpt bypasses file systems in that it only offers trim on pass-through (pt) devices. This utility issues SCSI commands to pt devices and for "trim" currently issues a SCSI WRITE SAME(16) command with the UNMAP bit set. If the pt device is an SSD with an ATA interface then recent versions of Linux will translate the SCSI WRITE SAME to the ATA DATA SET MANAGEMENT command with the TRIM bit set. The maximum size of each "trim" command sent is the size of the copy buffer (i.e. IBS * BPT bytes). That maximum can be reduced with the OBPC argument of the "bpt=" option.

The trim can be used in various ways. One way is a copy where the copy buffer (or some part of it) is checked for zeros, as is done by the sparse oflag. When a zero segment is found, a trim "command" is sent to the OFILE. For example:

    ddpt if=dsk.img bs=512 of=/dev/sdc oflag=pt,trim

The copy buffer is 64 KiB (since BPT and OBPC default to 128 when "bs=512") and it is checked for all zeros. If it is all zeros then a trim command is sent to the corresponding location of /dev/sdc, which is accessed via the pt interface. If it is not all zeros then a SCSI WRITE command is sent.

Another way is to trim all or part of a disk. To trim a whole disk (i.e. deleting all its data):

    ddpt if=/dev/zero bs=512 of=/dev/sdc oflag=pt,trim

A third way is to "self-trim", which is to only trim those parts of a disk that contain segments full of zeros:

    ddpt if=/dev/sdc skip=0x2300 bs=512 iflag=pt,self,trim count=0x1234f0

The "self" flag (given via iflag= above) automatically sets up the output side of the copy to send trim commands (if required) back to the same device (i.e. /dev/sdc). If this example were self-trimming a partition then the partition would start at LBA 0x2300 and be 0x1234f0 blocks long.

Some product examples: the Intel X25-M G2 SSDs support trim with recent firmware and deterministically read back zeros after a trim. The Seagate Pulsar SSD has an ATA interface which supports deterministic reads of zero after the DATA SET MANAGEMENT command with the TRIM option.

NVME SUPPORT
The following information is Linux specific at this time. NVMe devices in Linux have names like /dev/nvme0, /dev/nvme0n1 and /dev/nvme0n1p3. The first device name is a character device and some "Admin" commands can be sent to it (e.g. Identify) but no media access commands (which the NVMe specification calls the "NVM" command set). The number given is a controller identifier. Storage in NVMe is associated with namespaces, which are numbered within a controller starting at 1 (e.g. /dev/nvme0n1 is controller 0, namespace 1). These device nodes are block devices and can be given as IFILE and/or OFILE. The third type of NVMe device node selects a partition (within a namespace, within a controller). Partition numbers also start at 1.

By default ddpt will treat the second and third forms (of NVMe device nodes) as standard Linux block devices, so ddpt will act in the same way as the dd utility would. In a similar fashion to accessing SCSI block devices (e.g. /dev/sdc3), to get pass-through access to NVMe block devices the "pt" flag is required, with iflag=FLAGS and/or oflag=FLAGS. There is a SCSI to NVMe Translation Layer (SNTL) in the sg3_utils library which underpins this utility.
DD DIFFERENCES
dd defaults "if=" and "of=" to stdin and stdout respectively. This follows Unix filter conventions. However since dd and ddpt are often used to read binary data for timing purposes, having to supply "of=/dev/null" can be easily forgotten. Without it dd will typically spew binary data on the console. So ddpt has changed its defaults: "if=IFILE" is now mandatory for direct copies, and "if=-" can be used to read from stdin; "of=OFILE" remains optional but its default changes to "/dev/null" (or "NUL" in Windows). To send output to stdout ddpt accepts "of=-".

dd truncates OFILE unless "conv=notrunc" is given. When dd truncates, it truncates to zero length unless SEEK is greater than zero. ddpt does not truncate OFILE by default. If OFILE exists it will be overwritten. The overwrite starts at block zero unless SEEK or "oflag=append" is given. If OFILE is a regular file then "oflag=trunc" (or "conv=trunc") will truncate OFILE prior to the copy.

Numeric arguments to ddpt can be given in hexadecimal, either with a leading "0x" or "0X" or with a trailing "h". Note that dd accepts "0x123" but interprets it as "0 * 123" (i.e. zero). ddpt also interprets "x" as a multiplier unless the left operand is zero (e.g. "0x123"). So both dd and ddpt will interpret "skip=2x123" as "skip=246".

Terabyte size disks make it impractical to copy all the data into a single buffer before writing it out. Therefore both dd and ddpt read a relatively small amount of data into a copy (or transfer) buffer, then write it out to the destination, repeating this process until COUNT is exhausted. A major difference in ddpt is the addition of BPT (Blocks Per Transfer) to control the size of the copy buffer. With dd, IBS is the size of the copy buffer and the unit of SKIP and COUNT. With ddpt, IBS * BPT is the size of the copy buffer and IBS is the unit of SKIP and COUNT.
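The copy buffer arithmetic in this paragraph can be illustrated with a small Python sketch. The helper names copy_buffer_size and segments are hypothetical, block numbers are relative to SKIP, and end of file and error handling are ignored.

```python
def copy_buffer_size(ibs: int, bpt: int, obs: int) -> int:
    """Return IBS * BPT after checking it holds a whole number of OBS blocks."""
    size = ibs * bpt
    if size % obs:
        raise ValueError(f"(IBS * BPT) % OBS = {size % obs}, must be 0")
    return size


def segments(count: int, bpt: int):
    """Yield (first_input_block, num_input_blocks) for each copy buffer fill."""
    done = 0
    while done < count:
        n = min(bpt, count - done)   # the last segment may be short
        yield done, n
        done += n
```

With bs=512 and a BPT of 128 the buffer is 65536 bytes, which holds 16 output blocks when OBS is 4096; a COUNT of 300 is then copied in segments of 128, 128 and 44 input blocks.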
This allows ddpt to have its IBS set to the logical block size of IFILE without unduly restricting the size of the copy buffer. Setting IBS (and OBS for OFILE) accurately is required when the pass-through interface is used, since the logical block size is implicit in the SCSI READ and WRITE commands.
The way dd handles its copy buffer (outlined in the SUSv4 description of dd) is relatively complex, especially when IBS and OBS are different sizes. The restriction that ddpt places on IBS and OBS (i.e. (((IBS * BPT) % OBS) == 0)) means that a single copy buffer can be used, since its size is a multiple of both IBS and OBS. Being able to precisely define the copy buffer size in ddpt makes sparse writing, write sparing and trim operations simpler to define and easier for the user to control.
ddpt does not support dd's "cbs=" option (conversion block size). If the "cbs=" option is given to ddpt then it is ignored.
ddpt adds two types of disk to disk, offloaded copies: XCOPY(LID1), first introduced in SPC-2 (standardized in 2001), and ODX, which is a subset of XCOPY(LID4) first introduced in an SPC-4 draft (revision 34, 2012).

PROTECTION INFORMATION
This section is about protection information, which is typically an extra 8 bytes associated with each logical block. Those 8 bytes are divided into 3 fields: the logical block guard (a 16 bit (2 byte) CRC), the logical block application tag (2 bytes) and the logical block reference tag (4 bytes). The acronym DIF is sometimes used for protection information.
The feature to read and/or write protection information by using the protect=RDP[,WRP] option is currently experimental. It should be used with care and may not "play well" with some other features such as write sparing and sparse writing. It can be used to copy user data plus the associated protection information to or from a regular file. It could also be used for a device to device copy, assuming the "pt" interface is used for both. Also, only modern SCSI disks support protection information.
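The 8 byte layout described above can be illustrated with a small Python sketch that builds and splits one protection information tuple. Assumptions, not stated on this page: the fields are big-endian in the order guard, application tag, reference tag, and the guard CRC is the T10-DIF CRC-16 (polynomial 0x8BB7, initial value 0, no reflection). ddpt itself only transfers protection information; these helper names are hypothetical.

```python
import struct


def crc16_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_pi(block: bytes, app_tag: int, ref_tag: int) -> bytes:
    """Build the 8 byte tuple: 2 byte guard, 2 byte app tag, 4 byte ref tag."""
    return struct.pack(">HHI", crc16_t10dif(block), app_tag, ref_tag)


def split_pi(pi: bytes):
    """Return (guard, app_tag, ref_tag) from an 8 byte tuple."""
    return struct.unpack(">HHI", pi)


block = bytes(512)                       # one all-zero 512 byte block
pi = make_pi(block, app_tag=0, ref_tag=0x2300)
print(len(block) + len(pi), split_pi(pi))   # 520 (0, 0, 8960)
```

The 512 + 8 = 520 byte total matches the per-block transfer size mentioned for 512 byte disks with protection information; the reference tag value here is arbitrary.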
When RDP or WRP is greater than 0, a copy with associated protection information is active. In this state IBS and OBS must be the same and equal to the logical block size of the device(s) formatted with protection information. If a SCSI disk with a 512 byte logical block size has protection information, then the actual number of bytes transferred for each logical block is typically 520 bytes. For such a disk BS=512 is required even when additional protection information is being transferred.
When protection type 2 is used, the "normal" READ, WRITE and VERIFY SCSI commands are disallowed. In this context "normal" means the 6, 10, 12, and 16 byte variants. Only READ(32) and WRITE(32) can be used. The 32 byte variants can be selected in this utility by using the operand 'cdbsz=32'.

MULTIPLIERS
By default numeric arguments to options are assumed to be decimal. Almost all numeric arguments to options (e.g. COUNT in the count=COUNT option) may include one of these multiplicative suffixes:
c C      *1
w W      *2
b B      *512
k K KiB  *1,024
KB       *1,000
m M MiB  *1,048,576
MB       *1,000,000
This pattern continues for "G", "T" and "P". The latter two suffixes can only be used for 64 bit values. Some numeric arguments are limited to 32 bit values (e.g. BS in the bs=BS option). Also a suffix of the form "x<n>" multiplies the leading number by <n>; however the combinations "0x" and "0X" are treated differently, see the next paragraph. These multiplicative suffixes are compatible with GNU's dd command (since 2002), which claims compliance with the SI and with IEC 60027-2 standards.
Alternatively numerical values can be given in hexadecimal, indicated by either a leading "0x" or "0X", or by a trailing "h" or "H". When hex numbers are given, suffix multipliers cannot be used. If a numeric argument is required to fit in 32 bits and is too large then an error is reported.
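The suffix, "x<n>" and hexadecimal rules above can be summarized as a small Python parser. This is a sketch of the documented rules, not ddpt's own parser (parse_num is a hypothetical name); it assumes lowercase g, t and p are accepted like k and m, and it only checks the 32/64 bit limit by magnitude.

```python
import re

# Multiplier table built from the MULTIPLIERS rules; G, T and P follow K and M.
_SUFFIXES = {"": 1, "c": 1, "C": 1, "w": 2, "W": 2, "b": 512, "B": 512}
for _power, _letter in enumerate("KMGTP", start=1):
    _SUFFIXES[_letter] = _SUFFIXES[_letter.lower()] = 1024 ** _power
    _SUFFIXES[_letter + "iB"] = 1024 ** _power
    _SUFFIXES[_letter + "B"] = 1000 ** _power


def _parse(text):
    if text[:2] in ("0x", "0X"):      # leading 0x/0X: hexadecimal,
        return int(text[2:], 16)      # so no suffix multipliers allowed
    if text[-1:] in ("h", "H"):       # trailing h/H: hexadecimal
        return int(text[:-1], 16)
    if "x" in text:                   # "x<n>" multiplies, e.g. 2x123
        left, right = text.split("x", 1)
        return _parse(left) * _parse(right)
    m = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if m is None or m.group(2) not in _SUFFIXES:
        raise ValueError(f"cannot decode number: {text!r}")
    return int(m.group(1)) * _SUFFIXES[m.group(2)]


def parse_num(text, bits=64):
    """Decode a ddpt-style numeric argument (sketch, not ddpt's parser)."""
    if text == "-1":                  # e.g. count=-1 means "all available"
        return -1
    value = _parse(text)
    if value >= 1 << bits:
        raise ValueError(f"{text!r} does not fit in {bits} bits")
    return value


for arg in ("0x123", "123h", "2x123", "1k", "1KB", "1MB", "3b"):
    print(arg, parse_num(arg))
```

Note how "0x123" and "2x123" diverge: the leading "0x" check runs before the "x" multiplier rule, which is what keeps ddpt from reading "0x123" as zero the way dd does.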
Usually negative numbers are not permitted, but "count=-1" is a special case and means "all available"; "verbose=-1" is another special case.

NOTES
Copying data behind an Operating System's back can cause problems. In the case of Linux, users should look at this link: https://linux-mm.org/Drop_Caches
This command sequence may be useful:
sync; echo 3 > /proc/sys/vm/drop_caches
A partial write is a write to OFILE of less than OBS bytes. This typically occurs at the end of a copy. dd can do partial writes. ddpt does partial writes to regular files and fifos (including stdout). However ddpt ignores partial writes when OFILE is a block device or a pt device. When ddpt ignores a partial write, it sends a warning to the console (stderr).
At the end of the copy two lines are reported to the console:
<in_full>+<in_partial> records in
<out_full>+<out_partial> records out
The "records in" line is the number of full input blocks (each of IBS bytes) that have been read plus the number of partial blocks (usually less than IBS bytes) that have been read. Following the lead of dd, when 'iflag=coe' is active a block that cannot be read (and has zeros substituted for its output) is regarded as a partial read. The "records out" line is the number of full output blocks (each of OBS bytes) that have been written plus the number of partial blocks (usually less than OBS bytes) that have been written.
Block devices (e.g. /dev/sda and /dev/hda) can be given for IFILE. If neither 'iflag=direct' nor 'iflag=pt' is given then normal block IO involving buffering and caching is performed. If 'iflag=direct' is given then the buffering and caching is bypassed (this is applicable to both SCSI devices and ATA disks). When 'iflag=pt' is given, SCSI commands are sent to the device, which bypasses most of the actions performed by the block layer. The same applies to block devices given for OFILE.
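The "records in" and "records out" arithmetic can be sketched in Python as below. record_counts and format_records are hypothetical names used for illustration; the sketch assumes the caller passes the byte count of each completed transfer, and models 'iflag=coe' substitutions as a separate count of unreadable blocks that are each tallied as a partial record (the caller excludes those bytes from transfer_sizes).

```python
def record_counts(transfer_sizes, block_size, unreadable_blocks=0):
    """Return (full, partial) as used in "<full>+<partial> records in".

    Each transfer contributes its whole blocks to `full`; a leftover of
    fewer than block_size bytes counts as one partial record. Blocks that
    could not be read under iflag=coe are counted as partial records.
    """
    full, partial = 0, unreadable_blocks
    for nbytes in transfer_sizes:
        full += nbytes // block_size
        if nbytes % block_size:
            partial += 1
    return full, partial


def format_records(full, partial, direction):
    return f"{full}+{partial} records {direction}"


# Two full 64 KiB copy buffers, then a 1000 byte short read at the end.
full, partial = record_counts([65536, 65536, 1000], 512)
print(format_records(full, partial, "in"))   # 257+1 records in
```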
All informative, warning and error reports are sent to stderr so that ddpt's output file can be stdout and remain unpolluted. If no options are given, then no copying (nor reading) takes place and a brief message is sent to stderr inviting the user to invoke ddpt again with the '--help' option to get the usage message.
Disk partition information can often be found with fdisk(8) [the "-ul" argument is useful in this respect]. Also parted(8) can be used like this: 'parted /dev/sda unit s print'.
For pt devices this utility issues SCSI READ and WRITE (SBC) commands, which are appropriate for disks and for reading from CD/DVD/BD drives. Those commands are not formatted correctly for tape drives, so ddpt cannot be used on tape drives via a pt device. If the largest block address of the requested transfer exceeds a 32 bit block number (i.e. 0xffffffff) then a warning is issued and the pt device is accessed via SCSI READ(16) and WRITE(16) commands.
The attributes of a block device (e.g. partitions) are ignored when the pt flag is used. Hence the whole device is read (rather than just the second partition) by this invocation:
ddpt if=/dev/sdb2 iflag=pt of=t bs=512
Assuming /dev/sdb and /dev/sg2 refer to the same device, then after the following two invocations the contents of the files "t", "tt" and "ttt" should be the same:
ddpt if=/dev/sdb of=tt bs=512
ddpt if=/dev/sg2 of=ttt bs=512
The SCSI READ(32) and WRITE(32) commands are restricted to media that is formatted with protection type 2. This is a T10 restriction.

SIGNALS
The signal handling has been borrowed from GNU's dd: SIGINT, SIGQUIT and SIGPIPE report the number of remaining blocks to be transferred and the records in + out counts; then they have their default action. SIGUSR1 (or SIGINFO) causes the same information to be output and the copy continues.
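The signal behaviour in this section can be sketched with Python's signal module (Unix only). It is an illustration under assumptions, not ddpt's C implementation: the counters and report wording are hypothetical, SIGUSR1 reports and lets the copy continue, SIGINT, SIGQUIT and SIGPIPE report and then take their default action, dispositions already set to "ignored" are left alone, and masked_io shows the "intio=0" idea described below of holding these signals off while an IO is in progress.

```python
import os
import signal
import sys

# Hypothetical copy counters; the report wording is illustrative only.
stats = {"remaining": 0, "in_full": 0, "in_partial": 0,
         "out_full": 0, "out_partial": 0}
HANDLED = (signal.SIGINT, signal.SIGQUIT, signal.SIGPIPE, signal.SIGUSR1)


def progress_report(s):
    return (f"{s['remaining']} blocks remaining\n"
            f"{s['in_full']}+{s['in_partial']} records in\n"
            f"{s['out_full']}+{s['out_partial']} records out\n")


def on_info(signum, frame):
    sys.stderr.write(progress_report(stats))   # report, copy continues


def on_fatal(signum, frame):
    sys.stderr.write(progress_report(stats))   # report, then default action
    signal.signal(signum, signal.SIG_DFL)
    os.kill(os.getpid(), signum)


def install_handlers():
    for signum in HANDLED:
        # Leave alone any signal the invoking program set to "ignored".
        if signal.getsignal(signum) == signal.SIG_IGN:
            continue
        signal.signal(signum, on_info if signum == signal.SIGUSR1 else on_fatal)


def masked_io(io_func, *args):
    """Run one IO operation with HANDLED signals blocked (cf. intio=0)."""
    old = signal.pthread_sigmask(signal.SIG_BLOCK, HANDLED)
    try:
        return io_func(*args)
    finally:
        # Restoring the mask lets a signal that arrived during the IO be
        # handled now, before the next IO operation is started.
        signal.pthread_sigmask(signal.SIG_SETMASK, old)
```

One Python-specific wrinkle: CPython sets SIGPIPE to ignored at startup, so under the "respect SIG_IGN" rule this sketch leaves SIGPIPE alone when run as a plain script.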
All output caused by signals is sent to stderr.
Like GNU's dd, ddpt respects the signal disposition of "ignored" (SIG_IGN) set by the shell, script or other program that invokes ddpt, so it will ignore such signals. Further, dd ignores SIGUSR1 if the environment variable POSIXLY_CORRECT is set, because POSIX specifies that dd only acts on SIGINFO (and Linux has no such signal); ddpt ignores the POSIXLY_CORRECT environment variable. As recommended by SUSv3, ddpt does not expect the signal (blocking) mask to be blocking SIGUSR1 (SIGINFO), SIGINT or SIGPIPE on entry.
Unix system calls that do IO can be interrupted by signal processing, typically returning an EINTR error number. The dd utility (like many other Unix utilities) restarts the IO operation that was interrupted. While this works most of the time for disk IO, it is problematic for tape drives because the implicit position pointer on the tape may have moved. So the default in this utility (i.e. "intio=0") is to mask those signals during IO operations and only check them prior to starting an IO operation. Most low level IO (e.g. using a SCSI command to write to a disk) will time out if there is a low level error. However NFS (the Network File System) may wait for a long time (e.g. expecting that a network problem will soon be fixed), and in this case using "intio=1" may be best.

VERIFY
The usual way to check that two disks (or parts of two disks) are the same is to move through the segments to be compared, reading from both and comparing the returned buffers, stopping if there is an inequality.
This utility takes a different approach that relies on OFILE being a pass-through device. That pass-through device needs to support the SCSI VERIFY command with the BYTCHK field set to 1. For the optional --prefetch option to improve performance, that pass-through device also needs to support the SCSI PRE-FETCH command with its IMMED bit set.
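The commands involved in a verify can be sketched in Python by building the 10 byte CDBs and listing the order in which they are issued for one segment. Assumptions: the SBC 10 byte layouts (PRE-FETCH(10) opcode 0x34 with IMMED in bit 1 of byte 1; VERIFY(10) opcode 0x2F with BYTCHK in bits 2..1 of byte 1). Which CDB lengths ddpt actually uses is not stated here, and the helper names are hypothetical.

```python
import struct


def _cdb10(opcode, byte1, lba, num_blocks):
    """10 byte CDB: opcode, flags, 4 byte LBA, group, 2 byte length, control."""
    if not (0 <= lba <= 0xFFFFFFFF and 0 <= num_blocks <= 0xFFFF):
        raise ValueError("does not fit a 10 byte CDB")
    return struct.pack(">BBIBHB", opcode, byte1, lba, 0, num_blocks, 0)


def prefetch10_cdb(lba, num_blocks, immed=True):
    # PRE-FETCH(10) is opcode 0x34; IMMED is bit 1 of byte 1
    return _cdb10(0x34, 0x02 if immed else 0x00, lba, num_blocks)


def verify10_cdb(lba, num_blocks, bytchk=1):
    # VERIFY(10) is opcode 0x2F; BYTCHK occupies bits 2..1 of byte 1
    return _cdb10(0x2F, (bytchk & 0x3) << 1, lba, num_blocks)


def verify_plan(in_lba, out_lba, num_blocks, prefetch=False):
    """Commands for one segment under --verify (and optionally --prefetch)."""
    steps = []
    if prefetch:
        steps.append(("OFILE", prefetch10_cdb(out_lba, num_blocks)))
    steps.append(("IFILE", f"read {num_blocks} blocks from LBA {in_lba}"))
    # The data just read becomes the data-out buffer of this VERIFY
    steps.append(("OFILE", verify10_cdb(out_lba, num_blocks)))
    return steps


for target, cmd in verify_plan(0, 0x2300, 128, prefetch=True):
    print(target, cmd.hex() if isinstance(cmd, bytes) else cmd)
```

Only IFILE data crosses into host memory; the comparison itself happens inside the OFILE device when it executes the VERIFY.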
When the --verify option is given, instead of reading both IFILE and OFILE, only IFILE is read. The result of that read is then sent to the OFILE device as the data-out buffer of a VERIFY(BYTCHK=1) command. So the comparison is actually done on the OFILE device rather than in the host computer's main memory. If the --prefetch option is also given, then a PRE-FETCH(OFILE, IMMED) is sent before the IFILE read. The IMMED bit makes it return more or less immediately. The effect of the PRE-FETCH should be to bring the data to be used for the OFILE side of the comparison into the OFILE device's cache, which should make the later VERIFY(BYTCHK=1) command faster.

TAPE
There is support for copies to and from tape drives in Linux. Only the st driver device names can be used (e.g. /dev/st0 and /dev/nst2); use of Linux pass-through device names (e.g. /dev/sg2) for tape drives is not supported. On Debian-based distributions, it is suggested that the mt-st package be installed, as it provides a more fully-featured version of the "mt" tape control program.
Tape drives can operate in fixed- or variable-length block modes. In variable-block mode, each write to the tape writes a single block of that size. In fixed-block mode, each write to the tape must be a multiple of the previously-selected block size. The block size/mode can be set with the mt command prior to
invoking ddpt. For example:
# mt -f /dev/nst0 setblk 0
Note that some tape drives support only fixed-block mode, and possibly only one block size. (For example, QIC-150 tapes use a fixed block size of 512 bytes.) There may also be restrictions on the block size, e.g. it may have to be even.
When using ddpt to write to tape, if the final read from the input is less than OBS, it is padded to OBS bytes before writing to tape to ensure that all blocks of the tape file are the same length. Having a shorter final block would fail if the drive is in fixed-block mode, and could create interchange problems; it is common to expect all blocks in a file on tape to be the same length. To tell ddpt not to pad the final block, use 'oflag=nopad'.
The st tape driver normally writes a filemark when the file (e.g. /dev/nst0) is closed. To not have the filemark written, use 'oflag=nofm'. One use case for that might be using ddpt several times in succession to append more data to the same file on tape. In that case it is probably desirable to write the filemark at the end of the sequence, so either omit 'oflag=nofm' on the last ddpt invocation, or manually write a filemark using mt after ddpt exits:
# mt -f /dev/nst0 weof 1
For reading from an unknown tape where the block size(s) is not known, read in variable-block mode specifying a large IBS. The st driver returns a smaller amount of data if the block being read is smaller than IBS. Thus a command like the following can be used (here IBS is 262144 bytes):
# ddpt if=/dev/nst0 of=output.bin bs=262144
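The padding rule for tape writes described earlier in this section can be sketched in Python as follows. tape_blocks is a hypothetical helper; the page does not say what bytes are used for padding, so padding with zeros is an assumption of this sketch.

```python
def tape_blocks(data: bytes, obs: int, nopad: bool = False):
    """Split output into OBS sized tape blocks.

    Unless nopad is set (like 'oflag=nopad'), a short final block is padded
    to OBS bytes so that every block in the tape file has the same length.
    Zero padding is an assumption of this sketch.
    """
    blocks = [data[i:i + obs] for i in range(0, len(data), obs)]
    if blocks and len(blocks[-1]) < obs and not nopad:
        blocks[-1] = blocks[-1] + bytes(obs - len(blocks[-1]))
    return blocks


print([len(b) for b in tape_blocks(b"x" * 1000, 512)])               # [512, 512]
print([len(b) for b in tape_blocks(b"x" * 1000, 512, nopad=True)])   # [512, 488]
```

In fixed-block mode only the padded form is writable; the nopad form leaves a 488 byte final block that a fixed-block drive would reject.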
ENVIRONMENT VARIABLES
If the command line invocation of an xcopy does not explicitly (and unambiguously) indicate whether the XCOPY SCSI command should be sent to IFILE (i.e. the source) or OFILE (i.e. the destination), then a check is made for the presence of the XCOPY_TO_SRC and XCOPY_TO_DST environment variables. If either one exists (but not both) then it indicates where the SCSI XCOPY command will be sent. By default the XCOPY command is sent to OFILE.
The ODX write from tokens variant is very complex to implement if the amount of data held in each ROD is not known. That value should be found in the "number of bytes represented" field in the ROD Token, but that field is not yet well supported by vendors. So for such cases, that number can be appended as a big endian 8 byte integer following each ROD Token in the RTF file. The conv=rtf_len option will cause that length to be appended. Specifying that option on each read to tokens and write from tokens invocation can be a nuisance, so setting the environment variable ODX_RTF_LEN will cause this utility to act as if the conv=rtf_len option had been given.
Sometimes the default block size of 512 can be a nuisance. It can be overridden by the value associated with the DDPT_DEF_BS environment variable. If the environment variable is not found, or its value cannot be decoded or is zero or less, then the default block size remains at 512 bytes.

EXIT STATUS
To aid scripts that call ddpt, the exit status is set to indicate success (0) or failure (1 or more). Note that some of the lower values correspond to the SCSI sense key values. The exit status values are:
EXAMPLES
The examples in this page use Linux device names. For suitable device names in other supported Operating Systems see this web page: https://sg.danny.cz/sg/device_name.html . The sg3_utils(8) man page in the sg3_utils package also covers device naming.
ddpt usage looks quite similar to dd:
ddpt if=/dev/sg0 of=t bs=512 count=1MB
This will copy 1 million 512 byte blocks from the device associated with /dev/sg0 (which should have 512 byte blocks) to a file called t. Assuming /dev/sda and /dev/sg0 are the same device, the above is equivalent to:
dd if=/dev/sda iflag=direct of=t bs=512 count=1000000
although dd's speed may improve if bs were larger and count suitably reduced. The use of the 'iflag=direct' option bypasses the buffering and caching that is usually done on a block device.
The dd command's bs argument can be thought of as roughly equivalent to ddpt's bs*bpt. dd largely assumes buffering on a block device and will work as long as bs is a multiple of the actual logical block size. Since ddpt can work at a lower level, in some cases its bs argument must be the disk's actual logical block size. Thus the bpt argument was introduced to make the copy more efficient. So these two invocations are roughly equivalent:
dd if=/dev/sda of=t bs=8k count=64
ddpt if=/dev/sda of=t bs=512 bpt=16 count=1k
In both cases the total number of bytes moved is bs*count (524288 bytes). That is done by reading 8k (8192 bytes) into a buffer, then writing that buffer out to the file t. The read-write sequence continues until the count is complete or an error occurs.
The 'of2=' option can save time when the input would otherwise need to be read twice. For example, to copy data and take an md5sum of it without needing to re-read the data:
mkfifo fif
md5sum fif &
ddpt if=/dev/sg3 iflag=coe of=sg3.img oflag=sparse of2=fif bs=512
This will image /dev/sg3 (e.g. an unmounted disk) and place the contents in the (sparse) file sg3.img.
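The idea behind that invocation (one read of the input feeding both a sparse output file and a checksum) can be sketched in Python. sparse_copy is a hypothetical helper, not ddpt code: segment_size stands in for the copy buffer size (IBS*BPT), all-zero segments are skipped with a seek rather than written, and a hashlib object plays the part of of2= feeding md5sum. Writing a final byte when the data ends in skipped zeros is a choice made by this sketch, not documented ddpt behaviour.

```python
import hashlib
import io


def sparse_copy(src, dst, segment_size, digest=None):
    """Copy src to dst segment by segment, skipping writes of all-zero segments.

    If digest is given (e.g. a hashlib object) every segment read is also
    fed to it, so the checksum needs no second pass over the input.
    Returns (segments written, segments bypassed). Sketch only.
    """
    written = bypassed = 0
    zeros = bytes(segment_size)
    while True:
        seg = src.read(segment_size)
        if not seg:
            break
        if digest is not None:
            digest.update(seg)
        if seg == zeros[:len(seg)]:
            dst.seek(len(seg), io.SEEK_CUR)   # leave a hole instead of writing
            bypassed += 1
        else:
            dst.write(seg)
            written += 1
    # If the copy ended on skipped zeros, write one byte so the output
    # reaches its full length (a choice made by this sketch).
    end = dst.tell()
    if dst.seek(0, io.SEEK_END) < end:
        dst.seek(end - 1)
        dst.write(b"\0")
    return written, bypassed


if __name__ == "__main__":
    data = b"A" * 512 + bytes(1024) + b"B" * 512 + bytes(512)
    out, md5 = io.BytesIO(), hashlib.md5()
    print(sparse_copy(io.BytesIO(data), out, 512, md5))   # (2, 3)
    print(out.getvalue() == data, md5.hexdigest() == hashlib.md5(data).hexdigest())
```

Skipping writes only yields a faithful image when the output starts out empty (or truncated); otherwise old data would remain in the skipped regions.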
Without re-reading the data, that ddpt invocation also lets md5sum calculate a checksum of the image (via the fifo).
Next we use sparse writing logic to get some idea of how many blocks on a disk are full of zeros. After a SCSI FORMAT UNIT command or an ATA SECURITY ERASE command a disk may be all zeros.
ddpt if=/dev/sdc bs=512 oflag=sparse
Since no "of=" option is given, output goes to /dev/null, so nothing is actually written and the "records out" count will be zero. However there will be a count of "records in" and "bypassed records out". If /dev/sdc is full of zeros then "records in" and "bypassed records out" will be the same. Since the "bpt=" option is not given it defaults to "bpt=128,128", so the copy buffer will be 64 KiB (128 * 512 bytes) and the sparse check for zeros will be done with 64 KiB (128 block) granularity.
For examples of the trim and self,trim options see the section above on TRIM, UNMAP AND WRITE SAME.
Following is an example run on a Windows OS using the '--wscan' option, which shows the available device names (e.g. PD1) and the associated volume name(s):
ddpt -w
So, for example, volumes D: and F: reside on PhysicalDisk1 (abbreviated to "PD1"), which is manufactured by WD (Western Digital).
Further examples can be found on this web page: https://sg.danny.cz/sg/ddpt.html . There is a text file containing examples called ddpt_examples.txt in the "doc" directory of this package's distribution tarball; it also contains some examples of using job files.

AUTHORS
Written by Doug Gilbert

REPORTING BUGS
Report bugs to <dgilbert at interlog dot com>.

COPYRIGHT
Copyright © 2008-2021 Douglas Gilbert
This software is distributed under the GPL version 2. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO
This utility has companion/helper utilities ddptctl(8) and ddpt_sgl(8).
There is a web page discussing ddpt at https://sg.danny.cz/sg/ddpt.html
The lmbench package contains lmdd which is also interesting. For moving data to and from tapes see dt, which is found at http://www.scsifaq.org/RMiller_Tools/index.html
To change mode parameters that affect a SCSI device's caching and error recovery see sdparm(sdparm)
To verify the data on the media is readable see: sg_verify(sg3_utils)
To scan and repair disk partitions see TestDisk (testdisk).
Additional references: dd(1), open(2), flock(2), sg_xcopy, sg_copy_results, sg_dd(sg3_utils)