.. SPDX-License-Identifier: GPL-2.0

This file contains brief information about the SCSI tape driver.
The driver is currently maintained by Kai Mäkisara (email
Kai.Makisara@kolumbus.fi)

Last modified: Tue Feb 9 21:54:16 2016 by kai.makisara

The driver is generic, i.e., it does not contain any code tailored
to any specific tape drive. The tape parameters can be specified with
one of the following three methods:

1. Each user can specify the tape parameters he/she wants to use
directly with ioctls. This is administratively a very simple and
flexible method and applicable to single-user workstations. However,
in a multiuser environment the next user finds the tape parameters in
the state the previous user left them.

2. The system manager (root) can define default values for some tape
parameters, like block size and density, using the MTSETDRVBUFFER ioctl.
These parameters can be programmed to come into effect either when a
new tape is loaded into the drive or if writing begins at the
beginning of the tape. The second alternative (defaults applied only
when writing starts at the beginning of the tape) is applicable if the
tape drive performs auto-detection of the tape format well (like some
QIC drives). The result is that any tape can be read, writing can be
continued using the existing format, and the default format is used if
the tape is rewritten from the beginning (or a new tape is written
for the first time). The first alternative (defaults applied whenever
a new tape is loaded) is applicable if the drive does not perform
auto-detection well enough and there is a single "sensible" mode for
the device. An example is a DAT drive that is used only in variable
block mode (I don't know if this is sensible or not).

The user can override the parameters defined by the system
manager. The changes persist until the defaults again come into
effect.

3. By default, up to four modes can be defined and selected using the minor
number (bits 5 and 6). The number of modes can be changed by changing
ST_NBR_MODE_BITS in st.h. Mode 0 corresponds to the defaults discussed
above. Additional modes are dormant until they are defined by the
system manager (root). When specification of a new mode is started,
the configuration of mode 0 is used to provide a starting point for
definition of the new mode.

Using the modes allows the system manager to give the users choices
over some of the buffering parameters not directly accessible to the
users (buffered and asynchronous writes). The modes also allow choices
between formats in multi-tape operations (the explicitly overridden
parameters are reset when a new tape is loaded).

If more than one mode is used, all modes should contain definitions
for the same set of parameters.

Many Unices contain internal tables that associate different modes with
supported devices. The Linux SCSI tape driver does not contain such
tables (and will not do so in the future). Instead, a utility
program can be made that fetches the inquiry data sent by the device,
scans its database, and sets up the modes using the ioctls. Another
alternative is to make a small script that uses mt to set the defaults
tailored to the system.

The driver supports fixed and variable block size (within buffer
limits). Both the auto-rewind (minor equals device number) and
non-rewind devices (minor is 128 + device number) are implemented.

In variable block mode, the byte count in write() determines the size
of the physical block on tape. When reading, the drive reads the next
tape block and returns the data to the user if the read() byte count
is at least the block size. Otherwise, error ENOMEM is returned.

In fixed block mode, the data transfer between the drive and the
driver is in multiples of the block size. The write() byte count must
be a multiple of the block size. This is not required when reading but
may be advisable for portability.

Support is provided for changing the tape partition and partitioning
of the tape with one or two partitions. By default, support for
partitioned tape is disabled for each drive and it can be enabled
with the ioctl MTSETDRVBUFFER.

By default, the driver writes one filemark when the device is closed after
writing and the last operation has been a write. Two filemarks can
optionally be written. In both cases end of data is signified by
returning zero bytes for two consecutive reads.

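Because end of data shows up only as a pattern across successive read() calls, a program that walks a tape file by file has to track it itself. The following is a minimal sketch, not code from the driver: `struct eod_tracker` and `eod_update()` are hypothetical names, and the caller is assumed to pass in the return value of every read() on an open tape device (in variable block mode with a byte count at least as large as the block size).

```c
#include <sys/types.h>

/* Tracks consecutive zero-byte read() results on a tape device. */
struct eod_tracker {
    int zero_reads;   /* consecutive read() calls that returned 0 */
    long files;       /* files in which at least one data block was read */
    int in_file;      /* nonzero while reading data of the current file */
};

/*
 * Feed the return value of each read() into this function.
 * Returns 1 when end of data has been reached (two zero-byte reads in a
 * row), 0 otherwise. Negative values (errors) are left to the caller.
 */
int eod_update(struct eod_tracker *t, ssize_t n)
{
    if (n > 0) {
        if (!t->in_file) {
            t->files++;          /* first data after BOT or a filemark */
            t->in_file = 1;
        }
        t->zero_reads = 0;
        return 0;
    }
    if (n == 0) {
        t->in_file = 0;          /* a filemark ended the current file */
        return ++t->zero_reads >= 2;
    }
    return 0;
}
```

A caller would typically loop with `while ((n = read(fd, buf, len)) >= 0 && !eod_update(&t, n))`, handling a negative return separately.
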
Writing filemarks without the immediate bit set in the SCSI command block acts
as a synchronization point, i.e., all remaining data from the drive buffers is
written to tape before the command returns. This makes sure that write errors
are caught at that point, but this takes time. In some applications, several
consecutive files must be written fast. The MTWEOFI operation can be used to
write the filemarks without flushing the drive buffer. A filemark written at
close() always flushes the drive buffers. However, if the previous
operation is MTWEOFI, close() does not write a filemark. This can be used if
the program wants to close/open the tape device between files and wants to
skip waiting for the flush.

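A minimal sketch of issuing MTWEOFI through the MTIOCTOP ioctl, assuming the constants and `struct mtop` from the kernel's <linux/mtio.h>; the function name is hypothetical.

```c
#include <sys/ioctl.h>
#include <linux/mtio.h>

/*
 * Write `count` filemarks without waiting for the drive buffer to be
 * flushed to tape. Returns 0 on success, -1 with errno set on failure.
 * Write errors may only be reported by a later operation.
 */
int write_filemarks_immediate(int fd, int count)
{
    struct mtop op;

    op.mt_op = MTWEOFI;
    op.mt_count = count;
    return ioctl(fd, MTIOCTOP, &op);
}
```

Because the drive buffer is not flushed, a program that must know its data reached the tape should finish with a synchronizing operation, such as writing the last filemark with MTWEOF.
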
If rewind, offline, bsf, or seek is done and the previous tape operation was
a write, a filemark is written before the tape is moved.

The compile options are defined in the file linux/drivers/scsi/st_options.h.

4. If the open option O_NONBLOCK is used, open succeeds even if the
drive is not ready. If O_NONBLOCK is not used, the driver waits for
the drive to become ready. If this does not happen in ST_BLOCK_SECONDS
seconds, open fails with the errno value EIO. With O_NONBLOCK the
device can be opened for writing even if there is a write protected
tape in the drive (commands trying to write something return an error if
attempted).

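One way to combine O_NONBLOCK with a readiness check is to open the device and then query the status with MTIOCGET. This is a sketch under assumptions: `open_tape_if_ready()` is a hypothetical helper, GMT_ONLINE() is the status macro from <linux/mtio.h>, and treating "online" as "ready" is a simplification.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/mtio.h>

/*
 * Open a tape device without waiting for the drive to become ready, then
 * ask for the status. Returns the descriptor if a tape is online;
 * otherwise closes it and returns -1.
 */
int open_tape_if_ready(const char *path, int flags)
{
    struct mtget st;
    int fd = open(path, flags | O_NONBLOCK);

    if (fd < 0)
        return -1;
    if (ioctl(fd, MTIOCGET, &st) < 0 || !GMT_ONLINE(st.mt_gstat)) {
        close(fd);
        return -1;
    }
    return fd;
}
```
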
The tape driver currently supports up to 2^17 drives if 4 modes for
each drive are used.

The minor numbers consist of the following bit fields::

    dev_upper  non-rew  mode  dev-lower
     19 - 8       7     6 5     4 - 0

The non-rewind bit is always bit 7 (the uppermost bit in the lowermost
byte). The bits defining the mode are below the non-rewind bit. The
remaining bits define the tape device number. This numbering is
backward compatible with the numbering used when the minor number was
only 8 bits wide.

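The bit layout can be expressed directly in C. The sketch assumes the default of four modes (ST_NBR_MODE_BITS equal to 2) and a 20-bit minor number; `st_minor()` is a hypothetical helper, not a driver function.

```c
/*
 * Compose a SCSI tape minor number, assuming four modes:
 *   bits 0-4 : low five bits of the tape number (dev-lower)
 *   bits 5-6 : mode (0-3)
 *   bit  7   : non-rewind flag
 *   bits 8-19: remaining bits of the tape number (dev_upper)
 */
unsigned int st_minor(unsigned int tape, unsigned int mode, int nonrewind)
{
    return ((tape >> 5) << 8)
         | ((nonrewind ? 1u : 0u) << 7)
         | ((mode & 0x3u) << 5)
         | (tape & 0x1fu);
}
```

For example, `st_minor(0, 0, 1)` is 128, the non-rewind device of the first tape, which agrees with the "128 + device number" rule for small device numbers.
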
The driver creates the directory /sys/class/scsi_tape and populates it with
directories corresponding to the existing tape devices. There are auto-rewind
and non-rewind entries for each mode. The names are stxy and nstxy, where x
is the tape number and y a character corresponding to the mode (none, l, m,
a). For example, the directories for the first tape device are (assuming four
modes): st0 nst0 st0l nst0l st0m nst0m st0a nst0a.

Each directory contains the entries: default_blksize default_compression
default_density defined dev device driver. The file 'defined' contains 1
if the mode is defined and 0 if it is not. The files 'default_*' contain
the defaults set by the user. The value -1 means the default is not set. The
file 'dev' contains the device numbers corresponding to this device. The links
'device' and 'driver' point to the SCSI device and driver entries.

Each directory also contains the entry 'options' which shows the currently
enabled driver and mode options. The value in the file is a bit mask where the
bit definitions are the same as those used with MTSETDRVBUFFER in setting the
options.

A link named 'tape' is made from the SCSI device directory to the class
directory corresponding to the mode 0 auto-rewind device (e.g., st0).


Sysfs and Statistics for Tape Devices
=====================================

The st driver maintains statistics for tape drives inside the sysfs filesystem.
The following method can be used to locate the statistics that are
available (assuming that sysfs is mounted at /sys):

1. Use opendir(3) on the directory /sys/class/scsi_tape
2. Use readdir(3) to read the directory contents
3. Use regcomp(3)/regexec(3) to match directory entries to the extended
   regular expression "^st[0-9]+$"
4. Access the statistics from the /sys/class/scsi_tape/<match>/stats
   directory (where <match> is a directory entry from /sys/class/scsi_tape
   that matched the extended regular expression)

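The four steps above translate almost directly into POSIX calls. This is a sketch: `is_stats_entry()` and `list_tape_stats_dirs()` are hypothetical names, and the class directory is passed as a parameter instead of being hard-coded to /sys/class/scsi_tape.

```c
#include <dirent.h>
#include <regex.h>
#include <stdio.h>

/* Returns 1 if name matches "^st[0-9]+$", 0 if not, -1 on regex error. */
int is_stats_entry(const char *name)
{
    regex_t re;
    int rc;

    if (regcomp(&re, "^st[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, name, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/*
 * Print the stats directory of every matching entry under classdir
 * (normally /sys/class/scsi_tape). Returns the number of matches or -1.
 */
int list_tape_stats_dirs(const char *classdir)
{
    DIR *dir = opendir(classdir);
    struct dirent *de;
    int count = 0;

    if (!dir)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        if (is_stats_entry(de->d_name) == 1) {
            printf("%s/%s/stats\n", classdir, de->d_name);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

Compiling the expression for every entry keeps the sketch short; a real tool would compile it once and reuse the `regex_t`.
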
The reason for using this approach is that all the character devices
pointing to the same tape drive use the same statistics. That means
that st0 would have the same statistics as nst0.

The directory contains the following statistics files:

- The number of I/Os currently outstanding to this device.
- The amount of time spent waiting (in nanoseconds) for all I/O
  to complete (including read and write). This includes tape movement
  commands such as seeking between file or set marks and implicit tape
  movement such as the rewind performed when an auto-rewind device is
  closed.
- The number of I/Os issued to the tape drive other than read or
  write commands. The time taken to complete these commands can be
  calculated as io_ns - read_ns - write_ns.
- The number of bytes read from the tape drive.
- The number of read requests issued to the tape drive.
- The amount of time (in nanoseconds) spent waiting for read
  requests to complete.
- The number of bytes written to the tape drive.
- The number of write requests issued to the tape drive.
- The amount of time (in nanoseconds) spent waiting for write
  requests to complete.
- The number of times during a read or write the residual amount was
  found to be non-zero. For a read, this should mean that a program is
  issuing a read larger than the block size on tape. For a write, it
  means that not all data made it to tape.

The in_flight value is incremented when an I/O starts; the I/O
itself is not added to the statistics until it completes.

The total of read_cnt, write_cnt, and other_cnt may not add up to the same
value as iodone_cnt at the device level. The tape statistics only count
I/O issued via the st module.

When read, the statistics may not be temporally consistent while I/O is in
progress. The individual values are read and written atomically; however,
when they are read back via sysfs, they may be in the process of being
updated because an I/O is starting or completing.

The value shown in in_flight is incremented before any statistics are
updated and decremented when an I/O completes after updating statistics.
The value of in_flight is 0 when there are no I/Os outstanding that are
issued by the st driver. Tape statistics do not take into account any
I/O performed via the sg device.

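Putting the counters and the consistency caveats together, a reader can compute the time spent on non-read/write commands and discard samples taken while I/O was in flight. A minimal sketch, assuming the time counters are exposed as files named io_ns, read_ns and write_ns in the stats directory (check the names present on your system); the helper names are hypothetical, and checking in_flight before and after the reads reduces, but does not eliminate, the chance of reading values that are mid-update.

```c
#include <stdio.h>

/* Read one decimal integer from <dir>/<name>. Returns 0 on success, -1 on error. */
static int read_stat(const char *dir, const char *name, long long *out)
{
    char path[4096];
    FILE *f;
    int ok;

    snprintf(path, sizeof path, "%s/%s", dir, name);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%lld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Compute io_ns - read_ns - write_ns for one stats directory.
 * Returns 0 and stores the result, 1 if I/O was in flight (retry later),
 * or -1 if a file could not be read.
 */
int other_io_ns(const char *statsdir, long long *out)
{
    long long before, after, io, rd, wr;

    if (read_stat(statsdir, "in_flight", &before) != 0 ||
        read_stat(statsdir, "io_ns", &io) != 0 ||
        read_stat(statsdir, "read_ns", &rd) != 0 ||
        read_stat(statsdir, "write_ns", &wr) != 0 ||
        read_stat(statsdir, "in_flight", &after) != 0)
        return -1;
    if (before != 0 || after != 0)
        return 1;   /* values may be mid-update */
    *out = io - rd - wr;
    return 0;
}
```
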
BSD and Sys V Semantics
=======================

The user can choose between these two behaviours of the tape driver by
defining the value of the symbol ST_SYSV. The semantics differ when a
file being read is closed. The BSD semantics leave the tape where it
currently is, whereas the SYS V semantics move the tape past the next
filemark unless the filemark has just been crossed.

The default is BSD semantics.

The driver tries to do transfers directly to/from user space. If this
is not possible, a driver buffer allocated at run-time is used. If
direct i/o is not possible for the whole transfer, the driver buffer
is used (i.e., bounce buffers for individual pages are not
used). Direct i/o can be impossible for several reasons, e.g.:

- one or more pages are at addresses not reachable by the HBA
- the number of pages in the transfer exceeds the number of
  scatter/gather segments permitted by the HBA
- one or more pages can't be locked into memory (should not happen in
  any reasonable situation)

The size of the driver buffers is always at least one tape block. In fixed
block mode, the minimum buffer size is defined (in 1024 byte units) by
ST_FIXED_BUFFER_BLOCKS. With a small block size this allows buffering of
several blocks and using one SCSI read or write to transfer all of the
blocks. Buffering of data across write calls in fixed block mode is
allowed if ST_BUFFER_WRITES is non-zero and direct i/o is not used.
Buffer allocation uses chunks of memory having sizes 2^n * (page
size). Because of this the actual buffer size may be larger than the
minimum allowable buffer size.

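The rounding effect can be illustrated with a small calculation. The sketch assumes the whole buffer is allocated as a single chunk; with scatter/gather several chunks may be combined, so the real total can differ. `chunk_size_for()` is a hypothetical name.

```c
#include <stddef.h>

/*
 * Smallest size of the form 2^n * page_size that is >= min_bytes.
 * Assumes the whole buffer is allocated as one chunk.
 */
size_t chunk_size_for(size_t min_bytes, size_t page_size)
{
    size_t size = page_size;

    while (size < min_bytes)
        size <<= 1;
    return size;
}
```

For example, with 4096-byte pages a 40 kB minimum is rounded up to a 64 kB chunk.
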
NOTE that if direct i/o is used, small writes are not buffered. This may
cause a surprise when moving from 2.4. There, small writes (e.g., tar without
the -b option) may have had good throughput, but this is no longer true
with 2.6. Direct i/o can be turned off to solve this problem, but a better
solution is to use bigger write() byte counts (e.g., tar -b 64).

Asynchronous writing. Writing the buffer contents to the tape is
started and the write call returns immediately. The status is checked
at the next tape operation. Asynchronous writes are not done with
direct i/o and not in fixed block mode.

Buffered writes and asynchronous writes may in some rare cases cause
problems in multivolume operations if there is not enough space on the
tape after the early-warning mark to flush the driver buffer.

Read ahead for fixed block mode (ST_READ_AHEAD). Filling the buffer is
attempted even if the user does not want to get all of the data at
this read command. It should be disabled for those drives that don't like
a filemark to truncate a read request or that don't like backspacing.

Scatter/gather buffers (buffers that consist of chunks that are not
contiguous in physical memory) are used if contiguous buffers can't be
allocated. To support all SCSI adapters (including those not
supporting scatter/gather), buffer allocation uses the following
three kinds of chunks:

1. The initial segment that is used for all SCSI adapters including
   those not supporting scatter/gather. The size of this buffer will be
   (PAGE_SIZE << ST_FIRST_ORDER) bytes if the system can give a chunk of
   this size (and it is not larger than the buffer size specified by
   ST_BUFFER_BLOCKS). If this size is not available, the driver halves
   the size and tries again, down to the size of one page. The default
   settings in st_options.h make the driver try to allocate all of the
   buffer as one chunk.
2. The scatter/gather segments to fill the specified buffer size are
   allocated so that as many segments as possible are used but the number
   of segments does not exceed ST_FIRST_SG.
3. The remaining segments between ST_MAX_SG (or the module parameter
   max_sg_segs) and the number of segments used in phases 1 and 2
   are used to extend the buffer at run-time if this is necessary. The
   number of scatter/gather segments allowed for the SCSI adapter is not
   exceeded if it is smaller than the maximum number of scatter/gather
   segments specified. If the maximum number allowed for the SCSI adapter
   is smaller than the number of segments used in phases 1 and 2,
   extending the buffer will always fail.

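The halving rule for the initial segment can be modelled as follows. This is a simplified sketch, not the driver's allocator: `largest_free` stands in for the largest contiguous chunk the system can provide, `buffer_bytes` for the size given by ST_BUFFER_BLOCKS, and the function name is hypothetical.

```c
#include <stddef.h>

/*
 * Size of the initial segment, or 0 if not even one page is available.
 * Starts at page_size << first_order and halves down to one page.
 */
size_t first_segment_size(size_t page_size, unsigned int first_order,
                          size_t buffer_bytes, size_t largest_free)
{
    size_t size = page_size << first_order;

    /* Never make the first segment larger than the requested buffer. */
    while (size > buffer_bytes && size > page_size)
        size >>= 1;
    /* Halve until the modelled allocator can satisfy the request. */
    while (size > largest_free) {
        if (size <= page_size)
            return 0;
        size >>= 1;
    }
    return size;
}
```
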
EOM Behaviour When Writing
==========================

When the end of medium early warning is encountered, the current write
is finished and the number of bytes is returned. The next write
returns -1 and errno is set to ENOSPC. To enable writing a trailer,
the next write is allowed to proceed and, if successful, the number of
bytes is returned. After this, -1 and the number of bytes are
alternately returned until the physical end of medium (or some other
error) is encountered.

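For an application, this behaviour means that a write failing with ENOSPC can be retried once to put a trailer block on tape. A minimal sketch of that retry, assuming a tape device opened for writing; `write_block_eom()` is a hypothetical helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write one block. If the first attempt fails with ENOSPC (early warning
 * reached), the driver lets the next write proceed, so the block is written
 * once more. Sets *at_eom when ENOSPC was seen. Returns the result of the
 * last write() call.
 */
ssize_t write_block_eom(int fd, const void *buf, size_t len, int *at_eom)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0 && errno == ENOSPC) {
        *at_eom = 1;
        n = write(fd, buf, len);   /* -1 again: physical end or other error */
    }
    return n;
}
```

Once `*at_eom` has been set, a program would normally write only its trailer and then ask for the next volume. The retry is limited to one attempt so that a second failure, which indicates the physical end of medium or another error, is passed back to the caller.
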
The buffer size, write threshold, and the maximum number of allocated buffers
are configurable when the driver is loaded as a module. The keywords are:

========================== ===========================================
buffer_kbs=xxx             the buffer size for fixed block mode is set
                           to xxx kilobytes
write_threshold_kbs=xxx    the write threshold in kilobytes set to xxx
max_sg_segs=xxx            the maximum number of scatter/gather
                           segments
try_direct_io=x            try direct transfer between user buffer and
                           tape drive if this is non-zero
========================== ===========================================

Note that if the buffer size is changed but the write threshold is not
set, the write threshold is set to the new buffer size - 2 kB.


Boot Time Configuration
=======================

If the driver is compiled into the kernel, the same parameters can
also be set using, e.g., the LILO command line. The preferred syntax is
to use the same keyword used when loading as a module but prepended
with 'st.'. For instance, to set the maximum number of scatter/gather
segments, the parameter 'st.max_sg_segs=xx' should be used (xx is the
number of scatter/gather segments).

For compatibility, the old syntax from early 2.5 and 2.4 kernel
versions is supported. The same keywords can be used as when loading
the driver as a module. If several parameters are set, the keyword-value
pairs are separated with a comma (no spaces allowed). A colon can be
used instead of the equals sign. The definition is prepended by the
string st=. Here is an example::

    st=buffer_kbs:64,write_threshold_kbs:60

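To make the old comma-separated syntax concrete, here is a user-space parser for the string that follows st=. It only illustrates the format described above (keywords from the module parameter table, ':' or '=' as separator, ',' between pairs) and is not the parser used by the driver; `struct st_params` and `parse_st_options()` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct st_params {
    long buffer_kbs;
    long write_threshold_kbs;
    long max_sg_segs;
    long try_direct_io;
};

/* Map a keyword of length len to the matching field, or NULL if unknown. */
static long *param_slot(struct st_params *p, const char *key, size_t len)
{
    static const char *const names[] = {
        "buffer_kbs", "write_threshold_kbs", "max_sg_segs", "try_direct_io"
    };
    long *slots[] = {
        &p->buffer_kbs, &p->write_threshold_kbs, &p->max_sg_segs,
        &p->try_direct_io
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strlen(names[i]) == len && strncmp(key, names[i], len) == 0)
            return slots[i];
    return NULL;
}

/*
 * Parse e.g. "buffer_kbs:64,write_threshold_kbs:60".
 * Returns 0 on success, -1 on an unknown keyword or malformed value.
 */
int parse_st_options(const char *opts, struct st_params *p)
{
    const char *s = opts;

    while (*s) {
        size_t keylen = strcspn(s, ":=");
        long *slot;
        char *end;
        long val;

        if (s[keylen] == '\0')
            return -1;                  /* keyword without a value */
        slot = param_slot(p, s, keylen);
        if (!slot)
            return -1;                  /* unknown keyword */
        s += keylen + 1;
        val = strtol(s, &end, 10);
        if (end == s || (*end != ',' && *end != '\0'))
            return -1;                  /* malformed value */
        *slot = val;
        s = (*end == ',') ? end + 1 : end;
    }
    return 0;
}
```

Values that are not given are left untouched, so a caller can preload the structure with defaults before parsing.
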
The following syntax used by the old kernel versions is also supported::

- aa is the buffer size for fixed block mode in 1024 byte units
- bb is the write threshold in 1024 byte units
- dd is the maximum number of scatter/gather segments

The tape is positioned and the drive parameters are set with ioctls
defined in mtio.h. The tape control program 'mt' uses these ioctls. Try
to find an mt that supports all of the Linux SCSI tape ioctls and
opens the device for writing if the tape contents will be modified
(look for a package mt-st* from the Linux ftp sites; the GNU mt does
not open for writing for, e.g., erase).

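Most positioning requests go through the MTIOCTOP ioctl with a `struct mtop` holding the operation and its count. As an illustration, here is a sketch that rewinds and then spaces forward over filemarks to reach the start of a given file, assuming MTREW and MTFSF from <linux/mtio.h> (these names do not appear in this copy of the text); `goto_file()` is hypothetical.

```c
#include <sys/ioctl.h>
#include <linux/mtio.h>

/*
 * Position the tape at the start of file `fileno` (0 = first file) by
 * rewinding and spacing forward over that many filemarks.
 * Returns 0 on success, -1 with errno set on failure.
 */
int goto_file(int fd, int fileno)
{
    struct mtop op = { .mt_op = MTREW, .mt_count = 1 };

    if (ioctl(fd, MTIOCTOP, &op) < 0)
        return -1;
    if (fileno == 0)
        return 0;
    op.mt_op = MTFSF;           /* tape ends up after the filemark */
    op.mt_count = fileno;
    return ioctl(fd, MTIOCTOP, &op);
}
```
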
The supported ioctls are:

The following use the structure mtop:

Space forward over count filemarks. Tape positioned after filemark.

As above but tape positioned before filemark.

Space backward over count filemarks. Tape positioned before
filemark.

As above but tape positioned after filemark.

Space forward over count records.

Space backward over count records.

Space forward over count setmarks.

Space backward over count setmarks.

Write count filemarks.

Write count filemarks with immediate bit set (i.e., does not
wait until data is on tape).

Write count setmarks.

Set device off line (often rewind plus eject).

Do nothing except flush the buffers.

Space to end of recorded data.

Erase tape. If the argument is zero, the short erase command
is used. The long erase command is used with all other values
of the argument.

Seek to tape block count. Uses Tandberg-compatible seek (QFA)
for SCSI-1 drives and SCSI-2 seek for SCSI-2 drives. The file and
block numbers in the status are not valid after a seek.

Set the drive block size. Setting to zero sets the drive into
variable block mode (if applicable).

Sets the drive density code to arg. See drive
documentation for available codes.

Explicitly lock/unlock the tape drive door.

Explicitly load and unload the tape. If the
command argument x is between MT_ST_HPLOADER_OFFSET + 1 and
MT_ST_HPLOADER_OFFSET + 6, the number x is sent to the
drive with the command and it selects the tape slot to use of

Sets compressing or uncompressing drive mode using the
SCSI mode page 15. Note that some drives use other methods for
control of compression. Some drives (like the Exabytes) use
density codes for compression control. Some drives use another
mode page but this page has not been implemented in the
driver. Some drives without compression capability will accept
any compression mode without error.

Moves the tape to the partition given by the argument at the
next tape operation. The block at which the tape is positioned
is the block where the tape was previously positioned in the
new active partition unless the next tape operation is
MTSEEK. In this case the tape is moved directly to the block
specified by MTSEEK. MTSETPART is inactive unless
MT_ST_CAN_PARTITIONS is set.

Formats the tape with one partition (argument zero) or two
partitions (argument non-zero). If the argument is positive,
it specifies the size of partition 1 in megabytes. For DDS
drives and several early drives this is the physically first
partition of the tape. If the argument is negative, its absolute
value specifies the size of partition 0 in megabytes. This is
the physically first partition of many later drives, like the
LTO drives from LTO-5 upwards. The drive has to support partitions
with size specified by the initiator. Inactive unless
MT_ST_CAN_PARTITIONS is set.

Is used for several purposes. The subcommand is obtained from count
with the mask MT_ST_OPTIONS; the remaining low-order bits are used as
the argument. This command is only allowed for the superuser (root).
The subcommands are:

The drive buffer option is set to the argument. Zero means
no buffering.

Sets the buffering options. The bits are the new states
(enabled/disabled) of the following options (the parentheses
specify whether the option is global or can be specified
differently for each mode):

- write buffering (mode)
- asynchronous writes (mode)
- writing of two filemarks (global)
- using the SCSI spacing to EOD (global)
- automatic locking of the drive door (global)
- the defaults are meant only for writes (mode)
- backspacing over more than one record can
  be used for repositioning the tape (global)
- the driver does not ask the block limits
  from the drive (block size can be changed only to
- enables support for partitioned tapes
- the logical block number is used in
  the MTSEEK and MTIOCPOS for SCSI-2 drives instead of
  the device-dependent address. It is recommended to set
  this flag unless there are tapes using the device-dependent
  address (from the old times) (global)
- sets the SYSV semantics (mode)
- enables immediate mode (i.e., don't wait for
  the command to finish) for some commands (e.g., rewind)
- enables immediate filemark mode (i.e., when
  writing a filemark, don't wait for it to complete). Please
  see the BASICS note about MTWEOFI with respect to the
  possible dangers of writing immediate filemarks.
- enables setting the SILI bit in SCSI commands when
  reading in variable block mode to enhance performance when
  reading blocks shorter than the byte count; set this only
  if you are sure that the drive supports SILI and the HBA
  correctly returns transfer residuals
- debugging (global; debugging must be
  compiled into the driver)

* MT_ST_SETBOOLEANS, MT_ST_CLEARBOOLEANS
  Sets or clears the option bits.
* MT_ST_WRITE_THRESHOLD
  Sets the write threshold for this device to kilobytes
  specified by the lowest bits.

Defines the default block size set automatically. Value
0xffffff means that the default is not used any more.

* MT_ST_DEF_DENSITY, MT_ST_DEF_DRVBUFFER
  Used to set or clear the density (8 bits), and drive buffer
  state (3 bits). If the value is MT_ST_CLEAR_DEFAULT
  (0xfffff) the default will not be used any more. Otherwise
  the lowermost bits of the value contain the new value of
  the parameter.
* MT_ST_DEF_COMPRESSION
  The compression default will not be used if the value of
  the lowermost byte is 0xff. Otherwise the lowermost bit
  contains the new default. If the bits 8-15 are set to a
  non-zero number, and this number is not 0xff, the number is
  used as the compression algorithm. The value
  MT_ST_CLEAR_DEFAULT can be used to clear the compression
  default.

Set the normal timeout in seconds for this device. The
default is 900 seconds (15 minutes). The timeout should be
long enough for the retries done by the device while

* MT_ST_SET_LONG_TIMEOUT
  Set the long timeout that is used for operations that are
  known to take a long time. The default is 14000 seconds
  (3.9 hours). For erase this value is further multiplied by

Set the cleaning request interpretation parameters using
the lowest 24 bits of the argument. The driver can set the
generic status bit GMT_CLN if a cleaning request bit pattern
is found from the extended sense data. Many drives set one or
more bits in the extended sense data when the drive needs
cleaning. The bits are device-dependent. The driver is
given the number of the sense data byte (the lowest eight
bits of the argument; must be >= 18 (values 1 - 17
reserved) and <= the maximum requested sense data size),
a mask to select the relevant bits (bits 8-15), and the
bit pattern (bits 16-23). If the bit pattern is zero, one
or more bits under the mask indicate a cleaning request. If
the pattern is non-zero, the pattern must match the masked
sense data byte.

(The cleaning bit is set if the additional sense code and
qualifier 00h 17h are seen regardless of the setting of
these parameters.)

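The subcommand values are combined with their arguments by a bitwise OR before being passed as the count of MTSETDRVBUFFER. Here is a sketch of building two such values, assuming the constants from <linux/mtio.h> (MT_ST_SET_CLN, MT_ST_CAN_PARTITIONS and MT_ST_SCSI2LOGICAL are not named in this copy of the text); the helper names are hypothetical. The cleaning helper places the sense byte number in the lowest eight bits and the mask and pattern in the next two bytes.

```c
#include <stdint.h>
#include <linux/mtio.h>

/* Argument that enables (set != 0) or disables the given option bits. */
uint32_t drvbuffer_booleans(uint32_t option_bits, int set)
{
    return (uint32_t)(set ? MT_ST_SETBOOLEANS : MT_ST_CLEARBOOLEANS) | option_bits;
}

/*
 * Argument for the cleaning-request subcommand: sense byte number in
 * bits 0-7, mask in bits 8-15, pattern in bits 16-23.
 */
uint32_t drvbuffer_cleaning(unsigned int sense_byte, unsigned int mask,
                            unsigned int pattern)
{
    return (uint32_t)MT_ST_SET_CLN
         | (sense_byte & 0xffu)
         | ((mask & 0xffu) << 8)
         | ((pattern & 0xffu) << 16);
}
```

The result is stored in the `mt_count` field of `struct mtop`, which is an `int`; converting values with the top bit set (such as the cleaning subcommand) is implementation-defined in C, and GCC documents it as reduction modulo 2^32, which keeps the bit pattern.
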
The following ioctl uses the structure mtpos:

Reads the current position from the drive. Uses
Tandberg-compatible QFA for SCSI-1 drives and the SCSI-2
command for the SCSI-2 drives.

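Reading the position is a single ioctl call. A minimal sketch, assuming `struct mtpos` and MTIOCPOS from <linux/mtio.h>; `tape_tell()` is a hypothetical name.

```c
#include <sys/ioctl.h>
#include <linux/mtio.h>

/* Return the current block position reported by the drive, or -1 on error. */
long tape_tell(int fd)
{
    struct mtpos pos;

    if (ioctl(fd, MTIOCPOS, &pos) < 0)
        return -1;
    return pos.mt_blkno;
}
```

The returned value can later be passed as the count of an MTSEEK request to return to the same block.
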
The following ioctl uses the structure mtget to return the status:

Returns some status information.
The file number and block number within file are returned. The
block is -1 when it can't be determined (e.g., after MTBSF).
The drive type is either MTISSCSI1 or MTISSCSI2.
The number of recovered errors since the previous status call
is stored in the lower word of the field mt_erreg.
The current block size and the density code are stored in the field
mt_dsreg (shifts for the subfields are MT_ST_BLKSIZE_SHIFT and
MT_ST_DENSITY_SHIFT).
The GMT_xxx status bits reflect the drive status. GMT_DR_OPEN
is set if there is no tape in the drive. GMT_EOD means either
end of recorded data or end of tape. GMT_EOT means end of tape.

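The subfields of mt_dsreg and the GMT_xxx bits can be decoded with the shift and mask constants from <linux/mtio.h>. A sketch, assuming MT_ST_BLKSIZE_MASK and MT_ST_DENSITY_MASK are available alongside the shifts named above; `struct tape_status` and `decode_mtget()` are hypothetical, and the caller is assumed to have filled the `struct mtget` with `ioctl(fd, MTIOCGET, &g)` first.

```c
#include <linux/mtio.h>

struct tape_status {
    long blksize;     /* 0 is assumed to indicate variable block mode */
    int density;      /* drive density code */
    int no_tape;      /* GMT_DR_OPEN set */
    int at_eod;       /* GMT_EOD set */
};

/* Decode the mtget fields described above. */
void decode_mtget(const struct mtget *g, struct tape_status *s)
{
    s->blksize = (g->mt_dsreg & MT_ST_BLKSIZE_MASK) >> MT_ST_BLKSIZE_SHIFT;
    s->density = (int)((g->mt_dsreg & MT_ST_DENSITY_MASK) >> MT_ST_DENSITY_SHIFT);
    s->no_tape = GMT_DR_OPEN(g->mt_gstat) != 0;
    s->at_eod = GMT_EOD(g->mt_gstat) != 0;
}
```
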
Miscellaneous Compile Options
=============================

The recovered write errors are considered fatal if ST_RECOVERED_WRITE_FATAL
is defined.

The maximum number of tape devices is determined by the define
ST_MAX_TAPES. If more tapes are detected at driver initialization, the
maximum is adjusted accordingly.

Immediate return from tape positioning SCSI commands can be enabled by
defining ST_NOWAIT. If this is defined, the user should take care that
the next tape operation is not started before the previous one has
finished. The drives and SCSI adapters should handle this condition
gracefully, but some drive/adapter combinations are known to hang the
SCSI bus in this case.

The MTEOM command is by default implemented as spacing over 32767
filemarks. With this method the file number in the status is
correct. The user can request using direct spacing to EOD by setting
ST_FAST_EOM to 1 (or using the MT_ST_OPTIONS ioctl). In this case the file
number will be invalid.

When using read ahead or buffered writes the position within the file
may not be correct after the file is closed (correct position may
require backspacing over more than one record). The correct position
within the file can be obtained if ST_IN_FILE_POS is defined at compile
time or the MT_ST_CAN_BSR bit is set for the drive with an ioctl.
(The driver always backs over a filemark crossed by read ahead if the
user does not request data that far.)

Debugging code is now compiled in by default, but debugging is turned off
with the kernel module parameter debug_flag defaulting to 0. Debugging
can still be switched on and off with an ioctl. To enable debugging at
module load time, add debug_flag=1 to the module load options; the
debugging output is not voluminous. Debugging can also be enabled
and disabled by writing a '0' (disable) or '1' (enable) to the sysfs
file /sys/bus/scsi/drivers/st/debug_flag.

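Toggling debugging through sysfs only requires writing a single character. A minimal sketch; the path is a parameter so the helper can be tried on an ordinary file, and `st_set_debug()` is a hypothetical name. Writing the real sysfs file typically requires root privileges.

```c
#include <stdio.h>

/*
 * Enable (on != 0) or disable st debugging by writing '1' or '0' to the
 * given file, normally /sys/bus/scsi/drivers/st/debug_flag.
 * Returns 0 on success, -1 on error.
 */
int st_set_debug(const char *path, int on)
{
    FILE *f = fopen(path, "w");
    int rc;

    if (!f)
        return -1;
    rc = (fputc(on ? '1' : '0', f) == EOF) ? -1 : 0;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```
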
If the tape seems to hang, I would be very interested to hear where
the driver is waiting. With the command 'ps -l' you can see the state
of the process using the tape. If the state is D, the process is
waiting for something. The field WCHAN tells where the driver is
waiting. If you have the current System.map in the correct place (in
/boot for the procps I use) or have updated /etc/psdatabase (for kmem
ps), ps writes the function name in the WCHAN field. If not, you have
to look up the function from System.map.

Note also that the timeouts are very long compared to most other
drivers. This means that the Linux driver may appear hung although the
real reason is that the tape firmware has got confused.