14    Administering Events and Errors

This chapter first describes how to use the system exercisers to discover potential system problems. It then describes how the Digital UNIX operating system records information about system events and explains the basic administrative tasks that you use to set up and maintain the event-logging system.


14.1    Using the System Exercisers

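The Digital UNIX system provides a set of exercisers that you can use to troubleshoot your system. The exercisers test specific areas of your system, such as file systems or system memory. This section explains how to use the exercisers by addressing the following topics:

  • Running system exercisers (Section 14.1.1)

  • Using exerciser diagnostics (Section 14.1.2)

  • Exercising a file system (Section 14.1.3)

  • Exercising system memory (Section 14.1.4)

  • Exercising shared memory (Section 14.1.5)

  • Exercising a disk drive (Section 14.1.6)

  • Exercising a tape drive (Section 14.1.7)

  • Exercising the terminal communication system (Section 14.1.8)
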
In addition to the exercisers documented in this chapter, your system might also support the DEC Verifier and Exerciser Tool (VET), which provides a similar set of exercisers. With the release of Digital UNIX Version 4.0, VET is no longer present on the installation kit as an optional subset. Instead, VET software is on the Digital UNIX Firmware CD-ROM.


14.1.1    Running System Exercisers

To run a system exerciser, you must be logged in as superuser and /usr/field must be your current directory.

The commands that invoke the system exercisers provide a flag for specifying a file where diagnostic output is saved when the exerciser completes its task.

Most of the exerciser commands have an online help flag that displays a description of how to use that exerciser. To access online help, use the -h flag with a command. For example, to access help for the diskx exerciser, use the following command:

# diskx -h 

You can run the exercisers in the foreground or the background. To cancel an exerciser running in the foreground, press Ctrl/C at any time; to cancel one running in the background, kill its process. You can run more than one exerciser at the same time; keep in mind, however, that the more processes you have running, the slower the system performs. Therefore, before exercising the system extensively, make sure that no other users are on the system.

There are some restrictions when you run a system exerciser over an NFS link or on a diskless system. For exercisers such as fsx that need to write to a file system, the target file system must be writable by root. Also, the directory from which an exerciser is executed must be writable by root because temporary files are written to the directory.

These restrictions can be difficult to meet because NFS file systems are often mounted in a way that prevents root from writing to them. You can work around some of them by copying the exerciser to a directory that root can write to and running it from there.


14.1.2    Using Exerciser Diagnostics

When an exerciser is halted (either by pressing Ctrl/C or by timing out), diagnostics are displayed and are stored in the exerciser's most recent log file. The diagnostics inform you of the test results.

Each time an exerciser is invoked, a new log file is created in the /usr/field directory. For example, when you execute the fsx command for the first time, a log file named #LOG_FSX_01 is created. The log files contain records of each exerciser's results and consist of the starting and stopping times, and error and statistical information. The starting and stopping times are also logged into the default system error log file, /var/adm/binary.errlog. This file also contains information on errors reported by the device drivers or by the system.

The log files provide a record of the diagnostics. However, after reading a log file, you should delete it because an exerciser can have only nine log files. If you attempt to run an exerciser that has accumulated nine log files, the exerciser tells you to remove some of the old log files so that it can create a new one.
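Because an exerciser will not start a new log once nine have accumulated, it can help to check the count before a long run. The following POSIX shell function is a minimal sketch, not part of the exercisers: the name exerciser_logs_full is hypothetical, and it assumes every log file follows the #LOG_<NAME>_NN pattern (only #LOG_FSX_01 is documented here).

```sh
# Hypothetical helper: count an exerciser's log files and report whether
# the nine-file limit has been reached.
# Assumption: log names follow the #LOG_<NAME>_NN pattern seen for fsx.
exerciser_logs_full() {
    name=$1                      # exerciser name in capitals, e.g. FSX
    dir=${2:-/usr/field}         # directory where the exercisers log
    count=0
    for f in "$dir"/"#LOG_${name}_"*; do
        # An unmatched pattern is left as-is, so test that the file exists
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
    [ "$count" -ge 9 ]           # exit status 0 means the limit is reached
}
```

For example, `exerciser_logs_full FSX` prints the number of fsx logs in /usr/field and succeeds only when nine are present, which tells you to read and delete some of them first.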

If an exerciser finds errors, you can determine which device or area of the system has the problem by examining the /var/adm/binary.errlog file with either the dia command (preferred) or the uerf command. For information on the error logger, see Section 14.2. For the meanings of the error numbers and signal numbers, see the intro(2) and sigvec(2) reference pages.


14.1.3    Exercising a File System

Use the fsx command to exercise the local file systems. The fsx command exercises the specified local file system by initiating multiple processes, each of which creates, writes, closes, opens, reads, validates, and unlinks a test file of random data. For more information, see the fsx(8) reference page.


Note

Do not test NFS file systems with the fsx command.


The fsx command has the following syntax:

fsx [-fpath] [-h] [-ofile] [-pnum] [-tmin]

You can specify one or more of the following flags:

-fpath

Specifies the pathname of the file system directory you want to test. For example, -f/usr or -f/mnt. The default is /usr/field.

-h

Displays the help message for the fsx command.

-ofile

Saves the output diagnostics in file.

-pnum

Specifies the number of fsxr processes you want fsx to initiate. The maximum number of processes is 250. The default is 20.

-tmin

Specifies how many minutes you want the fsx command to exercise the file system. If you do not specify the -t flag, the fsx command runs until you terminate it by pressing Ctrl/C in the foreground.

The following example of the fsx command tests the /usr file system with five fsxr processes running for 60 minutes in the background:

# fsx -p5 -f/usr -t60 & 
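If you script fsx runs, checking the -p value before launching avoids a rejected invocation. The following sketch builds the command line used in the example above; the function name fsx_cmdline is hypothetical, the upper bound of 250 comes from the -p description, and the lower bound of 1 is an assumption.

```sh
# Hypothetical helper, not part of fsx: assemble an fsx command line and
# reject a -p value outside 1 through 250.
fsx_cmdline() {
    path=$1                      # file system directory for -f
    nproc=$2                     # number of fsxr processes for -p
    minutes=$3                   # run time in minutes for -t
    if [ "$nproc" -lt 1 ] || [ "$nproc" -gt 250 ]; then
        echo "fsx: -p value must be 1 through 250" >&2
        return 1
    fi
    echo "fsx -p$nproc -f$path -t$minutes"
}
```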


14.1.4    Exercising System Memory

Use the memx command to exercise the system memory. The memx command exercises the system memory by initiating multiple processes. By default, the size of each process is the total system memory in bytes divided by 20. The minimum allowable size is 4095 bytes per process. The memx command tests the allocated memory with patterns of 1s and 0s, patterns of 0s and 1s, and random data patterns.
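The default per-process size is simple arithmetic. The following sketch reproduces it for a memory size that you supply; the function name memx_default_size is hypothetical, and the use of integer division is an assumption about how memx rounds.

```sh
# Hypothetical helper: default memx size per process, computed as total
# memory / 20 with the documented 4095-byte minimum.
memx_default_size() {
    total=$1                     # total system memory, in bytes
    size=$((total / 20))         # integer division (assumed rounding)
    if [ "$size" -lt 4095 ]; then
        size=4095                # documented minimum per process
    fi
    echo "$size"
}
```

For a system with 16 MB (16777216 bytes) of memory, `memx_default_size 16777216` prints 838860.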

The files that you need to run the memx exerciser include the following:

For more information, see the memx(8) reference page.

The memx command is restricted by the amount of available swap space. The size of the swap space and the available internal memory determine how many processes can run simultaneously on your system. For example, if there are 16 MB of swap space and 16 MB of memory, all of the swap space is used if all 20 initiated processes (the default) run simultaneously, which prevents other processes from running. Therefore, on systems with large amounts of memory and small amounts of swap space, you must use the -p flag, the -m flag, or both to restrict the number of memx processes or the size of the memory being tested.
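To choose a -p value on a system with limited swap space, you can estimate how many processes of a given size fit while leaving room for other work. The following sketch is a rough planning aid, not a memx feature: the function name memx_max_procs and its reserve argument (swap you want to keep free for other processes) are assumptions. The result is capped at 20, the documented -p maximum.

```sh
# Hypothetical planning aid: how many memxr processes of a given size fit
# in swap while keeping a reserve for other processes (capped at 20).
memx_max_procs() {
    size=$1                      # bytes per process (the -m value)
    swap=$2                      # available swap space, in bytes
    reserve=$3                   # swap to leave for everything else
    avail=$((swap - reserve))
    if [ "$avail" -le 0 ]; then
        echo 0                   # nothing left for memx processes
        return 0
    fi
    n=$((avail / size))
    if [ "$n" -gt 20 ]; then
        n=20                     # memx accepts at most 20 processes
    fi
    echo "$n"
}
```

For example, with the default size for 16 MB of memory (838860 bytes), 16 MB of swap, and a 4 MB reserve, `memx_max_procs 838860 16777216 4194304` prints 15, a value you could pass as -p15.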

The memx command has the following syntax:

memx [-s] [-h] [-msize] [-ofile] [-pnum] [-tmin]

You can specify one or more of the following flags:

-s

Disables the automatic invocation of the shared memory exerciser, shmx.

-h

Displays the help message for the memx command.

-msize

Specifies the amount of memory in bytes for each process you want to test. The default is the total amount of memory divided by 20, with a minimum size of 4095 bytes.

-ofile

Saves the output diagnostics in file.

-pnum

Specifies the number of memxr processes to initiate. The maximum number is 20, which is also the default.

-tmin

Specifies how many minutes you want the memx command to exercise the memory. If you do not specify the -t flag, the memx command runs until you terminate it by pressing Ctrl/C in the foreground.

The following example of the memx command initiates five memxr processes, each testing 4095 bytes of memory, and runs in the background for 60 minutes:

# memx -m4095 -p5 -t60 & 


14.1.5    Exercising Shared Memory

Use the shmx command to exercise the shared memory segments. The shmx command spawns a background process called shmxb. The shmx command writes and reads the shmxb data in the segments, and the shmxb process writes and reads the shmx data in the segments.

Using shmx, you can test the number and the size of memory segments and shmxb processes. The shmx exerciser runs until the process is killed or until the time specified by the -t flag is exhausted.

You automatically invoke the shmx exerciser when you start the memx exerciser, unless you specify the memx command with the -s flag. You can also invoke the shmx exerciser manually. The shmx command has the following syntax:

/usr/field/shmx [-h] [-ofile] [-v] [-ttime] [-msize] [-sn]

The shmx command flags are as follows:

-h

Prints the help message for the shmx command.

-ofile

Saves diagnostic output in file.

-v

Uses the fork system call instead of the vfork system call to spawn the shmxb process.

-ttime

Specifies time as the run time in minutes. The default is to run until the process is killed.

-msize

Specifies size as the memory segment size, in bytes, to be tested by the processes. The size value must be greater than zero. The default is the value of the SHMMAX and SHMSEG system parameters, which are set in the /sys/include/sys/param.h file.

-sn

Specifies n as the number of memory segments. The default (and maximum) number of segments is 3.

The following example tests the default number of memory segments, each with a default segment size:

# shmx &

The following example runs three memory segments of 100,000 bytes for 180 minutes:

# shmx -t180 -m100000 -s3 &


14.1.6    Exercising a Disk Drive

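Use the diskx command to exercise the disk drives. The main areas that are tested include the following:

  • The disk's disktab file entry (-d flag)

  • Transfer performance (-p flag)

  • Read transfers (-r flag)

  • Write and read transfers and seeks (-w flag)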

Caution

Some of the tests involve writing to the disk; for this reason, use the exerciser cautiously on disks that contain useful data that the exerciser could overwrite. Tests that write to the disk first check for the existence of file systems on the test partitions and partitions that overlap the test partitions. If a file system is found on these partitions, you are prompted to determine if testing should continue.


You can use the diskx command flags to specify the tests that you want performed and to specify the parameters for the tests. For more information, see the diskx(8) reference page.

The diskx command has the following syntax:

diskx [flags] [parameters] -f devname

The -f devname flag specifies the device special file on which to perform testing. The devname variable specifies the name of the block or character special file that represents the disk to be tested. The file name must begin with an r (for example, rz1). The last character of the file name can specify the disk partition to test.

If a partition is not specified, all partitions are tested. For example, if the devname variable is /dev/rra0, all partitions are tested. If the devname variable is /dev/rra0a, the a partition is tested. This parameter must be specified and can be used with all test flags.
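The partition rule above can be expressed as a small string check. The following sketch reports which partition a devname selects; the function name diskx_partition is hypothetical, and it assumes partition letters a through h.

```sh
# Hypothetical helper: report the partition that a diskx devname selects,
# or "all" when the name does not end in a partition letter.
# Assumption: partition letters are a through h.
diskx_partition() {
    name=${1##*/}                # strip the directory, e.g. rra0a
    case $name in
        *[abcdefgh]) echo "${name#"${name%?}"}" ;;   # last character
        *)           echo all ;;
    esac
}
```

For example, `diskx_partition /dev/rra0a` prints a, and `diskx_partition /dev/rra0` prints all.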

The following flags specify the tests to be run on disk:

-d

Tests the disk's disktab file entry. The disktab entry is obtained by using the getdiskbyname library routine. This test only works if the specified disk is a character special file. See the disktab(4) reference page for more information.

-h

Displays a help message describing test flags and parameters.

-p

Specifies a performance test. Read and write transfers are timed to measure device throughput. Data validation is not performed as part of this test. Testing uses a range of transfer sizes if the -F flag is not specified.

The range of transfer sizes is divided by the number specified with the -perf_splits parameter to obtain a transfer size increment. For example, if the -perf_splits parameter is set to 10, tests are run starting with the minimum transfer size and increasing the transfer size by 1/10th of the range of values for each test repetition. The last transfer size is set to the specified maximum transfer size.

If you do not specify a number of transfers, the transfer count is set to allow the entire partition to be read or written. In this case, the transfer count varies, depending on the transfer size and the partition size.

The performance test runs until completed or until interrupted; the time is not limited by the -minutes parameter. This test can take a long time to complete, depending on the test parameters.

To achieve maximum throughput, specify the -S flag to cause sequential transfers. If the -S flag is not specified, transfers are done to random locations, which can lower the observed throughput because of the head seeks they require.

-r

Specifies a read-only test. This test reads from the specified partitions. Specify the -n flag to run this test on the block special file.

This test is useful for generating system I/O activity. Because it is a read-only test, you can run more than one instance of the exerciser on the same disk.

-w

Specifies a write test. This test verifies that data can be written to the disk and can be read back to verify the data. Seeks are also done as part of this test. This test provides the most comprehensive coverage of disk transfer functions because it uses reads, writes, and seeks. This test also combines sequential and random access patterns.

This test performs the following operations using a range of transfer sizes; a single transfer size is used if the -F flag is specified:

The data read from the disk is examined to verify it. Then, if random transfer testing has not been disabled (using the -S flag), writes are issued to random locations on the partition. After the random writes are completed, reads are issued to random locations on the partition. The data read from random locations is examined to verify it.

The following flags modify the behavior of the test:

-F

Performs fixed size transfers. If this flag is not specified, transfers are done using random sizes. This flag can be used with the -p, -r, and -w test flags.

-i

Specifies interactive mode. In this mode, you are prompted for various test parameters. Typical parameters include the transfer size and the number of transfers. The following scaling factors are allowed:

  • k or K (for kilobyte (1024 * n))

  • b or B (for block (512 * n))

  • m or M (for megabyte (1024 * 1024 * n))

For example, 10K would specify 10,240 bytes.

-Q

Suppresses performance analysis of read transfers, so that only write performance is tested. To skip the write performance tests and test only read performance, specify the -R flag. The -Q flag can be used with the -p test flag.

-R

Opens the disk in read-only mode. This flag can be used with all test flags.

-S

Performs transfers to sequential disk locations. If this flag is not specified, transfers are done to random disk locations. This flag can be used with the -p, -r, and -w test flags.

-T

Directs output to the terminal. This flag is useful if output is directed to a log file by using the -o flag. If you specify the -T flag after the -o flag, output is directed to both the terminal and the log file. The -T flag can be used with all test flags.

-Y

Does not prompt you to confirm that you want to continue the test if file systems are found when the disk is examined; testing proceeds.

In addition to the flags, you can also specify test parameters. You can specify test parameters on the diskx command line or interactively with the -i flag. If you do not specify test parameters, default values are used.

To use a parameter, specify the parameter name, a space, and the numeric value. For example, you could specify the following parameter:

-perf_min 512

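You can use the following scaling factors:

  • k or K (for kilobyte (1024 * n))

  • b or B (for block (512 * n))

  • m or M (for megabyte (1024 * 1024 * n))

For example, 10K would specify 10,240 bytes, and -perf_min 10K causes transfers to be done in sizes of 10,240 bytes.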
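The scaling factors (k or K for n * 1024, b or B for n * 512, m or M for n * 1024 * 1024) are easy to check by hand, and the -max_xfer description states that transfer sizes to the character special file are rounded down to a multiple of 512 bytes. The following sketch illustrates both calculations; scale_bytes and round_512 are hypothetical names, and this is not the exercisers' own parser.

```sh
# Hypothetical helper: convert a value with an optional scaling factor
# to bytes (k/K = 1024, b/B = 512, m/M = 1024 * 1024).
scale_bytes() {
    value=$1
    case $value in
        *[kK]) n=${value%?}; echo $((n * 1024)) ;;
        *[bB]) n=${value%?}; echo $((n * 512)) ;;
        *[mM]) n=${value%?}; echo $((n * 1024 * 1024)) ;;
        *)     echo $((value)) ;;           # no suffix: plain byte count
    esac
}

# Hypothetical helper: round a byte count down to a multiple of 512.
round_512() {
    echo $(( $1 / 512 * 512 ))
}
```

For example, `scale_bytes 10K` prints 10240, and `round_512 1000` prints 512.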

You can specify one or more of the following parameters:

-debug

Specifies the level of diagnostic output to be produced. The greater the number specified, the more output is produced describing the exerciser operations. This parameter can be used with all test flags.

-err_lines

Specifies the maximum number of error messages that are produced as a result of an individual test. A limit on error output prevents a large number of diagnostic messages if persistent errors occur. This parameter can be used with all test flags.

-minutes

Specifies the number of minutes to test. This parameter can be used with the -r and -w test flags.

-max_xfer

Specifies the maximum transfer size to be performed. If transfers are done using random sizes, the sizes are within the range specified by the -max_xfer and -min_xfer parameters. If fixed size transfers are specified (see the -F flag), transfers are done in a size specified by the -min_xfer parameter.

Specify transfer sizes to the character special file in multiples of 512 bytes. If the specified transfer size is not a multiple of 512, it is rounded down to the nearest multiple of 512 bytes. This parameter can be used with the -r and -w test flags.

-min_xfer

Specifies the minimum transfer size to be performed. This parameter can be used with the -r and -w test flags.

-num_xfer

Specifies the number of transfers to perform before changing the partition that is currently being tested. This parameter is only useful if more than one partition is being tested. If this parameter is not specified, the number of transfers is set to a number that completely covers a partition. This parameter can be used with the -r and -w test flags.

-ofilename

Sends output to the specified file name. The default is to display output on the terminal screen. This parameter can be used with all test flags.

-perf_max

Specifies the maximum transfer size to be performed. If transfers are done using random sizes, the sizes are within the range specified by the -perf_min and -perf_max parameters. If fixed size transfers are specified (see the -F flag), transfers are done in a size specified by the -perf_min parameter. This parameter can be used with the -p test flag.

-perf_min

Specifies the minimum transfer size to be performed. This parameter can be used with the -p test flag.

-perf_splits

Specifies how the transfer size will change if you test a range of transfer sizes. The range of transfer sizes is divided by the number specified with the -perf_splits parameter to obtain a transfer size increment. For example, if the -perf_splits parameter is set to 10, tests are run starting with the minimum transfer size and increasing the transfer size by 1/10th of the range of values for each test repetition. The last transfer size is set to the specified maximum transfer size. This parameter can be used with the -p test flag.

-perf_xfers

Specifies the number of transfers to be performed in performance analysis. If this value is not specified, the number of transfers is set equal to the number that is required to read the entire partition. This parameter can be used with the -p test flag.
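The -perf_splits arithmetic can be illustrated with a short sketch that lists the transfer sizes a performance run would step through. It assumes integer division for the increment and that the final step is replaced by the maximum, giving splits + 1 sizes; diskx's exact rounding is not documented here, and the function name perf_sizes is hypothetical.

```sh
# Hypothetical helper: list the transfer sizes implied by -perf_min,
# -perf_max, and -perf_splits. Start at the minimum, step by
# (max - min) / splits, and finish with the maximum.
perf_sizes() {
    min=$1                       # -perf_min, in bytes
    max=$2                       # -perf_max, in bytes
    splits=$3                    # -perf_splits
    if [ "$splits" -lt 1 ]; then
        return 1                 # avoid division by zero
    fi
    step=$(( (max - min) / splits ))
    out=""
    i=0
    while [ "$i" -lt "$splits" ]; do
        out="$out $((min + i * step))"
        i=$((i + 1))
    done
    echo "${out# } $max"         # last size is set to the maximum
}
```

For example, `perf_sizes 512 5632 10` prints 512 1024 1536 2048 2560 3072 3584 4096 4608 5120 5632.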

The following example performs read-only testing on the character device special file that /dev/rrz0 represents. Because a partition is not specified, the test reads from all partitions. The default range of transfer sizes is used. Output from the exerciser program is displayed on the terminal screen:

# diskx -f /dev/rrz0 -r 

The following example runs on the a partition of /dev/rz0, and program output is logged to the diskx.out file. The program output level is set to 10 and causes additional output to be generated:

# diskx -f /dev/rz0a -o diskx.out -d -debug 10  

The following example runs performance tests on the a partition of /dev/rz0 and logs program output to the diskx.out file. The -S flag causes sequential transfers, which give the best throughput. Testing is done over the default range of transfer sizes:

# diskx -f /dev/rz0a -o diskx.out -p -S 

The following command runs the read test on all partitions of the specified disks. The disk exerciser runs as three separate background processes, which generate extensive system I/O activity. You can use this command to stress-test the system:

# diskx -f /dev/rrz0 -r & diskx -f /dev/rrz1 -r & diskx -f /dev/rrz2 -r &


14.1.7    Exercising a Tape Drive

Use the tapex command to exercise a tape drive. The tapex command writes, reads, and validates random data on a tape device from the beginning-of-tape (BOT) to the end-of-tape (EOT). The tapex command also performs positioning tests for records and files, and tape transportability tests. For more information, refer to the tapex(8) reference page.

Some tapex flags perform specific tests (for example, an end-of-media (EOM) test). Other flags modify the tests, for example, by enabling caching.

The tapex command has the following syntax:

tapex [flags] [parameters]

You can specify one or more of the flags described in Table 14-1. In addition to flags, you can also specify test parameters. You specify parameters on the tapex command line or interactively with the -i flag. If you do not specify test parameters, default values are used.

To use a test parameter, specify the parameter name, a space, and the numeric value. For example, you could specify the following parameter:

-min_rs 512

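Note that you can use the following scaling factors:

  • k or K (for kilobyte (1024 * n))

  • b or B (for block (512 * n))

  • m or M (for megabyte (1024 * 1024 * n))

For example, 10K would specify 10,240 bytes.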

The following parameters can be used with all tests:

-err_lines

Specifies the error printout limit.

-fixed bs

Specifies a fixed block device. Record sizes for most devices default to multiples of the blocking factor of the fixed block device as specified by the bs argument.

The following parameters can be used with the -a flag, which measures performance:

-perf_num

Specifies the number of records to write and read.

-perf_rs

Specifies the size of records.

Other parameters are restricted for use with specific tapex flags. Option-specific parameters are documented in Table 14-1.

Table 14-1: The tapex Options and Option Parameters
tapex Flag  Flag and Parameter Descriptions
-a  Specifies the performance measurement test, which calculates the tape transfer bandwidth for writes and reads to the tape by timing data transfers. The following parameters can be used with the -a flag:

-perf_num

Specifies the number of records to write and read.

-perf_rs

Specifies the size of records.

 
-b  Causes the write/read tests to run continuously until the process is killed. This flag can be used with the -r and -g flags. 
-c  Enables caching on the device, if supported. This flag does not specifically test caching; it enables the use of caching on a tape device while other tests are running. 
-C  Disables caching on TMSCP tape devices. If the tape device is a TMSCP unit, then caching is the default mode of test operation. This flag causes the tests to run in noncaching mode. 
-d  Tests the ability to append records to the media. First, the test writes records to the tape. Then, it repositions itself back one record and appends additional records. Finally, the test does a read verification. This test simulates the behavior of the tar -r command. The following parameters can be used with the -d flag:

-no_overwrite

Prevents the append-to-media test from being performed on tape devices that do not support this test. Usually, you use this parameter with the -E flag.

-tar_num

Specifies the number of additional and appended records.

-tar_size

Specifies the record size for all records written in this test.

 
-e  Specifies the EOM test. First, this test writes data to fill a tape; this action can take some time for long tapes. It then performs reads and writes past the EOM; these actions should fail. Finally, it enables writing past the EOM, writes to the tape, and reads back the records for validation purposes. The following parameters can be used with the -e flag:

-end_num

Specifies the number of records to be written past EOM. (Note that specifying too much data to be written past EOM can cause a reel-to-reel tape to go off line.)

-end_rs

Specifies the record size.

 
-E  Runs an extensive series of tests in sequential order. Depending on tape type and CPU type, this series of tests can take up to 10 hours to complete. 
-f /dev/rmt#?  Specifies the name of the device special file that corresponds to the tape unit being tested. The number sign variable (#) specifies the unit number. The question mark variable (?) specifies the letter h for the high density device or l for the low density device. The default tape device is /dev/rmt0h.
-F  Specifies the file-positioning tests. First, files are written to the tape and verified. Next, every other file on the tape is read. Then, the previously unread files are read by traversing the tape backwards. Finally, random numbers are generated, the tape is positioned to those locations, and the data is verified. Each file uses a different record size. The following parameters can be used with the -F flag:

-num_fi

Specifies the number of files.

-pos_ra

Specifies the number of random repositions.

-pos_rs

Specifies the record size.

-rec_fi

Specifies the number of records per file.

 
-G  Specifies the file-positioning tests on a tape containing data. This flag can be used with the -F flag to run the file position tests on a tape that has been written to by a previous invocation of the -F test. To perform this test, you must use the same test parameters (for example, record size and number of files) that you used when you invoked the -F test to write to the tape. No other data should have been written to the tape since the previous -F test. 
-g  Specifies random record size tests. This test writes records of random sizes. It reads in the tape, specifying a large read size; however, only the amount of data in the randomly sized record should be returned. This test only checks return values; it does not validate record contents. The following parameter is used with the -g flag:

-rand_num

Specifies the number of records to write and read.

 
-h  Displays a help message describing the tape exerciser. 
-i  Specifies interactive mode. In this mode, you are prompted for various test parameters. Typical parameters include the record size and the number of records to write. The following scaling factors are allowed:

  • k or K (for kilobyte (1024 * n))

  • b or B (for block (512 * n))

  • m or M (for megabyte (1024 * 1024 * n))

For example, 10K would specify 10,240 bytes.

 

-j  Specifies the write phase of the tape-transportability tests. This test writes a number of files to the tape and then verifies the tape. After the tape has been successfully verified, it is brought off line, moved to another tape unit, and read in with the -k flag. This test proves that you can write to a tape on one drive and read from a tape on another drive. The -j flag is used with the -k flag. Note that the -j flag and the -k flag must use the same parameters. The following parameters can be used with the -j and -k flags:

-tran_file

Specifies the number of files to write or read.

-tran_rec

Specifies the number of records contained in each file.

-tran_rs

Specifies the size of each record.

 
-k  Specifies the read phase of the tape-transportability tests. This test reads a tape that was written by the -j test and verifies that the expected data is read from the tape. This test proves that you can write to a tape on one drive and read from a tape on another drive. As stated in the description of the -j flag, any parameters specified with the -j flag must be specified with the -k flag. (See the description of the -j flag for information on the parameters that apply to the -j and -k flags.) 
-L  Specifies the media loader test. For sequential stack loaders, the media is loaded, written to, and verified. Then, the media is unloaded, and the test is run on the next piece of media. This verifies that all of the media in the input deck can be written to. To run this test in read-only mode, also specify the -w flag. 
-l  Specifies the EOF test. This test verifies that a zero byte count is returned when a tape mark is read and that an additional read fetches the first record of the next tape file. 
-m  Displays tape contents. This is not a test. This flag reads the tape sequentially and prints out the number of files on the tape, the number of records in each file, and the size of the records within the file. The contents of the tape records are not examined. 
-o filename  Sends output to the specified file name. The default sends output to the terminal screen. 
-p  Runs both the record-positioning and file-positioning tests. For more information, refer to descriptions of the -R and -F flags. 
-q  Specifies the command timeout test. This test verifies that the driver allows enough time for completion of long operations. This test writes files to fill the tape. It then performs a rewind, followed by a forward skip to the last file. This test is successful if the forward skip operation is completed without error. 
-r  Specifies the record size test. A number of records are written to the tape and then verified. This process is repeated over a range of record sizes. The following parameters can be used with the -r flag:

-inc

Specifies the record increment factor.

-max_rs

Specifies the maximum record size.

-min_rs

Specifies the minimum record size.

-num_rec

Specifies the number of records.

-t

Specifies a time limit (in minutes). The default is to run the test until it is complete.

 
-R  Specifies the record-positioning test. First, records are written to the tape and verified. Next, every other record on the tape is read. Then, the other records are read by traversing the tape backwards. Finally, random numbers are generated; the tape is positioned to those locations, and the data is verified. The following parameters can be used with the -R flag:

-pos_num

Specifies the number of records.

-pos_ra

Specifies the number of random repositions.

-pos_rs

Specifies the record size.

 
-s  Specifies the record size behavior test. Verifies that a record that is read returns one record (at most) or the read size, whichever is less. The following parameters can be used with the -s flag:

-num_rec

Specifies the number of records.

-size_rec

Specifies the record size.

 
-S  Specifies the single record size test. This test modifies the record size test (the -r flag) to use a single record size. The following parameters can be used with the -S flag:

-inc

Specifies the record increment factor.

-max_rs

Specifies the maximum record size.

-min_rs

Specifies the minimum record size.

-num_rec

Specifies the number of records.

 
-T  Displays output to the terminal screen. This flag is useful if you want to log output to a file with the -o flag and also have the output displayed on your terminal screen. This flag must be specified after the -o flag in the command line. 
-v  Specifies verbose mode. This flag causes detailed information to be output. For example, it lists the operations the exerciser is performing (such as record counts), and detailed error information. Information provided by this flag can be useful for debugging purposes. 
-V  Specifies enhanced verbose mode. This flag causes output of more detailed information than the -v flag. The additional output consists of status information on exerciser operations. Information provided by this flag can be useful for debugging purposes. 
-w  Opens the tape as read-only. This mode is useful only for tests that do not write to the media. For example, it allows the -m test to be run on a write-protected media. 
-Z  Initializes the read buffer to the nonzero value 0130. This can be useful for debugging purposes. If the -Z flag is not specified, all elements of the read buffer are initialized to zero. Many of the tests first initialize their read buffer and then perform the read operation. After reading a record from the tape, some tests validate that the unused portions of the read buffer remain at the value to which they were initialized. For debugging purposes, you can set this initialized value to a number other than zero. In this case, you can use the arbitrary value 0130. 
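Because the -j and -k phases of the transportability test must use the same parameters, it can help to generate both command lines from one set of values. The following sketch only prints the command lines; the function name tapex_transport_cmd is hypothetical, and the flags it emits are the -f, -j, -k, -tran_file, -tran_rec, and -tran_rs flags documented in Table 14-1.

```sh
# Hypothetical helper, not part of tapex: print the write (-j) or read (-k)
# command line for the transportability test from one set of values.
tapex_transport_cmd() {
    phase=$1                     # j on the writing drive, k on the reading drive
    dev=$2                       # tape device, for example /dev/rmt0h
    files=$3                     # -tran_file: number of files
    recs=$4                      # -tran_rec: records per file
    rs=$5                        # -tran_rs: record size
    case $phase in
        j|k) ;;
        *) echo "phase must be j or k" >&2; return 1 ;;
    esac
    echo "tapex -f $dev -$phase -tran_file $files -tran_rec $recs -tran_rs $rs"
}
```

Set the file count, record count, and record size once in shell variables and pass the same variables for both phases, changing only the phase letter and the device of the drive that reads the tape.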

The following example runs an extensive series of tests on tape device rmt1h and sends all output to the tapex.out file:

# tapex -f /dev/rmt1h -E -o tapex.out 

The following example performs random record size tests and outputs information in verbose mode. This test runs on the default tape device /dev/rmt0h, and the output is sent to the terminal screen.

# tapex -g -v  

The following example performs read and write record testing using record sizes in the range 10K to 20K. This test runs on the default tape device /dev/rmt0h, and the output is sent to the terminal screen.

# tapex -r -min_rs 10k -max_rs 20k 

The following example performs a series of tests on tape device /dev/rmt0h, which is treated as a fixed block device in which record sizes for tests are multiples of the blocking factor 512 KB. The append-to-media test is not performed.

# tapex -f /dev/rmt0h -fixed 512 -no_overwrite


14.1.8    Exercising the Terminal Communication System

Use the cmx command to exercise the terminal communications system. The cmx command writes, reads, and validates random data and packet lengths on the specified communications lines.

The lines you exercise must have a loopback connector attached to the distribution panel or the cable. Also, each line must be disabled in the /etc/inittab file and must be a nonmodem line; that is, the CLOCAL flag must be set. Otherwise, the cmx command repeatedly displays error messages on the terminal screen until its time expires or until you press Ctrl/C. For more information, refer to the cmx(8) reference page.

You cannot test pseudodevice lines or lta device lines. Pseudodevices have p, q, r, s, t, u, v, w, x, y, or z as the first character after tty, for example, ttyp3.

The cmx command has the following syntax:

/usr/field/cmx [-h] [-o file] [-t min] [-l line]

The cmx command flags are as follows:

-h

Prints a help message for the cmx command.

-o file

Saves output diagnostics in file.

-t min

Specifies how many minutes you want the cmx command to exercise the communications system. If you do not specify the -t flag, the cmx command runs until you terminate it by pressing Ctrl/C in the foreground.

-l line

Specifies the line or lines you want to test. The possible values for line are found in the /dev directory and are the last two characters of the tty device name. For example, to test the communications system for devices named tty02, tty03, and tty14, specify 02, 03, and 14, separated by spaces, for the line variable. The line variable can also specify a range of lines to test, for example, 00-08.

The following example exercises communication lines tty22 and tty34 for 45 minutes in the background:

# cmx -l 22 34 -t45 &

The following example exercises lines tty00 through tty07 until you press Ctrl/C:

# cmx -l 00-07 
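The following Python sketch shows how the line arguments in these examples map to device names in /dev, and how to recognize the pseudodevice lines that cmx cannot test. The helper names expand_lines and is_pseudodevice are hypothetical, and the sketch assumes two-digit decimal line numbers; it does not reproduce how cmx itself parses its arguments.

```python
def expand_lines(args):
    """Expand cmx-style line arguments such as "22" or "00-07" into tty names.

    Hypothetical helper: assumes decimal line numbers whose width is taken
    from the start of a range (for example, "00-07" yields tty00..tty07).
    """
    devices = []
    for arg in args:
        if "-" in arg:
            start, end = arg.split("-", 1)
            first, last = int(start), int(end)
            if first > last:
                raise ValueError(f"invalid range: {arg}")
            width = len(start)
            devices.extend(f"tty{n:0{width}d}" for n in range(first, last + 1))
        else:
            devices.append(f"tty{arg}")
    return devices


def is_pseudodevice(name):
    """Return True if the first character after "tty" is p through z (e.g. ttyp3)."""
    return name.startswith("tty") and len(name) > 3 and name[3] in "pqrstuvwxyz"


print(expand_lines(["22", "34"]))  # ['tty22', 'tty34']
print(expand_lines(["00-03"]))     # ['tty00', 'tty01', 'tty02', 'tty03']
print(is_pseudodevice("ttyp3"))    # True
```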


14.2    Understanding the Event-Logging Facilities

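The Digital UNIX operating system uses two mechanisms to log system events: the system event-logging facility, which logs events in ASCII format, and the binary event-logging facility, which logs detailed event records in binary format.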

The log files that the system and binary event-logging facilities create have the default protection of 640, are owned by root, and belong to the system group. You must have the proper authority to examine the files.

The following sections describe the event-logging facilities.


14.2.1    System Event Logging

The primary systemwide event-logging facility uses the syslog function to log events in ASCII format. The syslog function uses the syslogd daemon to collect the messages that are logged by the various kernel, command, utility, and application programs. The syslogd daemon logs the messages to a local file or forwards the messages to a remote system, as specified in the /etc/syslog.conf file.

When you install your Digital UNIX operating system, the /etc/syslog.conf file is created and specifies the default event-logging configuration. The /etc/syslog.conf file specifies the file names that are the destination for the event messages, which are in ASCII format. Section 14.3.1.1 discusses the /etc/syslog.conf file.


14.2.2    Binary Event Logging

The binary event-logging facility detects hardware and software events in the kernel and logs the detailed information in binary format records. Events that are logged by the binary event-logging facility are also logged by the syslog function in a less detailed, but still informative, summary message.

The binary event-logging facility uses the binlogd daemon to collect various event-log records. The binlogd daemon logs these records to a local file or forwards the records to a remote system, as specified in the /etc/binlog.conf default configuration file, which is created when you install your Digital UNIX system.

With Digital UNIX Version 4.0, the event management utility of choice is the DECevent component, in place of the uerf error logging facility. You can examine the binary event-log files by using the dia command (preferred) or by using the uerf command. Both commands translate the records from binary format to ASCII format.


Note

The uerf facility remains as a component of Digital UNIX, but will be retired in a future release of the operating system. See Appendix D or uerf(8) for more information about using uerf.


The DECevent utility is an event management utility that you can use to produce ASCII reports from entries in the system's event log files. You can run DECevent from the command line or by selecting it from the System Management Utilities menu box.

For information about administering the DECevent utility, see the following Digital UNIX documentation:

Section 14.3.1.2 discusses the /etc/binlog.conf file.


14.3    Configuring Event Logging

When you install your system, the default system and binary event-logging configurations are used. You can change the defaults by modifying the configuration files. If necessary, you can also modify the configuration of the kernel binary event logger.

To enable system and binary event logging, the special files must exist and the event-logging daemons must be running. Refer to Section 14.3.2 and Section 14.3.3 for more information.


14.3.1    Editing the Configuration Files

If you do not want to use the default system or binary event-logging configuration, edit the /etc/syslog.conf or /etc/binlog.conf configuration file to specify how the system should log events. In the files, you specify the facility, which is the source of a message or the part of the system that generates a message; the priority, which is the message's level of severity; and the destination for messages.

The following sections describe how to edit the configuration files.


14.3.1.1    The syslog.conf File

If you want the syslogd daemon to use a configuration file other than the default, you must specify the file name with the syslogd -f config_file command.

The following is an example of the default /etc/syslog.conf file:

#
# syslogd config file
#
# facilities: kern user mail daemon auth syslog lpr binary
# priorities: emerg alert crit err warning notice info debug
#
# [1]    [2]                              [3]
kern.debug               /var/adm/syslog.dated/kern.log
user.debug               /var/adm/syslog.dated/user.log
daemon.debug             /var/adm/syslog.dated/daemon.log
auth.crit;syslog.debug   /var/adm/syslog.dated/syslog.log
mail,lpr.debug           /var/adm/syslog.dated/misc.log
msgbuf.err               /var/adm/crash/msgbuf.savecore
kern.debug               /var/adm/messages
kern.debug               /dev/console
*.emerg                  *

Each /etc/syslog.conf file entry contains the following fields, which correspond to the callouts in the preceding example:

  1. Specifies the facility, which is the part of the system generating the message.

  2. Specifies the severity level. The syslogd daemon logs all messages of the specified severity level plus all messages of greater severity. For example, if you specify level err, all messages of levels err, crit, alert, and emerg or panic are logged.

  3. Specifies the destination where the messages are logged.

The syslogd daemon ignores blank lines and lines that begin with a number sign (#). You can specify a number sign (#) as the first character in a line to include comments in the /etc/syslog.conf file or to disable an entry.

The facility and severity level are separated from the destination by one or more tabs.

You can specify more than one facility and its severity level by separating them with semicolons. In the preceding example, messages from the auth facility of crit severity level and higher and messages from the syslog facility of debug severity level and higher are logged to the /var/adm/syslog.dated/syslog.log file.

You can specify more than one facility by separating them with commas. In the preceding example, messages from the mail and lpr facilities of debug severity level and higher are logged to the /var/adm/syslog.dated/misc.log file.

You can specify the following facilities:

Facility  Description
kern  Messages generated by the kernel. These messages cannot be generated by any user process. 
user  Messages generated by user processes. This is the default facility. 
mail  Messages generated by the mail system. 
daemon  Messages generated by the system daemons. 
auth  Messages generated by the authorization system (for example: login, su, and getty). 
lpr  Messages generated by the line printer spooling system (for example: lpr, lpc, and lpd). 
local0  Reserved for local use, along with local1 to local7. 
mark  Receives a message of priority info every 20 minutes, unless a different interval is specified with the syslogd -m option. 
msgbuf  Kernel syslog message buffer recovered from a system crash. The savecore command and the syslogd daemon use the msgbuf facility to recover system event messages from a crash. 
*  Messages generated by all parts of the system. 

You can specify the following severity levels, which are listed in order of highest to lowest severity:

Severity Level  Description
emerg or panic  A panic condition. You can broadcast these messages to all users. 
alert  A condition that you should immediately correct, such as a corrupted system database. 
crit  A critical condition, such as a hard device error. 
err  Error messages. 
warning or warn  Warning messages. 
notice  Conditions that are not error conditions, but are handled as special cases. 
info  Informational messages. 
debug  Messages containing information that is used to debug a program. 
none  Disables a specific facility's messages. 

You can specify the following message destinations:

Destination  Description
Full pathname  Appends messages to the specified file. You should direct each facility's messages to separate files (for example: kern.log, mail.log, or lpr.log). 
Host name preceded by an at sign (@)  Forwards messages to the syslogd daemon on the specified host. 
List of users separated by commas  Writes messages to the specified users if they are logged in. 
*  Writes messages to all the users who are logged in. 
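To make these matching rules concrete, the following Python sketch reads entries like those in the example file and decides which destinations receive a message of a given facility and severity. It assumes that the selector and destination are separated by tabs, as described above, and that when several selectors in one entry name the same facility, the later one wins (the usual behavior of BSD-derived syslogd; this guide does not state it). The function names are hypothetical.

```python
import re

# Highest to lowest severity, as in the severity table above.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
ALIASES = {"panic": "emerg", "warn": "warning"}


def rank(level):
    """Return 0 for the most severe level and 7 for debug."""
    return SEVERITIES.index(ALIASES.get(level, level))


def selector_matches(selector_field, facility, severity):
    """Decide whether a message matches a field such as "auth.crit;syslog.debug"."""
    decision = False
    for selector in selector_field.split(";"):
        facilities, level = selector.rsplit(".", 1)
        names = facilities.split(",")
        if facility not in names and "*" not in names:
            continue
        # Assumption: a later selector for the same facility overrides an earlier one.
        decision = level != "none" and rank(severity) <= rank(level)
    return decision


def destinations(conf_text, facility, severity):
    """Return the destinations that would receive the message."""
    result = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        selector_field, destination = re.split(r"\t+", line, maxsplit=1)
        if selector_matches(selector_field, facility, severity):
            result.append(destination.strip())
    return result


conf = ("auth.crit;syslog.debug\t/var/adm/syslog.dated/syslog.log\n"
        "mail,lpr.debug\t/var/adm/syslog.dated/misc.log\n"
        "*.emerg\t*\n")
print(destinations(conf, "auth", "err"))    # []
print(destinations(conf, "auth", "alert"))  # ['/var/adm/syslog.dated/syslog.log']
print(destinations(conf, "lpr", "panic"))   # ['/var/adm/syslog.dated/misc.log', '*']
```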

You can specify in the /etc/syslog.conf file that the syslogd daemon create daily log files. To create daily log files, use the following syntax to specify the pathname of the message destination:

/var/adm/syslog.dated/file

The file variable specifies the name of the log file, for example, mail.log or kern.log.

If you specify a /var/adm/syslog.dated/file pathname destination, each day the syslogd daemon creates a subdirectory under the /var/adm/syslog.dated directory and a log file in the subdirectory by using the following syntax:

/var/adm/syslog.dated/date/file

The date variable specifies the day, month, and time that the log file was created.

The file variable specifies the name of the log file you previously specified in the /etc/syslog.conf file.

The syslogd daemon automatically creates a new date directory every 24 hours and also when you boot the system.

For example, to create a daily log file of all mail messages of level info or higher, edit the /etc/syslog.conf file and specify an entry similar to the following:

mail.info		/var/adm/syslog.dated/mail.log

If you specify the previous entry, the syslogd daemon could create the following daily directory and file:

/var/adm/syslog.dated/11-Jan-12:10/mail.log
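The following Python sketch builds a dated log path in the same form as this example. The day-month-hour:minute format, including the zero-padded day, is inferred from the sample directory name and is an assumption; syslogd itself chooses the actual name, and the function name dated_log_path is hypothetical.

```python
from datetime import datetime


def dated_log_path(file_name, created, root="/var/adm/syslog.dated"):
    """Return root/date/file_name, where date looks like "11-Jan-12:10".

    The date format is inferred from the example above (an assumption).
    %b gives the abbreviated month name in the C locale.
    """
    return f"{root}/{created.strftime('%d-%b-%H:%M')}/{file_name}"


print(dated_log_path("mail.log", datetime(1997, 1, 11, 12, 10)))
# /var/adm/syslog.dated/11-Jan-12:10/mail.log
```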


14.3.1.2    The binlog.conf File

If you want the binlogd daemon to use a configuration file other than the default, specify the file name with the binlogd -f config_file command.

The following is an example of a /etc/binlog.conf file:

#
# binlogd configuration file
#
# format of a line:   event_code.priority         destination
#
# where:
# event_code - see codes in binlog.h and man page, * = all events
# priority   - severe, high, low, * = all priorities
# destination - local file pathname or remote system hostname
#
#
*.*			/usr/adm/binary.errlog
dumpfile		/usr/adm/crash/binlogdumpfile
102.high		/usr/adm/disk.errlog
[1]    [2]                     [3]

Each entry in the /etc/binlog.conf file, except the dumpfile event class entry, contains three fields:

  1. Specifies the event class code that indicates the part of the system generating the event.

  2. Specifies the severity level of the event. Do not specify a severity level if you specify dumpfile for an event class.

  3. Specifies the destination where the binary event records are logged.

The binlogd daemon ignores blank lines and lines that begin with a number sign (#). You can specify a number sign (#) as the first character in a line to include comments in the file or to disable an entry.

The event class and severity level are separated from the destination by one or more tabs.

You can specify the following event class codes:

Class Code  General
*  All event classes. 
dumpfile  Specifies the recovery of the kernel binary event log buffer from a crash dump. A severity level cannot be specified. 

Class Code  Hardware-Detected Events
100  CPU machine checks and exceptions 
101  Memory 
102  Disks 
103  Tapes 
104  Device controller 
105  Adapters 
106  Buses 
107  Stray interrupts 
108  Console events 
109  Stack dumps 
199  SCSI CAM events 

Class Code  Software-Detected Events
201  CI port-to-port-driver events 
202  System communications services events 

Class Code  Informational ASCII Messages
250  Generic 

Class Code  Operational Events
300  Startup ASCII messages 
301  Shutdown ASCII messages 
302  Panic messages 
310  Time stamp 
350  Diagnostic status messages 
351  Repair and maintenance messages 

You can specify the following severity levels:

Severity Level  Description
*  All severity levels 
severe  Unrecoverable events that are usually fatal to system operation 
high  Recoverable events or unrecoverable events that are not fatal to system operation 
low  Informational events 

You can specify the following destinations:

Destination  Description
Full pathname  Specifies the file name to which the binlogd daemon appends the binary event records. 
@hostname  Specifies the name of the host (preceded by an @) to which the binlogd daemon forwards the binary event records. If you specify dumpfile for an event class, you cannot forward records to a host. 
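The following Python sketch parses entries in the format of the sample /etc/binlog.conf file and lists the destinations for a record with a given event class and severity. It assumes that a priority matches only records of exactly that severity, because this guide does not say whether binlogd also logs more severe records; see the binlogd(8) reference page before relying on that. The function names are hypothetical.

```python
import re


def parse_binlog_conf(text):
    """Return (event_class, priority, destination) tuples.

    The dumpfile entry takes no severity level, so its priority is None.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        selector, destination = re.split(r"\t+", line, maxsplit=1)
        destination = destination.strip()
        if selector == "dumpfile":
            if destination.startswith("@"):
                raise ValueError("dumpfile records cannot be forwarded to a host")
            entries.append(("dumpfile", None, destination))
        else:
            event_class, priority = selector.split(".", 1)
            entries.append((event_class, priority, destination))
    return entries


def record_destinations(entries, event_class, priority):
    """Destinations for a record such as class 102 (disks), priority "high"."""
    return [dest for cls, pri, dest in entries
            if cls != "dumpfile"
            and cls in ("*", str(event_class))
            and pri in ("*", priority)]  # assumption: exact priority match


conf = ("*.*\t\t\t/usr/adm/binary.errlog\n"
        "dumpfile\t\t/usr/adm/crash/binlogdumpfile\n"
        "102.high\t\t/usr/adm/disk.errlog\n")
entries = parse_binlog_conf(conf)
print(record_destinations(entries, 102, "high"))
# ['/usr/adm/binary.errlog', '/usr/adm/disk.errlog']
print(record_destinations(entries, 102, "low"))
# ['/usr/adm/binary.errlog']
```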


14.3.2    Creating the Special Files

The syslogd daemon cannot log kernel messages unless the /dev/klog character special file exists. If the /dev/klog file does not exist, create it by using the following command syntax:

/dev/MAKEDEV /dev/klog

Also, the binlogd daemon cannot log local system events unless the /dev/kbinlog character special file exists. If the /dev/kbinlog file does not exist, create it by using the following command syntax:

/dev/MAKEDEV /dev/kbinlog

Refer to the MAKEDEV(8) reference page for more information.


14.3.3    Starting and Stopping the Event-Logging Daemons

The syslogd and binlogd daemons are automatically started by the init program during system startup. However, you must ensure that the daemons are started. You can also specify options with the command that starts the daemons. Refer to the init(8) reference page for more information.


14.3.3.1    The syslogd Daemon

You must ensure that the syslogd daemon is started by the init program. If the syslogd daemon is not started or if you want to specify options with the command that starts the syslogd daemon, you must edit the /sbin/init.d/syslog file and either include or modify the syslogd command line. Note that you can also invoke the command manually.

The command that starts the syslogd daemon has the following syntax:

/usr/sbin/syslogd [-d] [-f config_file] [-m mark_interval]

Refer to the syslogd(8) reference page for information about command options.


Note

You must ensure that the /var/adm directory is mounted, or the syslogd daemon will not work correctly.


The syslogd daemon reads messages from the following:

Messages from other programs use the openlog, syslog, and closelog calls.
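Python's standard syslog module wraps these same openlog, syslog, and closelog calls, so it offers a quick way to generate a test message and confirm that your /etc/syslog.conf entries route it where you expect. In the following sketch, the identifier evtest, the local0 facility, and the helper name send_test_message are illustrative choices, not part of Digital UNIX. With the default configuration shown earlier, a local0 message at warning level is not written to any file, because no entry selects local0 or all facilities at that level.

```python
import syslog


def send_test_message(text, priority=syslog.LOG_WARNING, facility=syslog.LOG_LOCAL0):
    """Log one message through openlog/syslog/closelog and return its priority value.

    The returned value combines facility and severity the way syslog(3) does.
    """
    syslog.openlog(ident="evtest", logoption=syslog.LOG_PID, facility=facility)
    try:
        syslog.syslog(priority, text)
    finally:
        syslog.closelog()
    return facility | priority


send_test_message("event-logging test message")
```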

When the syslogd daemon is started, it creates the /var/run/syslog.pid file, where the syslogd daemon stores its process identification number. Use the process identification number to stop the syslogd daemon before you shut down the system.

During normal system operation, the syslogd daemon is called if data is put in the kernel syslog message buffer, located in physical memory. The syslogd daemon reads the /dev/klog file and gets a copy of the kernel syslog message buffer. The syslogd daemon starts at the beginning of the buffer and sequentially processes each message that it finds. Each message is prefixed by facility and priority codes, which are the same as those specified in the /etc/syslog.conf file. The syslogd daemon then sends the messages to the destinations specified in the file.

To stop the syslogd event-logging daemon, use the following command:

# kill `cat /var/run/syslog.pid` 

You can apply changes that you make to the /etc/syslog.conf configuration file without shutting down the system by using the following command:

# kill -HUP `cat /var/run/syslog.pid` 
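The two kill commands above read the daemon's process ID from its PID file and send it a signal: the default SIGTERM to stop it, or SIGHUP to make it reread its configuration file. A minimal Python equivalent follows; the helper name signal_daemon is hypothetical, and the same function works with /var/run/binlogd.pid.

```python
import os
import signal


def signal_daemon(pid_file, sig=signal.SIGHUP):
    """Send sig to the process whose ID is stored in pid_file; return that ID.

    signal_daemon("/var/run/syslog.pid") corresponds to
        kill -HUP `cat /var/run/syslog.pid`
    and signal_daemon("/var/run/syslog.pid", signal.SIGTERM) to
        kill `cat /var/run/syslog.pid`
    """
    with open(pid_file) as f:
        pid = int(f.read().split()[0])
    os.kill(pid, sig)
    return pid
```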


14.3.3.2    The binlogd Daemon

You must ensure that the init program starts the binlogd daemon. If the binlogd daemon does not start, or if you want to specify options with the command that starts the binlogd daemon, you must edit the /sbin/init.d/syslog file and either include or modify the binlogd command line. Note that you can also invoke the command manually.

The command that starts the binlogd daemon has the following syntax:

/usr/sbin/binlogd [-d] [-f config_file]

Refer to the binlogd(8) reference page for information on command options.

The binlogd daemon reads binary event records from the following:

When the binlogd daemon starts, it creates the /var/run/binlogd.pid file, where the binlogd daemon stores its process identification number. Use the process identification number to stop or reconfigure the binlogd daemon.

During normal system operation, the binlogd daemon is called if data is put into the kernel's binary event-log buffer or if data is received on the Internet domain socket. The binlogd daemon then reads the data from the /dev/kbinlog special file or from the socket. Each record contains an event class code and a severity level code. The binlogd daemon processes each binary event record and logs it to the destination specified in the /etc/binlog.conf file.

To stop the binlogd daemon, use the following command:

# kill `cat /var/run/binlogd.pid` 

You can apply changes that you make to the /etc/binlog.conf configuration file without shutting down the system by using the following command:

# kill -HUP `cat /var/run/binlogd.pid` 


14.3.4    Configuring the Kernel Binary Event Logger

You can configure the kernel binary event logger by modifying the default keywords and rebuilding the kernel. You can scale the size of the kernel binary event-log buffer to meet your system's needs. You can also enable and disable the binary event logger and the logging of kernel ASCII messages into the binary event log.

The /sys/data/binlog_data.c file defines the binary event-logger configuration. The default configuration specifies a buffer size of 24K bytes, enables binary event logging, and disables the logging of kernel ASCII messages. You can modify the configuration by changing the values of the binlog_bufsize and binlog_status keywords in the file.

The binlog_bufsize keyword specifies the size of the kernel buffer that the binary event logger uses. The size of the buffer can be between 8 kilobytes (8192 bytes) and 48 kilobytes (49152 bytes). Small system configurations, such as workstations, can use a small buffer. Large server systems that use many disks may need a large buffer.

The binlog_status keyword specifies the behavior of the binary event logger. You can specify the following values for the binlog_status keyword:

0 (zero)

Disables the binary event logger.

BINLOG_ON

Enables the binary event logger.

BINLOG_ASCIION

Enables the logging of kernel ASCII messages into the binary event log if the binary event logger is enabled. This value must be specified with the BINLOG_ON value as follows: int binlog_status = BINLOG_ON | BINLOG_ASCIION;

After you modify the /sys/data/binlog_data.c file, you must rebuild and boot the new kernel.
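Before rebuilding, you might check the values against the rules above. The following Python sketch models them: binlog_bufsize must be between 8192 and 49152 bytes, and BINLOG_ASCIION has an effect only when combined with BINLOG_ON. The numeric flag values are hypothetical stand-ins for the constants defined in the kernel headers, and check_binlog_config is not a Digital UNIX tool; the actual configuration remains the C code in /sys/data/binlog_data.c.

```python
from enum import IntFlag


class BinlogStatus(IntFlag):
    """Models binlog_status; the bit values here are hypothetical."""
    OFF = 0          # 0 (zero): binary event logger disabled
    ON = 0x1         # BINLOG_ON
    ASCIION = 0x2    # BINLOG_ASCIION


MIN_BUFSIZE = 8192   # 8 kilobytes
MAX_BUFSIZE = 49152  # 48 kilobytes


def check_binlog_config(bufsize, status):
    """Return a list of problems with a binlog_bufsize/binlog_status pair."""
    problems = []
    if not MIN_BUFSIZE <= bufsize <= MAX_BUFSIZE:
        problems.append(f"binlog_bufsize {bufsize} is outside {MIN_BUFSIZE}-{MAX_BUFSIZE}")
    if status & BinlogStatus.ASCIION and not status & BinlogStatus.ON:
        problems.append("BINLOG_ASCIION requires BINLOG_ON")
    return problems


# Default configuration: 24K buffer, binary logging on, ASCII logging off.
print(check_binlog_config(24 * 1024, BinlogStatus.ON))  # []
```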


14.4    Recovering Event Logs After a System Crash

You can recover unprocessed messages and binary event-log records from a system crash when you reboot the system.

The msgbuf.err entry in the /etc/syslog.conf file specifies the destination of the kernel syslog message buffer msgbuf that is recovered from the dump file. The default /etc/syslog.conf file entry for the kernel syslog message buffer file is as follows:

msgbuf.err            /var/adm/crash/msgbuf.savecore

The dumpfile entry in the /etc/binlog.conf file specifies the file name destination for the kernel binary event-log buffer that is recovered from the dump file. The default /etc/binlog.conf file entry for the kernel binary event-log buffer file is as follows:

dumpfile              /usr/adm/crash/binlogdumpfile

If a crash occurs, the syslogd and binlogd daemons cannot read the /dev/klog and /dev/kbinlog special files and process the messages and binary event records. When you reboot the system, the savecore command runs and, if a dump file exists, recovers the kernel syslog message and binary event-log buffers from the dump file. After savecore runs, the syslogd and binlogd daemons are started.

The syslogd daemon reads the syslog message buffer file, checks that its data is valid, and then processes it in the same way that it normally processes data from the /dev/klog file, using the information in the /etc/syslog.conf file.

The binlogd daemon reads the binary event-log buffer file, checks that its data is valid, and then processes the file in the same way that it processes data from the /dev/kbinlog special file, using the information in the /etc/binlog.conf file.

After the syslogd and binlogd daemons are finished with the buffer files, the files are deleted.


14.5    Maintaining Log Files

If you specify full pathnames for the message destinations in the /etc/syslog.conf and /etc/binlog.conf files, the log files will grow in size. Also, if you configure the syslogd daemon to create daily directories and log files, eventually there will be many directories and files, although the files themselves will be small. Therefore, you must keep track of the size and the number of log files and daily directories and delete the files and directories if they become unwieldy.

You can also use the cron daemon to delete old log files automatically. The following is an example of a crontab file entry:

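5 1 * * * find /var/adm/syslog.dated/* -type d -prune -mtime +5 -exec rm -rf '{}' \;

The previous command line causes all directories under /var/adm/syslog.dated that were modified more than 5 days ago to be deleted, along with their contents, every day at 1:05 a.m. The -prune operator keeps find from descending into the directories that it deletes, and the /var/adm/syslog.dated/* argument ensures that the /var/adm/syslog.dated directory itself is never removed. Refer to the crontab(1) reference page for more information.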
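If you prefer a script to a find command, the following Python sketch applies the same policy: it removes the dated subdirectories of /var/adm/syslog.dated whose age, counted in whole days the way find -mtime +5 counts it, exceeds the limit, and it never removes the top-level directory. The function name prune_dated_dirs is hypothetical.

```python
import os
import shutil
import time


def prune_dated_dirs(root="/var/adm/syslog.dated", days=5, now=None):
    """Delete subdirectories of root (with their contents) older than `days` days.

    Like find -mtime +N, age is counted in whole 24-hour periods, so a
    directory is removed only when that count exceeds `days`. Symbolic
    links and plain files directly under root are left alone.
    Returns the sorted names of the removed directories.
    """
    now = time.time() if now is None else now
    with os.scandir(root) as it:
        entries = list(it)  # list first so deletions do not disturb the scan
    removed = []
    for entry in entries:
        if not entry.is_dir(follow_symlinks=False):
            continue
        age_days = int((now - entry.stat(follow_symlinks=False).st_mtime) // 86400)
        if age_days > days:
            shutil.rmtree(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```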


14.6    Environmental Monitoring

On any system, thermal levels can rise because of poor ventilation, overheating conditions, or fan failure. If these conditions go undetected, an unscheduled shutdown could result, causing loss of data or damage to the system itself. Environmental Monitoring detects the thermal state of AlphaServer systems and alerts users in time to recover or to perform an orderly shutdown of the system.

This section discusses how Environmental Monitoring is implemented on AlphaServer systems.


14.6.1    Environmental Monitoring Framework

The Environmental Monitoring framework consists of four components: a loadable kernel module and its associated APIs, the Server System MIB subagent daemon, the envmond daemon, and the envconfig utility.


14.6.1.1    Loadable Kernel Module

The loadable kernel module and its associated APIs contain the parameters needed to monitor and return status on your system's threshold levels. The kernel module exports server management attributes, described in Section 14.6.1.1.1, only through the kernel configuration manager (CFG) interface. It works across all platforms that support server management and provides compatibility for other server management systems under development. The kernel module is supported on all Alpha systems running Version 4.0A or higher of the Digital UNIX operating system.

The loadable kernel module does not include platform-specific code (such as the location of status registers), and it does not need to know which options a platform supports. Instead, the kernel module and platform are designed to return valid data if an option is supported, a fixed constant for unsupported options, or null.


14.6.1.1.1    Specifying Loadable Kernel Attributes

The loadable kernel module exports the parameters listed in Table 14-12 to the kernel configuration manager (CFG).

Table 14-12: Parameters Defined in the Kernel Module
Parameter  Purpose
env_current_temp  Specifies the current temperature of the system. If a system is configured with the KCRCM module, the temperature returned is in Celsius. If a system does not support temperature readings and a temperature threshold has not been exceeded, a value of -1 is returned. If a system does not support temperature readings and a temperature threshold is exceeded, a value of -2 is returned. 
env_high_temp_thresh  Provides a system-specific operating temperature threshold. The value returned is a hard-coded, platform-specific temperature in Celsius. 
env_fan_status  Specifies a noncritical fan status. The value returned is a bit value of zero (0). This value will differ when the hardware support is provided for this feature. 
env_ps_status  Provides the status of the redundant power supply. On platforms that provide interrupts for redundant power supply failures, the corresponding error status bits are read to determine the return value. A value of 1 is returned on error; otherwise, a value of zero (0) is returned. 
env_supported  Indicates whether or not the platform supports server management and environmental monitoring. 
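The special values in Table 14-12 are easy to misread, so the following Python sketch shows one way a monitoring script might interpret env_current_temp against env_high_temp_thresh, and env_ps_status. The function names are hypothetical, and the sketch assumes that only a reading greater than the threshold counts as exceeding it.

```python
NO_READING = -1           # temperature not supported, threshold not exceeded
NO_READING_EXCEEDED = -2  # temperature not supported, threshold exceeded


def temperature_state(current_temp, high_thresh):
    """Classify env_current_temp as "ok", "no_reading", or "too_hot"."""
    if current_temp == NO_READING_EXCEEDED:
        return "too_hot"
    if current_temp == NO_READING:
        return "no_reading"
    # Assumption: only a reading above the threshold (both in Celsius) exceeds it.
    return "too_hot" if current_temp > high_thresh else "ok"


def power_supply_failed(ps_status):
    """env_ps_status is 1 on a redundant power supply error, otherwise 0."""
    return ps_status == 1
```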


14.6.1.1.2    Obtaining Platform Specific Functions

The loadable kernel module must return environmental status for the platform being queried. This section describes the kernel interfaces it uses. To obtain environmental status, the module calls the get_info() function. Calls to the get_info() function are filtered through the platform_callsw[] table.

The get_info() function obtains dynamic environmental data using the function types described in Table 14-13.

Table 14-13: get_info() Function Types
Function Type  Use of Function
GET_SYS_TEMP  Reads the system's internal temperature on platforms that have a KCRCM module configured. 
GET_FAN_STATUS  Reads fan status from error registers. 
GET_PS_STATUS  Reads redundant power supply status from error registers. 

The get_info() function obtains static data using the HIGH_TEMP_THRESH function type, which reads the platform-specific upper threshold operational temperature.
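The following Python sketch models the idea behind filtering get_info() calls through a per-platform table: the caller asks for a function type, and the platform either supplies a handler or the call returns a fixed constant. It is only an illustration; the real platform_callsw[] table is kernel C code, and the table contents, the UNSUPPORTED value, and the sample readings below are hypothetical.

```python
GET_SYS_TEMP, GET_FAN_STATUS, GET_PS_STATUS, HIGH_TEMP_THRESH = range(4)
UNSUPPORTED = -1  # hypothetical fixed constant for an unsupported function type


def make_get_info(platform_callsw):
    """Return a get_info(func_type) that dispatches through one platform's table."""
    def get_info(func_type):
        handler = platform_callsw.get(func_type)
        return UNSUPPORTED if handler is None else handler()
    return get_info


# Hypothetical platform: KCRCM module present, no redundant power supply support.
example_callsw = {
    GET_SYS_TEMP: lambda: 31,      # sample reading in Celsius
    GET_FAN_STATUS: lambda: 0,
    HIGH_TEMP_THRESH: lambda: 50,  # sample platform threshold
}
get_info = make_get_info(example_callsw)
print(get_info(GET_SYS_TEMP), get_info(GET_PS_STATUS))  # 31 -1
```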


14.6.1.1.3    Server System MIB Subagent

The Server System MIB subagent, which is an eSNMP subagent, exports a subset of the Environmental Monitoring parameters specified in the Server System MIB. The Digital Server System MIB exports a common set of hardware-specific parameters across all server platforms on all operating systems offered by Digital. Table 14-14 maps the subset of Server System MIB variables that support Environmental Monitoring to the kernel parameters described in Section 14.6.1.1.1.

Table 14-14: Mapping of Server Subsystem Variables
Server System MIB Variable Name  Kernel Module Parameter
svrThSensorReading  env_current_temp 
svrThSensorStatus  env_current_temp 
svrThSensorHighThresh  env_high_temp_thresh 
svrPowerSupplyStatus  env_ps_status 
svrFanStatus  env_fan_status 
An SNMP MIB compiler and other tools are used to compile the MIB description into code for a skeletal subagent daemon. Communication between the subagent daemon and the eSNMP daemon is handled by interfaces in the eSnmp shared library (libesnmp.so). The subagent daemon must be started when the system boots and after the eSNMP daemon has started.

For each Server System MIB variable listed in Table 14-14, code is provided in the subagent daemon, which accesses the appropriate parameter from the kernel module through the CFG interface.


14.6.1.2    Monitoring Environmental Thresholds

To monitor the system environment, the envmond daemon is used. You can customize the daemon by using the envconfig utility. The following sections discuss the daemon and utility. For more information, see the envmond and envconfig reference pages.


14.6.1.2.1    Environmental Monitoring Daemon

By using the Environmental Monitoring daemon, envmond, you can check threshold levels and take corrective action before damage occurs to your system. The envmond daemon performs the following:
To query the system, the envmond daemon uses the base operating system command /usr/sbin/snmp_request to obtain the current values of the environment variables specified in the Server System MIB.

To enable Environmental Monitoring, the envmond daemon must be started during the system boot, but after the eSNMP and Server System MIB agents have been started. You can customize the envmond daemon using the envconfig utility.


14.6.1.2.2    Customizing the envmond Daemon

You can use the envconfig utility to customize how the envmond daemon queries the environment. These customizations are stored in the /etc/rc.config file, which the envmond daemon reads during startup. Use the envconfig utility to perform the following: