3    Editing Files with the sed Editor

The sed stream editor is a program that works much like the interactive ed program, but you do not need to know how to use the ed line editor to use the material presented here. Unlike ed, sed edits files by using a prepared list of commands, called a script, instead of interacting with the user. This method of operation makes sed particularly well suited for tasks like the following:


3.1    Overview of the sed Editor

The sed stream editor receives its input from standard input or from a named file, changes that input as directed by commands in a command file or on the command line, and writes the resulting stream to standard output. If you specify more than one input file, sed processes each file in sequence and concatenates the results to standard output. If you do not provide a command file and do not use any flags with the sed command, sed copies standard input to standard output without change. The editor keeps only a few lines of the file being edited in memory at one time and does not use temporary files. Therefore, the size of the file to be edited is limited only by the available disk space.
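The following sketch illustrates the input handling described above. The file names part1 and part2, their contents, and the temporary directory are assumptions made only for this illustration.

```shell
# Illustrative only: part1 and part2 are hypothetical file names.
tmpdir=$(mktemp -d)
printf 'first file\n' > "$tmpdir/part1"
printf 'second file\n' > "$tmpdir/part2"

# Two input files: sed edits each in turn and writes one combined stream.
from_files=$(sed 's/file/input/' "$tmpdir/part1" "$tmpdir/part2")

# No input file named: sed reads standard input instead.
from_stdin=$(printf 'first file\n' | sed 's/file/input/')

printf '%s\n' "$from_files" "$from_stdin"
rm -r "$tmpdir"
```

In both cases the input files themselves are left untouched; only the stream written to standard output is edited.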

The command script for sed can be a file that you create before running the editor, a series of commands you enter as a command flag, or both. The editor cannot process more than 99 commands in a single invocation; for this reason or to accomplish certain extremely complex editing tasks, you might need to pipe the output from sed into another instance of sed.


3.2    Running the sed Editor

The syntax for the sed command is as follows:

sed [-n] [[-e] script] [-f script_file] [source_file1 [source_file2 ...]]

Table 3-1 describes the flags for the sed command.

Table 3-1:  Flags for the sed Command

Flag Description
-e script Adds the editing commands specified by the string script to the end of the script of editing commands. If you are using just one -e flag and no -f flag, you can omit the -e flag and include the single script on the command line as an argument to sed.
-f script_file Uses script_file as the source of the edit script. The script_file is a set of editing commands to be applied to the input.
-n Suppresses all information normally written to standard output.
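As a brief illustration of the -n flag, the following sketch runs the same p command (described in Table 3-3) with and without the flag. The three-line input is an assumption chosen for the example.

```shell
# Without -n, every line is written automatically, so p writes line 2 twice.
default_out=$(printf 'one\ntwo\nthree\n' | sed '2p')

# With -n, only the line that p explicitly prints is written.
quiet_out=$(printf 'one\ntwo\nthree\n' | sed -n '2p')

printf '%s\n---\n%s\n' "$default_out" "$quiet_out"
```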

The order of presentation of the -e and -f flags is important. Usually, you create a command file containing the desired editing commands before running sed. The sed editor's command set is powerful and requires little typing. Each command in the command file can be on a separate line, or you can place multiple commands on one line by separating them with semicolons ( ; ). For example, either of the following two scripts will delete all lines beginning with .ne, .RE, or .RS:

Script 1:

/^\.ne/d
/^\.R[ES]/d

Script 2:

/^\.ne/d;/^\.R[ES]/d

Once you create the command file (cmdfile in the following example), enter the sed command as in this example:

$ sed -f cmdfile infile > outfile

This command edits infile using the commands contained in cmdfile, and writes the output to outfile. The input file is not changed.

With a short editing script, you can accomplish the same job by entering the editing commands as an argument to the -e flag on the command line:


$ sed -e '/^\.ne/d;/^\.R[ES]/d' infile > outfile
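To try both forms on sample data, a sketch such as the following can be used. The contents of infile are hypothetical, and the temporary directory is an assumption of the example.

```shell
tmpdir=$(mktemp -d)

# Hypothetical input containing troff-style requests.
printf '.ne 4\nText line\n.RS\nIndented text\n.RE\n.sp\n' > "$tmpdir/infile"

# Script 1 stored in a command file, one command per line.
printf '%s\n' '/^\.ne/d' '/^\.R[ES]/d' > "$tmpdir/cmdfile"

# The same edit, first from the command file and then from the command line.
with_file=$(sed -f "$tmpdir/cmdfile" "$tmpdir/infile")
with_flag=$(sed -e '/^\.ne/d;/^\.R[ES]/d' "$tmpdir/infile")

printf '%s\n' "$with_file"
rm -r "$tmpdir"
```

Both invocations should produce the same three lines: the two text lines and the .sp request, which neither pattern matches.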

If you use the -e and -f flags together on a command line, sed applies all the commands specified by both flags, in the order in which the flags appear. For example:

$ echo "s/line/foo/" > sedx
$ echo "Test line" | sed -f sedx -e 's/line/bar/'
Test foo
$ echo "Test line" | sed -e 's/line/bar/' -f sedx
Test bar

You can use the -e and -f flags more than once with a given sed command. For example:

$ sed -f script1 -e 's/foo/bar/' -f script2 msgs > msgs2

When you start sed, the editor reads and compiles the command script, checking for syntax and organizing the commands for efficiency. It then reads the input file one line at a time into an area of memory called the pattern space. The editor then tries to match the addresses specified by the commands in the script, one after another, to the lines in the pattern space. Whenever a command's address matches any line or lines in the pattern space, sed applies that editing command to the matched text.

Commands are applied in sequence to the text, and the results of each command are used as the input for subsequent commands. When no more commands match a given line in the pattern space, sed writes that line to the output, reads more input, and repeats the process. Figure 3-1 is a flowchart of this sequence. Compare the operation of sed with the very similar operation of the awk program, shown in Figure 2-1.
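The way each command works on the result of the previous one can be seen in a small sketch. The input text and the substitutions are assumptions chosen for illustration.

```shell
# The second command sees the output of the first: cat -> dog -> bird.
chained=$(echo 'the cat sat' | sed -e 's/cat/dog/' -e 's/dog/bird/')

# In the opposite order, s/dog/bird/ runs before any "dog" exists.
reordered=$(echo 'the cat sat' | sed -e 's/dog/bird/' -e 's/cat/dog/')

printf '%s\n%s\n' "$chained" "$reordered"
```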

Figure 3-1:  Sequence of sed Processing

Some editing commands change the way the editing process operates by causing the editor to bypass other script commands, by inhibiting the writing of certain lines (by deleting them), or by ending the process prematurely.
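As a rough illustration, the following sketch combines a command that inhibits output (d) with one that ends processing early (q). The four-line input is hypothetical.

```shell
# d deletes line 2 and skips the rest of the script for that line;
# q writes line 3 and then stops, so line 4 is never read.
out=$(printf 'keep 1\ndrop 2\nkeep 3\nkeep 4\n' | sed -e '/drop/d' -e '3q')

printf '%s\n' "$out"
```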


3.3    Selecting Lines for Editing

The sed editor identifies lines to be edited by matching addresses. An address can be either a line number or a context address:

You can specify any character as a pattern delimiter for a given command by preceding the first use of the character with a backslash. For example, the following two patterns are interpreted identically:

/abc/
\xabcx

In the second pattern, the letter x is used as the pattern delimiter. If you use an alternative pattern delimiter in this way, you can match a literal occurrence of that character by preceding it with a backslash; the pattern \x\xyzx matches the string "xyz".
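An alternative delimiter is most useful when the pattern itself contains slashes. The following sketch selects the same line twice, once with escaped slashes and once with the number sign ( # ) as the delimiter. The path names are assumptions used only as sample input.

```shell
# Default delimiter: every slash in the pattern must be escaped.
slashes=$(printf '/usr/bin/sed\n/etc/passwd\n' | sed -n '/^\/usr\//p')

# Alternative delimiter #: the slashes need no escaping.
hashes=$(printf '/usr/bin/sed\n/etc/passwd\n' | sed -n '\#^/usr/#p')

printf '%s\n%s\n' "$slashes" "$hashes"
```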

The sed editor recognizes the standard set of basic regular expressions described in Chapter 1. In addition to these expressions, sed recognizes the special expressions shown in Table 3-2.

Table 3-2:  Special Regular Expressions Recognized by sed

Expression Name Rule
\n Embedded newline (a backslash followed by the letter n) Matches an embedded newline character in a line formed by joining multiple lines.
// Empty pattern delimiters (slashes by default) Matches the text that matched the most recently specified regular expression.
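The two special expressions can be tried with a sketch like the following. The input lines are hypothetical, and the N command used to join lines is the one described in Table 3-3.

```shell
# N joins each line to the next; \n then matches the embedded newline.
joined=$(printf 'key\nvalue\n' | sed 'N;s/\n/=/')

# The empty pattern // in the s command reuses the most recently
# specified regular expression, here the address /colou*r/.
reused=$(printf 'colour\nshape\n' | sed '/colou*r/s//hue/')

printf '%s\n%s\n' "$joined" "$reused"
```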

Some sed commands do not accept addresses. Commands that accept addresses behave differently depending on the number of addresses, as follows:

Note

If two addresses are specified but sed cannot find a line matching the ending address, sed operates on every line from the first address to the end of the file.
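The effect of one address, two addresses, and an ending address that never matches can be compared with a sketch such as this one. The four-line input and the pattern omega are assumptions of the example.

```shell
input='1 alpha
2 beta
3 gamma
4 delta'

# One address: the command applies only to line 2.
one=$(printf '%s\n' "$input" | sed -n '2p')

# Two addresses: the command applies to lines 2 through 3.
two=$(printf '%s\n' "$input" | sed -n '2,3p')

# Ending address never matches: the range runs to the end of the input.
open_ended=$(printf '%s\n' "$input" | sed -n '2,/omega/p')

printf '%s\n---\n%s\n---\n%s\n' "$one" "$two" "$open_ended"
```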


3.4    Summary of sed Commands

Each sed command consists of a single letter with optional addresses. Some commands require arguments and accept qualifiers that alter their behavior. Do not include any space between the addresses and the letter. If you use two addresses with a command, separate them with a comma. The r and w commands and the w flag for the s command require a single space between the letter and the argument; otherwise, do not include any space between the letter and the argument.

Table 3-3, Table 3-4, and Table 3-5 describe the individual sed commands, showing the syntax of each. In these tables, the following conventions apply:

The following example illustrates a correctly formed s command with all optional elements:

1,/^$/s/vizier//g

This example processes the header of a mail message (line 1 to the first completely blank line), replacing the string vizier with nothing wherever the string occurs on any line in the specified range.
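A sketch of this command applied to a small message follows. The message text and addresses are hypothetical; only the sed command itself comes from the example above.

```shell
# Hypothetical mail message: two header lines, a blank line, and a body.
msg='From: vizier@example.com
To: grand-vizier@example.com

Ask the vizier.'

# The range 1,/^$/ covers the header only, so the body keeps its "vizier".
edited=$(printf '%s\n' "$msg" | sed '1,/^$/s/vizier//g')

printf '%s\n' "$edited"
```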

Table 3-3:  Text Editing and Movement Commands

Command Description
Append text

[addr1]a\
text[\
text...]

Writes the specified text [Footnote 2] to the output after the line specified by addr1. See also the i command.
Change lines

[addr1[,addr2]]c\
text[\
text...]

Deletes the addressed range of lines and writes the specified text [Footnote 2] to the output in its place. [Footnote 3]
Delete lines

[addr1[,addr2]]d

Deletes the specified range of lines. [Footnote 3]
Delete the first line of the pattern space

[addr1[,addr2]]D

Deletes all text in the pattern space up to and including the first newline character. If only one line is in the pattern space, this command reads another line from the input into the pattern space. After these operations, the command starts the complete list of editing commands again from the beginning.
Insert lines

[addr1]i\
text[\
text...]

Writes the specified text [Footnote 2] to the output before the line specified by addr1. See also the i command.
Advance in the file

[addr1[,addr2]]n

Writes the indicated range from the pattern space (if not deleted) to the output and then reads the next line from the input into the pattern space.
Join lines

[addr1[,addr2]]N

Joins the indicated lines together as a single line with embedded newline characters. If only one address is given, the command joins the specified line to the next line in the input stream. Pattern matches for addressing or for string replacement can extend across embedded newline characters. Use \n to indicate an embedded newline character for matching.
Print lines

[addr1[,addr2]]p

Writes the specified range of lines to the output at the point in the editing process where the p command appears. This command can be used to reorder sections of a file.
Print the first line in the pattern space

[addr1[,addr2]]P

Writes all text in the pattern space, up to and including the first newline character, to the output at the point in the editing process where the P command appears.
Read and append a file

[addr1]r file

Reads the named file [Footnote 4] and writes the file's contents to the output after addr1.
Substitute text

[addr1[,addr2]]s/expr/string/[flags]

Searches the indicated lines for a string of characters matching the expression defined by expr, and replaces that set of characters with string. This command's operation is modified by the g, p, and w file flags. If either expr or string includes a slash ( / ), you must escape the literal slash with a backslash (s/path/path\/file/) or use alternative delimiters such as the at sign ( @ ) or question mark ( ? ). For example, s@path@path/file@ replaces path with path/file. [Footnote 5]
Write a named file

[addr1[,addr2]]w file

Writes the specified range of lines to the named file [Footnote 6] at the point in the editing process where the w command appears.
Print line number

[addr1]=

Writes the line number of the indicated line to the output.
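A few of the commands in Table 3-3 can be tried with a sketch like the following, which assumes a two-line input. The text for the i and a commands is placed on the line after the backslash.

```shell
# i writes text before line 2; a writes text after line 2.
out=$(printf 'one\ntwo\n' | sed '2i\
inserted
2a\
appended')

# = writes the number of the matching line; -n suppresses the line itself.
lineno=$(printf 'one\ntwo\n' | sed -n '/two/=')

printf '%s\n---\n%s\n' "$out" "$lineno"
```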

Table 3-4:  Buffer Manipulation Commands

Command Description
Retrieve text from hold area

[addr1[,addr2]]g

[addr1[,addr2]]G

Copies the contents of the hold area to the pattern space indicated by addr1 and addr2, if present. The g command destroys the existing contents of the pattern space; the G command appends the held text to the contents of the pattern space, separating the previous text from the appended text with a newline character.
Move text to the hold area

[addr1[,addr2]]h

[addr1[,addr2]]H

Copies the indicated range from the pattern space to the hold area. The h command destroys the existing contents of the hold area; the H command appends the text in the pattern space to the contents of the hold area, separating the previous text from the appended text with a newline character.
Exchange pattern space and hold area

[addr1[,addr2]]x

Exchanges the contents of the pattern space with those of the hold area.
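One well-known use of the hold area is to write the lines of a file in reverse order. The following sketch is an illustration only and assumes a three-line input.

```shell
# 1!G  on every line except the first, append the held text to the pattern space
# h    copy the pattern space (the lines so far, newest first) to the hold area
# $p   on the last line, print the accumulated text
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

Because the whole input accumulates in the hold area, this approach suits small files better than very large ones.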

Table 3-5:  Flow-of-Control Commands

Command Description
Range negation

[addr1[,addr2]]!cmd

The exclamation point ( ! ) instructs sed to apply the command following it on the same line to the parts of the input file that are not selected by addr1 and addr2 instead of applying it to the selected range.
Command grouping

[addr1[,addr2]]{
nested commands
}

The left and right braces enclose a group of commands to be applied as a set to the range specified by addr1 and addr2. The first command in the set can be on the line following the left brace, as illustrated in this table, or it can be on the same line with the brace. The right brace must be on a line by itself. Groups can be nested within other groups.
Label

:label

Marks a place in the stream of editing commands to be used as a destination of a branch command. The label is a string of up to 8 bytes. Each label in the editing stream must be unique. For a related discussion, see the description of the t command in the sed(1) reference page.
Branch

blabel

Branches to the point in the editing script indicated by label and continues processing the current input line with the commands following the label. If label is null, the b command bypasses the rest of the editing script, reads a new input line, and starts the editing script over from the beginning.
Conditional branch

tlabel

If any successful substitutions were made on the current input line, branches to label; otherwise, the command does not branch. In either case, the command clears the flag that indicates a substitution was made. This flag is also cleared at the start of each new input line. If label is null and the branch is taken, the t command bypasses the rest of the editing script, reads a new input line, and starts the editing script over from the beginning.
Stop

[addr1]q

Stops editing in an orderly fashion by writing the current line to the output, writing any appended or read text to the output, and then exiting.
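The flow-of-control commands in Table 3-5 can be sketched as follows. The input data, the BEGIN and END markers, and the label name strip are assumptions made for the example.

```shell
# Negation: delete every line that is NOT in the range 2,3.
negated=$(printf 'a\nb\nc\nd\n' | sed '2,3!d')

# Grouping: within BEGIN..END, print only lines that do not begin with #.
grouped=$(printf 'x\nBEGIN\n#note\ndata\nEND\ny\n' | sed -n '/BEGIN/,/END/{
/^#/!p
}')

# Label and conditional branch: remove one leading zero per pass and
# branch back to the label for as long as a substitution was made.
looped=$(printf '%s\n' 000123 0 42 | sed ':strip
s/^0\([0-9]\)/\1/
tstrip')

printf '%s\n---\n%s\n---\n%s\n' "$negated" "$grouped" "$looped"
```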


3.5    String Replacement

The s command performs string replacement on the indicated lines in the input file. If the editor finds a string of characters in the input file that satisfies the pattern expression expr, it replaces that string with the set of characters specified in string. The string argument is not a regular expression, and it is not scanned or otherwise interpreted except as follows:

You can modify the behavior of the s command with flags, as follows:

Any or all of these flags can be used with a given s command; in combinations, the w flag must be the last flag specified.
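The following sketch shows the g, p, and w flags in use, relying on their usual sed behavior. The input strings, the file name changed, and the temporary directory are assumptions of the example.

```shell
tmpdir=$(mktemp -d)

# Without g, only the first match on a line is replaced; with g, all are.
first_only=$(echo 'aaa' | sed 's/a/b/')
global=$(echo 'aaa' | sed 's/a/b/g')

# With -n, the p flag prints only lines on which a substitution was made.
# The w flag, given last, also writes those lines to the named file.
printed=$(printf 'aaa\nxyz\n' | sed -n "s/a/b/gpw $tmpdir/changed")
written=$(cat "$tmpdir/changed")

printf '%s\n%s\n%s\n%s\n' "$first_only" "$global" "$printed" "$written"
rm -r "$tmpdir"
```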