3 Editing Files with the sed Editor

The sed stream editor is a program that works much like the interactive ed program, but you do not need to know how to use the ed line editor to use the material presented here. Unlike ed, sed edits files by using a prepared list of commands, called a script, instead of interacting with the user. This method of operation makes sed particularly well suited for tasks like the following:

Editing large files

Performing complex editing operations many times without extensive retyping and cursor positioning

Performing global changes in one pass through the input

3.1 Overview of the sed Editor

The sed stream editor receives its input from standard input or from a named file, changes that input as directed by commands in a command file or on the command line, and writes the resulting stream to standard output. If you specify more than one input file, sed processes each file in sequence and concatenates the results to standard output. If you do not provide a command file and do not use any flags with the sed command, sed copies standard input to standard output without change. The editor keeps only a few lines of the file being edited in memory at one time and does not use temporary files. Therefore, the size of the file to be edited is limited only by the available disk space.

The command script for sed can be a file that you create before running the editor, a series of commands you enter as a command flag, or both. The editor cannot process more than 99 commands in a single invocation; for this reason or to accomplish certain extremely complex editing tasks, you might need to pipe the output from sed into another instance of sed.
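
As a minimal sketch of the behavior described above, the following shell commands edit two input files in one invocation and pipe the result into a second instance of sed. The file names part1 and part2 are hypothetical and are created in a temporary directory for the example.

```shell
# Hypothetical sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmpdir/part1"
printf 'gamma\n' > "$tmpdir/part2"

# One invocation edits both files in sequence and concatenates the results;
# a second sed instance then edits the output of the first.
result=$(sed 's/a/A/' "$tmpdir/part1" "$tmpdir/part2" | sed 's/$/!/')
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Neither input file is modified; both editing passes operate only on the stream.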

3.2 Running the sed Editor

The syntax for the sed command is as follows:

sed [-n] [[-e] script] [-f script_file] [source_file1 [source_file2 ...]]

Table 3-1 describes the flags for the sed command.

Table 3-1: Flags for the sed Command

Flag	Description
`-e` `script`	Adds the editing commands specified by the string `script` to the end of the script of editing commands. If you are using just one `-e` flag and no `-f` flag, you can omit the `-e` flag and include the single `script` on the command line as an argument to `sed`.
`-f` `script_file`	Uses `script_file` as the source of the edit script. The `script_file` is a set of editing commands to be applied to the input.
`-n`	Suppresses all information normally written to standard output.
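
The -n flag is normally paired with the p command, which writes selected lines explicitly. The following sketch, which supplies made-up input with printf, contrasts the output with and without -n.

```shell
# With -n, only lines explicitly printed by the p command appear.
matches=$(printf 'one\ntwo\nthree\n' | sed -n '/^t/p')
printf '%s\n' "$matches"

# Without -n, each matching line appears twice: once from p and once
# from the default output at the end of the cycle.
doubled=$(printf 'one\ntwo\n' | sed '/^t/p')
printf '%s\n' "$doubled"
```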

The order of presentation of the -e and -f options is important. Usually, you create a command file containing the desired editing commands before running sed. The sed editor's command set is powerful and requires little typing. Each command in the command file can be on a separate line, or you can place multiple commands on one line by separating them with semicolons ( ; ). For example, either of the following two scripts will delete all lines beginning with .ne, .RE, or .RS:

Script 1:

/^\.ne/d
/^\.R[ES]/d

Script 2:

/^\.ne/d;/^\.R[ES]/d

Once you create the command file (cmdfile in the following example), enter the sed command as in this example:

$ sed -f cmdfile infile > outfile

This command edits infile using the commands contained in cmdfile, and writes the output to outfile. The input file is not changed.

With a short editing script, you can accomplish the same job by entering the editing commands as an argument to the -e flag on the command line:


$ sed -e '/^\.ne/d;/^\.R[ES]/d' infile > outfile
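
The two invocations can be compared directly. The following sketch assumes a small, made-up troff-style input file and builds cmdfile from Script 1; all files are created in a temporary directory.

```shell
tmpdir=$(mktemp -d)
# Hypothetical troff-style input.
printf '.ne 4\nText line\n.RS\nIndented\n.RE\n' > "$tmpdir/infile"
# Script 1, stored in a command file.
printf '%s\n' '/^\.ne/d' '/^\.R[ES]/d' > "$tmpdir/cmdfile"

# The same edit, once from the command file and once from the -e flag.
from_file=$(sed -f "$tmpdir/cmdfile" "$tmpdir/infile")
from_flag=$(sed -e '/^\.ne/d;/^\.R[ES]/d' "$tmpdir/infile")
printf '%s\n' "$from_file"

rm -rf "$tmpdir"
```

Both variables hold the same two lines, because the command file and the -e string contain the same commands.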

If you use the -e and -f flags together on a command line, sed applies all the commands specified by both flags, in the order in which the flags appear. For example:

$ echo "s/line/foo/" > sedx
$ echo "Test line" | sed -f sedx -e 's/line/bar/'
Test foo
$ echo "Test line" | sed -e 's/line/bar/' -f sedx
Test bar

You can use the -e and -f flags more than once with a given sed command. For example:

$ sed -f script1 -e 's/foo/bar/' -f script2 msgs > msgs2

When you start sed, the editor reads and compiles the command script, checking for syntax and organizing the commands for efficiency. It then reads the input file one line at a time into an area of memory called the pattern space. The editor then tries to match the addresses specified by the commands in the script, one after another, to the lines in the pattern space. Whenever a command's address matches any line or lines in the pattern space, sed applies that editing command to the matched text.

Commands are applied in sequence to the text, and the results of each command are used as the input for subsequent commands. When no more commands match a given line in the pattern space, sed writes that line to the output, reads more input, and repeats the process. Figure 3-1 is a flowchart of this sequence. Compare the operation of sed with the very similar operation of the awk program, shown in Figure 2-1.

Figure 3-1: Sequence of sed Processing

Some editing commands change the way the editing process operates by causing the editor to bypass other script commands, by inhibiting the writing of certain lines (by deleting them), or by ending the process prematurely.
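
The processing sequence can be observed with a few short commands. This sketch uses made-up input lines; it shows one command working on the result of the previous one, a deletion that bypasses the rest of the script, and the q command ending the process early.

```shell
# The second command sees the result of the first, so "cat" becomes "cow".
chained=$(printf 'cat\n' | sed -e 's/cat/dog/' -e 's/dog/cow/')
printf '%s\n' "$chained"

# d ends the cycle for the deleted line, so the later s command never sees it;
# q writes the current line (input line 3) and stops before line 4 is read.
stopped=$(printf 'skip me\nkeep 1\nkeep 2\nkeep 3\n' |
    sed -e '/^skip/d' -e 's/keep/kept/' -e '3q')
printf '%s\n' "$stopped"
```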

3.3 Selecting Lines for Editing

The sed editor identifies lines to be edited by matching addresses. An address can be either a line number or a context address:

Line numbers The first line in the input stream is line 1, and each successive line increments the line counter by one. The dollar sign ($) is a shorthand way to specify the last line of the input stream. If you edit more than one file in a single invocation of sed, the line counter is cumulative across all the files edited; for example, if the first file contains 100 lines, the first line of the second file is line 101.

Context addresses A context address is a regular expression enclosed in pattern delimiters (usually slashes); for example, /^\.R[ES]/ matches any line beginning with either .RE or .RS.
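
The following sketch illustrates both kinds of address. The files first and second are hypothetical two-line files created for the example, so that line 3 of the combined input is the first line of the second file.

```shell
tmpdir=$(mktemp -d)
printf 'a1\na2\n' > "$tmpdir/first"
printf 'b1\nb2\n' > "$tmpdir/second"

# Line 3 of the combined stream is the first line of the second file.
third=$(sed -n '3p' "$tmpdir/first" "$tmpdir/second")
# $ addresses the last line of the whole input stream.
last=$(sed -n '$p' "$tmpdir/first" "$tmpdir/second")
# A context address selects lines by content rather than by position.
context=$(sed -n '/^b/p' "$tmpdir/first" "$tmpdir/second")
printf '%s\n' "$third" "$last" "$context"

rm -rf "$tmpdir"
```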

You can specify any character as a pattern delimiter for a given command by preceding the first use of the character with a backslash. For example, the following two patterns are interpreted identically:

/abc/
\xabcx

In the second pattern, the letter x is used as the pattern delimiter. If you use an alternative pattern delimiter in this way, you can match a literal occurrence of that character by preceding it with a backslash; the pattern \x\xyzx matches the string "xyz".
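
An alternative delimiter is most useful when the pattern itself contains slashes. As an illustration with made-up path names, the following two addresses select the same line; the first uses a comma as the delimiter, and the second escapes each slash.

```shell
# With a comma as the delimiter, the slashes in the path need no escaping.
with_comma=$(printf '/usr/bin/sed\n/opt/tool\n' | sed -n '\,^/usr/bin,p')
# The same address written with the default delimiter.
with_slash=$(printf '/usr/bin/sed\n/opt/tool\n' | sed -n '/^\/usr\/bin/p')
printf '%s\n' "$with_comma"
```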

The sed editor recognizes the standard set of basic regular expressions described in Chapter 1. In addition to these expressions, sed recognizes the special expressions shown in Table 3-2.

Table 3-2: Special Regular Expressions Recognized by sed

Expression	Name	Rule
`\n`	Embedded newline (a backslash followed by the letter `n`)	Matches an embedded newline character in a line formed by joining multiple lines.
`//`	Empty pattern delimiters (slashes by default)	Stands for the most recently used regular expression, so it matches whatever that expression would match.
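
Both special expressions can be tried on a few made-up lines. In this sketch, the empty pattern in the s command reuses the expression from the address, and \n matches the newline that the N command embeds when it joins two lines.

```shell
# The empty pattern in s//hue/ reuses the address expression /colou*r/.
reused=$(printf 'colour\ncolor\nshape\n' | sed '/colou*r/s//hue/')
printf '%s\n' "$reused"

# After N joins two lines, \n in the expression matches the embedded newline.
joined=$(printf 'first\nsecond\n' | sed 'N;s/\n/ + /')
printf '%s\n' "$joined"
```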

Some sed commands do not accept addresses. Commands that accept addresses behave differently depending on the number of addresses, as follows:

If no address is specified, the command is applied to every line in the input stream.

If one address is specified, the command is applied to each line that matches the address.

If two addresses are specified, the command is applied to a group of lines starting with a line that matches the first address and ending with the first subsequent line that matches the second address. The editor then tries to match the first address again to find another range.

Note

If two addresses are specified but sed cannot find a line matching the ending address, sed operates on every line from the first address to the end of the file.
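
The range rules, including the case described in the note, can be seen in a short sketch. The BEGIN and END marker lines are made up for the example; the second BEGIN has no matching END.

```shell
input='a
BEGIN
b
END
c
BEGIN
d'
# Two ranges are deleted: BEGIN through END, and then the second BEGIN through
# the end of the input, because no closing END follows it.
remaining=$(printf '%s\n' "$input" | sed '/^BEGIN$/,/^END$/d')
printf '%s\n' "$remaining"
```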

3.4 Summary of sed Commands

Each sed command consists of a single letter with optional addresses. Some commands require arguments and accept qualifiers that alter their behavior. Do not include any space between the addresses and the letter. If you use two addresses with a command, separate them with a comma. The r and w commands and the w flag for the s command require a single space between the letter and the argument; otherwise, do not include any space between the letter and the argument.

Table 3-3, Table 3-4, and Table 3-5 describe the individual sed commands, showing the syntax of each. In these tables, the following conventions apply:

The term "range of lines" can mean a single line, a group of lines, or all lines, as specified by the number of addresses given to the command.

Brackets [ ] enclose optional elements. Nested brackets indicate that the nested element can be used only if the enclosing element is present.

Italic (slanted) type indicates a general name for an object that you specify; for example, file represents a command argument that must be the name of a file.

The following example illustrates a correctly formed s command with all optional elements:

1,/^$/s/vizier//g

This example processes the header of a mail message (line 1 to the first completely blank line), replacing the string vizier with nothing wherever the string occurs on any line in the specified range.
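
To try this command, you can apply it to a small message. The message text below is hypothetical; note that the occurrence of vizier in the body, after the first blank line, is left alone.

```shell
# Hypothetical message: two header lines, a blank line, then the body.
message='From: vizier@example.com
Subject: vizier report

The vizier will see you now.'
edited=$(printf '%s\n' "$message" | sed '1,/^$/s/vizier//g')
printf '%s\n' "$edited"
```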

Table 3-3: Text Editing and Movement Commands

Command	Description
Append text
`[addr1]a\` `text[\` `text...]`	Writes the specified text ^{[Footnote 2]} to the output after the line specified by `addr1`. See also the `i` command.
Change lines
`[addr1[,addr2]]c\` `text[\` `text...]`	Deletes the addressed range of lines and writes the specified text ^{[Footnote 2]} to the output in its place. ^{[Footnote 3]}
Delete lines
`[addr1[,addr2]]d`	Deletes the specified range of lines. ^{[Footnote 3]}
Delete the first line of the pattern space
`[addr1[,addr2]]D`	Deletes all text in the pattern space up to and including the first newline character. If only one line is in the pattern space, this command reads another line from the input into the pattern space. After these operations, the command starts the complete list of editing commands again from the beginning.
Insert lines
`[addr1]i\` `text[\` `text...]`	Writes the specified text ^{[Footnote 2]} to the output before the line specified by `addr1`. See also the `a` command.
Advance in the file
`[addr1[,addr2]]n`	Writes the indicated range from the pattern space (if not deleted) to the output and then reads the next line from the input into the pattern space.
Join lines
`[addr1[,addr2]]N`	Joins the indicated lines together as a single line with embedded newline characters. If only one address is given, the command joins the specified line to the next line in the input stream. Pattern matches for addressing or for string replacement can extend across embedded newline characters. Use `\n` to indicate an embedded newline character for matching.
Print lines
`[addr1[,addr2]]p`	Writes the specified range of lines to the output at the point in the editing process where the `p` command appears. This command can be used to reorder sections of a file.
Print the first line in the pattern space
`[addr1[,addr2]]P`	Writes all text in the pattern space, up to and including the first newline character, to the output at the point in the editing process where the `P` command appears.
Read and append a file
`[addr1]r` `file`	Reads the named file ^{[Footnote 4]} and writes the file's contents to the output after `addr1`.
Substitute text
`[addr1[,addr2]]s/expr/string/[flags]`	Searches the indicated lines for a string of characters matching the expression defined by `expr`, and replaces that set of characters with `string`. This command's operation is modified by the `g`, `p`, and `w file` flags. If either `expr` or `string` includes a slash ( `/` ), you must escape the literal slash with a backslash (`s/path/path\/file/`) or use alternative delimiters such as the at sign ( `@` ) or question mark ( `?` ). For example, `s@path@path/file@` replaces `path` with `path/file`. ^{[Footnote 5]}
Write a named file
`[addr1[,addr2]]w file`	Writes the specified range of lines to the named file ^{[Footnote 6]} at the point in the editing process where the `w` command appears.
Print line number
`[addr1]=`	Writes the line number of the indicated line to the output.
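
As an illustration of several commands from Table 3-3, the following sketch places the i, a, and c commands in a command file and runs the = command from the command line. The file name edit.sed and the input lines are assumptions made for the example.

```shell
tmpdir=$(mktemp -d)
# The quoted 'EOF' keeps the backslashes in the script intact.
cat > "$tmpdir/edit.sed" <<'EOF'
1i\
inserted before line 1
/^two$/a\
appended after two
/^three$/c\
replacement for three
EOF
edited=$(printf 'one\ntwo\nthree\n' | sed -f "$tmpdir/edit.sed")
printf '%s\n' "$edited"

# = writes the line number of the addressed line; with -n the line itself
# is not written, so only the number appears.
number=$(printf 'one\ntwo\nthree\n' | sed -n '/^three$/=')
printf '%s\n' "$number"

rm -rf "$tmpdir"
```

Keeping the a, i, and c commands in a command file avoids shell quoting problems with the backslash and newline that must follow each command letter.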

Table 3-4: Buffer Manipulation Commands

Command	Description
Retrieve text from hold area
`[addr1[,addr2]]g` `[addr1[,addr2]]G`	Copies the contents of the hold area to the pattern space indicated by `addr1` and `addr2`, if present. The `g` command destroys the existing contents of the pattern space; the `G` command appends the held text to the contents of the pattern space, separating the previous text from the appended text with a newline character.
Move text to the hold area
`[addr1[,addr2]]h` `[addr1[,addr2]]H`	Copies the indicated range from the pattern space to the hold area. The `h` command destroys the existing contents of the hold area; the `H` command appends the text in the pattern space to the contents of the hold area, separating the previous text from the appended text with a newline character.
Exchange pattern space and hold area
`[addr1[,addr2]]x`	Exchanges the contents of the pattern space with those of the hold area.
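
One well-known use of the hold area is to reverse the order of the lines in the input. The following sketch, with made-up input, combines the G and h commands from Table 3-4 with the p command; it is an illustration of the buffer commands rather than a recommended way to reverse large files, because the whole input accumulates in the hold area.

```shell
# 1!G  on every line except the first, append the hold area to the pattern space
# h    copy the pattern space (the lines seen so far, newest first) to the hold area
# $p   on the last line, print the accumulated text
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```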

Table 3-5: Flow-of-Control Commands

Command	Description
Range negation
`[addr1[,addr2]]!cmd`	The exclamation point ( `!` ) instructs `sed` to apply the command following it on the same line to the parts of the input file that are not selected by `addr1` and `addr2` instead of applying it to the selected range.
Command grouping
`[addr1[,addr2]]{` `nested commands` `}`	The left and right braces enclose a group of commands to be applied as a set to the range specified by `addr1` and `addr2`. The first command in the set can be on the line following the left brace, as illustrated in this table, or it can be on the same line with the brace. The right brace must be on a line by itself. Groups can be nested within other groups.
Label
`:label`	Marks a place in the stream of editing commands to be used as a destination of a branch command. The label is a string of up to 8 bytes. Each label in the editing stream must be unique. For a related discussion, see the description of the `t` command in the `sed`(1) reference page.
Branch
`blabel`	Branches to the point in the editing script indicated by `label` and continues processing the current input line with the commands following the label. If `label` is null, the `b` command bypasses the rest of the editing script, reads a new input line, and starts the editing script over from the beginning.
Conditional branch
`tlabel`	If any successful substitutions were made on the current input line, branches to `label`; otherwise, the command does not branch. In either case, the command clears the flag that indicates a substitution was made. This flag is also cleared at the start of each new input line. If `label` is null and the branch is taken, the `t` command bypasses the rest of the editing script, reads a new input line, and starts the editing script over from the beginning.
Stop
`[addr1]q`	Stops editing in an orderly fashion by writing the current line to the output, writing any appended or read text to the output, and then exiting.
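
The following sketch combines negation, command grouping, a label, and a conditional branch in one command file. The file name flow.sed, the label dots, and the input lines are assumptions made for the example; the script assumes a sed that accepts comment lines beginning with a number sign (#), as POSIX-conforming versions do.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/flow.sed" <<'EOF'
# Negation: delete every line that does not contain a colon.
/:/!d
# Grouping: both commands apply only to lines that start with "name".
/^name/{
s/name/NAME/
s/$/ (checked)/
}
# Loop: replace the spaces that follow the colon with dots, one per pass,
# branching back to the label for as long as a substitution succeeds.
:dots
s/:\(\.*\) /:\1./
t dots
EOF
result=$(printf 'comment line\nname:   bob\nsize: 10\n' | sed -f "$tmpdir/flow.sed")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The loop ends when the s command no longer finds a space directly after the colon and its dots, at which point the t command does not branch and the line is written.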

3.5 String Replacement

The s command performs string replacement on the indicated lines in the input file. If the editor finds a string of characters in the input file that satisfies the pattern expression expr, it replaces that string with the set of characters specified in string. The string argument is not a regular expression, and it is not scanned or otherwise interpreted except as follows:

Any backslash characters ( \ ) appearing in string must be escaped. See Table 3-3 for an explanation of how to handle slash characters ( / ) in string.

The following two special symbols can be used in string:
- Ampersand ( & ) This symbol in string is replaced by the exact string of characters in the input lines that matched expr. For example, apply the command s/[Bb]oy/&s/ to the following line:
```
The boy watched the game.
```
  This command tells sed to find either Boy or boy in the input line and copy whichever pattern it finds to the output with an appended "s". Because the command finds boy, it transfers that string to the output with the modification, and the result is as follows:
```
The boys watched the game.
```
- Back-reference expression (\n) The number n is a single digit. This symbol in string is replaced by the string in the input line that matches the nth subexpression in expr. Subexpressions in basic regular expressions are delimited by backslash-parentheses sets, \( and \). For example, apply the command s/\(stu\)\(dy\)/\1r\2/ to the following line:
```
The study chair.
```
  This command tells sed to find study in the input line and copy that pattern to the output with an "r" inserted in the middle. The result is as follows:
```
The sturdy chair.
```
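
Both substitutions can be run from the shell as written. In this sketch, printf supplies the sample lines on standard input.

```shell
# & stands for the text that matched [Bb]oy.
plural=$(printf 'The boy watched the game.\n' | sed 's/[Bb]oy/&s/')
# \1 and \2 stand for the text matched by the first and second subexpressions.
sturdy=$(printf 'The study chair.\n' | sed 's/\(stu\)\(dy\)/\1r\2/')
printf '%s\n' "$plural" "$sturdy"
```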

You can modify the behavior of the s command with flags, as follows:

Usually, only the first matching string in each line of the range is replaced. The g (global) flag causes sed to make the substitution for all matching strings anywhere on any line in the range. Note that the matching strings do not have to be identical; the expression expr is evaluated again for each potential match.

The p (print) flag instructs sed to write the line explicitly if a substitution was made on it; this writing action is in addition to sed's normal operation.

The w file (write) flag instructs sed to write the line to the named file if a substitution was made on it. Include exactly one space between the w flag and the file name.

Any or all of these flags can be used with a given s command; in combinations, the w flag must be the last flag specified.
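
As a brief illustration of the three flags, the following sketch uses made-up input and a hypothetical output file named changed, created in a temporary directory. The last command combines all three flags, with w specified last.

```shell
tmpdir=$(mktemp -d)
# Without g only the first match on each line changes; with g all of them do.
first_only=$(printf 'aaa\n' | sed 's/a/b/')
global=$(printf 'aaa\n' | sed 's/a/b/g')

# With -n, the p flag prints only the lines on which a substitution occurred,
# and the w flag writes those same lines to the named file.
printed=$(printf 'red hat\nblue hat\n' | sed -n "s/red/green/gpw $tmpdir/changed")
saved=$(cat "$tmpdir/changed")
printf '%s\n' "$first_only" "$global" "$printed" "$saved"

rm -rf "$tmpdir"
```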