5    Using m4 Macros in Your Programs

This chapter describes the m4 macro preprocessor, a front-end filter that lets you define macros by placing m4 macro definitions at the beginning of your source files. You can use the m4 preprocessor with either program source files or document source files.


5.1    Using Macros

Macros ease your programming or writing tasks by allowing you to substitute a simple word or two for a great amount of material. Macro calls in a source file have the following form:

name [ ( arg1[ , arg2 ] ) ]

For example, suppose you have a C program in which you want to print the same message at several points. You could code a series of printf statements like the following:

printf("\nThese %d files are in %s:\n",cnt,dir);

If you later decide to change the wording, you must edit every instance of the message. Defining a macro like the following will save you a great deal of work:

define(filmsg,`printf("\nThese %d files are in %s:\n",$1,$2)')

Then, everywhere you want to output this message, you use the macro this way:

filmsg(cnt,dir);

With this implementation, you need only edit the message in one place.

A macro definition consists of a symbolic name (called a token) and the character string that is to replace it. A token is any string of alphanumeric characters (letters, digits, and underscores) beginning with a letter or an underscore and delimited by nonalphanumeric characters (punctuation or white space). For example, N12 and N are both tokens, but A+B is not a token. When you process your file through m4, each occurrence of a recognized macro is replaced by its definition. In addition to replacing symbolic names with text, m4 can also perform arithmetic, manipulate files, test conditions, and process strings, as described in Section 5.3.

The m4 program reads each token in the file and determines if the token is a macro name. Macro names that are embedded in other tokens are not recognized; for example, m4 does not interpret N12 as containing an occurrence of the token N. If the token is a macro name, m4 replaces it with its defining text and pushes the resulting string back onto the input to be rescanned.

Macro expansion is thus recursive; macro definitions can include nested occurrences of other macros to any depth of nesting. You can call macros with arguments, in which case the arguments are collected and substituted into the right places in the defining text before the defining text is rescanned.
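As a minimal sketch of rescanning, the following input nests three macros. The names unit, count, and report are hypothetical and chosen only for illustration. The quote characters are explained in Section 5.2.1 and the dnl macro in Section 5.2.

```m4
dnl Hypothetical macros: report calls count, and count calls unit.
define(`unit', `files')dnl
define(`count', `$1 unit')dnl
define(`report', `found count($1) in $2')dnl
report(12, /tmp)
dnl The call above should expand to: found 12 files in /tmp
```

Each expansion is pushed back onto the input, so count is recognized only after report has been replaced, and unit only after count has been replaced.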

The m4 preprocessor is a standard UNIX filter. It accepts input from standard input or from a list of input files and writes its output to standard output. The following lines illustrate correct m4 usage:

% grep -v '#include' file1 file2 | m4 > outfile
% m4 file1 file2 | cc

The m4 program processes each argument in order. If there are no arguments, or if an argument is a minus sign ( - ), m4 reads standard input as its input file.


5.2    Defining Macros

You create a macro definition with the define command, one of the more than 30 built-in macros provided by m4 (see Table 5-1). For example:

define(N,100)

The open parenthesis must follow the word define with no intervening space.

Given this macro definition, the token N will be replaced by 100 wherever it appears in the file being processed. The defining text can be any text, except that if the text contains parentheses, the number of open (left) parentheses must match the number of close (right) parentheses unless you protect an unmatched parenthesis by quoting it. See Section 5.2.1 for an explanation of quoting.

Built-in and user-defined macros work the same way except that some of the built-in macros change the state of the process. Refer to Section 5.3 for a list of the built-in macros.

You can define macros in terms of other macros. For example:

define(N,100)
define(M,N)

This example defines both M and N to be 100. If you later change the definition of N and assign it a new value, M retains the value of 100, not the new value you give N. The value of M does not track that of N because the m4 preprocessor expands macro names into their defining text as soon as possible. The overall result, as far as M is concerned, is the same as using the following input in the first place:

define(M,100)

If you want the value of M to track the value of N, you can reverse the order of the definitions, as follows:

define(M,N)
define(N,100)

Now M is defined to be the string N. When the value of M is requested later, the M is replaced by N, which is then rescanned and replaced by whatever value N has at that time.

Macro definitions made with the define command do not delete characters following the close parenthesis. For example:

Now is the time for all good persons.
define(N,100)
Testing N definition.

This example produces the following result:

Now is the time for all good persons.
 
Testing 100 definition.

The blank line results from the presence of a newline character at the end of the line containing the define macro. The built-in dnl macro deletes all characters that follow it, up to and including the next newline character. Use this macro to delete empty lines. For example:

Now is the time for all good persons.
define(N,100)dnl
Testing N definition.

This example produces the following result:

Now is the time for all good persons.
Testing 100 definition.


5.2.1    Using the Quote Characters

To delay the expansion of a define macro's arguments, enclose them in a matched pair of quote characters. The default quote characters are left and right single quotation marks (` and '), but you can use the built-in changequote macro to specify different characters. (See Section 5.3.) Text surrounded by quote characters is not expanded immediately; m4 removes the quote characters, so the value of a quoted string is the string without them. Consider the following example:

define(N,100)
define(M,`N')

The quote characters around the N are removed as the argument is being collected. The result of using quote characters is to define M as the string N, not 100. This example makes the value of M track that of N, and it is thus another way to accomplish the effect of the following definitions, shown in Section 5.2:

define(M,N)
define(N,100)

The general rule is that m4 always strips off one level of quote characters whenever it evaluates something. This is true even outside macros. For example, to make the word "define" appear in the output, enter the word in quote characters, as follows:

`define' = 1

Because of the way m4 handles quoted strings, you must be careful about nesting macros. For example:

define(dog,canine)
define(cat,animal chased by `dog')
define(mouse,animal chased by cat)

When the definition of cat is processed, dog is not replaced with canine because it is quoted. But when mouse is processed, the definition of cat (animal chased by dog) is used; this time, dog is not quoted, and the definition of mouse becomes animal chased by animal chased by canine.

When you redefine an existing macro, you must quote the first argument (the macro name), as follows:

define(N,100)

.
.
.
define(`N',200)

Without the quote characters, the second define macro sees N, recognizes it, and substitutes its value, producing the following result:

define(100,200)

The m4 program ignores this statement because it can only define names, not numbers.
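The following sketch combines the two quoting rules from this section: M is defined as the quoted string N so that it tracks N, and N is later redefined with its name quoted. The macro names are the ones used in the examples above; the dnl lines are comments that m4 discards.

```m4
dnl M is defined as the quoted string N, so it tracks later changes to N.
define(`N', 100)dnl
define(`M', `N')dnl
M
define(`N', 200)dnl
M
dnl The two uses of M should expand to 100 and then 200.
```

Quoting both arguments of every define, as here, is a cautious habit rather than a requirement; it avoids surprises when a name already has a definition.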


5.2.2    Macro Arguments

The simplest form of macro processing is replacing one string with another (fixed) string as illustrated in the previous sections. However, macros can also have arguments, so that you can use a given macro in different places with different results. To indicate where an argument is to be used within the replacement text for a macro (the second argument of its definition), use the symbol $n to indicate the nth argument. For example, the symbol $1 refers to the first argument of a macro. When the macro is used, m4 replaces the symbol with the value of the indicated argument. For example:

define(bump,$1=$1+1)

.
.
.
bump(x);

In this example, m4 will replace the bump(x) statement with x=x+1.

A macro can have as many arguments as needed. However, you can access only nine arguments by using the $n symbols ($1 through $9). To access arguments past the ninth argument, use the shift macro, which drops the first argument and reassigns the remaining arguments to the $n symbols (second argument to $1, third to $2, and so on). Using the shift macro more than once allows access to all arguments used with the macro.
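One way to reach arguments past the ninth is a macro that calls itself with shift until a single argument remains. The sketch below is illustrative only: last is a hypothetical macro name, and it assumes your m4 supports the $# (argument count) and $@ (all arguments, individually quoted) symbols found in POSIX m4, which this chapter does not otherwise describe.

```m4
dnl last is a hypothetical macro that returns its final argument.
dnl $# and $@ are assumed to be available, as in POSIX m4.
define(`last', `ifelse($#, 1, `$1', `last(shift($@))')')dnl
last(a, b, c, d, e, f, g, h, i, j, k)
dnl The call above should expand to k, the eleventh argument.
```

Each recursive call drops the first argument, so the macro works for any number of arguments without naming them individually.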

The symbol $0 returns the name of the macro. Arguments that are not supplied are replaced by null strings, so that you can define a macro that concatenates its arguments as follows:

define(cat,$1$2$3$4$5$6$7$8$9)

.
.
.
cat(x,y,z)

This example replaces the cat(x,y,z) statement with xyz. Arguments $4 through $9 in this example are null because corresponding arguments were not provided.

When scanning a macro, the m4 program discards leading unquoted blanks, tabs, or newline characters in arguments, but keeps all other white space. For example:

define(a,     "$1 $2$3")

.
.
.
a(b, c, d)

This example expands the a macro to be "b cd". In the define macro, however, newline characters are meaningful. For example:

define(a,$1
$2$3)

.
.
.
a(b,c,d)

This latter example expands the a macro as follows:

b
cd

Macro arguments are separated by commas. Use parentheses to enclose arguments containing commas, so that the commas are not misinterpreted as ending the arguments containing them. For example, the following statement has only two arguments:

define(a, (b,c))

The first argument is a, and the second is (b,c). To use a single, unmatched parenthesis in an argument, enclose it in quote characters:

define(a,b`)'c)

In this example, b)c is the second argument.
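The difference between protecting a comma with parentheses and protecting it with quote characters can be seen in a short sketch. The macro name pair is hypothetical; the brackets and vertical bar in its definition are ordinary text used to make the argument boundaries visible.

```m4
dnl pair is a hypothetical two-argument macro.
define(`pair', `[$1|$2]')dnl
pair((a,b), c)
pair(`a,b', c)
dnl The two calls should expand to [(a,b)|c] and [a,b|c].
```

Parentheses remain part of the argument, whereas quote characters are stripped when the argument is collected.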


5.3    Using Other m4 Macros

The m4 program provides a set of macros that are already defined (built-in macros). Table 5-1 lists all of these macros and describes them briefly. The following sections further explain many of the macros and how to use them.

Table 5-1:  Built-In m4 Macros

Macro                         Description
changecom(l,r)                Changes the left and right comment characters to the characters represented by l and r. The two characters must be different.
changequote(l,r)              Changes the left and right quote characters to the characters represented by l and r. The two characters must be different.
decr(n)                       Returns the value of n-1.
define(name,replacement)      Defines a new macro, named name, with a value of replacement.
defn(name)                    Returns the quoted definition of name.
divert(n)                     Changes the output stream to the temporary file number n.
divnum                        Returns the number of the currently active temporary file.
dnl                           Deletes text up to a newline character.
dumpdef(`name'[,`name'...])   Prints the names and current definitions of the named macros.
errprint(str)                 Prints str to the standard error file.
eval(expr)                    Evaluates expr as a 32-bit arithmetic expression.
ifdef(`name',arg1,arg2)       If macro name is defined, returns arg1; otherwise, returns arg2.
ifelse(str1,str2,arg1,arg2)   Compares the strings str1 and str2. If they match, ifelse returns the value of arg1; otherwise, it returns the value of arg2.
include(file) sinclude(file)  Returns the contents of file. The sinclude macro does not report an error if it cannot access the file.
incr(n)                       Returns the value of n+1.
index(str1,str2)              Returns the character position in string str1 where str2 starts, or -1 if str1 does not contain str2.
len(str) dlen(str)            Returns the number of characters in str. The dlen macro operates on strings containing 2-byte representations of international characters.
m4exit(code)                  Exits m4 with a return code of code.
m4wrap(name)                  Runs macro name before exiting, after completing all other processing.
maketemp(strXXXXXstr)         Creates a unique file name by replacing the literal string XXXXX in the argument string with the current process ID.
popdef(name)                  Replaces the current definition of name with the previous definition, saved with the pushdef macro.
pushdef(name,replacement)     Saves the current definition of name and then defines name to be replacement in the same way as define.
shift(param_list)             Shifts the parameter list leftward one position, destroying the original first element of the list.
substr(string,pos,len)        Returns the substring of string that begins at character position pos and is len characters long.
syscmd(command)               Executes the specified system command with no return value.
sysval                        Gets the return code from the last use of the syscmd macro.
traceoff(macro_list)          Turns off trace for any macro in the list. If macro_list is null, turns off all tracing.
traceon(name)                 Turns on trace for the named macro. If name is null, turns trace on for all macros.
translit(string,set1,set2)    Replaces any characters from set1 that appear in string with the corresponding characters from set2.
undefine(`name')              Removes the definition of the named macro.
undivert(n,n[,n...])          Appends the contents of the indicated temporary files to the current temporary file.


5.3.1    Changing the Comment Characters

To include comments in your m4 programs, delimit the comment lines with the comment characters. The default left comment character is the number sign ( # ); the default right comment character is the newline character. If these characters are not convenient, use the built-in changecom macro. For example:

changecom({,})

This example makes the left and right braces the new comment characters. To restore the original comment characters, use changecom as follows:

changecom(#,
)

Using changecom with no arguments disables commenting.


5.3.2    Changing the Quote Characters

The default quote characters are the left and right single quotation marks (` and '). If these characters are not convenient, change the quote characters with the built-in changequote macro. For example:

changequote([,])

This example makes the left and right brackets the new quote characters. To restore the original quote characters, use changequote without arguments, as follows:

changequote
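Changing the quote characters is useful when the text being processed contains apostrophes or grave accents of its own. In the sketch below, note is a hypothetical macro name; while brackets are the quote characters, the default quote characters pass through as ordinary text.

```m4
dnl While brackets are the quote characters, the default ones are plain text.
changequote([,])dnl
define([note], [it's a `plain' apostrophe now])dnl
note
changequote
dnl The use of note should expand to: it's a `plain' apostrophe now
```

The final changequote restores the defaults; because it is not followed by dnl, its line contributes an empty line to the output.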


5.3.3    Removing a Macro Definition

The undefine macro removes macro definitions. For example:

undefine(`N')

This example removes the definition of N. You must quote the name of the macro to be undefined. You can use undefine to remove built-in macros, but once you remove a built-in macro, you cannot recover that macro for later use.


5.3.4    Checking for a Defined Macro

The built-in ifdef macro determines if a macro is currently defined. The ifdef macro accepts three arguments. If the first argument is defined, the value of ifdef is the second argument. If the first argument is not defined, the value of ifdef is the third argument. If there is no third argument, the value of ifdef is null.
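A common use of ifdef is to treat a macro name as an on/off flag. The following is a minimal sketch in which DEBUG is a hypothetical flag name and the two message strings are placeholders.

```m4
dnl DEBUG is a hypothetical macro name used as an on/off flag.
define(`DEBUG', 1)dnl
ifdef(`DEBUG', `tracing enabled', `tracing disabled')
undefine(`DEBUG')dnl
ifdef(`DEBUG', `tracing enabled', `tracing disabled')
dnl The two tests should expand to: tracing enabled, then tracing disabled.
```

The first argument is quoted so that ifdef receives the name DEBUG itself rather than its value.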


5.3.5    Using Integer Arithmetic

The m4 program provides the following built-in functions for doing arithmetic on integers only:

incr Increments its numeric argument by 1
decr Decrements its numeric argument by 1
eval Evaluates an arithmetic expression

For example, you can create a variable N1 such that its value will always be one greater than N, as follows:

define(N,100)
define(N1,`incr(N)')

The eval function can evaluate expressions containing the following operators (listed in decreasing order of precedence):

Use parentheses to group operations where needed. All operands of an expression must be numeric. The numeric value of a true relation such as 1>0 is 1, and false is 0 (zero). The precision in eval is 32 bits. For example, to define M as 2==N+1, use eval as follows:

define(N,3)
define(M,`eval(2==N+1)')

Use quote characters around the text that defines a macro, unless the text is simple and contains no instances of macro names.
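To see why the quote characters matter here, the following sketch uses the definition of M shown above and then changes N. Because the eval call is quoted in the definition, the relation is evaluated each time M is used, not when M is defined.

```m4
dnl M re-evaluates the relation each time it is used.
define(`N', 3)dnl
define(`M', `eval(2==N+1)')dnl
M
define(`N', 1)dnl
M
dnl The two uses of M should expand to 0 (false) and then 1 (true).
```

Without the quote characters, M would be fixed at the value computed when the define was read.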


5.3.6    Manipulating Files

To merge a new file in the input, use the built-in include macro as follows:

include(myfile)

This example inserts the contents of myfile in place of the include command. As the included file is read, m4 scans it for macros as if it were part of the primary input.

With the include macro, a fatal error occurs if the named file cannot be accessed. To avoid an error, use the alternative form, sinclude (silent include). The sinclude macro continues without error if the named file cannot be accessed.


5.3.7    Redirecting Output

You can redirect the output of m4 to temporary files during processing, and the collected material can be output upon command. The m4 program can maintain up to nine temporary files, numbered 1 through 9. To redirect output, use the divert macro as in the following example:

divert(4)

When this command is encountered, m4 begins writing its output to the end of temporary file 4. The m4 program discards the output if you redirect the output to a temporary file other than 1 through 9; you can use this feature to make m4 omit a portion of the input file. Use divert(0) or divert with no argument to return the output to the standard output stream.

At the end of its processing, m4 writes all redirected output to the standard output stream, reading from the temporary files in numeric order and then destroying the temporary files.

To retrieve the information from all temporary files in numeric order at any time before processing is completed, use the built-in undivert macro with no arguments. To retrieve selected temporary files in a specified order, use undivert with arguments. When using undivert, m4 discards the temporary files that are recovered and does not search the recovered information for macros.

The value of undivert is not the diverted text.

The built-in divnum macro returns the number of the currently active temporary file. If you do not change the output file with the divert macro, m4 puts all output in temporary file 0 (zero).
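The following sketch shows divert and undivert working together to reorder text. The three sentences are placeholder text chosen so that they contain no macro names.

```m4
divert(1)dnl
This line is held in temporary file 1.
divert(0)dnl
This line goes straight to standard output.
undivert(1)dnl
This line follows the recovered text.
```

The output should contain the second sentence first, then the held sentence, then the third sentence. If the undivert call were omitted, the held sentence would instead appear last, when m4 empties its temporary files at the end of processing.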


5.3.8    Using System Programs in a Program

You can run any operating system command from an m4 program by using the built-in syscmd macro. If the system command returns information, that information is the value of the syscmd macro; otherwise, the macro's value is null. For example:

syscmd(date)


5.3.9    Using Unique File Names

Use the built-in maketemp macro to make a unique file name from a program. If the literal string XXXXX is present in the macro's argument, m4 replaces the XXXXX with the process ID of the current process. For example:

maketemp(myfileXXXXX)

If the current process ID is 23498, this example returns myfile23498. You can use this string to name a temporary file.


5.3.10    Using Conditional Expressions

The built-in ifelse macro performs conditional testing. The simplest form is the following:

ifelse(a,b,c,d)

This example compares the two strings a and b. If they are identical, ifelse returns string c. If they are not identical, it returns string d. For example, you can define a macro called compare to compare two strings and return yes if they are the same or no if they are different, as follows:

define(compare, `ifelse($1,$2,yes,no)')

The quote characters prevent the evaluation of ifelse from occurring too early. If the fourth argument is missing, it is treated as empty.

The ifelse macro can have any number of arguments, and it therefore provides a limited form of multiple path decision capability. For example:

ifelse(a,b,c,d,e,f,g)

This statement is logically the same as the following fragment:

if(a == b) x = c;
else if(d == e) x = f;
else  x = g;
return(x);

If the final argument is omitted, the result is null.
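A multiple-path ifelse can be wrapped in a macro so that the same test is reusable. In this sketch, size is a hypothetical macro name and the three result words are placeholders.

```m4
dnl size is a hypothetical macro with three outcomes.
define(`size', `ifelse($1, 1, small, $1, 2, medium, large)')dnl
size(1) size(2) size(7)
dnl The line above should expand to: small medium large
```

The argument is compared with 1 first and then with 2; the seventh argument of ifelse, large, is the result when neither comparison matches.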


5.3.11    Manipulating Strings

The built-in len macro returns the byte length of the string that makes up its argument. For example, len(abcdef) is 6, and len((a,b)) is 5.

The built-in dlen macro returns the length of the displayable characters in a string. In certain international usages, 2-byte codes are displayed as one character. Thus, if the string contains any 2-byte international character codes, the result of dlen will differ from the result of len.

The built-in substr macro returns the substring (beginning at the character position specified by the second argument) from a specified string (first argument). The third argument specifies the length in bytes of the returned substring. For example:

substr(Krazy Kat,6,5)

This example returns "Kat", which is the 3-character substring beginning at character position 6 of the string "Krazy Kat". The first character in the string is at position 0 (zero). If the third argument is omitted or if the string is not long enough to satisfy the third argument, as in this example, the rest of the string is returned.

The built-in index macro returns the byte position, or index, in a string (first argument) where a substring (second argument) begins. If the substring is not present, index returns -1. As with substr, the origin for strings is 0 (zero). For example:

index(Krazy Kat,Kat)

This example returns 6.

The built-in translit macro performs one-for-one character substitution, or transliteration. The first argument is a string to be processed. The second and third arguments are lists of characters. Each instance of a character from the second argument that is found in the string is replaced by the corresponding character from the third argument. For example:

translit(the quick brown fox jumps over the lazy dog,aeiou,AEIOU)

This example returns the following:

thE qUIck brOwn fOx jUmps OvEr thE lAzy dOg

If the third argument is shorter than the second argument, characters from the second argument that are not in the third argument are deleted. If the third argument is missing, all characters present in the second argument are deleted.
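Both deletion cases can be illustrated with a short sketch; the string banana is an arbitrary example.

```m4
dnl No third argument: every a and n is deleted.
translit(`banana', `an')
dnl Third argument shorter than the second: a becomes o, n is deleted.
translit(`banana', `an', `o')
dnl The two calls should expand to b and booo.
```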

Note

The substr, index, and translit macros do not differentiate between 1- and 2-byte displayable characters and can return unexpected results in some international usages.


5.3.12    Printing

The built-in errprint macro writes its arguments to the standard error file. For example:

errprint(`error')

The built-in dumpdef macro dumps the current names and definitions of items named as arguments. Names must be quoted. If you supply no arguments, dumpdef prints all current names and definitions. The dumpdef macro writes to the standard error file.
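The two macros can be combined for simple debugging, as in the following sketch. N is a hypothetical macro, and the exact layout of the dumpdef report depends on the m4 implementation, so no output is shown.

```m4
dnl N is a hypothetical macro; dumpdef reports it on standard error.
define(`N', 100)dnl
dumpdef(`N')dnl
errprint(`finished checking N
')dnl
```

Because both macros write to the standard error file, the report and the message stay out of the processed text on standard output. The newline inside the quoted errprint argument ends the message line.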