This chapter describes lexical conventions associated with the following items:
You can use blank and tab characters anywhere between operators, identifiers, and constants. Adjacent identifiers or constants that are not otherwise separated must be separated by a blank or tab.
These characters can also be used within character constants; however, they are not allowed within operators and identifiers.
The number sign character (#) introduces a comment. Comments that start with a number sign extend through the end of the line on which they appear. You can also use C language notation (/*...*/) to delimit comments.
Do not start a comment with a number sign in column one; the assembler uses cpp (the C language preprocessor) to preprocess assembler code, and cpp interprets number signs in the first column as preprocessor directives.
An identifier consists of a case-sensitive sequence of alphanumeric characters (A-Z, a-z, 0-9) and the following special characters:
Identifiers can be up to 31 characters long, and the first character cannot be numeric (0-9).
If an undefined identifier is referenced, the assembler assumes that the identifier is an external symbol. The assembler treats the identifier like a name specified by a .globl directive (see Chapter 5).
If the identifier is defined to the assembler and the identifier has not been specified as global, the assembler assumes that the identifier is a local symbol.
The assembler supports the following constants:
The assembler interprets all scalar constants as twos complement numbers. Scalar constants can be any of the digits 0123456789abcdefABCDEF.
Scalar constants can be either decimal, hexadecimal, or octal constants:
Floating-point constants can appear only in floating-point directives (see Chapter 5) and in the floating-point load immediate instructions (see Section 4.2). Floating-point constants have the following format:
±d1[.d2][e|E±d3]
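where d1 is the integer part, d2 is the fractional part, and d3 is the power-of-10 exponent, each written as a decimal integer.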
The "+" symbol (plus sign) is optional.
For example, the number .02173 can be represented as follows:
21.73E-3
The floating-point directives, such as .float and .double, may optionally use hexadecimal floating-point constants instead of decimal constants. A hexadecimal floating-point constant consists of the following elements:
[+|-]0x[1|0].<hex-digits>h0x<hex-digits>
The assembler places the first set of hexadecimal digits (excluding the 0 or 1 preceding the decimal point) in the mantissa field of the floating-point format without attempting to normalize it. It stores the second set of hexadecimal digits in the exponent field without biasing them. If the mantissa appears to be denormalized, it checks to determine whether the exponent is appropriate. Hexadecimal floating-point constants are useful for generating IEEE special symbols and for writing hardware diagnostics.
For example, either of the following directives generates the single-precision number 1.0:
.float 1.0e+0
.float 0x1.0h0x7f
The assembler uses normal (nearest) rounding mode to convert floating-point constants.
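As a rough illustration of how the hexadecimal form maps onto the IEEE single-precision layout, the following Python sketch packs the two digit strings into a 32-bit pattern. It is not the assembler's implementation: the name hex_float_to_single is hypothetical, and left-justifying the mantissa digits in the 23-bit fraction field is an assumption that the text does not spell out.

```python
import re
import struct

# Pattern for [+|-]0x[1|0].<hex-digits>h0x<hex-digits> as given in the text.
_HEX_FLOAT = re.compile(r'^([+-]?)0x([01])\.([0-9a-fA-F]+)h0x([0-9a-fA-F]+)$')


def hex_float_to_single(text):
    """Return the float whose single-precision bit pattern the constant describes.

    Assumption (not stated in the text): the mantissa digits are left-justified
    in the 23-bit fraction field and any excess low-order bits are dropped.
    """
    m = _HEX_FLOAT.match(text)
    if m is None:
        raise ValueError("not a hexadecimal floating-point constant: " + text)
    sign, _leading_digit, mant_digits, exp_digits = m.groups()

    # The digit before the point is excluded; only the digits after it are stored.
    mant_bits = 4 * len(mant_digits)
    mant = int(mant_digits, 16)
    if mant_bits >= 23:
        fraction = mant >> (mant_bits - 23)
    else:
        fraction = mant << (23 - mant_bits)

    # The exponent digits go into the exponent field as written (no biasing).
    exponent = int(exp_digits, 16) & 0xFF
    sign_bit = 1 if sign == '-' else 0
    bits = (sign_bit << 31) | (exponent << 23) | fraction
    return struct.unpack('>f', struct.pack('>I', bits))[0]
```

Under these assumptions, hex_float_to_single("0x1.0h0x7f") yields 1.0, matching the second .float directive in the example, and an exponent field of 0xff with a zero mantissa yields an IEEE infinity.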
All characters except the newline character are allowed in string constants. String constants begin and end with double quotation marks (").
The assembler observes most of the backslash conventions used by the C language. Table 2-1 shows the assembler's backslash conventions.
Convention | Meaning |
\a | Alert (0x07) |
\b | Backspace (0x08) |
\f | Form feed (0x0c) |
\n | Newline (0x0a) |
\r | Carriage return (0x0d) |
\t | Horizontal tab (0x09) |
\v | Vertical tab (0x0b) |
\\ | Backslash (0x5c) |
\" | Quotation mark (0x22) |
\' | Single quote (0x27) |
\nnn | Character whose octal value is nnn (where n is 0-7) |
\Xnn | Character whose hexadecimal value is nn (where n is 0-9, a-f, or A-F) |
Deviations from C conventions are as follows:
Both \xnn and \Xnn are allowed.
For octal notation, the backslash conventions require three characters when the next character could be confused with the octal number.
For hexadecimal notation, the backslash conventions require two characters when the next character could be confused with the hexadecimal number. Insert a 0 (zero) as the first character of the single-character hexadecimal number when this condition occurs.
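The padding rules above follow from reading as many digits as the notation permits. The Python sketch below decodes the conventions of Table 2-1 in that way. It is a model for illustration only: the name decode_escapes is hypothetical, the greedy reading of up to three octal or two hexadecimal digits is an assumption modeled on C, and accepting a lowercase x alongside the X shown in the table is likewise assumed.

```python
# Simple escapes from Table 2-1, mapped to their character codes.
_SIMPLE = {'a': 0x07, 'b': 0x08, 'f': 0x0c, 'n': 0x0a, 'r': 0x0d,
           't': 0x09, 'v': 0x0b, '\\': 0x5c, '"': 0x22, "'": 0x27}
_OCT = '01234567'
_HEX = '0123456789abcdefABCDEF'


def decode_escapes(body):
    """Decode the text between the quotation marks into a list of byte values."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != '\\':
            out.append(ord(ch))
            i += 1
            continue
        i += 1                                   # skip the backslash
        if i >= len(body):
            raise ValueError("dangling backslash")
        ch = body[i]
        if ch in _SIMPLE:
            out.append(_SIMPLE[ch])
            i += 1
        elif ch in _OCT:
            j = i
            while j < len(body) and j - i < 3 and body[j] in _OCT:
                j += 1                           # take at most three octal digits
            out.append(int(body[i:j], 8) & 0xFF)  # assumption: keep the low byte
            i = j
        elif ch in 'xX':
            j = i + 1
            while j < len(body) and j - i - 1 < 2 and body[j] in _HEX:
                j += 1                           # take at most two hex digits
            if j == i + 1:
                raise ValueError("hexadecimal escape needs at least one digit")
            out.append(int(body[i + 1:j], 16))
            i = j
        else:
            raise ValueError("unrecognized escape: \\" + ch)
    return out
```

With this model, \11 decodes to the single character 0x09, whereas \0011 decodes to 0x01 followed by the digit 1, which is why the octal form must be padded to three characters when a digit follows.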
You can include multiple statements on the same line by separating the statements with semicolons. Note, however, that the assembler does not recognize semicolons as separators when they follow comment symbols (# or /*).
The assembler supports the following types of statements:
Each keyword statement can include an optional label, an operation code (mnemonic or directive), and zero or more operands (with an optional comment following the last operand on the statement):
[label:] opcode operand [; opcode operand; ...] [# comment]
Some keyword statements also support relocation operands (see Section 2.6.4).
Labels can consist of label definitions or numeric values.
Label definitions always end with a colon. You can put a label definition on a line by itself.
To reference a numeric label, put an f (forward) or a b (backward) immediately after the referencing digit in an instruction, for example, br 7f (which is a forward branch to numeric label 7). The reference directs the assembler to look for the nearest numeric label that corresponds to the specified number in the lexically forward or backward direction.
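The nearest-label lookup can be pictured with a small Python sketch. The function name resolve_numeric_ref and the representation of label positions as statement indexes are hypothetical; treating a label on the referencing statement itself as lying in the backward direction is also an assumption.

```python
def resolve_numeric_ref(label_positions, ref, ref_position):
    """Find the statement index that a reference such as '7f' or '7b' targets.

    label_positions: list of (number, statement_index) pairs, in any order.
    ref:             digits followed by 'f' (forward) or 'b' (backward).
    ref_position:    statement index of the referencing instruction.
    Returns None when no matching label exists in the requested direction.
    """
    number, direction = int(ref[:-1]), ref[-1]
    if direction == 'f':
        # Nearest matching label that appears after the reference.
        candidates = [p for n, p in label_positions
                      if n == number and p > ref_position]
        return min(candidates) if candidates else None
    if direction == 'b':
        # Nearest matching label at or before the reference (assumption).
        candidates = [p for n, p in label_positions
                      if n == number and p <= ref_position]
        return max(candidates) if candidates else None
    raise ValueError("reference must end in 'f' or 'b': " + ref)
```

For example, with numeric label 7 defined at statements 2, 10, and 20, a reference 7f at statement 6 resolves to statement 10 and a reference 7b at the same place resolves to statement 2.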
A null statement is an empty statement that the assembler ignores. Null statements can have label definitions. For example, the following line has three null statements in it:
label: ; ;
A keyword statement contains a predefined keyword. The syntax for the rest of the statement depends on the keyword. Keywords are either assembler instructions (mnemonics) or directives.
Assembler instructions in the main instruction set and the floating-point instruction set are described in Chapter 3 and Chapter 4, respectively. Assembler directives are described in Chapter 5.
Relocation operands are generally useful in only two situations:
Some macro instructions (for example, ldgp) require special coordination between the machine-code instructions and the relocation sequences given to the linker. By using the macro instructions, the assembler programmer relies on the assembler to generate the appropriate relocation sequences.
In some instances, the use of macro instructions may be undesirable. For example, a compiler that supports the generation of assembly language files may not want to defer instruction scheduling to the assembler. Such a compiler will want to schedule some or all of the machine-code instructions. To do this, the compiler must have a mechanism for emitting an object file's relocation sequences without using macro instructions. The mechanism for establishing these sequences is the relocation operand.
A relocation operand can be placed after the normal operand on an assembly language statement:
opcode operand relocation_operand
The syntax of the relocation_operand is as follows:
!relocation_type!sequence_number
relocation_type
One of the following relocation types: literal, lituse_base, lituse_bytoff, lituse_jsr, gpdisp, gprelhigh, or gprellow.
The relocation types must be enclosed within a pair of exclamation points (!) and are not case sensitive. See Table 7-11 for descriptions of the different types of relocation operations.
sequence_number
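To make the operand syntax concrete, the following Python sketch separates a statement's operand field into the normal operand, the relocation type, and the sequence number. It is an illustration rather than the assembler's parser: the name split_relocation_operand is hypothetical, and reading the sequence number as a decimal integer is an assumption.

```python
import re

RELOCATION_TYPES = {'literal', 'lituse_base', 'lituse_bytoff', 'lituse_jsr',
                    'gpdisp', 'gprelhigh', 'gprellow'}

# Normal operand text, followed by !type!number at the end of the field.
_RELOC = re.compile(
    r'^(?P<operand>.*?)\s*!\s*(?P<type>\w+)\s*!\s*(?P<seq>\d+)\s*$')


def split_relocation_operand(field):
    """Split 'sym1($gp)!literal!1' into ('sym1($gp)', 'literal', 1).

    Returns (field, None, None) when no relocation operand is present.
    Assumption: the sequence number is written in decimal.
    """
    m = _RELOC.match(field)
    if m is None:
        return field.strip(), None, None
    rtype = m.group('type').lower()      # relocation types are not case sensitive
    if rtype not in RELOCATION_TYPES:
        raise ValueError("unknown relocation type: " + rtype)
    return m.group('operand'), rtype, int(m.group('seq'))
```

Statements that share a sequence number, such as a !literal!1 load and the !lituse_base!1 instruction that consumes it, can then be grouped by the third element of the returned tuple.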
The following examples contain relocation operands in the source code:
lituse_base relocations
# Equivalent C statement:
#   sym1 += sym2   (Both external)
# Assembly statements containing macro instructions:
        ldq     $1, sym1
        ldq     $2, sym2
        addq    $1, $2, $3
        stq     $3, sym1
# Assembly statements containing machine-code instructions
# requiring relocation operands:
        ldq     $1, sym1($gp)!literal!1
        ldq     $2, sym2($gp)!literal!2
        ldq     $3, sym1($1)!lituse_base!1
        ldq     $4, sym2($2)!lituse_base!2
        addq    $3, $4, $3
        stq     $3, sym1($1)!lituse_base!1
The assembler stores the sym1 and sym2 address constants in the .lita section.
In this example, the code with relocation operands provides better performance than the other code because it saves on register usage and on the length of machine-code instruction sequences.
ldgp sequence that is scheduled inside a lituse_base relocation
# Assembly statements containing macro instructions:
        beq     $2, L
        stq     $31, sym
        ldgp    $gp, 0($27)
        ...
# Assembly statements containing machine-code instructions that
# require relocation operands:
        ldq     $at, sym($gp)!literal!1
        beq     $2, L                   # crosses basic block boundary
        ldah    $gp, 0($27)!gpdisp!2
        stq     $31, sym($at)!lituse_base!1
        lda     $gp, 0($gp)!gpdisp!2
In this example, the programmer has elected to schedule the load of the address of sym before the conditional branch.
# Assembly statements containing macro instructions:
        jsr     sym1
        ldgp    $gp, 0($ra)
        .extern sym1
        .text
# Assembly statements containing machine-code instructions that
# require relocation operands:
        ldq     $27, sym1($gp)!literal!1
        jsr     $26, ($27), sym1!lituse_jsr!1   # as1 puts in an R_HINT for the jsr instruction
        ldah    $gp, 0($ra)!gpdisp!2
        lda     $gp, 0($gp)!gpdisp!2
In this example, the code with relocation operands does not provide any significant gains over the other code. This example is only provided to show the different coding methods.
An expression is a sequence of symbols that represents a value. Each expression and its result have data types. The assembler does arithmetic in twos complement integers with 64 bits of precision. Expressions follow precedence rules and consist of the following elements:
You can also use a single character string in place of an integer within an expression. For example, the following two pairs of statements are equivalent:
.byte "a" ; .word "a"+0x19
.byte 0x61 ; .word 0x7a
The assembler supports the operators shown in Table 2-2.
Operator | Meaning |
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
% | Remainder |
<< | Shift left |
>> | Shift right (sign is not extended) |
^ | Bitwise EXCLUSIVE OR |
& | Bitwise AND |
| | Bitwise OR |
- | Minus (unary) |
+ | Identity (unary) |
~ | Complement |
For the order of operator evaluation within expressions, you can rely on the precedence rules or you can group expressions with parentheses. Unless parentheses enforce precedence, the assembler evaluates all operators of the same precedence strictly from left to right. Because parentheses also designate index registers, ambiguity can arise from parentheses in expressions. To resolve this ambiguity, put a unary + in front of parentheses in expressions.
The assembler has three precedence levels. The following table lists the precedence rules from lowest to highest:
Precedence | Operators |
Least binding, lowest precedence | Binary +, - |
Intermediate precedence | Binary *, /, %, <<, >>, ^, &, | |
Most binding, highest precedence | Unary -, +, ~ |
Note
The assembler's precedence scheme differs from that of the C language.
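The practical effect of the three-level scheme is easiest to see by evaluating a few expressions. The Python sketch below models the rules described above: three precedence levels, strict left-to-right evaluation within a level, 64-bit two's complement arithmetic, and a right shift that does not extend the sign. The name evaluate is hypothetical. Truncating division toward zero and taking shift counts modulo 64 are assumptions, and only decimal and 0x-prefixed hexadecimal constants are handled.

```python
import re

_MASK = (1 << 64) - 1
_TOKEN = re.compile(r'\s*(0[xX][0-9a-fA-F]+|\d+|<<|>>|[-+*/%^&|~()])')
_MIDDLE_OPS = ('*', '/', '%', '<<', '>>', '^', '&', '|')


def _signed(v):
    """Wrap an integer to a signed 64-bit two's complement value."""
    v &= _MASK
    return v - (1 << 64) if v >> 63 else v


def _apply(op, a, b):
    """Apply one binary operator to signed 64-bit operands."""
    if op == '+':
        r = a + b
    elif op == '-':
        r = a - b
    elif op == '*':
        r = a * b
    elif op in ('/', '%'):
        q = abs(a) // abs(b)              # assumption: truncate toward zero
        if (a < 0) != (b < 0):
            q = -q
        r = q if op == '/' else a - b * q
    elif op == '<<':
        r = a << (b & 63)                 # assumption: count taken modulo 64
    elif op == '>>':
        r = (a & _MASK) >> (b & 63)       # logical shift: sign is not extended
    elif op == '^':
        r = a ^ b
    elif op == '&':
        r = a & b
    else:
        r = a | b
    return _signed(r)


def evaluate(text):
    """Evaluate an integer expression using the three precedence levels."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    index = 0

    def peek():
        return tokens[index] if index < len(tokens) else None

    def take():
        nonlocal index
        if index >= len(tokens):
            raise ValueError("unexpected end of expression")
        index += 1
        return tokens[index - 1]

    def unary():                          # highest level: unary -, +, ~
        tok = take()
        if tok == '-':
            return _signed(-unary())
        if tok == '+':
            return unary()
        if tok == '~':
            return _signed(~unary())
        if tok == '(':
            value = lowest()
            if take() != ')':
                raise ValueError("expected ')'")
            return value
        base = 16 if tok[:2] in ('0x', '0X') else 10
        return _signed(int(tok, base))

    def middle():                         # binary *, /, %, <<, >>, ^, &, |
        value = unary()
        while peek() in _MIDDLE_OPS:
            op = take()
            value = _apply(op, value, unary())
        return value

    def lowest():                         # lowest level: binary +, -
        value = middle()
        while peek() in ('+', '-'):
            op = take()
            value = _apply(op, value, middle())
        return value

    result = lowest()
    if index != len(tokens):
        raise ValueError("unexpected token: " + tokens[index])
    return result
```

Under this model, 1 | 2 << 1 evaluates to 6 because | and << share a level and are applied left to right, whereas C would shift first and produce 5. Similarly, 6 & 3 * 2 evaluates to 4 here but to 6 in C.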
Each symbol you reference or define in an assembly program belongs to one of the type categories shown in Table 2-4.
Type | Description |
undefined | Any symbol that is referenced but not defined becomes global undefined. (Declaring such a symbol in a .globl directive merely makes its status clearer.) |
absolute | A constant defined in an assignment (=) expression. |
text | Any symbol defined while the .text directive is in effect belongs to the text section. The text section contains the program's instructions, which are not modifiable during execution. |
data | Any symbol defined while the .data directive is in effect belongs to the data section. The data section contains memory that the linker can initialize to nonzero values before your program begins to execute. |
sdata | The type sdata is similar to the type data, except that defining a symbol while the .sdata ("small data") directive is in effect causes the linker to place it within the small data section. This increases the chance that the linker will be able to optimize memory references to the item by using gp-relative addressing. |
rdata and rconst | Any symbol defined while the .rdata or .rconst directive is in effect belongs to this category. The only difference between the types rdata and rconst is that the former is allowed to have dynamic relocations and the latter is not. (The types rdata and rconst are also similar to the type data but, unlike data, cannot be modified during execution.) |
bss and sbss | Any symbol defined in a .comm or .lcomm directive belongs to these sections, except that a .data, .sdata, .rdata, or .rconst directive can override a .comm directive. The .bss and .sbss sections consist of memory that the kernel loader initializes to zero before your program begins to execute.
If a symbol's size is less than the number of bytes specified by the
Local symbols in the
|
Symbols in the undefined category are always global; that is, they are visible to the linker and can be shared with other modules of your program. Symbols in the absolute, text, data, sdata, rdata, rconst, bss, and sbss type categories are local unless declared in a .globl directive.
For any expression, the result's type depends on the types of the operands and the operator. The following type propagation rules are used in expressions:
.text section, .data section, or .bss section, the result has the first operand's type and the other operand must be absolute.
.text section, .data section, or .bss section, the type propagation rules can vary:
The operators *, /, %, <<, >>, ~, ^, &, and | apply only to absolute symbols.
The assembler accepts addresses expressed in the formats described in Table 2-5.
Format | Address Description |
(base-register) | Specifies an indexed address, which assumes a zero offset. The base register's contents specify the address. |
expression | Specifies an absolute address. The assembler generates the most locally efficient code for referencing the value at the specified address. |
expression(base-register) | Specifies a based address. To get the address, the value of the expression is added to the contents of the base register. The assembler generates the most locally efficient code for referencing the value at the specified address. |
relocatable-symbol | Specifies a relocatable address. The assembler generates the necessary instructions to address the item and generates relocation information for the linker. |
relocatable-symbol±expression | Specifies a relocatable address. To get the address, the value of the expression, which has an absolute value, is added to or subtracted from the relocatable symbol. The assembler generates the necessary instructions to address the item and generates relocation information for the linker. If the symbol name does not appear as a label anywhere in the assembly, the assembler assumes that the symbol is external. |
relocatable-symbol(index-register) | Specifies an indexed relocatable address. To get the address, the index register is added to the relocatable symbol's address. The assembler generates the necessary instructions to address the item and generates relocation information for the linker. If the symbol name does not appear as a label anywhere in the assembly, the assembler assumes that the symbol is external. |
relocatable-symbol±expression(index-register) | Specifies an indexed relocatable address. To get the address, the assembler adds or subtracts the relocatable symbol, the expression, and the contents of the index register. The assembler generates the necessary instructions to address the item and generates relocation information for the linker. If the symbol name does not appear as a label anywhere in the assembly, the assembler assumes that the symbol is external. |