Writing portable assembly code
NASM
NASM (Netwide Assembler) is a cross-platform x86 assembler that supports OS-specific directives and several output file formats, such as ELF (Linux) and Mach-O (macOS). This post introduces what to consider when writing assembly code that is portable between Linux and macOS.
ABI
An output file (machine code) must be compatible with the target environment in order to run. x86 machine code cannot run directly on an ARM CPU, and an ELF file is not executable on macOS, which uses Mach-O. Many elements, including the calling convention, CPU architecture, and file format, determine the ABI (application binary interface): the interface between software and a particular combination of OS and hardware.
Common
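- Calling convention
The calling convention of the ABI is one of the features that Linux and macOS share. A calling convention defines the rules for invoking a function, such as where function parameters and the return value are stored. On x86-64 both systems use the System V AMD64 convention: the first integer arguments go in rdi, rsi, rdx, rcx, r8, and r9, and the return value comes back in rax.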
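As a minimal sketch of the shared convention, assuming x86-64 and the System V AMD64 calling convention, the hypothetical function below adds three integer arguments. The name sum3 is illustrative only, and on macOS the exported symbol would also need the leading underscore described later in this post.

```nasm
; long sum3(long a, long b, long c)   (hypothetical example function)
; Arguments arrive in rdi, rsi, rdx; the result is returned in rax.
        global  sum3

        section .text
sum3:
        lea     rax, [rdi + rsi]    ; rax = a + b
        add     rax, rdx            ; rax = a + b + c
        ret
```

Because the register assignment is the same on both systems, the body of such a function does not need any OS-specific changes; only the symbol name does.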
Linux
- system call
Each OS has its own method for invoking system calls. On Linux, place the arguments in registers much as the calling convention does (the fourth argument goes in r10 instead of rcx) and set rax to the system call index.
Linux - system call table
mov rax, 0 ; index 0: read system call
syscall
After a system call returns successfully, rax holds the return value, such as the number of bytes read. If there is an error, rax holds the negated error number. Store the negation of that rax value in errno to report the system call error properly.
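The error handling described above could be sketched as a small wrapper around the read system call. This is a Linux x86-64 sketch only: my_read is a hypothetical name, the check against -4095 assumes the usual Linux range of error returns (-4095 to -1), and the call uses __errno_location with the WRT ..plt form that this post covers further down.

```nasm
; ssize_t my_read(int fd, void *buf, size_t count)   (hypothetical wrapper, Linux only)
        global  my_read
        extern  __errno_location

        section .text
my_read:
        mov     eax, 0                  ; index 0: read system call
        syscall                         ; fd, buf, count are already in rdi, rsi, rdx
        cmp     rax, -4095
        jae     .error                  ; unsigned compare: rax is in [-4095, -1]
        ret                             ; success: rax = number of bytes read
.error:
        neg     rax                     ; rax = positive error number
        push    rax                     ; save it; this also aligns rsp to 16 bytes
        call    __errno_location wrt ..plt
        pop     rdx                     ; rdx = error number
        mov     dword [rax], edx        ; errno is an int, so store 32 bits
        mov     rax, -1                 ; conventional failure return value
        ret
```

The unsigned comparison treats the small negative values as very large numbers, which lets a single branch separate error returns from valid results.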
- global symbol
In NASM, the global directive exports a symbol so that other files can use it. The extern directive imports a symbol defined in another file.
global main
extern printf
- errno
We can get error information about a system call through errno.
// errno.h
#define errno (*__errno_location())
In fact, errno is a macro that dereferences the address returned by __errno_location. Because Linux and macOS use different symbols for the function that returns the address of errno, writing a NASM macro for errno helps keep the assembly portable.
call __errno_location
This returns the address of errno in rax.
macOS
- system call
XNU - system call master file
XNU - system call classes
There is little difference from Linux apart from the value in rax.
First, macOS has its own table and indexes: the index of the read system call is 3.
Second, macOS groups system calls into classes, such as Mach and Unix.
Bits 24 to 31 of rax represent the class.
mov rax, 0x2000003
; class 2: Unix class
; index 3: read system call
; 0x2000003 = (2 << 24) | 3 = (CLASS << 24) | INDEX
syscall
Third, a system call error is indicated by the carry flag. If the carry flag is not set, it is safe to return. If it is set, store the value of rax in errno. In short, Linux reports a system call error through a negative value in rax, whereas macOS reports it through the carry flag.
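The three macOS differences can be sketched together in one wrapper around read. This is a macOS x86-64 sketch only: _my_read is a hypothetical name, and the call to ___error relies on the macOS errno symbol that this post describes in the errno item below.

```nasm
; ssize_t my_read(int fd, void *buf, size_t count)   (hypothetical wrapper, macOS only)
        global  _my_read
        extern  ___error

        section .text
_my_read:
        mov     eax, 0x2000003          ; (2 << 24) | 3: Unix class, read system call
        syscall                         ; fd, buf, count are already in rdi, rsi, rdx
        jc      .error                  ; carry flag set: the call failed
        ret                             ; success: rax = number of bytes read
.error:
        push    rax                     ; rax already holds the positive error number
        call    ___error                ; rax = address of errno (rsp is 16-byte aligned here)
        pop     rdx                     ; rdx = error number
        mov     dword [rax], edx        ; errno is an int, so store 32 bits
        mov     rax, -1                 ; conventional failure return value
        ret
```

Unlike on Linux, no negation is needed, because the error number in rax is already positive when the carry flag is set.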
- global symbol
In macOS, all the global symbols are prepended with an underscore. This is done by the compiler automatically, but we must set it manually while writing assembly.
global _main
extern _printf
- errno
The macOS system library uses a different symbol for errno than Linux does.
// sys/errno.h
#define errno (*__error())
Note that the symbol has three underscores: the __error symbol in the C source code gets one more underscore prepended when it is compiled.
call ___error
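One possible way to hide the two errno symbols behind a NASM macro is to select them from the output format. This is a sketch under assumptions: ERRNO_SYM, ERRNO_CALL, load_errno, and read_errno are hypothetical names, the code assumes NASM's predefined __OUTPUT_FORMAT__ macro with the elf64 and macho64 formats, and the Linux call uses the WRT ..plt form discussed in the PIE section of this post.

```nasm
; Select the errno accessor by output format (all macro names are hypothetical).
%ifidn __OUTPUT_FORMAT__, macho64
    %define ERRNO_SYM   ___error                        ; macOS: __error plus the leading underscore
    %define ERRNO_CALL  ___error
%elifidn __OUTPUT_FORMAT__, elf64
    %define ERRNO_SYM   __errno_location                ; Linux
    %define ERRNO_CALL  __errno_location wrt ..plt      ; call through the PLT for PIE
%else
    %error "unsupported output format"
%endif

        extern  ERRNO_SYM

; load_errno: leaves the current errno value in eax.
; rsp must be 16-byte aligned where the macro is used.
%macro load_errno 0
        call    ERRNO_CALL              ; rax = address of errno
        mov     eax, dword [rax]        ; eax = errno
%endmacro

        section .text
read_errno:                             ; file-local helper: returns errno in eax
        sub     rsp, 8                  ; realign the stack to 16 bytes for the call
        load_errno
        add     rsp, 8
        ret
```

The same source can then be assembled with -f elf64 or -f macho64, and the rest of the code only refers to the macro names.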
Position Independent Executable
Modern compilers usually generate a PIE (position-independent executable) by default because it enables ASLR (Address Space Layout Randomization). The loaded program image gets randomized addresses every time it is executed, so the addresses cannot be predicted easily. libc is normally position independent as well, to make attacks harder. Calling standard library functions from assembly without accounting for PIE causes a link error.
# linker error
relocation R_X86_64_32 against `.rodata' can not be used when making a *PIE* object; recompile with `-fPIE`
Linux
Calling printf or other functions from a position-independent library requires the WRT operator. The operands that WRT accepts are listed in the NASM documentation.
NASM - PIC
extern printf
...
call printf WRT ..plt
...
The operand is ..plt, the procedure linkage table, because printf is a procedure name.
macOS
NASM generates valid output without WRT.
extern _printf
...
call _printf
...
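The Linux and macOS forms of the call can be combined into one source file. The following is a sketch rather than a definitive recipe: SYM and PLT are hypothetical macro names, the build commands in the comments assume nasm with gcc on Linux and clang on an x86-64 macOS target, and RIP-relative addressing (default rel) is used for the format string so that the data reference is position independent too.

```nasm
; Assumed build commands:
;   Linux:  nasm -f elf64   hello.asm -o hello.o && gcc   hello.o -o hello
;   macOS:  nasm -f macho64 hello.asm -o hello.o && clang hello.o -o hello
%ifidn __OUTPUT_FORMAT__, macho64
    %define SYM(name)   _ %+ name               ; macOS: prepend an underscore
    %define PLT(name)   _ %+ name               ; plain call is enough
%elifidn __OUTPUT_FORMAT__, elf64
    %define SYM(name)   name
    %define PLT(name)   name wrt ..plt          ; call through the PLT
%else
    %error "unsupported output format"
%endif

        default rel                             ; RIP-relative data references
        global  SYM(main)
        extern  SYM(printf)

        section .data
fmt:    db      "answer: %d", 10, 0

        section .text
SYM(main):
        sub     rsp, 8                          ; keep rsp 16-byte aligned at the call
        lea     rdi, [fmt]                      ; first argument: format string
        mov     esi, 42                         ; second argument
        xor     eax, eax                        ; variadic call: no vector registers used
        call    PLT(printf)
        add     rsp, 8
        xor     eax, eax                        ; return 0 from main
        ret
```

Keeping the platform differences inside two small macros means the body of main reads the same on both systems.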
Using NASM macro
Macros help keep assembly code clear because there are many OS-specific directives and conventions. Below are examples from the NASM documentation.