Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.3k views
in Technique[技术] by (71.8m points)

linux - Linking a program using printf with ld?

I'm getting a undefined reference to _printf when building an assembly program that defines its own _start instead of main, using NASM on x86-64 Ubuntu

Build commands:

   nasm -f elf64 hello.asm
   ld -s -o hello hello.o
   hello.o: In function `_start':
   hello.asm:(.text+0x1a): undefined reference to `_printf'
   MakeFile:4: recipe for target 'compile' failed
   make: *** [compile] Error 1

nasm source:

extern _printf

section .text
    global _start
_start:
    mov rdi, format     ; argument #1
    mov rsi, message    ; argument #2
    mov rax, 0
  call _printf            ; call printf

    mov rax, 0
    ret                 ; return 0

section .data

    message:    db "Hello, world!", 0
    format:   db "%s", 0xa, 0

Hello, World! should be the output

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

3 problems:

  • GNU/Linux using ELF object files does not decorate / mangle C names with a leading underscore. Use call printf, not _printf (Unlike MacOS X, which does decorate symbols with an _; keep that in mind if you're looking at tutorials for other OSes. Windows also uses a different calling convention, but only 32-bit Windows mangles names with _ or other decorations that encode the choice of calling convention.)

  • You didn't tell ld to link libc, and you didn't define printf yourself, so you didn't give the linker any input files that contain a definition for that symbol. printf is a library function defined in libc.so, and unlike the GCC front-end, ld doesn't include it automatically.

  • _start is not a function, you can't ret from it. RSP points to argc, not a return address. Define main instead if you want it to be a normal function.

Link with gcc -no-pie -nostartfiles hello.o -o hello if you want a dynamic executable that provides its own _start instead of main, but still uses libc.


This is safe for dynamic executables on GNU/Linux, because glibc can run its init functions via dynamic linker hooks. It's not safe on Cygwin, where its libc is only initialized by calls from its CRT start file (which do that before calling main).

Use call exit to exit, instead of making an _exit system call directly if you use printf; that lets libc flush any buffered output. (If you redirect output to a file, stdout will be full-buffered, vs. line buffered on a terminal.)

-static would not be safe; in a static executable no dynamic-linker code runs before your _start, so there's no way for libc to get itself initialized unless you call the functions manually. That's possible, but generally not recommended.

There are other libc implementations that don't need any init functions called before printf / malloc / other functions work. In glibc, stuff like the stdio buffers are allocated at runtime. (This used to be the case for MUSL libc, but that's apparently not the case anymore, according to Florian's comment on this answer.)


Normally if you want to use libc functions, it's a good idea to define a main function instead of your own _start entry point. Then you can just link with gcc normally, with no special options.

See What parts of this HelloWorld assembly code are essential if I were to write the program in assembly? for that and a version that uses Linux system calls directly, without libc.


If you wanted your code to work in a PIE executable like gcc makes by default (without --no-pie) on recent distros, you'd need call printf wrt ..plt.

Either way, you should use lea rsi, [rel message] because RIP-relative LEA is more efficient than mov r64, imm64 with a 64-bit absolute address. (In position-dependent code, the best option for putting a static address in a 64-bit register is 5-byte mov esi, message, because static addresses in non-PIE executables are known to be in the low 2GiB of virtual address space, and thus work as 32-bit sign- or zero-extended executables. But RIP-relative LEA is not much worse and works everywhere.)

;;; Defining your own _start but using libc
;;; works on Linux for non-PIE executables

default rel                ; Use RIP-relative for [symbol] addressing modes
extern printf
extern exit                ; unlike _exit, exit flushes stdio buffers

section .text
    global _start
_start:
    ;; RSP is already aligned by 16 on entry at _start, unlike in functions

    lea    rdi, [format]        ; argument #1   or better  mov edi, format
    lea    rsi, [message]       ; argument #2
    xor    eax, eax             ; no FP args to the variadic function
    call   printf               ; for a PIE executable:  call printf wrt ..plt

    xor    edi, edi             ; arg #1 = 0
    call   exit                 ; exit(0)
    ; exit definitely does not return

section .rodata        ;; read-only data can go in .rodata instead of read-write .data

    message:    db "Hello, world!", 0
    format:   db "%s", 0xa, 0

Assemble normally, link with gcc -no-pie -nostartfiles hello.o. This omits the CRT startup files that would normally define a _start that does some stuff before calling main. Libc init functions are called from dynamic linker hooks so printf is usable.

This would not be the case with gcc -static -nostartfiles hello.o. I included examples of what happens if you use the wrong options:

peter@volta:/tmp$ nasm -felf64 nopie-start.asm 
peter@volta:/tmp$ gcc -no-pie -nostartfiles nopie-start.o 
peter@volta:/tmp$ ./a.out 
Hello, world!
peter@volta:/tmp$ file a.out 
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=0cd1cd111ba0c6926d5d69f9191bdf136e098e62, not stripped

# link error without -no-pie because it doesn't automatically make PLT stubs
peter@volta:/tmp$ gcc -nostartfiles nopie-start.o 
/usr/bin/ld: nopie-start.o: relocation R_X86_64_PC32 against symbol `printf@@GLIBC_2.2.5' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status


# runtime error with -static
peter@volta:/tmp$ gcc -static -no-pie -nostartfiles nopie-start.o -o static_start-hello
peter@volta:/tmp$ ./static_start-hello 
Segmentation fault (core dumped)

Alternative version, defining main instead of _start

(And simplifying by using puts instead of printf.)

default rel                ; Use RIP-relative for [symbol] addressing modes
extern puts

section .text
    global main
main:
    sub    rsp, 8    ;; RSP was 16-byte aligned *before* a call pushed a return address
                     ;; RSP is now 16-byte aligned, ready for another call

    mov    edi, message         ; argument #1, optimized to use non-PIE-only move imm32
    call   puts

    add    rsp, 8               ; restore the stack
    xor    eax, eax             ; return 0
    ret

section .rodata
    message:    db "Hello, world!", 0     ; puts appends a newline

puts pretty much exactly implements printf("%s ", string); C compilers will make this optimization for you, but in asm you should do it yourself.

Link with gcc -no-pie hello.o, or even statically link using gcc -no-pie -static hello.o. The CRT startup code will call glibc init functions.

peter@volta:/tmp$ nasm -felf64 nopie-main.asm 
peter@volta:/tmp$ gcc -no-pie nopie-main.o 
peter@volta:/tmp$ ./a.out 
Hello, world!

# link error if you leave out -no-pie  because of the imm32 absolute address
peter@volta:/tmp$ gcc nopie-main.o 
/usr/bin/ld: nopie-main.o: relocation R_X86_64_32 against `.rodata' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: final link failed: nonrepresentable section on output
collect2: error: ld returned 1 exit status

main is a function, so you need to re-align the stack before making another function call. A dummy push is also a valid way to align the stack on function entry, but add/sub rsp, 8 is clearer.

An alternative is jmp puts to tailcall it, so main's return value will be whatever puts returns. In this case, you must not modify rsp first: you just jump to puts with your return address still on the stack, exactly like if your caller had called puts.


PIE-compatible code defining a main

(You can make a PIE that defines its own _start. That's left as an exercise for the reader.)

default rel                ; Use RIP-relative for [symbol] addressing modes
extern puts

section .text
    global main
main:
    sub    rsp, 8    ;; RSP was 16-byte aligned *before* a call pushed a return address

    lea    rdi, [message]         ; argument #1
    call   puts  wrt ..plt

    add    rsp, 8
    xor    eax, eax               ; return 0
    ret

section .rodata
    message:    db "Hello, world!", 0     ; puts appends a newline
peter@volta:/tmp$ nasm -felf64 pie.asm
peter@volta:/tmp$ gcc pie.o
peter@volta:/tmp$ ./a.out 
Hello, world!
peter@volta:/tmp$ file a.out
a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=b27e6032f955d628a542f6391b50805c68541fb9, not stripped

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share

2.1m questions

2.1m answers

62 comments

56.6k users

...