Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


assembly - What would happen if a system executes a part of the file that is zero-padded?

I've seen in some posts and videos that files are sometimes zero-padded to look bigger than they are, or to match the "same file size" criteria that some file system utilities use when moving files. Mostly these are prank programs or malware.

But I've often wondered: what would happen if the file got corrupted and the system "loaded" the next set of "instructions" from the big zero-padded space at the end of the file?

Would anything happen? What instruction does 0x0 decode to?



1 Answer


How all-zero bytes decode depends entirely on the CPU architecture. On many architectures, instructions are fixed-length (for example 32 bits), so the relevant unit is 00 00 00 00 (using hexdump notation).

On most Linux distros, clang/llvm comes with support for multiple target architectures built-in (clang -target and llvm-objdump), unlike gcc / gas / binutils, so I was able to use that to check for some architectures I didn't have cross-gcc / binutils installed for. Use llvm-objdump --version to see the supported list. (But I didn't figure out how to get it to disassemble a raw binary like binutils objdump -b binary, and my clang won't create SPARC binaries on its own.)


On x86, 00 00 (2 bytes) decodes (http://ref.x86asm.net/coder32.html) as an 8-bit add with a memory destination. The first byte is the opcode, the 2nd byte is the ModR/M that specifies the operands.

This usually segfaults right away (if eax/rax isn't a valid pointer), or segfaults once execution falls off the end of the zero-padded part into an unmapped page. (This happens in real life because of bugs like falling off the end of _start without making an exit system call, although in those cases the following bytes aren't always all zero; they may be data or ELF metadata, for example.)


x86 64-bit mode: ndisasm -b64 /dev/zero | head:

address   machine code      disassembly
00000000  0000              add [rax],al

x86 32-bit mode (-b32):

00000000  0000              add [eax],al

x86 16-bit mode (-b16):

00000000  0000              add [bx+si],al
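
To make the three x86 results above easier to follow, here is a minimal Python sketch of how the second 00 byte splits into ModR/M fields. The helper names (split_modrm, describe_add_rm8_r8) and the lookup tables are illustrative assumptions, not part of any disassembler, and the sketch only covers the simple mod == 0 register-indirect case with no prefixes.

```python
# Register names indexed by the 3-bit rm/reg fields (no REX prefix assumed).
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]
RM32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def split_modrm(byte):
    """Return (mod, reg, rm) for one ModR/M byte."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def describe_add_rm8_r8(modrm, addr_bits):
    """Render opcode 00 (add r/m8, r8) for the plain mod == 0 case.

    Hypothetical helper: SIB bytes, displacement-only forms and
    RIP-relative addressing are deliberately not handled.
    """
    mod, reg, rm = split_modrm(modrm)
    if mod != 0:
        raise ValueError("only mod == 0 is handled in this sketch")
    if addr_bits == 16:
        if rm == 6:
            raise ValueError("rm == 6 means disp16 in 16-bit addressing")
        base = RM16[rm]
    else:
        if rm in (4, 5):
            raise ValueError("rm 4/5 mean SIB / displacement-only forms")
        base = RM32[rm]
        if addr_bits == 64:
            base = "r" + base[1:]  # eax -> rax, ecx -> rcx, ...
    return f"add [{base}],{REG8[reg]}"


if __name__ == "__main__":
    for bits in (64, 32, 16):
        print(bits, describe_add_rm8_r8(0x00, bits))
```

With all three fields zero, the destination is the first entry of the addressing table for the current mode and the source is al, which is why only the memory operand changes between the three listings.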

AArch32 ARM mode: cd /tmp && dd if=/dev/zero of=zero bs=16 count=1 && arm-none-eabi-objdump -z -D -b binary -marm zero. (Without -z, objdump skips over large blocks of all-zero and shows ...)

addr   machine code   disassembly
0:   00000000        andeq   r0, r0, r0
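
Why andeq? As a rough illustration, the sketch below pulls the condition, opcode and register fields out of a 32-bit ARM data-processing word. The function name decode_arm_dp_register is made up for this example, and it only handles the unshifted three-register form; anything else raises an error.

```python
# Condition codes 0..14; value 15 is not a normal condition and is rejected.
COND = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
        "hi", "ls", "ge", "lt", "gt", "le", "al"]
# The 16 data-processing opcodes, indexed by bits 24..21.
DP_OPS = ["and", "eor", "sub", "rsb", "add", "adc", "sbc", "rsc",
          "tst", "teq", "cmp", "cmn", "orr", "mov", "bic", "mvn"]


def decode_arm_dp_register(word):
    """Decode an ARM-mode 'op Rd, Rn, Rm' word (hypothetical helper)."""
    if (word >> 25) & 0b111 != 0 or (word >> 4) & 1 != 0:
        raise ValueError("not a register-operand data-processing word")
    cond = (word >> 28) & 0xF
    opcode = (word >> 21) & 0xF
    s_bit = (word >> 20) & 1
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    shift = (word >> 5) & 0x7F  # shift amount (bits 11..7) and type (bits 6..5)
    rm = word & 0xF
    if cond == 15:
        raise ValueError("cond 0b1111 is not handled in this sketch")
    if shift != 0:
        raise ValueError("shifted operands are not handled in this sketch")
    if opcode in (8, 9, 10, 11, 13, 15):
        raise ValueError("two-operand forms are not handled in this sketch")
    suffix = "" if cond == 14 else COND[cond]
    name = DP_OPS[opcode] + ("s" if s_bit else "") + suffix
    return f"{name} r{rd}, r{rn}, r{rm}"


if __name__ == "__main__":
    print(decode_arm_dp_register(0x00000000))
```

Every field of the all-zero word is 0: condition 0 is eq, opcode 0 is and, and all three registers are r0. Because the instruction does not set flags and writes r0 & r0 back into r0, it has no visible effect even when the condition is true.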

ARM Thumb/Thumb2: arm-none-eabi-objdump -z -D -b binary -marm --disassembler-options=force-thumb zero

0:   0000            movs    r0, r0
2:   0000            movs    r0, r0
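
The Thumb result can be unpacked the same way. This is a small sketch, assuming the 16-bit "shift by immediate" format (top three bits 000); decode_thumb_shift_imm is a hypothetical name, and the add/subtract encodings that share those top bits are rejected.

```python
SHIFT_OPS = ["lsls", "lsrs", "asrs"]


def decode_thumb_shift_imm(halfword):
    """Decode a 16-bit Thumb shift-by-immediate instruction (sketch only)."""
    if (halfword >> 13) & 0b111 != 0:
        raise ValueError("not in the shift-by-immediate group")
    op = (halfword >> 11) & 0b11
    if op == 3:
        raise ValueError("add/subtract encodings are not handled here")
    imm5 = (halfword >> 6) & 0x1F
    rm = (halfword >> 3) & 0b111
    rd = halfword & 0b111
    if op == 0 and imm5 == 0:
        # A left shift by zero is written as a flag-setting register move.
        return f"movs r{rd}, r{rm}"
    if imm5 == 0:
        imm5 = 32  # for lsrs/asrs an encoded 0 means a shift by 32
    return f"{SHIFT_OPS[op]} r{rd}, r{rm}, #{imm5}"


if __name__ == "__main__":
    print(decode_thumb_shift_imm(0x0000))
```

So in Thumb state a run of zeros is a stream of harmless register moves that only update the flags.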

AArch64: aarch64-linux-gnu-objdump -z -D -b binary -maarch64 zero

 0:   00000000        .inst   0x00000000 ; undefined

MIPS32: echo .long 0 > zero.S && clang -c -target mips zero.S && llvm-objdump -d zero.o

zero.o: file format ELF32-mips
Disassembly of section .text:
   0:       00 00 00 00     nop
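
MIPS has no dedicated nop opcode; the all-zero word is an R-type instruction whose fields all happen to be zero, i.e. sll $zero, $zero, 0. A minimal sketch of the field split follows (the function names are illustrative, not from any toolchain):

```python
def mips_rtype_fields(word):
    """Split a 32-bit MIPS word into its R-type bit fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # 0 selects the SPECIAL group
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,           # 0 within SPECIAL is sll
    }


def is_canonical_nop(word):
    """True when every field is zero, i.e. sll $zero, $zero, 0."""
    return all(value == 0 for value in mips_rtype_fields(word).values())


if __name__ == "__main__":
    print(mips_rtype_fields(0x00000000), is_canonical_nop(0x00000000))
```

Since register $zero always reads as 0 and ignores writes, the shift has no effect, and disassemblers print it as nop.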

PowerPC 32 and 64-bit: -target powerpc and -target powerpc64. IDK if any extensions to PowerPC use the 00 00 00 00 instruction encoding for anything, or if it's still an illegal instruction on modern IBM POWER chips.

zero.o: file format ELF32-ppc   (or ELF64-ppc64)
Disassembly of section .text:
   0:       00 00 00 00  <unknown>

IBM S390: clang -c -target systemz zero.S

zero.o: file format ELF64-s390
Disassembly of section .text:
   0:       00 00  <unknown>
   2:       00 00  <unknown>
