
Two Swiss Army knives for prying open a program's internals

2022-02-09 12:52:46

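To do good work, you must first have the right tools

For a programmer, it is important to understand how a computer program actually runs and what lies beneath it. After all, a building without a solid foundation is not safe.

When studying these things, it helps a great deal to have the right tools at hand to find your way around.

readelf and objdump are two such tools, as versatile as Swiss Army knives.

Hands-on

Let's use a small example to see what readelf and objdump can do.

Source code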

main.c

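#include <stdio.h>

/* The body below is inferred from the disassembly shown later:
   main(argc, argv) calls puts() once and returns 0. */
int main(int argc, char *argv[])
{
    puts("hello world!");
    return 0;
}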

Compile, run

$ gcc main.c -o main.out
$ ./main.out
hello world!


readelf

This is about the simplest program there is: it runs and prints hello world!. Let's look at how the program gets to that point.

This is where the readelf command comes in.

$ readelf -hs main.out 
ELF Header:
  Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class: ELF64
  Data: 2's complement, little endian
  Version: 1 (current)
  OS/ABI: UNIX - System V
  ABI Version: 0
  Type: DYN (Shared object file)
  Machine: Advanced Micro Devices X86-64
  Version: 0x1
  Entry point address: 0x1060
  Start of program headers: 64 (bytes into file)
  Start of section headers: 14712 (bytes into file)
  Flags: 0x0
  Size of this header: 64 (bytes)
  Size of program headers: 56 (bytes)
  Number of program headers: 13
  Size of section headers: 64 (bytes)
  Number of section headers: 31
  Section header string table index: 30

...
Symbol table '.symtab' contains 65 entries:
...
__libc_start_main@@GLIBC_
    53: 0000000000004000 0 NOTYPE GLOBAL DEFAULT 25 __data_start
    54: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
    55: 0000000000004008 0 OBJECT GLOBAL HIDDEN 25 __dso_handle
    56: 0000000000002000 4 OBJECT GLOBAL DEFAULT 18 _IO_stdin_used
    57: 0000000000001170 101 FUNC GLOBAL DEFAULT 16 __libc_csu_init
    58: 0000000000004018 0 NOTYPE GLOBAL DEFAULT 26 _end
    59: 0000000000001060 47 FUNC GLOBAL DEFAULT 16 _start
    60: 0000000000004010 0 NOTYPE GLOBAL DEFAULT 26 __bss_start
    61: 0000000000001149 38 FUNC GLOBAL DEFAULT 16 main
    62: 0000000000004010 0 OBJECT GLOBAL HIDDEN 25 __TMC_END__
    63: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
    64: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@@GLIBC_2.2


-h
--file-header Show the ELF file header

-s
--syms
--symbols Show symbol table entries
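
What -h prints is a decoded view of the first 64 bytes of the file. To make that concrete, here is a minimal Python sketch of the decoding. It assumes a little-endian ELF64 file, as in the output above. The names parse_elf64_header, ELF64_HEADER and FIELD_NAMES are made up for this example and are not part of any tool, and the sample header is rebuilt from the values printed above rather than read from main.out.

```python
import struct

# Layout of the 64-byte ELF64 file header, little-endian, no padding:
# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

# Illustrative key names (an assumption of this sketch, not readelf's labels).
FIELD_NAMES = (
    "ident", "type", "machine", "version", "entry", "phoff", "shoff",
    "flags", "ehsize", "phentsize", "phnum", "shentsize", "shnum", "shstrndx",
)


def parse_elf64_header(data: bytes) -> dict:
    """Decode the ELF64 file header found at the start of `data`."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("too short to hold an ELF64 header")
    header = dict(zip(FIELD_NAMES, ELF64_HEADER.unpack_from(data)))
    ident = header["ident"]
    if ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number")
    if ident[4] != 2 or ident[5] != 1:
        # ident[4] == 2 means 64-bit class, ident[5] == 1 means little endian
        raise ValueError("this sketch only handles little-endian ELF64")
    return header


# Sample header rebuilt from the readelf output above (not read from disk).
sample = ELF64_HEADER.pack(
    bytes.fromhex("7f454c46020101") + bytes(9),  # Magic, padded to 16 bytes
    3,       # Type: DYN
    62,      # Machine: x86-64
    1,       # Version
    0x1060,  # Entry point address
    64,      # Start of program headers
    14712,   # Start of section headers
    0,       # Flags
    64, 56, 13, 64, 31, 30,  # header sizes, entry counts, string table index
)

header = parse_elf64_header(sample)
print(hex(header["entry"]))  # prints 0x1060
```

To try it on a real file, pass in the first 64 bytes, for example parse_elf64_header(open("main.out", "rb").read(64)).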

As the output shows, the ELF header gives the program's entry point as 0x1060, and the symbol table shows that the symbol at 0x1060 is the _start function. Here is a brief introduction to _start; I plan to cover it in more detail later.

Under Linux, the entry point of a typical program is _start, a function that is part of the Linux system library (Glibc). When our program is linked with Glibc to form the final executable, this function becomes the entry point of the program's initialization code. That code performs a series of initialization steps and then calls main to run the body of the program. After main returns, control goes back to the initialization code, which does some cleanup and then ends the process.

-- "Programmer's Self-Cultivation: Linking, Loading, and Libraries"

So we know that _start calls main, but so far that is only book knowledge. To check it for ourselves, we turn to the objdump command.

objdump

$ objdump -Sd main.out

main.out: file format elf64-x86-64
...
Disassembly of section .text:

0000000000001060 <_start>:
    1060: f3 0f 1e fa endbr64 
    1064: 31 ed xor %ebp,%ebp
    1066: 49 89 d1 mov %rdx,%r9
    1069: 5e pop %rsi
    106a: 48 89 e2 mov %rsp,%rdx
    106d: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
    1071: 50 push %rax
    1072: 54 push %rsp
    1073: 4c 8d 05 66 01 00 00 lea 0x166(%rip),%r8 # 11e0 <__libc_csu_fini>
    107a: 48 8d 0d ef 00 00 00 lea 0xef(%rip),%rcx # 1170 <__libc_csu_init>
    1081: 48 8d 3d c1 00 00 00 lea 0xc1(%rip),%rdi # 1149 <main>
    1088: ff 15 52 2f 00 00 callq *0x2f52(%rip) # 3fe0 <__libc_start_main@GLIBC_2.2.5>
    108e: f4 hlt
    108f: 90 nop

0000000000001090 <deregister_tm_clones>:
    1090: 48 8d 3d 79 2f 00 00 lea 0x2f79(%rip),%rdi # 4010 <__TMC_END__>
    1097: 48 8d 05 72 2f 00 00 lea 0x2f72(%rip),%rax # 4010 <__TMC_END__>
    109e: 48 39 f8 cmp %rdi,%rax
    10a1: 74 15 je 10b8 <deregister_tm_clones+0x28>
    10a3: 48 8b 05 2e 2f 00 00 mov 0x2f2e(%rip),%rax # 3fd8 <_ITM_deregisterTMCloneTable>
    10aa: 48 85 c0 test %rax,%rax
    10ad: 74 09 je 10b8 <deregister_tm_clones+0x28>
    10af: ff e0 jmpq *%rax
    10b1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
    10b8: c3 retq
    10b9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
...
0000000000001149 <main>:
    1149: f3 0f 1e fa endbr64
    114d: 55 push %rbp
    114e: 48 89 e5 mov %rsp,%rbp
    1151: 48 83 ec 10 sub $0x10,%rsp
    1155: 89 7d fc mov %edi,-0x4(%rbp)
    1158: 48 89 75 f0 mov %rsi,-0x10(%rbp)
    115c: 48 8d 3d a1 0e 00 00 lea 0xea1(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
    1163: e8 e8 fe ff ff callq 1050 <puts@plt>
    1168: b8 00 00 00 00 mov $0x0,%eax
    116d: c9 leaveq
    116e: c3 retq
    116f: 90 nop
...

-S
--source Show source code intermixed with the disassembly where possible; this works best when the program was compiled with the -g debugging option. Implies -d.

-d
--disassemble Disassemble the sections that are expected to contain instructions

From the listing we can see that:

The _start function really does just one thing: it calls __libc_start_main, passing main in %rdi, argc in %rsi and argv in %rdx, along with __libc_csu_init in %rcx and __libc_csu_fini in %r8.

__libc_start_main then eventually calls the main function.
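
The addresses that objdump prints after # in the _start listing can be re-derived by hand: on x86-64, a RIP-relative operand is resolved against the address of the next instruction. The Python sketch below redoes that arithmetic for the four instructions that set up the call. The helper name rip_relative_target is made up for this example; the addresses, instruction lengths and displacements are copied from the listing above.

```python
def rip_relative_target(insn_addr: int, insn_len: int, disp: int) -> int:
    """Address referenced by a RIP-relative operand.

    RIP already points at the next instruction when the operand is
    evaluated, so the target is insn_addr + insn_len + disp.
    """
    return insn_addr + insn_len + disp


# (address, length in bytes, displacement, what the listing says it refers to)
start_operands = [
    (0x1073, 7, 0x166, "__libc_csu_fini"),    # lea 0x166(%rip),%r8
    (0x107A, 7, 0xEF, "__libc_csu_init"),     # lea 0xef(%rip),%rcx
    (0x1081, 7, 0xC1, "main"),                # lea 0xc1(%rip),%rdi
    (0x1088, 6, 0x2F52, "slot the call reads its target from"),  # callq *0x2f52(%rip)
]

for addr, length, disp, label in start_operands:
    target = rip_relative_target(addr, length, disp)
    print(f"{addr:#x}: {target:#x} ({label})")
```

The third row comes out as 0x1149, the address of main in the symbol table, which confirms that main is what _start hands to __libc_start_main in %rdi.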

Takeaways

In working through the example above, we used the readelf and objdump commands and briefly introduced a few of their options. With these commands we could examine what the program does under the hood, and so gain a deeper understanding of how a program is compiled and run and of how an executable file is put together. This is a great help when learning how computers work, and it can even inform the higher-level code we write.


Detailed parameters

For reference, here are the full option lists of both tools.

readelf


Displays information about ELF format files.
This program offers functionality similar to objdump, but goes into more detail on ELF files.
-a 
--all Display all information, equivalent to -h -l -S -s -r -d -V -A -I. 

-h 
--file-header Displays the file header information at the beginning of the elf file. 

-l 
--program-headers  
--segments Display program headers (segment headers) information (if any). 

-S 
--section-headers  
--sections Show section headers information (if any). 

-g 
--section-groups Show section group information (if any). 

-t 
--section-details Show section details (for -S). 

-s 
--syms        
--symbols Show the entries in the symbol table section (if any). 

-e 
--headers Show all header information, equivalent to -h -l -S. 

-n 
--notes Show the contents of the NOTE segments and/or sections (if any). 

-r 
--relocs Show the contents of the file's relocation section (if any). 

-u 
--unwind Show the contents of the file's unwind section (if any). Support is limited to a few architectures, such as IA64 ELF files. 

-d 
--dynamic Show the contents of the file's dynamic section (if any). 

-V 
--version-info Show the contents of the version sections (if any). 

-A 
--arch-specific Show architecture-specific information (if any). 

-D 
--use-dynamic When displaying symbols, use the symbol table in the file's dynamic section rather than the symbol table sections. 

-x <number or name> 
--hex-dump=<number or name> Show the contents of the specified section as a hexadecimal dump. A number selects the section by its index in the section table; any other string selects it by name. 

-w[liaprmfFsoR] or 
--debug-dump[=line,=info,=abbrev,=pubnames,=aranges,=macro,=frames,=frames-interp,=str,=loc,=Ranges] Show the contents of the specified debug sections. 

-I 
--histogram Show a histogram of bucket list lengths when displaying the contents of the symbol tables. 

-v 
--version Show the version information of readelf. 

-H 
--help Show the command line options supported by readelf. 

-W 
--wide Wide output: do not break lines to fit into 80 columns. 

@file Read command-line options from file, so that a group of options can be kept in a file and loaded with @file. 
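
The output of readelf -s is plain text, so it is easy to post-process. As an illustration, the Python sketch below looks up which symbol sits at a given address, using a few rows copied from the symbol table shown earlier. The function name find_symbols_at and the column handling are assumptions of this example; it expects rows with the eight columns Num, Value, Size, Type, Bind, Vis, Ndx and Name.

```python
# A few rows copied as-is from the .symtab output shown earlier.
SYMTAB_ROWS = """\
    57: 0000000000001170 101 FUNC GLOBAL DEFAULT 16 __libc_csu_init
    59: 0000000000001060 47 FUNC GLOBAL DEFAULT 16 _start
    61: 0000000000001149 38 FUNC GLOBAL DEFAULT 16 main
    64: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@@GLIBC_2.2
"""


def find_symbols_at(rows: str, address: int) -> list:
    """Return the names of defined symbols whose Value equals `address`."""
    names = []
    for line in rows.splitlines():
        columns = line.split()
        # Expect: Num: Value Size Type Bind Vis Ndx Name
        if len(columns) != 8 or not columns[0].endswith(":"):
            continue
        value, ndx, name = int(columns[1], 16), columns[6], columns[7]
        # Undefined (UND) symbols have no address in this file; skip them.
        if ndx != "UND" and value == address:
            names.append(name)
    return names


print(find_symbols_at(SYMTAB_ROWS, 0x1060))  # prints ['_start']
```

Looking up 0x1060, the entry point from the ELF header, returns _start, the same conclusion reached by reading the table by eye.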



objdump


Displays information about binary files.
The objdump command is a GNU binutils utility used to inspect the contents of object files and executables.
--archive-headers 
-a 
If any of the files is an archive (such as a lib*.a), show the archive header information for its members, in a format similar to ls -l. 

-b bfdname 
--target=bfdname 
Specify the object-code format of the files. This is optional; objdump recognizes many formats automatically. For example: 

objdump -b oasys -m vax -h fu.o 
shows summary information from the section headers (-h) of fu.o, explicitly identified (-m) as a VAX object file in the format produced by Oasys compilers (-b). objdump -i lists the object-code formats that can be specified here. 

-C 
--demangle 
Decode (demangle) low-level symbol names into user-level names. Besides removing any leading underscore, this makes C++ function names readable. 

--debugging 
-g 
Display debugging information. Attempts to parse debugging information saved in files and display it in C syntax. Only certain types of debugging information are supported. Some other formats are supported by readelf -w. 

-e 
--debugging-tags 
Similar to the -g option, but the generated information is in a format compatible with the ctags tool. 

--disassemble 
-d 
Show the assembler mnemonics for the machine instructions in objfile. Only sections that are expected to contain instructions are disassembled. 

-D 
--disassemble-all 
Like -d, but disassemble the contents of all sections, not only those expected to contain instructions. 

--prefix-addresses 
When disassembling, print the complete address on each line. This is the older disassembly format. 

-EB 
-EL 
--endian={big|little} 
Specify the endianness of the object files. This only affects disassembly. It is useful for file formats that do not record endianness themselves, such as S-records. 

-f 
--file-headers 
Show summary information from the overall header of each objfile. 

-h 
--section-headers 
--headers 
Show summary information from the section headers of the object file. 

-H 
--help 
Short help message. 

-i 
--info 
Displays a list of architectures and target formats available for the -b or -m options. 

-j name 
--section=name 
Show information only for the section with the given name. 

-l 
--line-numbers 
Label the output, using debugging information, with the file names and source line numbers that correspond to the object code or relocations shown. Only useful with -d, -D or -r, and the program must have been compiled with debugging information (for example with -g). 

-m machine 
--architecture=machine 
Specify the architecture to use when disassembling object files. This option is useful when the file to be disassembled does not describe its architecture itself (e.g. S-records). You can use the -i option to list the architectures that can be specified here. 

--reloc 
-r 
Show the relocation entries of the file. If used with -d or -D, the relocations are printed interspersed with the disassembly. 

--dynamic-reloc 
-R 
Show the dynamic relocation entries of the file. This is only meaningful for dynamic objects, such as certain types of shared libraries. 

-s 
--full-contents 
Show the full contents of the requested sections. By default all non-empty sections are shown. 

-S 
--source 
Show source code intermixed with the disassembly where possible; this works best when the program was compiled with debugging information (for example with -g). Implies -d. 

--show-raw-insn 
When disassembling, print each instruction's bytes in hexadecimal as well as in symbolic form. This is the default unless --prefix-addresses is used. 

--no-show-raw-insn 
When disassembling, do not print the instruction bytes. This is the default when --prefix-addresses is used. 

--start-address=address 
Start displaying data from the specified address, this option affects the output of the -d, -r and -s options. 

--stop-address=address 
Display data until the specified address, this option affects the output of the -d, -r and -s options. 

-t 
--syms 
Show the symbol table entries of the file. This is similar to the information provided by nm, although the display format differs. 

-T 
--dynamic-syms 
Show the dynamic symbol table entries of the file. This is only meaningful for dynamic objects, such as certain types of shared libraries. The information is similar to that shown by nm -D (--dynamic). 

-V 
--version 
Version information 

--all-headers 
-x 
Show all available header information, including the symbol table and relocation entries. Using -x is equivalent to specifying -a -f -h -r -t together. 

-z 
--disassemble-zeroes 
Normally, disassembly output skips large blocks of zeros; this option makes those blocks be disassembled as well. 

@file Read command-line options from file, so that a group of options can be kept in a file and loaded with @file.
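
The raw bytes that objdump prints beside each instruction (see --show-raw-insn) can also be decoded by hand. As a small illustration, the Python sketch below takes the bytes e8 e8 fe ff ff of the call at 0x1163 in main, shown in the disassembly earlier, reads the four bytes after the e8 opcode as a signed little-endian displacement, and adds it to the address of the next instruction. The helper name call_rel32_target is hypothetical, and only this one encoding (opcode e8 followed by a 32-bit displacement) is handled.

```python
def call_rel32_target(insn_addr: int, raw: bytes) -> int:
    """Target of a near relative call encoded as e8 followed by rel32."""
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("expected a 5-byte e8 rel32 call")
    # The displacement is stored least-significant byte first and is signed.
    disp = int.from_bytes(raw[1:], byteorder="little", signed=True)
    # It is relative to the address of the instruction after the call.
    return insn_addr + len(raw) + disp


raw_call = bytes.fromhex("e8 e8 fe ff ff")  # 1163: callq 1050 <puts@plt>
print(hex(call_rel32_target(0x1163, raw_call)))  # prints 0x1050
```

The displacement bytes e8 fe ff ff decode to -0x118, and 0x1168 - 0x118 = 0x1050, which matches the puts@plt target that objdump prints for this call.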




Conclusion

With a solid foundation, you can stand firm even on one foot.

If the foundation is not firmly laid, ...
