Day 35 — What's inside an ELF executable? (symver edition)
29 September 2020 · recurse-center
I saw "ELF" being mentioned a lot while reading about linkers yesterday so I thought about looking into it. Turns out it's a file format for executables. Just hearing the word "executable" used to give me the chills because I couldn't look at what's inside one. (One time I ran cat
on an executable to look at its contents, just to get an incomprehensible wall of magic text!)
But today I learned that ELF is based off of a standard, and I can use readelf
and objdump
to look inside it! I also read chapter 42 of The Linux Programming Interface and found out about symbol versioning! In this post, I'll try to summarize some things I learned about both for future me!
What's an ELF?
ELF stands for Executable and Linkable Format and it's a file format for executables like I mentioned above.
Let's say we have a hello.c
with the following C code:
#include <stdio.h>
int main()
{
puts("Hello, world!");
return 0;
}
When we compile it into an executable with gcc
(without specifying the output name) we get an a.out
.
$ gcc hello.c
$ ./a.out
Hello, world!
If we check the file type of this a.out
executable, it's ELF!
$ file a.out
a.out: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=<sha>, for GNU/Linux 3.2.0, not stripped
a.out
was a file format on old Unix systems. It remains the default output file name for executables created by certain modern compilers, but its file format is ELF and not the older a.out
! There are a lot of executable file formats!
It was ELF all along!
An ELF executable contains (1) program headers, (2) section headers, and (3) data associated with entries in both (1) and (2). This image has a nice walkthrough and visual representation of each component in an ELF executable. We can use readelf
to view all of these different components (headers and data).
Program headers
To view the file header (which contains some metadata about the executable), we can use readelf
with the --file-header
command-line option! Similarly, --program-headers
for the program headers.
$ readelf --file-header a.out
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
...
The first field is a magic number! I found the same magic number in every ELF executable I created today. I'm not sure why it's called magic though.
$ python
>>> print(bytearray.fromhex('7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00').decode('ascii'))
ELF
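The bytes are shown in hexadecimal. The first one (7f) is a non-printing control character, and the next three (45 4c 46) spell "ELF" in ASCII. The remaining bytes aren't text: they encode fields such as the class, data encoding, and version that readelf prints right below the magic number.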
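To make those identification bytes a bit less mysterious, here's a minimal Python sketch that decodes the 16 bytes from the readelf output above. The byte positions and value tables are my reading of the ELF specification, and the name describe_ident is just something I made up for this post.

```python
# The 16 identification bytes copied from the readelf output above.
E_IDENT = bytes.fromhex("7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00")

ELF_MAGIC = b"\x7fELF"
EI_CLASS = {1: "ELF32", 2: "ELF64"}                  # byte 4
EI_DATA = {1: "little endian", 2: "big endian"}      # byte 5


def describe_ident(ident: bytes) -> dict:
    """Decode the first few identification bytes of an ELF file.

    describe_ident is a hypothetical helper, not part of any library.
    """
    if ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return {
        "class": EI_CLASS.get(ident[4], "unknown"),
        "data": EI_DATA.get(ident[5], "unknown"),
        "version": ident[6],
        "osabi": ident[7],  # 0 is what readelf calls "UNIX - System V"
    }


print(describe_ident(E_IDENT))
# {'class': 'ELF64', 'data': 'little endian', 'version': 1, 'osabi': 0}
```

The decoded values line up with the Class, Data, Version, and OS/ABI lines that readelf printed.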
Section headers
To view the section headers, we can use readelf
with the --section-headers
command-line option!
$ readelf --section-headers a.out
There are 31 section headers, starting at offset 0x3970:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[13] .plt PROGBITS 0000000000001020 00001020
0000000000000020 0000000000000010 AX 0 0 16
...
[16] .text PROGBITS 0000000000001060 00001060
0000000000000185 0000000000000000 AX 0 0 16
...
[18] .rodata PROGBITS 0000000000002000 00002000
0000000000000012 0000000000000000 A 0 0 4
...
[24] .got PROGBITS 0000000000003fb8 00002fb8
0000000000000048 0000000000000008 WA 0 0 8
[25] .data PROGBITS 0000000000004000 00003000
0000000000000010 0000000000000000 WA 0 0 8
[26] .bss NOBITS 0000000000004010 00003010
0000000000000008 0000000000000000 WA 0 0 1
...
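As far as I understand it, the first line of that output ("There are 31 section headers, starting at offset 0x3970") comes straight from two fields in the file header. Here's a rough sketch of unpacking a little-endian ELF64 header with Python's struct module. The field names follow the ELF specification as I read it; parse_elf64_header and section_table_summary are hypothetical helpers, and the sketch doesn't handle ELF32 or big-endian files.

```python
import struct

# ELF64 file header, little endian: 16 identification bytes followed by
# fixed-width fields, 64 bytes in total.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry",
    "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
    "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf64_header(data: bytes) -> dict:
    """Unpack the first 64 bytes of a little-endian ELF64 file (hypothetical helper)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    values = ELF64_HEADER.unpack(data[:ELF64_HEADER.size])
    return dict(zip(FIELDS, values))


def section_table_summary(header: dict) -> str:
    """Rebuild the kind of summary line readelf prints (without the trailing colon)."""
    return "There are {} section headers, starting at offset {:#x}".format(
        header["e_shnum"], header["e_shoff"]
    )
```

To try it on a real file, pass the first 64 bytes of the executable, for example the result of open("a.out", "rb").read(64), to parse_elf64_header.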
These are the sections I could wrap my head around, apart from the procedure linkage table (.plt
) and global offset table (.got
) sections that I mentioned yesterday!
- .text: Contains the executable instructions for a program.
- .data: Initialized global and static variables.
- .bss: Uninitialized global and static variables, filled with zeros.
- .rodata: Static constants (and not variables).
More about all these sections here.
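The Flags column in the readelf output (AX, A, WA) hints at how each section is used at run time. Here's a small sketch of how those letters map to bits in a section header's flags field. The three constants are the standard ELF flag values as far as I know; flag_letters is a made-up helper that only covers these three flags.

```python
SHF_WRITE = 0x1      # W: writable while the program runs
SHF_ALLOC = 0x2      # A: occupies memory while the program runs
SHF_EXECINSTR = 0x4  # X: contains executable machine instructions


def flag_letters(sh_flags: int) -> str:
    """Render a flags value the way readelf's Flags column does (W, A, X only)."""
    letters = ""
    for bit, letter in ((SHF_WRITE, "W"), (SHF_ALLOC, "A"), (SHF_EXECINSTR, "X")):
        if sh_flags & bit:
            letters += letter
    return letters


# Flag combinations matching the readelf listing above.
SECTIONS = (
    (".text", SHF_ALLOC | SHF_EXECINSTR),  # code: loaded and executable
    (".rodata", SHF_ALLOC),                # constants: loaded, read-only
    (".data", SHF_WRITE | SHF_ALLOC),      # variables: loaded and writable
)

for name, flags in SECTIONS:
    print(name, flag_letters(flags))
# .text AX
# .rodata A
# .data WA
```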
Symbol table
After struggling to understand what symbols are (a fancy word for functions and variables), it was a relief to see a full symbol table stored in the executable with readelf --symbols
!
We can see the puts
(belonging to glibc 2.2.5
) and main
functions! (right scroll in the code block)
$ readelf --symbols a.out
...
Symbol table '.symtab' contains 65 entries:
Num: Value Size Type Bind Vis Ndx Name
...
49: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@@GLIBC_2.2.5
...
61: 0000000000001149 27 FUNC GLOBAL DEFAULT 16 main
...
We can also use objdump
to look at the symbol table:
$ objdump --syms a.out
a.out: file format elf64-x86-64
SYMBOL TABLE:
...
0000000000000000 F *UND* 0000000000000000 puts@@GLIBC_2.2.5
...
0000000000001149 g F .text 000000000000001b main
...
Quoting the man
page for objdump
, here's what each column in this symbol table means:
- The first number is the symbol's value (sometimes referred to as its address).
- The next field is a set of characters and spaces indicating the flag bits that are set on the symbol.
- Next is the section with which the symbol is associated, or *ABS* if the section is absolute (ie not connected with any section), or *UND* if the section is referenced in the file being dumped but not defined there.
- After the section name comes another field, a number, which for common symbols is the alignment and for other symbols is the size.
- Finally the symbol's name is displayed.
In the example above, we can see that puts
and main
have the F
flag because they are functions. The flag characters are divided into 7 groups as follows:
Group 1
Flag | Description |
---|---|
"l" | Symbol is local |
"g" | Symbol is global |
"u" | Symbol is a unique global |
Space | Neither global nor local |
"!" | Both global and local |
Group 2
Flag | Description |
---|---|
"w" | A weak symbol |
Space | A strong symbol |
Group 3
Flag | Description |
---|---|
"C" | Symbol is constructor |
Space | An ordinary symbol |
Group 4
Flag | Description |
---|---|
"W" | Symbol is a warning |
Space | A normal symbol |
Group 5
Flag | Description |
---|---|
"I" | An indirect reference to another symbol |
"i" | A function to be evaluated during reloc processing |
Space | A normal symbol |
Group 6
Flag | Description |
---|---|
"d" | A debugging symbol |
"D" | A dynamic symbol |
Space | A normal symbol |
Group 7
Flag | Description |
---|---|
"F" | Name of a function |
"f" | Name of a file |
"O" | Name of an object |
Space | A normal symbol |
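To check my understanding of these groups, here's a rough sketch that splits one line of objdump --syms output into its columns and decodes groups 1, 2, and 7. It assumes the flags are printed as a fixed block of seven characters (one per group) right after the value, which is how I understand objdump lays them out; the spacing is collapsed in the listing above, so the sample lines rebuild it explicitly. parse_syms_line is a hypothetical helper.

```python
SCOPE = {"l": "local", "g": "global", "u": "unique global",
         " ": "neither global nor local", "!": "both global and local"}
KIND = {"F": "function", "f": "file", "O": "object", " ": "normal"}


def parse_syms_line(line: str) -> dict:
    """Split one symbol line into value, flags, section, size, and name."""
    value, rest = line.split(" ", 1)
    flags, rest = rest[:7], rest[7:]       # one character per flag group
    section, size, *name_parts = rest.split()
    return {
        "value": int(value, 16),
        "scope": SCOPE.get(flags[0], "unknown"),   # group 1
        "weak": flags[1] == "w",                   # group 2
        "kind": KIND.get(flags[6], "unknown"),     # group 7
        "section": section,
        "size": int(size, 16),
        "name": name_parts[-1],
    }


# The two lines from the listing above, with the column padding restored.
MAIN_LINE = ("0000000000001149 " + "g" + " " * 5 + "F"
             + " .text\t000000000000001b" + " " * 14 + "main")
PUTS_LINE = ("0000000000000000 " + " " * 6 + "F"
             + " *UND*\t0000000000000000" + " " * 14 + "puts@@GLIBC_2.2.5")

for sample in (MAIN_LINE, PUTS_LINE):
    print(parse_syms_line(sample))
```

For main this gives a global function in .text with size 27, the same size readelf reported.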
Symbol Versioning
Each symbol can be versioned while creating a shared library. This means that we can define multiple versions of the same function, and each executable will use the version of the function that was "current" when that executable was linked against the shared library.
Let's say we have the following code which is just a hello
function that prints something:
#include <stdio.h>
void hello(void)
{
puts("Hello, v1!");
}
We can define a version script:
$ cat hello_v1.map
VER_1 {
global: hello;
local: *; # Hide all other symbols
};
In the script, global: hello;
ensures that only the hello
function is "exported". Everything else remains hidden because of local: *;
. We also add the VER_1
tag to our "export configuration". You can also check out glibc
's version script.
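Here's a toy Python model of how I picture the linker applying this script to each symbol. It is a big simplification, not how ld is actually implemented: I'm assuming exact names are checked before wildcard patterns, and the names VER_1 and visibility are made up for the sketch.

```python
from fnmatch import fnmatchcase

# Mirrors hello_v1.map:  VER_1 { global: hello; local: *; };
VER_1 = {"global": ["hello"], "local": ["*"]}


def _is_wildcard(pattern: str) -> bool:
    return any(ch in pattern for ch in "*?[")


def visibility(symbol: str, node: dict = VER_1) -> str:
    """Return 'global', 'local', or 'unmatched' for a symbol (toy model)."""
    # Pass 1: exact names win, so "hello" is exported despite "local: *".
    for scope in ("global", "local"):
        if any(not _is_wildcard(p) and p == symbol for p in node[scope]):
            return scope
    # Pass 2: wildcard patterns catch whatever is left.
    for scope in ("global", "local"):
        if any(_is_wildcard(p) and fnmatchcase(symbol, p) for p in node[scope]):
            return scope
    return "unmatched"


print(visibility("hello"))         # global
print(visibility("some_helper"))   # local
```

Here some_helper stands in for any other function in the library that we would not want to export.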
We can use this version script to create a shared library called libhello.so
:
$ gcc -c -fPIC -Wall hello.c
$ gcc -shared -o libhello.so hello.o -Wl,--version-script,hello_v1.map
Now let's say we have a prog.c
where we use this hello
function. We can create an executable p1
and link it against libhello.so
:
$ cat prog.c
void hello(void);
int main()
{
hello();
return 0;
}
$ gcc -o p1 prog.c libhello.so
$ LD_LIBRARY_PATH=. ./p1
Hello, v1!
When we look at p1
's symbol table, we can see that it's using the hello
function with the VER_1
tag:
$ objdump --syms p1 | grep hello
0000000000000000 F *UND* 0000000000000000 hello@@VER_1
But what if we want to modify the definition of this hello
function, while ensuring that p1
continues to function by using the old version?
We can rename the old function to hello_old
and define a new function called hello_new
! To do this, we need to use the .symver
assembler directive to tie both these functions to different version tags.
#include <stdio.h>
__asm__(".symver hello_old,hello@VER_1");
__asm__(".symver hello_new,hello@@VER_2");
void hello_old(void)
{
puts("Hello, v1!");
}
void hello_new(void)
{
puts("Hello, v2!");
}
void world(void)
{
puts("World, v2!");
}
Our new version script looks like this:
$ cat hello_v2.map
VER_1 {
global: hello;
local: *; # Hide all other symbols
};
VER_2 {
global: world;
} VER_1;
VER_2
has @@
instead of @
to make it the default version so that any new executables that are linked against our shared library use the new function definitions. The } VER_1;
in the last line indicates that VER_2
has a dependency on VER_1
, which means that VER_2
"inherits" the export configuration from VER_1
, while also exporting a new world
function.
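The way I think about @ versus @@ is as a lookup table keyed by (name, version), plus a "default" pointer that only matters at link time. Here's a small Python model of that idea. It's only an illustration under that assumption, not what the linker or the dynamic loader really do, and every name in it is made up.

```python
# What the new libhello.so provides: (symbol, version tag) -> implementation.
LIBHELLO = {
    ("hello", "VER_1"): "hello_old",   # hello@VER_1  (kept for old binaries)
    ("hello", "VER_2"): "hello_new",   # hello@@VER_2 (the default)
    ("world", "VER_2"): "world",
}

# The @@ entries: the version a newly linked executable records.
DEFAULT_VERSION = {"hello": "VER_2", "world": "VER_2"}


def link_reference(symbol: str) -> tuple:
    """At link time, record the symbol together with its default version."""
    return (symbol, DEFAULT_VERSION[symbol])


def resolve(reference: tuple) -> str:
    """At run time, look up exactly the (symbol, version) that was recorded."""
    return LIBHELLO[reference]


p1_reference = ("hello", "VER_1")       # recorded when p1 was linked earlier
p2_reference = link_reference("hello")  # linked against the new library

print(resolve(p1_reference))  # hello_old
print(resolve(p2_reference))  # hello_new
```

Because the old executable carries its version tag with it, changing the default only affects executables linked afterwards.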
Now when we build the new version of our library, we can use our new version script:
$ gcc -c -fPIC -Wall hello.c
$ gcc -shared -o libhello.so hello.o -Wl,--version-script,hello_v2.map
And when we create a new executable for prog.c
, it uses the new definition of hello
, while p1
continues to use the old one!
$ gcc -o p2 prog.c libhello.so
$ LD_LIBRARY_PATH=. ./p2
Hello, v2!
$ LD_LIBRARY_PATH=. ./p1
Hello, v1!
Below are the symbol tables for p1
and p2
. We can see that they use different hello
functions tagged with VER_1
and VER_2
!
$ objdump --syms p1 | grep hello
0000000000000000 F *UND* 0000000000000000 hello@@VER_1
$ objdump --syms p2 | grep hello
0000000000000000 F *UND* 0000000000000000 hello@@VER_2