How To Develop A PlayStation 4 Emulator (1)
Overview
PSET4 is a tiny PlayStation 4 emulator that I wrote to learn about graphics drivers. There are many open-source PS4 emulators, such as FPCS4 and GPCS4, and they helped me greatly while I was developing PSET4. Skelattack is the only game I have tested; other games are not guaranteed to run.
Our project consists of four parts. The first part loads and parses the ELF file. PS4 uses an extended ELF format named SELF (Signed Executable and Linkable Format) to store executables. We convert the SELF file to ELF and parse it to get information about the game program, such as its symbols, the libraries it uses and its initial parameters. Then, we allocate virtual memory and load the modules into memory.
The second part is dynamic linking. After the segments have been loaded into memory, the addresses of dynamically linked symbols are still undetermined, so we must perform dynamic linking for those symbols.
The third part implements the PS4's built-in libraries. Some shared libraries, such as libSceGnmDriver and libSceAudioOut, are exclusive to the PlayStation 4. These libraries cannot be loaded and parsed from the game package: they are built into the PlayStation 4 system software, so there is no need to pack them into the game. Therefore, we must implement these built-in libraries ourselves.
The fourth part is graphics, which we will detail in the next blog post.
ELF Loader & Parser
ELF Header & SELF
In computing, the Executable and Linkable Format (ELF, formerly named Extensible Linking Format) is a common standard file format for executable files, object code, shared libraries, and core dumps. PS4 uses an extended ELF format named SELF (Signed Executable and Linkable Format) to store executables. Following is the SELF data layout:
Compared to ELF, SELF has two additional parts. The first part is the SELF header, which describes basic information about the SELF file. The field we care about is Number of Segments, which gives the number of SELF segment structures that follow the SELF header.
Offset | Size | Description | Notes |
---|---|---|---|
0 | 4 | Magic | 4F 15 3D 1D |
0x4 | 4 | Unknown | Always 00 01 01 12 |
0x8 | 1 | Category | 1 on SELF, 4 on PUP Entry (probably SPP). See PS3/PS Vita Category (https://www.psdevwiki.com/ps3/Certified_File#Category). |
0x9 | 1 | Program Type | First half (high nibble) denotes the version (0 oldest to F newest); second half denotes the true type: 4 for games, 5 for sce_module modules, 6 for video apps like Netflix, 8 for system/EX apps and executables, 9 for system/EX modules/DLLs |
0xA | 2 | Padding | |
0xC | 2 | Header Size | |
0xE | 2 | Signature Size | Possibly the metadata size |
0x10 | 4 | File Size | Size of SELF |
0x14 | 4 | Padding | |
0x18 | 2 | Number of Segments | 1 Kernel, 2 SL and Secure Modules, 4 Kernel ELFs, 6 .selfs, 2 .sdll, 6 .sprx, 6 ShellCore, 6 eboot.bin, 2 sexe |
0x1A | 2 | Unknown | Always 0x22 |
0x1C | 4 | Padding | |
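The table above maps directly onto a C++ struct. The following is a minimal sketch, not PSET4's actual code: the field names are our own, and it assumes a little-endian host (as the PS4's x86-64 CPU is). It reads the header from raw file bytes and checks the magic:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// SELF header, mirroring the offset table above (field names are our own).
struct SelfHeader {
  uint32_t magic;          // 0x00: bytes 4F 15 3D 1D
  uint32_t unknown0;       // 0x04: always 00 01 01 12
  uint8_t  category;       // 0x08: 1 on SELF
  uint8_t  programType;    // 0x09: high nibble = version, low nibble = type
  uint16_t padding0;       // 0x0A
  uint16_t headerSize;     // 0x0C
  uint16_t signatureSize;  // 0x0E
  uint32_t fileSize;       // 0x10: size of the whole SELF
  uint32_t padding1;       // 0x14
  uint16_t segmentCount;   // 0x18: number of SELF segment structures that follow
  uint16_t unknown1;       // 0x1A: always 0x22
  uint32_t padding2;       // 0x1C
};
static_assert(sizeof(SelfHeader) == 0x20, "SELF header must be 32 bytes");

// Copies the header out of the raw file and validates the magic bytes.
std::optional<SelfHeader> readSelfHeader(const std::vector<uint8_t>& file) {
  static const uint8_t kMagic[4] = {0x4F, 0x15, 0x3D, 0x1D};
  if (file.size() < sizeof(SelfHeader) || std::memcmp(file.data(), kMagic, 4) != 0) {
    return std::nullopt;
  }
  SelfHeader header;
  std::memcpy(&header, file.data(), sizeof(header));  // little-endian host assumed
  return header;
}
```

Copying with `std::memcpy` instead of casting the buffer pointer avoids alignment and aliasing problems with arbitrary file buffers.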
The segment structures follow the SELF header. Due to the lack of PS4 documentation, I can only guess at their purpose, so the following content may be incorrect. The SELF segment structure is almost the same as the ELF segment. The difference may be that the data in the SELF segment structure is compressed and encrypted. These parts are probably used to store the segments exclusive to the PlayStation 4.
The SELF header and its segment structures are followed by the ELF header. Here is the ELF header information of eboot.bin and libc.prx.
Here is the ELF header structure member layout. It describes the basic information of the whole ELF file, such as the object file type, target instruction set architecture, entry point address, program header table and section header table.
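For reference, this is the standard ELF64 header layout from the System V gABI, defined locally so that it also builds on Windows, where `<elf.h>` is unavailable. The check function is a hypothetical helper of our own that verifies the identification bytes and the x86-64 machine type (EM_X86_64, 62):

```cpp
#include <cassert>
#include <cstdint>

// Standard ELF64 file header (System V gABI).
struct Elf64_Ehdr {
  unsigned char e_ident[16];  // magic, class, data encoding, version, OS ABI, ...
  uint16_t e_type;            // object file type, e.g. ET_SCE_DYNEXEC on PS4
  uint16_t e_machine;         // target instruction set architecture
  uint32_t e_version;
  uint64_t e_entry;           // entry point virtual address
  uint64_t e_phoff;           // program header table file offset
  uint64_t e_shoff;           // section header table file offset
  uint32_t e_flags;
  uint16_t e_ehsize;
  uint16_t e_phentsize;       // size of one program header entry
  uint16_t e_phnum;           // number of program header entries
  uint16_t e_shentsize;
  uint16_t e_shnum;
  uint16_t e_shstrndx;
};
static_assert(sizeof(Elf64_Ehdr) == 64, "ELF64 header is 64 bytes");

// Hypothetical helper: accept only 64-bit, little-endian, x86-64 ELF images.
bool isX64ElfHeader(const Elf64_Ehdr& h) {
  return h.e_ident[0] == 0x7F && h.e_ident[1] == 'E' && h.e_ident[2] == 'L' &&
         h.e_ident[3] == 'F' &&
         h.e_ident[4] == 2 &&  // ELFCLASS64
         h.e_ident[5] == 1 &&  // ELFDATA2LSB
         h.e_machine == 62;    // EM_X86_64
}
```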
The eboot.bin shown above is the startup executable of the game. Its entry point address is 0x99AC10, which is a virtual address. We allocate virtual memory and map the module so that this address is valid; after that, we can call the entry function to start the program. Its ELF type is ET_SCE_DYNEXEC. PS4 has five exclusive ELF types:
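A minimal sketch that distinguishes the two PS4 ELF types discussed in this post. The numeric values follow open-source PS4 loaders and should be treated as assumptions; `ModuleKind` and `classifyModule` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// PS4-specific e_type values (assumed, following open-source PS4 loaders).
constexpr uint16_t ET_SCE_DYNEXEC = 0xFE10;  // startup executable, e.g. eboot.bin
constexpr uint16_t ET_SCE_DYNAMIC = 0xFE18;  // dynamic library (PRX), e.g. libc.prx

enum class ModuleKind { Executable, DynamicLibrary, Unknown };

// Maps the ELF header's e_type to the kind of module we are loading.
ModuleKind classifyModule(uint16_t eType) {
  switch (eType) {
    case ET_SCE_DYNEXEC: return ModuleKind::Executable;
    case ET_SCE_DYNAMIC: return ModuleKind::DynamicLibrary;
    default:             return ModuleKind::Unknown;
  }
}
```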
Another ELF file is libc.prx, where prx stands for PlayStation Relocatable eXecutable. Its ELF type is ET_SCE_DYNAMIC, and we link this module dynamically at runtime.
Apart from the SELF segment structures, we do not need any SELF-specific information, so it is discarded. We create the ELF file and fill its segment data with the information contained in the SELF segment structures.
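One step of the conversion is locating the ELF header that follows the SELF header and its segment structures. A minimal sketch, assuming each SELF segment structure is 32 bytes (four 64-bit fields); that size is an assumption, since the structure is undocumented. `findEmbeddedElf` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

constexpr std::size_t kSelfHeaderSize = 0x20;        // from the SELF header table
constexpr std::size_t kSelfSegmentEntrySize = 0x20;  // assumption: four 64-bit fields
constexpr std::size_t kSegmentCountOffset = 0x18;    // "Number of Segments" field

// Returns the file offset of the ELF header embedded in a SELF image,
// or std::nullopt if no ELF magic is found where we expect it.
std::optional<std::size_t> findEmbeddedElf(const std::vector<uint8_t>& self) {
  if (self.size() < kSelfHeaderSize) return std::nullopt;
  uint16_t segmentCount = 0;
  std::memcpy(&segmentCount, self.data() + kSegmentCountOffset, sizeof(segmentCount));
  const std::size_t offset = kSelfHeaderSize + segmentCount * kSelfSegmentEntrySize;
  static const uint8_t kElfMagic[4] = {0x7F, 'E', 'L', 'F'};
  if (self.size() < offset + sizeof(kElfMagic) ||
      std::memcmp(self.data() + offset, kElfMagic, sizeof(kElfMagic)) != 0) {
    return std::nullopt;
  }
  return offset;
}
```

Checking the ELF magic at the computed offset gives an early, cheap signal that the assumed entry size matches the file.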
Parser
Parse Program Header
An executable or shared object file’s program header table is an array of structures, each describing a segment or other information the system needs to prepare the program for execution. An object file segment comprises one or more sections, though this fact is transparent to the program header. Whether the file segment holds one or many sections also is immaterial to program loading. Nonetheless, various data must be present for program execution, dynamic linking, and so on. The order and membership of sections within a segment may vary.
The program header structure:
Segment information is described in the program header. We iterate over the program header table to obtain segment information:
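A minimal sketch of that iteration, using the standard ELF64 program header. The PS4-specific type values (0x61000000 and 0x61000001) follow open-source PS4 loaders and are assumptions here, as are the helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Standard ELF64 program header (System V gABI).
struct Elf64_Phdr {
  uint32_t p_type;
  uint32_t p_flags;
  uint64_t p_offset;
  uint64_t p_vaddr;
  uint64_t p_paddr;
  uint64_t p_filesz;
  uint64_t p_memsz;
  uint64_t p_align;
};
static_assert(sizeof(Elf64_Phdr) == 56, "ELF64 program header is 56 bytes");

constexpr uint32_t PT_DYNAMIC = 2;                  // standard ELF
constexpr uint32_t PT_SCE_DYNLIBDATA = 0x61000000;  // PS4-specific (assumed value)
constexpr uint32_t PT_SCE_PROCPARAM = 0x61000001;   // PS4-specific (assumed value)

// Indices of the program headers we care about; -1 means "not present".
struct SegmentIndices {
  int dynamic = -1;
  int dynlibData = -1;
  int procParam = -1;
};

SegmentIndices findSegments(const std::vector<Elf64_Phdr>& phdrs) {
  SegmentIndices result;
  for (int i = 0; i < static_cast<int>(phdrs.size()); ++i) {
    switch (phdrs[i].p_type) {
      case PT_DYNAMIC:        result.dynamic = i; break;
      case PT_SCE_DYNLIBDATA: result.dynlibData = i; break;
      case PT_SCE_PROCPARAM:  result.procParam = i; break;
      default: break;  // PT_LOAD etc. are handled when loading segments
    }
  }
  return result;
}
```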
We only care about three types of segment: PT_DYNAMIC, PT_SCE_PROCPARAM and PT_SCE_DYNLIBDATA. The first is a standard ELF segment type; the latter two are exclusive to the PlayStation 4.
PT_SCE_PROCPARAM: This segment stores the offset of the process parameters, from which we obtain their address. The process parameters comprise the size, the magic number, the entry count and the PS SDK version.
When the program calls the sceKernelGetProcParam function, we return the process parameters address retrieved from this segment.
PT_DYNAMIC: The array element specifies dynamic linking information.
PT_SCE_DYNLIBDATA: The dynamic library data.
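A minimal sketch of the PT_SCE_PROCPARAM handling described above. The field layout of `ProcParamHeader` (a hypothetical name) is an assumption based on the four fields listed, and the segment offset is assumed to be relative to the module's load base. A real override would also be declared with the SysV calling convention discussed at the end of this post:

```cpp
#include <cassert>
#include <cstdint>

// Leading fields of the process parameters (assumed layout and widths).
struct ProcParamHeader {
  uint64_t size;        // size of the whole structure
  uint32_t magic;       // magic number
  uint32_t entryCount;  // entry count
  uint64_t sdkVersion;  // PS SDK version
};

// Hypothetical global filled in while parsing PT_SCE_PROCPARAM.
static ProcParamHeader* g_procParam = nullptr;

// Converts the segment's offset into an address inside the loaded module.
void recordProcParam(uint8_t* moduleBase, uint64_t procParamOffset) {
  g_procParam = reinterpret_cast<ProcParamHeader*>(moduleBase + procParamOffset);
}

// Override for the built-in sceKernelGetProcParam: return the stored address.
void* sceKernelGetProcParam() {
  return g_procParam;
}
```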
Parse Dynamic Sections
If an object file participates in dynamic linking, its program header table will have an element of type PT_DYNAMIC. This segment contains the .dynamic section. A special symbol, _DYNAMIC, labels the section, which contains an array of the following structures. The _DYNAMIC array corresponds to m_pDynamicEntry obtained in the last step.
For each object with this type, d_tag controls the interpretation of d_un:
d_val: These objects represent integer values with various interpretations.
d_ptr: An address relative to a base address. The base address is stored in m_pSceDynamicLib, the start of the PT_SCE_DYNLIBDATA segment.
We divide the dynamic entries into two groups. Entries in the first group do not require access to the string table. The string table (DT_SCE_STRTAB) is a section containing strings such as file names, module names and library names. Because entries in the second group depend on the string table, we process them only after the first pass over all entries has finished.
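The two passes matter because the dynamic array may list a DT_NEEDED entry before DT_SCE_STRTAB. A minimal sketch that handles only the string table and DT_NEEDED; the DT_SCE_* tag values follow open-source PS4 loaders and are assumptions, and `parseDynamic` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Standard ELF64 dynamic entry (System V gABI).
struct Elf64_Dyn {
  int64_t d_tag;
  union {
    uint64_t d_val;  // integer value
    uint64_t d_ptr;  // offset relative to the PT_SCE_DYNLIBDATA segment
  } d_un;
};

constexpr int64_t DT_NULL = 0;
constexpr int64_t DT_NEEDED = 1;
constexpr int64_t DT_SCE_STRTAB = 0x61000035;  // assumed value
constexpr int64_t DT_SCE_STRSZ = 0x61000037;   // assumed value

struct DynamicInfo {
  const char* strtab = nullptr;
  uint64_t strtabSize = 0;
  std::vector<std::string> neededLibraries;
};

// dynlibData plays the role of m_pSceDynamicLib: the base for d_ptr offsets.
DynamicInfo parseDynamic(const Elf64_Dyn* entries, const uint8_t* dynlibData) {
  DynamicInfo info;
  // First pass: entries that do not need the string table.
  for (const Elf64_Dyn* d = entries; d->d_tag != DT_NULL; ++d) {
    if (d->d_tag == DT_SCE_STRTAB) {
      info.strtab = reinterpret_cast<const char*>(dynlibData + d->d_un.d_ptr);
    } else if (d->d_tag == DT_SCE_STRSZ) {
      info.strtabSize = d->d_un.d_val;
    }
  }
  // Second pass: entries whose values are offsets into the string table.
  for (const Elf64_Dyn* d = entries; d->d_tag != DT_NULL; ++d) {
    if (d->d_tag == DT_NEEDED && info.strtab != nullptr &&
        d->d_un.d_val < info.strtabSize) {
      info.neededLibraries.emplace_back(info.strtab + d->d_un.d_val);
    }
  }
  return info;
}
```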
First Iteration:
In the first pass, we process the entries that do not require the string table:
DT_INIT: Stores the offset of the initialization function, from which we compute the initialization function's address.
The initialization function is executed after all modules have been loaded.
DT_SCE_STRTAB, DT_SCE_STRSZ: These elements hold the string table address and size, as described above. Both are exclusive to PS4.
DT_SCE_RELA: This element holds the address of a relocation table. Entries in the table have explicit addends, such as Elf32_Rela for the 32-bit file class or Elf64_Rela for the 64-bit file class. An object file may have multiple relocation sections. When building the relocation table for an executable or shared object file, the link editor concatenates those sections to form a single table. Although the sections remain independent in the object file, the dynamic linker sees a single table. When the dynamic linker creates the process image for an executable file or adds a shared object to the process image, it reads the relocation table and performs the associated actions.
DT_SCE_RELASZ: This element holds the total size, in bytes, of the DT_SCE_RELA relocation table.
DT_SCE_JMPREL, DT_SCE_PLTRELSZ: These elements hold the address and total size of the relocation entries associated with the Procedure Linkage Table (PLT). Lazy binding (also known as lazy linking or on-demand symbol resolution) is the process by which symbol resolution is not done until a symbol is actually used. Functions can be bound on demand, but data references cannot. All dynamically resolved functions are called via a PLT stub. A PLT stub uses relative addressing, using the Global Offset Table (GOT) to retrieve the offset: the PLT knows where the GOT is, and uses the offset into this table (determined at program linking time) to read the destination function's address and jump to it. In PSET4, we treat the PLT relocation table the same as the ordinary relocation table and do not perform lazy binding.
DT_SCE_SYMTAB, DT_SCE_SYMTABSZ: These elements hold the address and size of the symbol table, with Elf32_Sym entries for the 32-bit class of files and Elf64_Sym entries for the 64-bit class of files.
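For the DT_INIT entry listed above, a minimal sketch: record `base + offset` while parsing, and call the functions only once every module has been loaded. `InitRegistry` is a hypothetical name, and the no-argument signature is an assumption; on Windows the call would additionally need the SysV calling convention described at the end of this post:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed signature of a module initialization function.
using InitFunc = void (*)();

// Collects DT_INIT addresses while modules are parsed and runs them later.
class InitRegistry {
 public:
  // DT_INIT holds an offset relative to the module's load base.
  void record(uintptr_t moduleBase, uint64_t initOffset) {
    m_inits.push_back(moduleBase + static_cast<uintptr_t>(initOffset));
  }

  // Call once every module has been loaded and relocated, in load order.
  void runAll() const {
    for (uintptr_t address : m_inits) {
      reinterpret_cast<InitFunc>(address)();
    }
  }

  std::size_t size() const { return m_inits.size(); }

 private:
  std::vector<uintptr_t> m_inits;
};
```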
Second Iteration:
In the second pass, we process the entries that require the string table:
DT_NEEDED: This element holds the string table offset of a null-terminated string giving the name of a needed library. The offset is an index into the table recorded in the DT_SCE_STRTAB entry. The dynamic array may contain multiple entries of this type. Their relative order is significant, though their relation to entries of other types is not. Following are the needed libraries of the eboot module:
DT_SCE_MODULE_INFO, DT_SCE_NEEDED_MODULE: These elements store information about the current module and the needed modules. Each module contains one or more libraries. Here is the module information of the eboot module:
DT_SCE_IMPORT_LIB, DT_SCE_EXPORT_LIB: These elements describe the imported and exported libraries, which are used later in the dynamic linking process.
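Because the values of these module and library entries contain string table offsets, they can only be resolved in the second pass. A sketch of decoding them, assuming the bit packing used by open-source PS4 loaders (the exact layout is undocumented); the struct and function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Assumed packing of DT_SCE_MODULE_INFO / DT_SCE_NEEDED_MODULE values:
//   bits 0-31 name offset into the string table,
//   bits 32-39 minor version, bits 40-47 major version, bits 48-63 module ID.
struct ModuleInfo {
  uint32_t nameOffset;
  uint8_t versionMinor;
  uint8_t versionMajor;
  uint16_t id;
};

ModuleInfo decodeModuleInfo(uint64_t value) {
  return ModuleInfo{
      static_cast<uint32_t>(value & 0xFFFFFFFF),
      static_cast<uint8_t>((value >> 32) & 0xFF),
      static_cast<uint8_t>((value >> 40) & 0xFF),
      static_cast<uint16_t>(value >> 48),
  };
}

// Assumed packing of DT_SCE_IMPORT_LIB / DT_SCE_EXPORT_LIB values:
//   bits 0-31 name offset, bits 32-47 version, bits 48-63 library ID.
struct LibraryInfo {
  uint32_t nameOffset;
  uint16_t version;
  uint16_t id;
};

LibraryInfo decodeLibraryInfo(uint64_t value) {
  return LibraryInfo{
      static_cast<uint32_t>(value & 0xFFFFFFFF),
      static_cast<uint16_t>((value >> 32) & 0xFFFF),
      static_cast<uint16_t>(value >> 48),
  };
}
```

The module and library IDs decoded here are the same IDs that appear later when resolving imported symbols by module and library.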
Load ELF Module Into Memory
Now that we have all the segment information, we can load the segments into memory. Only segments with a type of PT_LOAD or PT_SCE_RELRO are loadable.
Each loadable segment is loaded as follows:
1. Allocate virtual memory for the segment at the address specified by the p_vaddr member of the program header. The size of the segment in memory is specified by the p_memsz member.
2. Copy the segment data from the file offset specified by the p_offset member to the virtual memory address specified by the p_vaddr member. The size of the segment in the file is contained in the p_filesz member, which can be zero.
3. The p_memsz member specifies the size the segment occupies in memory, which can also be zero. If the p_filesz and p_memsz members differ, the segment is padded with zeros: all bytes in memory between the end of the file data and the end of the segment's memory size are cleared to zero.
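A minimal sketch of the three steps, using a `std::vector` as a stand-in for the reserved virtual memory; a real loader would reserve the range with VirtualAlloc on Windows or mmap elsewhere. The PT_SCE_RELRO value follows open-source PS4 loaders and is an assumption, as are the helper names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Standard ELF64 program header (System V gABI).
struct Elf64_Phdr {
  uint32_t p_type;
  uint32_t p_flags;
  uint64_t p_offset;
  uint64_t p_vaddr;
  uint64_t p_paddr;
  uint64_t p_filesz;
  uint64_t p_memsz;
  uint64_t p_align;
};

constexpr uint32_t PT_LOAD = 1;
constexpr uint32_t PT_SCE_RELRO = 0x61000010;  // assumed value

bool isLoadable(const Elf64_Phdr& ph) {
  return ph.p_type == PT_LOAD || ph.p_type == PT_SCE_RELRO;
}

// Step 1: the image must span every loadable [p_vaddr, p_vaddr + p_memsz).
uint64_t imageSize(const std::vector<Elf64_Phdr>& phdrs) {
  uint64_t end = 0;
  for (const Elf64_Phdr& ph : phdrs) {
    if (isLoadable(ph)) end = std::max(end, ph.p_vaddr + ph.p_memsz);
  }
  return end;
}

// Steps 2 and 3: copy p_filesz bytes, then zero-fill up to p_memsz.
bool loadSegment(const Elf64_Phdr& ph, const std::vector<uint8_t>& file,
                 std::vector<uint8_t>& image) {
  if (ph.p_filesz > ph.p_memsz || ph.p_offset + ph.p_filesz > file.size() ||
      ph.p_vaddr + ph.p_memsz > image.size()) {
    return false;  // malformed header or image too small
  }
  uint8_t* dest = image.data() + ph.p_vaddr;
  if (ph.p_filesz != 0) std::memcpy(dest, file.data() + ph.p_offset, ph.p_filesz);
  if (ph.p_memsz > ph.p_filesz) std::memset(dest + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
  return true;
}
```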
Dynamic Link
Dynamic linking defers much of the linking process until a program starts running. Although the segments are now in memory, the addresses of dynamically linked symbols have not been determined yet, so we must perform dynamic linking for them. These symbols are divided into two types.
Symbols of the first type are not visible outside the object file containing their definition, so we can relocate them as soon as their module is loaded. Symbols of the second type are visible to all object files being combined, so they are relocated after all modules have been loaded.
Local Symbols Relocation
The following notation is used for specifying relocations in the table below:
A: Represents the addend used to compute the value of the relocatable field.
B: Represents the base address at which a shared object has been loaded into memory during execution. Generally, a shared object is built with a 0 base virtual address, but the execution address will be different.
G: Represents the offset into the global offset table at which the relocation entry's symbol will reside during execution.
GOT: Represents the address of the global offset table.
L: Represents the place (section offset or address) of the Procedure Linkage Table entry for a symbol.
P: Represents the place (section offset or address) of the storage unit being relocated (computed using r_offset).
S: Represents the value of the symbol whose index resides in the relocation entry.
The AMD64 ABI uses only Elf64_Rela relocation entries with explicit addends. The r_addend member serves as the relocation addend.
Table: Relocation Types
Name | Value | Field | Calculation |
---|---|---|---|
R_X86_64_NONE | 0 | none | none |
R_X86_64_64 | 1 | word64 | S + A |
R_X86_64_PC32 | 2 | word32 | S + A - P |
R_X86_64_GOT32 | 3 | word32 | G + A |
R_X86_64_PLT32 | 4 | word32 | L + A - P |
R_X86_64_COPY | 5 | none | none |
R_X86_64_GLOB_DAT | 6 | word64 | S |
R_X86_64_JUMP_SLOT | 7 | word64 | S |
R_X86_64_RELATIVE | 8 | word64 | B + A |
R_X86_64_GOTPCREL | 9 | word32 | G + GOT + A - P |
R_X86_64_32 | 10 | word32 | S + A |
R_X86_64_32S | 11 | word32 | S + A |
R_X86_64_16 | 12 | word16 | S + A |
R_X86_64_PC16 | 13 | word16 | S + A - P |
R_X86_64_8 | 14 | word8 | S + A |
R_X86_64_PC8 | 15 | word8 | S + A - P |
R_X86_64_DTPMOD64 | 16 | word64 | |
R_X86_64_DTPOFF64 | 17 | word64 | |
R_X86_64_TPOFF64 | 18 | word64 | |
R_X86_64_TLSGD | 19 | word32 | |
R_X86_64_TLSLD | 20 | word32 | |
R_X86_64_DTPOFF32 | 21 | word32 | |
R_X86_64_GOTTPOFF | 22 | word32 | |
R_X86_64_TPOFF32 | 23 | word32 | |
Local symbols are not visible outside the object file containing their definition, and local symbols of the same name can exist in multiple files without interfering with each other. Currently, we only process relocations of type R_X86_64_RELATIVE. For this type, the value written is the sum of the module's base address and the addend (B + A).
As mentioned above, we do not use lazy binding, so the PLT relocation table is processed in the same way as the ordinary relocation table.
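A minimal sketch of the local pass, applied to both the DT_SCE_RELA and DT_SCE_JMPREL tables. It writes B + A into the 64-bit field at B + r_offset and skips every other relocation type. The helper name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Standard ELF64 relocation entry with explicit addend.
struct Elf64_Rela {
  uint64_t r_offset;  // place being relocated, relative to the module base
  uint64_t r_info;    // symbol index (high 32 bits) and type (low 32 bits)
  int64_t r_addend;   // A
};

constexpr uint32_t R_X86_64_RELATIVE = 8;

inline uint32_t relocType(uint64_t info) { return static_cast<uint32_t>(info & 0xFFFFFFFF); }

// Applies the R_X86_64_RELATIVE entries of one table (B + A); other types
// are left for the global-symbol pass. Returns the number of entries applied.
std::size_t applyRelativeRelocations(uint8_t* base, const Elf64_Rela* table, std::size_t count) {
  std::size_t applied = 0;
  const uint64_t b = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(base));
  for (std::size_t i = 0; i < count; ++i) {
    if (relocType(table[i].r_info) != R_X86_64_RELATIVE) continue;
    const uint64_t value = b + static_cast<uint64_t>(table[i].r_addend);
    std::memcpy(base + table[i].r_offset, &value, sizeof(value));  // word64 field
    ++applied;
  }
  return applied;
}
```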
Export Symbols
Symbols that are visible to all object files being combined must be exported: we store their relocated memory addresses in a global symbol address table. These symbols have one of two binding types: STB_GLOBAL and STB_WEAK.
STB_GLOBAL: Global symbols. These symbols are visible to all object files being combined. One file's definition of a global symbol will satisfy another file's undefined reference to the same global symbol.
STB_WEAK: Weak symbols. These symbols resemble global symbols, but their definitions have lower precedence.
Each PS4 symbol name encodes a hash value. Decoding it gives us the NID, the library ID and the module ID; the module name is then retrieved from the global module name map by module ID.
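A sketch of the decoding, assuming the symbol name format used by open-source PS4 loaders: `<NID>#<library ID>#<module ID>`, where each ID is written in a base64-like alphabet ending in `+` and `-`. The format details and helper names are assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Assumed alphabet for the encoded library and module IDs.
constexpr std::string_view kIdAlphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+-";

// Decodes one ID: each character contributes 6 bits, most significant first.
std::optional<uint16_t> decodeId(std::string_view text) {
  if (text.empty() || text.size() > 3) return std::nullopt;
  uint32_t value = 0;
  for (char c : text) {
    std::size_t digit = kIdAlphabet.find(c);
    if (digit == std::string_view::npos) return std::nullopt;
    value = (value << 6) | static_cast<uint32_t>(digit);
  }
  if (value > 0xFFFF) return std::nullopt;
  return static_cast<uint16_t>(value);
}

struct SymbolName {
  std::string nid;  // encoded NID, kept as a string
  uint16_t libraryId;
  uint16_t moduleId;
};

// Splits "<NID>#<library ID>#<module ID>" and decodes both IDs.
std::optional<SymbolName> decodeSymbolName(std::string_view name) {
  std::size_t first = name.find('#');
  if (first == std::string_view::npos) return std::nullopt;
  std::size_t second = name.find('#', first + 1);
  if (second == std::string_view::npos) return std::nullopt;
  auto lib = decodeId(name.substr(first + 1, second - first - 1));
  auto mod = decodeId(name.substr(second + 1));
  if (!lib || !mod) return std::nullopt;
  return SymbolName{std::string(name.substr(0, first)), *lib, *mod};
}
```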
PS4 Shared Library
As mentioned in the overview, some shared libraries, such as libSceGnmDriver and libSceAudioOut, are exclusive to the PlayStation 4. They cannot be loaded and parsed from the game package: they are built into the PlayStation 4 system software, so there is no need to pack them into the game.
However, if a game module uses a symbol implemented in libSceGnmDriver, where can we find and load that code on Windows, and what is the symbol's address in memory? The solution is to write override libraries for those PS4 built-in libraries.
The first step is to collect and generate all symbols from the PS4 built-in libraries. Symbol information can be found here. This repository stores built-in symbol information in JSON format. Here is an example:
is_export: Indicates whether the library is exported; if false, it is imported.
type: Function when not present. Can be Function, Object, TLS, or Unknown11 (TBD).
name: Either not present or null when the name of the symbol is unknown.
hex_id and encoded_id: Included for human convenience and not used by tools.
We developed an automation tool that converts the PS4 symbol JSON files into C++ source code. It is located at tools\SceModuleGenerator. Here is part of the generated sceGnmDriver header. All generated symbols are given the prefix Pset.
Following is the sceGnmDriver source code. Since we have not implemented these functions yet, they currently do nothing when invoked except output a log message.
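A hypothetical sketch of what one generated stub could look like. The function name, return type and logger are illustrative only; the real generator output and logging macros may differ:

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical logger: keeps messages in memory and echoes them to stderr.
std::vector<std::string> g_log;

void logUnimplemented(const char* funcName) {
  g_log.push_back(std::string("unimplemented functions: ") + funcName);
  std::fprintf(stderr, "%s\n", g_log.back().c_str());
}

// Hypothetical stub: the Pset prefix keeps it apart from the PS4 name,
// and the body only logs until the function is actually implemented.
int PsetsceGnmSubmitDone() {
  logUnimplemented(__func__);
  return 0;
}
```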
The next step is to export these symbols into the global symbol address table. Each module comprises one or more libraries. Each library contains a set of function entries.
The symbol table is generated automatically using SceModuleGenerator:
Each entry contains three fields:
m_nid: The NID, a unique value for each symbol. It is the key of the symbol in the global symbol map, and the symbol address is retrieved by NID during the relocation process.
m_funcName: The symbol name.
m_pFunction: The symbol address.
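A minimal sketch of the entry layout and of the global table it is exported into, keyed by module name, library name and NID. Keeping the NID as its encoded string, the `std::map` container and the `SymbolTable` name are all assumptions:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// One generated entry, mirroring m_nid / m_funcName / m_pFunction.
struct SymbolEntry {
  const char* m_nid;        // NID (kept here in its encoded string form)
  const char* m_funcName;   // symbol name
  const void* m_pFunction;  // symbol address
};

// Global symbol address table keyed by (module name, library name, NID).
class SymbolTable {
 public:
  void exportLibrary(const std::string& module, const std::string& library,
                     const std::vector<SymbolEntry>& entries) {
    for (const SymbolEntry& e : entries) {
      m_symbols[{module, library, e.m_nid}] = e.m_pFunction;
    }
  }

  // Returns nullptr when no loaded module or override library exported it.
  const void* lookup(const std::string& module, const std::string& library,
                     const std::string& nid) const {
    auto it = m_symbols.find({module, library, nid});
    return it == m_symbols.end() ? nullptr : it->second;
  }

 private:
  std::map<std::tuple<std::string, std::string, std::string>, const void*> m_symbols;
};
```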
The dynamic linker exports these modules into the global symbol address table before loading and parsing any ELF file.
Relocate Global Symbols
At this point, all symbols have been loaded: symbols from the game's libraries during ELF parsing, and PS4 built-in library symbols during emulator initialization. The next step is to relocate the symbols referenced by the game's ELF files, replacing each reference with the symbol's address in memory.
Relocation involves the following steps:
1. Obtain the NID, module ID and library ID by decoding the symbol name.
2. Retrieve the library name and the module name using the module ID and library ID.
3. Get the symbol address in memory using the NID, module name and library name.
4. Relocate: write the resolved symbol address into the relocation target.
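A sketch of step 4, applying the calculations from the relocation table above once step 3 has produced the symbol address S. Only R_X86_64_64, R_X86_64_GLOB_DAT and R_X86_64_JUMP_SLOT are handled, and the helper name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr uint32_t R_X86_64_64 = 1;
constexpr uint32_t R_X86_64_GLOB_DAT = 6;
constexpr uint32_t R_X86_64_JUMP_SLOT = 7;

// Writes the relocated value into the place P = base + rOffset.
// Returns false for relocation types this sketch does not handle.
bool applySymbolRelocation(uint8_t* base, uint64_t rOffset, uint32_t type,
                           uint64_t symbolAddress, int64_t addend) {
  uint64_t value = 0;
  switch (type) {
    case R_X86_64_64:
      value = symbolAddress + static_cast<uint64_t>(addend);  // S + A
      break;
    case R_X86_64_GLOB_DAT:
    case R_X86_64_JUMP_SLOT:
      value = symbolAddress;  // S
      break;
    default:
      return false;
  }
  std::memcpy(base + rOffset, &value, sizeof(value));  // all three are word64 fields
  return true;
}
```

Treating R_X86_64_JUMP_SLOT like R_X86_64_GLOB_DAT is exactly what eager (non-lazy) binding means: the PLT slot receives the final address up front.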
Run The Emulator
Once the program has been loaded, we can run the emulator. We create a new thread to run the game program.
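The Executable and Linkable Format is standardized as an adaptable file format in the System V ABI, and PS4 code follows the System V AMD64 calling convention, which differs from the Microsoft x64 convention used by default on Windows. We must therefore invoke the game's entry function using the sysv_abi calling convention.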
The System V Application Binary Interface is a set of specifications that detail calling conventions, object file formats, executable file formats, dynamic linking semantics, and much more for systems that comply with the X/Open Common Application Environment Specification and the System V Interface Definition. Today it is the standard ABI used by major Unix operating systems such as Linux, the BSD systems, and many others. The Executable and Linkable Format (ELF) is part of the System V ABI.
The entry function needs an additional parameter that indicates the startup module name:
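A minimal sketch of the launch. GCC and Clang accept `__attribute__((sysv_abi))` on x86-64; MSVC has no equivalent, so a hand-written thunk would be needed there. The `EntryParams` layout, with the startup module name in `argv[0]`, is an assumption modeled on open-source PS4 emulators, and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

#if defined(__GNUC__) && defined(__x86_64__)
#define PS4API __attribute__((sysv_abi))  // force the System V AMD64 convention
#else
#define PS4API  // placeholder: other toolchains need their own SysV thunk
#endif

// Assumed layout of the entry parameters; argv[0] carries the startup module name.
struct EntryParams {
  int argc;
  uint32_t padding;
  const char* argv[3];
};

typedef void (PS4API* ExitHandler)();
typedef void (PS4API* EntryFunc)(EntryParams* params, ExitHandler onExit);

EntryParams makeEntryParams(const char* moduleName) {
  EntryParams params{};
  params.argc = 1;
  params.argv[0] = moduleName;
  return params;
}

// Runs the game's entry point on its own thread so the emulator thread stays free.
std::thread launchEntry(EntryFunc entry, EntryParams* params, ExitHandler onExit) {
  return std::thread([=] { entry(params, onExit); });
}
```

Usage would look like `launchEntry(entryAddress, &params, exitHandler)`, where the entry address is the module base plus `e_entry`; `params` must outlive the thread.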
Although the emulator compiles and runs successfully, we didn’t get the expected results:
All output logs are “unimplemented functions: xxxxxx”, since we have not implemented those functions yet. In the next blog post, we will detail how to implement the graphics driver. As we only focus on graphics, we will not discuss how to implement other non-graphics PS4 built-in libraries, such as libkernel.