Mach-O File Format

From iPhone Development Wiki

The Mach-O file format is the standard executable format across Darwin-based (*OS) systems.

Like most executable formats, it consists of a header, a set of "load commands" dictating memory layout and other attributes, and then several segments and sections containing code and data.

Header

Magic


The magic field, i.e. the first 4 bytes of the binary, takes one of 4 values. Each value denotes the word size (32-bit or 64-bit) and byte order of the binary.

MH_MAGIC = FE ED FA CE // Big endian, 32 bit Mach-O

MH_CIGAM = CE FA ED FE // Little endian, 32 bit Mach-O

MH_MAGIC_64 = FE ED FA CF // Big Endian, 64 bit Mach-O

MH_CIGAM_64 = CF FA ED FE // Little endian, 64 bit Mach-O

When read as a 32-bit integer in the file's own endianness, the value is always 0xFEEDFACE (32-bit) or 0xFEEDFACF (64-bit).
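
As a rough sketch of the check described above, the following C function classifies the first four bytes of a file by comparing them, in on-disk order, against the four byte sequences listed. The names `macho_kind` and `classify_magic` are invented for this example and do not come from any Apple header. Anything that is not one of the four values is simply reported as invalid.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch; not part of any Apple header. */
struct macho_kind {
    int valid;      /* 1 if the bytes match one of the four magic values */
    int is_64;      /* 1 for the 64-bit variants */
    int big_endian; /* 1 if the file stores its fields big endian */
};

/* Classify the first four bytes of a file, compared in on-disk order. */
static struct macho_kind classify_magic(const uint8_t bytes[4])
{
    static const uint8_t magics[4][4] = {
        {0xFE, 0xED, 0xFA, 0xCE}, /* MH_MAGIC:    big endian, 32-bit */
        {0xCE, 0xFA, 0xED, 0xFE}, /* MH_CIGAM:    little endian, 32-bit */
        {0xFE, 0xED, 0xFA, 0xCF}, /* MH_MAGIC_64: big endian, 64-bit */
        {0xCF, 0xFA, 0xED, 0xFE}, /* MH_CIGAM_64: little endian, 64-bit */
    };
    struct macho_kind kind = {0, 0, 0};

    for (int i = 0; i < 4; i++) {
        if (memcmp(bytes, magics[i], 4) == 0) {
            kind.valid = 1;
            kind.is_64 = (i >= 2);          /* rows 2 and 3 are the 64-bit magics */
            kind.big_endian = (i % 2 == 0); /* rows 0 and 2 are big endian */
        }
    }
    return kind;
}
```

A parser would typically use `big_endian` to decide whether every later header field needs byte-swapping on the host.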

CPU Type

The CPU Type contains basic info about the architecture this binary was compiled for.

The 0x01000000 bit encodes whether a given type corresponds to the 64-bit variant of an architecture. So, for example, 0x7 corresponds to x86, while 0x01000007 corresponds to x86_64.

The following are common (some less so) constants for this field:

enum MachOArchitecture {
	MachOABIMask	= 0xff000000,
	MachOABI64	= 0x01000000, // 64 bit ABI
	MachOABI6432	= 0x02000000, // "ABI for 64-bit hardware with 32-bit types; LP32"

	// Constants for the cputype field.
	MachOx86	= 7,
	MachOx64	= MachOx86 | MachOABI64,
	MachOArm	= 0xc,
	MachOAarch64	= MachOABI64 | MachOArm,
	MachOAarch6432	= MachOABI6432 | MachOArm,
	MachOSPARC	= 0xe,
	MachOPPC	= 0x12,
	MachOPPC64	= MachOABI64 | MachOPPC,
};
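
The ABI bits can be tested and stripped with simple masks. The sketch below assumes the cputype has already been read in host byte order; the macro and function names are made up for illustration, while the constant values are the ones from the enum above.

```c
#include <stdint.h>

/* Values copied from the enum above; the names are local to this sketch. */
#define SKETCH_ABI_MASK 0xff000000u
#define SKETCH_ABI_64   0x01000000u

/* Returns nonzero if the cputype has the 64-bit ABI bit set. */
static int cputype_is_64bit(uint32_t cputype)
{
    return (cputype & SKETCH_ABI_64) != 0;
}

/* Strips the ABI bits, leaving the base architecture (e.g. 7 for x86). */
static uint32_t cputype_family(uint32_t cputype)
{
    return cputype & ~SKETCH_ABI_MASK;
}
```

For example, `cputype_family(0x0100000c)` yields `0xc`, the same family value as 32-bit ARM.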

CPU SubType

The meaning of the CPU subtype field's value is dependent on the CPU Type field. A full list of CPU Subtypes for their corresponding CPU Types can be found here:

https://github.com/apple-oss-distributions/lldb/blob/10de1840defe0dff10b42b9c56971dbc17c1f18c/llvm/include/llvm/Support/MachO.h#L925

File Type

A description of the current MH_FILETYPE constants and their meanings:

Name Value Description
MH_OBJECT 0x1 Describes an Object (.o) file, an unlinked intermediate during compilation
MH_EXECUTE 0x2 Describes a standard Executable Mach-O
MH_FVMLIB 0x3 Fixed VM Shared Library File
MH_CORE 0x4 Core dump file
MH_PRELOAD 0x5 Preloaded executable file
MH_DYLIB 0x6 Describes a Mach-O Dynamic Library
MH_DYLINKER 0x7 Dynamic Link Editor
MH_BUNDLE 0x8 Dynamically bound bundle file
MH_DYLIB_STUB 0x9 Shared library stub for static linking only, no section contents
MH_DSYM 0xA Describes a Companion file with only debug sections
MH_KEXT_BUNDLE 0xB x86_64 Kext bundle
MH_FILESET 0xC A set of MachOs running in the same address space and sharing a linkedit

Command Info

ncmds encodes the number of load commands following this header, and sizeofcmds encodes their total size in bytes.
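
A minimal sketch of how these two fields are used to walk the load commands is shown below. It assumes a 64-bit image whose byte order matches the host, and the struct and function names are invented for this example (the real declarations live in Apple's `<mach-o/loader.h>`). Each load command begins with a `cmd` and `cmdsize` pair, so the walker only needs `cmdsize` to step to the next one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit header layout; the 32-bit header lacks the trailing reserved field. */
struct mach_header_64_sketch {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

/* Every load command starts with these two fields. */
struct load_command_sketch {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Counts the load commands whose cmd equals `wanted`.
 * Returns -1 if the command area is malformed or truncated.
 * Assumes a 64-bit image in host byte order (no swapping is done). */
static int count_load_commands(const uint8_t *buf, size_t len, uint32_t wanted)
{
    struct mach_header_64_sketch hdr;
    if (len < sizeof hdr) return -1;
    memcpy(&hdr, buf, sizeof hdr);
    if (hdr.sizeofcmds > len - sizeof hdr) return -1;

    size_t off = sizeof hdr;            /* commands start right after the header */
    size_t end = off + hdr.sizeofcmds;  /* and occupy exactly sizeofcmds bytes */
    int found = 0;

    for (uint32_t i = 0; i < hdr.ncmds; i++) {
        struct load_command_sketch lc;
        if (end - off < sizeof lc) return -1;
        memcpy(&lc, buf + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > end - off) return -1;
        if (lc.cmd == wanted) found++;
        off += lc.cmdsize;              /* cmdsize covers the whole command */
    }
    return found;
}
```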

Flags

This section is incomplete, you can help by expanding it


Load Commands

Name Value Description
LC_SEGMENT 0x1 Describes a 32-bit Mach-O Segment Command
LC_SYMTAB 0x2 Provides the location of the Symbol table
LC_SYMSEG 0x3 An obsolete load command containing the offset and size of the (GNU style) symbol table
LC_THREAD 0x4 Describes starting conditions for the program (register values and entry point). Obsolete.
LC_UNIXTHREAD 0x5 Describes starting conditions for the program (register values and entry point). Obsolete.
LC_LOADFVMLIB 0x6 Describes a given FVM Library to load. Similar to LC_LOAD_DYLIB. Obsolete.
LC_IDFVMLIB 0x7 Identifies the install name of an FVM Library. Similar to LC_ID_DYLIB. Obsolete.
LC_IDENT 0x8 An obsolete command containing a free format string table.
LC_FVMFILE 0x9
LC_PREPAGE 0xA Prepage command (internal use)
LC_DYSYMTAB 0xB Describes attributes of symbols in provided ranges in the binary's symtab
LC_LOAD_DYLIB 0xC Info about a library this image requires to load
LC_ID_DYLIB 0xD Install Name for this image
LC_LOAD_DYLINKER 0xE
LC_ID_DYLINKER 0xF
LC_PREBOUND_DYLIB 0x10 Used to indicate dynamic libraries used in prebinding.
LC_ROUTINES 0x11 Address of the dynamic shared library initialization routine and an index into the module table for the module that defines the routine.
LC_SUB_FRAMEWORK 0x12 A load command signifying membership of a subframework containing the name of an umbrella framework.
LC_SUB_UMBRELLA 0x13 A load command signifying membership of a subumbrella containing the name of an umbrella framework
LC_SUB_CLIENT 0x14 Load command signifying processes this framework is allowed to be loaded into. Enforced loosely by dyld.
LC_SUB_LIBRARY 0x15 Load command signifying frameworks this framework is allowed to be loaded into. Enforced loosely by dyld.
LC_TWOLEVEL_HINTS 0x16 A load command containing the offset and number of hints in the two-level namespace lookup hints table.
LC_PREBIND_CKSUM 0x17 A load command containing the value of the original checksum for prebound files, or zero.
LC_LOAD_WEAK_DYLIB 0x80000018
LC_SEGMENT_64 0x19 Descriptor for a 64-bit Mach-O Segment
LC_ROUTINES_64 0x1a 64-bit variant of LC_ROUTINES
LC_UUID 0x1b Contains a unique identifier for this binary.
LC_RPATH 0x8000001c Contains a path to search for libraries linked with an install name containing "@rpath"
LC_CODE_SIGNATURE 0x1d Contains the file offset and size of the code signing information for this binary
LC_SEGMENT_SPLIT_INFO 0x1e https://stackoverflow.com/a/73630043/13062807
LC_REEXPORT_DYLIB 0x8000001f
LC_LAZY_LOAD_DYLIB 0x20
LC_ENCRYPTION_INFO 0x21 Represents the offset to and size of an encrypted segment
LC_DYLD_INFO 0x22 Contains offsets for several tables of information required by dyld (exports, binding tables, etc.)
LC_DYLD_INFO_ONLY 0x80000022 Same as LC_DYLD_INFO for all intents and purposes.
LC_LOAD_UPWARD_DYLIB 0x80000023
LC_VERSION_MIN_MACOSX 0x24 Implicitly denotes this is a macOS binary, and encodes the minimum version of macOS this binary can run on
LC_VERSION_MIN_IPHONEOS 0x25 Implicitly denotes this is an iOS binary, and encodes the minimum version of iOS required to run
LC_FUNCTION_STARTS 0x26 Location of a table containing the starts of functions within code
LC_DYLD_ENVIRONMENT 0x27
LC_MAIN 0x80000028 Encodes the entry point of the binary. Replaces LC_THREAD / LC_UNIXTHREAD
LC_DATA_IN_CODE 0x29
LC_SOURCE_VERSION 0x2a Version of the source code compiled to generate this binary. As far as is known, this is not enforced through LC_LOAD_DYLIB at any point.
LC_DYLIB_CODE_SIGN_DRS 0x2b
LC_ENCRYPTION_INFO_64 0x2c 64 bit variant of LC_ENCRYPTION_INFO
LC_LINKER_OPTION 0x2d
LC_LINKER_OPTIMIZATION_HINT 0x2e
LC_VERSION_MIN_TVOS 0x2f tvOS variant of LC_VERSION_MIN_MACOSX
LC_VERSION_MIN_WATCHOS 0x30 watchOS variant of LC_VERSION_MIN_MACOSX
LC_NOTE 0x31
LC_BUILD_VERSION 0x32
LC_DYLD_EXPORTS_TRIE 0x80000033 Encodes only the location of the Export Trie, since Chained Fixups make the binding tables encoded in LC_DYLD_INFO obsolete
LC_DYLD_CHAINED_FIXUPS 0x80000034 Points to the Chained Fixup table
LC_FILESET_ENTRY 0x80000035 Points to a Mach-O header of a fileset item contained within an MH_FILESET binary
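
Several values in the table, such as LC_RPATH (0x8000001c) and LC_MAIN (0x80000028), have the high bit set. In Apple's loader.h this bit is named LC_REQ_DYLD and marks commands that the dynamic linker must understand for the image to run properly. The helpers below are a small sketch with made-up names that split a command value into that flag and its base number.

```c
#include <stdint.h>

/* Same value as LC_REQ_DYLD in loader.h; renamed here to keep the sketch standalone. */
#define SKETCH_LC_REQ_DYLD 0x80000000u

/* Returns nonzero if dyld is required to understand this command. */
static int lc_requires_dyld(uint32_t cmd)
{
    return (cmd & SKETCH_LC_REQ_DYLD) != 0;
}

/* Returns the command number with the LC_REQ_DYLD bit cleared. */
static uint32_t lc_base_value(uint32_t cmd)
{
    return cmd & ~SKETCH_LC_REQ_DYLD;
}
```

Note that when matching commands, the full value including the high bit should be compared, since that is what appears in the file.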

Chained Fixups

A self-contained (C++) function for parsing Chained Fixups can be found here:

https://github.com/Vector35/view-macho/blob/d473c0c30c7d80d3e3a2ee283dd48011c1e154fb/machoview.cpp#L2600

The description in this section can be read alongside that code.

---

Chained fixups are a novel method of encoding the Binding Information (rewrites of given pointers to addresses located in loaded libraries).

The LC_DYLD_CHAINED_FIXUPS load command gives an offset, from the start of the image's file, to the data needed to process these fixups. This data is located within the __LINKEDIT segment.

Header and provided tables

The structure directly pointed at by the load command is a dyld_chained_fixups_header.


This simply contains the offsets (from the start of the header) to several tables containing the relevant info.
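
For reference, the sketch below reproduces the header's field layout as it is declared in Apple's `<mach-o/fixup-chains.h>`, followed by a hypothetical helper, `chained_starts_table`, that resolves `starts_offset` relative to the start of the header. The helper name and its bounds checks are assumptions made for this example, and the data is assumed to be in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field layout as declared in <mach-o/fixup-chains.h>. */
struct dyld_chained_fixups_header {
    uint32_t fixups_version; /* currently 0 */
    uint32_t starts_offset;  /* offset of the image starts table */
    uint32_t imports_offset; /* offset of the imports table */
    uint32_t symbols_offset; /* offset of the symbol name strings */
    uint32_t imports_count;  /* number of imported symbols */
    uint32_t imports_format; /* which import entry layout is used */
    uint32_t symbols_format; /* 0 = uncompressed, 1 = zlib compressed */
};

/* Hypothetical helper: returns a pointer to the image starts table, or NULL
 * if the offset falls outside the data. `blob` is the data the load command
 * points at and `blob_size` is the size that command reports for it. */
static const uint8_t *chained_starts_table(const uint8_t *blob, size_t blob_size)
{
    struct dyld_chained_fixups_header hdr;
    if (blob_size < sizeof hdr) return NULL;
    memcpy(&hdr, blob, sizeof hdr);
    if (hdr.starts_offset >= blob_size) return NULL;
    return blob + hdr.starts_offset; /* offsets are relative to the header */
}
```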

Starts Tables

The Image Starts table contains, for every segment within the image, an offset to that segment's "Segment Starts" table.

The Segment Starts table contains, for every page located within a given segment, the offset within that page of its first fixup.

These addresses are where the "Pointer Chains" start.
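
The two starts tables can be walked roughly as follows. This is a sketch under several assumptions: the data is in host byte order, the struct mirrors the fixed-size prefix of `dyld_chained_starts_in_segment` from `<mach-o/fixup-chains.h>`, and pages using the multi-start encoding found on some 32-bit formats are not treated specially. The function name `count_chain_starts` is invented for this example. A segment offset of 0 means the segment has no fixups, and a page entry of 0xFFFF means the page has none.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PTR_START_NONE 0xFFFFu /* page has no fixups */

/* Fixed-size prefix of dyld_chained_starts_in_segment; page_start[] follows it. */
struct starts_in_segment_prefix {
    uint32_t size;              /* size of this table in bytes */
    uint16_t page_size;         /* 0x1000 or 0x4000 */
    uint16_t pointer_format;    /* how pointers in the chain are encoded */
    uint64_t segment_offset;    /* offset of the segment from the image base */
    uint32_t max_valid_pointer; /* only used by some 32-bit formats */
    uint16_t page_count;        /* number of entries in page_start[] */
};

/* page_start[] begins right after page_count, before any struct padding. */
#define PAGE_START_OFFSET \
    (offsetof(struct starts_in_segment_prefix, page_count) + sizeof(uint16_t))

/* Counts the pages, across all segments, on which a pointer chain starts.
 * `starts` points at the image starts table. Returns -1 if malformed. */
static int count_chain_starts(const uint8_t *starts, size_t size)
{
    uint32_t seg_count;
    if (size < 4) return -1;
    memcpy(&seg_count, starts, 4);
    if (seg_count > (size - 4) / 4) return -1;

    int chains = 0;
    for (uint32_t s = 0; s < seg_count; s++) {
        uint32_t seg_off;
        memcpy(&seg_off, starts + 4 + 4 * (size_t)s, 4);
        if (seg_off == 0) continue; /* this segment has no fixups */
        if (seg_off > size || size - seg_off < PAGE_START_OFFSET) return -1;

        struct starts_in_segment_prefix seg;
        memcpy(&seg, starts + seg_off, PAGE_START_OFFSET);
        if ((size - seg_off - PAGE_START_OFFSET) / 2 < seg.page_count) return -1;

        for (uint16_t p = 0; p < seg.page_count; p++) {
            uint16_t first;
            memcpy(&first, starts + seg_off + PAGE_START_OFFSET + 2 * (size_t)p, 2);
            /* A chain on page p begins at
             * segment_offset + p * page_size + first. */
            if (first != SKETCH_PTR_START_NONE) chains++;
        }
    }
    return chains;
}
```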

Pointer Chains

Incomplete, see above link to code.

Other Resources

This page draws heavily on ruby-macho's headers.rb: https://github.com/Homebrew/ruby-macho/blob/master/lib/macho/headers.rb