MPQ Archives
MPQ file format
Justin Olbrantz (Quantam) and Jean-Francois Roy (BahamutZERO) have written a more detailed description of the MPQ format. It can be found on Devklog.com.
The great majority of file formats begin with a header, and the MPQ format is no exception. The MPQ header is at least 32 bytes (0x20) long. When opening an MPQ, the application checks every 512-byte (0x200) offset until the header has been found or the end of the file has been reached. This allows MPQ archives to be embedded in other file types, e.g. in EXE files: Install.exe on the StarCraft and BroodWar installation CDs is actually an MPQ archive. At such an offset, there may be either an MPQ header or MPQ user data. Which of the two is present is determined by the 32-bit ID:
// MPQ user data
struct TMPQUserData
{
// The ID_MPQ_USERDATA ('MPQ\x1B') signature
DWORD dwID;
// Maximum size of the user data
DWORD cbUserDataSize;
// Offset of the MPQ header, relative to the beginning of this header
DWORD dwHeaderOffs;
// Appears to be size of user data header (Starcraft II maps)
DWORD cbUserDataHeader;
};
// MPQ file header
struct TMPQHeader
{
// The ID_MPQ ('MPQ\x1A') signature
DWORD dwID;
// Size of the archive header
DWORD dwHeaderSize;
// Size of MPQ archive
// This field is deprecated in the Burning Crusade MoPaQ format, and the size of the archive
// is calculated as the size from the beginning of the archive to the end of the hash table,
// block table, or extended block table (whichever is largest).
DWORD dwArchiveSize;
// 0 = Original format
// 1 = Extended format (The Burning Crusade and newer)
USHORT wFormatVersion;
// Power of two exponent specifying the number of 512-byte disk sectors in each logical sector
// in the archive. The size of each logical sector in the archive is 512 * 2^wBlockSize.
// Bugs in the Storm library dictate that this should always be 3 (4096 byte sectors).
USHORT wBlockSize;
// Offset to the beginning of the hash table, relative to the beginning of the archive.
DWORD dwHashTablePos;
// Offset to the beginning of the block table, relative to the beginning of the archive.
DWORD dwBlockTablePos;
// Number of entries in the hash table. Must be a power of two, and must be less than 2^16 for
// the original MoPaQ format, or less than 2^20 for the Burning Crusade format.
DWORD dwHashTableSize;
// Number of entries in the block table
DWORD dwBlockTableSize;
};
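The header search described above can be sketched roughly as follows. This is a minimal illustration and not the Storm implementation: the names FindMpqHeader and ReadLE32 are hypothetical, the whole file is assumed to be loaded into memory, and the decision to keep scanning when a user data block points to an invalid header is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Signatures read as little-endian 32-bit values: 'MPQ\x1A' and 'MPQ\x1B'.
constexpr std::uint32_t kIdMpq         = 0x1A51504D;
constexpr std::uint32_t kIdMpqUserData = 0x1B51504D;

// Reads a little-endian 32-bit value at byte position pos.
inline std::uint32_t ReadLE32(const std::vector<std::uint8_t>& d, std::size_t pos)
{
    assert(pos + 4 <= d.size());
    return  std::uint32_t(d[pos])
         | (std::uint32_t(d[pos + 1]) << 8)
         | (std::uint32_t(d[pos + 2]) << 16)
         | (std::uint32_t(d[pos + 3]) << 24);
}

// Returns the offset of the MPQ header within data, or nullopt if none is found.
inline std::optional<std::size_t> FindMpqHeader(const std::vector<std::uint8_t>& data)
{
    constexpr std::size_t kStep = 0x200;          // candidates lie on 512-byte boundaries
    constexpr std::size_t kMinHeaderSize = 0x20;  // minimum header size given in the text

    for (std::size_t pos = 0; pos + kMinHeaderSize <= data.size(); pos += kStep)
    {
        const std::uint32_t id = ReadLE32(data, pos);
        if (id == kIdMpq)
            return pos;

        if (id == kIdMpqUserData)
        {
            // dwHeaderOffs is the third DWORD of TMPQUserData (byte offset 8).
            const std::size_t headerPos = pos + ReadLE32(data, pos + 8);
            if (headerPos + kMinHeaderSize <= data.size() &&
                ReadLE32(data, headerPos) == kIdMpq)
                return headerPos;
            // Assumption: an invalid user data block is skipped and the scan continues.
        }
    }
    return std::nullopt;
}
```

The sketch requires C++17 for std::optional. A real reader would read the file in pieces instead of loading it whole.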
// Extended MPQ file header. Valid only if wFormatVersion is 1 or higher
struct TMPQHeader2 : public TMPQHeader
{
// Offset to the beginning of the extended block table, relative to the beginning of the archive.
LARGE_INTEGER ExtBlockTablePos;
// High 16 bits of the hash table offset for large archives.
USHORT wHashTablePosHigh;
// High 16 bits of the block table offset for large archives.
USHORT wBlockTablePosHigh;
};
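Two small calculations follow directly from the header fields above: the logical sector size and, for the extended format, the full table offsets. The helpers below are an illustrative sketch with hypothetical names. They assume that the "high 16 bits" fields supply bits 32 to 47 of the offset.

```cpp
#include <cassert>
#include <cstdint>

// Logical sector size in bytes: 512 * 2^wBlockSize.
inline std::uint32_t SectorSize(std::uint16_t wBlockSize)
{
    assert(wBlockSize < 23);  // keeps the shifted value inside 32 bits
    return std::uint32_t(0x200) << wBlockSize;
}

// Combines a 32-bit table offset (dwHashTablePos or dwBlockTablePos) with its
// 16-bit high part (wHashTablePosHigh or wBlockTablePosHigh) from TMPQHeader2.
inline std::uint64_t TableOffset64(std::uint32_t posLow, std::uint16_t posHigh)
{
    return (std::uint64_t(posHigh) << 32) | posLow;
}
```

With wBlockSize equal to 3, SectorSize returns 4096, which matches the sector size mentioned in the header comments.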
Searching a large array of strings requires a large number of string comparisons, which slows the application down. To avoid this, MPQ archives contain a so-called hash table. A hash is an integer value that represents a larger piece of data, e.g. a string. The string being searched for is converted into a hash value (a 32-bit integer), which is then compared against the stored values. This means that no file names are stored in MPQ archives. And because the hashing algorithm is one-way (the original string cannot be recovered from the hash), there is also no way to list the file names in the archive. Each hash table entry contains two control hash values used to verify the file name, a language and platform identifier, and the index of the file's entry in the block table. One hash table entry is 16 bytes long. The structure of a hash table entry is the following:
// Hash entry. All files in the archive are searched by their hashes.
struct TMPQHash
{
// The hash of the file path, using method A.
DWORD dwName1;
// The hash of the file path, using method B.
DWORD dwName2;
// The language of the file. This is a Windows LANGID data type, and uses the same values.
// 0 indicates the default language (American English), or that the file is language-neutral.
USHORT lcLocale;
// The platform the file is used for. 0 indicates the default platform.
// No other values have been observed.
USHORT wPlatform;
// If the hash table entry is valid, this is the index into the block table of the file.
// Otherwise, one of the following two values:
// - FFFFFFFFh: Hash table entry is empty, and has always been empty.
// Terminates searches for a given file.
// - FFFFFFFEh: Hash table entry is empty, but was valid at some point (a deleted file).
// Does not terminate searches for a given file.
DWORD dwBlockIndex;
};
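To illustrate how the two special dwBlockIndex values affect a search, here is a hedged sketch of a lookup in an already decrypted hash table. The function name FindBlockIndex and the HashEntry mirror struct are hypothetical. The starting slot is assumed to come from a separate hash of the file name reduced modulo the table size, and linear probing with wrap-around is assumed as well; neither detail is described in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Mirror of TMPQHash, assumed to be decrypted already.
struct HashEntry
{
    std::uint32_t name1;
    std::uint32_t name2;
    std::uint16_t locale;
    std::uint16_t platform;
    std::uint32_t blockIndex;
};

constexpr std::uint32_t kHashEntryFree    = 0xFFFFFFFF;  // never used: ends the search
constexpr std::uint32_t kHashEntryDeleted = 0xFFFFFFFE;  // deleted file: search goes on

// Returns the block table index of the first entry matching both name hashes.
inline std::optional<std::uint32_t> FindBlockIndex(const std::vector<HashEntry>& table,
                                                   std::uint32_t startHash,
                                                   std::uint32_t name1,
                                                   std::uint32_t name2)
{
    if (table.empty())
        return std::nullopt;
    // The table size must be a power of two, so size - 1 works as a bit mask.
    assert((table.size() & (table.size() - 1)) == 0);
    const std::size_t mask = table.size() - 1;

    std::size_t index = startHash & mask;
    for (std::size_t probes = 0; probes < table.size(); ++probes, index = (index + 1) & mask)
    {
        const HashEntry& e = table[index];
        if (e.blockIndex == kHashEntryFree)
            return std::nullopt;      // empty since creation: the file is not present
        if (e.blockIndex == kHashEntryDeleted)
            continue;                 // skip the deleted slot and keep probing
        if (e.name1 == name1 && e.name2 == name2)
            return e.blockIndex;
    }
    return std::nullopt;              // whole table visited without a match
}
```

This version ignores lcLocale and wPlatform and simply returns the first match.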
When several language versions of the same file exist in the archive, their hash entries follow one another and differ only in the value of lcLocale. The language values are listed in this table:
| Value | Language version | Value | Language version |
|---|---|---|---|
| 0 | Neutral/English (American) | 0x404 | Chinese (Taiwan) |
| 0x405 | Czech | 0x407 | German |
| 0x409 | English | 0x40a | Spanish |
| 0x40c | French | 0x410 | Italian |
| 0x411 | Japanese | 0x412 | Korean |
| 0x415 | Polish | 0x416 | Portuguese |
| 0x419 | Russian | 0x809 | English (UK) |
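Choosing among several language versions of one file could look like the sketch below. The fallback to the neutral entry (value 0) when the requested language is missing is an assumption made for illustration, as are the names LocaleCandidate and PickLocale; the text only states that the entries differ in lcLocale.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper type: one hash entry that already matched both name hashes.
struct LocaleCandidate
{
    std::uint16_t locale;      // lcLocale from the hash entry
    std::uint32_t blockIndex;  // dwBlockIndex from the hash entry
};

// Prefers an exact locale match and otherwise falls back to the neutral entry.
inline std::optional<std::uint32_t> PickLocale(const std::vector<LocaleCandidate>& candidates,
                                               std::uint16_t wanted)
{
    std::optional<std::uint32_t> neutral;
    for (const LocaleCandidate& c : candidates)
    {
        if (c.locale == wanted)
            return c.blockIndex;
        if (c.locale == 0 && !neutral)
            neutral = c.blockIndex;  // remember the first neutral version
    }
    return neutral;                  // empty if neither version exists
}
```

For example, requesting 0x407 (German) from an archive that holds only a neutral and a French version would return the neutral version under this policy.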
The hash table is encrypted, so it cannot be recognized by looking at the raw archive data. The number of entries in the table is stored in the MPQ archive header. More information about hashing can be found in the Fundamentals chapter.
In World of Warcraft, the hash table was observed to be compressed in one of the partial MPQs used by the trial version of the game. The compressed size of the hash table is calculated as:
CompressedHashTableSize = (pMpqHeader->dwBlockTablePos - pMpqHeader->dwHashTablePos)
The block table contains information about file sizes and the way files are stored in the archive. It also contains the position of the file content in the archive. A block table entry is 16 bytes long (like a hash table entry). The block table is also encrypted. An entry in the block table has the following structure:
// File description block contains information about the file
struct TMPQBlock
{
// Offset of the beginning of the file data, relative to the beginning of the archive.
DWORD dwFilePos;
// Compressed file size
DWORD dwCSize;
// Size of uncompressed file
DWORD dwFSize;
// Flags for the file. See the table below for more information
DWORD dwFlags;
};
Meanings of the dwFlags value:
| Flag name | Value | Meaning |
|---|---|---|
| MPQ_FILE_IMPLODE | 0x00000100 | File is compressed using the PKWARE Data Compression Library |
| MPQ_FILE_COMPRESS | 0x00000200 | File is compressed using a combination of compression methods |
| MPQ_FILE_ENCRYPTED | 0x00010000 | The file is encrypted |
| MPQ_FILE_FIX_KEY | 0x00020000 | The decryption key for the file is altered according to the position of the file in the archive |
| MPQ_FILE_SINGLE_UNIT | 0x01000000 | Instead of being divided into 0x1000-byte blocks, the file is stored as a single unit |
| MPQ_FILE_DELETE_MARKER | 0x02000000 | File is a deletion marker, indicating that the file no longer exists. This is used to allow patch archives to delete files present in lower-priority archives in the search chain. The file usually has a length of 0 or 1 byte and its name is a hash |
| MPQ_FILE_SECTOR_CRC | 0x04000000 | File has checksums for each sector (explained in the File Data section). Ignored if file is not compressed or imploded. |
| MPQ_FILE_EXISTS | 0x80000000 | Set if file exists, reset when the file was deleted |
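The flag values from the table can be decoded with plain bit tests. The sketch below is illustrative; the FileFlags struct and DecodeFlags function are hypothetical names, while the constants repeat the values listed above.

```cpp
#include <cstdint>

constexpr std::uint32_t MPQ_FILE_IMPLODE       = 0x00000100;
constexpr std::uint32_t MPQ_FILE_COMPRESS      = 0x00000200;
constexpr std::uint32_t MPQ_FILE_ENCRYPTED     = 0x00010000;
constexpr std::uint32_t MPQ_FILE_FIX_KEY       = 0x00020000;
constexpr std::uint32_t MPQ_FILE_SINGLE_UNIT   = 0x01000000;
constexpr std::uint32_t MPQ_FILE_DELETE_MARKER = 0x02000000;
constexpr std::uint32_t MPQ_FILE_SECTOR_CRC    = 0x04000000;
constexpr std::uint32_t MPQ_FILE_EXISTS        = 0x80000000;

// Hypothetical decoded view of TMPQBlock::dwFlags.
struct FileFlags
{
    bool exists;
    bool imploded;
    bool compressed;
    bool encrypted;
    bool keyAdjusted;
    bool singleUnit;
    bool deleteMarker;
    bool sectorCrc;
};

inline FileFlags DecodeFlags(std::uint32_t dwFlags)
{
    FileFlags f{};
    f.exists       = (dwFlags & MPQ_FILE_EXISTS) != 0;
    f.imploded     = (dwFlags & MPQ_FILE_IMPLODE) != 0;
    f.compressed   = (dwFlags & MPQ_FILE_COMPRESS) != 0;
    f.encrypted    = (dwFlags & MPQ_FILE_ENCRYPTED) != 0;
    f.keyAdjusted  = (dwFlags & MPQ_FILE_FIX_KEY) != 0;
    f.singleUnit   = (dwFlags & MPQ_FILE_SINGLE_UNIT) != 0;
    f.deleteMarker = (dwFlags & MPQ_FILE_DELETE_MARKER) != 0;
    // Per the table, the CRC flag is ignored unless the file is compressed or imploded.
    f.sectorCrc    = (dwFlags & MPQ_FILE_SECTOR_CRC) != 0 && (f.compressed || f.imploded);
    return f;
}
```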
In World of Warcraft, the block table was observed to be compressed in one of the partial MPQs used by the trial version of the game. The compressed size of the block table is calculated as:
CompressedBlockTableSize = (pMpqHeader->dwArchiveSize - pMpqHeader->dwBlockTablePos)
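The two size formulas suggest a simple way to guess whether a table is stored compressed: compare the stored size with the size the table would have uncompressed (number of entries times 16 bytes). The sketch below rests on that assumption, and on the layout implied by the formulas (the hash table immediately followed by the block table, which ends the archive). The names TableLayout, IsHashTableCompressed and IsBlockTableCompressed are hypothetical.

```cpp
#include <cstdint>

// Hypothetical subset of the TMPQHeader fields needed for the check.
struct TableLayout
{
    std::uint32_t dwArchiveSize;
    std::uint32_t dwHashTablePos;
    std::uint32_t dwBlockTablePos;
    std::uint32_t dwHashTableSize;   // number of entries
    std::uint32_t dwBlockTableSize;  // number of entries
};

constexpr std::uint64_t kTableEntrySize = 16;  // both table entry types are 16 bytes

// True if the range [begin, end) is smaller than entries * 16 bytes.
inline bool IsStoredSmaller(std::uint32_t begin, std::uint32_t end, std::uint32_t entries)
{
    if (end <= begin)
        return false;  // the assumed layout does not hold, so no conclusion is drawn
    return std::uint64_t(end - begin) < entries * kTableEntrySize;
}

inline bool IsHashTableCompressed(const TableLayout& h)
{
    return IsStoredSmaller(h.dwHashTablePos, h.dwBlockTablePos, h.dwHashTableSize);
}

inline bool IsBlockTableCompressed(const TableLayout& h)
{
    return IsStoredSmaller(h.dwBlockTablePos, h.dwArchiveSize, h.dwBlockTableSize);
}
```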
Starting with World of Warcraft, Blizzard extended the MPQ format to support archives larger than 4 GB. The extended block table holds the upper 16 bits of each file position in the MPQ. It is a plain array of 16-bit values and is not encrypted.
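Combining a block table entry with its extended block table entry might look like this. It is a sketch under two assumptions: the extended table is indexed in parallel with the block table, and archives without an extended table are treated as having zero high bits. FilePos64 is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// filePosLow holds dwFilePos of each block table entry; extBlockTable holds the
// matching upper 16 bits and may be empty for archives without an extended table.
inline std::uint64_t FilePos64(const std::vector<std::uint32_t>& filePosLow,
                               const std::vector<std::uint16_t>& extBlockTable,
                               std::size_t blockIndex)
{
    assert(blockIndex < filePosLow.size());
    const std::uint64_t high =
        blockIndex < extBlockTable.size() ? extBlockTable[blockIndex] : 0;
    return (high << 32) | filePosLow[blockIndex];
}
```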
Every file stored in the archive is split into blocks. The size of one uncompressed block is given by the MPQ header and is usually 4 KB. If a file is compressed, its blocks are stored compressed and have variable length. In this case, a table of block offsets (relative to the beginning of the file in the MPQ) is stored at the beginning of the file data. The number of entries in this table is one greater than the number of blocks in the file; the last entry is used to compute the size of the last block. Each entry is 4 bytes long (a 32-bit value). Every block is compressed and encrypted separately (when the corresponding bits are set in the file's block table entry). Most files are encrypted, with exceptions such as videos (SMK files). More information about compression and encryption can be found in the Fundamentals chapter.
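The relationship between the block count and the offset table can be sketched as follows. The function names are hypothetical, and the offset table is assumed to have been read and decrypted already.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of blocks needed for a file of fileSize bytes (rounded up).
inline std::uint32_t BlockCount(std::uint32_t fileSize, std::uint32_t blockSize)
{
    assert(blockSize != 0);
    return static_cast<std::uint32_t>((std::uint64_t(fileSize) + blockSize - 1) / blockSize);
}

// Derives the stored size of each block from the offset table, which holds
// BlockCount + 1 entries: the size of block i is offsets[i + 1] - offsets[i].
inline std::vector<std::uint32_t> StoredBlockSizes(const std::vector<std::uint32_t>& offsets)
{
    std::vector<std::uint32_t> sizes;
    for (std::size_t i = 0; i + 1 < offsets.size(); ++i)
        sizes.push_back(offsets[i + 1] - offsets[i]);
    return sizes;
}
```

For a 10000-byte file with 4096-byte blocks, BlockCount gives 3, so the offset table has 4 entries and occupies 16 bytes; the first offset is then typically 16, pointing just past the table itself.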
Copyright (c) Ladislav Zezula 2003 - 2010