MPQ Archives

MPQ file format



Format of files stored in MPQs

Before describing the MPQ format itself, I will briefly describe the formats of the archived files.

Large images in Blizzard games (startup screens, large areas such as the Diablo II inventory) are stored in the PCX format. These images can be viewed with any image viewer (IrfanView, ACDSee32) or editor (Paintbrush, Microsoft Photo Editor, ...). Spell icons, character images and other small images are stored in internal Blizzard formats such as CEL, CL2, GRP and DC6. Someone calling himself TeLAMoN works with these file formats and has published some descriptions of them. He has also written a tool for viewing these images.

In all currently released games, music, speech and sounds are stored in the WAVE format, or in MP3 (since Warcraft III). These files can be played in any sound player, e.g. Winamp, so the sound files can be extracted and used without any problem.

Movies (including the StarCraft animated portraits) are stored in the Smacker Video or Bink Video format. Standard movie players cannot play these formats (yet); to play them, you have to install the software from Rad Game Tools, which is available on their web pages. Beginning with Warcraft III, Blizzard decided to buy a licence for the DivX encoder and release their game videos in the DivX format. To play these videos, you have to install the DivX codec.

Textures for 3D games (Warcraft III and newer) are stored in BLP format.

MPQ File Header and MPQ File Shunt

Like the great majority of file formats, the MPQ format begins with a header. The MPQ header is at least 32 bytes (0x20) long. When processing an MPQ, the application checks offsets 0, 0x200, 0x400, 0x600 and so on, until the header has been found or the end of the file has been reached. This allows MPQ archives to be stored inside other file types, e.g. EXE files: the Install.exe files on the StarCraft and BroodWar installation CDs are actually MPQ archives. At each of these offsets there may be either an MPQ header or an MPQ shunt. This is determined by the 32-bit ID: 'MPQ\x1A' (ID_MPQ) marks an MPQ header, while 'MPQ\x1B' (ID_MPQ_SHUNT) marks an MPQ shunt, whose dwHeaderPos field points to the header.

The structures, written as C++ data types, are shown here:
// MPQ file shunt
struct TMPQShunt
{
    // The ID_MPQ_SHUNT ('MPQ\x1B') signature
    DWORD dwID;

    DWORD dwUnknown;

    // Position of the MPQ header, relative to the begin of the shunt
    DWORD dwHeaderPos;
};
// MPQ file header
struct TMPQHeader
{
    // The ID_MPQ ('MPQ\x1A') signature
    DWORD dwID;                         

    // Size of the archive header
    DWORD dwHeaderSize;                   

    // Size of MPQ archive
    // This field is deprecated in the Burning Crusade MoPaQ format, and the size of the archive
    // is calculated as the size from the beginning of the archive to the end of the hash table,
    // block table, or extended block table (whichever is largest).
    DWORD dwArchiveSize;

    // 0 = Original format
    // 1 = Extended format (The Burning Crusade and newer)
    USHORT wFormatVersion;

    // Power of two exponent specifying the number of 512-byte disk sectors in each logical sector
    // in the archive. The size of each logical sector in the archive is 512 * 2^wBlockSize.
    // Bugs in the Storm library dictate that this should always be 3 (4096 byte sectors).
    USHORT wBlockSize;

    // Offset to the beginning of the hash table, relative to the beginning of the archive.
    DWORD dwHashTablePos;
    
    // Offset to the beginning of the block table, relative to the beginning of the archive.
    DWORD dwBlockTablePos;
    
    // Number of entries in the hash table. Must be a power of two, and must be less than 2^16 for
    // the original MoPaQ format, or less than 2^20 for the Burning Crusade format.
    DWORD dwHashTableSize;
    
    // Number of entries in the block table
    DWORD dwBlockTableSize;
};
// Extended MPQ file header. Valid only if wFormatVersion is 1 or higher
struct TMPQHeader2 : public TMPQHeader
{
    // Offset to the beginning of the extended block table, relative to the beginning of the archive.
    LARGE_INTEGER ExtBlockTablePos;

    // High 16 bits of the hash table offset for large archives.
    USHORT wHashTablePosHigh;

    // High 16 bits of the block table offset for large archives.
    USHORT wBlockTablePosHigh;
};
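The header search described above can be sketched in C as follows. This is a minimal sketch, assuming the whole file is already loaded into memory and that the 32-bit IDs are read in little-endian byte order; the names find_mpq_header and read_u32le are hypothetical. A shunt is followed only if its dwHeaderPos (taken as relative to the shunt, as the structure comment says) actually points at an ID_MPQ signature.

```c
#include <stddef.h>
#include <stdint.h>

#define ID_MPQ              0x1A51504Du /* 'MPQ\x1A' read as a little-endian DWORD */
#define ID_MPQ_SHUNT        0x1B51504Du /* 'MPQ\x1B' read as a little-endian DWORD */
#define MPQ_HEADER_MIN_SIZE 0x20u       /* the header is at least 32 bytes */

/* Reads a little-endian 32-bit value without assuming alignment. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Scans offsets 0, 0x200, 0x400, ... for an MPQ header or shunt.
 * On success stores the header offset in *header_pos and returns 1. */
static int find_mpq_header(const uint8_t *data, size_t size, size_t *header_pos)
{
    for (size_t pos = 0; pos + MPQ_HEADER_MIN_SIZE <= size; pos += 0x200) {
        uint32_t id = read_u32le(data + pos);

        if (id == ID_MPQ) {
            *header_pos = pos;
            return 1;
        }

        if (id == ID_MPQ_SHUNT) {
            /* TMPQShunt layout: dwID, dwUnknown, dwHeaderPos (relative to the shunt) */
            size_t target = pos + read_u32le(data + pos + 8);
            if (target + MPQ_HEADER_MIN_SIZE <= size &&
                read_u32le(data + target) == ID_MPQ) {
                *header_pos = target;
                return 1;
            }
        }
    }
    return 0; /* reached the end of the file without finding a header */
}
```

A real reader would go on to validate dwHeaderSize and wFormatVersion before trusting the remaining header fields.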

Hash Table

When searching a large array of strings, the application has to perform a large number of string comparisons, which slows it down. To avoid this, MPQ archives contain a so-called hash table. A hash is an integer value that represents a larger piece of data, e.g. a string. The searched string is converted into a hash value (a 32-bit integer), which is then compared. This means that no file names are stored in MPQ archives. And because the hashing algorithm is one-way (the original string cannot be recovered from its hash), there is also no way to list the file names in an archive. Each hash table entry contains two hash values used to verify the file name, the language and platform of the file, and an index into the block table. One hash table entry is 16 bytes long. The structure of a hash table entry is the following:

// Hash entry. All files in the archive are searched by their hashes.
struct TMPQHash
{
    // The hash of the file path, using method A.
    DWORD dwName1;
    
    // The hash of the file path, using method B.
    DWORD dwName2;
    
    // The language of the file. This is a Windows LANGID data type, and uses the same values.
    // 0 indicates the default language (American English), or that the file is language-neutral.
    USHORT lcLocale;

    // The platform the file is used for. 0 indicates the default platform.
    // No other values have been observed.
    USHORT wPlatform;

    // If the hash table entry is valid, this is the index into the block table of the file.
    // Otherwise, one of the following two values:
    //  - FFFFFFFFh: Hash table entry is empty, and has always been empty.
    //               Terminates searches for a given file.
    //  - FFFFFFFEh: Hash table entry is empty, but was valid at some point (a deleted file).
    //               Does not terminate searches for a given file.
    DWORD dwBlockIndex;
};
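A lookup in the hash table can be sketched in C. The text leaves the hashing algorithm to the Fundamentals chapter; the version below follows the algorithm implemented in StormLib and should be read as an assumption here. Hash type 0 is assumed to give the starting index, types 1 and 2 the values stored in dwName1 and dwName2. The names prepare_crypt_table, hash_string and find_block_index are hypothetical, the table is assumed to be already decrypted, and lcLocale is ignored (the first matching entry wins).

```c
#include <ctype.h>
#include <stdint.h>

#define HASH_ENTRY_FREE    0xFFFFFFFFu /* empty, has always been empty */
#define HASH_ENTRY_DELETED 0xFFFFFFFEu /* empty, but once held a file */

typedef struct {
    uint32_t dwName1;
    uint32_t dwName2;
    uint16_t lcLocale;
    uint16_t wPlatform;
    uint32_t dwBlockIndex;
} TMPQHash;

static uint32_t crypt_table[0x500];

/* Fills the 0x500-entry table shared by hashing and encryption. */
static void prepare_crypt_table(void)
{
    uint32_t seed = 0x00100001;
    for (uint32_t i = 0; i < 0x100; i++) {
        for (uint32_t j = i, k = 0; k < 5; k++, j += 0x100) {
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint32_t hi = (seed & 0xFFFF) << 16;
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint32_t lo = seed & 0xFFFF;
            crypt_table[j] = hi | lo;
        }
    }
}

/* Case-insensitive one-way hash; hash_type selects a 256-entry slice of the table. */
static uint32_t hash_string(const char *str, uint32_t hash_type)
{
    uint32_t seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
    for (; *str; str++) {
        uint32_t ch = (uint32_t)toupper((unsigned char)*str);
        seed1 = crypt_table[(hash_type << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}

/* Returns the block index of 'path', or HASH_ENTRY_FREE if it is not found.
 * table_size must be a power of two (as required by the header). */
static uint32_t find_block_index(const TMPQHash *table, uint32_t table_size,
                                 const char *path)
{
    uint32_t start = hash_string(path, 0) & (table_size - 1);
    uint32_t name1 = hash_string(path, 1);
    uint32_t name2 = hash_string(path, 2);

    for (uint32_t n = 0; n < table_size; n++) {
        const TMPQHash *e = &table[(start + n) & (table_size - 1)];
        if (e->dwBlockIndex == HASH_ENTRY_FREE)
            break;                   /* a never-used slot ends the search */
        if (e->dwBlockIndex != HASH_ENTRY_DELETED &&
            e->dwName1 == name1 && e->dwName2 == name2)
            return e->dwBlockIndex;  /* deleted slots are stepped over */
    }
    return HASH_ENTRY_FREE;
}
```

Because the hash uppercases every character, "war3map.j" and "WAR3MAP.J" resolve to the same entry.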

When several language versions of the same file exist in the archive, their hash entries follow each other and differ only in the value of lcLocale. The language codes are listed in this table:

Value   Language version
0       Neutral/English (American)
0x404   Chinese (Taiwan)
0x405   Czech
0x407   German
0x409   English
0x40a   Spanish
0x40c   French
0x410   Italian
0x411   Japanese
0x412   Korean
0x415   Polish
0x416   Portuguese
0x419   Russian
0x809   English (UK)
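Picking one of several language versions can be sketched as a variation of the hash table search. The selection policy here is an assumption, not something the text specifies: prefer an entry whose lcLocale matches the requested language exactly, otherwise fall back to the neutral entry (lcLocale 0). The function name find_block_for_locale is hypothetical, and the caller is assumed to have computed the start index and both name hashes already.

```c
#include <stdint.h>

#define HASH_ENTRY_FREE    0xFFFFFFFFu
#define HASH_ENTRY_DELETED 0xFFFFFFFEu
#define LANG_NEUTRAL       0x0000u

typedef struct {
    uint32_t dwName1;
    uint32_t dwName2;
    uint16_t lcLocale;
    uint16_t wPlatform;
    uint32_t dwBlockIndex;
} TMPQHash;

/* Walks the probe chain from 'start'. Returns the block index of the entry
 * whose lcLocale equals 'locale'; failing that, the neutral entry's block
 * index; failing that, HASH_ENTRY_FREE. table_size must be a power of two. */
static uint32_t find_block_for_locale(const TMPQHash *table, uint32_t table_size,
                                      uint32_t start, uint32_t name1,
                                      uint32_t name2, uint16_t locale)
{
    uint32_t neutral = HASH_ENTRY_FREE;

    for (uint32_t n = 0; n < table_size; n++) {
        const TMPQHash *e = &table[(start + n) & (table_size - 1)];
        if (e->dwBlockIndex == HASH_ENTRY_FREE)
            break;                        /* end of the probe chain */
        if (e->dwBlockIndex == HASH_ENTRY_DELETED ||
            e->dwName1 != name1 || e->dwName2 != name2)
            continue;                     /* some other file, or deleted */
        if (e->lcLocale == locale)
            return e->dwBlockIndex;       /* exact language match */
        if (e->lcLocale == LANG_NEUTRAL)
            neutral = e->dwBlockIndex;    /* remember the fallback */
    }
    return neutral;
}
```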

The hash table is encrypted, so it cannot be recognized by simply looking at the archive data. The number of entries in this table is stored in the MPQ archive header. More information about hashing theory can be found in the Fundamentals chapter.
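Decrypting the hash table can be sketched in C. The cipher below follows the one implemented in StormLib (details are left to the Fundamentals chapter here), so treat it as an assumption. The table is assumed to be read as little-endian 32-bit words and decrypted with the key hash_string("(hash table)", 3); the block table would use "(block table)". All function names are hypothetical, and encrypt_block is included only to show that the two operations are inverses.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crypt_table[0x500];

/* Same table as used for hashing; entries 0x400..0x4FF drive the cipher. */
static void prepare_crypt_table(void)
{
    uint32_t seed = 0x00100001;
    for (uint32_t i = 0; i < 0x100; i++) {
        for (uint32_t j = i, k = 0; k < 5; k++, j += 0x100) {
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint32_t hi = (seed & 0xFFFF) << 16;
            seed = (seed * 125 + 3) % 0x2AAAAB;
            crypt_table[j] = hi | (seed & 0xFFFF);
        }
    }
}

static uint32_t hash_string(const char *str, uint32_t hash_type)
{
    uint32_t seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
    for (; *str; str++) {
        uint32_t ch = (uint32_t)toupper((unsigned char)*str);
        seed1 = crypt_table[(hash_type << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}

/* Decrypts 'count' 32-bit words in place; the seed is fed with plaintext. */
static void decrypt_block(uint32_t *data, size_t count, uint32_t key)
{
    uint32_t seed = 0xEEEEEEEE;
    for (size_t i = 0; i < count; i++) {
        seed += crypt_table[0x400 + (key & 0xFF)];
        uint32_t plain = data[i] ^ (key + seed);
        data[i] = plain;
        key  = ((~key << 0x15) + 0x11111111) | (key >> 0x0B);
        seed = plain + seed + (seed << 5) + 3;
    }
}

/* Inverse of decrypt_block: XOR with the same key stream. */
static void encrypt_block(uint32_t *data, size_t count, uint32_t key)
{
    uint32_t seed = 0xEEEEEEEE;
    for (size_t i = 0; i < count; i++) {
        seed += crypt_table[0x400 + (key & 0xFF)];
        uint32_t plain = data[i];
        data[i] = plain ^ (key + seed);
        key  = ((~key << 0x15) + 0x11111111) | (key >> 0x0B);
        seed = plain + seed + (seed << 5) + 3;
    }
}
```

For the hash table, count would be dwHashTableSize * 4, since each 16-byte entry holds four 32-bit words.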


Block Table

The block table contains information about file sizes and the way the files are stored within the archive. It also contains the position of each file's data in the archive. Like a hash table entry, a block table entry is 16 bytes long. The block table is also encrypted. An entry in the block table has the following structure:

// File description block; contains information about the file
struct TMPQBlock
{
    // Offset of the beginning of the file data, relative to the beginning of the archive.
    DWORD dwFilePos;
    
    // Compressed file size
    DWORD dwCSize;
    
    // Size of uncompressed file
    DWORD dwFSize;                      
    
    // Flags for the file. See the table below for more information
    DWORD dwFlags;                      
};

Meaning of the dwFlags bits:

Flag name             Value       Meaning
MPQ_FILE_IMPLODE      0x00000100  File is compressed using the PKWARE Data Compression Library
MPQ_FILE_COMPRESS     0x00000200  File is compressed using a combination of compression methods
MPQ_FILE_ENCRYPTED    0x00010000  File is encrypted
MPQ_FILE_FIXSEED      0x00020000  The decryption key for the file is altered according to the position and size of the file in the archive
MPQ_FILE_SINGLE_UNIT  0x01000000  Instead of being divided into blocks (usually 0x1000 bytes), the file is stored as a single unit
MPQ_FILE_DUMMY_FILE   0x02000000  The file has a length of 0 or 1 byte and its name is a hash
MPQ_FILE_HAS_EXTRA    0x04000000  The file has extra data appended after the regular data; must be a compressed file
MPQ_FILE_EXISTS       0x80000000  Set if the file exists, cleared when the file has been deleted
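Interpreting dwFlags comes down to bit tests. The sketch below also applies the MPQ_FILE_FIXSEED key adjustment; the formula (key + dwFilePos) ^ dwFSize is taken from StormLib and is an assumption here, as is the premise that the base key is the hash-type-3 hash of the file name without its path. The helper names are hypothetical.

```c
#include <stdint.h>

#define MPQ_FILE_IMPLODE   0x00000100u
#define MPQ_FILE_COMPRESS  0x00000200u
#define MPQ_FILE_ENCRYPTED 0x00010000u
#define MPQ_FILE_FIXSEED   0x00020000u
#define MPQ_FILE_EXISTS    0x80000000u

typedef struct {
    uint32_t dwFilePos;
    uint32_t dwCSize;
    uint32_t dwFSize;
    uint32_t dwFlags;
} TMPQBlock;

/* A block entry describes a live file only if MPQ_FILE_EXISTS is set. */
static int block_exists(const TMPQBlock *b)
{
    return (b->dwFlags & MPQ_FILE_EXISTS) != 0;
}

/* Either compression flag means the stored data must be decompressed. */
static int block_is_compressed(const TMPQBlock *b)
{
    return (b->dwFlags & (MPQ_FILE_IMPLODE | MPQ_FILE_COMPRESS)) != 0;
}

/* Hypothetical helper: adjusts the base key for MPQ_FILE_FIXSEED files
 * (assumed StormLib formula: position added, then XORed with the size). */
static uint32_t file_key(const TMPQBlock *b, uint32_t base_key)
{
    if (b->dwFlags & MPQ_FILE_FIXSEED)
        return (base_key + b->dwFilePos) ^ b->dwFSize;
    return base_key;
}
```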

Extended Block Table

Since World of Warcraft, Blizzard has extended the MPQ format to support archives larger than 4 GB. The extended block table holds the upper 16 bits of each file's position in the MPQ. It is a plain array of 16-bit values, one for each block table entry, and it is not encrypted.
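Forming the full file position means placing the 16-bit value from the extended block table above the 32-bit dwFilePos. A minimal sketch; the function names are hypothetical, ext_block_table is assumed to be already loaded, and a NULL table stands for an archive without one (wFormatVersion 0). The same combination would apply to wHashTablePosHigh/dwHashTablePos and wBlockTablePosHigh/dwBlockTablePos in TMPQHeader2.

```c
#include <stddef.h>
#include <stdint.h>

/* Combines a 32-bit low part with a 16-bit high part (a 48-bit offset). */
static uint64_t make_offset64(uint32_t low, uint16_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* Absolute position of block 'index' within the containing file.
 * archive_pos is where the MPQ header was found (positions are
 * relative to the beginning of the archive). */
static uint64_t absolute_file_pos(uint64_t archive_pos, uint32_t file_pos,
                                  const uint16_t *ext_block_table, size_t index)
{
    uint16_t high = ext_block_table ? ext_block_table[index] : 0;
    return archive_pos + make_offset64(file_pos, high);
}
```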

Storage of files in the archive

Every file stored in the archive is divided into blocks. The size of one uncompressed block is given by the wBlockSize field of the MPQ header (512 * 2^wBlockSize bytes), usually 4 KB. If a file is compressed, each block is stored compressed, with a variable length. In this case, a table of block offsets (relative to the beginning of the file data in the MPQ) is stored at the beginning of the file data. This table has one entry more than there are blocks in the file; the last entry is used to compute the size of the last block. Each entry is 4 bytes long (a 32-bit value). Every block is compressed and encrypted separately (when the corresponding flags are set in the file's block table entry). Most files are encrypted, except for e.g. videos (SMK files). More information about compression and encryption can be found in the chapter Fundamentals.
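The block arithmetic described above can be sketched as follows. This is a minimal sketch, assuming the block offset table has already been read (and decrypted, if the file is encrypted) into an array of 32-bit values; the function names are hypothetical, and single-unit files (MPQ_FILE_SINGLE_UNIT) are not handled.

```c
#include <stdint.h>

/* Size of one uncompressed block: 512 * 2^wBlockSize (4096 for wBlockSize == 3). */
static uint32_t block_size(uint16_t wBlockSize)
{
    return 512u << wBlockSize;
}

/* Number of blocks a file of dwFSize bytes is cut into (rounded up). */
static uint32_t block_count(uint32_t dwFSize, uint32_t blk_size)
{
    return (uint32_t)(((uint64_t)dwFSize + blk_size - 1) / blk_size);
}

/* The offset table has block_count + 1 entries; the difference of two
 * consecutive entries is the stored (compressed) size of a block. */
static uint32_t stored_block_size(const uint32_t *offsets, uint32_t index)
{
    return offsets[index + 1] - offsets[index];
}

/* Uncompressed size of block 'index' (index < block_count): a full block,
 * except that the last one may be shorter. */
static uint32_t plain_block_size(uint32_t dwFSize, uint32_t blk_size, uint32_t index)
{
    uint32_t rest = dwFSize - index * blk_size;
    return rest < blk_size ? rest : blk_size;
}
```

For example, a 10000-byte file with 4096-byte blocks has 3 blocks and therefore 4 offset entries; the first entry is 16, because the 16-byte offset table itself precedes the block data.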


Example

Here is an example of a function that extracts one file from an archive.

#include <windows.h>
#include "StormLib.h"

//-----------------------------------------------------------------------------
// Extracts one of the archived files and saves it to the disk.
//
// Parameters :
//
//   char * szArchiveName  - Archive file name
//   char * szArchivedFile - Name/number of archived file.
//   char * szFileName     - Name of the target disk file.
//
// Returns ERROR_SUCCESS, or the Win32 error code of the first failure.

static int ExtractFile(char * szArchiveName, char * szArchivedFile, char * szFileName)
{
    HANDLE hMpq   = NULL;                 // Open archive handle
    HANDLE hFile  = NULL;                 // Archived file handle
    HANDLE handle = INVALID_HANDLE_VALUE; // Disk file handle (CreateFile fails with INVALID_HANDLE_VALUE, not NULL)
    int    nError = ERROR_SUCCESS;        // Result value

    // Open an archive, e.g. "d2music.mpq"
    if(nError == ERROR_SUCCESS)
    {
        if(!SFileOpenArchive(szArchiveName, 0, 0, &hMpq))
            nError = GetLastError();
    }

    // Open a file in the archive, e.g. "data\global\music\Act1\tristram.wav"
    if(nError == ERROR_SUCCESS)
    {
        if(!SFileOpenFileEx(hMpq, szArchivedFile, 0, &hFile))
            nError = GetLastError();
    }

    // Create the target file
    if(nError == ERROR_SUCCESS)
    {
        handle = CreateFile(szFileName, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, 0, NULL);
        if(handle == INVALID_HANDLE_VALUE)
            nError = GetLastError();
    }

    // Copy the file from the archive to the disk, chunk by chunk
    if(nError == ERROR_SUCCESS)
    {
        char  szBuffer[0x10000];
        DWORD dwBytes   = 1;
        DWORD dwWritten = 0;

        while(dwBytes > 0)
        {
            // SFileReadFile may return FALSE when fewer bytes than requested
            // remain, so the loop relies on dwBytes (0 means nothing was read)
            dwBytes = 0;
            SFileReadFile(hFile, szBuffer, sizeof(szBuffer), &dwBytes, NULL);
            if(dwBytes > 0)
            {
                // Separate counter, so a failed write is reported instead of ending the loop silently
                if(!WriteFile(handle, szBuffer, dwBytes, &dwWritten, NULL))
                {
                    nError = GetLastError();
                    break;
                }
            }
        }
    }

    // Cleanup and exit
    if(handle != INVALID_HANDLE_VALUE)
        CloseHandle(handle);
    if(hFile != NULL)
        SFileCloseFile(hFile);
    if(hMpq != NULL)
        SFileCloseArchive(hMpq);

    return nError;
}

Copyright (c) Ladislav Zezula 2003 - 2006