Windows via C/C++: Synchronous and Asynchronous Device I/O

  • 11/28/2007

Working with File Devices

Because working with files is so common, I want to spend some time addressing issues that apply specifically to file devices. This section shows how to position a file’s pointer and change a file’s size.

The first issue you must be aware of is that Windows was designed to work with extremely large files. Instead of representing a file’s size using 32-bit values, the original Microsoft designers chose to use 64-bit values. This means that theoretically a file can reach a size of 16 EB (exabytes).

Dealing with 64-bit values in a 32-bit operating system makes working with files a little unpleasant because a lot of Windows functions require you to pass a 64-bit value as two separate 32-bit values. But as you’ll see, working with the values is not too difficult and, in normal day-to-day operations, you probably won’t need to work with a file greater than 4 GB. This means that the high 32 bits of the file’s 64-bit size will frequently be 0 anyway.
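To make the two-halves convention concrete, here is a minimal sketch in portable C of how a 64-bit size maps to and from a pair of 32-bit values. It deliberately avoids windows.h, and the helper names (combine_halves, low_half, high_half, demo_halves) are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: these are not part of the Windows API. */

/* Combine a low and a high 32-bit half into one 64-bit value. */
uint64_t combine_halves(uint32_t low, uint32_t high) {
    return ((uint64_t)high << 32) | low;
}

/* Extract the low 32 bits of a 64-bit value. */
uint32_t low_half(uint64_t value) {
    return (uint32_t)(value & 0xFFFFFFFFu);
}

/* Extract the high 32 bits of a 64-bit value. */
uint32_t high_half(uint64_t value) {
    return (uint32_t)(value >> 32);
}

/* Usage: a size just over 4 GB needs a nonzero high half,
   while a 3-GB size fits entirely in the low half. */
void demo_halves(void) {
    uint64_t size = combine_halves(5u, 1u);
    assert(size == 0x100000005ULL);
    assert(low_half(size) == 5u);
    assert(high_half(size) == 1u);
    assert(high_half(3221225472ULL) == 0u);   /* 3 GB */
}
```

Any file smaller than 4 GB produces a high half of 0, which is why the high 32 bits can so often be ignored in practice.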

Getting a File’s Size

When working with files, quite often you will need to acquire the file’s size. The easiest way to do this is by calling GetFileSizeEx:

BOOL GetFileSizeEx(
   HANDLE         hFile,
   PLARGE_INTEGER pliFileSize);

The first parameter, hFile, is the handle of an opened file, and the pliFileSize parameter is the address of a LARGE_INTEGER union. This union allows a 64-bit signed value to be referenced as two 32-bit values or as a single 64-bit value, and it can be quite convenient when working with file sizes and offsets. Here is (basically) what the union looks like:

typedef union _LARGE_INTEGER {
   struct {
      DWORD LowPart;    // Low  32-bit unsigned value
      LONG HighPart;    // High 32-bit signed value
   };
   LONGLONG QuadPart;   // Full 64-bit signed value
} LARGE_INTEGER, *PLARGE_INTEGER;

In addition to LARGE_INTEGER, there is a ULARGE_INTEGER union representing an unsigned 64-bit value:

typedef union _ULARGE_INTEGER {
   struct {
      DWORD LowPart;     // Low  32-bit unsigned value
      DWORD HighPart;    // High 32-bit unsigned value
   };
   ULONGLONG QuadPart;   // Full 64-bit unsigned value
} ULARGE_INTEGER, *PULARGE_INTEGER;
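The following sketch shows how the two views of such a union alias the same 64 bits. It uses a stand-in type, LargeIntegerSketch, built from fixed-width types so that it compiles without windows.h. The type and function names are hypothetical, and the code assumes a little-endian CPU and a C11 compiler (for the anonymous struct).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for LARGE_INTEGER (an assumption made so the sketch needs no
   Windows headers). The layout assumes a little-endian CPU. */
typedef union {
    struct {
        uint32_t LowPart;   /* Low  32-bit unsigned value */
        int32_t  HighPart;  /* High 32-bit signed value   */
    };
    int64_t QuadPart;       /* Full 64-bit signed value   */
} LargeIntegerSketch;

/* Fill in the two halves and read the result back as one 64-bit value. */
int64_t quad_from_parts(uint32_t low, int32_t high) {
    LargeIntegerSketch li;
    li.LowPart = low;
    li.HighPart = high;
    return li.QuadPart;
}

void demo_large_integer(void) {
    LargeIntegerSketch li;
    li.QuadPart = -1;                    /* a small negative offset */
    assert(li.LowPart == 0xFFFFFFFFu);   /* every low bit is set */
    assert(li.HighPart == -1);           /* the sign lives in the high half */
    assert(quad_from_parts(5u, 1) == 0x100000005LL);
}
```

Because only the high half is signed, a negative 64-bit offset always shows up as a negative HighPart, whatever the low half contains.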

Another very useful function for getting a file’s size is GetCompressedFileSize:

DWORD GetCompressedFileSize(
   PCTSTR pszFileName,
   PDWORD pdwFileSizeHigh);

This function returns the file’s physical size, whereas GetFileSizeEx returns the file’s logical size. For example, consider a 100-KB file that has been compressed to occupy 85 KB. Calling GetFileSizeEx returns the logical size of the file (100 KB), whereas GetCompressedFileSize returns the actual number of bytes on disk occupied by the file (85 KB).

Unlike GetFileSizeEx, GetCompressedFileSize takes a filename passed as a string instead of a handle for its first parameter. The GetCompressedFileSize function returns the 64-bit size of the file in an unusual way: the low 32 bits of the file’s size are the function’s return value, and the high 32 bits are placed in the DWORD pointed to by the pdwFileSizeHigh parameter. Here the ULARGE_INTEGER union comes in handy:

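ULARGE_INTEGER ulFileSize;
ulFileSize.LowPart = GetCompressedFileSize(TEXT("SomeFile.dat"),
   &ulFileSize.HighPart);
if ((ulFileSize.LowPart == INVALID_FILE_SIZE) &&
    (GetLastError() != NO_ERROR)) {
   // The call failed; ulFileSize does not contain a valid size
} else {
   // 64-bit file size is now in ulFileSize.QuadPart
}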

Positioning a File Pointer

Calling CreateFile causes the system to create a file kernel object that manages operations on the file. Inside this kernel object is a file pointer, which indicates the 64-bit offset within the file where the next synchronous read or write should be performed. Initially, this file pointer is set to 0, so if you call ReadFile immediately after a call to CreateFile, you will start reading the file from offset 0. If you read 10 bytes of the file into memory, the system updates the pointer stored in the file kernel object so that the next call to ReadFile starts reading at the eleventh byte in the file, at offset 10. For example, look at this code, in which the first 10 bytes from the file are read into the buffer, and then the next 10 bytes are read into the buffer:

BYTE pb[10];
DWORD dwNumBytes;
HANDLE hFile = CreateFile(TEXT("MyFile.dat"), ...); // Pointer set to 0
ReadFile(hFile, pb, 10, &dwNumBytes, NULL);   // Reads bytes  0 - 9
ReadFile(hFile, pb, 10, &dwNumBytes, NULL);   // Reads bytes 10 - 19

Because each file kernel object has its own file pointer, opening the same file twice gives slightly different results:

BYTE pb[10];
DWORD dwNumBytes;
HANDLE hFile1 = CreateFile(TEXT("MyFile.dat"), ...); // Pointer set to 0
HANDLE hFile2 = CreateFile(TEXT("MyFile.dat"), ...); // Pointer set to 0
ReadFile(hFile1, pb, 10, &dwNumBytes, NULL);   // Reads bytes 0 - 9
ReadFile(hFile2, pb, 10, &dwNumBytes, NULL);   // Reads bytes 0 - 9

In this example, two different kernel objects manage the same file. Because each kernel object has its own file pointer, manipulating the file with one file object has no effect on the file pointer maintained by the other object, and the first 10 bytes of the file are read twice.

I think one more example will help make all this clear:

BYTE pb[10];
DWORD dwNumBytes;
HANDLE hFile1 = CreateFile(TEXT("MyFile.dat"), ...); // Pointer set to 0
HANDLE hFile2;
DuplicateHandle(
   GetCurrentProcess(), hFile1,
   GetCurrentProcess(), &hFile2,
   0, FALSE, DUPLICATE_SAME_ACCESS);
ReadFile(hFile1, pb, 10, &dwNumBytes, NULL);   // Reads bytes  0 - 9
ReadFile(hFile2, pb, 10, &dwNumBytes, NULL);   // Reads bytes 10 - 19

In this example, one file kernel object is referenced by two file handles. Regardless of which handle is used to manipulate the file, the one file pointer is updated. As in the first example, different bytes are read each time.
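The difference between opening a file twice and duplicating a handle can be modeled in a few lines of portable C. This is only a toy model: FileObjectSketch, HandleSketch, sketch_read, and demo_shared_pointer are hypothetical names, the "file" is an in-memory array, and nothing here reflects how Windows actually implements kernel objects. The one property it does capture is that the file pointer belongs to the object, not to the handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a file kernel object: the data plus ONE file pointer. */
typedef struct {
    const unsigned char *data;
    uint64_t size;
    uint64_t pointer;   /* offset of the next synchronous read */
} FileObjectSketch;

/* In this model, a "handle" is just a reference to a file object. */
typedef FileObjectSketch *HandleSketch;

/* Read up to n bytes at the object's pointer, then advance the pointer. */
size_t sketch_read(HandleSketch h, unsigned char *buf, size_t n) {
    uint64_t left = (h->pointer < h->size) ? h->size - h->pointer : 0;
    if (n > left) n = (size_t)left;
    memcpy(buf, h->data + h->pointer, n);
    h->pointer += n;
    return n;
}

void demo_shared_pointer(void) {
    static const unsigned char bytes[] = "0123456789ABCDEFGHIJ";
    unsigned char buf[10];

    /* Opening twice: two objects, two independent pointers. */
    FileObjectSketch obj1 = { bytes, 20, 0 }, obj2 = { bytes, 20, 0 };
    sketch_read(&obj1, buf, 10);
    sketch_read(&obj2, buf, 10);
    assert(buf[0] == '0');     /* second read also starts at offset 0 */

    /* Duplicating: two handles, one object, one shared pointer. */
    FileObjectSketch obj = { bytes, 20, 0 };
    HandleSketch h1 = &obj, h2 = h1;
    sketch_read(h1, buf, 10);
    sketch_read(h2, buf, 10);
    assert(buf[0] == 'A');     /* second read starts at offset 10 */
    assert(obj.pointer == 20);
}
```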

If you need to access a file randomly, you will need to alter the file pointer associated with the file’s kernel object. You do this by calling SetFilePointerEx:

BOOL SetFilePointerEx(
   HANDLE         hFile,
   LARGE_INTEGER  liDistanceToMove,
   PLARGE_INTEGER pliNewFilePointer,
   DWORD          dwMoveMethod);

The hFile parameter identifies the file kernel object whose file pointer you want to change. The liDistanceToMove parameter tells the system by how many bytes you want to move the pointer. The number you specify is added to a starting position selected by dwMoveMethod, so a negative number has the effect of stepping backward in the file. The last parameter of SetFilePointerEx, dwMoveMethod, tells SetFilePointerEx how to interpret the liDistanceToMove parameter. Table 10-8 describes the three possible values you can pass via dwMoveMethod to specify the starting point for the move.

Table 10-8 Values That Can Be Passed for SetFilePointerEx’s dwMoveMethod Parameter

| Value | Meaning |
| --- | --- |
| FILE_BEGIN | The file object’s file pointer is set to the value specified by the liDistanceToMove parameter. Note that liDistanceToMove is interpreted as an unsigned 64-bit value. |
| FILE_CURRENT | The file object’s file pointer has the value of liDistanceToMove added to it. Note that liDistanceToMove is interpreted as a signed 64-bit value, allowing you to seek backward in the file. |
| FILE_END | The file object’s file pointer is set to the logical file size plus the liDistanceToMove parameter. Note that liDistanceToMove is interpreted as a signed 64-bit value, allowing you to seek backward in the file. |

After SetFilePointerEx has updated the file object’s file pointer, the new value of the file pointer is returned in the LARGE_INTEGER pointed to by the pliNewFilePointer parameter. You can pass NULL for pliNewFilePointer if you’re not interested in the new pointer value.
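The arithmetic behind the three move methods in Table 10-8 can be sketched as a small portable function. The enum values and the function sketch_new_pointer are hypothetical stand-ins, not the real FILE_BEGIN, FILE_CURRENT, and FILE_END constants or any Windows call. For simplicity, the sketch treats the distance as signed in every case, ignores overflow, and reports a negative result as -1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for FILE_BEGIN, FILE_CURRENT, and FILE_END. */
typedef enum { SKETCH_BEGIN, SKETCH_CURRENT, SKETCH_END } MoveMethodSketch;

/* Compute where the file pointer would land: starting point + distance.
   Returns -1 if the result would be negative (treated here as a failed
   seek). Overflow is not handled in this sketch. */
int64_t sketch_new_pointer(MoveMethodSketch method, int64_t current,
                           int64_t file_size, int64_t distance) {
    int64_t base;
    switch (method) {
    case SKETCH_BEGIN:   base = 0;         break;
    case SKETCH_CURRENT: base = current;   break;
    case SKETCH_END:     base = file_size; break;
    default:             return -1;
    }
    int64_t result = base + distance;
    return (result < 0) ? -1 : result;
}

void demo_move_methods(void) {
    /* A 100-byte file whose pointer currently sits at offset 40. */
    assert(sketch_new_pointer(SKETCH_BEGIN,   40, 100,  25) == 25);
    assert(sketch_new_pointer(SKETCH_CURRENT, 40, 100, -15) == 25);
    assert(sketch_new_pointer(SKETCH_END,     40, 100, -10) == 90);
    assert(sketch_new_pointer(SKETCH_CURRENT, 40, 100,   0) == 40);
    assert(sketch_new_pointer(SKETCH_END,     40, 100,  50) == 150);
}
```

The last assertion lands 50 bytes past the end of the 100-byte file. The model allows this, just as the real function allows positioning the pointer beyond the current file size.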

Here are a few facts to note about SetFilePointerEx:

  • Setting a file’s pointer beyond the end of the file’s current size is legal. Doing so does not actually increase the size of the file on disk unless you write to the file at this position or call SetEndOfFile.

  • When using SetFilePointerEx with a file opened with FILE_FLAG_NO_BUFFERING, the file pointer can be positioned only on sector-aligned boundaries. The FileCopy sample application later in this chapter demonstrates how to do this properly.

  • Windows does not offer a GetFilePointerEx function, but you can use SetFilePointerEx to move the pointer by 0 bytes to get the desired effect, as shown in the following code snippet:

    LARGE_INTEGER liCurrentPosition = { 0 };
    SetFilePointerEx(hFile, liCurrentPosition, &liCurrentPosition, FILE_CURRENT);
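For the sector-alignment requirement mentioned above, the rounding itself is simple arithmetic. Here is a minimal portable sketch with hypothetical helper names (sector_align_down, sector_align_up). It assumes the sector size is already known; the value 512 used in the usage note is only an example, and on a real system you would query the size for the volume in question (GetDiskFreeSpace, for instance, reports bytes per sector).

```c
#include <assert.h>
#include <stdint.h>

/* Round an offset down to the nearest multiple of the sector size. */
uint64_t sector_align_down(uint64_t offset, uint32_t sector_size) {
    assert(sector_size != 0);
    return offset - (offset % sector_size);
}

/* Round an offset up to the nearest multiple of the sector size.
   An offset that is already aligned is returned unchanged. */
uint64_t sector_align_up(uint64_t offset, uint32_t sector_size) {
    assert(sector_size != 0);
    uint64_t remainder = offset % sector_size;
    return (remainder == 0) ? offset : offset + (sector_size - remainder);
}
```

With 512-byte sectors, for example, offset 1000 rounds down to 512 and up to 1024, and either result is a position that an unbuffered file handle could accept.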

Setting the End of a File

Usually, the system takes care of setting the end of a file when the file is closed. However, you might sometimes want to force a file to be smaller or larger. On those occasions, call SetEndOfFile:

BOOL SetEndOfFile(HANDLE hFile);

The SetEndOfFile function truncates or extends a file to the size indicated by the file object’s file pointer. For example, if you wanted to force a file to be 1024 bytes long, you’d use SetEndOfFile this way:

HANDLE hFile = CreateFile(...);
LARGE_INTEGER liDistanceToMove;
liDistanceToMove.QuadPart = 1024;
SetFilePointerEx(hFile, liDistanceToMove, NULL, FILE_BEGIN);
SetEndOfFile(hFile);
CloseHandle(hFile);

Using Windows Explorer to examine the properties of this file reveals that the file is exactly 1024 bytes long.