Understanding values and references
- By John Sharp
- 5/1/2022
How computer memory is organized
Computers use memory to hold programs that are being executed and the data that those programs use. To understand the differences between value and reference types, it’s helpful to understand how data is organized in memory.
Operating systems and language runtimes such as those used by C# frequently divide the memory used for holding data into two separate areas, each of which is managed in a distinct manner. These two areas of memory are traditionally called the stack and the heap. The stack and the heap serve different purposes:
When you call a method, the memory required for its parameters and its local variables is acquired from the stack. When the method finishes (because it either returns or throws an exception), the memory acquired for the parameters and local variables is automatically released back to the stack to be made available again when another method is called. Method parameters and local variables on the stack have a well-defined lifespan: They come into existence when the method starts, and they disappear as soon as the method completes.
The same lifespan applies to variables defined in any block of code enclosed by opening and closing braces. In the following code example, the variable i is created when the body of the while loop starts, but it disappears when the while loop finishes, and execution continues after the closing brace:
while (...)
{
    int i = …; // i is created on the stack here
    ...
}
// i disappears from the stack here
When you create an object (an instance of a class) by using the new keyword, the memory required to build the object is acquired from the heap. You’ve seen that the same object can be referenced from several places by using reference variables. When the last reference to an object disappears, the memory used by the object becomes available again (although it might not be reclaimed immediately). Objects created on the heap therefore have a more indeterminate lifespan; an object is created by using the new keyword, but it disappears only sometime after the last reference to the object is removed. Chapter 14 includes a more detailed discussion of how heap memory is reclaimed.
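The point that one heap object can be reached through several reference variables can be sketched as follows. This is a minimal illustration rather than code from the chapter: the Circle class with a settable Radius property and the AliasDemo class are assumptions made for the example.

```csharp
using System;

// Hypothetical Circle class, assumed for this sketch only.
class Circle
{
    public int Radius { get; set; }
    public Circle(int radius) { Radius = radius; }
}

static class AliasDemo
{
    // Changes the object through one variable and reads it through another.
    public static int ModifyThroughSecondReference()
    {
        Circle first = new Circle(10); // object on the heap, reference in first
        Circle second = first;         // copies the reference, not the object
        second.Radius = 25;            // modifies the single shared object
        return first.Radius;           // 25, because first refers to the same object
    }

    static void Main()
    {
        Console.WriteLine(ModifyThroughSecondReference()); // prints 25
    }
}
```

When the method returns, both first and second disappear from the stack. The Circle object then has no remaining references and becomes eligible for reclamation, although the moment at which that happens is up to the runtime.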
The names stack and heap come from the way in which the runtime manages the memory:
Stack memory is organized like boxes stacked neatly on top of one another. When a method is called, each parameter is placed in a box that is added to the top of the stack. Each local variable is likewise assigned a box, which is placed on top of the boxes already on the stack. When a method finishes, think of all of its boxes as being removed from the top of the stack.
Heap memory is like a large pile of boxes strewn around a room rather than stacked neatly on top of one another. Each box has a label indicating whether it is in use. When a new object is created, the runtime searches for an empty box and allocates it to the object. The reference to the object is stored in a local variable on the stack. The runtime keeps track of whether each box is still referenced. (Remember: two variables can refer to the same object.) When the last reference disappears, the box is no longer in use; at some point in the future, the runtime empties the box and makes it available.
Using the stack and the heap
Now let’s examine what happens when a method named Method is called:
void Method(int param)
{
    Circle c;
    c = new Circle(param);
    ...
}
Suppose the argument passed into param is the value 42. When the method is called, a block of memory (just enough for an int) is allocated from the stack and initialized with the value 42. As execution moves inside the method, another block of memory big enough to hold a reference (a memory address) is also allocated from the stack, but left uninitialized. This is for the Circle variable, c. Next, another piece of memory big enough for a Circle object is allocated from the heap. This is what the new keyword does. The Circle constructor runs to convert this raw heap memory to a Circle object. A reference to this Circle object is stored in the variable c. The following illustration shows this process:
At this point, you should note two things:
Although the object is stored on the heap, the reference to the object (the variable c) is stored on the stack.
Heap memory is not infinite. If heap memory is exhausted, the new operator will throw an OutOfMemoryException exception, and the object will not be created.
When the method ends, the parameters and local variables go out of scope. The memory acquired for c and param is automatically released back to the stack. The runtime notes that the Circle object is no longer referenced and at some point in the future will arrange for its memory to be reclaimed by the heap. (See Chapter 14.)
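The sequence described above can be tried out with a runnable variant of Method. This is a sketch under stated assumptions: the Circle class with a read-only Radius property, the StackHeapDemo class, the int return type, and the extra assignment to param are not part of the original example and are added only to make the behavior observable.

```csharp
using System;

// Hypothetical Circle class, assumed for this sketch only.
class Circle
{
    public int Radius { get; }
    public Circle(int radius) { Radius = radius; }
}

static class StackHeapDemo
{
    public static int Method(int param)
    {
        Circle c;               // stack: room for a reference, no object yet
        c = new Circle(param);  // heap: the object; stack: c now refers to it
        param = 0;              // changes only this method's copy of the argument
        return c.Radius;        // value read from the object on the heap
    }

    static void Main()
    {
        int argument = 42;
        Console.WriteLine(Method(argument)); // prints 42
        Console.WriteLine(argument);         // prints 42, the caller's variable is untouched
    }
}
```

After Method returns, param and c are gone, and the Circle object it created is no longer referenced by anything.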
The System.Object class
One of the most important reference types in .NET is the Object class in the System namespace. To fully appreciate the significance of the System.Object class, you must understand inheritance, which is described in Chapter 12, “Working with inheritance.” For now, simply accept that all classes are specialized types of System.Object and that you can use System.Object to create a variable that can refer to any reference type. System.Object is such an important class that C# provides the object keyword as an alias for System.Object. In your code, you can use object, or you can write System.Object. They mean the same thing.
In the following example, the variables c and o both refer to the same Circle object. The fact that the type of c is Circle and the type of o is object (the alias for System.Object) in effect provides two different views of the same item in memory.
Circle c;
c = new Circle(42);
object o;
o = c;
The following diagram illustrates how the variables c and o refer to the same item on the heap:
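A short check can confirm that c and o are two views of one object. The sketch below assumes a hypothetical Circle class and an ObjectViewDemo class, neither of which appears in the original text; object.ReferenceEquals and GetType are standard members of System.Object.

```csharp
using System;

// Hypothetical Circle class, assumed for this sketch only.
class Circle
{
    public int Radius { get; }
    public Circle(int radius) { Radius = radius; }
}

static class ObjectViewDemo
{
    // True when both variables hold a reference to the same heap object.
    public static bool ReferToSameObject()
    {
        Circle c = new Circle(42);
        object o = c;                        // copies the reference only
        return object.ReferenceEquals(c, o); // true
    }

    // The variable's type is object, but the item on the heap is still a Circle.
    public static string RuntimeTypeName()
    {
        object o = new Circle(42);
        return o.GetType().Name;             // "Circle"
    }

    static void Main()
    {
        Console.WriteLine(ReferToSameObject()); // prints True
        Console.WriteLine(RuntimeTypeName());   // prints Circle
    }
}
```

Through o, only the members defined by System.Object are directly available, even though the object itself has not changed.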
Boxing
As you have just seen, variables of type object can refer to any item of any reference type. However, variables of type object can also refer to a value type. For example, the following two statements initialize the variable i (of type int, a value type) to 42 and then initialize the variable o (of type object, a reference type) to i:
int i = 42;
object o = i;
The second statement requires a little explanation to appreciate what’s actually happening. Remember that i is a variable of a value type and that it lives on the stack. If the reference inside o referred directly to i, the reference would refer to the stack. However, references should refer to objects on the heap. Creating uncontrolled references to items on the stack could seriously compromise the robustness of the runtime and potentially create a security flaw, so it is not allowed. Therefore, the runtime allocates a piece of memory from the heap, copies the value of i to this piece of memory, and then makes o refer to this copy. This automatic copying of an item from the stack to the heap is called boxing. The following diagram shows the result:
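Because boxing makes a copy, later changes to i do not affect the boxed value, and boxing the same variable twice produces two separate heap objects. The following sketch illustrates both points; the BoxingDemo class and its method names are hypothetical and exist only for this example.

```csharp
using System;

static class BoxingDemo
{
    // The boxed copy keeps the value that i had at the moment of boxing.
    public static string BoxedValueAfterChangingOriginal()
    {
        int i = 42;
        object o = i;        // boxing: the value 42 is copied to the heap
        i = 99;              // changes the stack variable only
        return o.ToString(); // "42"
    }

    // Each boxing operation allocates its own object on the heap.
    public static bool TwoBoxesAreSameObject()
    {
        int i = 42;
        object first = i;
        object second = i;
        return object.ReferenceEquals(first, second); // false
    }

    static void Main()
    {
        Console.WriteLine(BoxedValueAfterChangingOriginal()); // prints 42
        Console.WriteLine(TwoBoxesAreSameObject());           // prints False
    }
}
```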
Unboxing
Because a variable of type object can refer to a boxed copy of a value, it’s only reasonable to allow you to get at that boxed value through the variable. You might expect to be able to access the boxed int value that a variable o refers to by using a simple assignment statement such as this:
int i = o;
However, if you try this syntax, you’ll get a compile-time error. If you think about it, it’s fairly sensible that you can’t use the int i = o; syntax. After all, o could be referencing absolutely anything and not just an int. Consider what would happen in the following code if this statement were allowed:
Circle c = new Circle();
int i = 42;
object o;
o = c; // o refers to a circle
i = o; // what is stored in i?
To obtain the value of the boxed copy, you must use what is known as a cast. This is an operation that checks whether converting an item of one type to another is safe before actually making the copy. You prefix the object variable with the name of the type in parentheses, as in this example:
int i = 42;
object o = i; // boxes
i = (int)o; // compiles okay
The effect of this cast is subtle. The compiler notices that you’ve specified the type int in the cast. Next, the compiler generates code to check what o actually refers to at runtime. It could be absolutely anything. Just because your cast says o refers to an int, that doesn’t mean it actually does. If o really does refer to a boxed int and everything matches, the cast succeeds, and the compiler-generated code extracts the value from the boxed int and copies it to i. This is called unboxing. The following diagram shows what’s happening:
On the other hand, if o does not refer to a boxed int, there is a type mismatch, causing the cast to fail. The compiler-generated code throws an InvalidCastException exception at runtime. Here’s an example of an unboxing cast that fails:
Circle c = new Circle(42);
object o = c; // doesn’t box because Circle is a reference type
int i = (int)o; // compiles okay but throws an exception at runtime
The following diagram illustrates this case:
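Both outcomes of an unboxing cast can be observed in a small program. The sketch below is illustrative only: the Circle class, the UnboxingDemo class, and its method names are assumptions, while InvalidCastException and the is operator are standard parts of .NET and C#. Checking with is before casting is one common way to avoid the exception.

```csharp
using System;

// Hypothetical Circle class, assumed for this sketch only.
class Circle
{
    public int Radius { get; }
    public Circle(int radius) { Radius = radius; }
}

static class UnboxingDemo
{
    // Attempts the cast and reports what happened.
    public static string Describe(object o)
    {
        try
        {
            int i = (int)o;              // succeeds only if o refers to a boxed int
            return "unboxed " + i;
        }
        catch (InvalidCastException)
        {
            return "not a boxed int";    // the runtime type check failed
        }
    }

    // Tests the type first, so no exception is thrown for a mismatch.
    public static bool TryUnbox(object o, out int value)
    {
        if (o is int)
        {
            value = (int)o;
            return true;
        }
        value = 0;
        return false;
    }

    static void Main()
    {
        Console.WriteLine(Describe(42));             // prints unboxed 42
        Console.WriteLine(Describe(new Circle(42))); // prints not a boxed int
    }
}
```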
You’ll use boxing and unboxing in later exercises. Keep in mind that boxing and unboxing are expensive operations because of the amount of checking required and the need to allocate additional heap memory. Boxing has its uses, but injudicious use can severely impair the performance of a program. You’ll see an alternative to boxing in Chapter 17, “Introducing generics.”
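As a brief preview of why generics help, compare a collection that stores its items as object with one that stores int values directly. This is only a sketch: the GenericsPreview class and its method names are hypothetical, while ArrayList and List<T> are standard .NET collection types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class GenericsPreview
{
    // ArrayList stores object references, so each int is boxed when added
    // and must be unboxed with a cast when read back.
    public static int SumWithBoxing()
    {
        ArrayList values = new ArrayList();
        values.Add(1); // boxes 1
        values.Add(2); // boxes 2
        int sum = 0;
        foreach (object item in values)
        {
            sum += (int)item; // unboxing cast
        }
        return sum;
    }

    // List<int> stores the int values themselves, so no boxing or casting occurs.
    public static int SumWithoutBoxing()
    {
        List<int> values = new List<int>();
        values.Add(1);
        values.Add(2);
        int sum = 0;
        foreach (int item in values)
        {
            sum += item;
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumWithBoxing());    // prints 3
        Console.WriteLine(SumWithoutBoxing()); // prints 3
    }
}
```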