Primitives and Objects in Memory

In Java, there are eight primitive types: byte, short, int, long, char, float, double, and boolean. Every other type (classes, interfaces, arrays, and enums) is a reference type, often loosely called an object type. Objects are constructed and stored differently in memory compared to primitive values.

A variable of a primitive type directly holds its value within the memory allocated for that variable.

To understand this concept better, let's take an example that declares a variable of a byte type:

byte a = 10;

In this statement, a variable a is declared and initialized with the value 10. When this statement is executed, the JVM (not the operating system directly) reserves storage for the variable; for a local variable this is a slot in the current method's stack frame. For illustration, let's say that storage sits at address @8a7d67c and holds the binary representation of 10 (which is 00001010). This memory location is then directly associated with the variable a.

Whenever the variable a is accessed, the program fetches the value 10 directly from its assigned memory location.

Diagram showing memory allocation for primitive types (storing values directly) vs object types (storing references to memory locations)

In the case of objects, however, the variable does not hold the object's data directly. Instead, it holds a reference: a value that tells the JVM where the object is stored in memory (typically in an area called the heap). You can picture a reference as the object's memory address, although Java never exposes the raw address to your code.

In the example above, an array for holding two bytes is declared and initialized:

byte[] myInts = {1, 2};

Arrays are objects in Java. From the figure, you can see that the variable myInts doesn't store the values 1 and 2. Instead, it stores a reference (an address, like @9a7d59b) to the actual array object in memory. This reference points to another memory location (@69b8b810), which is where the array's data (the binary representations of 1 and 2) is actually stored.

Note that the memory addresses shown here are for illustrative purposes only. In reality, these values will differ across machines, and you cannot know the exact memory address assigned to a variable beforehand.
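The difference between holding a value and holding a reference shows up when a variable is copied. The following is a minimal sketch; the class and method names (CopySemanticsDemo, primitiveAfterCopyChange, arrayAfterAliasWrite) are made up for this illustration and are not part of any library.

```java
class CopySemanticsDemo {
    // Copies a primitive, changes the copy, and returns the original's value.
    static byte primitiveAfterCopyChange() {
        byte a = 10;
        byte b = a;   // b receives its own copy of the value 10
        b = 20;       // changing b does not touch a
        return a;
    }

    // Copies an array reference, writes through the copy, and returns
    // what the original variable now sees at index 0.
    static byte arrayAfterAliasWrite() {
        byte[] myInts = {1, 2};
        byte[] alias = myInts; // alias holds the same reference; no array is copied
        alias[0] = 99;         // writes into the single shared array object
        return myInts[0];
    }

    public static void main(String[] args) {
        System.out.println(primitiveAfterCopyChange()); // prints 10
        System.out.println(arrayAfterAliasWrite());     // prints 99
    }
}
```

Assigning a primitive duplicates the value, while assigning an array variable duplicates only the reference, so both variables end up pointing at the same array object.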

When you print the primitive variable a, you see its value, 10, printed out. However, when you print the myInts object, you will see something like [B@9a7d59b. This string is the default output, which shows the object's type ([B for a byte array), an @ sign, and its identity hash code in hexadecimal.

It's important to be precise: the myInts variable holds a reference, but what you see when you print the object is this hash code, not a memory address. Some JVMs derive the identity hash code from the object's address, but this is not required, and distinct objects can occasionally share the same value. Even so, it is often used as a convenient way to "see" which object a reference points to. You can get this integer hash code for any object using the System.identityHashCode() method. For more details, see the official documentation: https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#identityHashCode-java.lang.Object-
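To make the printed form less mysterious, the sketch below rebuilds it by hand. The class ArrayPrintDemo and its describe helper are hypothetical names used only here; the hexadecimal value differs from run to run, so no fixed output is shown for it.

```java
class ArrayPrintDemo {
    // Rebuilds the default string by hand: type name, '@', hash code in hex.
    static String describe(Object o) {
        return o.getClass().getName() + "@"
                + Integer.toHexString(System.identityHashCode(o));
    }

    public static void main(String[] args) {
        byte[] myInts = {1, 2};
        System.out.println(myInts);           // e.g. [B@1b6d3586 (varies per run)
        System.out.println(describe(myInts)); // same text as the line above
        // To print the contents instead of the identity, use Arrays.toString:
        System.out.println(java.util.Arrays.toString(myInts)); // prints [1, 2]
    }
}
```

Arrays do not override toString() or hashCode(), which is why the hand-built string matches the default output exactly.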

In the case of an array, all its values are stored in a contiguous block of memory. The reference stored in the array variable (myInts) points to the starting address of this contiguous memory block.

Object Identity vs. Logical Equality: hashCode() vs. System.identityHashCode()

It's crucial to understand the difference between an object's identity (its location in memory) and its state (the data it holds). Java provides different methods to inspect these aspects.

System.identityHashCode(): Returns the integer that Object's default hashCode() would return for the object, whether or not its class overrides hashCode(). It represents object identity. Two references pointing to the exact same object will always have the same identity hash code. Because it is a static method of System, a class cannot override it.
hashCode(): By default, returns the identity hash code. Classes override it to return an integer based on the object's content or state, so that it agrees with logical equality. You should override this method whenever you override equals(), which is where you define what makes two distinct objects "equal" in your program's logic.
toString(): Returns a string representation of the object. By default, it returns the class name, an @ sign, and the hexadecimal representation of hashCode() (which is the identity hash code unless hashCode() is overridden), but it is frequently overridden to provide more meaningful output.
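One detail of the default toString() is easy to miss: the hexadecimal suffix comes from hashCode(), so it matches the identity hash code only while hashCode() is not overridden. The sketch below uses a hypothetical Tag class whose hashCode() returns a fixed 255, chosen purely to make the output recognizable.

```java
class Tag {
    @Override
    public int hashCode() {
        return 255; // fixed value chosen only to make the output easy to read
    }
}

class ToStringDemo {
    public static void main(String[] args) {
        Tag tag = new Tag();
        // Tag does not override toString(), so Object's version runs:
        // getClass().getName() + "@" + Integer.toHexString(hashCode())
        System.out.println(tag); // ends with "@ff", because 255 is ff in hex
        // The identity hash code is unaffected by the override and varies per run.
        System.out.println(System.identityHashCode(tag));
    }
}
```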

Example: Default Behavior

Let's create a simple User class without overriding any methods.

class User {
    private final int id;
    private final String name;

    public User(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

User user1 = new User(101, "Alice");
User user2 = new User(101, "Alice");

// Are they the same object in memory? No.
System.out.println(user1 == user2); // --> false

// Default toString() shows class name and identity hash code (in hex)
System.out.println(user1.toString()); // --> e.g., User@2f92e0f4
System.out.println(user2.toString()); // --> e.g., User@28a418fc (different)

// Default hashCode() is based on identity
System.out.println(user1.hashCode()); // --> e.g., 798154996
System.out.println(user2.hashCode()); // --> e.g., 681842940 (different)

// System.identityHashCode() confirms they are different objects
System.out.println(System.identityHashCode(user1)); // --> e.g., 798154996
System.out.println(System.identityHashCode(user2)); // --> e.g., 681842940 (different)

In this case, because hashCode() is not overridden, it behaves just like System.identityHashCode().

Example: Overridden Behavior for Logical Equality

Now, let's override equals() and hashCode() to define equality based on the id field.

class User {
    private final int id;
    private final String name;

    public User(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        User user = (User) o;
        return id == user.id; // Equality is based on the id field
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id); // Hash code is also based on id
    }
}

User user1 = new User(101, "Alice");
User user2 = new User(101, "Alice");

// Are they logically equal? Yes, because their IDs are the same.
System.out.println(user1.equals(user2)); // --> true

// The overridden hashCode() is now based on content (the id)
System.out.println(user1.hashCode()); // --> 101
System.out.println(user2.hashCode()); // --> 101 (same)

// But the identity hash code is still different, because they are two separate objects
System.out.println(System.identityHashCode(user1)); // --> e.g., 798154996
System.out.println(System.identityHashCode(user2)); // --> e.g., 681842940 (different)

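This demonstrates the key difference: equals() and hashCode() express "sameness" according to your rules, while System.identityHashCode() reflects the identity of one particular object instance. To test whether two references point to the exact same instance, use ==; in rare cases two different objects can share an identity hash code.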
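The standard library offers a map for each notion of sameness: java.util.HashMap compares keys with equals() and hashCode(), while java.util.IdentityHashMap compares keys by reference. The sketch below is an illustration only; the wrapper class IdentityVsEqualityDemo and the helper sizeAfterTwoPuts are made-up names, and the nested User mirrors the class shown above.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityVsEqualityDemo {
    // Same shape as the User class above: equality and hash code use only id.
    static final class User {
        private final int id;
        private final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            return id == ((User) o).id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Puts two equal-but-distinct users into the given map and reports its size.
    static int sizeAfterTwoPuts(Map<User, String> map) {
        map.put(new User(101, "Alice"), "first");
        map.put(new User(101, "Alice"), "second");
        return map.size();
    }

    public static void main(String[] args) {
        // HashMap treats the two users as the same key (logical equality).
        System.out.println(sizeAfterTwoPuts(new HashMap<>()));         // prints 1
        // IdentityHashMap treats them as different keys (object identity).
        System.out.println(sizeAfterTwoPuts(new IdentityHashMap<>())); // prints 2
    }
}
```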

The equals() and hashCode() Contract: Why You Must Override Both

A critical rule in Java is: If you override equals(), you MUST override hashCode().

Violating this rule will cause severe and confusing problems when you use your objects in any hash-based collection, such as HashMap, HashSet, or Hashtable.

The Contract:

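If two objects are equal according to equals(), they MUST have the same hash code. The reverse is not required: two unequal objects are allowed to share a hash code, although fewer collisions mean better performance.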

How Hash Collections Work

Hash collections use a two-step process to manage objects:

hashCode(): First, the collection uses the object's hash code to quickly find the "bucket" where the object should be stored. This is a massive performance optimization.
equals(): Second, it searches within that specific bucket, using the equals() method to find an exact match among the (usually few) objects there.
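The two steps can be sketched with a deliberately tiny hash set. This is a teaching toy under simplifying assumptions (a fixed number of buckets, no resizing, no null elements), and TinyHashSet is a hypothetical name; the real java.util.HashSet is considerably more sophisticated.

```java
import java.util.ArrayList;
import java.util.List;

// A deliberately tiny hash set; not how java.util.HashSet works in detail.
class TinyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();

    // bucketCount is assumed to be positive.
    TinyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Step 1: hashCode() selects the bucket. floorMod keeps the index non-negative.
    private List<E> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    // Step 2: equals() scans only that one bucket for a match.
    boolean contains(Object o) {
        for (E candidate : bucketFor(o)) {
            if (candidate.equals(o)) return true;
        }
        return false;
    }

    // Adds the element unless an equal one is already present.
    boolean add(E element) {
        if (contains(element)) return false;
        bucketFor(element).add(element);
        return true;
    }

    public static void main(String[] args) {
        TinyHashSet<String> set = new TinyHashSet<>(16);
        set.add("alice");
        System.out.println(set.contains("alice")); // prints true
        System.out.println(set.contains("bob"));   // prints false
        System.out.println(set.add("alice"));      // prints false (duplicate)
    }
}
```

Notice that contains() never looks outside the bucket chosen by hashCode(). If two equal objects reported different hash codes, the search could start in the wrong bucket and miss the match.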

Example: Breaking the Contract

Let's see what happens if we define a User class that correctly overrides equals() but fails to override hashCode().

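class User {
    private final int id;
    private final String name;

    public User(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return id == ((User) o).id; // Equality is based on the id field
    }

    // We "forgot" to override hashCode()!
}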

User user1 = new User(101, "Alice");
User user2 = new User(101, "Alice");

// The objects are logically equal, as expected.
System.out.println(user1.equals(user2)); // --> true

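// Now, let's use a HashSet (requires importing java.util.HashSet and java.util.Set)
Set<User> userSet = new HashSet<>();
userSet.add(user1);

// The big question: does the set contain an object equal to user2?
System.out.println(userSet.contains(user2)); // --> 😱 false (unless the two default hash codes happen to collide)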

Why did this fail?

When userSet.add(user1) was called, the HashSet used the default, identity-based hashCode() of user1 to place it in a bucket.
When userSet.contains(user2) was called, it calculated the hash code for user2. Since user2 is a different object in memory, it almost certainly has a different default hash code.
The HashSet looked in the bucket corresponding to user2's hash code. It found no entry there with a matching hash code and returned false.
The equals() method was never called because the hash codes did not match, so the collection never considered user1 a candidate.

By not overriding hashCode(), we broke the contract. The HashSet was unable to find a logically equivalent object because its initial search, which relies on hashCode(), sent it to the wrong place. When both methods are correctly overridden (as in the previous example), user1 and user2 produce the same hash code, the HashSet looks in the correct bucket, and equals() confirms the match.
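For contrast, here is a minimal runnable sketch of the working case, where both methods are overridden. The wrapper class ContractKeptDemo and its helper methods are hypothetical names for this illustration; the nested User follows the earlier overridden example.

```java
import java.util.HashSet;
import java.util.Set;

class ContractKeptDemo {
    static final class User {
        private final int id;
        private final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            return id == ((User) o).id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id); // equal ids give equal hash codes
        }
    }

    // Adds one user, then looks up a distinct but logically equal user.
    static boolean setFindsEqualUser() {
        Set<User> userSet = new HashSet<>();
        userSet.add(new User(101, "Alice"));
        return userSet.contains(new User(101, "Alice"));
    }

    // Adds two logically equal users; the set should keep only one of them.
    static int sizeAfterAddingTwoEqualUsers() {
        Set<User> userSet = new HashSet<>();
        userSet.add(new User(101, "Alice"));
        userSet.add(new User(101, "Alice"));
        return userSet.size();
    }

    public static void main(String[] args) {
        System.out.println(setFindsEqualUser());            // prints true
        System.out.println(sizeAfterAddingTwoEqualUsers()); // prints 1
    }
}
```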

A Deeper Look: Memory, Hash Codes, and Garbage Collection

What is Garbage Collection?

In some programming languages, developers are responsible for manually allocating and freeing memory. Forgetting to free memory that is no longer needed leads to "memory leaks," which can cause an application to run out of memory and crash.

Java largely avoids this problem with a process called automatic garbage collection. The Java Virtual Machine (JVM) has a component called the Garbage Collector (GC) that runs alongside your program. Its job is to automatically identify which objects are no longer in use (meaning the program can no longer reach them through any chain of references) and reclaim the memory they occupy. This makes Java development safer and less error-prone, although a program can still leak memory by holding on to references it no longer needs.
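Reachability can be observed, within limits, using java.lang.ref.WeakReference, which refers to an object without keeping it alive. The sketch below is an assumption-laden illustration: the class and method names are made up, System.gc() is only a request that the JVM may ignore, and the timing of collection is never guaranteed, so only the first result is certain.

```java
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    // While a strong reference is still in use, the object cannot be collected.
    static boolean survivesWhileReferenced() {
        Object data = new Object();
        WeakReference<Object> watcher = new WeakReference<>(data);
        System.gc(); // only a request; the JVM may ignore it
        return watcher.get() == data; // 'data' is still in use here, so the object is reachable
    }

    public static void main(String[] args) {
        System.out.println(survivesWhileReferenced()); // prints true

        Object data = new Object();
        WeakReference<Object> watcher = new WeakReference<>(data);
        data = null; // no strong reference remains; the object is now eligible for collection
        System.gc();
        // Often true on common JVMs after a GC run, but not guaranteed.
        System.out.println(watcher.get() == null);
    }
}
```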

Memory Addresses vs. Identity Hash Codes

As we've discussed, it's important to be precise about the difference between a memory address and a hash code:

Memory Address: This is a low-level pointer to the actual location in RAM where an object is stored. Think of it as a temporary street address where the object "lives." This address is managed by the JVM and is not directly accessible in Java code.
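Identity Hash Code: This is a 32-bit integer (int) that serves as a stable identifier for an object. Think of it as the object's Social Security Number: it's assigned once and never changes. Unlike a real Social Security Number, though, it is not guaranteed to be unique; two distinct objects can occasionally share the same value.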

The key reason for this distinction is how the Garbage Collector works.

The Impact of Garbage Collection on Memory

To manage memory efficiently, the Garbage Collector may perform memory compaction. This involves moving objects around in memory to group them together, which eliminates empty spaces (fragmentation) and makes allocating memory for new objects faster.

This creates a critical challenge: if an object's identity were tied to its actual memory address, its identity would change every time the GC moved it. This would instantly break the functionality of hash-based collections like HashMap.

To solve this, System.identityHashCode() is required to return the same value for an object for its entire lifetime, regardless of how many times the Garbage Collector moves it. HotSpot, the most widely used JVM, typically achieves this by storing the identity hash code in the object's header the first time it's requested. From that point on, even if the object's physical memory address changes, its identity hash code remains constant.
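This stability is easy to check. The sketch below records an object's identity hash code, asks for several garbage collections, and compares the value afterwards. The names are hypothetical, and System.gc() is only a hint, so the program cannot prove the object was actually moved; it only shows that the value does not change.

```java
class StableIdentityHashDemo {
    // Returns true if the identity hash code is unchanged after requesting GC runs.
    static boolean hashSurvivesGc() {
        Object o = new Object();
        int before = System.identityHashCode(o);
        for (int i = 0; i < 3; i++) {
            byte[] garbage = new byte[1 << 20]; // create some short-lived garbage
            System.gc();                        // a request only; the JVM may ignore it
        }
        return before == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        System.out.println(hashSurvivesGc()); // prints true
    }
}
```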

This elegant solution allows Java to have both efficient memory management (through garbage collection and compaction) and a stable concept of object identity.