Java Data Structures and How They Internally Work
Class Hierarchies
Array
- Tuple (finite ordered list of elements)
- Sequential
- Linear Arrays — fixed size
Dynamic Arrays — reserved space for additional elements. If full, it copies its content to a larger array
- Optimal for indexing
- Bad at searching, inserting, deleting (excluding the end)
Complexity:
indexing = O(1)
search = O(n)
optimized search = O(logn)
insertion = (for dynamic) O(n)
ArrayList (Java)
- implements List
- Internally uses an Object[] of default size 10 (if not declared).
- When you add an item, it checks if there is any space left for the new element.
If space is not a problem, the new item is added at the next empty space.
If not; a larger array of 50% (using right shift operator to calculate) more the initial size is created and the current array is copied to the new one (using Arrays.copyOf). - When you remove an element, elements are shifted (using Arrays.copyOf).
- add
- addAll
- clear
- clone
- remove
- subList
- toArray
Complexity:
append / get : O(1)
add / remove / indexOf / contains : O(n)
LinkedList
- Chain of nodes
- A node holds data of its own and a reference to its next node.
- Doubly Linked List — reference to previous and next nodes
Circularly Linked List — head&tail linked
Stack — LIFO, most commonly with LinkedLists (head is the only place for insertion and removal) but also with Arrays
Queue — FIFO, implemented with LinkedLists (a doubly linked list that only adds to tail and removes from head) or Arrays
- Optimized for insertion/deletion
- Slow at indexing/searching
Complexity:
indexing = O(n)
search = O(n)
optimized search = O(n)
append = O(1)
prepend = O(1)
insertion = O(n)
Stack (Java)
- implements List and extends Vector (Vector class is thread-safe)
- empty
- peek
- pop
- push
- search
PriorityQueue (Java)
- implements Queue
- When new elements are inserted into the PriorityQueue, they are ordered (and retrieved later) based on their natural ordering or by a defined “Comparator” provided when we construct the PriorityQueue.
- The internal working of the PriorityQueue is based on the Binary Heap.
- not thread-safe
- offer
- poll
- peek
Complexity:
- enque / deque : O(log(n))
- retrieval : O(1)
- contains: O(n)
LinkedList (Java)
- implements List and Deque interfaces
- List implementation (doubly linked list)
- null elements are allowed.
- not good at iteration; best at removing the current element during the iteration
- There is a static “Node” class.
- LinkedList class holds “first” and “last” variables.
- When you add the very first item, both the “first” and “last” point to the new “Node”. They get updated according to the operation type.
- add
- addFirst
- addLast
- remove
- removeFirst
- removeLast
Complexity:
- append : O(1)
- add / get / remove / contains : O(n)
ArrayDeque (Java)
- implements Deque — supports insertion, removal and retrieval of elements at both ends.
- Resizeable array implementation
- Automatically doubles the size of an array when head and tail pointer meets each other while adding an element
- null elements are not allowed.
- Not thread-safe
- ArrayDeque implementation consumes less memory than LinkedList implementation.
- More efficient than the LinkedList for add and remove operation at both ends.
- Works significantly faster than the synchronized Stack.
- Is a faster queue than LinkedList due to the better locality of reference.
- addFirst
- addLast
- removeFirst
- removeLast
- getFirst
- getLast
Complexity:
Most ArrayDeque operations run in amortized constant time (O(1)).
When to use which:
- Use an ArrayList if you need to access elements by index and you only need to insert/delete at the end.
- Use an ArrayDeque as a stack, queue, or deque.
- Use a LinkedList if you need to insert/delete while iterating through the list, or if you need insertion at the ends to be worst-case O(1).
Map
- Hash Table or Hash Map
- data as key-value pairs
- hashing (a key and its unique output — beware of hash collisions)
- associative arrays, database indexing
- designed to optimize searching, insertion, deletion
Complexity:
indexing = O(1)
search = O(1)
insertion = O(1)
HashMap (Java)
- extends AbstractMap
- contains an array of “Node” which has “hash”, “key”, “value”, “next” (points to the next node in the same bucket of array table)
- Hashing = process of converting an object into integer form by using the method “hashCode”.
- A bucket is one element of HashMap array, used to store nodes.
A single bucket can have more than one node (depending on hashCode); using link list to connect the nodes. - Buckets are different in capacity.
- capacity = number of buckets * load factor
- index = hashCode(key) & (n-1)
- allows null key (with hashCode value = 0), multiple null values
- initial size = 16 with a load factor of 0.75 (75% of the capacity)
load factor = a tuning parameter; the measure that decides when to increase the capacity of the Map. - The threshold of a HashMap is approximately = current capacity * load factor.
- When the number of entries in the hash table exceeds the threshold, the Map is rehashed so that it has approximately twice the number of buckets as before.
- Rehashing is the process of re-calculating the hash code of already stored entries.
- Collision occurs when a hash function returns the same bucket location for two different keys.
- In case of collision; it checks via hashCode and equals method that if both the keys are same.
If keys are same, replace the value with current value.
Otherwise, connect this node object to the previous node object via linked list so both are stored at the same index. - HashMap initially uses the LinkedList.
When the number of entries crosses a certain threshold (TREEIFY_THRESHOLD = 8), it will replace a LinkedList with a balanced binary tree (red-black tree).
- put
- get
- getOrDefault
- remove
- containsKey
- containsValue
- isEmpty
- size
Complexity:
- O(1) for insertion and lookup
LinkedHashMap (Java)
- unique elements with insertion order
- non-synchronized
- next pointer required for LinkedList functionality.
- a Doubly LinkedList is also maintained to handle the insertion order.
- To maintain insertion order; 2 more pointers (before and after) at every Node.
before: points to the node inserted before this node
after: points to the node inserted after this node
Complexity:
- O(1) for insertion and lookup
Example:
Insertion Order:
- Dinesh (calculated index = 5)
- Anamika (calculated index = 2)
- Arnav (calculated index = 2)
Status After Insertion of Arnav:
TreeMap (Java)
- Does not use hashing for storing key-value pairs.
- Uses a data structure called the Red-Black Tree — a self-balancing binary search tree (for sorting).
- a Comparator can be provided at creation time.
- Node has references to “parent”, “right”, “left” elements.
Complexity:
O(logN) for insertion and lookup
Example:
Insertion Order:
- Ahmedabad (key = H)
- Jaipur (key = D)
- Delhi (key = B)
- Agra (key = F)
- Patna (key = P)
Set
- Duplicate elements are not allowed.
- HashSet, TreeSet, LinkedHashSet
HashSet (Java)
- HashSet class has a private HashMap property.
- a Set achieves uniqueness internally through HashMap
- When you call “add” method on HashSet, it actually calls its private HashMap object’s “put” method (as well as “remove”).
- The elements you add into HashSet are stored as keys of this HashMap object. The value associated with those keys will be a constant.
- “put” method returns the previous value associated with key, or null if there was no mapping for key.
- default init size 16, always rounded up to a power of 2
- not synchronized
- add
- isEmpty
- size
- remove
- contains
- clear
- clone
Complexity:
O(1) for the basic operations (add, remove and contains)
TreeSet (Java)
- TreeSet is like HashSet, containing unique elements but in a sorted manner.
- uses TreeMap object internally (and TreeMap uses HashMap internally)
- The elements are ordered using Red-Black Tree with their natural ordering (default ascending) or by a Comparator provided at creation time.
- not synchronized
- no tuning parameters.
Complexity:
O(1) for basic operations (add, remove and contains) assuming the hash function disperses the elements properly among the buckets.
LinkedHashSet (Java)
- extends HashSet
- uses LinkedHashMap object internally (to maintain insertion order)
Complexity:
O(1) for basic operations (add, remove and contains)
HashTable (Java)
- extends Dictionary and implements Map
- synchronized
- does not allow null value or key
- internally contains buckets in which it stores the key/value pairs
- hashcode to determine to which bucket the key/value pair should map
Complexity:
O(n) for most common operations such as get, put, contains
Happy Coding!
References
https://www.netjstech.com/2015/08/how-arraylist-works-internally-in-java.html
https://knpcode.com/java/collections/linkedlist-internal-implementation-in-java/
https://www.scientecheasy.com/2020/09/vector-in-java.html/
https://www.baeldung.com/java-array-deque
https://docs.oracle.com/javase/tutorial/collections/implementations/deque.html
https://www.geeksforgeeks.org/internal-working-of-sethashset-in-java/
https://javaconceptoftheday.com/how-hashset-works-internally-in-java/
https://www.javatpoint.com/how-treeset-works-internally-in-java
https://www.geeksforgeeks.org/internal-working-of-hashmap-java/
https://www.baeldung.com/java-hashmap-load-factor
https://anmolsehgal.medium.com/java-linkedhashmap-internal-implementation-44e2e2893036
https://www.dineshonjava.com/internal-working-of-linkedhashmap-in-java/
https://medium.com/@greekykhs/how-linkedhashmap-works-internally-in-java-409846a4f08
https://www.javatpoint.com/how-treemap-works-internally-in-java
https://www.geeksforgeeks.org/how-hashtable-works-internally-in-java/
https://howtodoinjava.com/java/collections/hashtable-class/
https://www.educative.io/edpresso/how-to-use-priorityqueue-in-java
https://www.interviewsansar.com/what-is-time-complexity-for-offer-poll-and-peek-methods-in-priority-queue/
https://qccs.me/classes/cs313/fried/website/list_comparison.html