Spliterator Explained: The Engine Behind Streams
Introduction
Behind every Java stream lies a lesser-known but fundamental component: the Spliterator. While most developers interact only with high-level stream operations, the performance, correctness, and parallel behavior of streams are largely determined by how data is split and traversed.
This article explains what a Spliterator is, why it exists, and how it powers both sequential and parallel streams. Understanding this mechanism completes the mental model of the Java Stream API.
“Streams describe computations; spliterators define how data is traversed.”
1. Why Spliterator Exists
Before Java 8, iteration relied on Iterator, which supports only sequential traversal: it offers no way to divide the remaining elements among threads or to report how many remain. This model does not scale well for parallel execution.
Unlike the classic Iterator, the Spliterator can:
- Traverse elements sequentially
- Split itself into multiple pieces for parallel processing
- Estimate the size of the remaining elements
- Describe characteristics of the data source (sorted, distinct, sized, etc.)
Every Collection in Java provides a Spliterator through its spliterator() method. Collection supplies a default implementation, and many JDK collections override it with one suited to their internal structure:
List<String> names = List.of("Alice", "Bob", "Charlie");
Spliterator<String> spliterator = names.spliterator();
spliterator.forEachRemaining(System.out::println);
2. The Four Key Responsibilities
The Spliterator interface defines four key methods.
boolean tryAdvance(Consumer<? super T> action);
Spliterator<T> trySplit();
long estimateSize();
int characteristics();
2.1. Traversal: tryAdvance()
This is the core iteration method. It takes a Consumer and applies it to the next element if one exists:
int count = 0;
List<String> names = new ArrayList<>(List.of("Alice", "Bob", "Charlie"));
Spliterator<String> sp = names.spliterator();
while (sp.tryAdvance(name -> System.out.println(name))) {
    count++; // runs once per element that tryAdvance() consumed
}
System.out.println("Processed elements: " + count);
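Unlike Iterator, which needs a hasNext() check before each next() call (next() throws NoSuchElementException once no elements remain), tryAdvance() combines both steps and simply returns false when traversal is complete. The result is a cleaner, functional style of iteration.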
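The practical difference shows up once a source is exhausted. This sketch (ExhaustionDemo is an illustrative name, not part of the JDK) drains both an Iterator and a Spliterator over the same list and then asks each for one more element:

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;

// Contrasts how Iterator and Spliterator signal that no elements remain.
final class ExhaustionDemo {

    // Drains a spliterator, then reports what one more tryAdvance() returns.
    static boolean tryAdvanceAfterEnd(List<String> names) {
        Spliterator<String> sp = names.spliterator();
        sp.forEachRemaining(name -> { });   // consume every element
        return sp.tryAdvance(name -> { });  // false: a plain signal, no exception
    }

    // Drains an iterator, then reports whether one more next() throws.
    static boolean nextThrowsAfterEnd(List<String> names) {
        Iterator<String> it = names.iterator();
        while (it.hasNext()) {
            it.next(); // hasNext() must guard every next() call
        }
        try {
            it.next();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie");
        System.out.println("tryAdvance() after the end returns: " + tryAdvanceAfterEnd(names));
        System.out.println("next() after the end throws: " + nextThrowsAfterEnd(names));
    }
}
```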
2.2. Splitting: trySplit()
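This is where parallel streams get their parallelism. When a parallel stream needs to divide the work, it calls trySplit(): if the spliterator can be split, it returns a new Spliterator covering part of the elements and keeps the rest for itself; otherwise it returns null: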
List<String> names = List.of("Alice", "Bob", "Charlie", "Diana");
Spliterator<String> sp1 = names.spliterator();
// For an ordered source, trySplit() returns a spliterator over a prefix
// of the elements and keeps the remainder for itself (or returns null)
Spliterator<String> sp2 = sp1.trySplit();
System.out.println("Split-off prefix (sp2):");
if (sp2 != null) {
    sp2.forEachRemaining(System.out::println); // Alice, Bob
}
System.out.println("Remainder (sp1):");
sp1.forEachRemaining(System.out::println); // Charlie, Diana
// Two spliterators now exist, each covering half the data,
// so they can be processed in different threads
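The quality of splitting determines parallel efficiency. ArrayList's spliterator divides its index range neatly in half, while LinkedList's cannot jump to the middle: each split walks the nodes and copies a batch of elements into an array, which costs extra work and produces less balanced pieces.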
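To see what an even split looks like, the sketch below imitates on a single thread what the parallel framework does: it keeps calling trySplit() until each piece is at or below a size threshold, then records the sizes of the resulting leaves. The threshold of 4 and the SplitDemo name are illustrative assumptions; the real framework computes its own threshold from the size estimate and the pool's parallelism, and runs the leaves as fork/join tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

// Single-threaded imitation of recursive splitting (illustrative only).
final class SplitDemo {

    // Splits until each piece is <= threshold elements, collecting leaf sizes in order.
    static <T> void split(Spliterator<T> sp, long threshold, List<Long> chunkSizes) {
        if (sp.estimateSize() > threshold) {
            Spliterator<T> prefix = sp.trySplit();
            if (prefix != null) {
                split(prefix, threshold, chunkSizes); // left part (prefix)
                split(sp, threshold, chunkSizes);     // right part (remainder)
                return;
            }
        }
        chunkSizes.add(sp.estimateSize()); // leaf: would become one task
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            numbers.add(i);
        }
        List<Long> sizes = new ArrayList<>();
        split(numbers.spliterator(), 4, sizes);
        System.out.println(sizes); // [4, 4, 4, 4]
    }
}
```

With an ArrayList, every split halves the remaining index range, so 16 elements end up as four equal leaves.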
2.3. Estimation: estimateSize()
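Returns an estimate of the number of remaining elements, which helps the framework decide when to stop splitting and how to size batches. The estimate is exact for SIZED spliterators; Long.MAX_VALUE means the size is infinite, unknown, or too expensive to compute: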
List<String> names = List.of("Alice", "Bob", "Charlie");
Spliterator<String> sp = names.spliterator();
System.out.println("Estimated size before traversal: " + sp.estimateSize()); // 3
sp.tryAdvance(System.out::println);
System.out.println("Estimated size after one element: " + sp.estimateSize()); // 2
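The difference between an exact and an unknown estimate can be checked directly. In this sketch (the SizeDemo class is illustrative), an ArrayList reports SIZED, so estimateSize() is exact and getExactSizeIfKnown() returns the same value, while an infinite Stream.iterate() source reports Long.MAX_VALUE and getExactSizeIfKnown() returns -1:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

// Compares a SIZED spliterator with the spliterator of an infinite stream.
final class SizeDemo {

    static long[] sizes() {
        List<String> names = new ArrayList<>(List.of("Alice", "Bob", "Charlie"));
        Spliterator<String> sized = names.spliterator();

        // Stream.iterate(seed, f) never ends, so its size cannot be known
        Spliterator<Integer> unbounded = Stream.iterate(1, n -> n + 1).spliterator();

        return new long[] {
            sized.estimateSize(),           // 3, exact because SIZED is reported
            sized.getExactSizeIfKnown(),    // 3
            unbounded.estimateSize(),       // Long.MAX_VALUE: infinite or unknown
            unbounded.getExactSizeIfKnown() // -1, because SIZED is not reported
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(sizes()));
    }
}
```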
2.4. Characteristics: characteristics()
Returns bit flags describing the data source’s properties:
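- ORDERED: Elements have a defined encounter order (lists, arrays)
- SIZED: Known exact size (arrays, most collections)
- SORTED: Encounter order follows a defined sort order, natural or by comparator (TreeSet)
- DISTINCT: No duplicates (Set implementations)
- NONNULL: The source guarantees that no element is null
- CONCURRENT: The source can be safely modified by multiple threads without external synchronization
- IMMUTABLE: The source cannot be structurally modified (no additions, replacements, or removals)
- SUBSIZED: All child Spliterators, whether direct or indirect, will be SIZED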
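These characteristics allow Stream operations to optimize themselves. For example, if a source is already SORTED in natural order, sorted() can skip the sort; if it is DISTINCT, distinct() has nothing to remove; and SIZED lets operations such as toArray() preallocate storage of exactly the right size.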
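Because characteristics() packs every flag into a single int, it can be decoded with a bitwise AND, which is what hasCharacteristics() does for you. The decoder below is a hypothetical helper, not a JDK API; its results for ArrayList and TreeSet are consistent with the per-collection checks in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

// Hypothetical helper: turns the characteristics() bitmask into flag names.
final class CharacteristicsDecoder {
    private static final String[] NAMES = {
        "ORDERED", "DISTINCT", "SORTED", "SIZED",
        "NONNULL", "IMMUTABLE", "CONCURRENT", "SUBSIZED"
    };
    private static final int[] FLAGS = {
        Spliterator.ORDERED, Spliterator.DISTINCT, Spliterator.SORTED, Spliterator.SIZED,
        Spliterator.NONNULL, Spliterator.IMMUTABLE, Spliterator.CONCURRENT, Spliterator.SUBSIZED
    };

    static List<String> decode(Spliterator<?> sp) {
        int bits = sp.characteristics();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < FLAGS.length; i++) {
            // Same test hasCharacteristics() performs for a single flag
            if ((bits & FLAGS[i]) == FLAGS[i]) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(new ArrayList<>(List.of(1, 2, 3)).spliterator()));
        // [ORDERED, SIZED, SUBSIZED]
        System.out.println(decode(new TreeSet<>(List.of(3, 1, 2)).spliterator()));
        // [ORDERED, DISTINCT, SORTED, SIZED]
    }
}
```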
Default Spliterator characteristics in the JDK
Every Java collection exposes a Spliterator, which describes how elements can be traversed and split:
ArrayList: ordered and size-aware
List<String> list = new ArrayList<>(List.of("Alice", "Bob", "Charlie"));
Spliterator<String> sp = list.spliterator();
System.out.println("ORDERED : " + sp.hasCharacteristics(Spliterator.ORDERED)); // true
System.out.println("SORTED : " + sp.hasCharacteristics(Spliterator.SORTED)); // false
System.out.println("DISTINCT : " + sp.hasCharacteristics(Spliterator.DISTINCT)); // false
System.out.println("SIZED : " + sp.hasCharacteristics(Spliterator.SIZED)); // true
System.out.println("SUBSIZED : " + sp.hasCharacteristics(Spliterator.SUBSIZED)); // true
HashMap (keys): distinct but unordered
Map<String, Integer> map = new HashMap<>();
map.put("Alice", 30);
map.put("Bob", 25);
map.put("Charlie", 35);
Spliterator<String> sp = map.keySet().spliterator();
System.out.println("ORDERED : " + sp.hasCharacteristics(Spliterator.ORDERED)); // false
System.out.println("SORTED : " + sp.hasCharacteristics(Spliterator.SORTED)); // false
System.out.println("DISTINCT : " + sp.hasCharacteristics(Spliterator.DISTINCT)); // true
System.out.println("SIZED : " + sp.hasCharacteristics(Spliterator.SIZED)); // true
TreeSet: ordered because it is sorted
Set<Integer> treeSet = new TreeSet<>(Set.of(3, 1, 2));
Spliterator<Integer> sp = treeSet.spliterator();
System.out.println("ORDERED : " + sp.hasCharacteristics(Spliterator.ORDERED)); // true
System.out.println("SORTED : " + sp.hasCharacteristics(Spliterator.SORTED)); // true
System.out.println("DISTINCT : " + sp.hasCharacteristics(Spliterator.DISTINCT)); // true
System.out.println("SIZED : " + sp.hasCharacteristics(Spliterator.SIZED)); // true
HashSet: distinct but neither ordered nor sorted
Set<Integer> hashSet = new HashSet<>(Set.of(3, 1, 2));
Spliterator<Integer> sp = hashSet.spliterator();
System.out.println("ORDERED : " + sp.hasCharacteristics(Spliterator.ORDERED)); // false
System.out.println("SORTED : " + sp.hasCharacteristics(Spliterator.SORTED)); // false
System.out.println("DISTINCT : " + sp.hasCharacteristics(Spliterator.DISTINCT)); // true
System.out.println("SIZED : " + sp.hasCharacteristics(Spliterator.SIZED)); // true
3. How Streams Use Spliterators
When you create a stream from a collection:
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
Stream<Integer> stream = numbers.stream();
Here’s what happens internally:
- The collection’s spliterator() method provides a Spliterator
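- Intermediate operations (map, filter, etc.) do not traverse anything yet; they only record pipeline stages
- For parallel streams, trySplit() is called recursively to divide the source into chunks that fork/join tasks process on different threads
- The terminal operation drives the traversal: elements are pulled from the Spliterator (with forEachRemaining(), or with tryAdvance() when the operation may stop early, as findFirst() or anyMatch() can) and pushed through the recorded stages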
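This lazy, element-by-element flow can be observed by logging inside each stage. In this sketch (PipelineOrderDemo and the log messages are illustrative), building the pipeline records nothing; only collect() starts pulling elements from the source, and each element passes through filter and map before the next one is read:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Records when each stage runs to show that nothing happens before the terminal operation.
final class PipelineOrderDemo {

    static List<String> run() {
        List<String> log = new ArrayList<>();
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
                .filter(n -> { log.add("filter " + n); return true; })
                .map(n -> { log.add("map " + n); return n * 10; });

        log.add("pipeline built"); // no stage has executed yet

        List<Integer> result = pipeline.collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
        // pipeline built, filter 1, map 1, filter 2, map 2, filter 3, map 3, result [10, 20, 30]
    }
}
```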
The quality of splitting directly impacts performance.
“Good splitting leads to good parallelism.”
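Stream stages also change what the pipeline reports about its data. Calling spliterator() on a pipeline returns a spliterator whose characteristics reflect the stages added so far. This sketch relies on the behavior of the current OpenJDK implementation (PipelineCharacteristicsDemo is an illustrative name): map() keeps SIZED because it produces exactly one output per input, while filter() drops it because the number of surviving elements cannot be known in advance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

// Shows how intermediate operations affect the characteristics a pipeline reports.
final class PipelineCharacteristicsDemo {

    static boolean[] check() {
        List<Integer> numbers = new ArrayList<>(List.of(1, 2, 3, 4, 5));

        Spliterator<Integer> mapped = numbers.stream().map(n -> n * 2).spliterator();
        Spliterator<Integer> filtered = numbers.stream().filter(n -> n % 2 == 0).spliterator();

        return new boolean[] {
            mapped.hasCharacteristics(Spliterator.SIZED),     // true: one output per input
            filtered.hasCharacteristics(Spliterator.SIZED),   // false: survivors unknown up front
            filtered.hasCharacteristics(Spliterator.ORDERED)  // true: encounter order is kept
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(check())); // [true, false, true]
    }
}
```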
4. Writing a Custom Spliterator (When Needed)
In day-to-day Java development, you will rarely manipulate a Spliterator directly.
Standard collections, arrays, and I/O utilities already expose well-designed spliterators, and the Stream API builds on them transparently.
That said, a custom Spliterator becomes useful when you want to stream data that is not stored in a collection, but still has:
- a clear traversal order,
- a known size,
- and no reason to be fully loaded into memory.
A common real-world example is streaming a numeric range or identifiers produced on the fly, such as database IDs or batch numbers.
Minimal real-world example: streaming an ID range
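import java.util.Objects;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport; // used by the usage snippet
// Streams the inclusive ID range [start, end] without materializing a collection
class IdRangeSpliterator implements Spliterator<Long> {
    private long current;   // next ID to emit
    private final long end; // last ID to emit (inclusive)
    IdRangeSpliterator(long start, long end) {
        this.current = start;
        this.end = end;
    }
    @Override
    public boolean tryAdvance(Consumer<? super Long> action) {
        Objects.requireNonNull(action);
        if (current > end) {
            return false; // range exhausted
        }
        action.accept(current++);
        return true;
    }
    @Override
    public Spliterator<Long> trySplit() {
        long remaining = end - current; // one less than the element count
        if (remaining < 10) {
            return null; // too small to split
        }
        long mid = current + remaining / 2;
        // Hand off the prefix [current, mid] and keep [mid + 1, end]
        Spliterator<Long> split = new IdRangeSpliterator(current, mid);
        current = mid + 1;
        return split;
    }
    @Override
    public long estimateSize() {
        return end - current + 1; // exact count, hence SIZED
    }
    @Override
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED | NONNULL | IMMUTABLE;
    }
}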
Usage:
Spliterator<Long> spliterator = new IdRangeSpliterator(1, 100);
StreamSupport.stream(spliterator, false)
.forEach(System.out::println);
Why this example matters
- No collection is created
- The stream remains ordered and size-aware
- The logic stays simple and predictable
An Iterator could traverse the data, but only a Spliterator can describe how it behaves, which is what allows the Stream API to work efficiently, especially for parallel execution.
Write a custom Spliterator only when you need to describe a data source, not just iterate over it.
Most applications will never need one, but understanding this mechanism clarifies how streams handle ordering, sizing, and parallelism under the hood.
Note: The above example could be replaced by LongStream.rangeClosed.
It is intentionally simplified to illustrate how a Spliterator works.
In real systems, the same structure is used when the data source is lazy, paginated, or external.
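One way to build such a lazy source, sketched here under stated assumptions, is to extend Spliterators.AbstractSpliterator: it already provides a batching trySplit(), so only tryAdvance() has to be written. PagedSource and fetchPage() are hypothetical; fetchPage() stands in for a remote call such as an HTTP request or a database query and here simply serves pages from memory. The size is reported as unknown (Long.MAX_VALUE) because the number of pages is not known up front.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical paginated source: pages are fetched lazily, one at a time.
final class PagedSource {
    private final List<List<String>> pages; // in-memory stand-in for a remote system
    int fetchCount = 0;                     // number of fetchPage() calls so far

    PagedSource(List<List<String>> pages) {
        this.pages = pages;
    }

    // Hypothetical fetcher: a real one would call an API or run a query.
    // An empty page signals that there is no more data.
    List<String> fetchPage(int pageNumber) {
        fetchCount++;
        if (pageNumber < pages.size()) {
            return pages.get(pageNumber);
        }
        return List.of();
    }

    Stream<String> stream() {
        // Size unknown up front: estimate Long.MAX_VALUE and no SIZED flag
        Spliterator<String> sp = new Spliterators.AbstractSpliterator<String>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            private int nextPage = 0;
            private List<String> buffer = List.of();
            private int index = 0;
            private boolean exhausted = false;

            @Override
            public boolean tryAdvance(Consumer<? super String> action) {
                if (index == buffer.size()) {      // current page used up
                    if (exhausted) {
                        return false;
                    }
                    buffer = fetchPage(nextPage++); // fetch only when needed
                    index = 0;
                    if (buffer.isEmpty()) {
                        exhausted = true;
                        return false;
                    }
                }
                action.accept(buffer.get(index++));
                return true;
            }
        };
        return StreamSupport.stream(sp, false);
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(List.of("a", "b"), List.of("c"), List.of("d", "e"));
        PagedSource source = new PagedSource(pages);
        System.out.println(source.stream().collect(Collectors.toList())); // [a, b, c, d, e]
        System.out.println("pages fetched: " + source.fetchCount);        // 4 (3 pages + 1 empty)
    }
}
```

Because pages are fetched on demand, a short-circuiting operation such as findFirst() triggers a single fetch, while collect() keeps fetching until an empty page signals the end.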
5. Spliterator vs Iterator: Key Differences
While both interfaces traverse elements, their responsibilities differ.
| Feature | Iterator | Spliterator |
|---|---|---|
| Traversal | Sequential: hasNext() + next() | Sequential: tryAdvance(Consumer) |
| Splitting for parallelism | No | Yes, via trySplit() |
| Bulk traversal | forEachRemaining(Consumer) (default method since Java 8) | forEachRemaining(Consumer) |
| Metadata | None | estimateSize(), characteristics() |
“Iterator walks; Spliterator divides and conquers.”
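The two interfaces can also be converted into each other with the java.util.Spliterators utility class. The sketch below (IteratorBridge is an illustrative name) wraps a plain Iterator into a spliterator of unknown size so it can feed a stream, and goes the other way with Spliterators.iterator(), which builds hasNext() and next() on top of tryAdvance():

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Converting between Iterator and Spliterator with java.util.Spliterators.
final class IteratorBridge {

    // Iterator -> Spliterator -> Stream; the resulting spliterator has no size information
    static List<String> upperCase(Iterator<String> source) {
        Spliterator<String> sp = Spliterators.spliteratorUnknownSize(source, Spliterator.ORDERED);
        return StreamSupport.stream(sp, false)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Spliterator -> Iterator: each hasNext() may call tryAdvance() under the hood
    static String joinWithIterator(Spliterator<String> sp) {
        Iterator<String> it = Spliterators.iterator(sp);
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) {
            sb.append(it.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCase(List.of("alice", "bob").iterator())); // [ALICE, BOB]
        System.out.println(joinWithIterator(List.of("x", "y", "z").spliterator())); // xyz
    }
}
```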
Conclusion
The Spliterator is the silent engine of the Java Stream API. It defines how data is traversed, how work is divided, and how parallel execution scales. While most developers never implement one, understanding spliterators explains why streams behave the way they do.
You can find the complete code of this article here on GitHub.
