Spliterator Explained: The Engine Behind Streams

This entry is part 10 of 10 in the series Modern Java Features (Java 8+)

Introduction

Behind every Java stream lies a lesser-known but fundamental component: the Spliterator. While most developers interact only with high-level stream operations, the performance, correctness, and parallel behavior of streams are largely determined by how data is split and traversed.

This article explains what a Spliterator is, why it exists, and how it powers both sequential and parallel streams. Understanding this mechanism completes the mental model of the Java Stream API.

“Streams describe computations; spliterators define how data is traversed.”

1. Why Spliterator Exists

Before Java 8, iteration relied on Iterator, which supports only sequential traversal. This model does not scale well for parallel execution.

A Spliterator keeps sequential traversal and adds what the classic Iterator lacks. It can:

  • Traverse elements sequentially
  • Split itself into multiple pieces for parallel processing
  • Estimate the size of the remaining elements
  • Describe characteristics of the data source (sorted, distinct, sized, etc.)

Every Collection in Java has a default Spliterator implementation, which you can access via the spliterator() method:

List<String> names = List.of("Alice", "Bob", "Charlie");
Spliterator<String> spliterator = names.spliterator();
spliterator.forEachRemaining(System.out::println);

2. The Four Key Responsibilities

The Spliterator interface defines four key methods.

boolean tryAdvance(Consumer<? super T> action);
Spliterator<T> trySplit();
long estimateSize();
int characteristics();

2.1. Traversal: tryAdvance()

This is the core iteration method. It takes a Consumer and applies it to the next element if one exists:

int count = 0;

List<String> names = new ArrayList<>(List.of("Alice", "Bob", "Charlie"));
Spliterator<String> sp = names.spliterator();

while (sp.tryAdvance(name -> System.out.println(name))) {
    count++;
}

System.out.println("Processed elements: " + count);

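Unlike Iterator.next(), which throws NoSuchElementException when no elements remain, tryAdvance() simply returns false once the source is exhausted. A single call both tests for and consumes the next element, so there is no separate hasNext() step.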
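The difference at the end of the data can be checked directly. The following is a minimal sketch, not part of the original article's code; the class name EndOfDataDemo and its two helper methods are illustrative assumptions.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;

class EndOfDataDemo {

    // Returns true if calling next() on an exhausted Iterator throws.
    static boolean iteratorThrowsWhenExhausted() {
        Iterator<String> it = List.of("Alice").iterator();
        it.next(); // consume the only element
        try {
            it.next(); // nothing left
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    // Returns what tryAdvance() reports once the Spliterator is exhausted.
    static boolean tryAdvanceAfterExhaustion() {
        Spliterator<String> sp = List.of("Alice").spliterator();
        sp.tryAdvance(name -> { }); // consume the only element
        // The action must not run here, because no element is left.
        return sp.tryAdvance(name -> {
            throw new IllegalStateException("action should not be invoked");
        });
    }

    public static void main(String[] args) {
        System.out.println("Iterator throws: " + iteratorThrowsWhenExhausted());
        System.out.println("tryAdvance returns: " + tryAdvanceAfterExhaustion());
    }
}
```

The Iterator signals the end with an exception, while the Spliterator reports it through its return value and never invokes the action.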

2.2. Splitting: trySplit()

This is where the magic of parallel streams happens. When Java needs to process elements concurrently, it calls trySplit() to divide the work:

List<String> names = List.of("Alice", "Bob", "Charlie", "Diana");
Spliterator<String> sp1 = names.spliterator();
Spliterator<String> sp2 = sp1.trySplit();

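// For an ORDERED source, trySplit() returns the prefix and sp1 keeps the rest
System.out.println("sp1 (keeps the remaining elements):");
sp1.forEachRemaining(System.out::println);

System.out.println("sp2 (the split-off prefix):");
if (sp2 != null) { // null means the source declined to split
    sp2.forEachRemaining(System.out::println);
}

// The two spliterators cover disjoint parts of the data
// and can be processed by different threads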

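The quality of splitting determines parallel efficiency. ArrayList’s spliterator divides its index range neatly in half, while LinkedList’s has no random access: it must walk the nodes one by one and split off a batch of elements, which costs more and produces uneven pieces.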
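To compare how two list types split, the sketch below performs a single trySplit() on each and reports the sizes of the two resulting pieces. The class SplitBalanceDemo and the helper splitOnce are illustrative names. The exact LinkedList numbers are an implementation detail of the JDK in use, so no particular output is claimed for it.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

class SplitBalanceDemo {

    // Splits once and returns { size of the returned prefix, size left in the original }.
    static long[] splitOnce(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        long[] array = splitOnce(new ArrayList<>(data));
        long[] linked = splitOnce(new LinkedList<>(data));
        System.out.println("ArrayList : " + array[0] + " / " + array[1]);
        System.out.println("LinkedList: " + linked[0] + " / " + linked[1]);
    }
}
```

On an ArrayList of eight elements the split is an even four and four. A balanced split like this is what lets a parallel stream hand comparable amounts of work to each thread.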

2.3. Estimation: estimateSize()

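Returns an estimate of the number of remaining elements, or Long.MAX_VALUE if the count is unknown, infinite, or too expensive to compute. For a SIZED source the value is exact. The stream framework uses it to decide how far to split the work: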

List<String> names = List.of("Alice", "Bob", "Charlie");
Spliterator<String> sp = names.spliterator();

System.out.println("Estimated size before traversal: " + sp.estimateSize()); // 3

sp.tryAdvance(System.out::println);

System.out.println("Estimated size after one element: " + sp.estimateSize()); // 2
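Not every source knows its size. As a small sketch (the class SizeEstimateDemo and the method unsized are illustrative names), a Spliterator built from a bare Iterator through Spliterators.spliteratorUnknownSize reports Long.MAX_VALUE as its estimate, and getExactSizeIfKnown() returns -1:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;

class SizeEstimateDemo {

    // A Spliterator built from a bare Iterator carries no size information.
    static Spliterator<String> unsized() {
        Iterator<String> it = List.of("Alice", "Bob", "Charlie").iterator();
        return Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        Spliterator<String> sized = List.of("Alice", "Bob", "Charlie").spliterator();
        Spliterator<String> unknown = unsized();

        System.out.println("Sized, exact size      : " + sized.getExactSizeIfKnown());
        System.out.println("Unknown, is MAX_VALUE  : " + (unknown.estimateSize() == Long.MAX_VALUE));
        System.out.println("Unknown, exact size    : " + unknown.getExactSizeIfKnown());
    }
}
```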

2.4. Characteristics: characteristics()

Returns bit flags describing the data source’s properties:

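  • ORDERED: Elements have a defined encounter order (lists, arrays)
  • SIZED: estimateSize() returns the exact element count (arrays, most collections)
  • SORTED: Elements follow a defined sort order, natural or comparator-based (TreeSet)
  • DISTINCT: No two elements are equal (Set implementations)
  • NONNULL: The source guarantees that no element is null
  • CONCURRENT: The source can be safely modified by multiple threads without external synchronization
  • IMMUTABLE: The source cannot be structurally modified
  • SUBSIZED: All child Spliterators, whether direct or indirect, will be SIZED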

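These characteristics allow Stream operations to optimize themselves. For example, sorted() can skip sorting when the source already reports SORTED in natural order, distinct() has nothing to do on a DISTINCT source, and a SIZED source lets toArray() allocate its result array up front.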
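Because characteristics() returns a plain int bit mask, each flag can be tested with a bitwise AND, which is what hasCharacteristics() does for you. The helper below is a small sketch that lists the flags a Spliterator reports; the class CharacteristicsDemo and the method describe are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

class CharacteristicsDemo {

    private static final int[] FLAGS = {
        Spliterator.ORDERED, Spliterator.DISTINCT, Spliterator.SORTED, Spliterator.SIZED,
        Spliterator.NONNULL, Spliterator.IMMUTABLE, Spliterator.CONCURRENT, Spliterator.SUBSIZED
    };
    private static final String[] NAMES = {
        "ORDERED", "DISTINCT", "SORTED", "SIZED",
        "NONNULL", "IMMUTABLE", "CONCURRENT", "SUBSIZED"
    };

    // Returns the names of all characteristic flags set on the given Spliterator.
    static List<String> describe(Spliterator<?> sp) {
        int bits = sp.characteristics();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < FLAGS.length; i++) {
            if ((bits & FLAGS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(describe(new ArrayList<>(List.of(3, 1, 2)).spliterator()));
        System.out.println(describe(new TreeSet<>(List.of(3, 1, 2)).spliterator()));
    }
}
```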

Default Spliterator characteristics in the JDK

Every Java collection exposes a Spliterator, which describes how elements can be traversed and split:

ArrayList: ordered and size-aware

List<String> list = new ArrayList<>(List.of("Alice", "Bob", "Charlie"));
Spliterator<String> sp = list.spliterator();

System.out.println("ORDERED   : " + sp.hasCharacteristics(Spliterator.ORDERED));   // true
System.out.println("SORTED    : " + sp.hasCharacteristics(Spliterator.SORTED));    // false
System.out.println("DISTINCT  : " + sp.hasCharacteristics(Spliterator.DISTINCT));  // false
System.out.println("SIZED     : " + sp.hasCharacteristics(Spliterator.SIZED));     // true
System.out.println("SUBSIZED  : " + sp.hasCharacteristics(Spliterator.SUBSIZED));  // true

HashMap (keys): distinct but unordered

Map<String, Integer> map = new HashMap<>();
map.put("Alice", 30);
map.put("Bob", 25);
map.put("Charlie", 35);

Spliterator<String> sp = map.keySet().spliterator();

System.out.println("ORDERED   : " + sp.hasCharacteristics(Spliterator.ORDERED));   // false
System.out.println("SORTED    : " + sp.hasCharacteristics(Spliterator.SORTED));    // false
System.out.println("DISTINCT  : " + sp.hasCharacteristics(Spliterator.DISTINCT));  // true
System.out.println("SIZED     : " + sp.hasCharacteristics(Spliterator.SIZED));     // true

TreeSet: ordered because it is sorted

Set<Integer> treeSet = new TreeSet<>(Set.of(3, 1, 2));
Spliterator<Integer> sp = treeSet.spliterator();

System.out.println("ORDERED   : " + sp.hasCharacteristics(Spliterator.ORDERED));   // true
System.out.println("SORTED    : " + sp.hasCharacteristics(Spliterator.SORTED));    // true
System.out.println("DISTINCT  : " + sp.hasCharacteristics(Spliterator.DISTINCT));  // true
System.out.println("SIZED     : " + sp.hasCharacteristics(Spliterator.SIZED));     // true

HashSet: distinct but neither ordered nor sorted

Set<Integer> hashSet = new HashSet<>(Set.of(3, 1, 2));
Spliterator<Integer> sp = hashSet.spliterator();

System.out.println("ORDERED   : " + sp.hasCharacteristics(Spliterator.ORDERED));   // false
System.out.println("SORTED    : " + sp.hasCharacteristics(Spliterator.SORTED));    // false
System.out.println("DISTINCT  : " + sp.hasCharacteristics(Spliterator.DISTINCT));  // true
System.out.println("SIZED     : " + sp.hasCharacteristics(Spliterator.SIZED));     // true

3. How Streams Use Spliterators

When you create a stream from a collection:

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
Stream<Integer> stream = numbers.stream();

Here’s what happens internally:

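  • The collection’s spliterator() method provides a Spliterator
  • Intermediate operations (map, filter, etc.) are recorded as pipeline stages on top of it; no element is traversed yet
  • For parallel streams, trySplit() divides the work among threads
  • The terminal operation drives the traversal, typically through forEachRemaining() for full traversals and tryAdvance() for short-circuiting pipelines such as findFirst() or anyMatch()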
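The first step can be reproduced by hand: StreamSupport.stream() turns any Spliterator into a stream, and the default Collection.stream() is defined in terms of it. The sketch below builds the stream explicitly from the list's Spliterator; the class StreamFromSpliteratorDemo and the method doubled are illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class StreamFromSpliteratorDemo {

    // Builds the stream directly from the collection's Spliterator.
    // The boolean selects sequential (false) or parallel (true) execution.
    static List<Integer> doubled(List<Integer> numbers, boolean parallel) {
        return StreamSupport.stream(numbers.spliterator(), parallel)
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(doubled(numbers, false));
        System.out.println(doubled(numbers, true));
    }
}
```

Both calls produce the same list, because the source is ORDERED and collect() preserves encounter order even when the work is split across threads.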

The quality of splitting directly impacts performance.

“Good splitting leads to good parallelism.”

4. Writing a Custom Spliterator (When Needed)

In day-to-day Java development, you will rarely manipulate a Spliterator directly.
Standard collections, arrays, and I/O utilities already expose well-designed spliterators, and the Stream API builds on them transparently.

That said, a custom Spliterator becomes useful when you want to stream data that is not stored in a collection, but still has:

  • a clear traversal order,
  • a known size,
  • and no reason to be fully loaded into memory.

A common real-world example is streaming a numeric range or identifiers produced on the fly, such as database IDs or batch numbers.

Minimal real-world example: streaming an ID range

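import java.util.Spliterator;
import java.util.function.Consumer;

// Streams the IDs from start to end, both inclusive, without storing them.
class IdRangeSpliterator implements Spliterator<Long> {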

    private long current;
    private final long end;

    IdRangeSpliterator(long start, long end) {
        this.current = start;
        this.end = end;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Long> action) {
        if (current > end) {
            return false;
        }
        action.accept(current++);
        return true;
    }

    @Override
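    public Spliterator<Long> trySplit() {
        long remaining = end - current; // one less than the number of IDs left
        if (remaining < 10) {
            return null; // too small to split
        }
        long mid = current + remaining / 2;
        // Hand off the prefix [current, mid]; an ORDERED source must split off a prefix
        Spliterator<Long> split = new IdRangeSpliterator(current, mid);
        current = mid + 1; // this instance keeps the suffix [mid + 1, end]
        return split;
    }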

    @Override
    public long estimateSize() {
        return end - current + 1;
    }

    @Override
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED | NONNULL | IMMUTABLE;
    }
}

Usage:

Spliterator<Long> spliterator = new IdRangeSpliterator(1, 100);

StreamSupport.stream(spliterator, false)
        .forEach(System.out::println);
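Before running a custom Spliterator in parallel, it is worth checking that splitting neither loses nor duplicates elements. The helper below is a sketch of such a check; the class SplitCheck and its methods are illustrative names. It splits recursively until trySplit() returns null, drains every piece in encounter order, and returns everything it saw. It is demonstrated on an ArrayList so that the block stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

class SplitCheck {

    // Splits recursively until trySplit() returns null, then drains each leaf.
    static <T> void drain(Spliterator<T> sp, List<T> out) {
        Spliterator<T> prefix = sp.trySplit();
        if (prefix != null) {
            drain(prefix, out); // prefix first, to respect encounter order
            drain(sp, out);     // then whatever the original kept
        } else {
            sp.forEachRemaining(out::add);
        }
    }

    // Collects every element reachable through any combination of splits.
    static <T> List<T> collectViaSplits(Spliterator<T> sp) {
        List<T> out = new ArrayList<>();
        drain(sp, out);
        return out;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 100; id++) {
            ids.add(id);
        }
        List<Long> seen = collectViaSplits(ids.spliterator());
        System.out.println("Same elements, same order: " + seen.equals(ids));
    }
}
```

Passing new IdRangeSpliterator(1, 100) to collectViaSplits and comparing the result with the values 1 to 100 applies the same check to the custom implementation. Once that holds, switching the second argument of StreamSupport.stream to true is a reasonable next step.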

Why this example matters

  • No collection is created
  • The stream remains ordered and size-aware
  • The logic stays simple and predictable

An Iterator could traverse the data, but only a Spliterator can describe how it behaves, which is what allows the Stream API to work efficiently—especially for parallel execution.

Write a custom Spliterator only when you need to describe a data source, not just iterate over it.

Most applications will never need one, but understanding this mechanism clarifies how streams handle ordering, sizing, and parallelism under the hood.

Note: The above example could be replaced by LongStream.rangeClosed.
It is intentionally simplified to illustrate how a Spliterator works.
In real systems, the same structure is used when the data source is lazy, paginated, or external.
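For a lazy or paginated source, extending Spliterators.AbstractSpliterator is often simpler than implementing the whole interface, since only tryAdvance() has to be written. The following is a sketch under stated assumptions, not a production implementation: PagedSpliterator, the pageLoader function, and the in-memory fakePage loader are all hypothetical stand-ins for a real paginated API, and an empty page is assumed to mean the end of the data.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.IntFunction;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class PagedSpliterator<T> extends Spliterators.AbstractSpliterator<T> {

    private final IntFunction<List<T>> pageLoader; // hypothetical: page index -> items
    private Iterator<T> currentPage = Collections.emptyIterator();
    private int nextPageIndex = 0;
    private boolean exhausted = false;

    PagedSpliterator(IntFunction<List<T>> pageLoader) {
        // Total size is unknown up front, so report Long.MAX_VALUE and omit SIZED.
        super(Long.MAX_VALUE, Spliterator.ORDERED);
        this.pageLoader = pageLoader;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        // Load pages lazily, only when the current one has been consumed.
        while (!currentPage.hasNext()) {
            if (exhausted) {
                return false;
            }
            List<T> page = pageLoader.apply(nextPageIndex++);
            if (page.isEmpty()) { // assumption: an empty page marks the end
                exhausted = true;
                return false;
            }
            currentPage = page.iterator();
        }
        action.accept(currentPage.next());
        return true;
    }
}

class PagedSpliteratorDemo {

    // Hypothetical loader: three pages of two IDs each, then an empty page.
    static List<Long> fakePage(int index) {
        if (index >= 3) {
            return List.of();
        }
        long first = index * 2L + 1;
        return List.of(first, first + 1);
    }

    static List<Long> loadAll() {
        Spliterator<Long> sp = new PagedSpliterator<Long>(PagedSpliteratorDemo::fakePage);
        return StreamSupport.stream(sp, false).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(loadAll());
    }
}
```

The stream pulls one page at a time, so a short-circuiting operation such as findFirst() would stop after the first page instead of loading everything.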

5. Spliterator vs Iterator: Key Differences

While both interfaces traverse elements, their responsibilities differ.

Feature | Iterator | Spliterator
Traversal | Sequential only: hasNext() / next() | Sequential or parallel: tryAdvance(Consumer)
Parallel streams | No | Yes, via trySplit()
Bulk operations | forEachRemaining(Consumer), a default method since Java 8 | forEachRemaining(Consumer)
Metadata | None | Size estimation, characteristics

“Iterator walks; Spliterator divides and conquers.”

Conclusion

The Spliterator is the silent engine of the Java Stream API. It defines how data is traversed, how work is divided, and how parallel execution scales. While most developers never implement one, understanding spliterators explains why streams behave the way they do.

You can find the complete code of this article here on GitHub.


Noel Kamphoa

Experienced software engineer with expertise in Telecom, Payroll, and Banking. Now Senior Software Engineer at Societe Generale Paris.