Bloom filter folding in Parquet: faster point lookups for observability

If you've used Logfire to search for a sparse equality filter — say span_name = 'tool called' — you're doing a point lookup: a query that matches a handful of rows out of potentially billions. For these queries, filtering is the bottleneck. The faster we can skip irrelevant data, the faster your query returns.

Bloom filters are one of the key mechanisms that make this possible in Parquet. We recently contributed an optimization to Apache Arrow — bloom filter folding — that makes bloom filters both smaller and more effective, with no tuning required.

This post is a deep dive. We'll build up from what a bloom filter actually is, to the specific variant Parquet uses, to the sizing problem that makes them hard to deploy well, and finally to the folding trick that solves it. If you just want the takeaway: your point lookups get faster and your files get smaller, automatically.

What a bloom filter actually is

A bloom filter is a compact data structure that answers one question: "is this value in this set?" — and it answers in one of two ways: "definitely no" or "probably yes."

That asymmetry is the whole point. There are no false negatives. If the filter says no, the value is guaranteed not to be in the set, and you can skip reading the underlying data entirely. There can be false positives — the filter occasionally says "probably yes" for a value that isn't there — but those only cost you a wasted read, never a wrong answer.

Mechanically, a classic bloom filter is just a bit array plus a few hash functions. To insert a value, you hash it with each hash function, and each hash picks one bit to flip to 1. To check a value, you hash it the same way and look at those same bits: if all of them are 1, the answer is "probably yes"; if any of them is 0, the answer is a definite "no" — because if the value had ever been inserted, those bits would have been set.

A 12-bit bloom filter. Inserting 'tool called' sets bits 2, 5, and 9. Probing 'tool called' finds all three bits set, so the answer is 'probably present'. Probing 'http.route' finds bit 4 is 0, so the answer is a definite 'absent — skip the row group'.

In Parquet, each row group gets its own bloom filter per column. When you query span_name = 'tool called', the engine checks the filter for each row group before reading any column data. Every row group whose filter says "definitely no" is skipped — no decompression, no I/O, nothing. For selective queries on high-cardinality columns — trace IDs, span names, user IDs — this can mean skipping 99% of the data.

These aren't theoretical gains. DuckDB measured 50x speedups on point lookups using bloom filters, and InfluxData reported up to 30x.

The Parquet flavor: Split Block Bloom Filters

A textbook bloom filter has a problem on modern hardware: its k bits are scattered across the whole bit array, so a single lookup touches k random memory locations. Each of those is a potential cache miss. When you're checking millions of values, that's death by a thousand cache misses.

Parquet uses a cache-friendly variant called the Split Block Bloom Filter (SBBF), introduced by Putze, Sanders, and Singler. The idea: instead of scattering bits across the whole filter, confine all of a value's bits to a single block of 256 bits — exactly the size of a cache line pair, and a perfect fit for SIMD instructions. One value touches one block, so one lookup is essentially one memory access.

Here's how a value gets placed. The filter is an array of z blocks. You hash the value to a 64-bit number and split it in half:

The high 32 bits pick which block the value belongs to. Rather than an expensive modulo, the spec uses a multiply-shift: i = ((h >> 32) * z) >> 32. This maps the hash uniformly onto [0, z) — and, as we'll see, this exact formula is what makes folding possible.
The low 32 bits decide which 8 bits to set inside that block. Each block is eight 32-bit words, and the value sets exactly one bit in each word, using eight odd "salt" constants to spread the bits out.

A 64-bit hash split into two halves. The high 32 bits select one block out of z blocks via the multiply-shift formula i = ((h>>32) * z) >> 32. The low 32 bits set 8 bits inside the chosen block — one bit per 32-bit word, derived from the SALT constants.

So every value lights up exactly 8 bits in exactly 1 block. Insert sets those 8 bits; lookup checks whether all 8 are already set. It's the same "all bits set → probably yes, any bit clear → definitely no" logic as before, just packed into a single cache-resident block.

The sizing dilemma

Here's the catch — and it's the same catch for every bloom filter, SBBF or not. You have to choose how big the filter should be, and the right size depends on how many distinct values it will hold.

The number of distinct values (NDV) drives the fill rate: what fraction of bits end up set. Insert too many values into too small a filter and nearly every bit becomes 1. At that point the filter answers "probably yes" to everything — it's saturated and useless. The false positive probability of an SBBF block scales roughly with the fill rate to the 8th power (fill⁸, one factor per word), so the cliff is steep: a filter that's comfortable at one cardinality falls off badly as it fills.

Go the other way — make the filter far bigger than you need — and it works perfectly, but you're now paying for a mostly-empty bit array in every single file, multiplied across billions of rows.

Three bloom filters shown as bit grids. 'Too small' is nearly all 1s and saturated, with false positive probability near 1.0. 'Too big' is mostly empty — accurate but wasting space in every file. 'Just right' is balanced for the actual data, with a low false positive rate and no wasted bytes.

The trouble is you rarely know the NDV up front. One row group might hold 10 distinct span names; the next might hold 100,000 distinct trace IDs. Writers handle this in a few ways, none of them free:

Use a fixed guess. This is what the Rust Parquet implementation, arrow-rs, did before our contribution — a fixed size that was okay on average but frequently wrong, producing either bloated or saturated filters.
Sample to estimate NDV. Some systems scan the data first to estimate cardinality before writing. DuckDB does this. It adds a pass and some complexity.
Write several sizes and pick one. Build multiple filters and keep the best after the fact — more memory and more work at write time.

For Logfire this was a real problem. High-cardinality columns like trace IDs would saturate the filter and quietly stop helping; low-cardinality columns would get filters far larger than necessary. And because it all depends on each customer's data distribution, there was no single configuration we could ship that worked for everyone.

Folding: size the filter after you've seen the data

The optimization we contributed flips the order of operations. Instead of guessing the size before writing, you start with a conservatively large filter, write all your values into it, and then shrink it to fit once you know exactly how full it is. The shrinking step is called folding.

Folding works by merging adjacent pairs of blocks with a bitwise OR, which halves the number of blocks:

block[i] = block[2i] | block[2i + 1]

Each fold halves the size. You can fold repeatedly — 8 blocks → 4 → 2 → 1 — OR-ing pairs together at each step. Because OR never clears a bit, every bit that was set in either source block stays set in the result.

Why this is mathematically sound

The reason folding introduces no false negatives comes straight from that block-selection formula, i = ((h >> 32) * z) >> 32. Halving z is the same as halving the result: a value that mapped to block i in a filter of z blocks maps to block ⌊i/2⌋ in a filter of z/2 blocks. In other words, blocks 2i and 2i+1 are exactly the two blocks that collapse into block i after a fold.

So when we OR block[2i] and block[2i+1] together, we're guaranteed to be combining the only two places a value could have landed. Whatever 8 bits a value set before the fold are still set in the merged block afterward — a lookup will always still find them. (This same property is what lets two SBBFs be merged, and the folding result is provably equivalent to having written into the smaller filter from the start.)

How far to fold

After writing, the implementation measures the actual average fill rate across blocks and computes how many folds it can do while staying under the target false positive probability. Each fold roughly doubles the effective number of values per block, so the FPP climbs predictably with each fold; the algorithm folds as many times as it can without crossing the target. The result is a filter sized for the data that was actually written, not for a guess made before writing — automatically, with no NDV estimate and no tuning.

1. Allocate a generously large filter (sized from the row group's max row count).
2. Write every value — the filter fills up, often far below capacity.
3. Measure the real fill rate across all blocks.
4. Fold (OR adjacent pairs) as many times as possible while FPP ≤ target.
5. Serialize the right-sized filter into the file.

In practice this means:

Low-cardinality columns shrink dramatically. A column with 50 distinct values doesn't need a filter sized for a million — it folds down to something tiny.
High-cardinality columns stay effective. Instead of saturating and going useless, the filter folds to whatever size maintains the target false positive rate.
No configuration required. This is the new default. You don't estimate NDV, you don't tune parameters.

What it costs

Folding isn't free, but it's close. The only real price is memory at write time: you hold the conservatively-large filter in RAM until you fold it down — on the order of ~1 MiB per bloom filter during writing. That's negligible next to the column data and other write buffers.

The compute cost is a single popcount-style scan over the oversized filter to measure fill, plus a logarithmic number of OR passes to fold. In the PR's benchmarks, folding a filter built for 1M NDV that actually received 100K values added roughly 27 µs on top of ~190 µs of insert work — about a 14% overhead on that step, and a rounding error against the cost of actually encoding and compressing the column. Once written, the read-side filter is smaller and more accurate, so every query that uses it does less work, forever.

What this means for Logfire

Logfire stores observability data — traces, spans, logs — in Parquet. Many of the most common queries are point lookups: find a specific trace by ID, filter spans by name, search for a particular attribute value. With folded bloom filters:

Faster point lookups. Queries like span_name = 'tool called' or trace_id = '...' skip more row groups and read less data.
Smaller files. Right-sized filters waste fewer bytes — and that adds up across the billions of rows we store.
Better defaults. We no longer guess or tune bloom filter sizes per column or per customer; the right size falls out of the data.
Minimal write overhead. ~1 MiB of memory per filter during writes, and nothing on the read path but a smaller, sharper filter.

This builds on the dynamic shredding work we shipped earlier this year. Shredding made attribute queries fast by giving them dedicated columns; folded bloom filters make point lookups on those columns faster still by skipping whole row groups before we ever touch them.

Community contribution

We contributed this directly to Apache Arrow rather than maintaining a fork. Every project that uses the Rust Parquet implementation — not just Logfire — gets smarter bloom filters by default. The Parquet Java maintainers have already expressed interest in porting the approach, so there's a real chance folding becomes a standard part of the Parquet ecosystem.

Want to see this in action? Get started with Pydantic Logfire and run a point lookup on your trace data.

Smarter bloom filters in Parquet: how filter folding speeds up point lookups in Logfire

What a bloom filter actually is

The Parquet flavor: Split Block Bloom Filters

The sizing dilemma

Folding: size the filter after you've seen the data

Why this is mathematically sound

How far to fold

What it costs

What this means for Logfire

Community contribution

Related content

Observability tools agents want

Some customers can't use your cloud. Now what?

Explore Logfire

Smarter bloom filters in Parquet: how filter folding speeds up point lookups in Logfire

#What a bloom filter actually is

#The Parquet flavor: Split Block Bloom Filters

#The sizing dilemma

#Folding: size the filter after you've seen the data

#Why this is mathematically sound

#How far to fold

#What it costs

#What this means for Logfire

#Community contribution

Related content

Observability tools agents want

Some customers can't use your cloud. Now what?

Explore Logfire

What a bloom filter actually is

The Parquet flavor: Split Block Bloom Filters

The sizing dilemma

Folding: size the filter after you've seen the data

Why this is mathematically sound

How far to fold

What it costs

What this means for Logfire

Community contribution