Skip to content

[SPARK-56032][SQL] Skip FilterExec subexpression elimination codegen when there is no common subexpression#56209

Open
cloud-fan wants to merge 2 commits into
apache:masterfrom
cloud-fan:wenchen/filter-cse-skip-when-no-common-subexpr
Open

[SPARK-56032][SQL] Skip FilterExec subexpression elimination codegen when there is no common subexpression#56209
cloud-fan wants to merge 2 commits into
apache:masterfrom
cloud-fan:wenchen/filter-cse-skip-when-no-common-subexpr

Conversation

@cloud-fan
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

FilterExec whole-stage codegen takes the subexpression-elimination (CSE) path whenever subexpressionEliminationEnabled && otherPreds.nonEmpty, regardless of whether any common subexpression actually exists. That path emits an inputVarsEvalCode prologue at the top of the per-row loop that eagerly evaluates every input column referenced by otherPreds (required so eliminated subexpressions can be materialized into shared variables). When there is nothing to eliminate, this prologue provides no benefit but still defeats the short-circuiting the non-CSE path gets from loading columns lazily, just before the predicate that needs them.

This PR gates the CSE path on whether otherPreds actually contain a common subexpression (a new hasCommonSubexpressions, using the same EquivalentExpressions detection and output binding as the CSE codegen, so it agrees exactly with whether that path would find anything). When there is none, it falls back to the non-CSE generatePredicateCode, which loads columns lazily and preserves short-circuiting. Filters that do have a common subexpression are unaffected.

Why are the changes needed?

For a filter with no common subexpression but multiple conjuncts over different columns (e.g. q_int BETWEEN ... AND (decimal_a BETWEEN ... OR decimal_b BETWEEN ...)), the eager prologue decodes the decimal columns for every row, including rows a cheaper earlier predicate would have rejected. Decoding a high-precision decimal allocates a BigInteger/BigDecimal per call, so this is pure waste and shows up as a measurable performance regression versus the lazy non-CSE path (observed on TPC-DS q28).

Does this PR introduce any user-facing change?

No. This is a codegen-only change; query results are unchanged.

How was this patch tested?

New unit test in WholeStageCodegenSuite asserting that, for a filter with no common subexpression, the CSE-enabled generated code is identical to the CSE-disabled code (i.e. it falls back to the lazy, short-circuiting path). The existing FilterExec CSE tests, which use genuine common subexpressions, still exercise the CSE path and pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Claude Code)

cloud-fan added 2 commits May 29, 2026 16:40
…when there is no common subexpression

### What changes were proposed in this pull request?

`FilterExec` whole-stage codegen takes the subexpression-elimination (CSE) path
whenever `subexpressionEliminationEnabled && otherPreds.nonEmpty`, regardless of
whether any common subexpression actually exists. That path emits an
`inputVarsEvalCode` prologue at the top of the per-row loop that eagerly evaluates
every input column referenced by `otherPreds` (required so eliminated
subexpressions can be materialized into shared variables). When there is nothing
to eliminate, this prologue provides no benefit but still defeats the
short-circuiting the non-CSE path gets from loading columns lazily, just before
the predicate that needs them.

This PR gates the CSE path on whether `otherPreds` actually contain a common
subexpression (`hasCommonSubexpressions`, using the same `EquivalentExpressions`
detection and `output` binding as the CSE codegen). When there is none, it falls
back to the non-CSE `generatePredicateCode`, which loads columns lazily and
preserves short-circuiting.

### Why are the changes needed?

For a filter with no common subexpression but multiple conjuncts over different
columns (e.g. `q_int BETWEEN ... AND (decimal_a BETWEEN ... OR decimal_b BETWEEN
...)`), the eager prologue decodes the decimal columns for every row, including
rows a cheaper earlier predicate would have rejected. Decoding a high-precision
decimal allocates a BigInteger/BigDecimal per call, so this is pure waste and
shows up as a measurable performance regression versus the lazy non-CSE path.

### Does this PR introduce _any_ user-facing change?

No. This is a codegen-only change; query results are unchanged.

### How was this patch tested?

New unit test in `WholeStageCodegenSuite` asserting that, for a filter with no
common subexpression, the CSE-enabled generated code is identical to the
CSE-disabled code (i.e. it falls back to the lazy, short-circuiting path). The
existing `FilterExec` CSE tests, which use genuine common subexpressions, still
exercise the CSE path and pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Claude Code)

Co-authored-by: Isaac
Reuse a single EquivalentExpressions analysis between the has-common-subexpression
gate and the CSE codegen, instead of building it twice. Add a
subexpressionEliminationForWholeStageCodegen overload that takes a pre-built
EquivalentExpressions, and have FilterExec hold the bound predicates and their CSE
analysis as @transient lazy vals (driver-only codegen state; EquivalentExpressions
is not serializable).

Co-authored-by: Isaac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant