[SPARK-57103][SQL] Wire ordering for nanosecond timestamp types#56207
[SPARK-57103][SQL] Wire ordering for nanosecond timestamp types#56207stevomitric wants to merge 2 commits into
Conversation
### What changes were proposed in this pull request? Implement `Ordering` for `TimestampNTZNanosType(p)` and `TimestampLTZNanosType(p)`, both in the interpreted path (`PhysicalDataType.ordering`) and the codegen path (`CodeGenerator.genComp`). This is the second of three PRs for SPARK-57103. The first (`[SPARK-57103][SQL] Add Comparable to TimestampNanosVal`, commit 0b0ffb7) made the value class `Comparable`. The remaining PR will extend `hash`, `xxhash64`, and `murmur3` for the two nanos types. Changes: - `PhysicalDataType.scala`: replace the two `orderedOperationUnsupportedByDataTypeError` throws with `implicitly[Ordering[InternalType]]`, following the `PhysicalGeographyType` / `PhysicalGeometryType` precedent. Resolves via `scala.math.Ordering.ordered[T <: Comparable[T]]` now that `TimestampNanosVal` is `Comparable`. - `CodeGenerator.scala`: add an explicit `genComp` arm that calls `compareTo`. Required because the existing AtomicType fallback would emit `c1.compare(c2)`, which fails to compile on `TimestampNanosVal` (it has `compareTo`, not `compare`). - Updated the scaladoc on both physical types to note that only hash remains as future work. ### Why are the changes needed? Without ordering, SQL operators that need a total order on the type (`ORDER BY`, sort-merge join, sort-based `GROUP BY`, `DISTINCT`) cannot execute against nanos-precision columns. The two physical types previously threw at runtime. ### Does this PR introduce _any_ user-facing change? No. The nanos types remain gated behind `spark.sql.timestampNanosTypes.enabled`; this PR only fills in the ordering hole their `PhysicalDataType` had. ### How was this patch tested? Added 10 unit tests in `OrderingSuite` (5 cases × 2 types): equal values, `epochMicros` primary key, `nanosWithinMicro` tie-breaker, `Long.MinValue` / `Long.MaxValue` boundary, and pre-epoch (negative `epochMicros`). Each case verifies both `InterpretedOrdering` and `LazilyGeneratedOrdering` agree on ASC and DESC, and that `compare(a, a) == 0`. ``` build/mvn test -pl sql/catalyst \ -DwildcardSuites=org.apache.spark.sql.catalyst.expressions.OrderingSuite ``` Tests: 66/66 passing (10 new). Also ran `DataTypeSuite` and the catalyst-side `TimestampNanos*Suite` set: 344/344 passing. Not adding nanos types to `DataTypeTestUtils.atomicTypes` yet because the generic `GenerateOrdering with $dataType` test there uses `RandomDataGenerator`, which does not yet support nanos types (tracked in SPARK-57034). ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Claude Opus 4.7)
|
@MaxGekk please take a look at this PR. |
| case dt: DataType if isPrimitiveType(dt) => s"($c1 > $c2 ? 1 : $c1 < $c2 ? -1 : 0)" | ||
| case BinaryType => s"org.apache.spark.unsafe.types.ByteArray.compareBinary($c1, $c2)" | ||
| case CalendarIntervalType => s"$c1.compareTo($c2)" | ||
| case _: TimestampNTZNanosType | _: TimestampLTZNanosType => s"$c1.compareTo($c2)" |
There was a problem hiding this comment.
Consider restoring a short comment here explaining why this arm is needed. It's not self-evident: TimestampNTZNanosType/TimestampLTZNanosType are AtomicTypes, so without this case they'd fall through to the case other if other.isInstanceOf[AtomicType] => s"$c1.compare($c2)" fallback a few lines below — which emits .compare(...), a method TimestampNanosVal does not have (it's Comparable, providing compareTo, not Ordered). A future maintainer could reasonably delete this arm as "redundant" and silently break codegen compilation for these types. Something like:
// AtomicType, but the generic AtomicType fallback below emits `.compare`, which
// TimestampNanosVal (Comparable, not Ordered) does not have; emit `.compareTo`.
case _: TimestampNTZNanosType | _: TimestampLTZNanosType => s"$c1.compareTo($c2)"| } | ||
| } | ||
|
|
||
| Seq(TimestampNTZNanosType(9), TimestampLTZNanosType(9)).foreach { dt => |
There was a problem hiding this comment.
The chosen edge cases are good. Two cheap additions would round out coverage:
- A null case. The
BoundReferenceisnullable = true, but no null values are ever passed, so null-vs-value ordering for nanos columns isn't exercised. It's generic (handled inBaseOrdering), so low risk, but nearly free to cover. - A non-
9precision case (e.g.7or8). Ordering is precision-independent here, so one case would document that intent.
What changes were proposed in this pull request?
Implement
OrderingforTimestampNTZNanosType(p)andTimestampLTZNanosType(p), both in the interpreted path and the codegen path.Why are the changes needed?
Without ordering, SQL operators that need a total order on the type (
ORDER BY, sort-merge join, sort-basedGROUP BY,DISTINCT) cannot execute against nanos-precision columns.Does this PR introduce any user-facing change?
No.
How was this patch tested?
New UT in this PR.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7