Skip to content

[SPARK-57103][SQL] Wire ordering for nanosecond timestamp types#56207

Open
stevomitric wants to merge 2 commits into
apache:masterfrom
stevomitric:stevomitric/SPARK-57103-ordering
Open

[SPARK-57103][SQL] Wire ordering for nanosecond timestamp types#56207
stevomitric wants to merge 2 commits into
apache:masterfrom
stevomitric:stevomitric/SPARK-57103-ordering

Conversation

@stevomitric
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Implement Ordering for TimestampNTZNanosType(p) and TimestampLTZNanosType(p), both in the interpreted path and the codegen path.

Why are the changes needed?

Without ordering, SQL operators that need a total order on the type (ORDER BY, sort-merge join, sort-based GROUP BY, DISTINCT) cannot execute against nanos-precision columns.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UT in this PR.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

### What changes were proposed in this pull request?

Implement `Ordering` for `TimestampNTZNanosType(p)` and
`TimestampLTZNanosType(p)`, both in the interpreted path
(`PhysicalDataType.ordering`) and the codegen path
(`CodeGenerator.genComp`).

This is the second of three PRs for SPARK-57103. The first
(`[SPARK-57103][SQL] Add Comparable to TimestampNanosVal`,
commit 0b0ffb7) made the value class `Comparable`. The
remaining PR will extend `hash`, `xxhash64`, and `murmur3`
for the two nanos types.

Changes:
- `PhysicalDataType.scala`: replace the two
  `orderedOperationUnsupportedByDataTypeError` throws with
  `implicitly[Ordering[InternalType]]`, following the
  `PhysicalGeographyType` / `PhysicalGeometryType` precedent.
  Resolves via `scala.math.Ordering.ordered[T <: Comparable[T]]`
  now that `TimestampNanosVal` is `Comparable`.
- `CodeGenerator.scala`: add an explicit `genComp` arm that
  calls `compareTo`. Required because the existing AtomicType
  fallback would emit `c1.compare(c2)`, which fails to compile
  on `TimestampNanosVal` (it has `compareTo`, not `compare`).
- Updated the scaladoc on both physical types to note that
  only hash remains as future work.

### Why are the changes needed?

Without ordering, SQL operators that need a total order on
the type (`ORDER BY`, sort-merge join, sort-based `GROUP BY`,
`DISTINCT`) cannot execute against nanos-precision columns.
The two physical types previously threw at runtime.

### Does this PR introduce _any_ user-facing change?

No. The nanos types remain gated behind
`spark.sql.timestampNanosTypes.enabled`; this PR only fills
in the ordering hole their `PhysicalDataType` had.

### How was this patch tested?

Added 10 unit tests in `OrderingSuite` (5 cases × 2 types):
equal values, `epochMicros` primary key, `nanosWithinMicro`
tie-breaker, `Long.MinValue` / `Long.MaxValue` boundary, and
pre-epoch (negative `epochMicros`). Each case verifies both
`InterpretedOrdering` and `LazilyGeneratedOrdering` agree on
ASC and DESC, and that `compare(a, a) == 0`.

```
build/mvn test -pl sql/catalyst \
  -DwildcardSuites=org.apache.spark.sql.catalyst.expressions.OrderingSuite
```
Tests: 66/66 passing (10 new).

Also ran `DataTypeSuite` and the catalyst-side
`TimestampNanos*Suite` set: 344/344 passing.

Not adding nanos types to `DataTypeTestUtils.atomicTypes`
yet because the generic `GenerateOrdering with $dataType`
test there uses `RandomDataGenerator`, which does not yet
support nanos types (tracked in SPARK-57034).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)
@stevomitric
Copy link
Copy Markdown
Contributor Author

@MaxGekk please take a look at this PR.

case dt: DataType if isPrimitiveType(dt) => s"($c1 > $c2 ? 1 : $c1 < $c2 ? -1 : 0)"
case BinaryType => s"org.apache.spark.unsafe.types.ByteArray.compareBinary($c1, $c2)"
case CalendarIntervalType => s"$c1.compareTo($c2)"
case _: TimestampNTZNanosType | _: TimestampLTZNanosType => s"$c1.compareTo($c2)"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider restoring a short comment here explaining why this arm is needed. It's not self-evident: TimestampNTZNanosType/TimestampLTZNanosType are AtomicTypes, so without this case they'd fall through to the case other if other.isInstanceOf[AtomicType] => s"$c1.compare($c2)" fallback a few lines below — which emits .compare(...), a method TimestampNanosVal does not have (it's Comparable, providing compareTo, not Ordered). A future maintainer could reasonably delete this arm as "redundant" and silently break codegen compilation for these types. Something like:

// AtomicType, but the generic AtomicType fallback below emits `.compare`, which
// TimestampNanosVal (Comparable, not Ordered) does not have; emit `.compareTo`.
case _: TimestampNTZNanosType | _: TimestampLTZNanosType => s"$c1.compareTo($c2)"

}
}

Seq(TimestampNTZNanosType(9), TimestampLTZNanosType(9)).foreach { dt =>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chosen edge cases are good. Two cheap additions would round out coverage:

  • A null case. The BoundReference is nullable = true, but no null values are ever passed, so null-vs-value ordering for nanos columns isn't exercised. It's generic (handled in BaseOrdering), so low risk, but nearly free to cover.
  • A non-9 precision case (e.g. 7 or 8). Ordering is precision-independent here, so one case would document that intent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants