[WIP][SPARK-57100][SQL] Add columnar (ColumnVector) support for nanosecond timestamp types #56198
Open
MaxGekk wants to merge 4 commits into
… timestamp types

Implement read/write/append support for TimestampNTZNanosType and TimestampLTZNanosType in column vectors, following the CalendarInterval two-child-vector pattern (Long for epochMicros, Short for nanosWithinMicro).

Co-authored-by: Max Gekk <max.gekk@gmail.com>

…-vector support

Four issues found in code review:

1. appendStruct(true) null propagation: extend the StructType|VariantType guard in WritableColumnVector to also recurse for CalendarIntervalType, TimestampNTZNanosType, and TimestampLTZNanosType children, so that a nullable struct field of these types correctly propagates nulls into its own child sub-columns, preventing index divergence.
2. MutableColumnarRow: add copy(), get(), and update() branches for TimestampNTZNanosType and TimestampLTZNanosType, plus setTimestampNTZNanos and setTimestampLTZNanos setters.
3. ColumnVector Javadoc: fix "int vector" -> "short vector" for child 1 of the nanosecond timestamp layout.
4. Test coverage: add testVectors (OnHeap + OffHeap) for both nanos types to ColumnVectorSuite; add populate tests to ColumnVectorUtilsSuite; add nanos columns to the ColumnarBatchSuite RowToColumnConverter end-to-end test.

Co-authored-by: Max Gekk <max.gekk@gmail.com>
Co-authored-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
Implement columnar storage support for TimestampNTZNanosType and TimestampLTZNanosType across the column-vector stack. The layout mirrors CalendarInterval: each column gets two child vectors, a Long child for epochMicros and a Short child for nanosWithinMicro (range [0, 999]).

Concretely:

- ColumnVector: getTimestampNTZNanos / getTimestampLTZNanos now read from child vectors instead of throwing SparkUnsupportedOperationException.
- WritableColumnVector: allocates the two child columns in the constructor; adds putTimestampNTZNanos / putTimestampLTZNanos write methods.
- ConstantColumnVector: same child-column allocation; adds setTimestampNanosVal for the constant-value (partition-column) path.
- RowToColumnConverter (Columnar.scala): adds TimestampNTZNanosConverter / TimestampLTZNanosConverter objects (append epochMicros + nanosWithinMicro to children via appendStruct); routes nullable columns through StructNullableTypeConverter.
- ColumnVectorUtils: handles both types in populate (constant-column path) and in appendValue (null and non-null branches).

Why are the changes needed?
SPARK-56981 added row-level physical representation for nanosecond timestamps, but columnar execution could not hold or move these values: any attempt to build a ColumnarBatch from rows containing nanosecond timestamps threw an unsupported-operation exception. This PR closes that gap.

Does this PR introduce any user-facing change?
Yes. ColumnarBatch can now be built from InternalRows containing TimestampNTZNanosType / TimestampLTZNanosType values. Previously this threw SparkUnsupportedOperationException.

How was this patch tested?
Added four unit tests to RowToColumnConverterSuite:

- TimestampNTZNanosType column roundtrip: non-null values survive the row→column→read cycle.
- TimestampNTZNanosType column with nulls: null slots are preserved correctly.
- TimestampLTZNanosType column roundtrip: same for the LTZ variant.
- TimestampLTZNanosType column with nulls: same for the LTZ variant.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.6 (claude.ai/code)
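
Illustrative sketch (not part of the patch): the two-child layout described under "What changes were proposed in this pull request?" can be pictured with the small, self-contained Java model below. It does not use any Spark classes. NanosTimestampColumnSketch and all of its methods are hypothetical names chosen for illustration, and the decomposition of a total-nanoseconds value with floorDiv/floorMod is an assumption about how a value might be split, not something taken from the PR. The one point it tries to show is that a null row still advances both children, so child indices stay aligned with the parent row index.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of a nanosecond-timestamp column.
 * This is NOT Spark's WritableColumnVector; it only mimics the idea of
 * a Long child (epochMicros) plus a Short child (nanosWithinMicro).
 */
final class NanosTimestampColumnSketch {
    private final List<Long> epochMicros = new ArrayList<>();       // child 0
    private final List<Short> nanosWithinMicro = new ArrayList<>(); // child 1
    private final List<Boolean> nulls = new ArrayList<>();          // parent null flags

    /** Appends a non-null value; nanos must lie in [0, 999]. */
    void append(long micros, short nanos) {
        if (nanos < 0 || nanos > 999) {
            throw new IllegalArgumentException("nanosWithinMicro out of range: " + nanos);
        }
        nulls.add(false);
        epochMicros.add(micros);
        nanosWithinMicro.add(nanos);
    }

    /** Appends a null row, still writing a placeholder into each child. */
    void appendNull() {
        nulls.add(true);
        epochMicros.add(0L);
        nanosWithinMicro.add((short) 0);
    }

    /** Assumption: splits nanoseconds since the epoch using floor semantics. */
    void appendTotalNanos(long totalNanos) {
        long micros = Math.floorDiv(totalNanos, 1000L);
        short nanos = (short) Math.floorMod(totalNanos, 1000L);
        append(micros, nanos);
    }

    boolean isNullAt(int rowId) { return nulls.get(rowId); }
    long getEpochMicros(int rowId) { return epochMicros.get(rowId); }
    short getNanosWithinMicro(int rowId) { return nanosWithinMicro.get(rowId); }
    int size() { return nulls.size(); }

    /** True when both children hold exactly one slot per parent row. */
    boolean childrenAligned() {
        return epochMicros.size() == nulls.size()
            && nanosWithinMicro.size() == nulls.size();
    }

    public static void main(String[] args) {
        NanosTimestampColumnSketch col = new NanosTimestampColumnSketch();
        col.append(1_700_000_000_000_000L, (short) 123);
        col.appendNull();
        col.appendTotalNanos(-1L); // one nanosecond before the epoch
        for (int i = 0; i < col.size(); i++) {
            System.out.println(i + ": null=" + col.isNullAt(i)
                + " micros=" + col.getEpochMicros(i)
                + " nanos=" + col.getNanosWithinMicro(i));
        }
        System.out.println("aligned=" + col.childrenAligned());
    }
}
```

If appendNull skipped the children in this model, every row after the first null would read its micros and nanos from the wrong child slot, which is the kind of index divergence the second commit message refers to.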