
[WIP][SPARK-57100][SQL] Add columnar (ColumnVector) support for nanosecond timestamp types #56198

Open
MaxGekk wants to merge 4 commits into apache:master from MaxGekk:nanos-column-vector

@MaxGekk (Member) commented on May 29, 2026

What changes were proposed in this pull request?

Implement columnar storage support for TimestampNTZNanosType and TimestampLTZNanosType across the column-vector stack. The layout mirrors CalendarInterval: each column has two child vectors, a Long child holding epochMicros and a Short child holding nanosWithinMicro (range [0, 999]).
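As a rough illustration of the two-field encoding, the sketch below splits a signed count of nanoseconds since the epoch into an epochMicros value and a nanosWithinMicro value in [0, 999], then reassembles it. The use of floor division is an assumption of this sketch (it is what keeps nanosWithinMicro non-negative for pre-1970 instants), and the class and method names are hypothetical, not part of Spark.

```java
// Hypothetical helper class illustrating the (epochMicros, nanosWithinMicro) split.
// Not Spark code; floor semantics are an assumption of this sketch.
public final class NanosSplitSketch {
    private static final long NANOS_PER_MICRO = 1000L;

    // Whole microseconds since the epoch, rounded toward negative infinity.
    static long epochMicros(long epochNanos) {
        return Math.floorDiv(epochNanos, NANOS_PER_MICRO);
    }

    // Leftover nanoseconds; floorMod keeps the result in [0, 999] even for negative input.
    static short nanosWithinMicro(long epochNanos) {
        return (short) Math.floorMod(epochNanos, NANOS_PER_MICRO);
    }

    // Reassembles a single nanosecond count from the two stored fields.
    static long toEpochNanos(long epochMicros, short nanosWithinMicro) {
        return epochMicros * NANOS_PER_MICRO + nanosWithinMicro;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1_234L, -1L, -1_001L, 1_700_000_000_123_456_789L};
        for (long n : samples) {
            long micros = epochMicros(n);
            short nanos = nanosWithinMicro(n);
            System.out.println(n + " -> (" + micros + ", " + nanos + ") -> " + toEpochNanos(micros, nanos));
        }
    }
}
```

Because toEpochNanos multiplies back into a single long, it only round-trips instants that fit in a 64-bit nanosecond count (roughly the years 1677 to 2262); the (epochMicros, nanosWithinMicro) pair itself covers the full microsecond range of a long.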

Concretely:

  • ColumnVector: getTimestampNTZNanos / getTimestampLTZNanos now read from the child vectors instead of throwing SparkUnsupportedOperationException.
  • WritableColumnVector: allocates the two child columns in the constructor and adds the putTimestampNTZNanos / putTimestampLTZNanos write methods.
  • ConstantColumnVector: allocates the same child columns and adds setTimestampNanosVal for the constant-value (partition-column) path.
  • RowToColumnConverter (Columnar.scala): adds TimestampNTZNanosConverter / TimestampLTZNanosConverter objects, which append epochMicros and nanosWithinMicro to the child vectors via appendStruct; nullable columns are routed through StructNullableTypeConverter.
  • ColumnVectorUtils: handles both types in populate (constant-column path) and in appendValue (null and non-null branches).
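To make the read and write paths more concrete, here is a self-contained sketch of a column that stores nanosecond timestamps as two parallel child arrays plus a null mask. It only mimics the shape of the change: NanosTimestampColumnSketch, appendNanos and appendNull are hypothetical names, not Spark's WritableColumnVector API. The property it illustrates is that every append, including a null, writes one slot into both children, so row i of the parent always maps to index i of each child.

```java
import java.util.Arrays;

// Hypothetical two-child column; not Spark's WritableColumnVector.
public final class NanosTimestampColumnSketch {
    private long[] epochMicros;       // child 0: Long child vector
    private short[] nanosWithinMicro; // child 1: Short child vector, values in [0, 999]
    private boolean[] isNull;         // parent-level null mask
    private int count;

    public NanosTimestampColumnSketch(int initialCapacity) {
        epochMicros = new long[initialCapacity];
        nanosWithinMicro = new short[initialCapacity];
        isNull = new boolean[initialCapacity];
    }

    // Grows all three arrays together so they never get out of step.
    private void ensureCapacity() {
        if (count == isNull.length) {
            int newCapacity = Math.max(1, isNull.length * 2);
            epochMicros = Arrays.copyOf(epochMicros, newCapacity);
            nanosWithinMicro = Arrays.copyOf(nanosWithinMicro, newCapacity);
            isNull = Arrays.copyOf(isNull, newCapacity);
        }
    }

    // Non-null branch: write both fields at the same index.
    public void appendNanos(long micros, short nanos) {
        if (nanos < 0 || nanos > 999) {
            throw new IllegalArgumentException("nanosWithinMicro out of range: " + nanos);
        }
        ensureCapacity();
        epochMicros[count] = micros;
        nanosWithinMicro[count] = nanos;
        isNull[count] = false;
        count++;
    }

    // Null branch: mark the slot null but still advance both children,
    // so later rows stay aligned with their child indices.
    public void appendNull() {
        ensureCapacity();
        epochMicros[count] = 0L;
        nanosWithinMicro[count] = 0;
        isNull[count] = true;
        count++;
    }

    public boolean isNullAt(int rowId) { return isNull[rowId]; }
    public long getEpochMicros(int rowId) { return epochMicros[rowId]; }
    public short getNanosWithinMicro(int rowId) { return nanosWithinMicro[rowId]; }
    public int size() { return count; }

    public static void main(String[] args) {
        NanosTimestampColumnSketch col = new NanosTimestampColumnSketch(2);
        col.appendNanos(1_000_000L, (short) 123);
        col.appendNull();
        col.appendNanos(-1L, (short) 999);
        for (int i = 0; i < col.size(); i++) {
            System.out.println(col.isNullAt(i) ? "null"
                : col.getEpochMicros(i) + "us + " + col.getNanosWithinMicro(i) + "ns");
        }
    }
}
```

Writing a placeholder into the children for null rows trades a little space for constant-time indexing: a reader never has to count preceding nulls to locate a row's epochMicros and nanosWithinMicro.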

Why are the changes needed?

SPARK-56981 added a row-level physical representation for nanosecond timestamps, but columnar execution could not store or move these values: any attempt to build a ColumnarBatch from rows containing nanosecond timestamps threw an unsupported-operation exception. This PR closes that gap.

Does this PR introduce any user-facing change?

Yes. A ColumnarBatch can now be built from InternalRow instances containing TimestampNTZNanosType / TimestampLTZNanosType values; previously this threw SparkUnsupportedOperationException.

How was this patch tested?

Added four unit tests to RowToColumnConverterSuite:

  • TimestampNTZNanosType column roundtrip — non-null values survive the row→column→read cycle.
  • TimestampNTZNanosType column with nulls — null slots are preserved correctly.
  • TimestampLTZNanosType column roundtrip — same for the LTZ variant.
  • TimestampLTZNanosType column with nulls — same for the LTZ variant.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6 (claude.ai/code)

… timestamp types

Implement read/write/append support for TimestampNTZNanosType and
TimestampLTZNanosType in column vectors, following the CalendarInterval
two-child-vector pattern (Long for epochMicros, Short for nanosWithinMicro).

Co-authored-by: Max Gekk <max.gekk@gmail.com>
@MaxGekk changed the title from "[SPARK-57100][SQL] Add columnar (ColumnVector) support for nanosecond timestamp types" to "[WIP][SPARK-57100][SQL] Add columnar (ColumnVector) support for nanosecond timestamp types" on May 29, 2026
MaxGekk added 3 commits May 29, 2026 11:27
…-vector support

Four issues found in code review:

1. appendStruct(true) null-propagation: extend the StructType|VariantType guard
   in WritableColumnVector to also recurse for CalendarIntervalType,
   TimestampNTZNanosType, and TimestampLTZNanosType children, so that a
   nullable struct field of these types correctly propagates nulls into
   their own child sub-columns, preventing index divergence.

2. MutableColumnarRow: add copy(), get(), and update() branches for
   TimestampNTZNanosType and TimestampLTZNanosType, plus setTimestampNTZNanos
   and setTimestampLTZNanos setters.

3. ColumnVector Javadoc: fix "int vector" -> "short vector" for child 1
   of the nanosecond timestamp layout.

4. Test coverage: add testVectors (OnHeap + OffHeap) for both nanos types
   to ColumnVectorSuite; add populate tests to ColumnVectorUtilsSuite;
   add nanos columns to the ColumnarBatchSuite RowToColumnConverter
   end-to-end test.
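The index divergence described in item 1 can be reproduced with a toy model. In the sketch below (all names hypothetical, not Spark's WritableColumnVector), a struct column has one field whose type is itself a two-child nanos column. When a null struct is appended, the sketch either recurses into the field's own children or stops one level down; without recursion, the epochMicros and nanosWithinMicro grandchildren end up one slot short, so every later row would read a neighbor's value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of nested null propagation; not Spark code.
public final class NullPropagationSketch {
    // A vector that tracks only how many slots it holds and its child vectors.
    static final class Vec {
        final String name;
        final List<Vec> children = new ArrayList<>();
        int size;

        Vec(String name, Vec... kids) {
            this.name = name;
            for (Vec k : kids) {
                children.add(k);
            }
        }
    }

    // Appends a null slot to v. With recurse=true, nested children (for example the
    // epochMicros/nanosWithinMicro pair of a nanos field) also receive a placeholder slot.
    static void appendNull(Vec v, boolean recurse) {
        v.size++;
        for (Vec child : v.children) {
            if (recurse) {
                appendNull(child, true);
            } else {
                child.size++; // only one level deep: grandchildren are skipped
            }
        }
    }

    // Builds struct<ts: nanos timestamp>, where ts has two child vectors.
    static Vec newStructWithNanosField() {
        Vec nanosField = new Vec("ts", new Vec("epochMicros"), new Vec("nanosWithinMicro"));
        return new Vec("struct", nanosField);
    }

    public static void main(String[] args) {
        for (boolean recurse : new boolean[] {false, true}) {
            Vec struct = newStructWithNanosField();
            appendNull(struct, recurse);
            Vec ts = struct.children.get(0);
            System.out.println("recurse=" + recurse
                + " struct=" + struct.size
                + " ts=" + ts.size
                + " epochMicros=" + ts.children.get(0).size
                + " nanosWithinMicro=" + ts.children.get(1).size);
        }
    }
}
```

Running main prints epochMicros=0 and nanosWithinMicro=0 for recurse=false, and 1 for every vector when recurse=true.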

Co-authored-by: Max Gekk <max.gekk@gmail.com>
