Describe the bug
When a query_df is called with query_tz= "UTC" or query_tz=pytz.UTC the result Timestamp columns are timezone naive weither they have DateTime64 or DateTime columns as origin.
After analysing the code, it turns out that this seems intentional as inside active_tz the returned timezone is explicitly None when active_tz = pytz.UTC. I'm not sure why, and if there is a good reason it should be added in the documentation.
https://github.com/ClickHouse/clickhouse-connect/blob/main/clickhouse_connect/driver/query.py#L171
def active_tz(self, datatype_tz: Optional[tzinfo]):
if self.column_tz:
active_tz = self.column_tz
elif datatype_tz:
active_tz = datatype_tz
elif self.query_tz:
active_tz = self.query_tz
elif self.response_tz:
active_tz = self.response_tz
elif self.apply_server_tz:
active_tz = self.server_tz
else:
active_tz = tzutil.local_tz
if active_tz == pytz.UTC:
return None
return active_tz
But even if this method is fixed and does return UTC when query_tz set to UTC (and this solves the issue for DateTime) the DateTime64._read_column_binary method explicitly ignores setting the timezone if active_tz = pytz.UTC
https://github.com/ClickHouse/clickhouse-connect/blob/main/clickhouse_connect/datatypes/temporal.py#L246
def _read_column_binary(self, source: ByteSource, num_rows: int, ctx: QueryContext, _read_state: Any) -> Sequence:
if self.read_format(ctx) == 'int':
return source.read_array('q', num_rows)
active_tz = ctx.active_tz(self.tzinfo)
if ctx.use_numpy:
np_array = numpy_conv.read_numpy_array(source, self.np_type, num_rows)
if ctx.as_pandas and active_tz and active_tz != pytz.UTC:
return pd.DatetimeIndex(np_array, tz='UTC').tz_convert(active_tz)
return np_array
column = source.read_array('q', num_rows)
if active_tz and active_tz != pytz.UTC:
return self._read_binary_tz(column, active_tz)
return self._read_binary_naive(column)
Steps to reproduce
- See test in Code example
Expected behaviour
When query_tz is set to UTC, the response's date time columns should be localised to UTC.
Code example
This test can executed by being appended to tests/test_pandas.py file.
@pytest.mark.parametrize('tz_to_test', [
None, # No timezone - should return naive datetime => succeeds
'Europe/Madrid', # Timezone - should return timezone aware datetime => succeeds
'UTC', # Timezone - should return timezone aware datetime => fails
])
def test_pandas_timezone(test_client: Client, table_context: Callable, tz_to_test):
if not test_client.min_version('20'):
pytest.skip(f'DateTime64 not supported in this server version {test_client.server_version}')
nano_timestamp = pd.Timestamp(1992, 11, 6, 12, 50, 40, 7420, 44)
with table_context('test_pandas_timezone', ['key String',
"millis DateTime",
"nanos DateTime64(9)"]):
df = pd.DataFrame([['key1', nano_timestamp, nano_timestamp]], columns=['key', 'millis', 'nanos'])
test_client.insert_df('test_pandas_timezone', df)
result_df = test_client.query_df('SELECT * FROM test_pandas_timezone', query_tz=tz_to_test)
if tz_to_test is None:
assert result_df.iloc[0]['nanos'].tzinfo is None
assert result_df.iloc[0]['millis'].tzinfo is None
else:
assert result_df.iloc[0]['millis'].tzinfo.zone == tz_to_test
assert result_df.iloc[0]['nanos'].tzinfo.zone == tz_to_test
This test fails for the UTC case with error log:
tests/integration_tests/test_pandas.py:202 (test_pandas_timezone[UTC])
test_client = <clickhouse_connect.driver.httpclient.HttpClient object at 0x7ebedc8f3e00>
table_context = <function table_context_fixture.<locals>.context at 0x7ebedc806f20>
tz_to_test = 'UTC'
@pytest.mark.parametrize('tz_to_test', [
None, # No timezone - should return naive datetime => succeeds
'Europe/Madrid', # Timezone - should return timezone aware datetime => succeeds
'UTC', # Timezone - should return timezone aware datetime => fails
])
def test_pandas_timezone(test_client: Client, table_context: Callable, tz_to_test):
if not test_client.min_version('20'):
pytest.skip(f'DateTime64 not supported in this server version {test_client.server_version}')
nano_timestamp = pd.Timestamp(1992, 11, 6, 12, 50, 40, 7420, 44)
with table_context('test_pandas_timezone', ['key String',
"millis DateTime",
"nanos DateTime64(9)"]):
df = pd.DataFrame([['key1', nano_timestamp, nano_timestamp]], columns=['key', 'millis', 'nanos'])
test_client.insert_df('test_pandas_timezone', df)
result_df = test_client.query_df('SELECT * FROM test_pandas_timezone', query_tz=tz_to_test)
if tz_to_test is None:
assert result_df.iloc[0]['nanos'].tzinfo is None
assert result_df.iloc[0]['millis'].tzinfo is None
else:
assert result_df.iloc[0]['millis'].tzinfo.zone == tz_to_test
> assert result_df.iloc[0]['nanos'].tzinfo.zone == tz_to_test
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E AttributeError: 'NoneType' object has no attribute 'zone'
tests/integration_tests/test_pandas.py:224: AttributeError
Proposed solution
Modify active_tz function to return the query_tz if explicitly set by the caller as it should logically prime over other tzs.
Remove the and active_tz != pytz.UTC condition from the DateTime64._read_column_binary method.
After doing the tests on my side this solution solves this problem and does not break any other tests, if this is a genuine issue and not just an undocumented edge case for some resons that escapes me I can do a PR.
- clickhouse-connect version: 0.9.2
- Python version: 3.13
- Operating system: Ubuntu 22.04.5 LTS
Describe the bug
When a
query_dfis called withquery_tz= "UTC"orquery_tz=pytz.UTCthe result Timestamp columns are timezone naive weither they haveDateTime64orDateTimecolumns as origin.After analysing the code, it turns out that this seems intentional as inside
active_tzthe returned timezone is explicitlyNonewhenactive_tz = pytz.UTC. I'm not sure why, and if there is a good reason it should be added in the documentation.https://github.com/ClickHouse/clickhouse-connect/blob/main/clickhouse_connect/driver/query.py#L171
But even if this method is fixed and does return
UTCwhenquery_tzset toUTC(and this solves the issue forDateTime) theDateTime64._read_column_binarymethod explicitly ignores setting the timezone ifactive_tz = pytz.UTChttps://github.com/ClickHouse/clickhouse-connect/blob/main/clickhouse_connect/datatypes/temporal.py#L246
Steps to reproduce
Expected behaviour
When
query_tzis set to UTC, the response's date time columns should be localised to UTC.Code example
This test can executed by being appended to
tests/test_pandas.pyfile.This test fails for the UTC case with error log:
Proposed solution
Modify
active_tzfunction to return thequery_tzif explicitly set by the caller as it should logically prime over other tzs.Remove the
and active_tz != pytz.UTCcondition from theDateTime64._read_column_binarymethod.After doing the tests on my side this solution solves this problem and does not break any other tests, if this is a genuine issue and not just an undocumented edge case for some resons that escapes me I can do a PR.