Skip to content

Timezone naive datetimes when query_tz == pytz.UTC #566

@prokofyevDmitry

Description

@prokofyevDmitry

Describe the bug

When a query_df is called with query_tz= "UTC" or query_tz=pytz.UTC the result Timestamp columns are timezone naive weither they have DateTime64 or DateTime columns as origin.

After analysing the code, it turns out that this seems intentional as inside active_tz the returned timezone is explicitly None when active_tz = pytz.UTC. I'm not sure why, and if there is a good reason it should be added in the documentation.

https://github.com/ClickHouse/clickhouse-connect/blob/main/clickhouse_connect/driver/query.py#L171

    def active_tz(self, datatype_tz: Optional[tzinfo]):
        if self.column_tz:
            active_tz = self.column_tz
        elif datatype_tz:
            active_tz = datatype_tz
        elif self.query_tz:
            active_tz = self.query_tz
        elif self.response_tz:
            active_tz = self.response_tz
        elif self.apply_server_tz:
            active_tz = self.server_tz
        else:
            active_tz = tzutil.local_tz
        if active_tz == pytz.UTC:
            return None
        return active_tz

But even if this method is fixed and does return UTC when query_tz set to UTC (and this solves the issue for DateTime) the DateTime64._read_column_binary method explicitly ignores setting the timezone if active_tz = pytz.UTC
https://github.com/ClickHouse/clickhouse-connect/blob/main/clickhouse_connect/datatypes/temporal.py#L246

    def _read_column_binary(self, source: ByteSource, num_rows: int, ctx: QueryContext, _read_state: Any) -> Sequence:
        if self.read_format(ctx) == 'int':
            return source.read_array('q', num_rows)
        active_tz = ctx.active_tz(self.tzinfo)
        if ctx.use_numpy:
            np_array = numpy_conv.read_numpy_array(source, self.np_type, num_rows)
            if ctx.as_pandas and active_tz and active_tz != pytz.UTC:
                return pd.DatetimeIndex(np_array, tz='UTC').tz_convert(active_tz)
            return np_array
        column = source.read_array('q', num_rows)
        if active_tz and active_tz != pytz.UTC:
            return self._read_binary_tz(column, active_tz)
        return self._read_binary_naive(column)

Steps to reproduce

  1. See test in Code example

Expected behaviour

When query_tz is set to UTC, the response's date time columns should be localised to UTC.

Code example

This test can executed by being appended to tests/test_pandas.py file.

@pytest.mark.parametrize('tz_to_test', [
        None,  # No timezone - should return naive datetime => succeeds
        'Europe/Madrid', # Timezone - should return timezone aware datetime => succeeds
        'UTC', # Timezone - should return timezone aware datetime => fails
    ])
def test_pandas_timezone(test_client: Client, table_context: Callable, tz_to_test):
    if not test_client.min_version('20'):
        pytest.skip(f'DateTime64 not supported in this server version {test_client.server_version}')

    nano_timestamp = pd.Timestamp(1992, 11, 6, 12, 50, 40, 7420, 44)
    with table_context('test_pandas_timezone', ['key String',
                                            "millis DateTime",
                                            "nanos DateTime64(9)"]):
        df = pd.DataFrame([['key1', nano_timestamp, nano_timestamp]], columns=['key', 'millis', 'nanos'])
        test_client.insert_df('test_pandas_timezone', df)
        result_df = test_client.query_df('SELECT * FROM test_pandas_timezone', query_tz=tz_to_test)
        if tz_to_test is None:
            assert result_df.iloc[0]['nanos'].tzinfo is None
            assert result_df.iloc[0]['millis'].tzinfo is None
        else:
            assert result_df.iloc[0]['millis'].tzinfo.zone == tz_to_test
            assert result_df.iloc[0]['nanos'].tzinfo.zone == tz_to_test

This test fails for the UTC case with error log:

tests/integration_tests/test_pandas.py:202 (test_pandas_timezone[UTC])
test_client = <clickhouse_connect.driver.httpclient.HttpClient object at 0x7ebedc8f3e00>
table_context = <function table_context_fixture.<locals>.context at 0x7ebedc806f20>
tz_to_test = 'UTC'

    @pytest.mark.parametrize('tz_to_test', [
        None,  # No timezone - should return naive datetime => succeeds
        'Europe/Madrid', # Timezone - should return timezone aware datetime => succeeds
        'UTC', # Timezone - should return timezone aware datetime => fails
    ])
    def test_pandas_timezone(test_client: Client, table_context: Callable, tz_to_test):
        if not test_client.min_version('20'):
            pytest.skip(f'DateTime64 not supported in this server version {test_client.server_version}')
    
        nano_timestamp = pd.Timestamp(1992, 11, 6, 12, 50, 40, 7420, 44)
        with table_context('test_pandas_timezone', ['key String',
                                                "millis DateTime",
                                                "nanos DateTime64(9)"]):
            df = pd.DataFrame([['key1', nano_timestamp, nano_timestamp]], columns=['key', 'millis', 'nanos'])
            test_client.insert_df('test_pandas_timezone', df)
            result_df = test_client.query_df('SELECT * FROM test_pandas_timezone', query_tz=tz_to_test)
            if tz_to_test is None:
                assert result_df.iloc[0]['nanos'].tzinfo is None
                assert result_df.iloc[0]['millis'].tzinfo is None
            else:
                assert result_df.iloc[0]['millis'].tzinfo.zone == tz_to_test
>               assert result_df.iloc[0]['nanos'].tzinfo.zone == tz_to_test
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E               AttributeError: 'NoneType' object has no attribute 'zone'

tests/integration_tests/test_pandas.py:224: AttributeError

Proposed solution

Modify active_tz function to return the query_tz if explicitly set by the caller as it should logically prime over other tzs.
Remove the and active_tz != pytz.UTC condition from the DateTime64._read_column_binary method.
After doing the tests on my side this solution solves this problem and does not break any other tests, if this is a genuine issue and not just an undocumented edge case for some resons that escapes me I can do a PR.

  • clickhouse-connect version: 0.9.2
  • Python version: 3.13
  • Operating system: Ubuntu 22.04.5 LTS

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions