Reports scheduler advances checkpoint before enqueue completes, causing missed triggers on failure #1586

I think there’s a subtle but serious issue in the reports scheduler around how the checkpoint (`REPORT_SCHEDULER_LAST_CHECK_CACHE_KEY`) is handled.

Right now in `check_and_enqueue`, the scheduler updates the checkpoint to `now` *before* it actually enqueues any report trigger messages. This effectively marks the entire time window as “processed” even though the enqueue step hasn’t completed yet.

That becomes a problem in failure scenarios:

* If the process crashes (or restarts) after the checkpoint is written but before the enqueue loop finishes, the next run will start from the new checkpoint and skip all the pending hour buckets in that window.
* If the message queue is unavailable (e.g. RabbitMQ down), `push_to_reports_queue` can fail for every report, but the function still returns `Ok(())` and the checkpoint has already moved forward, so those triggers are never retried.
* Errors from enqueue are only logged, and the result of `cache.insert(...)` is ignored (`let _ =`), so there’s no strong signal that anything went wrong.
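
To make the ordering problem concrete, here is a simplified model of the behaviour described above. It is not the actual scheduler code: the `Cache` struct, the always-failing `push_to_reports_queue` stub, and the `check_and_enqueue_current` signature are all hypothetical stand-ins that I made up for illustration.

```rust
use std::collections::HashMap;

const LAST_CHECK_KEY: &str = "REPORT_SCHEDULER_LAST_CHECK_CACHE_KEY";

/// Hypothetical stand-in for the scheduler's cache; not the project's real type.
struct Cache {
    entries: HashMap<&'static str, u64>,
}

/// Hypothetical stub that always fails, simulating an unavailable message queue.
fn push_to_reports_queue(_report_id: u32) -> Result<(), String> {
    Err("queue unavailable".to_string())
}

/// Simplified model of the current ordering: checkpoint first, enqueue second.
fn check_and_enqueue_current(cache: &mut Cache, now: u64, due: &[u32]) -> Result<(), String> {
    // The whole window is marked as processed before anything is handed off,
    // and the result of the write is discarded.
    let _ = cache.entries.insert(LAST_CHECK_KEY, now);

    for &report_id in due {
        if let Err(e) = push_to_reports_queue(report_id) {
            // The failure is only logged; the caller never sees it.
            eprintln!("failed to enqueue report {report_id}: {e}");
        }
    }

    // Success is reported even when every enqueue failed.
    Ok(())
}
```

In this model, a call with a dead queue still returns `Ok(())` and leaves the checkpoint at `now`, which is the silent-loss case from the list above.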

In practice this can lead to scheduled reports (daily/weekly) going missing silently, which is hard to detect until someone notices that the emails never arrived.

**Repro idea:**

* Let the scheduler run with at least one report due
* Kill the process right after the checkpoint is written (before enqueue completes), or simulate MQ failure
* Restart and observe that the missed time buckets are not retried

**Root cause (as I understand it):**
The checkpoint is being treated as “we attempted this window” instead of “we successfully handed off the work”.

**Possible fix directions:**

* Move the checkpoint update to *after* successful enqueue
* Or advance it incrementally (per hour bucket / per successful enqueue)
* Avoid ignoring errors from `cache.insert`
* Optionally add retry/backoff for queue failures
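
A rough sketch of what the first three directions could look like together, assuming the work can be split into hour buckets. The `ReportQueue` and `CheckpointStore` traits, the `HourBucket` struct, and the in-memory fakes are hypothetical names I introduced for the example; the real scheduler types will differ.

```rust
/// Hypothetical queue abstraction; the real code calls `push_to_reports_queue`.
trait ReportQueue {
    fn push(&mut self, report_id: u32) -> Result<(), String>;
}

/// Hypothetical checkpoint abstraction over the cache write.
trait CheckpointStore {
    fn save(&mut self, ts: u64) -> Result<(), String>;
}

/// One hour bucket: its end timestamp and the reports due inside it.
struct HourBucket {
    end_ts: u64,
    due_reports: Vec<u32>,
}

/// Advance the checkpoint only after a bucket has been fully handed off.
fn check_and_enqueue<Q: ReportQueue, C: CheckpointStore>(
    queue: &mut Q,
    store: &mut C,
    buckets: &[HourBucket],
) -> Result<(), String> {
    for bucket in buckets {
        for &report_id in &bucket.due_reports {
            // Propagate the failure instead of only logging it.
            queue.push(report_id)?;
        }
        // Reached only when every enqueue in this bucket succeeded;
        // a failed checkpoint write is surfaced as well.
        store.save(bucket.end_ts)?;
    }
    Ok(())
}

/// In-memory fake queue that fails on one chosen report id.
struct FlakyQueue {
    fail_on: Option<u32>,
    sent: Vec<u32>,
}

impl ReportQueue for FlakyQueue {
    fn push(&mut self, report_id: u32) -> Result<(), String> {
        if self.fail_on == Some(report_id) {
            return Err(format!("queue unavailable for report {report_id}"));
        }
        self.sent.push(report_id);
        Ok(())
    }
}

/// In-memory fake checkpoint store.
struct MemoryStore {
    last: Option<u64>,
}

impl CheckpointStore for MemoryStore {
    fn save(&mut self, ts: u64) -> Result<(), String> {
        self.last = Some(ts);
        Ok(())
    }
}
```

With this shape, a failure in the second bucket leaves the checkpoint at the end of the first, so the next run picks up from there. One trade-off: if a bucket fails partway through, the reports already pushed from that bucket get enqueued again on retry, so this is at-least-once delivery and the consumer side would need to tolerate duplicates.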

Happy to take a shot at a fix if this direction makes sense 👍

