
Reports scheduler advances checkpoint before enqueue completes, causing missed triggers on failure #1586

@h30s

Description


I think there’s a subtle but serious issue in the reports scheduler around how the checkpoint (REPORT_SCHEDULER_LAST_CHECK_CACHE_KEY) is handled.

Right now, check_and_enqueue sets the checkpoint to now (the current time) before it actually enqueues any report trigger messages. This effectively marks the entire time window as "processed" even though the enqueue step hasn't completed yet.

That becomes a problem in failure scenarios:

  • If the process crashes (or restarts) after the checkpoint is written but before the enqueue loop finishes, the next run will start from the new checkpoint and skip all the pending hour buckets in that window.
  • If the message queue is unavailable (e.g. RabbitMQ is down), push_to_reports_queue can fail for every report, but the function still returns Ok(()) and the checkpoint has already moved forward, so those triggers are never retried.
  • Errors from enqueue are only logged, and the result of cache.insert(...) is ignored (let _ =), so there’s no strong signal that anything went wrong.
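To make the current ordering concrete, here's a minimal model of it. This is a sketch under assumptions, not the actual scheduler code: only the constant name REPORT_SCHEDULER_LAST_CHECK_CACHE_KEY comes from the codebase, while its value, the Cache type, due_hour_buckets, check_and_enqueue_current and the closure standing in for push_to_reports_queue are all hypothetical. It shows the second and third bullets: with a queue that rejects every push, the call still returns Ok(()) and the checkpoint has moved to now.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: only the constant *name* comes from the scheduler;
// its value, the Cache type and the closure-based queue are assumptions.
const REPORT_SCHEDULER_LAST_CHECK_CACHE_KEY: &str = "report_scheduler_last_check";
const HOUR: u64 = 3600;

#[derive(Default)]
struct Cache {
    entries: HashMap<String, u64>,
}

impl Cache {
    fn get(&self, key: &str) -> Option<u64> {
        self.entries.get(key).copied()
    }

    // Fallible like a real cache write; the current code discards this Result.
    fn insert(&mut self, key: &str, value: u64) -> Result<(), String> {
        self.entries.insert(key.to_string(), value);
        Ok(())
    }
}

/// Start timestamps (Unix seconds) of the hour buckets in (last_check, now].
fn due_hour_buckets(last_check: u64, now: u64) -> Vec<u64> {
    let first = (last_check / HOUR + 1) * HOUR;
    (first..=now).step_by(HOUR as usize).collect()
}

/// Models the current ordering: checkpoint first, enqueue second,
/// enqueue errors only logged, and Ok(()) returned regardless.
fn check_and_enqueue_current<F>(
    cache: &mut Cache,
    mut push_to_reports_queue: F,
    now: u64,
) -> Result<(), String>
where
    F: FnMut(u64) -> Result<(), String>,
{
    // Assumption: with no checkpoint stored yet, the window starts at `now`.
    let last_check = cache.get(REPORT_SCHEDULER_LAST_CHECK_CACHE_KEY).unwrap_or(now);
    // The whole window is marked as processed before anything is sent.
    let _ = cache.insert(REPORT_SCHEDULER_LAST_CHECK_CACHE_KEY, now);
    for bucket in due_hour_buckets(last_check, now) {
        if let Err(e) = push_to_reports_queue(bucket) {
            eprintln!("failed to enqueue bucket {bucket}: {e}"); // logged, then dropped
        }
    }
    Ok(())
}
```

If the checkpoint starts at 0 and the run happens at 3 * HOUR with a queue that always fails, all three buckets are attempted and lost, yet the result is Ok(()), the stored checkpoint is 3 * HOUR, and a following run at the same time computes an empty bucket list.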

In practice, this ordering can silently drop scheduled (daily/weekly) reports, which is hard to detect unless someone notices that the emails never arrived.

Repro idea:

  • Let the scheduler run with at least one report due
  • Kill the process right after the checkpoint is written (before the enqueue loop completes), or simulate an MQ failure
  • Restart and observe that the missed time buckets are not retried
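The crash case can also be reproduced without RabbitMQ. The rough model below runs the scheduler twice, with the first run "killed" right after the checkpoint write. Persistent and tick are made-up names, and the kill is simulated by returning early rather than by terminating a real process.

```rust
// Hypothetical model: `Persistent` stands in for whatever survives a restart
// (the cache entry), and a crash is simulated by returning early.
const HOUR: u64 = 3600;

struct Persistent {
    checkpoint: u64,    // last-check timestamp in Unix seconds
    enqueued: Vec<u64>, // hour buckets actually handed to the queue
}

/// One scheduler run using the current ordering (checkpoint before enqueue).
fn tick(state: &mut Persistent, now: u64, crash_after_checkpoint: bool) {
    let last_check = state.checkpoint;
    state.checkpoint = now; // checkpoint is advanced first
    if crash_after_checkpoint {
        return; // process killed here: nothing from this window is enqueued
    }
    let first = (last_check / HOUR + 1) * HOUR;
    for bucket in (first..=now).step_by(HOUR as usize) {
        state.enqueued.push(bucket);
    }
}
```

Running tick(&mut state, 3 * HOUR, true) followed by a restart a minute later leaves state.enqueued empty: the second run starts from the already-advanced checkpoint, so hours 1 to 3 are never retried. Without the crash, the same window enqueues all three buckets.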

Root cause (as I understand it):
The checkpoint is being treated as “we attempted this window” instead of “we successfully handed off the work”.

Possible fix directions:

  • Move the checkpoint update to after successful enqueue
  • Or advance it incrementally (per hour bucket / per successful enqueue)
  • Avoid ignoring errors from cache.insert
  • Optionally add retry/backoff for queue failures
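Here's a rough sketch of what the fix could look like, combining incremental advancement (per hour bucket), error propagation instead of logging, a checked cache write, and a small retry with exponential backoff. Everything here is hypothetical: the key value, Cache, with_retry, check_and_enqueue_fixed and the closure standing in for push_to_reports_queue are illustrative names, and the real code will look different. The blocking std::thread::sleep is just for simplicity.

```rust
use std::collections::HashMap;
use std::thread::sleep;
use std::time::Duration;

// Hypothetical stand-ins; only the constant *name* comes from the scheduler.
const REPORT_SCHEDULER_LAST_CHECK_CACHE_KEY: &str = "report_scheduler_last_check";
const HOUR: u64 = 3600;

#[derive(Default)]
struct Cache {
    entries: HashMap<String, u64>,
}

impl Cache {
    fn get(&self, key: &str) -> Option<u64> {
        self.entries.get(key).copied()
    }

    fn insert(&mut self, key: &str, value: u64) -> Result<(), String> {
        self.entries.insert(key.to_string(), value);
        Ok(())
    }
}

/// Calls `op` up to `attempts` times (at least once), doubling the delay
/// after each failure; returns the last error if every attempt fails.
fn with_retry<F>(attempts: u32, base_delay: Duration, mut op: F) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(()) => return Ok(()),
            Err(e) if attempt >= attempts => return Err(e),
            Err(_) => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
        }
    }
}

/// Enqueues each due hour bucket and only then advances the checkpoint to
/// that bucket. The first failure aborts the run and is returned, so the
/// next run resumes from the last bucket that was actually handed off.
fn check_and_enqueue_fixed<F>(
    cache: &mut Cache,
    mut push_to_reports_queue: F,
    now: u64,
    retry_delay: Duration,
) -> Result<(), String>
where
    F: FnMut(u64) -> Result<(), String>,
{
    let last_check = cache.get(REPORT_SCHEDULER_LAST_CHECK_CACHE_KEY).unwrap_or(now);
    let first = (last_check / HOUR + 1) * HOUR;
    for bucket in (first..=now).step_by(HOUR as usize) {
        with_retry(3, retry_delay, || push_to_reports_queue(bucket))
            .map_err(|e| format!("enqueue of bucket {bucket} failed: {e}"))?;
        cache
            .insert(REPORT_SCHEDULER_LAST_CHECK_CACHE_KEY, bucket)
            .map_err(|e| format!("checkpoint write failed: {e}"))?;
    }
    Ok(())
}
```

One trade-off with this ordering: if the process dies after a successful enqueue but before the checkpoint write, that bucket is enqueued again on the next run. That makes delivery at-least-once rather than exactly-once, so the report consumer would need to tolerate duplicates (e.g. by deduplicating on the hour bucket).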

Happy to take a shot at a fix if this direction makes sense 👍
