How do you know your data is correct?
TL;DR
I built a separate validation pipeline that hits the same API as my main ingestion pipeline, but instead of inserting data, it only computes metrics like count(transaction_id), sum(payout), and sum(revenue) per hour (based on event datetime). It compares these to what’s in the data warehouse and stores the deltas in a validation table.
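To make the idea concrete, here is a minimal sketch of the compare step in Python. It is an illustration under assumptions, not the actual pipeline: the record fields (transaction_id, event_datetime, payout, revenue), the function names hourly_metrics and compute_deltas, and the sample data are all hypothetical, and the warehouse side is stubbed with an in-memory dict where a real setup would run an hourly GROUP BY query.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal


def hourly_metrics(records):
    """Aggregate raw API records into per-hour metrics, keyed by event hour.

    Assumes each record is a dict with hypothetical fields:
    transaction_id, event_datetime (ISO 8601 string), payout, revenue.
    """
    buckets = defaultdict(
        lambda: {"txn_count": 0, "payout": Decimal("0"), "revenue": Decimal("0")}
    )
    for r in records:
        # Truncate the event timestamp to the start of its hour.
        hour = datetime.fromisoformat(r["event_datetime"]).replace(
            minute=0, second=0, microsecond=0
        )
        bucket = buckets[hour]
        # Mirrors SQL count(transaction_id): NULL ids are not counted.
        if r.get("transaction_id") is not None:
            bucket["txn_count"] += 1
        # Decimal avoids float rounding noise showing up as false deltas.
        bucket["payout"] += Decimal(str(r["payout"]))
        bucket["revenue"] += Decimal(str(r["revenue"]))
    return dict(buckets)


def compute_deltas(api_metrics, warehouse_metrics):
    """Return one row per hour with (API value - warehouse value) per metric."""
    zero = {"txn_count": 0, "payout": Decimal("0"), "revenue": Decimal("0")}
    rows = []
    # Union of hours, so an hour missing on either side still yields a row.
    for hour in sorted(set(api_metrics) | set(warehouse_metrics)):
        api = api_metrics.get(hour, zero)
        wh = warehouse_metrics.get(hour, zero)
        rows.append(
            {
                "event_hour": hour,
                "txn_count_delta": api["txn_count"] - wh["txn_count"],
                "payout_delta": api["payout"] - wh["payout"],
                "revenue_delta": api["revenue"] - wh["revenue"],
            }
        )
    return rows


# Made-up sample data: the warehouse is missing the 11:00 hour entirely.
api_records = [
    {"transaction_id": "t1", "event_datetime": "2024-01-01T10:05:00", "payout": "1.50", "revenue": "2.00"},
    {"transaction_id": "t2", "event_datetime": "2024-01-01T10:40:00", "payout": "0.50", "revenue": "1.00"},
    {"transaction_id": "t3", "event_datetime": "2024-01-01T11:10:00", "payout": "2.00", "revenue": "3.00"},
]
warehouse_metrics = {
    datetime(2024, 1, 1, 10): {"txn_count": 2, "payout": Decimal("2.00"), "revenue": Decimal("3.00")},
}

deltas = compute_deltas(hourly_metrics(api_records), warehouse_metrics)
for row in deltas:
    print(row)
```

In this sample the 10:00 hour matches on all three metrics, while the 11:00 hour shows a nonzero delta because the warehouse has no rows for it. The rows returned by compute_deltas are what would be written to the validation table.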