Safe query costs with BigQuery
Processing data in BigQuery offers immense power and scalability, but it can also lead to unexpected costs if not managed properly. This recent story from the HTTP Archive community, along with the other cost incidents that are regularly shared in public, proves that raising awareness is not enough.
A Costly Human Error
Over the years, I’ve observed numerous instances where a single query, or a query run in a loop, resulted in painful charges of $1,000 or even $10,000+. In one extreme case, I witnessed a bill increase by $163K due to a single costly error. This isn’t just a problem for BigQuery data analysts; engineers working with datasets can also face significant financial consequences due to code bugs. Extensive experience alone doesn’t seem to protect anyone from making a costly mistake.
I’ve also had the unpleasant task of leading investigations and implementing safety measures after substantial cost spikes. Even for people who know their storage and SQL optimizations, human error is inevitable.
This highlights the necessity of implementing cost safety measures regardless of experience level. Over the years, I’ve seen a few simple preventive steps save us from repeat billing incidents. Putting them in place in advance protects you from unexpected costs, and it is the only approach that works reliably regardless of which interface is used to access BigQuery.
Steps to Ensure Cost Safety in BigQuery
1. Set Query Usage Quotas
Prevent excessive query usage by limiting the amount of data that can be processed in a day. Once the quota is reached, any further query fails until the quota resets. You can set query usage quotas at the project level or per user.
Step-by-Step Guide
- In the Google Cloud Console, go to APIs & Services > BigQuery API > Quotas & System Limits.
- Select the quota you want to adjust.
- Click “Edit Quotas” and enter your desired values.
- Submit your changes.
- Return to exploring petabytes of data with peace of mind.
Bring your quota down to 1 TiB per user per day (the quota value is expressed in MiB, so 1 TiB is 1048576):
gcloud alpha services quota update --consumer=projects/PROJECT_ID --service=bigquery.googleapis.com --metric=bigquery.googleapis.com/quota/query/usage --unit=1/d/{project}/{user} --value=1048576
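To pick a quota value, it helps to do the arithmetic first. The sketch below converts a daily quota in TiB to the MiB value passed to --value and estimates the worst-case daily spend if every user exhausts their quota. The function names are hypothetical, and the $6.25 per TiB on-demand price is an assumption; check the current pricing for your region.

```shell
#!/usr/bin/env bash
# Convert a daily quota in TiB to MiB (the unit assumed for --value).
tib_to_mib() {
  echo $(( $1 * 1024 * 1024 ))
}

# Worst-case on-demand spend per day if every user exhausts the quota.
# Args: quota in TiB, number of users, assumed price in USD per TiB.
max_daily_cost_usd() {
  awk -v tib="$1" -v users="$2" -v price="$3" \
    'BEGIN { printf "%.2f\n", tib * users * price }'
}

tib_to_mib 1                   # prints 1048576
max_daily_cost_usd 1 10 6.25   # prints 62.50 (10 users, assumed price)
```

With these assumed numbers, a 1 TiB per-user quota shared by 10 users bounds the daily on-demand query bill at roughly $62.50, instead of leaving it unbounded.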
What if your usage is already past a 1 TiB quota? Check your usage statistics in Log Analytics before choosing a value.
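If Log Analytics is not set up, one alternative (not the method described above) is to query BigQuery’s INFORMATION_SCHEMA.JOBS_BY_PROJECT view for bytes billed per user. The sketch below only builds the SQL; the function name usage_sql, the us region, and the 7-day window are assumptions for illustration, and running the query requires permission to list all jobs in the project.

```shell
#!/usr/bin/env bash
# Build a query that sums billed bytes per user over the last N days.
# Args: region suffix (e.g. "us"), number of days to look back.
usage_sql() {
  local region="$1" days="$2"
  cat <<EOF
SELECT user_email,
       ROUND(SUM(total_bytes_billed) / POW(1024, 4), 3) AS tib_billed
FROM \`region-${region}\`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL ${days} DAY)
  AND job_type = 'QUERY'
GROUP BY user_email
ORDER BY tib_billed DESC
EOF
}

# Usage (needs the bq CLI and an authenticated project):
# bq query --use_legacy_sql=false "$(usage_sql us 7)"
```

Comparing the heaviest users’ daily totals against the quota you plan to set shows who would be blocked by it before you apply it.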
When you configure the Maximum bytes billed parameter in the query settings of the BigQuery Console UI, it mentions a project default setting. However, there is no direct way to set a per-query limit at the project level. The closest alternative is to use API daily quotas to control the maximum bytes billed.
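For scripts and ad hoc work from the command line, a per-query cap can still be applied by hand on every call. The sketch below wraps bq query with its --maximum_bytes_billed flag; the wrapper name capped_query and the 100 GiB default are assumptions chosen for illustration. A query that would bill more than the limit fails instead of running.

```shell
#!/usr/bin/env bash
# Per-query cap in bytes; defaults to 100 GiB (illustrative value).
MAX_BYTES_BILLED="${MAX_BYTES_BILLED:-$((100 * 1024 * 1024 * 1024))}"

# Run a standard-SQL query with the cap applied.
# All arguments are passed through to bq query unchanged.
capped_query() {
  bq query --use_legacy_sql=false \
    --maximum_bytes_billed="$MAX_BYTES_BILLED" "$@"
}

# Usage (needs the bq CLI and an authenticated project):
# capped_query 'SELECT COUNT(*) FROM `my_project.my_dataset.my_table`'
```

This protects only the callers that use the wrapper, which is why the daily quota remains the actual safety net.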
2. Set Up Budget Alerts
Stay aware of your spending trends and get notified before costs spiral out of control. Even though you now have a cap on your daily spend, you can expect your average daily spend over the month to be much lower than that cap, so a budget alert catches unusual growth well before the quota does.
Step-by-Step Guide
gcloud config set project PROJECT_ID
gcloud billing budgets create --billing-account=BILLING_ACCOUNT --display-name="BigQuery Budget" --filter-services=services/24E6-581D-38E5 --last-period-amount --threshold-rule=percent=0.5,basis=forecasted-spend --threshold-rule=percent=0.9,basis=forecasted-spend --threshold-rule=percent=1,basis=forecasted-spend
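The command above sets the budget to last period’s spend and alerts when the forecasted spend reaches 50%, 90%, and 100% of it. As a quick worked example, the sketch below prints those alert points for a given previous-period amount; the function name alert_points and the $400 figure are hypothetical.

```shell
#!/usr/bin/env bash
# Print the forecasted-spend level at which each threshold rule fires.
# Arg: last period's spend in USD (used as the budget amount).
alert_points() {
  awk -v budget="$1" 'BEGIN {
    n = split("0.5 0.9 1", pct, " ")
    for (i = 1; i <= n; i++)
      printf "%.0f%% -> %.2f USD\n", pct[i] * 100, budget * pct[i]
  }'
}

alert_points 400
# 50% -> 200.00 USD
# 90% -> 360.00 USD
# 100% -> 400.00 USD
```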
Don’t wait till it’s your time to worry about a big bill
No amount of experience safeguards against human error. Implementing usage quotas and budget alerts in BigQuery is essential to prevent costly mistakes. These steps provide a safety net, ensuring that both small and large-scale operations can manage their data processing costs effectively.
For small businesses, these measures can help avoid unexpected costs during data processing spikes. By setting a quota, you ensure that no single query or user can blow through your budget.
Large enterprises can manage and monitor querying quotas and budgets at the department level. Monitoring spending helps them stay within allocated budgets and avoid financial investigations.
By following these steps, you can avoid the need to seek Google’s forgiveness for an unexpected bill and maintain strict control over your data processing costs.