BigQuery cost optimization

Suds Kumar
Jan 23, 2021

BigQuery operations that are free of charge in any location:

  • Batch loading data
  • Automatic re-clustering
  • Exporting data
  • Deleting tables, views, partitions, functions, and datasets
  • Metadata operations
  • Cached queries
  • Queries that result in an error
  • Storage for the first 10 GB of data per month
  • The first 1 TB of query data processed per month

In any location, BigQuery pricing breaks down like this:

1. Storage

  • Active storage
  • Long-term storage
  • Streaming inserts

2. Query processing

  • On-demand
  • Flat-rate

Storage

  1. Keep your data only as long as you need it.
    Configure default table expiration on your dataset. Expiration can be set at the table and partition level as well.
  2. Be wary of how you edit your data.
    If your table or partition of a table has not been edited for 90 days, the price of the data stored in the table automatically drops by about 50%.
  3. Avoid duplicate copies of data.
    BigQuery uses a federated data access model that allows you to query data directly from external data sources like Cloud Bigtable, Cloud Storage, Google Drive and Cloud SQL.
  4. See whether you’re using streaming inserts to load your data.
    Use streaming inserts only if the data in BigQuery is consumed immediately by downstream consumers.
  5. Understand BigQuery’s backup and DR processes.
    BigQuery maintains a seven-day history of changes to your table, which allows you to query a point-in-time snapshot of your data. This means you can revert the data without restoring from backups. If the table is deleted, its history is flushed after two days.
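
The expiration settings from points 1 and 2 above can be set with BigQuery DDL. A minimal sketch, assuming hypothetical dataset and table names (`mydataset`, `mydataset.events`):

```sql
-- Set a default table expiration on a dataset; newly created tables
-- in the dataset inherit it.
ALTER SCHEMA mydataset
SET OPTIONS (default_table_expiration_days = 30);

-- Expire individual partitions of a partitioned table after 90 days.
ALTER TABLE mydataset.events
SET OPTIONS (partition_expiration_days = 90);
```

Tables and partitions past their expiration are deleted automatically, so storage charges stop accruing without any manual cleanup.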

Query processing

  1. Only query the data you need.
    SELECT * is the most expensive way to query data. Applying a LIMIT clause doesn’t reduce cost, because BigQuery still scans the referenced columns in full. If you need to explore the data and understand its semantics, use the no-charge data preview option instead. Filter your query as early and as often as you can to reduce cost and improve performance. Before running a query, estimate its cost with the query validator or a dry run.
  2. Set up controls for accidental human errors.
    Use the maximum bytes billed setting to limit query cost. Along with enabling cost control on a query level, you can apply similar logic to the user level and project level as well.
  3. View costs using a dashboard and query your audit logs.
    Create a dashboard to view your billing data so you can make adjustments to your BigQuery usage. Also, consider streaming your audit logs to BigQuery so you can analyze usage patterns.
  4. Use caching intelligently.
    Caching can boost your query performance, and you won’t be charged for results served from the cache. Caching is turned on by default and is maintained per user, per project. To significantly increase cache hits across multiple users, use a single service account to query BigQuery.
  5. Partition your tables.
    Partitioning your tables, whenever possible, helps reduce the cost of processing queries as well as improve performance. When creating or updating a partitioned table, you can enable “Require partition filter,” which forces users to include a WHERE clause on the partition column; otherwise the query results in an error.
  6. Reduce the data scanned by using clustering.
    After partitioning, you can cluster your table, which organizes your data based on the contents of up to four columns. BigQuery sorts the data by the specified columns and organizes it into blocks. When your query filters on these columns, BigQuery scans only the relevant blocks, a process referred to as block pruning.
  7. Materialize query results in stages.
    If possible, materialize your query results in stages. If you create a large, multi-stage query, each time you run it, BigQuery reads all the data that is required by the query. You are billed for all the data that is read each time the query is run. Instead, break your query into stages where each stage materializes the query results by writing them to a destination table.
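
Partitioning and clustering (points 5 and 6 above) can be combined in a single table definition. A minimal sketch, with hypothetical names (`mydataset.page_views`) and columns:

```sql
-- A date-partitioned table, clustered on two columns, that rejects
-- queries lacking a partition filter.
CREATE TABLE mydataset.page_views
(
  event_date DATE,
  user_id    STRING,
  country    STRING,
  url        STRING
)
PARTITION BY event_date
CLUSTER BY country, user_id
OPTIONS (require_partition_filter = TRUE);

-- The partition filter is mandatory; the filter on the clustering
-- column lets BigQuery prune blocks and scan less data.
SELECT url, COUNT(*) AS views
FROM mydataset.page_views
WHERE event_date BETWEEN '2021-01-01' AND '2021-01-07'
  AND country = 'US'
GROUP BY url;
```

Without `require_partition_filter`, a query that omits the `event_date` predicate would scan every partition and be billed accordingly.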

Understanding flat-rate vs. on-demand pricing

Once your BigQuery monthly bill hits north of $10,000, check your query-processing cost to see whether flat-rate pricing is more cost-effective. Flat-rate pricing gives you a stable monthly cost for unlimited data processed by queries, rather than a variable on-demand rate based on bytes processed. If your queries exceed the flat-rate capacity, BigQuery runs proportionally more slowly until slots become available. Buying too few slots can hurt performance, while buying too many leaves idle processing capacity that you still pay for. You can also take a hybrid approach, mixing on-demand and flat-rate pricing within your GCP Organization, to maximize your overall savings.
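
One way to estimate which side of the threshold you’re on is to total billed bytes from BigQuery’s job metadata. A sketch, assuming the 2021 list prices ($5 per TB on-demand; $10,000 per month for 500 flat-rate slots) and the `region-us` INFORMATION_SCHEMA views:

```sql
-- Compare last month's effective on-demand spend to a flat-rate
-- commitment. Prices here are assumptions; check the current price
-- list for your region.
SELECT
  SUM(total_bytes_billed) / POW(2, 40) * 5 AS on_demand_cost_usd,
  10000 AS flat_rate_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY);
```

At those assumed prices the break-even point is 2,000 TB processed per month ($10,000 ÷ $5/TB); below that volume, on-demand remains the cheaper option.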
