BigQuery to Cloud Storage extract using DataFlow

BigQuery provides an excellent CLI — “bq extract” and an API as well as Client Library in many languages for achieving the same. In exporting table data documentation it says — You can use a service such as Dataflow to read data from BigQuery instead of manually exporting it. This post looks at the Dataflow way to extract data out of BigQuery. This is useful in situations where “bq extract” doesn’t meet requirements and you really need a programmatic way to extract and manipulate into files.

As an example, I have used the Chicago Taxi Trips public dataset and the following query:

SELECT trip_start_timestamp, trip_end_timestamp, trip_miles, fare
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
LIMIT 100

We now read the query results with:

bq_data now contains key-value pairs:

We can apply a beam.Map function to yield only values:

Excerpt of bq_values:

And finally, map again to have all column values separated by commas instead of a list (take into account that you would need to escape double quotes if they can appear within a field):

Now we write the results to GCS with the suffix and headers:

Written results:

--

--

Google Cloud enthusiast

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store