This directory contains a Python script that can be used to import data into Cloud Spanner. The script reads a gzipped CSV file from a Google Cloud Storage bucket and a local schema file, and then inserts the data into a specified Spanner table in batches.
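For readers unfamiliar with the client libraries, the sketch below illustrates the general pattern the script follows: decompress a gzipped CSV pulled from Cloud Storage and write its rows to Spanner in fixed-size batches. It is only an illustration, not the script's exact implementation, and the bucket, object, table, column names, and batch size are hypothetical placeholders.

```python
# Illustrative sketch of the load pattern (not the script's actual code).
# All bucket, file, table, and column names below are placeholders.
import csv
import gzip
import io

from google.cloud import spanner, storage

# Download and decompress the gzipped CSV from Cloud Storage.
blob = storage.Client().bucket("my-bucket").blob("data.csv.gz")
text = gzip.decompress(blob.download_as_bytes()).decode("utf-8")
rows = csv.reader(io.StringIO(text))

# Connect to the target Spanner database.
database = spanner.Client().instance("my-instance").database("my-database")

# Insert rows in fixed-size batches.
batch_size = 500
buffer = []
for row in rows:
    buffer.append(row)
    if len(buffer) >= batch_size:
        with database.batch() as batch:
            batch.insert(table="my_table", columns=("one", "two"), values=buffer)
        buffer = []
if buffer:  # flush any remaining rows
    with database.batch() as batch:
        batch.insert(table="my_table", columns=("one", "two"), values=buffer)
```

Unlike this sketch, the actual script takes the column list from the schema file described below rather than hard-coding it.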
Follow the steps in the Spanner Quickstart to create your Spanner instance, database, and table.
Use the sample.schema file to define the schema of the table you are going to load. Use a colon ( : ) to separate each column name from its data type and a comma ( , ) to separate the columns of your table.
For example, for a table with two STRING columns named one and two, the corresponding schema would be:
one:STRING,two:STRING
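The script reads this file to determine the columns for the insert. As a rough illustration (not the script's actual code), a schema line in this format can be split into column names and types like so:

```python
# Illustrative parsing of a schema line such as "one:STRING,two:STRING".
schema_line = "one:STRING,two:STRING"
columns = [col.split(":") for col in schema_line.split(",")]
column_names = [name for name, _ctype in columns]   # ['one', 'two']
column_types = [ctype for _name, ctype in columns]  # ['STRING', 'STRING']
```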
Note: The following step is not required if you have already configured the appropriate account and project using the gcloud SDK, or if you are running the tool from a GCE instance in the target project with a service account that has the appropriate permissions for the targeted Spanner instance. In these cases, the tool picks up the configuration from the environment automatically.
Optionally, create a service account to be used by the Spanner client library for authentication against your Spanner instance.
Follow the steps described in Creating a Service Account to create a service account for this purpose. Once you have created it, follow the steps described in Creating a Service Account Key to create a key for that service account, and finally follow these steps to grant permissions to the service account.
Make sure to use a role with read and write access to Spanner, such as Cloud Spanner Database User. You can find more information about Cloud Spanner roles here.
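The key file you create here is what the optional --path_to_credentials argument (described below) points at. For reference, this is roughly how the Spanner client library can be pointed at such a key file explicitly; the path is a placeholder:

```python
# Illustrative only: build a Spanner client from an explicit service account key.
# The path below is a placeholder for wherever you saved the key created above.
from google.cloud import spanner

client = spanner.Client.from_service_account_json("/path/to/service-account-key.json")

# Alternatively, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the
# key path and construct the client with spanner.Client(), relying on default credentials.
```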
Note: This tool requires Python 3.
Install the requirements for the Python script by executing the following command:
pip3 install -r requirements.txt
Execute the spanner-loader Python script with the required arguments:
python spanner-loader.py --instance_id=[Your Cloud Spanner instance ID] --database_id=[Your Cloud Spanner database ID] --table_id=[Your table name] --batchsize=[The number of rows to insert in a batch] --bucket_name=[The name of the bucket for the source file] --file_name=[The csv input data file] --schema_file=[The format file describing the input data file]
Optional parameters:
--delimiter=[The delimiter used between columns in the source file]
--project_id=[Your Google Cloud Project id]
--path_to_credentials=[Path to the json file with the credentials]
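For example, a complete invocation could look like the following, where every value is a placeholder for your own instance, database, table, bucket, and files:
python spanner-loader.py --instance_id=my-instance --database_id=my-database --table_id=my_table --batchsize=500 --bucket_name=my-bucket --file_name=data.csv.gz --schema_file=sample.schema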