Automate SQL to BigQuery Data Transfer with Python



Moving data from SQL Server to BigQuery manually is a waste of time.

Instead of exporting and importing data, this Python script automates the process. With just a few modifications, it retrieves data from SQL Server and pushes it to BigQuery automatically.

This script is designed to run in a Colab Enterprise notebook in BigQuery Studio, so the whole pipeline executes directly inside Google Cloud.

Step 1: Set Up the Colab Enterprise Notebook in BigQuery Studio

To run this script, open BigQuery Studio and create a Colab Enterprise Notebook. This notebook will allow you to execute Python code directly within Google Cloud.

Once the notebook is open, install the required dependencies by running the following in a code cell (the leading ! tells the notebook to run it as a shell command):

!pip install pyodbc pandas google-cloud-bigquery pandas-gbq


These libraries are required for:

  • ✔ Connecting to SQL Server (pyodbc) — note that pyodbc also needs the Microsoft ODBC Driver for SQL Server installed on the runtime; see the quick check below
  • ✔ Processing data with Pandas (pandas)
  • ✔ Pushing data to BigQuery (pandas-gbq, which builds on google-cloud-bigquery)

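pyodbc is only the Python binding; the ODBC driver itself has to be present on the notebook runtime. A quick way to check which drivers pyodbc can see before running the script:

import pyodbc

# Lists the ODBC drivers installed on the runtime; look for an entry such as
# "ODBC Driver 13 for SQL Server" matching the name used in the connection string
print(pyodbc.drivers())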

Step 2: Python Script for SQL Server to BigQuery Transfer


import pyodbc
import pandas as pd
# pandas-gbq (installed above) provides DataFrame.to_gbq, so no explicit
# BigQuery client import is needed here.

# Connect to SQL Server, run the query, and return the result as a DataFrame
def get_sql_data(query):
    server = "your_sql_server"
    database = "your_database_name"
    username = "your_username"
    password = "your_password"  # Use env variables or Secret Manager in production

    # Adjust the driver name to whatever is installed on the runtime,
    # e.g. "ODBC Driver 17 for SQL Server"
    conn_str = (
        f"DRIVER={{ODBC Driver 13 for SQL Server}};"
        f"SERVER={server};DATABASE={database};UID={username};PWD={password}"
    )
    conn = pyodbc.connect(conn_str)

    # Newer pandas versions warn that they prefer SQLAlchemy connections;
    # the query still runs over the raw pyodbc connection.
    df = pd.read_sql(query, conn)
    conn.close()

    return df

# Run query, fetch data, and push to BigQuery
query = "SELECT * FROM your_table"

df = get_sql_data(query)
df.to_gbq(
    destination_table="your_dataset_name.your_table_name",
    project_id="your_gcp_project_id",
    if_exists="replace",  # overwrite the BigQuery table on every run
)


Step 3: How the Script Works

1. Connecting to SQL Server and Fetching Data

The function get_sql_data(query) connects to SQL Server using pyodbc and executes the provided SQL query. The results are returned as a Pandas DataFrame, making it easy to manipulate and upload to BigQuery.

What to modify:

  • "your_sql_server" → Replace with your actual SQL Server hostname or IP address.
  • "your_database_name" → Set this to your database name.
  • "your_username" & "your_password" → Update with your credentials.


2. Running the SQL Query

The script currently fetches all data:

SELECT * FROM your_table

To limit the data, modify the query with a WHERE clause, like:

SELECT * FROM your_table WHERE created_at >= '2024-01-01'
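
If the cutoff date comes from Python rather than being typed into the query, it is safer to pass it as a query parameter so pyodbc escapes the value. A small sketch of a variant of the Step 2 function that accepts parameters (get_sql_data_filtered is an illustrative name, not part of the original script):

import pandas as pd
import pyodbc

def get_sql_data_filtered(query, params, conn_str):
    # Same connection logic as get_sql_data in Step 2, but query parameters
    # are passed through to the driver instead of being pasted into the SQL.
    conn = pyodbc.connect(conn_str)
    df = pd.read_sql(query, conn, params=params)
    conn.close()
    return df

# '?' is the pyodbc placeholder; the date value is supplied separately
df = get_sql_data_filtered(
    "SELECT * FROM your_table WHERE created_at >= ?",
    params=["2024-01-01"],
    conn_str=conn_str,  # the connection string built in Step 2
)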


3. Writing Data to BigQuery

The DataFrame is uploaded with pandas-gbq via DataFrame.to_gbq. The destination table is given as dataset.table, the project ID is passed separately, and if_exists="replace" drops and recreates the table on every run:

df.to_gbq(
    destination_table="your_dataset_name.your_table_name",
    project_id="your_gcp_project_id",
    if_exists="replace",
)


Step 4: Running and Automating the Script in BigQuery Studio

After setting up the Colab Enterprise Notebook in BigQuery Studio, run the script inside the notebook. This ensures that your data pipeline executes directly in Google Cloud without any manual exports or imports.
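
For repeat or scheduled runs, it helps to wrap the fetch and the upload in a single function, so the whole transfer is one call at the bottom of the notebook. A minimal sketch reusing get_sql_data from Step 2 (the name run_transfer and its defaults are illustrative, not part of the original script):

def run_transfer(
    query="SELECT * FROM your_table",
    destination_table="your_dataset_name.your_table_name",
    project_id="your_gcp_project_id",
    if_exists="replace",
):
    # Fetch from SQL Server and push straight to BigQuery in one step
    df = get_sql_data(query)
    df.to_gbq(
        destination_table=destination_table,
        project_id=project_id,
        if_exists=if_exists,
    )
    print(f"Loaded {len(df)} rows into {project_id}.{destination_table}")

run_transfer()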


Final Thoughts – Automate SQL Server to BigQuery with Ease

This Python script automates the entire data transfer process between SQL Server and BigQuery. With Colab Enterprise Notebook in BigQuery Studio, you can run it seamlessly without worrying about manual exports.

Modify the connection details, query, and BigQuery table name, and your pipeline is ready to go.

If you found this helpful, please share and comment below! 🚀

