Instead of exporting and importing data by hand, this Python script automates the process: with just a few modifications, it retrieves data from SQL Server and pushes it straight to BigQuery.
The script is designed to run in a Colab Enterprise notebook on BigQuery Studio, so the whole pipeline executes inside Google Cloud.
Step 1: Set Up the Colab Enterprise Notebook in BigQuery Studio
To run this script, open BigQuery Studio and create a Colab Enterprise Notebook. This notebook will allow you to execute Python code directly within Google Cloud.
Once the notebook is open, install the required dependencies by running:
pip install pyodbc pandas google-cloud-bigquery pandas-gbq
These libraries are required for:
- ✔ Connecting to SQL Server (pyodbc)
- ✔ Processing data with Pandas (pandas)
- ✔ Pushing data to BigQuery (pandas-gbq, with google-cloud-bigquery as the underlying client library)
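Two practical notes before running the script. First, pip only installs the Python bindings for pyodbc; the Microsoft ODBC Driver for SQL Server itself must also be present on the notebook runtime, or the connection will fail. Second, a Colab Enterprise notebook runs with a Google Cloud service account, so calls to BigQuery are normally authenticated automatically and no key files are needed. If you want to confirm which credentials and project the runtime is using, a quick check like this (standard google-auth calls, nothing specific to this script) will tell you:

import google.auth

# Application Default Credentials are supplied by the Colab Enterprise runtime
credentials, project = google.auth.default()
print(f"Notebook is authenticated against project: {project}")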
Step 2: Python Script for SQL Server to BigQuery Transfer
import pyodbc
import pandas as pd
from google.cloud import bigquery  # only needed if you use the BigQuery client directly

# Set up SQL Server connection
def get_sql_data(query):
    server = "your_sql_server"
    database = "your_database_name"
    username = "your_username"
    password = "your_password"  # Use env variables or Secret Manager in production
    conn_str = (
        f"DRIVER={{ODBC Driver 13 for SQL Server}};"
        f"SERVER={server};DATABASE={database};UID={username};PWD={password}"
    )
    conn = pyodbc.connect(conn_str)
    df = pd.read_sql(query, conn)
    conn.close()
    return df

# Run query, fetch data, and push to BigQuery
query = "SELECT * FROM your_table"
df = get_sql_data(query)
df.to_gbq("your_dataset_name.your_table_name", project_id="your_gcp_project_id", if_exists="replace")
Step 3: How the Script Works
1. Connecting to SQL Server and Fetching Data
The function get_sql_data(query) connects to SQL Server using pyodbc and executes the provided SQL query. The results are returned as a Pandas DataFrame, making it easy to manipulate and upload to BigQuery.
What to modify:
- "your_sql_server" → Replace with your actual SQL Server hostname or IP address.
- "your_database_name" → Set this to your database name.
- "your_username" & "your_password" → Update with your credentials (see the sketch below for keeping the password out of the script).
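As the comment in the script suggests, hard-coding the password is only suitable for a quick test. One option is to read it from an environment variable or from Secret Manager. Below is a minimal sketch; the secret name sql-server-password and the SQL_SERVER_PASSWORD variable are just placeholders, and it assumes the google-cloud-secret-manager package is installed (it is not part of the pip install above):

import os
from google.cloud import secretmanager

def get_sql_password(project_id, secret_id="sql-server-password"):
    """Return the SQL Server password from an env variable if set,
    otherwise fetch the latest version of the secret from Secret Manager."""
    env_value = os.environ.get("SQL_SERVER_PASSWORD")
    if env_value:
        return env_value
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/latest"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")

# Then, inside get_sql_data:
# password = get_sql_password("your_gcp_project_id")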
2. Running the SQL Query
The script currently fetches all data:
SELECT * FROM your_table
To limit the data, modify the query with a WHERE clause, like:
SELECT * FROM your_table WHERE created_at >= '2024-01-01'
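If the cutoff date changes from run to run, you can also pass it as a query parameter instead of editing the SQL string each time. Here is a rough sketch using pyodbc's ? placeholders through pandas; conn_str is built exactly as in Step 2, and created_at is just the example column from above:

import pandas as pd
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 13 for SQL Server};"
    "SERVER=your_sql_server;DATABASE=your_database_name;"
    "UID=your_username;PWD=your_password"
)

def get_sql_data_params(query, params=None):
    """Like get_sql_data, but values are passed separately from the SQL text
    so the driver handles quoting safely."""
    conn = pyodbc.connect(conn_str)
    df = pd.read_sql(query, conn, params=params)
    conn.close()
    return df

df = get_sql_data_params(
    "SELECT * FROM your_table WHERE created_at >= ?",
    params=["2024-01-01"],
)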
3. Writing Data to BigQuery
df.to_gbq("your_dataset_name.your_table_name", project_id="your_gcp_project_id", if_exists="replace")
The destination is given as dataset.table, with the GCP project passed separately via project_id. if_exists="replace" overwrites the table on every run; use "append" to add rows instead, or "fail" to stop if the table already exists.
Step 4: Running and Automating the Script in BigQuery Studio
After setting up the Colab Enterprise Notebook in BigQuery Studio, run the script inside the notebook. This ensures that your data pipeline executes directly in Google Cloud without any manual exports or imports.
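If you plan to schedule the notebook rather than run it by hand, it can help to wrap the steps in a single function so a scheduled execution runs everything top to bottom. A minimal sketch, reusing get_sql_data and the same placeholder names from Step 2:

def run_pipeline():
    """One full run: pull from SQL Server, push to BigQuery."""
    query = "SELECT * FROM your_table"
    df = get_sql_data(query)
    print(f"Fetched {len(df)} rows from SQL Server")
    df.to_gbq(
        "your_dataset_name.your_table_name",
        project_id="your_gcp_project_id",
        if_exists="replace",
    )
    print("Load to BigQuery complete")

run_pipeline()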
Final Thoughts – Automate SQL Server to BigQuery with Ease
This Python script automates the entire data transfer process between SQL Server and BigQuery. With Colab Enterprise Notebook in BigQuery Studio, you can run it seamlessly without worrying about manual exports.
Modify the connection details, query, and BigQuery table name, and your pipeline is ready to go.
If you found this helpful, please share and comment below! 🚀