Hello,
I’m doing a practice project: a Python script generates dummy employee data, writes it to a CSV file, and uploads that file to a Cloud Storage bucket. I then used the Data Fusion Wrangler to transform the data. When I opened the CSV file in Data Fusion, a few fields were empty and the last two columns were completely blank, even though the CSV on my desktop has data in every column. Can anyone help me get past this? I’ve done a lot of troubleshooting with ChatGPT, but as a beginner I’m stuck here. If anyone is interested, I can share my screen and we can troubleshoot together. I want to overcome this challenge.
Here is my code (from VS Code):
```python
import csv
import random
import string
import os
from faker import Faker
from google.cloud import storage

# Set Google Cloud project environment variable
os.environ['GOOGLE_CLOUD_PROJECT'] = 'marine-champion-432318-n3'

# Initialize Faker
fake = Faker()

# Generate dummy data
def generate_employee_data():
    data = {
        "first_name": fake.first_name(),
        "last_name": fake.last_name(),
        "email": fake.email(),
        "address": fake.address(),
        "phone_number": fake.phone_number(),
        "ssn": fake.ssn(),
        "date_of_birth": fake.date_of_birth(minimum_age=18, maximum_age=65).isoformat(),
        "password": fake.password(length=12, special_chars=True, digits=True, upper_case=True, lower_case=True),
    }
    print(data)  # Print generated data for debugging
    return data

# Save data to CSV
def save_to_csv(file_path, data_list):
    with open(file_path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=data_list[0].keys())
        writer.writeheader()
        writer.writerows(data_list)
    print(f"Data saved to {file_path}")

# Upload file to GCS
def upload_to_gcs(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # Initialize a client
    storage_client = storage.Client(project='marine-champion-432318-n3')
    # Get the bucket
    bucket = storage_client.bucket(bucket_name)
    # Create a blob object
    blob = bucket.blob(destination_blob_name)
    # Upload the file
    blob.upload_from_filename(source_file_name)
    print(f"File {source_file_name} uploaded to {destination_blob_name}.")

if __name__ == "__main__":
    # Generate a list of employee data
    employees = [generate_employee_data() for _ in range(10)]  # Adjust the number of records as needed
    # Define file paths
    csv_file_path = "employee_data.csv"
    # Save data to CSV
    save_to_csv(csv_file_path, employees)
    # Define GCS parameters
    bucket_name = "employee-project"  # Replace with your bucket name
    source_file_name = "employee_data.csv"
    destination_blob_name = "employee_data.csv"  # Blob name in GCS
    # Upload the file to GCS
    upload_to_gcs(bucket_name, source_file_name, destination_blob_name)
```
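One thing I suspect (not confirmed without the screenshot) is `fake.address()`: Faker addresses contain embedded newlines. Python’s `csv` module quotes them correctly, but a downstream parser that splits rows on raw newlines would break each record across several physical lines, which would make the columns after `address` show up empty. A minimal sketch (the sample values below are made up for illustration) that flattens the newlines and verifies the row round-trips as a single record:

```python
import csv
import io

# Hypothetical sample row mirroring the fields in the script above;
# the multi-line address is the suspected culprit.
row = {
    "first_name": "Jane",
    "last_name": "Doe",
    "email": "jane.doe@example.com",
    "address": "123 Main St\nApt 4\nSpringfield, IL 62704",  # Faker-style multi-line address
    "phone_number": "555-0100",
    "ssn": "000-00-0000",
    "date_of_birth": "1990-01-01",
    "password": "s3cr3tPass!23",
}

# Flatten embedded newlines so every record stays on one physical line.
row["address"] = row["address"].replace("\n", ", ")

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=row.keys())
writer.writeheader()
writer.writerow(row)

# Verify the CSV round-trips as exactly one data row with every column filled.
parsed = list(csv.DictReader(io.StringIO(buf.getvalue())))
assert len(parsed) == 1
assert all(parsed[0][k] for k in row)
print(parsed[0]["address"])  # 123 Main St, Apt 4, Springfield, IL 62704
```

In the script itself the change would just be `"address": fake.address().replace("\n", ", ")` in `generate_employee_data()`. If the columns are still blank after that, the next thing I would check is the Wrangler parse settings (delimiter and the “enable quoted values” option).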
I have attached a screenshot of how the file looks in Wrangler.

