Hi @domokapsky ,
Yes, this is a limitation due to Dataform’s restricted execution environment, which does not support Node.js modules that rely on certain system functionalities like child_process.
However, you can work around this limitation by using Cloud Functions or Cloud Run to handle the file-moving operation. Here’s how you can integrate this into your Dataform workflow:
Approach Using Google Cloud Functions
1. Create a Cloud Function:
Develop a Cloud Function that moves files in GCS.
// index.js
const {Storage} = require('@google-cloud/storage');

exports.moveFiles = async (req, res) => {
  const storage = new Storage();
  const sourceBucketName = req.body.sourceBucket;
  const targetFolder = req.body.targetFolder;
  const filePaths = req.body.filePaths; // Array of file paths to move

  try {
    for (const filePath of filePaths) {
      const fileName = filePath.split('/').pop();
      await storage.bucket(sourceBucketName).file(filePath).move(`${targetFolder}/${fileName}`);
    }
    res.status(200).send('Files moved successfully.');
  } catch (error) {
    console.error('Error moving files:', error);
    res.status(500).send('Error moving files.');
  }
};
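One thing to note about the snippet above: because only the final path segment is kept, files from nested source folders all land directly under the target folder. A small, hypothetical helper (not part of the function, just an illustration) makes the mapping explicit:

```javascript
// Hypothetical helper mirroring the destination logic in the function above.
// Only the final path segment is kept, so nested source paths are flattened.
function destinationFor(filePath, targetFolder) {
  const fileName = filePath.split('/').pop();
  return `${targetFolder}/${fileName}`;
}

console.log(destinationFor('incoming/2024/data.csv', 'processed'));
// → processed/data.csv
```

Because of this flattening, two source files with the same name in different folders would collide at the destination; include part of the source path in the destination name if that can happen in your bucket.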
2. Deploy the Cloud Function:
Deploy the function using the gcloud CLI. Note that --allow-unauthenticated makes the endpoint publicly callable; for production, drop that flag and call the function with an ID token instead:
gcloud functions deploy moveFiles \
  --runtime nodejs18 \
  --trigger-http \
  --allow-unauthenticated
3. Call the Cloud Function from Dataform:
Modify your Dataform JavaScript action to call the Cloud Function via HTTP request instead of using the GCS Node.js client library directly.
const axios = require('axios');

async function main(params) {
  const url = 'https://<YOUR_CLOUD_FUNCTION_URL>'; // Replace with your Cloud Function URL
  const payload = {
    sourceBucket: params.sourceBucket,
    targetFolder: params.targetFolder,
    filePaths: params.rows.map(row => row.file_path)
  };

  try {
    const response = await axios.post(url, payload);
    console.log('Cloud Function response:', response.data);
  } catch (error) {
    console.error('Error calling Cloud Function:', error);
    throw error;
  }
}

module.exports = {main};
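To sanity-check the mapping from query results to the request body, the payload construction can be factored into a pure function. The names here (rows, file_path) are assumed from the snippet above and your SQLX output:

```javascript
// Builds the Cloud Function request body from Dataform action params.
// Assumes each row from the SQLX result exposes a file_path column.
function buildPayload(params) {
  return {
    sourceBucket: params.sourceBucket,
    targetFolder: params.targetFolder,
    filePaths: params.rows.map(row => row.file_path)
  };
}
```

Keeping this logic separate makes it easy to verify that the file list sent to the function matches what your query returned.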
4. Update Dataform Configuration:
Ensure your dataform.json or configuration file reflects these changes. The dependencies entry guarantees the SQLX file runs before the file move (note that standard JSON does not allow inline comments, so keep the file comment-free):
{
  "actions": [
    {
      "name": "move_csv_files",
      "type": "js",
      "file": "move_files.js",
      "dependencies": ["your_sqlx_file_name"],
      "params": {
        "sourceBucket": "${constants.CLIENT_BUCKET_NAME}",
        "targetFolder": "processed"
      }
    }
  ]
}
Approach Using Cloud Run
If you prefer using Cloud Run for more control over the environment:
1. Create a Cloud Run Service:
- Write the file-moving logic in a Node.js application.
- Use @google-cloud/storage to interact with GCS.
- Deploy this application to Cloud Run.
2. Call Cloud Run from Dataform:
- Replace the Cloud Function URL with your Cloud Run service URL.
- Use the same axios call method in the Dataform JavaScript action.
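A minimal sketch of the Cloud Run service's core logic, with the Storage client injected so it can be unit-tested without touching GCS. All names here are illustrative assumptions, not a fixed API:

```javascript
// Hypothetical request-handler core for the Cloud Run service.
// The storage client is passed in, so tests can substitute a fake.
async function handleMoveRequest(storage, body) {
  const { sourceBucket, targetFolder, filePaths } = body;
  const moved = [];
  for (const filePath of filePaths) {
    const fileName = filePath.split('/').pop();
    const destination = `${targetFolder}/${fileName}`;
    // With @google-cloud/storage, move() renames the object within the bucket.
    await storage.bucket(sourceBucket).file(filePath).move(destination);
    moved.push(destination);
  }
  return { moved };
}
```

In the real service you would wrap this in an Express (or plain Node http) route handler, pass `new Storage()` as the client, and listen on the port given by the PORT environment variable, as Cloud Run requires.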
By using Cloud Functions or Cloud Run, you can handle more complex file operations while leveraging Dataform for the core data management and orchestration. This method bypasses the limitations of Dataform’s execution environment and provides a flexible solution for your use case.