How to train Vertex AI Generative AI Studio on a custom dataset?

How can I supply my own dataset to generative AI?

@xylobrite I am searching for the same answer. What if we feed in an input file from Cloud Storage and read it into the model's context using Python?

I think Gen App Builder could be a solution, but it is difficult for me to get access.
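
One way to try the Cloud Storage idea above is to download the file's text and place it in the prompt ahead of the question. The sketch below is only an illustration under assumptions: the bucket and object names, the `build_prompt` helper, and the character cap are all hypothetical, and it relies on the `google-cloud-storage` client library, which nobody in this thread has confirmed using. Sending the prompt to a model is left out.

```python
def read_gcs_text(bucket_name, blob_name):
    """Download a text object from Cloud Storage (needs google-cloud-storage and credentials)."""
    from google.cloud import storage  # imported lazily so build_prompt works without it

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return blob.download_as_text()


def build_prompt(context, question, max_context_chars=8000):
    """Put (possibly truncated) document text in front of the question."""
    # Assumption: a crude character cap stands in for the model's real token limit.
    context = context[:max_context_chars]
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical usage; "my-bucket" and "docs/report.txt" are placeholders:
# prompt = build_prompt(read_gcs_text("my-bucket", "docs/report.txt"), "What does the report conclude?")
```

This only covers files small enough to fit in a single prompt; larger documents would need to be split or searched first.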

@xylobrite @raja9682

Tune language foundation models | Vertex AI | Google Cloud

This generally available tuning for foundation models applies to the prompt itself, right?

How can I send my own custom data, e.g. content read from docs, PDFs, or SQL, as context?

I had no idea about Gen App Builder, thanks. I guess it is only for allowlisted customers and still in the development phase?

I'll give an example with the Alpaca dataset from Hugging Face:

import json

from datasets import load_dataset

train_dataset = load_dataset("tatsu-lab/alpaca", split="train")

# Convert the Hugging Face Dataset into a pandas DataFrame (requires pandas to be installed)
df = train_dataset.to_pandas()

# Build the prompt from the instruction and its optional input.
# The dataset's `text` column already embeds the response, so it is not used here.
df["input_text"] = df["instruction"].astype(str) + ": " + df["input"].astype(str)
df["output_text"] = df["output"].astype(str)
df = df[["input_text", "output_text"]]

# Write one JSON object per line (JSONL)
data_list = df.to_dict(orient="records")
with open("output_alpaca_k.jsonl", "w", encoding="utf-8") as file:
    for example in data_list:
        file.write(json.dumps(example) + "\n")
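
Before uploading a file like this for tuning, it may be worth checking that every line parses as JSON and carries both fields. The following is a minimal sketch of such a check; the `check_jsonl` helper is hypothetical and not part of any Vertex AI tooling.

```python
import json

REQUIRED_KEYS = {"input_text", "output_text"}


def check_jsonl(path):
    """Return the number of records; raise ValueError on the first malformed line."""
    count = 0
    with open(path, encoding="utf-8") as file:
        for line_number, line in enumerate(file, start=1):
            record = json.loads(line)  # raises json.JSONDecodeError (a ValueError) on bad JSON
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                raise ValueError(f"line {line_number}: missing {sorted(missing)}")
            count += 1
    return count


# Example: check_jsonl("output_alpaca_k.jsonl") returns the number of examples written.
```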

The resulting output_alpaca_k.jsonl file looks like this, with one JSON example per line: