How do I input my own dataset for generative AI?
@xylobrite I am searching for the same answer. What if we feed in an input file from cloud storage and read it into the model's context using Python?
I think Gen App Builder could be a solution, but it's difficult for me to get access.
The generally available tuning for the foundation models is only for the prompt itself, right?
How do I send my own custom data (e.g. read from docs, PDFs, or SQL) as context?
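One common pattern for this is to fetch the relevant rows or text yourself and paste them into the prompt ahead of the question. Below is a minimal sketch, assuming the data lives in a SQLite table. The `docs` table, its columns, the sample rows, and the `build_prompt` helper are all hypothetical names made up for illustration; the resulting string would then be passed to whatever model client you use.

```python
import sqlite3


def build_prompt(conn, question, limit=3):
    """Build a prompt that places rows from a (hypothetical) docs table before the question."""
    rows = conn.execute(
        "SELECT title, body FROM docs ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    context = "\n".join(f"- {title}: {body}" for title, body in rows)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Made-up sample data, only to show the shape of the prompt.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Returns", "Items can be returned within 30 days."),
        ("Shipping", "Orders ship in 2 business days."),
    ],
)

prompt = build_prompt(conn, "How long do I have to return an item?")
print(prompt)
```

For PDFs or other documents the idea is the same: extract the text first, then keep only as much of it as fits in the model's context window.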
I had no idea about Gen App Builder, thanks. I guess it is only for allowlisted customers and still in development?
I’ll give an example with the Alpaca dataset from Hugging Face:
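import json

from datasets import load_dataset

# Load the training split of the Alpaca dataset from the Hugging Face Hub
train_dataset = load_dataset("tatsu-lab/alpaca", split="train")

# Convert the Hugging Face Dataset to a pandas DataFrame
df = train_dataset.to_pandas()

# Build the two columns to export.
# Note: in tatsu-lab/alpaca the "text" column holds the full formatted prompt,
# response included; use df.input instead if the response should not appear in input_text.
df["input_text"] = df.text.astype(str) + ': ' + df.instruction.astype(str)
df["output_text"] = df.output.astype(str)
df = df[["input_text", "output_text"]]

# Write one JSON object per line (JSON Lines)
data_list = df.to_dict(orient='records')
with open('output_alpaca_k.jsonl', 'w') as file:
    for example in data_list:
        file.write(json.dumps(example) + '\n')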
The result is a JSON Lines file with one example per line, each a JSON object with the keys input_text and output_text.
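A quick way to check a file like this is to read it back line by line and confirm that every record has both keys. This is a minimal sketch: the `check_jsonl` helper and the two sample records are made up for illustration, and in practice you would point it at output_alpaca_k.jsonl instead of the temporary file.

```python
import json
import os
import tempfile


def check_jsonl(path, required_keys=("input_text", "output_text")):
    """Parse each line as JSON and return the records, failing on a missing key."""
    records = []
    with open(path, encoding="utf-8") as file:
        for line_number, line in enumerate(file, start=1):
            record = json.loads(line)
            missing = [key for key in required_keys if key not in record]
            if missing:
                raise ValueError(f"line {line_number} is missing {missing}")
            records.append(record)
    return records


# Made-up records, written to a temporary file only to exercise the helper.
samples = [
    {"input_text": "Translate to French: cat", "output_text": "chat"},
    {"input_text": "Add the numbers: 2 and 3", "output_text": "5"},
]
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
with open(path, "w", encoding="utf-8") as file:
    for example in samples:
        file.write(json.dumps(example) + "\n")

records = check_jsonl(path)
print(len(records), records[0]["input_text"])
```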
