In Vertex AI I am updating an image dataset, thus:
from google.cloud import aiplatform
import_schema_uri = aiplatform.schema.dataset.ioformat.image.single_label_classification
dataset_id = "my_ds_id"
ds = aiplatform.ImageDataset(dataset_id)
ds.import_data(gcs_source=DATASET_PATH, import_schema_uri=import_schema_uri)
The images are uploaded to the dataset, but their labels are ignored and they are classed as Unlabeled. What am I doing wrong? TIA!
PS: they are in a CSV, like:
gs://path/to/file/barnacles.jpg,label1
which worked fine for the dataset creation.
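For completeness, the surrounding setup is roughly the following (the project, location, dataset ID, and CSV path here are placeholders, not my real values):

from google.cloud import aiplatform

# Placeholder project/location -- substitute your own.
aiplatform.init(project="my-project", location="us-central1")

import_schema_uri = aiplatform.schema.dataset.ioformat.image.single_label_classification

# Placeholder dataset ID and CSV path.
ds = aiplatform.ImageDataset("my_ds_id")
ds.import_data(
    gcs_source="gs://path/to/file/labels.csv",
    import_schema_uri=import_schema_uri,
)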
You could check this sample code for importing data for single-label image classification:
from google.cloud import aiplatform


def import_data_image_classification_single_label_sample(
    project: str,
    dataset_id: str,
    gcs_source_uri: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
    timeout: int = 1800,
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.DatasetServiceClient(client_options=client_options)
    import_configs = [
        {
            "gcs_source": {"uris": [gcs_source_uri]},
            "import_schema_uri": "gs://google-cloud-aiplatform/schema/dataset/ioformat/image_classification_single_label_io_format_1.0.0.yaml",
        }
    ]
    name = client.dataset_path(project=project, location=location, dataset=dataset_id)
    response = client.import_data(name=name, import_configs=import_configs)
    print("Long running operation:", response.operation.name)
    import_data_response = response.result(timeout=timeout)
    print("import_data_response:", import_data_response)
Thanks, but exactly the same result.
From this TensorFlow blog post:

> In addition to image files, we’ve provided a CSV file (all_data.csv) containing the image URIs and labels. We randomly split this data into two files, train_set.csv and eval_set.csv, with 90% data for training and 10% for eval, respectively.
>
> gs://cloud-ml-data/img/flower_photos/dandelion/17388674711_6dca8a2e8b_n.jpg,dandelion
> gs://cloud-ml-data/img/flower_photos/sunflowers/9555824387_32b151e9b0_m.jpg,sunflowers
> gs://cloud-ml-data/img/flower_photos/daisy/14523675369_97c31d0b5b.jpg,daisy
> gs://cloud-ml-data/img/flower_photos/roses/512578026_f6e6f2ad26.jpg,roses
> gs://cloud-ml-data/img/flower_photos/tulips/497305666_b5d4348826_n.jpg,tulips
>
> We also need a text file containing all the labels (dict.txt), which is used to sequentially map labels to internally used IDs. In this case, daisy would become ID 0 and tulips would become 4. If the label isn’t in the file, it will be ignored from preprocessing and training.
>
> daisy
> dandelion
> roses
> sunflowers
> tulips

Therefore, you need to create the dict.txt file containing all the labels used, as shown above.
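If it helps, here is a minimal sketch for generating dict.txt from the CSV; the file names are illustrative, and it assumes the CSV has been copied locally first (e.g. with gsutil cp):

import csv

# Collect the unique labels from the uri,label CSV
# (copied locally first, e.g. gsutil cp gs://path/to/file/all_data.csv .).
labels = set()
with open("all_data.csv", newline="") as f:
    for row in csv.reader(f):
        if len(row) >= 2:  # skip blank or malformed rows
            labels.add(row[1].strip())

# dict.txt lists one label per line; sorting keeps the label-to-ID mapping stable.
with open("dict.txt", "w") as f:
    for label in sorted(labels):
        f.write(label + "\n")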
Thanks, but that is six years old and not a Vertex AI dataset.
Could you please raise a private thread in the issue tracker (referencing this question, as stated in the template) with the project ID, the job ID, and a sample of your input CSV data? (We don’t need the entire file or any PII.)