Vertex takes so much time

I am training a regression model. My dataset has only 50 rows and 5 columns. There is one categorical column, and it has only 3 categories. I trained my model in Colab, and it took less than a minute. On Vertex, it takes hours. Why does Vertex take so long to train a simple model on such a small dataset? I was expecting it to be quicker than Colab.

Ultimately, it failed after 24 minutes, telling me there were too few rows: the minimum is 1,000. Couldn't it just check that at the beginning? It took Vertex 24 minutes to realize that there were only 50 rows. If I had 500 rows, would it have taken ten times as long to give me this error, roughly 4 hours? That is not cool.

Hi @Mouzma!

Your question has brought to light something I had not noticed before. Yes, Vertex AI has requirements on how the training data is structured:

  • The dataset must have at least 1,000 and no more than 100,000,000 rows. Depending on how many features your dataset has, 1,000 rows might not be enough to train a high-performing model.
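Since the row limit is known up front, one way to avoid waiting for a remote failure is to check it yourself before launching a training job. The sketch below assumes the data is a CSV file with a header row; `count_data_rows` and `check_row_count` are hypothetical helpers, not part of the Vertex AI SDK, and they only check the row-count limit quoted above, not any of Vertex's other validations.

```python
import csv
import io

# Row limits quoted above for Vertex AI tabular training data (assumption: inclusive bounds).
MIN_ROWS = 1_000
MAX_ROWS = 100_000_000


def count_data_rows(csv_text: str) -> int:
    """Count data rows in CSV text, excluding the header row (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, if there is one
    return sum(1 for row in reader if row)  # blank lines parse as [] and are ignored


def check_row_count(n_rows: int) -> None:
    """Raise ValueError if n_rows is outside the quoted range (hypothetical helper)."""
    if n_rows < MIN_ROWS:
        raise ValueError(f"{n_rows} rows: at least {MIN_ROWS:,} required")
    if n_rows > MAX_ROWS:
        raise ValueError(f"{n_rows} rows: at most {MAX_ROWS:,} allowed")


# A 50-row dataset like the one in the question is rejected immediately.
header = "f1,f2,f3,f4,category\n"
rows = "".join(f"{i},{i},{i},{i},x\n" for i in range(50))
try:
    check_row_count(count_data_rows(header + rows))
except ValueError as err:
    print(err)  # prints "50 rows: at least 1,000 required"
```

For a file on disk, the same check can be run on `open(path).read()` before uploading the dataset.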

Since your dataset is so small, based on my practical experience I highly recommend running your experiments directly on your local machine or in Colab. It is cheaper and faster.

Best wishes,

I am exploring Vertex for now, so I'm just running a few experiments.

Hello @Ivy_wang, I am facing the same issue with a dataset of 10,000 rows.

The first time I ran the training pipeline, the train_model step ran for an hour without completing, so I had to stop it and relaunch the pipeline.

I relaunched it by cloning the training pipeline with the same parameters, and this time the train_model step took only 2 minutes.

Can you explain what happened?

Also, I would like to know whether the pipeline and its components run inside the VM that hosts the notebook.

Thanks in advance.

Best Regards,
