I am training a regression model. My dataset only has 50 rows and 5 columns. There is one categorical column and it has only 3 categories. I trained my model in Colab, and it didn’t even take a minute. On Vertex, it takes hours. Why does Vertex take so long to train a simple model with such a small dataset? I was expecting it to be quicker than Colab.
Ultimately, it failed after 24 minutes, telling me there were too few rows; the minimum is 1,000. Couldn’t it just check that at the beginning? It took Vertex 24 minutes to realize that there were only 50 rows. If I had 500 rows, would it have taken 10 times as long to give me this error? That would be around 4 hours. That is not cool.
Your question has brought to light something I had not noticed before. Yes, Vertex does have data structure requirements.
The dataset must have at least 1,000 and no more than 100,000,000 rows. Depending on how many features your dataset has, 1,000 rows might not be enough to train a high-performing model.
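Since the row limit is only reported after the job has been running for a while, one option is to check the row count yourself before submitting anything. The snippet below is a minimal sketch of such a pre-check, not part of any Vertex SDK: the function names are made up for illustration, it assumes the data is a CSV file with a single header row, and it only tests the two row limits quoted above.

```python
import csv

# Row limits quoted above for the training dataset.
MIN_ROWS = 1_000
MAX_ROWS = 100_000_000


def count_data_rows(csv_path):
    """Count data rows in a CSV file, assuming the first row is a header."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for row in reader if row)  # ignore blank lines


def check_row_count(n_rows):
    """Return (ok, message) for a given number of data rows."""
    if n_rows < MIN_ROWS:
        return False, f"{n_rows} rows: below the minimum of {MIN_ROWS:,}"
    if n_rows > MAX_ROWS:
        return False, f"{n_rows} rows: above the maximum of {MAX_ROWS:,}"
    return True, f"{n_rows} rows: within the stated limits"
```

For example, `check_row_count(count_data_rows("train.csv"))` (with `train.csv` standing in for your own file) would return `False` and a "below the minimum" message for a 50-row dataset, in well under a second.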
With such a small dataset, I highly recommend, based on my practical experience, running your experiments directly on your local machine or in Colab. It is cheaper and faster.