Suppose I have a tensor tfDataSet as follows:

data3d = [
[[7.042   9.118  0.      1.    1.    1.    1.    1.    0.    0.   1.   ]
 [5.781   5.488  7.47    0.    0.    0.    0.    1.    1.    0.   0.   ]
 [5.399   5.166  6.452   0.    0.    0.    0.    0.    1.    0.   0.   ]
 [5.373   4.852  6.069   0.    0.    0.    0.    1.    1.    0.   0.   ]
 [5.423   5.164  6.197   0.    0.    0.    0.    2.    1.    0.   0.   ]]
,
[[ 5.247  4.943  6.434   0.    0.    0.    0.    1.    1.    0.   0.   ]
 [ 5.485  8.103  8.264   0.    0.    0.    0.    1.    0.    0.   1.   ]
 [ 6.675  9.152  9.047   0.    0.    0.    0.    1.    0.    0.   1.   ]
 [ 6.372  8.536 11.954   0.    0.    0.    0.    0.    0.    0.   1.   ]
 [ 5.669  5.433  6.703   0.    0.    0.    0.    0.    1.    0.   0.   ]]
, 
[[5.304   4.924  6.407   0.    0.    0.    0.    0.    1.    0.   0.   ]
 [5.461   5.007  6.088   0.    0.    0.    0.    1.    1.    0.   0.   ]
 [5.265   5.057  6.41    0.    0.    0.    0.    3.    0.    0.   1.   ]
 [5.379   5.026  6.206   0.    0.    0.    0.    1.    1.    0.   0.   ]
 [5.525   5.154  6.      0.    0.    0.    0.    1.    1.    0.   0.   ]]
,
[[5.403   5.173  6.102   0.    0.    0.    0.    1.    1.    0.   0.   ]
 [5.588   5.279  6.195   0.    0.    0.    0.    1.    1.    0.   0.   ]
 [5.381   5.238  6.675   0.    0.    0.    0.    1.    0.    0.   1.   ]
 [5.298   5.287  6.668   0.    0.    0.    0.    1.    1.    0.   0.   ]
 [5.704   7.411  4.926   0.    0.    0.    0.    1.    1.    0.   0.   ]]
,

... ... ... ...
... ... ... ...
]

tfDataSet = tf.convert_to_tensor(data3d)

In each 2D array inside the tensor, the first eight columns are features, and the remaining three columns are one-hot-encoded labels.

Suppose I want to feed this tensor into a CNN. For that, I need to do two things:

  • (1) split data3d into trainData3d, validData3d, and testData3d
  • (2) split each of the above three into featureData3d and labelData3d.
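To make the two steps concrete, here is a minimal sketch of both orders. It uses NumPy with random placeholder data of the same layout instead of my real tensor; the sample count, the 70/15/15 split, and the helper names split_sets and split_columns are assumptions made only for illustration. As far as I know, the same basic slicing syntax also works on a tf tensor.

```python
import numpy as np

# Placeholder data with the same layout as data3d: (samples, 5 rows, 11 columns).
# The sample count (20) and the random values are stand-ins, not my real data.
rng = np.random.default_rng(0)
data3d = rng.random((20, 5, 11))

N_FEATURES = 8  # first eight columns are features, last three are labels


def split_sets(arr, train_pct=70, valid_pct=15):
    """Step (1): split along axis 0 into train/valid/test (percentages assumed)."""
    n = arr.shape[0]
    n_train = n * train_pct // 100
    n_valid = n * valid_pct // 100
    return (arr[:n_train],
            arr[n_train:n_train + n_valid],
            arr[n_train + n_valid:])


def split_columns(arr):
    """Step (2): split along the last axis into features and labels."""
    return arr[..., :N_FEATURES], arr[..., N_FEATURES:]


# Order A: step (1) first, then step (2) on each subset.
train, valid, test = split_sets(data3d)
train_x, train_y = split_columns(train)

# Order B: step (2) first, then step (1) on features and labels separately.
features, labels = split_columns(data3d)
train_x_b, valid_x_b, test_x_b = split_sets(features)
train_y_b, valid_y_b, test_y_b = split_sets(labels)
```

In this sketch both orders end up with the same arrays, so the question is only about which order does less work.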

Now, my question is: which of the two steps above should I do first, and which second, so that the whole procedure is least expensive?

Please explain why.

If I do #2 first, how can the feature data and the label data maintain their correspondence?
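My current understanding, sketched below with NumPy and placeholder data, is that the correspondence should survive as long as the feature array and the label array are always indexed with the same sample indices, for example one shared permutation when shuffling. The random data and the variable names are assumptions for illustration; with tf tensors I assume tf.gather(features, perm) would play the role of features[perm].

```python
import numpy as np

# Placeholder data with the same layout as data3d: (samples, 5 rows, 11 columns).
rng = np.random.default_rng(0)
data3d = rng.random((20, 5, 11))

# Step (2) first: separate features (first 8 columns) from labels (last 3).
features = data3d[..., :8]
labels = data3d[..., 8:]

# Shuffle both arrays with ONE shared permutation of the sample axis.
perm = rng.permutation(data3d.shape[0])
features_shuffled = features[perm]
labels_shuffled = labels[perm]

# Gluing the columns back together reproduces the shuffled original,
# so sample i of the features still belongs to sample i of the labels.
recombined = np.concatenate([features_shuffled, labels_shuffled], axis=-1)
still_aligned = np.array_equal(recombined, data3d[perm])
```

If the two arrays were shuffled or sliced with different indices, this check would fail, which is the situation I want to avoid.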

Cross-posted: Stack Overflow