Handling the label feature in TFX across different environments

I’m new to MLOps and trying to figure out how to handle the label feature in my data. I have read that, to keep the data consistent, the same schema should be used for both the training and validation sets.
My question is: how can I mark the label as optional for the validation set? If I receive data from users that has no label feature, how can example_validator compare the schema of the new data (without the label) against the original schema that contains the label?
I know this can be done with tfdv.get_feature(schema, 'labels').not_in_environment.append('SERVING'), but as far as I know that is not a solution for a production pipeline.
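To make the intent of that setting concrete, the idea can be modeled without TFDV at all. The sketch below is a toy, not TFDV's implementation: the `Feature` class, the `missing_features` helper, and the `age` and `income` columns are hypothetical names invented for illustration. It only shows the rule that a feature listed as not in an environment is not expected when validating data for that environment. In TFDV itself, as far as I understand, the environment is chosen through the `environment` argument of `tfdv.validate_statistics`.

```python
# Toy model of schema environments; this is NOT TFDV's implementation.
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str
    # Environments in which this feature is NOT expected to appear.
    not_in_environment: set = field(default_factory=set)


def missing_features(schema, columns, environment):
    """Return names of features expected in `environment` but absent from `columns`."""
    return [
        f.name
        for f in schema
        if environment not in f.not_in_environment and f.name not in columns
    ]


# Hypothetical schema: the label is excluded from the SERVING environment.
schema = [
    Feature("age"),
    Feature("income"),
    Feature("labels", not_in_environment={"SERVING"}),
]

serving_columns = {"age", "income"}  # user data arrives without the label

print(missing_features(schema, serving_columns, "TRAINING"))  # ['labels']
print(missing_features(schema, serving_columns, "SERVING"))   # []
```

The same unlabeled data is flagged when checked against the training environment and passes when checked against the serving environment, which is the behavior I want from the validator.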
Another idea: use the preprocessing function in Transform to drop the label feature from the validation set. However, I don’t really understand how context.pipeline.stage works. Example:

if tft.TFTRuntimeContext().context.pipeline.stage == 'train':
    # transform the label for the training set
    pass
else:
    # for the validation set, drop the label column
    pass

Thanks!

I tried to handle this in the preprocessing function, but I need to compare the schema of the original data with the schema of the user data sent for prediction (without the label) BEFORE the transformation runs. As a result, example_validator reports an error.
The full pipeline is: ExampleGen -> StatisticsGen -> SchemaGen -> ExampleValidator -> Transform -> (then the model stages).
