I have the following functions:
import numpy as np

def pad(array, target_size=(224, 224), pad_value=255):
    # calculate padding
    pad_h = (target_size[0] - array.shape[0]) // 2
    pad_w = (target_size[1] - array.shape[1]) // 2
    # pad the image
    padded_array = np.pad(array, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)), mode='constant', constant_values=(pad_value, pad_value))
    # convert to bytes
    processed_array_bytes = padded_array.tobytes()
    return processed_array_bytes

def top_level(input_data):
    array = first_steps(input_data)
    final_array = pad(array)
    return final_array
When I pass the input directly to the top-level function, it works. However, it does not work as a UDF:
from pyspark.sql.functions import udf
from pyspark.sql.types import BinaryType

test_udf = udf(top_level, BinaryType())
result_df = df.withColumn('arrays', test_udf(df['column_to_use']))
Somehow, as a UDF this raises:
ValueError: index can't contain negative values
I've pinpointed the padding step as the place where the error comes up, but I have no idea what's going wrong or how to resolve it. Any help would be appreciated. Thanks in advance.
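For what it's worth, np.pad raises this exact message when a pad width is negative, and the arithmetic in pad would produce a negative width for any array larger than target_size. Whether that is what actually happens for the rows in df is an assumption and has not been confirmed here. Below is a minimal sketch of that failure mode outside Spark. The shapes are hypothetical, and pad_widths is a hypothetical helper that is not part of the original code:

```python
import numpy as np


def pad_widths(shape, target_size=(224, 224)):
    """Return ((top, bottom), (left, right)) padding for an H x W x C array."""
    diff_h = target_size[0] - shape[0]
    diff_w = target_size[1] - shape[1]
    if diff_h < 0 or diff_w < 0:
        # np.pad rejects negative widths, so fail early and report the shape
        raise ValueError(f"array shape {shape} exceeds target size {target_size}")
    # Split the difference so an odd remainder goes to the bottom/right edge
    return (diff_h // 2, diff_h - diff_h // 2), (diff_w // 2, diff_w - diff_w // 2)


# Hypothetical shapes, chosen only for illustration
small = np.zeros((100, 101, 3), dtype=np.uint8)
large = np.zeros((300, 100, 3), dtype=np.uint8)

# An array that fits is padded to exactly the target size
(top, bottom), (left, right) = pad_widths(small.shape)
padded = np.pad(small, ((top, bottom), (left, right), (0, 0)),
                mode="constant", constant_values=255)
print(padded.shape)

# The original arithmetic yields a negative width for the larger array
pad_h = (224 - large.shape[0]) // 2
try:
    np.pad(large, ((pad_h, pad_h), (62, 62), (0, 0)),
           mode="constant", constant_values=255)
except ValueError as exc:
    message = str(exc)
    print(message)
```

If this is the cause, the direct call may have worked only because the test input happened to be smaller than 224 x 224, while some row in column_to_use is larger. Raising with the offending shape, as the helper does, would make such a row easier to spot in the Spark error output.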