I have a function like the one below. It loads fine when I run it directly, but not when I import it from a separate helper.py file.

I’m not sure what’s causing the slow loading.


import pandas as pd
from transformers import BertTokenizer

def embed_answers(ans, length): 
    sentence_embeddings = []
    embeddings = BertTokenizer.from_pretrained('...')
    sentence_embeddings.extend(embeddings.encode(ans, max_length=length, padding='max_length'))
    return sentence_embeddings

def get_dataset(vec_type): 
    vec_dict = {"large": 1000, "medium": 500, "small": 150} 
    if vec_type.lower() not in vec_dict: 
        raise Exception("Invalid vector type!")
    df = pd.read_hdf('...', mode='r')
    vec_length = vec_dict[vec_type.lower()]  # same lowercased key as the check above
    df['embeddings_col'] = df['answer'].apply(embed_answers, length=vec_length) 
    return df 

When I import get_dataset from helper.py into main.py and call it, the program crashes before it finishes loading. But if I define the function directly in main.py and run it there, it works fine.

I'm not sure what the issue is. I'd appreciate any ideas, thanks!