Strange loss oscillations encountered while fine-tuning Qwen1.5-14B that I cannot resolve


The following are the training loss plot, validation loss plot, and training parameters:

[training loss plot]
[validation loss plot]

Parameters:

```shell
model_path=/opt/workspace-cyx/model_test/Qwen1.5-14B
train_dataset_dir=alpaca_gpt4_data_en,alpaca_gpt4_data_zh,oaast_sft_zh,oaast_sft
per_device_train_batch_size=4
gradient_accumulation_steps=2
output_dir=/opt/workspace-cyx/model_test/output_dir

# note: --save_steps is passed twice below; the later value (100) takes effect
accelerate launch --config_file accelerate_config.yaml src/train_bash.py \
    --max_samples 1000000 \
    --stage sft \
    --do_train \
    --model_name_or_path ${model_path} \
    --dataset ${train_dataset_dir} \
    --template qwen \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir ${output_dir} \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --lr_scheduler_type cosine \
    --logging_steps 5 \
    --save_steps 2000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16 \
    --do_eval \
    --save_steps 100 \
    --eval_steps 100 \
    --val_size 0.01 \
    --evaluation_strategy steps
```
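With `--logging_steps 5` and a small effective batch, the raw loss curve is inherently noisy, so smoothing it can help separate real divergence from step-to-step noise. A minimal sketch (in practice you would load the losses from the `trainer_state.json` that the Hugging Face Trainer writes into `output_dir`; synthetic data is used here for illustration):

```python
import random

def ema(values, alpha=0.1):
    """Exponential moving average; alpha closer to 0 means heavier smoothing."""
    smoothed, acc = [], None
    for v in values:
        acc = v if acc is None else alpha * v + (1 - alpha) * acc
        smoothed.append(acc)
    return smoothed

# In a real run, load the logged losses, e.g.:
#   state = json.load(open(f"{output_dir}/trainer_state.json"))
#   losses = [e["loss"] for e in state["log_history"] if "loss" in e]
# Synthetic noisy-but-decreasing losses stand in for the logged curve here:
random.seed(0)
losses = [2.0 * 0.99 ** i + random.gauss(0, 0.2) for i in range(300)]

smoothed = ema(losses, alpha=0.05)
# If the smoothed curve still trends down, the oscillation is likely just
# gradient noise rather than a real training problem.
print(smoothed[0], smoothed[-1])
```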

Both Qwen and Qwen1.5 were fine-tuned at the 7B and 14B sizes, using the four datasets alpaca_gpt4_data_en, alpaca_gpt4_data_zh, oaast_sft_zh, and oaast_sft that ship with LLaMA-Factory.
Training process:

  1. I initially suspected that the model itself was the problem, but after trying several Qwen models, oscillations of varying degrees appeared in all of them.
  2. I then started modifying the hyperparameters, but changing batch size, lora_rank, and other parameters left the results almost unchanged.
  3. The datasets are the official ones (tens of thousands of instructions in total), so they should not be the problem.
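One thing worth checking alongside the steps above: with `per_device_train_batch_size=4` and `gradient_accumulation_steps=2`, the effective batch per optimizer update is small unless many GPUs are used, and a small effective batch alone produces a noisy loss curve. A quick sanity check (num_gpus is whatever your accelerate_config.yaml actually launches):

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device_batch * grad_accum_steps * num_gpus

# With the parameters from the question, on a single GPU:
print(effective_batch_size(4, 2, 1))  # -> 8
# Even on 8 GPUs this is only 64; raising gradient_accumulation_steps
# is a cheap way to smooth the loss without using more memory per device.
print(effective_batch_size(4, 8, 8))  # -> 256
```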

My current hypotheses are:

  1. Does the model actually converge, and is there really a problem with the training? Perhaps Qwen1.5 simply fits these datasets very well, and the oscillation is normal (since the validation-set loss shows no problem).
  2. There is an issue with the parameters or the dataset, but after many adjustments it remains unresolved.

Has anyone encountered such problems while fine-tuning, and how did you resolve them? I would appreciate any help clarifying this, thank you!
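For hypothesis 1, one way to decide objectively whether the run is converging is to fit a line to the logged losses and check the slope: a clearly negative slope means the loss is still trending down and the oscillation is just noise. A minimal least-squares sketch (pure Python, no assumptions about the logging format):

```python
def trend_slope(values):
    """Ordinary least-squares slope of values against their index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# Oscillating but slowly decreasing losses still have a negative slope:
losses = [2.0 - 0.002 * i + (0.2 if i % 2 else -0.2) for i in range(200)]
print(trend_slope(losses))  # negative -> converging despite oscillation
```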
