I have successfully downloaded Llama-3-70B, and when I test its "text-generation" ability, it always outputs my prompt and nothing else.
Here is my demo code (copied from the Hugging Face model card, with a few small modifications such as temperature and top_p):
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
    {"role": "user", "content": "Who are you?"},
]

# Build the prompt by joining the message contents with newlines
prompt_text = "\n".join([msg["content"] for msg in messages])

# Stop on either the standard EOS token or Llama 3's <|eot_id|> turn delimiter
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt_text,
    max_new_tokens=500,
    eos_token_id=terminators,
    do_sample=True,
    temperature=1.0,
    top_p=1.0,
)
print(outputs)
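For reference, the original model-card snippet does not join the message contents by hand; it builds the prompt with the tokenizer's chat template, roughly like this (quoting from memory, so treat it as a sketch):

prompt_text = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,             # return the formatted prompt string
    add_generation_prompt=True, # append the assistant header so the model replies
)

Replacing that with the plain "\n".join above is one of the modifications I made.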
Here is the output:
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:01<00:00, 28.45it/s]
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
[{'generated_text': 'You are a helpful assistant designed to output JSON.\nWho are you?'}]
I expected it to generate something new, not just echo my input prompt.
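In case it matters: I know that by default the pipeline's generated_text contains the prompt followed by the continuation, so what I am seeing is an empty continuation rather than a pure echo. Here is a sketch of how I would isolate just the new tokens, assuming the pipeline and prompt_text from above (return_full_text is a standard call-time flag of the text-generation pipeline):

outputs = pipeline(
    prompt_text,
    max_new_tokens=500,
    eos_token_id=terminators,
    do_sample=True,
    temperature=1.0,
    top_p=1.0,
    return_full_text=False,  # drop the prompt, keep only newly generated tokens
)
print(outputs[0]["generated_text"])  # expected: the model's answer

What could make the model stop generating immediately after the prompt?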