Create the ChatGLM3 demo with the officially provided script:
cd basic_demo
python web_demo_gradio.py
Running it produces abnormal output: the assistant reply contains a literal <|im_end|> token:
====conversation====
[{'role': 'user', 'content': '你好'}, {'role': 'assistant', 'content': '你好,有什么我可以帮助你的吗?<|im_end|>'}, {'role': 'user', 'content': '你好'}]
No chat template is defined for this tokenizer - using a default chat template that implements the ChatML format (without BOS/EOS tokens!). If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.
Cause analysis: the model version does not match the code; the chat prompt template is missing from the tokenizer configuration, so transformers falls back to its default ChatML template. The ChatML markers such as <|im_end|> are plain text to the ChatGLM3 tokenizer rather than registered special tokens, so the model reproduces them and skip_special_tokens cannot strip them from the reply.
Solution: pull the latest version of the ChatGLM3 model and try again.
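After updating the weights, a quick way to confirm the fix is to check that the tokenizer now carries its own template. A minimal sketch, assuming the checkpoint is loaded from "THUDM/chatglm3-6b" (substitute the path of your local checkout):

from transformers import AutoTokenizer

# Repo id / path is an assumption; point it at your updated ChatGLM3 checkout.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

# With an up-to-date model this should print the template string instead of None,
# and the "No chat template is defined" warning should disappear.
print(tokenizer.chat_template)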
About tokenizer.chat_template
Once you set the tokenizer.chat_template attribute, the next time you use apply_chat_template() it will use your new template. The attribute is saved in the tokenizer_config.json file, so you can use push_to_hub() to upload your new template to the Hub and make sure everyone is using the right template for your model.
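As a sketch of that workflow (the Jinja template below is an illustrative ChatML-style one, not the official ChatGLM3 template, and the output path and repo id are placeholders):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

# Illustrative ChatML-style template; the real ChatGLM3 template differs.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

# Persists the template into tokenizer_config.json alongside the tokenizer files.
tokenizer.save_pretrained("chatglm3-6b-with-template")
# tokenizer.push_to_hub("your-username/chatglm3-6b")  # optional, hypothetical repo id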
If a model does not have a chat template set, but there is a default template for its model class, the ConversationalPipeline class and methods like apply_chat_template will use the class template instead. You can find out what the default template for your tokenizer is by checking the tokenizer.default_chat_template attribute.
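That fallback is exactly what the warning above reports. A quick way to see which template is in effect, and what prompt it actually renders, is to replay the conversation from the log. This sketch assumes the transformers version used here, which still exposes default_chat_template (it has been removed in newer releases):

messages = [
    {"role": "user", "content": "你好"},
    {"role": "assistant", "content": "你好,有什么我可以帮助你的吗?"},
    {"role": "user", "content": "你好"},
]

if tokenizer.chat_template is None:
    # This is the fallback branch the warning complains about.
    print("falling back to:", tokenizer.default_chat_template)

# tokenize=False returns the rendered prompt string, handy for eyeballing
# exactly what the model will see.
print(tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False))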
For reference, the predict function from web_demo_gradio.py, with the two debug prints added while diagnosing the issue:

from threading import Thread
from transformers import TextIteratorStreamer, StoppingCriteriaList

def predict(history, max_length, top_p, temperature):
    stop = StopOnTokens()  # StopOnTokens is defined earlier in web_demo_gradio.py
    messages = []
    # Rebuild the message list from the Gradio (user, assistant) history pairs.
    for idx, (user_msg, model_msg) in enumerate(history):
        if idx == len(history) - 1 and not model_msg:
            # Last turn has no assistant reply yet: this is the new user prompt.
            messages.append({"role": "user", "content": user_msg})
            break
        if user_msg:
            messages.append({"role": "user", "content": user_msg})
        if model_msg:
            messages.append({"role": "assistant", "content": model_msg})

    print("\n\n====conversation====\n", messages)
    # Debug prints added to inspect which template is actually in effect.
    print('debug: tokenizer.chat_template: {}'.format(tokenizer.chat_template))
    print('debug: tokenizer.default_chat_template: {}'.format(tokenizer.default_chat_template))

    model_inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
    ).to(next(model.parameters()).device)

    # Stream tokens back to the UI while generation runs in a worker thread.
    streamer = TextIteratorStreamer(tokenizer, timeout=600, skip_prompt=True, skip_special_tokens=True)
    generate_kwargs = {
        "input_ids": model_inputs,
        "streamer": streamer,
        "max_new_tokens": max_length,
        "do_sample": True,
        "top_p": top_p,
        "temperature": temperature,
        "stopping_criteria": StoppingCriteriaList([stop]),
        "repetition_penalty": 1.2,
    }
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()
    for new_token in streamer:
        if new_token != '':
            history[-1][1] += new_token
            yield history