Octopus V2: Stanford Team Open-Sources a Large Model That Runs on Phones, 2,000 Downloads Overnight, Another Step Forward for On-Device AI!

Project Overview

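Octopus-v2 is an open-source language model developed by Nexa AI. It has 2 billion parameters and is designed for function calling against Android APIs. Using a distinctive functional token strategy in both training and inference, Octopus-v2 achieves performance comparable to GPT-4 while greatly increasing inference speed, which makes it particularly well suited to edge devices.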
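To make the functional token idea more concrete, here is a minimal sketch of how a decoded model output could be mapped back to a function call. The token names (`<fn_0>`, `<fn_end>`), the output layout, and the function `take_a_photo` are all assumptions made for illustration; they are not taken from Octopus-v2's actual vocabulary or API set.

```python
import ast
import re

# Hypothetical device function; stands in for a real Android API wrapper.
def take_a_photo(camera="back"):
    return f"photo taken with {camera} camera"

# Hypothetical mapping from functional tokens to callables.
FUNCTIONAL_TOKENS = {"<fn_0>": take_a_photo}

# Assumed output layout: <fn_i>(arg1, arg2, ...)<fn_end>
CALL_PATTERN = re.compile(r"(<fn_\d+>)\((.*?)\)<fn_end>", re.DOTALL)

def dispatch(model_output):
    """Parse one functional-token call and run the mapped function."""
    match = CALL_PATTERN.search(model_output)
    if match is None:
        raise ValueError("no functional token call found")
    token, raw_args = match.groups()
    if token not in FUNCTIONAL_TOKENS:
        raise KeyError(f"unknown functional token: {token}")
    # Parse positional literal arguments without evaluating arbitrary code.
    args = ast.literal_eval(f"({raw_args},)") if raw_args.strip() else ()
    return FUNCTIONAL_TOKENS[token](*args)

print(dispatch("<fn_0>('front')<fn_end>"))
```

In a scheme like this the function is chosen by a single token instead of a name spelled out over several tokens, which is one plausible reason the approach shortens generation.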

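On a single A100 GPU, its inference is 36 times faster than a "Llama7B + RAG solution", and it is 168% faster than GPT-4-turbo, which relies on a cluster of A100/H100 GPUs. Octopus-v2 also surpasses the "Llama7B + RAG solution" in function-calling accuracy, and it matches GPT-4 and RAG + GPT-3.5 across the benchmark datasets, with accuracy between 98% and 100%.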

Demo

Comparison

Usage

The following code runs the model on a GPU:

from transformers import AutoTokenizer, GemmaForCausalLM
import torch
import time

def inference(input_text):
    start_time = time.time()
    input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
    input_length = input_ids["input_ids"].shape[1]
    outputs = model.generate(
        input_ids=input_ids["input_ids"],
        max_length=1024,
        do_sample=False)
    # Keep only the newly generated tokens, dropping the prompt.
    generated_sequence = outputs[:, input_length:].tolist()
    res = tokenizer.decode(generated_sequence[0])
    end_time = time.time()
    return {"output": res, "latency": end_time - start_time}

# Load the tokenizer and model in bfloat16, placed automatically on the available device.
model_id = "NexaAIDev/Octopus-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = GemmaForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

input_text = "Take a selfie for me with front camera"
nexa_query = f"Below is the query from the users, please call the correct function and generate the parameters to call the function.\n\nQuery: {input_text} \n\nResponse:"
start_time = time.time()
print("nexa model result:\n", inference(nexa_query))
print("latency:", time.time() - start_time, " s")

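Training Data

· Creating the dataset involves three key stages:

(1) generating relevant queries and their associated function-call arguments;

(2) developing irrelevant queries paired with suitable function bodies;

(3) binary verification support via Google Gemini.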
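The three stages can be sketched roughly as follows. This is only an illustration under stated assumptions: the API names, the sample fields, the `irrelevant_function` fallback, and the rule inside `verify` are hypothetical, and simple stand-in functions replace the Google Gemini calls the article describes.

```python
import random

# Hypothetical API names; the article's dataset covers 20 Android APIs.
API_NAMES = ["take_a_photo", "get_trending_news"]

def generate_positive(api_name):
    """Stage 1 stand-in: a query one API can solve, plus its call arguments."""
    return {"query": f"Please run {api_name} for me",
            "function": api_name, "arguments": {}, "relevant": True}

def generate_negative():
    """Stage 2 stand-in: a query no API solves, paired with a fallback function."""
    return {"query": "Tell me a joke about octopuses",
            "function": "irrelevant_function", "arguments": {}, "relevant": False}

def verify(sample):
    """Stage 3 stand-in: binary accept/reject (done with Gemini in the article)."""
    return (sample["function"] in API_NAMES) == sample["relevant"]

def build_dataset(n_positive, n_negative, seed=0):
    """Generate both kinds of samples and keep only those that pass verification."""
    rng = random.Random(seed)
    samples = [generate_positive(rng.choice(API_NAMES)) for _ in range(n_positive)]
    samples += [generate_negative() for _ in range(n_negative)]
    return [s for s in samples if verify(s)]

dataset = build_dataset(n_positive=3, n_negative=2)
print(len(dataset), "samples kept")
```

Keeping verification as a separate pass/fail step means a bad generation is simply dropped rather than repaired, which matches the "binary" wording of stage (3).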

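· Queries and function calls generated by Google Gemini

Building a high-quality dataset depends on formulating clear queries and accurate function-call arguments. Our strategy emphasizes generating positive queries that a single API can resolve. Given a query and a predefined API description, we then make a further Google Gemini API call to produce the required function-call arguments.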

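· Negative samples

To strengthen the model's analytical ability and practical usefulness, we include examples in both the positive and the negative datasets. The balance between the two sets is given by the ratio shown in Figure 3, which underpins our experimental methodology. Specifically, we set M and N to the same value, 1000 each.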
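As a small illustration of that balance, the sketch below reads M and N as the counts of positive and negative samples and mixes them into one shuffled set. Whether the counts apply per API or overall is not stated here, so treating them as overall counts is an assumption, and the placeholder strings stand in for real queries.

```python
import random

M = 1000  # positive samples (value stated in the article)
N = 1000  # negative samples (value stated in the article)

def mix_samples(positives, negatives, seed=0):
    """Label both sets (1 = positive, 0 = negative) and shuffle them together."""
    mixed = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
    random.Random(seed).shuffle(mixed)
    return mixed

# Placeholder strings stand in for real queries.
positives = [f"positive query {i}" for i in range(M)]
negatives = [f"negative query {i}" for i in range(N)]

mixed = mix_samples(positives, negatives)
positive_share = sum(label for _, label in mixed) / len(mixed)
print(len(mixed), positive_share)
```

With M equal to N, half of the mixed set is positive, so neither kind of example dominates training.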

Twenty Android API descriptions were written to train the model. Below is an example of one Android API description:

def get_trending_news(category=None, region='US', language='en', max_results=5):
    """
    Fetches trending news articles based on category, region, and language.

    Parameters:
    - category (str, optional): News category to filter by, by default use None for all categories. Optional to provide.
    - region (str, optional): ISO 3166-1 alpha-2 country code for region-specific news, by default, uses 'US'. Optional to provide.
    - language (str, optional): ISO 639-1 language code for article language, by default uses 'en'. Optional to provide.
    - max_results (int, optional): Maximum number of articles to return, by default, uses 5. Optional to provide.

    Returns:
    - list[str]: A list of strings, each representing an article. Each string contains the article's heading and URL.
    """
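A description written as a Python function like this can be turned into plain text with the standard `inspect` module, for example to place it in a prompt. The sketch below is one possible way to do that, not the authors' tooling: the helper `describe_api` is hypothetical, the docstring is shortened, and the empty-list body is a placeholder because the article gives only the description.

```python
import inspect

def get_trending_news(category=None, region='US', language='en', max_results=5):
    """Fetches trending news articles based on category, region, and language."""
    return []  # Placeholder body; the article only provides the description.

def describe_api(func):
    """Render a function's signature and docstring as one text block."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    return f"def {func.__name__}{signature}:\n    {doc}"

print(describe_api(get_trending_news))
```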
