
How Do LLMs Perform Blackbox Optimization? A 2024 Technical Analysis and Implementation Guide

2026/2/13
AI Summary (BLUF)

LLM Optimize is a proof-of-concept library that enables large language models (LLMs) like GPT-4 to perform blackbox optimization through natural language instructions, allowing optimization of arbitrary text/code strings with explanatory reasoning at each step.

Introduction

LLM Optimize is a proof-of-concept library for performing blackbox optimization guided by large language models (LLMs). This innovative approach leverages the reasoning and generative capabilities of models like GPT-4 to explore complex, non-numerical search spaces that are traditionally difficult or impossible to navigate with conventional optimization algorithms.

Core Concepts

Traditional Blackbox Optimization

Traditional blackbox optimization involves defining an objective function f() that takes a set of numerical parameters and returns a score. An algorithm then strategically varies these parameters within specified bounds to maximize or minimize the output value. It's termed "blackbox" because the internal workings of f() are irrelevant to the optimizer; it only observes inputs and outputs (though the function is ideally continuous and/or convex).

Here is a typical example using a hypothetical black_box library:

import black_box as bb

def f(par):
    return par[0]**2 + par[1]**2  # A simple quadratic function

best_params = bb.search_min(f=f,                   # objective function to minimize
                            domain=[               # search bounds for each parameter
                                [-10., 10.],
                                [-10., 10.]
                            ],
                            budget=40,             # total number of function evaluations
                            batch=4,               # evaluations run in parallel per batch
                            resfile='output.csv')  # file for recording intermediate results

LLM-Guided Optimization

The core idea behind LLM optimization is to have a conversational LLM, such as GPT-4, conduct the entire optimization process. The problem is framed in natural language, and the LLM iteratively proposes new candidate solutions (x) based on a task description and feedback from the objective function.

The previous numerical example could be reframed for an LLM as follows:

x0 = "[0, 0]"
task = "Decrease the value of f(x). The values of x must be [-10, 10]."
question = "What is the next x to try such that f(x) is smaller?"

def f(x):
   x_array = parse(x)
   score = x_array[0]**2 + x_array[1]**2
   return (-score, f'Score = {score}')

optimize.run(task, question, f, x0=x0)

While this approach is orders of magnitude less efficient for simple numerical problems, its power lies in handling optimization problems defined over text or code. For example, consider a "code golf" optimization:

x0 = """
... python code ...
"""
task = "Make this code as short as possible while maintaining correctness"
question = "What is the next x to try such that the code is smaller?"

def f(x):
   func = eval(x)
   correct = run_correctness_tests(func)
   score = len(x)
   return (-score, f'Correct = {correct}, Length = {score}')

optimize.run(task, question, f, x0=x0)

Key Benefits of the LLM Approach

This paradigm shift offers several interesting advantages (a conceptual sketch of the optimization loop follows this list):

  • Optimize Arbitrary Text/Code Strings: The search space is no longer limited to vectors of numbers.
  • Explanation with Each Step: The LLM provides a reasoning trail for its suggestions, offering interpretability.
  • Complex Natural Language Objectives: The goal can be defined using nuanced, multi-faceted natural language rubrics.
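
To make the mechanism concrete, here is a minimal conceptual sketch of what one such optimization loop can look like: the task, the question, and the history of candidates with their textual feedback are sent to a chat model, the next candidate x is extracted from the reply, and the objective's feedback is appended to the conversation. This is an illustration only, not the library's actual implementation; the llm_optimize_loop name, the fenced-block reply convention, and the direct use of the OpenAI client are assumptions made for the sketch.

import re

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_optimize_loop(task, question, f, x0, max_steps=10):
    """Conceptual sketch of an LLM-guided optimization loop."""
    best_x, (best_score, feedback) = x0, f(x0)
    messages = [
        {"role": "system", "content": task},
        {"role": "user", "content": f"Current x:\n{x0}\n{feedback}\n{question}"},
    ]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4", messages=messages
        ).choices[0].message.content
        # Assume (for this sketch) that the model puts its candidate in a fenced block.
        block = re.search(r"```(?:\w+)?\n(.*?)```", reply, re.DOTALL)
        x = block.group(1).strip() if block else reply.strip()
        score, feedback = f(x)
        if score > best_score:
            best_x, best_score = x, score
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"{feedback}\n{question}"})
    return best_x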

Practical Examples

The library includes several examples demonstrating its versatility.

Example 1: AutoML

Beyond hyperparameter tuning, the LLM can rewrite the model-training code itself to improve generalization. Here, x is set to the source code used to train the model.

Actual Implementation Snippet:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from llm_optimize import optimize, eval_utils

# ... (data loading and splitting) ...

x0 = """
from sklearn import svm
clf = svm.SVC()
clf.fit(X_train, y_train)
"""

TASK = f"""
You will be given sklearn modeling code as the input to optimize.
Vary functions, imports, arguments, model type, etc to perform this task to the best of your abilities.
Rules:
* The script should always create a "clf" variable that is a sklearn estimator
* "clf" should always be set to the best estimator
* Do not use models that are not builtin to sklearn (do not pip install!)
* Be sure to include relevant sklearn imports
* Do not try to compute the test score
Hints:
* X_train.shape == {X_train.shape}
"""

QUESTION = "What is the next x to try such that the test score increases and the model better generalizes?"

def train_model(script):
    # ... (execute script, evaluate clf.score on test set) ...
    return (score, f"Test Set Score = {score}")

best_code = optimize.run(TASK, QUESTION, train_model, x0=x0, stop_score=1.0, max_steps=3)
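
The body of train_model is elided above. One plausible way to implement it is sketched below: the candidate script is executed with the training data in scope, and the resulting clf is scored on the held-out split. This is an assumption for illustration; the library's eval_utils module provides its own execution helpers, whose exact API is not reproduced here.

def train_model(script):
    # Execute the candidate training script with the data in scope.
    # (Unsandboxed exec of LLM-generated code is unsafe; see Future Work.)
    namespace = {"X_train": X_train, "y_train": y_train}
    try:
        exec(script, namespace)
        clf = namespace["clf"]             # the task rules require a "clf" estimator
        score = clf.score(X_test, y_test)  # accuracy on the held-out split
    except Exception as e:
        return (0.0, f"Error while running the script: {e}")
    return (score, f"Test Set Score = {score}")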

Example 2: Paperclip Maximization (Toy Problem)

Inspired by thought experiments about AI alignment, this example tasks the LLM with generating Python code to maximize paperclip production from a simulated server within a time limit.

Core Setup:

x0 = f"""
import requests
requests.get('{HOST}/make/paperclip')
requests.get('{HOST}/make/paperclip')
requests.get('{HOST}/make/paperclip')
"""

TASK = f"""
You are a python agent tasked with generating code to create the most paperclips in {TIMEOUT} seconds.
The paperclips are generated by making a GET request to a local flask HTTP server.
Vary functions, imports, threading, processing to generate as many paperclips as possible.
"""

QUESTION = "What is the next x to try such that more paper clips are generated?"

Example 3: Text Optimization Against a Rubric

This example showcases optimizing text against a multi-criteria rubric. A separate LLM call acts as the scoring function f(x), evaluating the generated text.

Rubric and Task Definition:

RUBRIC = """
Rate the following text, using the rubric:
* Describes machine learning (1-10)
* Is a palindrome (1-10)
* Is at least 5 words (1-10)
``
{x}
``
At the end respond with `final_score=score`.
"""

TASK = f"""
You are a linguistics expert who can write complex sentences.
You are tasked with writing a statement that:
* Describes machine learning
* Is a palindrome
* Is at least 5 words
"""

QUESTION = "What is the next x to try such that the text better describes machine learning and is a palindrome?"

Getting Started

Installation

The library can be installed directly from the GitHub repository:

pip install git+https://github.com/sshh12/llm_optimize

You must also set your OpenAI API key as an environment variable:

export OPENAI_API_KEY='your-api-key-here'

Configuring the LLM

To switch from the default model (e.g., to GPT-4), update the default LLM options:

from llm_optimize import llm
llm.default_llm_options.update(model_name="gpt-4")

Future Work & Considerations

The project outlines several directions for future development:

  • Safe Code Evaluation: Implementing sandboxed environments for safely executing generated code.
  • Tool Augmentation: Providing the LLM with tools/plugins (e.g., dataset analysis for AutoML).
  • Prompt Optimization: Refining the optimization prompt and exploring parallel idea generation.
  • Hybrid Methods: Combining LLM guidance with numerical methods for improved speed and efficacy.
  • Context Window Management: Implementing a fixed history window to manage token costs, as the entire conversation history is currently sent (a generic sketch follows this list).
  • Human-in-the-Loop: Enabling mid-optimization human guidance to steer the process.
  • Initialization: Investigating the necessity of providing an initial guess (x0).
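
For the context-window item, a fixed history window usually means keeping the task/system prompt plus only the most recent exchanges. A generic sketch of that idea, not tied to the library's internals:

def truncate_history(messages, keep_last=6):
    """Keep the system/task prompt plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"][-keep_last:]
    return system + recent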

Conclusion

LLM Optimize presents a novel paradigm for optimization, shifting the focus from purely numerical parameter spaces to semantic spaces defined by language and code. While currently a proof-of-concept with inherent inefficiencies for classical problems, it opens doors to solving a new class of optimization challenges that involve creativity, code generation, and adherence to complex, linguistically-defined specifications. Its integration of explanatory reasoning and ability to handle open-ended objectives makes it a fascinating area for further research and application.
