AutoTemp

AutoTemp is a Python tool that enhances language model interactions by intelligently selecting the optimal temperature setting for generating responses. It leverages multiple temperature settings to produce a variety of outputs and then evaluates these outputs to determine which temperature yields the best result for a given prompt.

Features

  • Multi-Temperature Evaluation: Tests multiple temperature settings to find the best output for a given prompt.
  • Multi-Judge Structured Scoring: Runs N independent judges and aggregates scores for relevance, clarity, utility, creativity, coherence, safety, and an overall rating.
  • Advanced Optimization (UCB Bandit): Optional Upper Confidence Bound (UCB1) optimizer iteratively explores/exploits temperatures.
  • External Metrics (Optional): BLEU (sacrebleu), ROUGE (rouge-score), and BERTScore (bert-score) with graceful fallback if not installed.
  • Automatic or Manual Selection: Supports both automatic selection of the best output based on scores and manual selection by presenting options to the user.
  • Customizable Temperature Range: Users can define a custom range of temperatures to be tested for each prompt.
  • Easy Integration: Designed to work with OpenAI's GPT-3.5 and GPT-4, and compatible with other language models that support temperature settings.

Installation

To install AutoTemp, clone the repository and install the required dependencies:

git clone https://github.com/elder-plinius/AutoTemp.git
cd AutoTemp
pip install -r requirements.txt

OpenAI API Key

Before running AutoTemp, set your API key in a .env file at the root of the project:

OPENAI_API_KEY='your-api-key-here'

This file should not be committed to your version control system as it contains sensitive information.
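
For reference, here is a minimal sketch of how such a key is typically loaded at startup. It assumes the python-dotenv package, a common companion to .env files; check requirements.txt for what this repo actually uses:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root into the environment
api_key = os.getenv("OPENAI_API_KEY")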

Usage

To use AutoTemp, simply run the autotemp.py script with Python:

python autotemp.py

This launches a Gradio UI where you can enter your prompt, list temperatures (comma-separated), configure top-p, toggle Auto Select, and enable Advanced Mode (UCB) with its Rounds, Judges, and Exploration c settings.

Modes

  • Standard Mode: Generates one output per temperature, scores with multiple judges, and ranks by mean overall.
  • Advanced Mode (UCB): Treats each temperature as a bandit arm, pulls arms iteratively for the specified number of rounds using UCB1, and returns the best observed output with diagnostics; see the sketch below.
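
For intuition, here is a minimal, self-contained sketch of the UCB1 rule applied to temperature arms. The optimizer in autotemp.py may differ in details such as reward scaling and tie-breaking:

import math

def ucb1_select(temps, pull, rounds, c=1.4):
    """pull(t) returns a reward in [0, 1]; returns the best observed (temp, reward)."""
    counts = {t: 0 for t in temps}
    totals = {t: 0.0 for t in temps}
    best_temp, best_reward = None, float("-inf")
    for step in range(1, rounds + 1):
        untried = [t for t in temps if counts[t] == 0]
        if untried:
            temp = untried[0]  # play every arm once before applying UCB
        else:
            # pick the arm with the highest mean reward plus confidence bonus
            temp = max(temps, key=lambda t: totals[t] / counts[t]
                       + c * math.sqrt(math.log(step) / counts[t]))
        reward = pull(temp)
        counts[temp] += 1
        totals[temp] += reward
        if reward > best_reward:
            best_temp, best_reward = temp, reward
    return best_temp, best_reward

A higher c widens the confidence bonus and favors exploration, which matches the Exploration c control in the UI.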

Research Notes

  • The multi-judge rubric aims to reduce the variance and bias of single-judge evaluations and returns mean scores across judges (see the aggregation sketch below).
  • UCB1 balances exploration and exploitation, offering higher sample efficiency than naive uniform evaluation, especially on longer prompts.
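
As a concrete illustration, mean-aggregation across judges can be as simple as the sketch below. The criterion names follow the feature list above, but the exact score layout in autotemp.py is an assumption here:

from statistics import mean

CRITERIA = ["relevance", "clarity", "utility", "creativity",
            "coherence", "safety", "overall"]

def aggregate(judge_scores):
    """judge_scores: one dict per judge, mapping criterion name to a float."""
    return {c: mean(s[c] for s in judge_scores) for c in CRITERIA}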

GitHub Pages (Static Web App)

This repo includes a static SPA in docs/ suitable for GitHub Pages. It runs entirely in the browser and calls the OpenAI API with a user-provided key.

Deploy Steps

  1. Commit and push the repo to GitHub.
  2. In your GitHub repo: Settings → Pages → Build and deployment → Source: Deploy from branch → Branch: main and Folder: /docs.
  3. Save. Your app will be published at https://<your-username>.github.io/<repo-name>/.

Using the Web App

  • Open the GitHub Pages URL.
  • Paste your OpenAI API key (optionally store it in your browser).
  • Enter your prompt, temperatures, top-p, judges, and Advanced/UCB options.
  • Click "Run AutoTemp" to see results and judge scores.

Security: The API key stays in the browser. Avoid sharing sensitive keys. For shared or production deployments, consider a server-side proxy, as sketched below.
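
As one illustration of such a proxy, the hypothetical sketch below forwards chat-completion requests while keeping the key on the server. It uses FastAPI and requests and is not part of this repo:

import os
import requests
from fastapi import FastAPI, Request

app = FastAPI()
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

@app.post("/v1/chat/completions")
async def proxy(request: Request):
    payload = await request.json()
    # the key is read server-side and never reaches the browser
    resp = requests.post(
        OPENAI_URL,
        json=payload,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        timeout=60,
    )
    return resp.json()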

Benchmarking

You can benchmark across a dataset of {prompt, reference} pairs and produce summary stats, confidence intervals, and a CSV of item-level results.

Example (in code):

from autotemp import AutoTemp

# Each item pairs a prompt with a reference answer for the external metrics.
dataset = [
    {"prompt": "Summarize: ...", "reference": "Expected summary ..."},
    {"prompt": "Translate to French: ...", "reference": "..."},
]

agent = AutoTemp(judges=3, model_version="gpt-3.5-turbo")
summary = agent.benchmark(
    dataset=dataset,
    temperature_string="0.4,0.7,1.0",
    top_p=0.9,
    models=["gpt-3.5-turbo", "gpt-4"],
    advanced=True,           # use the UCB optimizer
    rounds=8,
    judges=3,
    csv_path="results.csv",  # item-level results are written here
)
print(summary)

Summary includes:

  • mean_overall and a bootstrap confidence interval (see the sketch below)
  • external metric means and CIs (when the optional dependencies are installed)
  • token usage and estimated USD cost per model
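
For reference, a percentile bootstrap CI over per-item scores can be computed as in this sketch. It assumes numpy; the exact procedure in autotemp.py may differ:

import numpy as np

def bootstrap_ci(scores, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-item scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    means = [rng.choice(scores, size=scores.size, replace=True).mean()
             for _ in range(n_boot)]
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return scores.mean(), (lo, hi)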

Configuration

You can customize AutoTemp's behavior by setting the following parameters at initialization (a usage sketch follows the list):

  • default_temp: The default temperature used for the initial output.
  • alt_temps: A list of alternative temperatures to evaluate.
  • auto_select: Whether to automatically select the best output or present options to the user.
  • max_workers: The maximum number of threads used for concurrent API calls.
  • model_version: The model to use, such as "gpt-3.5-turbo" or "gpt-4".
  • judges: Number of independent judges to run for scoring (default 3).
  • advanced_mode: Toggles UCB bandit optimization in the UI.
  • rounds: Number of bandit rounds (>= 1) when Advanced Mode is enabled.
  • exploration_c: UCB exploration coefficient; higher values favor exploration.
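
Putting these together, initialization might look like the sketch below. The argument names are taken from the list above, so treat the exact signature as illustrative rather than authoritative:

from autotemp import AutoTemp

agent = AutoTemp(
    default_temp=0.7,            # temperature for the initial output
    alt_temps=[0.4, 0.7, 1.0],   # alternatives to evaluate
    auto_select=True,            # pick the best output automatically
    max_workers=6,               # concurrent API calls
    model_version="gpt-4",
    judges=3,                    # independent judges per output
)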

Optional Dependencies

If you want external metrics, install:

pip install sacrebleu rouge-score bert-score
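
The graceful fallback mentioned under Features typically follows the standard optional-import pattern, roughly like this sketch:

try:
    import sacrebleu  # BLEU
except ImportError:
    sacrebleu = None  # BLEU metrics are skipped when unavailable

try:
    from rouge_score import rouge_scorer  # ROUGE
except ImportError:
    rouge_scorer = None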