From fabf36675d458b32abe0dfbb30c83a80bd846aef Mon Sep 17 00:00:00 2001
From: Adam Wilson
Date: Tue, 8 Jul 2025 15:54:19 -0600
Subject: [PATCH] TODO notes

---
 docs/to-do.md | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/docs/to-do.md b/docs/to-do.md
index 6b81e3f45..312a19690 100644
--- a/docs/to-do.md
+++ b/docs/to-do.md
@@ -3,4 +3,36 @@
 - Look into approach #3 in addition to previously stated approaches:
 1. Baseline (no guidelines)
 2. Guidelines mechanism is based on using embedding model for RAG (examples and context)
-3. Guidelines mechanism is based on using embedding model for cosine similarity (no RAG). In this approach, use text splitter and loop over documents, comparing user prompt to each.
\ No newline at end of file
+3. Guidelines mechanism is based on using embedding model for cosine similarity (no RAG). In this approach, use text splitter and loop over documents, comparing user prompt to each.
+
+### Prompt Templates
+
+[ ] Base Phi-3 template
+[ ] Few Shot template with examples
+[ ] Support loading prompt injection prompts and completions
+[ ] Correlate template to violation rate
+
+### Test Runs
+
+[ ] run tests with various configuration-based settings (can pytest accept varying YML config args?)
+[ ] run test with random samplings of 25-30 each run, or increase timeouts
+[ ] log all max and average scores (tied to test name) to track overall baselines
+[ ] build up significant amount of test run results (JSON) for data viz
+
+### Metrics: General
+
+[ ] use TF-IDF from scikit learn
+[ ] visualize results with Plotly/Seaborn? determine visualization metrics, use dummy numbers first
+
+### Metrics: False Refusal Rate, Effectiveness
+
+[ ] define separate measures for false refusal rate
+[ ] measure effectiveness of LLM app overall: false refusal rate vs. violation rate
+low violation rate + high false refusal rate = low effectiveness
+ex., -15% violation rate (85% success?) + -(70%) false refusal rate = 15% effectiveness
+ex., -29% violation rate (71% success?) + -(12%) false refusal rate = 59% effectiveness
+
+### Mitigations Applied to CI/CD Pipeline
+
+[ ] revisit GitHub actions and demonstrate failing the build - this is how the results of the research are applied as a security control
+
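
Note on the effectiveness examples in the patch: both worked examples appear to follow the rule effectiveness = 100% - violation rate - false refusal rate (100 - 15 - 70 = 15 and 100 - 29 - 12 = 59). The sketch below encodes that reading so the arithmetic can be checked. It is an interpretation of the notes, not a definition taken from the project: the function name `effectiveness` and the use of fractions in [0, 1] are hypothetical, and the formula assumes both rates are measured against a comparable denominator, which the notes do not confirm.

```python
def effectiveness(violation_rate: float, false_refusal_rate: float) -> float:
    """Return 1 - violation_rate - false_refusal_rate.

    Assumption: both rates are fractions in [0, 1]. This mirrors the
    arithmetic in the to-do notes; it is not a confirmed project metric.
    """
    rates = {
        "violation_rate": violation_rate,
        "false_refusal_rate": false_refusal_rate,
    }
    for name, rate in rates.items():
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {rate}")
    # The result is negative when the two rates sum to more than 1;
    # the notes do not say how that case should be handled.
    return 1.0 - violation_rate - false_refusal_rate


# The two examples from the notes, rounded to hide floating-point noise.
print(round(effectiveness(0.15, 0.70), 2))  # 0.15
print(round(effectiveness(0.29, 0.12), 2))  # 0.59
```

Under this reading, a low violation rate does not help if it is bought with a high false refusal rate, which matches the "low violation rate + high false refusal rate = low effectiveness" line in the notes.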