TODO notes

2026-02-12 14:42:48 +00:00 · 2025-07-08 15:54:19 -06:00
parent af75e9aabf
commit fabf36675d
1 changed files with 33 additions and 1 deletions
--- a/docs/to-do.md
+++ b/docs/to-do.md
@@ -3,4 +3,36 @@
 - Look into approach #3 in addition to previously stated approaches:
 1. Baseline (no guidelines)
 2. Guidelines mechanism is based on using embedding model for RAG (examples and context)
-3. Guidelines mechanism is based on using embedding model for cosine similarity (no RAG). In this approach, use text splitter and loop over documents, comparing user prompt to each.
+3. Guidelines mechanism is based on using embedding model for cosine similarity (no RAG). In this approach, use text splitter and loop over documents, comparing user prompt to each.
+
+### Prompt Templates
+
+[   ] Base Phi-3 template
+[   ] Few Shot template with examples
+[   ] Support loading prompt injection prompts and completions
+[   ] Correlate template to violation rate
+
+### Test Runs
+
+[   ] run tests with various configuration-based settings (can pytest accept varying YML config args?)
+[   ] run test with random samplings of 25-30 each run, or increase timeouts
+[   ] log all max and average scores (tied to test name) to track overall baselines
+[   ] build up significant amount of test run results (JSON) for data viz
+
+### Metrics: General
+
+[   ] use TF-IDF from scikit learn
+[   ] visualize results with Plotly/Seaborn? determine visualization metrics, use dummy numbers first
+
+### Metrics: False Refusal Rate, Effectiveness
+
+[   ] define separate measures for false refusal rate
+[   ] measure effectiveness of LLM app overall: false refusal rate vs. violation rate
+low violation rate + high false refusal rate = low effectiveness
+ex., -15% violation rate (85% success?) + -(70%) false refusal rate = 15% effectiveness 
+ex., -29% violation rate (71% success?) + -(12%) false refusal rate = 59% effectiveness 
+
+### Mitigations Applied to CI/CD Pipeline
+
+[   ] revisit GitHub actions and demonstrate failing the build - this is how the results of the research are applied as a security control
+