mirror of
https://github.com/lightbroker/llmsecops-research.git
synced 2026-02-12 14:42:48 +00:00
TODO notes
This commit is contained in:
@@ -3,4 +3,36 @@
|
||||
- Look into approach #3 in addition to previously stated approaches:
|
||||
1. Baseline (no guidelines)
|
||||
2. Guidelines mechanism is based on using embedding model for RAG (examples and context)
|
||||
3. Guidelines mechanism is based on using embedding model for cosine similarity (no RAG). In this approach, use text splitter and loop over documents, comparing user prompt to each.
|
||||
3. Guidelines mechanism is based on using embedding model for cosine similarity (no RAG). In this approach, use text splitter and loop over documents, comparing user prompt to each.
|
||||
|
||||
### Prompt Templates
|
||||
|
||||
[ ] Base Phi-3 template
|
||||
[ ] Few Shot template with examples
|
||||
[ ] Support loading prompt injection prompts and completions
|
||||
[ ] Correlate template to violation rate
|
||||
|
||||
### Test Runs
|
||||
|
||||
[ ] run tests with various configuration-based settings (can pytest accept varying YML config args?)
|
||||
[ ] run test with random samplings of 25-30 each run, or increase timeouts
|
||||
[ ] log all max and average scores (tied to test name) to track overall baselines
|
||||
[ ] build up significant amount of test run results (JSON) for data viz
|
||||
|
||||
### Metrics: General
|
||||
|
||||
[ ] use TF-IDF from scikit learn
|
||||
[ ] visualize results with Plotly/Seaborn? determine visualization metrics, use dummy numbers first
|
||||
|
||||
### Metrics: False Refusal Rate, Effectiveness
|
||||
|
||||
[ ] define separate measures for false refusal rate
|
||||
[ ] measure effectiveness of LLM app overall: false refusal rate vs. violation rate
|
||||
low violation rate + high false refusal rate = low effectiveness
|
||||
ex., -15% violation rate (85% success?) + -(70%) false refusal rate = 15% effectiveness
|
||||
ex., -29% violation rate (71% success?) + -(12%) false refusal rate = 59% effectiveness
|
||||
|
||||
### Mitigations Applied to CI/CD Pipeline
|
||||
|
||||
[ ] revisit GitHub actions and demonstrate failing the build - this is how the results of the research are applied as a security control
|
||||
|
||||
|
||||
Reference in New Issue
Block a user