Commit Graph

32 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Adam Wilson | c1d943195b | fix | 2025-08-19 20:40:04 -06:00 |
| Adam Wilson | 0465d77c13 | fix | 2025-08-19 20:36:00 -06:00 |
| Adam Wilson | cc124a91a3 | support batch tests | 2025-08-19 20:09:34 -06:00 |
| Adam Wilson | 378aea7a66 | 100 math prompts, not 150 | 2025-07-30 11:13:09 -06:00 |
| Adam Wilson | 2659e6e43c | more updates for reflexion | 2025-07-28 10:31:55 -06:00 |
| Adam Wilson | 5bc9f480f9 | all domain unit tests pass | 2025-07-27 18:53:30 -06:00 |
| Adam Wilson | eddacd87fa | LLM config output | 2025-07-27 11:21:12 -06:00 |
| Adam Wilson | a7a6873e73 | update prompt templates; support LLM config logging | 2025-07-26 22:10:04 -06:00 |
| Adam Wilson | 741629908c | updates for RAG + CoT tests | 2025-07-25 18:11:49 -06:00 |
| Adam Wilson | 72785c6420 | updates for RAG + CoT | 2025-07-25 17:24:01 -06:00 |
| Adam Wilson | d15e9d6794 | more test and template setup | 2025-07-25 09:45:03 -06:00 |
| Adam Wilson | 3a62ecfae8 | add test 0 results | 2025-07-25 08:47:56 -06:00 |
| Adam Wilson | ae279a512d | log LLM config | 2025-07-23 20:21:42 -06:00 |
| Adam Wilson | cb92890bb9 | break tests into separate files; test 0 results | 2025-07-23 19:06:27 -06:00 |
| Adam Wilson | 1b5b808ff6 | use new garak true positives in tests | 2025-07-23 15:59:56 -06:00 |
| Adam Wilson | 41afb99622 | dependency fixes, test setup | 2025-07-18 18:18:56 -06:00 |
| Adam Wilson | 1dba565236 | service implementations | 2025-07-16 20:21:10 -06:00 |
| Adam Wilson | b4b2d792fc | more progress on fluent service call | 2025-07-09 21:56:44 -06:00 |
| Adam Wilson | af75e9aabf | support prompt template loading | 2025-07-07 21:38:42 -06:00 |
| Adam Wilson | ffa2d73ae0 | guardrail analyzed response, etc. | 2025-07-06 15:15:59 -06:00 |
| Adam Wilson | a1d3a8c1b7 | adjust assertions for test 3 | 2025-07-05 20:21:35 -06:00 |
| Adam Wilson | 640c261b26 | naming updates; fix static analysis script | 2025-07-05 13:01:28 -06:00 |
| Adam Wilson | cb1be6746f | support testing malicious prompts with no guidelines | 2025-06-28 12:18:35 -06:00 |
| Adam Wilson | 036d36bf4f | compare math prompt completions to DAN response | 2025-06-25 21:47:22 -06:00 |
| Adam Wilson | eed481ee77 | document intended test cases/methodology; math prompts | 2025-06-25 15:41:11 -06:00 |
| Adam Wilson | a530e78399 | refactoring | 2025-06-25 14:54:12 -06:00 |
| Adam Wilson | 9b8b6b7105 | add/update services, constants | 2025-06-25 12:53:24 -06:00 |
| Adam Wilson | 9057b0e977 | refactor to use services instead of language model objects directly | 2025-06-25 06:28:05 -06:00 |
| Adam Wilson | fc4978f43c | add RAG-based LM in conftest | 2025-06-24 14:23:53 -06:00 |
| Adam Wilson | 92e00b9eb2 | integration tests | 2025-06-24 10:57:44 -06:00 |
| Adam Wilson | 0b6b7b79b9 | service layer tests | 2025-06-12 20:22:39 -06:00 |
| Adam Wilson | bc1093988b | test config placeholders | 2025-05-30 12:46:11 -06:00 |