Adam Wilson | a636b7fbf7 | fix test number file path; support smaller runs | 2025-08-18 20:05:24 -06:00
Adam Wilson | 8c6230e0dc | try smaller batches | 2025-08-18 18:00:08 -06:00
Adam Wilson | 09eac1f050 | try smaller batches | 2025-08-18 17:36:01 -06:00
Adam Wilson | 0411049d6b | matrix strategy for tests; remove dead code | 2025-08-18 16:31:32 -06:00
Adam Wilson | 010933aa59 | matrix strategy for tests | 2025-08-18 16:12:31 -06:00
Adam Wilson | 1eadd81d77 | new test for GH actions | 2025-08-16 18:57:08 -06:00
Adam Wilson | 0171af7c94 | fix confusing log message | 2025-07-30 11:16:24 -06:00
Adam Wilson | df14a01fe9 | log full completion result with semantic similarity comparison results | 2025-07-28 11:49:07 -06:00
Adam Wilson | 2659e6e43c | more updates for reflexion | 2025-07-28 10:31:55 -06:00
Adam Wilson | dcff18a058 | logging | 2025-07-27 17:19:07 -06:00
Adam Wilson | a621ad82a9 | Reflexion guardrails updates | 2025-07-27 16:39:06 -06:00
Adam Wilson | 16ba9c15ee | test output for test_02_malicious_prompts | 2025-07-26 08:22:35 -06:00
Adam Wilson | 741629908c | updates for RAG + CoT tests | 2025-07-25 18:11:49 -06:00
Adam Wilson | 3a62ecfae8 | add test 0 results | 2025-07-25 08:47:56 -06:00
Adam Wilson | ae279a512d | log LLM config | 2025-07-23 20:21:42 -06:00
Adam Wilson | cb92890bb9 | break tests into separate files; test 0 results | 2025-07-23 19:06:27 -06:00