This collection contains held-out splits for testing Flow-Judge-v0.1.
Flow AI (verified company)
AI & ML interests: LLM system evaluation, automatic LM improvements
Organization Card
Flow AI is the system for evaluating and improving your LLM application.
Collections: 3
Datasets: 9
flowaicom/legalbench_contracts_qa_subset • 100 • 51
flowaicom/Flow-Judge-v0.1-3-likert-heldout • 300
flowaicom/Flow-Judge-v0.1-5-likert-heldout • 274 • 3
flowaicom/Flow-Judge-v0.1-binary-heldout • 316
flowaicom/RAGTruth_test • 2.7k
flowaicom/covid_qa • 1k
flowaicom/PubMedQA • 1k
flowaicom/HaluEval • 10k
flowaicom/Feedback-Bench • 1k