Safety - Refusal of Localised Undesired Content
Test Objective: Evaluate models' ability to decline unsafe prompts that carry cultural and linguistic nuances specific to Singapore.
Methodology: Models are tested on a sample of prompts from RabakBench, a dataset of localized harmful prompts, across four application contexts: general chatbot, career advisor, physics tutor, and job description writer.
Scoring: Percentage of harmful prompts correctly refused. Higher values indicate better safety performance.
| Model | Career | Physics | General | JD | Average |
|---|---|---|---|---|---|
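For illustration, the minimal sketch below shows how per-context refusal rates and their unweighted average could be computed. The `context` and `refused` fields are assumed labels on judged responses, not the evaluation's actual schema.

```python
from collections import defaultdict

def refusal_rates(records):
    """records: iterable of dicts like {"context": "career", "refused": True}."""
    totals, refused = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["context"]] += 1          # prompts seen per application context
        refused[r["context"]] += int(r["refused"])  # prompts correctly refused
    # Percentage refused per context, then the unweighted average across contexts.
    per_context = {c: 100.0 * refused[c] / totals[c] for c in totals}
    average = sum(per_context.values()) / len(per_context)
    return per_context, average

per_context, avg = refusal_rates([
    {"context": "career", "refused": True},
    {"context": "physics", "refused": False},
])
print(per_context, avg)
```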
Robustness - RAG Out-of-Knowledge-Base Queries
Test Objective: Assess models' ability to recognize and appropriately handle queries beyond their knowledge base in RAG applications.
Methodology: We apply the KnowOrNot framework with out-of-scope queries about Singapore government policies (immigration, CPF, MediShield, driving theory) using two retrieval methods: Long Context (LC) and Hypothetical Document Embeddings (HYDE).
Scoring: Abstention rate (correctly declining to answer) and factual accuracy when answering. Higher values indicate better robustness.
| Model | LC Abstain | LC Fact | HYDE Abstain | HYDE Fact | Average |
|---|---|---|---|---|---|
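The sketch below illustrates one way abstention rate and factual accuracy could be tallied per retrieval method; the `method`, `abstained`, and `correct` fields are assumptions for illustration and not the KnowOrNot API.

```python
def score_rag(records):
    """records: iterable of dicts like
    {"method": "LC", "abstained": False, "correct": True}."""
    scores = {}
    for method in {r["method"] for r in records}:
        subset = [r for r in records if r["method"] == method]
        answered = [r for r in subset if not r["abstained"]]
        # Share of out-of-scope queries the model correctly declined to answer.
        abstain_rate = 100.0 * sum(r["abstained"] for r in subset) / len(subset)
        # Factual accuracy measured only on the queries it chose to answer.
        fact_acc = (100.0 * sum(r["correct"] for r in answered) / len(answered)
                    if answered else 0.0)
        scores[method] = {"abstain": abstain_rate, "fact": fact_acc}
    return scores
```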
Fairness - Testimonial Generation Bias
Test Objective: Detect bias in LLM-generated testimonials based on student names and gender while holding academic background constant.
Methodology: Based on our work evaluating LLM-generated testimonials, models generate testimonials for students with identical qualifications but different names/genders. A regression model estimates how much name/gender affects the content and style of the generated testimonials.
Scoring: Magnitude of the regression coefficients measuring the effect of name/gender on style and content. Higher magnitudes indicate greater bias; lower values indicate better fairness.
| Model | Style | Content | Average |
|---|---|---|---|
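As a rough illustration of the scoring, the sketch below fits an ordinary least squares regression of a style score on a gender indicator; the column names and toy values are assumptions, not the actual testimonial data or regression specification.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy data: identical qualifications, only the name/gender signal differs.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "style_score": [0.62, 0.55, 0.60, 0.51, 0.64, 0.53],
})

# The coefficient on the gender indicator estimates the style shift
# attributable to the name/gender swap; its magnitude is the bias score.
model = smf.ols("style_score ~ C(gender)", data=df).fit()
print(model.params)
```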