Models are ranked from highest to lowest by their average performance across the Understanding, Generation, and Unify task domains. "SIPU", "MTITU", "VPU", "CVIG", "FIR", "TIE", "TIG", "TVG", "VP", "IEE", "CSQ", "AL", "SD", and "VCoT" each denote a specific subtask; "Avg" is the average accuracy across the subtasks in each domain, and "-" indicates that the model is unable to finish the corresponding task. A sketch of how these averages roll up follows the table.
By default, the leaderboard is sorted by the Overall column. To sort by another metric, click the corresponding cell.
Understanding comprises SIPU, MTITU, and VPU; Generation comprises CVIG, FIR, TIE, TIG, TVG, and VP; Unify comprises IEE, CSQ, AL, SD, and VCoT.

| # | Method | LLM | Date | Overall | SIPU | MTITU | VPU | Und. Avg | CVIG | FIR | TIE | TIG | TVG | VP | Gen. Avg | IEE | CSQ | AL | SD | VCoT | Unify Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|   | QA pairs |   |   | 1964 | 1200 | 400 | 364 | 1964 | 600 | 200 | 200 | 200 | 200 | 194 | 1594 | 200 | 100 | 52 | 104 | 90 | 546 |
| 1 | Gemini2.0-flash-exp (Google DeepMind) | - | 2025-03-12 | 45.57 | 72.58 | 68.25 | 54.90 | 65.24 | - | 77.61 | 43.54 | 57.56 | - | - | 29.79 | 38.42 | 74.75 | 47.12 | 26.00 | 12.41 | 40.74 |
| 2 | MIO-Instruct (Beihang University) | MIO-7B | 2024-09-26 | 37.17 | 52.00 | 33.50 | 39.01 | 41.50 | 51.24 | 59.29 | 43.66 | 48.23 | 51.88 | 66.37 | 53.45 | 24.16 | 38.50 | 8.66 | 11.50 | 0.00 | 16.56 |
| 3 | SEED-LLaMA (Tencent AI Lab) | LLaMA2-Chat-13B | 2023-12-18 | 28.45 | 49.17 | 33.00 | 36.26 | 39.48 | - | 57.00 | 42.26 | 41.96 | - | - | 23.54 | 22.00 | 51.49 | 12.50 | 22.00 | 3.61 | 22.32 |
| 4 | Anole (GAIR) | - | 2024-07-08 | 18.59 | 17.17 | 14.50 | 9.00 | 13.56 | - | 36.64 | 43.42 | 41.52 | - | - | 19.91 | 18.55 | 59.65 | 14.42 | 15.00 | 3.89 | 22.30 |
| 5 | VILA-U (Tsinghua University) | LLaMA-7B | 2024-09-06 | 18.58 | 51.04 | 32.25 | 36.54 | 39.95 | - | - | - | 45.10 | 49.64 | - | 15.79 | - | - | - | - | - | - |
| 6 | Janus-Pro (DeepSeek-AI) | DeepSeek-LLM-7B-base | 2025-01-29 | 18.10 | 59.56 | 43.50 | 42.22 | 48.43 | - | - | - | 35.29 | - | - | 5.88 | - | - | - | - | - | - |
| 7 | MiniGPT-5 (University of California) | Vicuna-7B | 2023-10-03 | 16.43 | 19.25 | 10.92 | 15.93 | 15.37 | - | 38.96 | 35.04 | 35.48 | - | - | 18.25 | 22.80 | 34.13 | 14.37 | 5.00 | 2.08 | 15.67 |
| 8 | JanusFlow (DeepSeek-AI) | DeepSeek-LLM-1.5B-base | 2024-11-12 | 16.31 | 41.49 | 32.00 | 35.16 | 43.44 | - | - | - | 32.88 | - | - | 5.48 | - | - | - | - | - | - |
| 9 | GILL (Carnegie Mellon University) | OPT-6.7B | 2023-03-26 | 15.10 | 22.18 | 6.00 | 3.56 | 10.58 | - | 50.67 | 35.71 | 46.60 | - | - | 22.16 | 24.25 | 21.29 | 8.66 | 6.67 | 1.90 | 12.55 |
| 10 | HermesFlow (Peking University) | Phi-1.5 | 2025-02-17 | 14.01 | 41.49 | 33.00 | 28.32 | 34.27 | - | - | - | 46.48 | - | - | 7.75 | - | - | - | - | - | - |
| 11 | Emu3 (BAAI) | LLaMA-8B | 2024-09-27 | 13.79 | 45.75 | 30.50 | 23.32 | 33.19 | - | - | - | 49.08 | - | - | 8.18 | - | - | - | - | - | - |
| 12 | Show-o (Show Lab) | Phi-1.5 | 2024-08-22 | 12.74 | 32.47 | 34.75 | 25.66 | 30.96 | - | - | 43.54 | - | - | - | 7.26 | - | - | - | - | - | - |
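
The roll-up behind these numbers appears to be: each domain "Avg" is the unweighted mean over all of that domain's subtasks, with an unfinished task ("-") scored as 0, and "Overall" is the mean of the three domain averages. This rule is inferred from the published rows rather than taken from any official scoring code — it reproduces, for example, MIO-Instruct's Overall of 37.17 exactly, though a couple of cells (e.g. Gemini2.0-flash-exp's Unify Avg) deviate by more than rounding. A minimal sketch under that assumption:

```python
# Leaderboard roll-up as inferred from the table above; not the
# benchmark's official scoring code. An unfinished subtask ("-")
# is represented as a missing key and counts as 0.
UNDERSTANDING = ["SIPU", "MTITU", "VPU"]
GENERATION = ["CVIG", "FIR", "TIE", "TIG", "TVG", "VP"]
UNIFY = ["IEE", "CSQ", "AL", "SD", "VCoT"]

def domain_avg(scores: dict, tasks: list) -> float:
    """Unweighted mean over all subtasks in a domain; missing keys score 0."""
    return sum(scores.get(task, 0.0) for task in tasks) / len(tasks)

def overall(scores: dict) -> float:
    """Mean of the three domain averages."""
    domains = (UNDERSTANDING, GENERATION, UNIFY)
    return sum(domain_avg(scores, d) for d in domains) / len(domains)

# MIO-Instruct's row from the table above:
mio_instruct = {
    "SIPU": 52.00, "MTITU": 33.50, "VPU": 39.01,
    "CVIG": 51.24, "FIR": 59.29, "TIE": 43.66,
    "TIG": 48.23, "TVG": 51.88, "VP": 66.37,
    "IEE": 24.16, "CSQ": 38.50, "AL": 8.66, "SD": 11.50, "VCoT": 0.00,
}
print(f"{overall(mio_instruct):.2f}")  # 37.17, matching the leaderboard
```

Under this rule a fully missing domain still contributes a 0 to the Overall mean, which is why models without Unify results sit well below their Understanding scores alone would suggest — e.g. VILA-U: (39.95 + 15.79 + 0) / 3 = 18.58.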