Abstract: This paper introduces the human-curated Pandas-PlotBench dataset, designed to evaluate language models’ effectiveness as assistants in visual data exploration. Our benchmark focuses on ...