Yahoo Search: Web Search

Search results

  1. Jun 11, 2024 · The Needle-In-A-Haystack (NIAH) test is a classic method in natural language processing for evaluating a model's ability to understand long contexts. The vanilla NIAH benchmark poses a retrieval task in which the model must retrieve a short piece of text (the needle) from a long document (the haystack).

  2. Jun 11, 2024 · In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.

  3. Jun 24, 2024 · Our evaluation setup involves the following key components: (a) Needle sub-image: the sub-image to be retrieved based on the given caption. (b) Haystack image inputs: the long-context visual inputs consist of M images, each stitched from N × N sub-images.

  4. Jun 17, 2024 · In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.

  5. Jun 14, 2024 · The proposed benchmark includes 20 diverse tasks, ranging from simple "needle in a haystack" scenarios with distractor facts to more complex tasks that require counting, logical reasoning, or spatial reasoning. Figure 6 evaluates the complexity of the base short versions of these tasks.

  6. Jun 11, 2024 · Authors include Yuchen Duan (16 authors in total). Preprint; may not yet have been peer reviewed. 79 references, 2 figures. Abstract: With the rapid advancement of...
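The vanilla NIAH retrieval task described in result 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code of any benchmark listed above: the helper names build_haystack and score_retrieval are hypothetical, the needle's position is controlled by an assumed "depth" fraction, and scoring uses a lenient case-insensitive substring match. The actual model call is left out.

```python
def build_haystack(filler_sentences, needle, depth):
    """Return a haystack string with `needle` inserted among `filler_sentences`.

    `depth` is an assumed relative insertion point in [0, 1]:
    0.0 puts the needle first, 1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    sentences = list(filler_sentences)        # copy so the caller's list is untouched
    position = round(depth * len(sentences))  # sentence index where the needle goes
    sentences.insert(position, needle)
    return " ".join(sentences)


def score_retrieval(model_answer, expected):
    """Lenient string-match score: 1.0 if `expected` occurs in the answer, else 0.0."""
    return 1.0 if expected.lower() in model_answer.lower() else 0.0


# Usage: hide one fact halfway through 100 filler sentences.
filler = ["The sky was grey that morning.", "Trains left the station on time."] * 50
needle = "The secret passphrase is 'blue lantern'."
haystack = build_haystack(filler, needle, depth=0.5)
prompt = haystack + "\n\nQuestion: What is the secret passphrase?"
# The prompt would be sent to the model under test; a canned answer stands in here.
print(score_retrieval("The passphrase is 'blue lantern'.", "blue lantern"))  # 1.0
```

Sweeping depth (and the haystack length) over a grid and averaging the scores is a common way to see where in a long context retrieval starts to fail; the exact grid and metric vary between benchmarks.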
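In the multimodal setup of result 3, the haystack holds M images, each stitched from an N × N grid of sub-images, so there are M · N² possible needle positions. The sketch below shows one way to convert between a flat slot index and an (image, row, col) location, and to compute the needle's pixel box. It assumes row-major ordering and equal-size square sub-images of side `tile` pixels; the names SubImageLocation, slot_to_location, location_to_slot and sub_image_box are hypothetical and not taken from the paper's code.

```python
from typing import NamedTuple, Tuple


class SubImageLocation(NamedTuple):
    image: int  # index of the stitched haystack image, 0 .. M-1
    row: int    # grid row inside that image, 0 .. N-1
    col: int    # grid column inside that image, 0 .. N-1


def slot_to_location(slot: int, m_images: int, n: int) -> SubImageLocation:
    """Map a flat slot index over all M * N * N sub-images to (image, row, col)."""
    per_image = n * n
    if not 0 <= slot < m_images * per_image:
        raise ValueError(f"slot must be in [0, {m_images * per_image})")
    image, within = divmod(slot, per_image)  # which image, then position inside it
    row, col = divmod(within, n)             # assumed row-major order in the grid
    return SubImageLocation(image, row, col)


def location_to_slot(loc: SubImageLocation, n: int) -> int:
    """Inverse of slot_to_location."""
    return loc.image * n * n + loc.row * n + loc.col


def sub_image_box(loc: SubImageLocation, tile: int) -> Tuple[int, int, int, int]:
    """Pixel box (left, upper, right, lower) of a square tile x tile sub-image."""
    left, upper = loc.col * tile, loc.row * tile
    return (left, upper, left + tile, upper + tile)


# Usage: M = 3 stitched images, each a 2 x 2 grid of 224-pixel sub-images.
loc = slot_to_location(7, m_images=3, n=2)
print(loc)                      # SubImageLocation(image=1, row=1, col=1)
print(sub_image_box(loc, 224))  # (224, 224, 448, 448)
```

The (left, upper, right, lower) box order matches the convention used by Pillow's Image.crop, so the box could be used to cut the needle back out of a stitched image when checking an answer.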