Data for paper: "Evaluating Resource-Lean Cross-Lingual Embedding Models in Unsupervised Retrieval"


Cross-lingual embeddings (CLEs) enable cross-lingual natural language processing and information retrieval. Recently, a wide variety of resource-lean, projection-based models for inducing CLEs have appeared that require limited or no bilingual supervision. Despite their potential usefulness in downstream IR and NLP tasks, these CLE models have been evaluated almost exclusively on word translation tasks. In this work, we provide a comprehensive comparative evaluation of projection-based CLE models for both sentence-level and document-level Cross-lingual Information Retrieval (CLIR). We hope our work serves as a guideline for CLIR practitioners choosing the right model.
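Projection-based CLE models learn a mapping from one monolingual embedding space into another. The following is a minimal sketch of one common variant, an orthogonal (Procrustes) projection learned from a seed translation dictionary, followed by a simple CLIR step that averages word vectors and ranks documents by cosine similarity. The helper names (`procrustes`, `embed_text`, `rank_documents`), the plain mean aggregation, and the toy data are illustrative assumptions, not the exact models or retrieval setup evaluated in the paper. It requires NumPy.

```python
import numpy as np


def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Return the orthogonal W minimising ||XW - Y||_F.

    X, Y: (n, d) arrays whose i-th rows are the source- and target-language
    vectors of the i-th translation pair in a seed dictionary.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def embed_text(tokens, vectors: dict, dim: int) -> np.ndarray:
    """Hypothetical aggregation: mean of the vectors of known tokens.

    Unknown tokens are ignored; a text with no known tokens maps to zeros.
    """
    found = [vectors[t] for t in tokens if t in vectors]
    return np.mean(found, axis=0) if found else np.zeros(dim)


def rank_documents(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    """Document indices sorted by descending cosine similarity to the query."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    D = doc_vecs / (np.linalg.norm(doc_vecs, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(D @ q))


if __name__ == "__main__":
    # Toy check: the source space X is an exact rotation of the target space Y,
    # so the learned projection should recover that rotation.
    rng = np.random.default_rng(0)
    dim = 4
    Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    Y = rng.normal(size=(50, dim))
    X = Y @ Q.T  # chosen so that X @ Q == Y
    W = procrustes(X, Y)
    print(np.allclose(W, Q))                   # expected: True
    print(rank_documents(X[2] @ W, Y[:5])[0])  # expected: 2
```

In a real CLIR pipeline, the query would be embedded with source-language vectors, mapped with `W`, and compared against documents embedded with target-language vectors; the orthogonality constraint keeps distances within the source space unchanged by the mapping.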

Identifier
DOI https://doi.org/10.7801/360
Metadata Access https://api.datacite.org/dois/10.7801/360
Provenance
Creator Litschko, Robert; Glavaš, Goran
Publisher Mannheim University Library
Publication Year 2021
OpenAccess true
Representation
Resource Type Dataset
Format application/gzip
Size (bytes) 35719773; 3944832662; 4010219682; 4045993164; 4214256917; 3125193783; 4010000992; 4064462499
Version 1
Discipline Social Sciences
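The DataCite record linked under Metadata Access can also be read programmatically. Below is a minimal sketch using only the Python standard library. It assumes the DataCite REST API's JSON:API layout (`data.attributes` with `titles`, `creators`, `publicationYear`, `sizes`) and that the size values are byte counts, possibly joined with semicolons as shown on this page. `fetch_record`, `total_size`, and `summarize` are hypothetical helper names.

```python
import json
import urllib.request

# Metadata endpoint listed on this page.
DATACITE_URL = "https://api.datacite.org/dois/10.7801/360"


def fetch_record(url: str = DATACITE_URL) -> dict:
    """Download the DataCite JSON record (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def total_size(sizes) -> int:
    """Sum size entries, assuming they are byte counts.

    Each entry may hold one number or several numbers joined by ';'.
    Only tokens made entirely of digits are counted.
    """
    total = 0
    for entry in sizes:
        for token in str(entry).replace(";", " ").split():
            if token.isdigit():
                total += int(token)
    return total


def summarize(record: dict) -> dict:
    """Pick a few fields out of an (assumed) DataCite JSON:API record."""
    attrs = record["data"]["attributes"]
    titles = attrs.get("titles") or [{}]
    return {
        "doi": attrs.get("doi"),
        "title": titles[0].get("title"),
        "creators": [c.get("name") for c in attrs.get("creators", [])],
        "year": attrs.get("publicationYear"),
        "total_bytes": total_size(attrs.get("sizes", [])),
    }
```

Calling `summarize(fetch_record())` downloads and condenses the record. Summing the eight Size values listed above gives 27,450,679,472, roughly 27.5 GB if the values are bytes, which is worth checking against available disk space before downloading all archives.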