forayer.datasets.OpenEADataset
- class forayer.datasets.OpenEADataset(ds_pair: str = 'D_W', size: str = '15K', version: int = 1, force: bool = False)
The OpenEA datasets contain entity resolution tasks with samples from popular knowledge graphs.
Several tasks are available, built from samples of DBpedia, Wikidata and YAGO. The size refers to the number of entities in each graph (15K or 100K). For each setting, two versions are available: version 1 has lower graph connectivity than version 2.
More information can be found in the respective GitHub repository and in the benchmark publication: Sun et al. (2020), A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs, VLDB <http://www.vldb.org/pvldb/vol13/p2326-sun.pdf>
- __init__(ds_pair: str = 'D_W', size: str = '15K', version: int = 1, force: bool = False)
Initialize an OpenEA dataset pair.
- ds_pair : str
Name of the dataset pair (either “D_W” or “D_Y”).
- size : str
Size of the task (either “15K” or “100K”).
- version : int
Version of the task (either 1 or 2).
- force : bool
If true, the cache is ignored.
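The documented parameter values span eight settings (two pairs, two sizes, two versions). The following is a minimal sketch of enumerating and validating them before constructing a dataset. The helper names `all_settings` and `make_dataset` are hypothetical and not part of forayer; only the `OpenEADataset` constructor and its keyword arguments are taken from the signature above. The import is deferred so that the validation part works even where forayer is not installed.

```python
from itertools import product

# Allowed values as documented for OpenEADataset.__init__
DS_PAIRS = ("D_W", "D_Y")
SIZES = ("15K", "100K")
VERSIONS = (1, 2)


def all_settings():
    """Return every documented (ds_pair, size, version) combination as kwargs."""
    return [
        {"ds_pair": pair, "size": size, "version": version}
        for pair, size, version in product(DS_PAIRS, SIZES, VERSIONS)
    ]


def make_dataset(ds_pair="D_W", size="15K", version=1, force=False):
    """Hypothetical wrapper: validate the arguments, then build the dataset."""
    if ds_pair not in DS_PAIRS:
        raise ValueError(f"ds_pair must be one of {DS_PAIRS}, got {ds_pair!r}")
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    if version not in VERSIONS:
        raise ValueError(f"version must be one of {VERSIONS}, got {version!r}")
    # Deferred import: only needed once the arguments are known to be valid.
    from forayer.datasets import OpenEADataset

    return OpenEADataset(ds_pair=ds_pair, size=size, version=version, force=force)
```

Calling `make_dataset(ds_pair="D_Y", size="100K", version=2)` would then construct the corresponding dataset object, assuming forayer is installed and the data can be obtained.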
Methods
- __init__([ds_pair, size, version, force]): Initialize an OpenEA dataset pair.
- load_er_task(): Load the ERtask via self._load() or from cache.
- load_er_task()
Load the ERtask via self._load() or from cache.
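The load-or-cache behaviour described here, together with the `force` flag, can be illustrated with a small stand-alone sketch. This is not forayer's actual implementation: the class `CachedTaskSketch`, the pickle-based cache file and the `load_calls` counter are assumptions made purely for illustration. Only the method names `_load` and `load_er_task` mirror the documentation.

```python
import pickle
from pathlib import Path


class CachedTaskSketch:
    """Illustrative stand-in for a dataset class with a cached loader."""

    def __init__(self, cache_path, force=False):
        self.cache_path = Path(cache_path)
        self.force = force
        self.load_calls = 0  # counts how often the expensive path runs

    def _load(self):
        # Placeholder for the expensive step (e.g. parsing raw files).
        self.load_calls += 1
        return {"entities": ["a", "b"]}

    def load_er_task(self):
        # Reuse the cached object unless the caller asked to ignore the cache.
        if self.cache_path.exists() and not self.force:
            with self.cache_path.open("rb") as fh:
                return pickle.load(fh)
        task = self._load()
        # Store the freshly loaded object for later calls.
        with self.cache_path.open("wb") as fh:
            pickle.dump(task, fh)
        return task
```

In this sketch, a second instance pointing at the same cache file returns the stored object without calling `_load()`, while `force=True` always reloads and overwrites the cache.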
