forayer.datasets.OpenEADataset

class forayer.datasets.OpenEADataset(ds_pair: str = 'D_W', size: str = '15K', version: int = 1, force: bool = False)

The OpenEA datasets contain entity resolution tasks with samples from popular knowledge graphs.

Several tasks are available, built from snippets of DBpedia, Wikidata and YAGO. The size refers to the number of entities in each graph (15K or 100K). For each setting, two versions are available: version 1 has lower connectivity in the graph than version 2.

More information can be found in the OpenEA GitHub repository and in the benchmark publication: Sun et al. (2020), A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs, VLDB <http://www.vldb.org/pvldb/vol13/p2326-sun.pdf>

__init__(ds_pair: str = 'D_W', size: str = '15K', version: int = 1, force: bool = False)

Initialize an OpenEA dataset pair.

ds_pair : str

name of the dataset pair (either “D_W” for DBpedia-Wikidata or “D_Y” for DBpedia-YAGO)

size : str

size of the task (either “15K” or “100K”)

version : int

version of the task (either 1 or 2)

force : bool

if True, ignore the cache
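
The documented argument values can be enumerated and checked before a dataset is constructed. The following is only a minimal sketch: `VALID_PAIRS`, `VALID_SIZES`, `VALID_VERSIONS`, `check_openea_args` and `ALL_CONFIGS` are hypothetical names that are not part of forayer, and the check assumes the values listed above are exhaustive.

```python
from itertools import product

# Values copied from the parameter descriptions (assumed to be exhaustive).
VALID_PAIRS = ("D_W", "D_Y")
VALID_SIZES = ("15K", "100K")
VALID_VERSIONS = (1, 2)


def check_openea_args(ds_pair="D_W", size="15K", version=1):
    """Hypothetical helper: reject values that are not documented."""
    if ds_pair not in VALID_PAIRS:
        raise ValueError(f"ds_pair must be one of {VALID_PAIRS}, got {ds_pair!r}")
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    if version not in VALID_VERSIONS:
        raise ValueError(f"version must be one of {VALID_VERSIONS}, got {version!r}")
    return {"ds_pair": ds_pair, "size": size, "version": version}


# Every documented combination of pair, size and version.
ALL_CONFIGS = list(product(VALID_PAIRS, VALID_SIZES, VALID_VERSIONS))

# With forayer installed, the checked arguments could be passed on like this:
#   from forayer.datasets import OpenEADataset
#   ds = OpenEADataset(**check_openea_args("D_Y", "100K", 2))
```

Checking the arguments up front is a design choice for this sketch; whether the class itself validates them is not stated here.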

Methods

__init__([ds_pair, size, version, force])

Initialize an OpenEA dataset pair.

load_er_task()

Load the ERtask via self._load() or from cache.

load_er_task()

Load the ERtask via self._load() or from cache.
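
A typical call sequence might look like the sketch below. It assumes that forayer is installed and that the data is either already cached or can be fetched; `load_openea_task` is a hypothetical wrapper name, not part of the library. The import is deferred so that defining the function does not itself require forayer.

```python
def load_openea_task(ds_pair="D_W", size="15K", version=1, force=False):
    """Hypothetical wrapper around OpenEADataset.load_er_task()."""
    # Deferred import: forayer is only needed when the function is called.
    from forayer.datasets import OpenEADataset

    dataset = OpenEADataset(
        ds_pair=ds_pair, size=size, version=version, force=force
    )
    # Per the method description, the task is loaded via self._load() or
    # taken from the cache; force=True asks the dataset to ignore the cache.
    return dataset.load_er_task()
```

Calling `load_openea_task(force=True)` would then request a fresh load instead of the cached task.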