forayer.knowledge_graph.ERTask

class forayer.knowledge_graph.ERTask(kgs: Union[Dict[str, KG], List[KG]], clusters: ClusterHelper = None)

Class to model an entity resolution task on knowledge graphs.

kgs_dict: Dict[str, KG]

Dictionary of the given KGs, with KG names as keys. KGs without names have their list index as key.

clusters: ClusterHelper

known entity clusters
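The keying rule for kgs_dict can be illustrated with a minimal sketch. It uses a hypothetical ToyKG stand-in, not forayer's KG class, and build_kgs_dict is an illustrative helper rather than part of the library. Whether forayer stores the fallback index as an int or a str is an assumption. Here it is a str, to match the Dict[str, KG] annotation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class ToyKG:
    """Hypothetical stand-in for forayer's KG; only the name matters here."""
    name: Optional[str] = None
    entities: Dict[str, Dict[str, str]] = field(default_factory=dict)


def build_kgs_dict(kgs: Union[Dict[str, ToyKG], List[ToyKG]]) -> Dict[str, ToyKG]:
    """Key KGs by name; unnamed KGs fall back to their list index."""
    if isinstance(kgs, dict):
        # A dict already carries its keys, so it is copied as-is.
        return dict(kgs)
    # Assumption: the index is stringified to fit Dict[str, KG].
    return {kg.name if kg.name is not None else str(i): kg for i, kg in enumerate(kgs)}


kgs_dict = build_kgs_dict([ToyKG(name="DBpedia"), ToyKG()])
print(sorted(kgs_dict))  # ['1', 'DBpedia']
```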

__init__(kgs: Union[Dict[str, KG], List[KG]], clusters: ClusterHelper = None)

Initialize an ERTask object.

Parameters

kgs: Union[Dict[str, KG], List[KG]]

list or dict of KGs that are to be integrated

clusters: ClusterHelper

known entity clusters

Methods

__init__(kgs[, clusters])

Initialize an ERTask object.

all_entities([ignore_only_relational])

Return all entities.

clone()

Create a clone of this object.

inverse_attr_dict()

Create an attributes dictionary with unique attribute values as keys.

sample(n[, seed, unmatched])

Create a sample of the ERTask.

without_match()

Return ids of entities without matches in the given gold standard.

Attributes

entity_ids

Return entity ids of all knowledge graphs.

all_entities(ignore_only_relational: bool = False) → Dict[str, Dict]

Return all entities.

Parameters

ignore_only_relational: bool

If True, ignore entities that only appear in the relations (and not among the entities with attributes).

Returns

Dict[str, Dict]

all entities
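The effect of ignore_only_relational can be shown with a small sketch. It assumes a hypothetical layout that may not match forayer's internals. Each KG is represented by an entities dict (entity id → attributes) and a relations dict (head id → {tail id: relation name}). Entity ids are assumed to be unique across KGs. Relation-only entities receive an empty attribute dict unless they are ignored.

```python
from typing import Dict, List


def all_entities(
    entities_per_kg: List[Dict[str, Dict[str, str]]],
    relations_per_kg: List[Dict[str, Dict[str, str]]],
    ignore_only_relational: bool = False,
) -> Dict[str, Dict[str, str]]:
    """Hypothetical stand-in: merge entities of several KGs into one dict."""
    result: Dict[str, Dict[str, str]] = {}
    for entities, relations in zip(entities_per_kg, relations_per_kg):
        # Entities that carry attributes are always included.
        result.update(entities)
        if ignore_only_relational:
            continue
        # Heads and tails that only occur in relations get an empty dict;
        # setdefault leaves entities that already have attributes untouched.
        for head, tails in relations.items():
            for ent in (head, *tails):
                result.setdefault(ent, {})
    return result


entities = [{"db:Berlin": {"label": "Berlin"}}]
relations = [{"db:Berlin": {"db:Germany": "country"}}]
print(sorted(all_entities(entities, relations)))  # ['db:Berlin', 'db:Germany']
print(sorted(all_entities(entities, relations, ignore_only_relational=True)))  # ['db:Berlin']
```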

clone() → forayer.knowledge_graph.er_task.ERTask

Create a clone of this object.

Returns

clone: ERTask

cloned ERTask

property entity_ids: Set[str]

Return entity ids of all knowledge graphs.

Returns

Set[str]

Entity ids of all knowledge graphs as set.
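The sketch below shows the set semantics of entity_ids: ids from all KGs are combined into one set. The layout is hypothetical (KG name → {entity id → attributes}). It is also an assumption that only entities with attributes are counted. The function is an illustrative stand-in, not the property's actual implementation.

```python
from typing import Dict, Set


def entity_ids(kgs_entities: Dict[str, Dict[str, dict]]) -> Set[str]:
    """Hypothetical stand-in: union of entity ids over all KGs."""
    # set().union accepts any iterables, including dict key views.
    return set().union(*(ents.keys() for ents in kgs_entities.values()))


print(sorted(entity_ids({"DBpedia": {"db:Berlin": {}}, "Wikidata": {"wd:Q64": {}}})))
# ['db:Berlin', 'wd:Q64']
```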

inverse_attr_dict() → Dict[Any, Dict[str, str]]

Create an attributes dictionary with unique attribute values as keys.

Returns

Dict[Any, Dict[str, str]]

inverse attribute dict
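An inverse attribute dict can be sketched as follows, although the documentation does not specify what the inner Dict[str, str] contains. The sketch assumes it maps entity id → attribute name, which lets several entities share one value. Treat that layout, the input format, and the helper itself as hypothetical. Attribute values must be hashable to serve as keys.

```python
from collections import defaultdict
from typing import Any, Dict


def inverse_attr_dict(
    kgs_entities: Dict[str, Dict[str, Dict[str, Any]]],
) -> Dict[Any, Dict[str, str]]:
    """Hypothetical stand-in: attribute value -> {entity id: attribute name}."""
    inverse: Dict[Any, Dict[str, str]] = defaultdict(dict)
    for entities in kgs_entities.values():
        for ent_id, attrs in entities.items():
            for attr_name, value in attrs.items():
                inverse[value][ent_id] = attr_name
    return dict(inverse)


kgs = {
    "DBpedia": {"db:Berlin": {"label": "Berlin"}},
    "Wikidata": {"wd:Q64": {"name": "Berlin", "population": 3600000}},
}
inv = inverse_attr_dict(kgs)
print(inv["Berlin"])  # {'db:Berlin': 'label', 'wd:Q64': 'name'}
```

Under this assumed layout, a value shared by entities from different KGs (here "Berlin") points directly at candidate matches. This is one reason such an index can be useful for entity resolution.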

sample(n: int, seed: Optional[Union[int, random.Random]] = None, unmatched: Optional[int] = None)

Create a sample of the ERTask.

Takes n clusters and creates the respective subgraphs. If unmatched is provided, that many entities without a match are also added to the subgraphs.

Parameters

n: int

Number of clusters.

seed: Union[int, random.Random]

Seed for randomness or seeded random.Random object. Default is None.

unmatched: int

Number of unmatched entities to include. Default is None.

Returns

ERTask

downsampled ERTask

Examples

>>> from forayer.datasets import OpenEADataset
>>> ds = OpenEADataset(ds_pair="D_W", size="15K", version=1)
>>> ds.er_task.sample(n=10, unmatched=20)
ERTask({DBpedia: (# entities: 26, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 26, # attributes: 26, # attr_values: 89),Wikidata: (# entities: 14, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 14, # attributes: 14, # attr_values: 102)},ClusterHelper(# elements:20, # clusters:10))

Passing a seed makes the sample reproducible:

>>> ds.er_task.sample(n=10, seed=13, unmatched=20)
ERTask({DBpedia: (# entities: 26, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 26, # attributes: 26, # attr_values: 93),Wikidata: (# entities: 14, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 14, # attributes: 14, # attr_values: 179)},ClusterHelper(# elements:20, # clusters:10))

Raises

ValueError

If self.clusters is None.
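The sampling procedure described for sample() can be sketched on plain Python sets. The sketch picks n clusters, optionally adds unmatched entities that belong to no cluster, and raises ValueError when no clusters are known. Unlike the real method, it returns only the selected entity ids rather than a new ERTask with subgraphs. The function name and the list-of-sets cluster representation are hypothetical. The int-or-random.Random seed handling follows the documented parameter type.

```python
import random
from typing import List, Optional, Set, Union


def sample_entities(
    clusters: Optional[List[Set[str]]],
    all_ids: Set[str],
    n: int,
    seed: Optional[Union[int, random.Random]] = None,
    unmatched: Optional[int] = None,
) -> Set[str]:
    """Hypothetical stand-in: ids of n sampled clusters plus unmatched ids."""
    if clusters is None:
        raise ValueError("sampling needs known clusters")
    # Accept either an int seed or an already seeded random.Random.
    rng = seed if isinstance(seed, random.Random) else random.Random(seed)
    ids = set().union(*rng.sample(clusters, n))
    if unmatched:
        clustered = set().union(*clusters)
        # Sorting makes the candidate order, and thus the result, depend only on the seed.
        candidates = sorted(all_ids - clustered)
        ids.update(rng.sample(candidates, unmatched))
    return ids


clusters = [{"a1", "b1"}, {"a2", "b2"}, {"a3", "b3"}]
all_ids = set().union(*clusters) | {"u1", "u2", "u3"}
picked = sample_entities(clusters, all_ids, n=2, seed=13, unmatched=2)
print(len(picked))  # 6
```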

without_match()

Return ids of entities without matches in the given gold standard.
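Conceptually, without_match() is a set difference: all entity ids minus the ids that appear in any gold-standard cluster. The sketch assumes clusters are given as plain sets of ids. It also assumes every clustered entity counts as matched. The helper is a hypothetical stand-in, not forayer's implementation.

```python
from typing import Iterable, Set


def without_match(all_ids: Set[str], clusters: Iterable[Set[str]]) -> Set[str]:
    """Hypothetical stand-in: ids that appear in no gold-standard cluster."""
    matched = set().union(*clusters)
    return all_ids - matched


print(without_match({"a1", "b1", "u1"}, [{"a1", "b1"}]))  # {'u1'}
```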