forayer.knowledge_graph.ERTask

class forayer.knowledge_graph.ERTask(kgs: Union[Dict[str, KG], List[KG]], clusters: ClusterHelper = None)

Class to model an entity resolution task on knowledge graphs.

kgs_dict: Dict[str, KG]: dictionary of the given KGs, with KG names as keys; KGs without names have their list index as key
clusters: ClusterHelper: known entity clusters
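
The key convention described for kgs_dict can be illustrated without forayer itself. The following is a minimal sketch, not the library's implementation: `NamedGraph` and `to_kgs_dict` are hypothetical stand-ins, and converting the list index to a string (to match the `Dict[str, KG]` annotation) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class NamedGraph:
    """Hypothetical stand-in for a KG; only the optional name matters here."""

    name: Optional[str] = None


def to_kgs_dict(
    kgs: Union[Dict[str, NamedGraph], List[NamedGraph]]
) -> Dict[str, NamedGraph]:
    """Key graphs by name, falling back to the list index for unnamed ones."""
    if isinstance(kgs, dict):
        # A dict is already keyed, so it is only copied.
        return dict(kgs)
    # Assumption: the index is stored as a string key.
    return {
        kg.name if kg.name is not None else str(index): kg
        for index, kg in enumerate(kgs)
    }


kgs_dict = to_kgs_dict([NamedGraph("DBpedia"), NamedGraph()])
print(list(kgs_dict))  # ['DBpedia', '1']
```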

__init__(kgs: Union[Dict[str, KG], List[KG]], clusters: ClusterHelper = None)

Initialize an ERTask object.

kgs: Union[Dict[str, KG], List[KG]]: list or dict of KGs that are to be integrated
clusters: ClusterHelper: known entity clusters

Methods

`__init__`(kgs[, clusters])	Initialize an ERTask object.
`all_entities`([ignore_only_relational])	Return all entities.
`clone`()	Create a clone of this object.
`inverse_attr_dict`()	Create an attributes dictionary with unique attribute values as key.
`sample`(n[, seed, unmatched])	Create a sample of the ERTask.
`without_match`()	Return ids of entities without matches in given gold standard.

Attributes

`entity_ids`	Return entity ids of all knowledge graphs.

all_entities(ignore_only_relational: bool = False) → Dict[str, Dict]

Return all entities.

ignore_only_relational: bool: If True, ignores entities that only show up in the relations (and not in the entities with attributes)

Dict[str, Dict]: all entities
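
The effect of ignore_only_relational can be sketched with plain dictionaries. This is an illustration under stated assumptions, not forayer's code: `collect_entities` is a hypothetical helper, relations are assumed to be stored as `{source: {target: relation_name}}`, and relation-only entities are assumed to get an empty attribute dict.

```python
from typing import Dict, Set


def collect_entities(
    attributed: Dict[str, Dict],
    relations: Dict[str, Dict[str, str]],
    ignore_only_relational: bool = False,
) -> Dict[str, Dict]:
    """Merge attributed entities with those that occur only in relations."""
    entities = {entity_id: dict(attrs) for entity_id, attrs in attributed.items()}
    if ignore_only_relational:
        return entities
    # Gather every id that appears as a source or target of a relation.
    relational_ids: Set[str] = set(relations)
    for targets in relations.values():
        relational_ids.update(targets)
    # Assumption: relation-only entities are listed with no attributes.
    for entity_id in relational_ids:
        entities.setdefault(entity_id, {})
    return entities


attributed = {"e1": {"name": "Berlin"}}
relations = {"e1": {"e2": "capitalOf"}}
print(sorted(collect_entities(attributed, relations)))        # ['e1', 'e2']
print(sorted(collect_entities(attributed, relations, True)))  # ['e1']
```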

clone() → forayer.knowledge_graph.er_task.ERTask

Create a clone of this object.

clone: ERTask: cloned ERTask

property entity_ids: Set[str]

Return entity ids of all knowledge graphs.

Set[str]: Entity ids of all knowledge graphs as set.

inverse_attr_dict() → Dict[Any, Dict[str, str]]

Create an attributes dictionary with unique attribute values as key.

Dict[Any, Dict[str,str]]: inverse attribute dict
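
To make the idea of an inverse attribute dictionary concrete, here is a minimal sketch in plain Python. It assumes, based only on the return annotation, that each attribute value maps to `{entity_id: attribute_name}`; the actual layout in forayer may differ, and `invert_attributes` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict


def invert_attributes(
    entities: Dict[str, Dict[str, Any]]
) -> Dict[Any, Dict[str, str]]:
    """Index entities by attribute value (values must be hashable)."""
    inverse: Dict[Any, Dict[str, str]] = defaultdict(dict)
    for entity_id, attributes in entities.items():
        for attr_name, value in attributes.items():
            # Assumed layout: value -> {entity id -> attribute name}.
            inverse[value][entity_id] = attr_name
    return dict(inverse)


entities = {
    "e1": {"name": "Berlin"},
    "e2": {"label": "Berlin", "population": 3645000},
}
inverse = invert_attributes(entities)
print(inverse["Berlin"])  # {'e1': 'name', 'e2': 'label'}
```

Such an index makes it cheap to find entities from different graphs that share a literal value, which is a common first step when generating match candidates.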

sample(n: int, seed: Optional[Union[int, random.Random]] = None, unmatched: Optional[int] = None)

Create a sample of the ERTask.

Takes n clusters and creates the respective subgraphs. If unmatched is provided, that number of entities without a match is also added to the subgraphs.

n: int: Number of clusters.
seed: Union[int, random.Random]: Seed for randomness or seeded random.Random object. Default is None.
unmatched: int: Number of unmatched entities to include. Default is None.

ERTask: downsampled ERTask

>>> from forayer.datasets import OpenEADataset
>>> ds = OpenEADataset(ds_pair="D_W", size="15K", version=1)
>>> ds.er_task.sample(n=10, unmatched=20)
ERTask({DBpedia: (# entities: 26, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 26, # attributes: 26, # attr_values: 89),Wikidata: (# entities: 14, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 14, # attributes: 14, # attr_values: 102)},ClusterHelper(# elements:20, # clusters:10))

You can use a seed to control reproducibility:

>>> ds.er_task.sample(n=10, seed=13, unmatched=20)
ERTask({DBpedia: (# entities: 26, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 26, # attributes: 26, # attr_values: 93),Wikidata: (# entities: 14, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 14, # attributes: 14, # attr_values: 179)},ClusterHelper(# elements:20, # clusters:10))

ValueError: raised if self.clusters is None
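
The sampling behaviour described above (n clusters, optional unmatched entities, an int or random.Random seed, and a ValueError without clusters) can be sketched on plain id sets. This is a simplified illustration, not forayer's implementation: `sample_ids` is a hypothetical helper, clusters are assumed to be sets of entity ids, and no subgraphs are built.

```python
import random
from typing import List, Optional, Set, Tuple, Union


def sample_ids(
    entity_ids: Set[str],
    clusters: Optional[List[Set[str]]],
    n: int,
    seed: Optional[Union[int, random.Random]] = None,
    unmatched: Optional[int] = None,
) -> Tuple[List[Set[str]], Set[str]]:
    """Pick n clusters and, optionally, some entities outside any cluster."""
    if clusters is None:
        raise ValueError("clusters are required for sampling")
    # Accept either a ready-made generator or a plain seed value.
    rng = seed if isinstance(seed, random.Random) else random.Random(seed)
    chosen = rng.sample(clusters, n)
    kept: Set[str] = set().union(*chosen)
    if unmatched:
        matched: Set[str] = set().union(*clusters)
        # Sorting gives a stable order, so a fixed seed gives a fixed result.
        candidates = sorted(entity_ids - matched)
        kept.update(rng.sample(candidates, unmatched))
    return chosen, kept


clusters = [{"a1", "b1"}, {"a2", "b2"}, {"a3", "b3"}]
entity_ids = {"a1", "b1", "a2", "b2", "a3", "b3", "a4", "b4", "a5"}
chosen, kept = sample_ids(entity_ids, clusters, n=2, seed=13, unmatched=2)
print(len(chosen), len(kept))  # 2 6
```

Passing the same integer seed twice yields the same selection, whereas passing a shared random.Random object advances its state between calls.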

without_match()

Return ids of entities without matches in given gold standard.
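
Conceptually, the unmatched ids are the entity ids that appear in no gold-standard cluster. A minimal sketch, assuming clusters are plain sets of ids (`ids_without_match` is a hypothetical helper, not part of forayer):

```python
from typing import Iterable, Set


def ids_without_match(
    entity_ids: Set[str], clusters: Iterable[Set[str]]
) -> Set[str]:
    """Return the ids that are not a member of any cluster."""
    matched: Set[str] = set()
    for cluster in clusters:
        matched |= cluster
    return entity_ids - matched


all_ids = {"a1", "a2", "b1", "b2"}
gold_clusters = [{"a1", "b1"}]
unmatched_ids = ids_without_match(all_ids, gold_clusters)
print(sorted(unmatched_ids))  # ['a2', 'b2']
```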