forayer.knowledge_graph.ERTask
- class forayer.knowledge_graph.ERTask(kgs: Union[Dict[str, KG], List[KG]], clusters: ClusterHelper = None)
Class to model entity resolution task on knowledge graphs.
- kgs_dict: Dict[str, KG]
Dictionary of the given KGs, with KG names as keys. KGs without names have their list index as key.
- clusters: ClusterHelper
known entity clusters
- __init__(kgs: Union[Dict[str, KG], List[KG]], clusters: ClusterHelper = None)
Initialize an ERTask object.
- kgs: Union[Dict[str, KG], List[KG]]
list or dict of KGs that are to be integrated
- clusters: ClusterHelper
known entity clusters
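The keying rule described for kgs_dict (names as keys, list index for unnamed KGs) can be illustrated with a minimal sketch. It does not use forayer itself: NamedGraph and to_kgs_dict are hypothetical stand-ins, and rendering the index as a string is an assumption based on the Dict[str, KG] annotation.

```python
from typing import Any, Dict, List, Optional, Union


class NamedGraph:
    """Hypothetical stand-in for a KG; it only carries an optional name."""

    def __init__(self, name: Optional[str] = None):
        self.name = name


def to_kgs_dict(kgs: Union[Dict[str, Any], List[Any]]) -> Dict[str, Any]:
    """Key graphs by name, falling back to the list index for unnamed graphs."""
    if isinstance(kgs, dict):
        return dict(kgs)
    result: Dict[str, Any] = {}
    for index, kg in enumerate(kgs):
        # Assumption: the index is stored as a string to match Dict[str, KG].
        key = kg.name if kg.name is not None else str(index)
        result[key] = kg
    return result


graphs = [NamedGraph("DBpedia"), NamedGraph()]
kgs_dict = to_kgs_dict(graphs)
print(list(kgs_dict))
```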
Methods
- __init__(kgs[, clusters]): Initialize an ERTask object.
- all_entities([ignore_only_relational]): Return all entities.
- clone(): Create a clone of this object.
- inverse_attr_dict(): Create an attributes dictionary with unique attribute values as key.
- sample(n[, seed, unmatched]): Create a sample of the ERTask.
- Return ids of entities without matches in given gold standard.
Attributes
- entity_ids: Return entity ids of all knowledge graphs.
- all_entities(ignore_only_relational: bool = False) -> Dict[str, Dict]
Return all entities.
- ignore_only_relational: bool
If True, ignores entities that only show up in the relations (and not in the entities with attributes).
- Dict[str, Dict]
all entities
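To make the effect of ignore_only_relational concrete, here is a small sketch on plain dictionaries for a single graph. It is not the forayer implementation: collect_entities and the data layout (an attribute dict plus a list of head/tail pairs) are assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Tuple


def collect_entities(
    attributed: Dict[str, Dict[str, Any]],
    relations: List[Tuple[str, str]],
    ignore_only_relational: bool = False,
) -> Dict[str, Dict[str, Any]]:
    """Return entities with attributes, optionally adding relation-only ones."""
    entities = {eid: dict(attrs) for eid, attrs in attributed.items()}
    if not ignore_only_relational:
        for head, tail in relations:
            # Entities seen only in relations get an empty attribute dict.
            entities.setdefault(head, {})
            entities.setdefault(tail, {})
    return entities


attributed = {"e1": {"name": "Berlin"}}
relations = [("e1", "e2")]  # "e2" has no attributes of its own

with_relational = collect_entities(attributed, relations)
without_relational = collect_entities(
    attributed, relations, ignore_only_relational=True
)
print(sorted(with_relational), sorted(without_relational))
```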
- clone() -> forayer.knowledge_graph.er_task.ERTask
Create a clone of this object.
- clone: ERTask
cloned ERTask
- property entity_ids: Set[str]
Return entity ids of all knowledge graphs.
- Set[str]
Entity ids of all knowledge graphs as set.
- inverse_attr_dict() -> Dict[Any, Dict[str, str]]
Create an attributes dictionary with unique attribute values as key.
- Dict[Any, Dict[str,str]]
inverse attribute dict
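The return type Dict[Any, Dict[str, str]] suggests a mapping from each attribute value to the entities that carry it. The sketch below shows one plausible reading on plain dictionaries. The inner shape (entity id mapped to attribute name) and the helper invert_attributes are assumptions, not confirmed by this page.

```python
from collections import defaultdict
from typing import Any, Dict


def invert_attributes(
    entities: Dict[str, Dict[str, Any]]
) -> Dict[Any, Dict[str, str]]:
    """Map each attribute value to {entity_id: attribute_name}."""
    inverse: Dict[Any, Dict[str, str]] = defaultdict(dict)
    for entity_id, attributes in entities.items():
        for attr_name, value in attributes.items():
            # Values are used as dict keys, so they must be hashable.
            inverse[value][entity_id] = attr_name
    return dict(inverse)


entities = {
    "e1": {"label": "Berlin", "country": "Germany"},
    "e2": {"name": "Berlin"},
}
inverse = invert_attributes(entities)
print(inverse["Berlin"])
```

An index like this makes it cheap to find entities from different graphs that share an exact attribute value, which is a common first step when looking for match candidates.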
- sample(n: int, seed: Optional[Union[int, random.Random]] = None, unmatched: Optional[int] = None)
Create a sample of the ERTask.
Takes n clusters and creates the respective subgraphs. If unmatched is provided, adds that number of entities without a match to the subgraphs.
- n: int
Number of clusters.
- seed: Union[int, random.Random]
Seed for randomness or seeded random.Random object. Default is None.
- unmatched: int
Number of unmatched entities to include. Default is None.
- ERTask
downsampled ERTask
>>> from forayer.datasets import OpenEADataset
>>> ds = OpenEADataset(ds_pair="D_W",size="15K",version=1)
>>> ds.er_task.sample(n=10,unmatched=20)
ERTask({DBpedia: (# entities: 26, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 26, # attributes: 26, # attr_values: 89),Wikidata: (# entities: 14, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 14, # attributes: 14, # attr_values: 102)},ClusterHelper(# elements:20, # clusters:10))
You can use a seed to control reproducibility:
>>> ds.er_task.sample(n=10,seed=13,unmatched=20)
ERTask({DBpedia: (# entities: 26, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 26, # attributes: 26, # attr_values: 93),Wikidata: (# entities: 14, # entities_with_rel: 0, # rel: 0, # entities_with_attributes: 14, # attributes: 14, # attr_values: 179)},ClusterHelper(# elements:20, # clusters:10))
- ValueError
if self.clusters is None
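The seed parameter accepts either an int or a seeded random.Random object. A minimal sketch of how such a parameter can be handled with the standard library follows. It only draws clusters from a plain list: sample_clusters is a hypothetical helper, and the subgraph construction and unmatched handling of the real method are not modeled.

```python
import random
from typing import List, Optional, Set, Union


def sample_clusters(
    clusters: Optional[List[Set[str]]],
    n: int,
    seed: Optional[Union[int, random.Random]] = None,
) -> List[Set[str]]:
    """Draw n clusters, reproducibly when a seed is given."""
    if clusters is None:
        # Mirrors the documented ValueError when no clusters are known.
        raise ValueError("clusters must be provided to sample")
    # Accept a ready-made generator or build one from the seed.
    rng = seed if isinstance(seed, random.Random) else random.Random(seed)
    return rng.sample(clusters, n)


clusters = [{"a1", "b1"}, {"a2", "b2"}, {"a3", "b3"}, {"a4", "b4"}]
first = sample_clusters(clusters, n=2, seed=13)
second = sample_clusters(clusters, n=2, seed=13)
print(first == second)
```

Passing the same int seed twice yields the same draw, while passing a shared random.Random instance advances its state between calls, so successive samples differ but the overall sequence stays reproducible.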