forayer.knowledge_graph.ClusterHelper
- class forayer.knowledge_graph.ClusterHelper(data: Optional[Union[List[Set], Dict]] = None)
Convenience class for entity clusters.
The ClusterHelper class holds a dict mapping entities to their cluster id and a dict mapping cluster ids to entity sets. The add() and remove() methods keep both dicts in sync.
Attributes
- clusters: Dict[str, Set[str]]
maps cluster id to entity set
- entities: Dict[str,int]
maps entity id to cluster id
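The two attributes are inverse views of one partition of the entities, so every update has to touch both. The following is a hypothetical, simplified class (MiniClusterHelper, with the entity-to-cluster dict called cluster_of) that only illustrates that invariant. It is not forayer's implementation. Taking an explicit cluster id in add and dropping a cluster once its last entity is removed are assumptions made for the sketch.

```python
from typing import Dict, Set


class MiniClusterHelper:
    """Hypothetical sketch: two dicts kept as inverse views of one partition."""

    def __init__(self) -> None:
        self.clusters: Dict[int, Set[str]] = {}  # cluster id -> entity set
        self.cluster_of: Dict[str, int] = {}     # entity id -> cluster id

    def add(self, c_id: int, members: Set[str]) -> None:
        # Refuse entities that already belong to a cluster, so the
        # reverse mapping never points one entity at two clusters.
        clash = {e for e in members if e in self.cluster_of}
        if clash or c_id in self.clusters:
            raise ValueError("entity or cluster id already present")
        self.clusters[c_id] = set(members)
        for e in members:
            self.cluster_of[e] = c_id

    def remove(self, entity: str) -> None:
        # Update both dicts; dropping an emptied cluster is an assumption.
        c_id = self.cluster_of.pop(entity)
        self.clusters[c_id].discard(entity)
        if not self.clusters[c_id]:
            del self.clusters[c_id]
```

Because both dicts are only changed inside add and remove, a lookup in either direction stays consistent with the other.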
>>> from forayer.knowledge_graph import ClusterHelper
>>> ch = ClusterHelper([{"a1", "b1"}, {"a2", "b2"}])
>>> print(ch)
{0: {'a1', 'b1'}, 1: {'a2', 'b2'}}
Add an element to a cluster:
>>> ch.add_to_cluster(0, "c1")
>>> print(ch)
{0: {'a1', 'b1', 'c1'}, 1: {'a2', 'b2'}}
Add a new cluster:
>>> ch.add({"e2", "f1", "c3"})
>>> print(ch)
{0: {'a1', 'b1', 'c1'}, 1: {'a2', 'b2'}, 2: {'f1', 'e2', 'c3'}}
Remove an element from a cluster:
>>> ch.remove("b1")
>>> print(ch)
{0: {'a1', 'c1'}, 1: {'a2', 'b2'}, 2: {'f1', 'e2', 'c3'}}
The __contains__ method is overloaded, so the in operator accepts several kinds of queries. You can check whether an entity is in the ClusterHelper:
>>> "a1" in ch
True
Check whether a cluster is present:
>>> {"c1","a1"} in ch
True
And even whether a link exists:
>>> ("f1","e2") in ch
True
>>> ("a1","e2") in ch
False
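As a rough illustration of how such an overloaded membership test can dispatch on the query type, here is a standalone function over the two underlying dicts. The name contains and its argument layout are hypothetical. The sketch assumes that a set counts as present only when it equals a whole cluster. The examples above do not settle whether forayer uses equality or a subset test for that case.

```python
from typing import Dict, Set


def contains(clusters: Dict[int, Set[str]], cluster_of: Dict[str, int], item) -> bool:
    """Hypothetical sketch of a type-dispatching membership test."""
    if isinstance(item, str):
        # Single entity: is it assigned to any cluster?
        return item in cluster_of
    if isinstance(item, (set, frozenset)):
        # Set: does it equal one of the clusters (assumed semantics)?
        return item in clusters.values()
    if isinstance(item, tuple) and len(item) == 2:
        # Link: are both entities assigned to the same cluster?
        a, b = item
        return a in cluster_of and cluster_of.get(b) == cluster_of[a]
    raise TypeError(f"unsupported query type: {type(item).__name__}")
```

A link query therefore needs two dict lookups and never scans the clusters.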
To find the cluster id of an entity, look it up with:
>>> ch.elements["a1"]
0
To get the members of a cluster, use either:
>>> ch.members(0)
{'a1', 'c1'}
or simply:
>>> ch[0]
{'a1', 'c1'}
- __init__(data: Optional[Union[List[Set], Dict]] = None)
Initialize a ClusterHelper object with clusters.
Parameters
- data: Union[List[Set], Dict]
Clusters given either as a list of sets, as a dict of links (key-value pairs), or as a dict mapping cluster ids to member sets.
Raises
- TypeError
If data is not a dict or a list.
- ValueError
If data is a dict[cluster_id, member_set] and the clusters overlap.
Notes
Will try to merge clusters transitively if necessary.
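When links are given pairwise, merging clusters transitively means that links a1-b1 and b1-c1 must end up in a single cluster {a1, b1, c1}. One standard way to compute such groups is union-find. The following sketch uses it. The function name clusters_from_links is hypothetical, each key-value pair is assumed to be one link, and this is not necessarily how forayer performs the merge.

```python
from typing import Dict, List, Set


def clusters_from_links(links: Dict[str, str]) -> List[Set[str]]:
    """Hypothetical sketch: merge pairwise links transitively (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links.items():
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # union the two components

    # Group every entity under the root of its component.
    groups: Dict[str, Set[str]] = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())
```

For example, clusters_from_links({"a1": "b1", "b1": "c1", "a2": "b2"}) yields the two clusters {a1, b1, c1} and {a2, b2}.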
Methods
- __init__([data]): Initialize a ClusterHelper object with clusters.
- add(new_entry[, c_id]): Add a new cluster.
- add_link(e1, e2): Add a new entity link.
- add_to_cluster(c_id, new_entity): Add an entity to a cluster.
- all_pairs([key]): Get all entity pairs of a specific cluster or of all clusters.
- clone(): Create a clone of this object.
- get(key[, value]): Return a cluster's elements or a default value.
- info(): Print general information about this object.
- links(key[, always_return_set]): Get entities linked to this entity.
- members(key): Get members of a cluster.
- merge(c1, c2[, new_id]): Merge two clusters.
- remove(entry): Remove an entity.
- remove_cluster(cluster_id): Remove an entire cluster with the given cluster id.
- sample(n[, seed]): Sample n clusters.
Attributes
- number_of_links: Return the total number of links.
- add(new_entry: Set, c_id=None)
Add a new cluster.
Parameters
- new_entry: Set
New cluster as a set.
- c_id
Cluster id to assign. If None, the next cluster id is assigned by incrementing the largest existing one (this assumes cluster ids are integers).
Raises
- ValueError
If an entity id is already present in another cluster, or if a new cluster id cannot be inferred automatically by incrementing.
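The automatic id assignment and its failure case can be sketched as a small helper. The name next_cluster_id is hypothetical. Starting from 0 when there are no clusters is an assumption, although it matches the ids 0 and 1 in the class example.

```python
from typing import Dict, Set


def next_cluster_id(clusters: Dict[object, Set[str]]) -> int:
    """Hypothetical sketch: infer the next cluster id by incrementing."""
    if not clusters:
        return 0  # assumption: ids start at 0
    if not all(isinstance(c, int) for c in clusters):
        # Non-integer ids cannot be incremented meaningfully.
        raise ValueError("cannot infer a new cluster id; pass c_id explicitly")
    return max(clusters) + 1
```

Taking the maximum rather than the count means gaps left by removed clusters never cause an id collision.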
- add_link(e1, e2)
Add a new entity link.
Either adds the link to an entity that already belongs to a cluster, or creates a new cluster containing both entities.
Parameters
- e1
Id of one entity that will be linked.
- e2
Id of the other entity that will be linked.
Returns
- c_id
Id of the cluster that was created, or of the existing cluster that was extended. Returns False if the link was already present.
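The documented return contract of add_link can be illustrated over plain dicts. This function is a hypothetical sketch, not forayer's code. In particular, merging two existing clusters when both entities are already clustered is an assumption, because the description above does not cover that case.

```python
from typing import Dict, Set, Union


def add_link(clusters: Dict[int, Set[str]], cluster_of: Dict[str, int],
             e1: str, e2: str) -> Union[int, bool]:
    """Hypothetical sketch of add_link's documented return contract."""
    c1, c2 = cluster_of.get(e1), cluster_of.get(e2)
    if c1 is not None and c1 == c2:
        return False  # link already present
    if c1 is not None and c2 is not None:
        # Assumption: linking two different clusters merges c2 into c1.
        moved = clusters.pop(c2)
        clusters[c1] |= moved
        for e in moved:
            cluster_of[e] = c1
        return c1
    if c1 is not None or c2 is not None:
        # Exactly one entity is clustered: add the other to its cluster.
        c_id = c1 if c1 is not None else c2
        new = e2 if c1 is not None else e1
        clusters[c_id].add(new)
        cluster_of[new] = c_id
        return c_id
    # Neither entity is clustered: open a new cluster with both.
    c_id = max(clusters, default=-1) + 1
    clusters[c_id] = {e1, e2}
    cluster_of[e1] = cluster_of[e2] = c_id
    return c_id
```

Because False == 0 in Python and 0 is a valid cluster id, a caller should check the result with `is False` rather than `== False` or truthiness.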
- add_to_cluster(c_id, new_entity)
Add an entity to a cluster.
Parameters
- c_id
Cluster id where the entity will be added.
- new_entity
Id of the new entity.
Raises
- KeyError
If the cluster id is unknown.
- ValueError
If the entity already belongs to another cluster.
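The two documented error cases map naturally onto checks against both dicts. The sketch below uses a hypothetical function. Treating a re-add to the entity's own cluster as a no-op is an assumption, not documented behavior.

```python
from typing import Dict, Set


def add_to_cluster(clusters: Dict[int, Set[str]], cluster_of: Dict[str, int],
                   c_id: int, new_entity: str) -> None:
    """Hypothetical sketch of add_to_cluster's documented error contract."""
    if c_id not in clusters:
        raise KeyError(c_id)  # unknown cluster id
    owner = cluster_of.get(new_entity)
    if owner is not None and owner != c_id:
        raise ValueError(f"{new_entity!r} already belongs to cluster {owner}")
    # Keep both directions of the mapping in sync.
    clusters[c_id].add(new_entity)
    cluster_of[new_entity] = c_id
```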
- all_pairs(key=None) → Iterable[Tuple[Any, Any]]
Get all entity pairs of a specific cluster or of all clusters.
Parameters
- key
Cluster id. If None, provides pairs of all clusters.
Returns
- Generator[Tuple[Any, Any]]
Generator that yields the requested pairs.
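A lazy pair generator of this kind can be built from the standard library. The sketch below assumes that pairs are unordered, so each pair within a cluster appears once, and uses a hypothetical standalone function rather than forayer's method.

```python
import itertools
from typing import Any, Dict, Iterable, Optional, Set, Tuple


def all_pairs(clusters: Dict[Any, Set[Any]],
              key: Optional[Any] = None) -> Iterable[Tuple[Any, Any]]:
    """Hypothetical sketch: unordered pairs within one or all clusters."""
    if key is not None:
        return itertools.combinations(clusters[key], 2)
    # Chain the per-cluster generators without materializing them.
    return itertools.chain.from_iterable(
        itertools.combinations(members, 2) for members in clusters.values()
    )
```

A cluster of n entities contributes n(n-1)/2 pairs, so the output grows quadratically with cluster size. Laziness matters for large clusters.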
- clone() → forayer.knowledge_graph.clusters.ClusterHelper
Create a clone of this object.
Returns
- clone: ClusterHelper
Cloned ClusterHelper.
- get(key, value=None)
Return a cluster's elements or a default value.
Tries to return the cluster whose cluster id equals key. If no such cluster exists, returns the provided default value.
Parameters
- key
Searched cluster id.
- value
Default value to return if the id is not present.
Returns
- Set
Cluster with the provided cluster id.
- info()
Print general information about this object.
Returns
- str
Information about the number of entities and clusters.
- links(key: str, always_return_set=False) → Union[str, Set[str]]
Get entities linked to this entity.
Parameters
- key: str
Entity id.
- always_return_set: bool
If True, return a set even if only one entity is linked.
Returns
- Union[str, Set[str]]
Either the id of the single linked entity, or a set of ids if there is more than one link. If always_return_set is True, a set is always returned.
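The str-or-set return convention amounts to taking the other members of the entity's cluster and unwrapping a single result. The hypothetical function below sketches that under the assumption that "linked" means "in the same cluster".

```python
from typing import Dict, Set, Union


def links(clusters: Dict[int, Set[str]], cluster_of: Dict[str, int],
          key: str, always_return_set: bool = False) -> Union[str, Set[str]]:
    """Hypothetical sketch: the other members of key's cluster."""
    others = clusters[cluster_of[key]] - {key}
    if len(others) == 1 and not always_return_set:
        # Unwrap the single linked entity.
        return next(iter(others))
    return others
```

Passing always_return_set=True gives callers a uniform type to iterate over, which avoids accidentally iterating over the characters of a single id string.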
- merge(c1, c2, new_id=None)
Merge two clusters.
Parameters
- c1
Id of one cluster to merge.
- c2
Id of the other cluster to merge.
- new_id
New id of the merged cluster. If None, c1 is used.
Raises
- ValueError
If one or both cluster ids do not exist.
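A merge has to move the members and also re-point every moved entity to the surviving id. The sketch below uses a hypothetical function over the two dicts. Rejecting a new_id that belongs to some unrelated cluster is an assumption added so that no cluster is silently overwritten.

```python
from typing import Any, Dict, Optional, Set


def merge(clusters: Dict[Any, Set[str]], cluster_of: Dict[str, Any],
          c1: Any, c2: Any, new_id: Optional[Any] = None) -> None:
    """Hypothetical sketch: fold clusters c1 and c2 into a single cluster."""
    if c1 not in clusters or c2 not in clusters:
        raise ValueError(f"unknown cluster id(s): {c1!r}, {c2!r}")
    target = c1 if new_id is None else new_id
    # Assumption: refuse a new_id that would overwrite an unrelated cluster.
    if target not in (c1, c2) and target in clusters:
        raise ValueError(f"cluster id {target!r} already in use")
    members = clusters.pop(c1)
    if c2 != c1:
        members |= clusters.pop(c2)
    clusters[target] = members
    # Re-point every member to the surviving id.
    for e in members:
        cluster_of[e] = target
```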
- property number_of_links
Return the total number of links.
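If a link is taken to be any unordered pair of entities that share a cluster, which is consistent with ("f1","e2") in ch being True above, then a cluster of n entities contributes n(n-1)/2 links. The following sketch computes the total under that assumption. The function is hypothetical, and forayer may count links differently.

```python
from math import comb
from typing import Dict, Set


def number_of_links(clusters: Dict[int, Set[str]]) -> int:
    """Hypothetical sketch: count links as unordered intra-cluster pairs."""
    return sum(comb(len(members), 2) for members in clusters.values())
```

For the clusters in the class example after removing b1 ({a1, c1}, {a2, b2} and {f1, e2, c3}), this gives 1 + 1 + 3 = 5.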