forayer.knowledge_graph.ClusterHelper

class forayer.knowledge_graph.ClusterHelper(data: Optional[Union[List[Set], Dict]] = None)[source]

Convenience class for entity clusters.

The ClusterHelper class holds two dicts: one maps each entity to its cluster id, the other maps each cluster id to its set of entities. The add() and remove() methods keep both dicts in sync.

clusters: Dict[str,Set[str]]

maps cluster id to entity set

entities: Dict[str,int]

maps entity id to cluster id
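The two-dict bookkeeping described above can be sketched in plain Python. This is a minimal illustration, not forayer's implementation: the class name `TwoWayClusters` is hypothetical, and so are the choices to number clusters from 0 and to drop a cluster once it is empty.

```python
from typing import Dict, Hashable, Set


class TwoWayClusters:
    """Hypothetical stand-in for illustration; not forayer's ClusterHelper."""

    def __init__(self) -> None:
        self.clusters: Dict[int, Set[Hashable]] = {}  # cluster id -> members
        self.entities: Dict[Hashable, int] = {}       # entity id -> cluster id

    def add(self, members: Set[Hashable]) -> int:
        # Refuse entities that already belong to some cluster.
        overlap = {e for e in members if e in self.entities}
        if overlap:
            raise ValueError(f"already clustered: {overlap}")
        # Assumption: integer ids, next id is the largest existing id plus one.
        c_id = max(self.clusters, default=-1) + 1
        self.clusters[c_id] = set(members)
        for entity in members:
            self.entities[entity] = c_id
        return c_id

    def remove(self, entity: Hashable) -> None:
        # Both dicts are updated in one place so they cannot drift apart.
        c_id = self.entities.pop(entity)
        self.clusters[c_id].discard(entity)
        if not self.clusters[c_id]:
            del self.clusters[c_id]  # assumption: empty clusters are dropped


demo = TwoWayClusters()
demo.add({"a1", "b1", "c1"})
demo.add({"a2", "b2"})
demo.remove("b1")
```

Routing every mutation through `add` and `remove` is what keeps the forward and reverse mappings consistent; writing to either dict directly would bypass that guarantee.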

>>> from forayer.knowledge_graph import ClusterHelper
>>> ch = ClusterHelper([{"a1", "b1"}, {"a2", "b2"}])
>>> print(ch)
{0: {'a1', 'b1'}, 1: {'a2', 'b2'}}

Add an element to a cluster

>>> ch.add_to_cluster(0, "c1")
>>> print(ch)
{0: {'a1', 'b1', 'c1'}, 1: {'a2', 'b2'}}

Add a new cluster

>>> ch.add({"e2", "f1", "c3"})
>>> print(ch)
{0: {'a1', 'b1', 'c1'}, 1: {'a2', 'b2'}, 2: {'f1', 'e2', 'c3'}}

Remove an element from a cluster

>>> ch.remove("b1")
>>> print(ch)
{0: {'a1', 'c1'}, 1: {'a2', 'b2'}, 2: {'f1', 'e2', 'c3'}}

The __contains__ method is overloaded, so the in operator accepts several kinds of argument. You can check whether an entity is in the ClusterHelper

>>> "a1" in ch
True

You can also check whether a cluster is present

>>> {"c1","a1"} in ch
True

And even whether a link between two entities exists

>>> ("f1","e2") in ch
True
>>> ("a1","e2") in ch
False
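One way to get this behaviour is to dispatch on the type of the argument inside `__contains__`. The sketch below is an assumption about how such a check could look, not the library's code: `ContainsDemo` is a hypothetical class, a set is taken to mean "exactly this cluster exists", and a 2-tuple is taken to mean "both entities share a cluster".

```python
from typing import Dict, Hashable, Set


class ContainsDemo:
    """Hypothetical illustration of type-based membership checks."""

    def __init__(self, clusters: Dict[int, Set[Hashable]]) -> None:
        self.clusters = clusters
        # Reverse index: entity id -> cluster id.
        self.entities = {
            e: c_id for c_id, members in clusters.items() for e in members
        }

    def __contains__(self, key) -> bool:
        if isinstance(key, (set, frozenset)):
            # A set asks: does a cluster with exactly these members exist?
            return any(key == members for members in self.clusters.values())
        if isinstance(key, tuple) and len(key) == 2:
            # A pair asks: are both entities known and in the same cluster?
            e1, e2 = key
            c1 = self.entities.get(e1)
            return c1 is not None and c1 == self.entities.get(e2)
        # Anything else is treated as an entity id.
        return key in self.entities


contains_demo = ContainsDemo(
    {0: {"a1", "c1"}, 1: {"a2", "b2"}, 2: {"f1", "e2", "c3"}}
)
```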

To find the cluster id of an entity, look it up with

>>> ch.elements["a1"]
0

To get members of a cluster either use

>>> ch.members(0)
{'a1', 'c1'}

or simply

>>> ch[0]
{'a1', 'c1'}

__init__(data: Optional[Union[List[Set], Dict]] = None)[source]

Initialize a ClusterHelper object with clusters.

data: Union[List[Set], Dict]

Clusters given as a list of sets, as a dict of links (key, value pairs), or as a dict mapping cluster id to a set of members

TypeError

If data is neither a dict nor a list

ValueError

If data is a dict[cluster_id, member_set] and the clusters overlap

Will try to merge clusters transitively if necessary.
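Transitive merging means that links a-b and b-c end up in one cluster {a, b, c}. A common way to compute this is a union-find (disjoint-set) structure. The sketch below uses that technique as an illustration; whether forayer uses union-find internally is not stated here, and the function name `merge_links` is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def merge_links(links: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Group linked entities into clusters, following links transitively."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        # Walk up to the representative of x, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    # Collect every entity under its representative.
    groups: Dict[Hashable, Set[Hashable]] = {}
    for entity in list(parent):
        groups.setdefault(find(entity), set()).add(entity)
    return list(groups.values())


merged = merge_links([("a", "b"), ("b", "c"), ("x", "y")])
```

Here `merged` contains two clusters: one with a, b and c, and one with x and y.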

Methods

__init__([data])

Initialize a ClusterHelper object with clusters.

add(new_entry[, c_id])

Add a new cluster.

add_link(e1, e2)

Add a new entity link.

add_to_cluster(c_id, new_entity)

Add an entity to a cluster.

all_pairs([key])

Get all entity pairs of a specific cluster or of all clusters.

clone()

Create a clone of this object.

get(key[, value])

Return cluster's element or default value.

info()

Print general information about this object.

links(key[, always_return_set])

Get entities linked to this entity.

members(key)

Get members of a cluster.

merge(c1, c2[, new_id])

Merge two clusters.

remove(entry)

Remove an entity.

remove_cluster(cluster_id)

Remove an entire cluster with the given cluster id.

sample(n[, seed])

Sample n clusters.

Attributes

number_of_links

Return the total number of links.

add(new_entry: Set, c_id=None)[source]

Add a new cluster.

new_entry: Set

New cluster as set

c_id

Cluster id that will be assigned. If None, the next largest cluster id is assigned, assuming cluster ids are integers.

ValueError

If an entity id is already present in another cluster, or if the new cluster id cannot be inferred automatically by incrementing.
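The automatic id assignment can be pictured as "largest existing id plus one", which only works when ids are integers. The helper below is a hypothetical sketch of that rule; its name and the choice to start at 0 for an empty collection are assumptions.

```python
from typing import Hashable, Iterable


def next_cluster_id(existing_ids: Iterable[Hashable]) -> int:
    """Infer the next cluster id by incrementing the largest existing one."""
    ids = list(existing_ids)
    if not ids:
        return 0  # assumption: numbering starts at 0
    if not all(isinstance(i, int) for i in ids):
        # Incrementing is undefined for, e.g., string ids.
        raise ValueError("cannot infer the next cluster id from non-integer ids")
    return max(ids) + 1
```

With string cluster ids there is no obvious successor, which is why passing c_id explicitly is the safer choice in that case.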

add_link(e1, e2)[source]

Add a new entity link.

Either adds a link to an existing entity or creates a new cluster with both.

e1

Id of one entity that will be linked

e2

Id of other entity that will be linked

c_id

Id of the cluster that was created, or of the existing cluster that was extended. Returns False if the link was already present.
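The following sketch mirrors the described behaviour on two plain dicts. It is hypothetical and not the library's code. In particular, the documentation above does not say what happens when both entities already sit in different clusters, so raising ValueError in that case is purely an assumption of this sketch.

```python
from typing import Dict, Hashable, Set, Union


def add_link(
    clusters: Dict[int, Set[Hashable]],
    entities: Dict[Hashable, int],
    e1: Hashable,
    e2: Hashable,
) -> Union[int, bool]:
    c1, c2 = entities.get(e1), entities.get(e2)
    if c1 is not None and c1 == c2:
        return False  # both entities already share a cluster
    if c1 is not None and c2 is not None:
        # Assumption of this sketch only; the real behaviour is not documented here.
        raise ValueError("entities belong to different clusters")
    if c1 is None and c2 is None:
        c_id = max(clusters, default=-1) + 1  # create a new cluster
        clusters[c_id] = set()
    else:
        c_id = c1 if c1 is not None else c2   # extend the existing cluster
    for entity in (e1, e2):
        clusters[c_id].add(entity)
        entities[entity] = c_id
    return c_id
```

Because a valid cluster id can be 0, callers should test the result with `is False` rather than relying on truthiness.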

add_to_cluster(c_id, new_entity)[source]

Add an entity to a cluster.

c_id

Cluster id where entity will be added

new_entity

Id of new entity

KeyError

If the cluster id is unknown

ValueError

If the entity already belongs to another cluster

all_pairs(key=None) → Iterable[Tuple[Any, Any]][source]

Get all entity pairs of a specific cluster or of all clusters.

key

Cluster id. If None, provides pairs of all clusters.

Generator[Tuple[Any, Any]]

Generator that produces the wanted pairs.
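Pair generation within a cluster can be sketched with itertools.combinations. This stand-alone function is hypothetical: it works on a plain dict of clusters, and it sorts members only to make the output order deterministic, which assumes the entity ids are mutually comparable (for example, all strings). The order in which the library yields pairs is not specified here.

```python
from itertools import combinations
from typing import Any, Dict, Iterator, Optional, Set, Tuple


def all_pairs(
    clusters: Dict[Any, Set[Any]], key: Optional[Any] = None
) -> Iterator[Tuple[Any, Any]]:
    """Yield every unordered entity pair of one cluster, or of all clusters."""
    selected = clusters.values() if key is None else [clusters[key]]
    for members in selected:
        # A cluster with n members contributes n * (n - 1) / 2 pairs.
        yield from combinations(sorted(members), 2)


pair_demo = {0: {"a1", "c1"}, 2: {"f1", "e2", "c3"}}
pairs_of_2 = list(all_pairs(pair_demo, 2))
```

Since the result is a generator, pairs are produced lazily; wrap the call in `list(...)` only when the full set of pairs is needed at once.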

clone() → forayer.knowledge_graph.clusters.ClusterHelper[source]

Create a clone of this object.

clone: ClusterHelper

cloned ClusterHelper

get(key, value=None)[source]

Return cluster’s element or default value.

Tries to return the cluster whose cluster id equals key. If no such cluster is found, returns the provided value.

key

Searched cluster id.

value

Default value to return in case id is not present.

Set

Cluster with provided cluster_id.

info()[source]

Print general information about this object.

str

information about number of entities and clusters

links(key[, always_return_set])

Get entities linked to this entity.

key: str

entity id

always_return_set: bool

If True, return set even if only one entity is contained

Union[str, Set[str]]

Either the id of the single linked entity, or a set of ids if there is more than one link. If always_return_set is True, a set is always returned.
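The "single id or set" return convention can be illustrated as follows. The function is a hypothetical stand-alone sketch over two plain dicts, and the default of False for always_return_set is an assumption.

```python
from typing import Dict, Hashable, Set, Union


def links(
    clusters: Dict[int, Set[Hashable]],
    entities: Dict[Hashable, int],
    key: Hashable,
    always_return_set: bool = False,  # assumed default
) -> Union[Hashable, Set[Hashable]]:
    # Everything in the same cluster, except the entity itself.
    others = clusters[entities[key]] - {key}
    if len(others) == 1 and not always_return_set:
        return next(iter(others))
    return others


link_clusters = {0: {"a1", "c1"}, 2: {"f1", "e2", "c3"}}
link_entities = {e: c for c, members in link_clusters.items() for e in members}
```

Code that iterates over the result is usually simpler with always_return_set=True, since it then never has to distinguish a bare id from a set.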

members(key) → Set[source]

Get members of a cluster.

key

cluster id

Set

cluster members

merge(c1, c2, new_id=None)[source]

Merge two clusters.

c1

Id of one cluster to merge

c2

Id of other cluster to merge

new_id

New id of the merged cluster. If None, c1 is used.

ValueError

If cluster id(s) do not exist
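A merge has to update both the cluster dict and the reverse entity index. The sketch below shows one way to do that on plain dicts. It is hypothetical, assumes c1 and c2 are two distinct ids, and assumes new_id (if given) is not used by a third cluster.

```python
from typing import Dict, Hashable, Optional, Set


def merge(
    clusters: Dict[Hashable, Set[Hashable]],
    entities: Dict[Hashable, Hashable],
    c1: Hashable,
    c2: Hashable,
    new_id: Optional[Hashable] = None,
) -> Hashable:
    if c1 not in clusters or c2 not in clusters:
        raise ValueError(f"unknown cluster id(s): {c1!r}, {c2!r}")
    target = c1 if new_id is None else new_id
    # Remove both old clusters, then store their union under the target id.
    merged = clusters.pop(c1) | clusters.pop(c2)
    clusters[target] = merged
    # Re-point every member at the surviving cluster id.
    for entity in merged:
        entities[entity] = target
    return target
```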

number_of_links

Return the total number of links.

remove(entry: str)[source]

Remove an entity.

entry: str

entity to remove

remove_cluster(cluster_id)[source]

Remove an entire cluster with the given cluster id.

cluster_id

id of the cluster to remove

sample(n: int, seed: Optional[Union[int, random.Random]] = None)[source]

Sample n clusters.

n: int

Number of clusters to return.

seed: Union[int, random.Random]

Seed for randomness or seeded random.Random object. Default is None.

ClusterHelper

ClusterHelper with n clusters.
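Accepting either an integer seed or a ready-made random.Random object is a common pattern for reproducible sampling. The sketch below illustrates it on a plain dict of clusters. The function name `sample_clusters` is hypothetical, and it returns a dict rather than a ClusterHelper.

```python
import random
from typing import Dict, Hashable, Optional, Set, Union


def sample_clusters(
    clusters: Dict[Hashable, Set[Hashable]],
    n: int,
    seed: Optional[Union[int, random.Random]] = None,
) -> Dict[Hashable, Set[Hashable]]:
    # Reuse a caller-supplied generator, otherwise build one from the seed.
    rng = seed if isinstance(seed, random.Random) else random.Random(seed)
    # random.sample needs a sequence; sorting also makes seeded runs repeatable.
    chosen = rng.sample(sorted(clusters), n)
    return {c_id: set(clusters[c_id]) for c_id in chosen}


sample_source = {i: {f"e{i}"} for i in range(10)}
by_int_seed = sample_clusters(sample_source, 3, seed=42)
by_rng = sample_clusters(sample_source, 3, seed=random.Random(42))
```

Both calls select the same clusters, because a fresh random.Random(42) produces the same sequence as the generator built from the integer seed 42.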