hansel package

Submodules

hansel.hansel module

class hansel.hansel.Hansel

Bases: numpy.ndarray

API to a numpy-backed graph-inspired data structure for determining likely sequences of symbols from breadcrumbs of evidence.

Given a numpy array and a list of permitted symbols, Hansel presents a user-friendly API to store and operate on a graph-like data structure that can be used to investigate likely sequences of symbols, based on observations of pairwise co-occurrence of those symbols in a dimension such as space or time.

Parameters:
  • input_arr (4D numpy array) – A numpy array (typically initialised with zeros) of shape (A, A, B+2, B+2), where A is the number of symbols plus unsymbols and B is the number of points in time or space at which pairwise observations between symbols can be made.
  • symbols (list{str}) – A list of permitted states or symbols as strings.
  • unsymbols (list{str}) – A list of permitted states that may represent the known absence of a symbol.
Variables:
  • n_slices (int) – The number of distinct sources from which observations were received. Also available as sources.
  • n_crumbs (int) – The number of pairwise observations recorded. Also available as observations.
  • symbols (list{str}) – The list of permitted states or symbols.
  • unsymbols (list{str}) – A list of states that represent an absence of a symbol.
  • is_weighted (boolean) – Whether or not the underlying numpy array has been modified by the reweight_observation function at least once.
  • L (int, optional(default=1)) – The number of positions back from the current position (inclusive) to consider when calculating conditional probabilities.
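As a rough illustration of the array shape described for input_arr, the following sketch computes the expected dimensions from the symbol lists and the number of positions. The helper name expected_shape is hypothetical and not part of hansel; the resulting tuple could be passed to numpy.zeros to build the input array.

```python
def expected_shape(symbols, unsymbols, n_positions):
    """Return the (A, A, B+2, B+2) shape described for input_arr.

    Hypothetical helper for illustration only; not part of the hansel API.
    """
    a = len(symbols) + len(unsymbols)  # A: symbols plus unsymbols
    b = n_positions                    # B: positions in space or time
    return (a, a, b + 2, b + 2)


shape = expected_shape(["A", "C", "G", "T"], ["N"], 10)
print(shape)  # (5, 5, 12, 12)
```

Four symbols and one unsymbol give A = 5, and ten positions give B + 2 = 12 along each of the last two axes.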
add_observation(symbol_from, symbol_to, pos_from, pos_to)

Add a pairwise observation to the data structure.

Parameters:
  • symbol_from (str) – The first observed symbol of the pair (in space or time).
  • symbol_to (str) – The second observed symbol of the pair (in space or time).
  • pos_from (int) – The “position” at which symbol_from was observed.
  • pos_to (int) – The “position” at which symbol_to was observed.
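To make the idea of a positioned pairwise observation concrete, here is a minimal sketch that keeps counts in a plain dictionary keyed by (symbol_from, symbol_to, pos_from, pos_to). It is a toy stand-in for the numpy-backed structure, not hansel's implementation; the names crumbs, add_pair and get_pair are assumptions made for this example.

```python
from collections import defaultdict

# Toy stand-in for the 4D array: counts keyed by
# (symbol_from, symbol_to, pos_from, pos_to). Illustration only.
crumbs = defaultdict(float)


def add_pair(symbol_from, symbol_to, pos_from, pos_to):
    """Record one pairwise observation (hypothetical helper)."""
    crumbs[(symbol_from, symbol_to, pos_from, pos_to)] += 1


def get_pair(symbol_from, symbol_to, pos_from, pos_to):
    """Return how many times the positioned pair was recorded."""
    return crumbs.get((symbol_from, symbol_to, pos_from, pos_to), 0.0)


add_pair("A", "C", 1, 2)
add_pair("A", "C", 1, 2)
add_pair("A", "G", 1, 2)
print(get_pair("A", "C", 1, 2))  # 2.0
```

Each call adds one unit of evidence for that exact pair of symbols at that exact pair of positions; a pair that was never recorded reads back as zero.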
get_conditional_of_at(symbol_from, symbol_to, pos_from, pos_to)

Given a symbol and position, calculate the conditional for co-occurrence with another positioned symbol.

Parameters:
  • symbol_from (str) – The first (given) symbol of the pair on which to condition.
  • symbol_to (str) – The second (predicted) symbol of the pair on which to condition.
  • pos_from (int) – The “position” at which the given symbol_from was observed.
  • pos_to (int) – The “position” at which the predicted symbol_to was observed.
Returns:

Conditional probability – The conditional probability of symbol_from occurring at pos_from given observation of a predicted symbol_to at pos_to.

Return type:

float

get_counts_at(at_pos)

Get the counts for each symbol that appears at a given position.

Parameters: at_pos (int) – The “position” for which to return the number of occurrences of each symbol.
Returns: Symbol counts – A dictionary whose keys are each of the symbols that were observed at position at_pos and a special “total” key. The values are the number of observations of that symbol at at_pos. The “total” is the sum of all observation counts.
Return type: dict{str, float}
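The shape of the returned dictionary can be sketched as follows. The input format (a list of (symbol, position) tuples) and the helper name counts_at are assumptions for this example and do not reflect how hansel stores its data.

```python
from collections import Counter


def counts_at(observed, at_pos):
    """Build a {symbol: count, "total": sum} dict for one position.

    `observed` is a hypothetical list of (symbol, position) tuples.
    """
    counts = Counter(sym for sym, pos in observed if pos == at_pos)
    result = {sym: float(n) for sym, n in counts.items()}
    result["total"] = float(sum(counts.values()))
    return result


observed = [("A", 1), ("A", 1), ("C", 1), ("G", 2)]
summary = counts_at(observed, 1)
print(summary)  # {'A': 2.0, 'C': 1.0, 'total': 3.0}
```

Only symbols actually seen at the position appear as keys, alongside the special "total" entry.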
get_edge_weights_at(symbol_pos, current_path)

Get the outgoing weighted edges at some position, given a path to that position.

Parameters:
  • symbol_pos (int) – The index of the current position.
  • current_path (list{str}) – A list of symbols representing the path of selected symbols that led to the current position, symbol_pos.
Returns:

Conditional distribution – A dictionary whose keys are each of the possible symbols that may be reached from the current position, given the observed path. The values are log10 conditional probabilities of the next symbol in the path (or sequence) being that of the key.

Return type:

dict{str, float}
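Because the returned values are log10 probabilities, it may help to see how such weights relate to ordinary probabilities. The sketch below converts a hypothetical conditional distribution into log10 weights and picks the most likely next symbol. The helper to_log10_weights is not part of hansel, and skipping zero probabilities is an assumption made here, since the text above does not say how they are handled.

```python
import math


def to_log10_weights(conditionals):
    """Convert {symbol: probability} into {symbol: log10(probability)}.

    Hypothetical helper. Zero probabilities are skipped because
    log10(0) is undefined; this is an assumption of the sketch.
    """
    return {
        sym: math.log10(p) for sym, p in conditionals.items() if p > 0
    }


weights = to_log10_weights({"A": 0.1, "C": 0.9, "G": 0.0})
best = max(weights, key=weights.get)  # symbol with the largest weight
print(best)  # C
```

Since the logarithm is monotonic, the largest weight belongs to the most probable symbol, and summing weights along a path corresponds to multiplying the underlying probabilities.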

get_marginal_of_at(of_symbol, at_symbol)

Get the marginal distribution of a symbol appearing at a position.

Parameters:
  • of_symbol (str) – The symbol for which to calculate the marginal distribution.
  • at_symbol (int) – The position at which to calculate the marginal distribution.
Returns:

Marginal probability – The probability that a random symbol drawn from all observations at at_symbol is equal to of_symbol. Equivalently, the proportion of all symbols observed at at_symbol that are equal to of_symbol.

Return type:

float
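The "proportion of all symbols" reading can be sketched directly from a counts dictionary of the kind get_counts_at is described as returning. The helper marginal_of_at and the choice to return 0.0 when there is no evidence are assumptions of this example, not documented hansel behaviour.

```python
def marginal_of_at(counts, of_symbol):
    """Proportion of observations at one position equal to of_symbol.

    `counts` is a hypothetical {symbol: count, "total": sum} dict for
    the position of interest.
    """
    total = counts.get("total", 0.0)
    if total == 0:
        return 0.0  # assumption: no evidence means zero probability
    return counts.get(of_symbol, 0.0) / total


counts = {"A": 3.0, "C": 1.0, "total": 4.0}
p_a = marginal_of_at(counts, "A")
print(p_a)  # 0.75
```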

get_observation(symbol_from, symbol_to, pos_from, pos_to)

Get the number of co-occurrences of a pair of positioned symbols.

Parameters:
  • symbol_from (str) – The first observed symbol of the pair (in space or time).
  • symbol_to (str) – The second observed symbol of the pair (in space or time).
  • pos_from (int) – The “position” at which symbol_from was observed.
  • pos_to (int) – The “position” at which symbol_to was observed.
Returns:

Number of observations – The number of times symbol_from was observed at pos_from with symbol_to at pos_to across all sources (slices).

Note

It is possible for the number of observations returned to be a float if hansel.hansel.Hansel.is_weighted is True.

Return type:

float

get_spanning_support(symbol_to, pos_from, pos_to)

Get the number of observations that span over two positions of interest, that also feature some symbol.

Parameters:
  • symbol_to (str) – The symbol that should appear at pos_to.
  • pos_from (int) – A position that appears “before” pos_to in space or time that must be overlapped by a source to be counted. The symbol at pos_from is not relevant.
  • pos_to (int) – The second position a source must overlap (but not necessarily terminate at), and at which the observed symbol must be symbol_to.
Returns:

Number of observations – The number of observations yielded from sources that overlap both pos_from and pos_to, that also feature symbol_to at pos_to.

Note

It is possible for the number of observations returned to be a float if hansel.hansel.Hansel.is_weighted is True.

Return type:

float
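The notion of spanning support can be illustrated with a simplified model in which each source is a mapping from position to observed symbol, and each qualifying source contributes one observation. This representation and the helper spanning_support are assumptions for the sketch; hansel itself works on the underlying array.

```python
def spanning_support(sources, symbol_to, pos_from, pos_to):
    """Count sources that cover both positions with symbol_to at pos_to.

    `sources` is a hypothetical list of {position: symbol} dicts, one
    per source. The symbol at pos_from is deliberately ignored.
    """
    return sum(
        1
        for src in sources
        if pos_from in src and src.get(pos_to) == symbol_to
    )


sources = [
    {1: "A", 2: "C"},  # covers both positions, C at 2: counted
    {1: "G", 2: "C"},  # covers both positions, C at 2: counted
    {1: "A", 2: "T"},  # covers both positions, but T at 2: not counted
    {2: "C", 3: "A"},  # does not cover position 1: not counted
]
support = spanning_support(sources, "C", 1, 2)
print(support)  # 2
```

Note that the first two sources are both counted even though they carry different symbols at pos_from, which matches the statement that the symbol at pos_from is not relevant.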

observations

An alias for n_crumbs

reweight_observation(symbol_from, symbol_to, pos_from, pos_to, ratio)

Alter the number of co-occurrences between a pair of positioned symbols by some ratio.

Note

This function will set hansel.hansel.Hansel.is_weighted to True.

Parameters:
  • symbol_from (str) – The first observed symbol of the pair to be reweighted (in space or time).
  • symbol_to (str) – The second observed symbol of the pair to be reweighted (in space or time).
  • pos_from (int) – The “position” at which symbol_from was observed.
  • pos_to (int) – The “position” at which symbol_to was observed.
  • ratio (float) – The fraction of the current number of observations to subtract. That is, new_value = old_value - (ratio * old_value).
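The reweighting formula is simple enough to check by hand. The following sketch applies it to a single count; the function name reweighted is hypothetical and stands in for the arithmetic only, not for the method's effect on the array.

```python
def reweighted(old_value, ratio):
    """Apply new_value = old_value - (ratio * old_value)."""
    return old_value - (ratio * old_value)


print(reweighted(10, 0.25))  # 7.5
```

A ratio of 0 leaves the count unchanged and a ratio of 1 removes it entirely. Intermediate ratios can yield non-integer counts, which is why observation counts may be floats once is_weighted is True.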
sources

An alias for n_slices

Module contents