Tools

commonnexus provides a couple of tools implementing common operations on NEXUS objects. These tools are often functions that operate on one or more commonnexus.Nexus instances and return a new or modified commonnexus.Nexus object.

combine

Combine data from multiple NEXUS files and put it in a new one.

The following blocks can be handled:

  • TAXA: Taxa are identified across NEXUS files based on label (not number).

  • CHARACTERS/DATA: Characters are aggregated across NEXUS files (with character labels prefixed, for disambiguation).

  • TREES: Trees are (translated and) aggregated across NEXUS files.

commonnexus.tools.combine.combine(*nexus, **kw)[source]
Parameters:

nexus (commonnexus.nexus.Nexus) – Nexus objects to be combined.

Return type:

commonnexus.nexus.Nexus

Returns:

A new Nexus object with the combined data.
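The label-based merging described above can be illustrated without the library. The following is a minimal sketch on plain dicts, not the implementation of combine(): the helper name merge_matrices, the "<n>." prefix format and the use of None for taxa that are absent from an input are all assumptions made for illustration.

```python
from collections import OrderedDict


def merge_matrices(*matrices):
    """Merge {taxon: {character: state}} dicts, matching taxa by label."""
    # Collect taxon labels in order of first appearance across all inputs.
    taxa = []
    for matrix in matrices:
        for taxon in matrix:
            if taxon not in taxa:
                taxa.append(taxon)

    merged = OrderedDict((taxon, OrderedDict()) for taxon in taxa)
    for index, matrix in enumerate(matrices, start=1):
        # Character labels of this input, in order of first appearance.
        chars = []
        for row in matrix.values():
            for char in row:
                if char not in chars:
                    chars.append(char)
        for taxon in taxa:
            row = matrix.get(taxon, {})
            for char in chars:
                # Hypothetical prefix scheme: "<input number>.<label>".
                # Taxa missing from this input get None (an assumption).
                merged[taxon]["{}.{}".format(index, char)] = row.get(char)
    return merged


first = {"t1": {"c1": "0"}, "t2": {"c1": "1"}}
second = {"t2": {"c1": "A"}, "t3": {"c1": "B"}}
combined = merge_matrices(first, second)
print(list(combined))        # ['t1', 't2', 't3']
print(dict(combined["t2"]))  # {'1.c1': '1', '2.c1': 'A'}
```

Both inputs use the character label c1, which is why some form of prefixing is needed to keep the two columns apart.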

normalise

Normalise a NEXUS file.

Normalisation includes

  • converting CHARACTERS/DATA matrices to non-transposed, non-interleaved representation with taxon labels (and resolved EQUATEs), extracting taxon labels into a TAXA block;

  • converting a DISTANCES matrix to non-interleaved matrices with diagonal and both triangles and taxon labels;

  • translating all TREEs in a TREES block (such that the TRANSLATE command becomes superfluous).
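Two of the conversions listed above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the code used by commonnexus: the function names deinterleave and full_distances are hypothetical, and the sketch assumes taxon labels without whitespace, single-symbol states, and a lower-triangular distance matrix given without its diagonal.

```python
def deinterleave(lines):
    """Join the chunks of an interleaved matrix into one sequence per taxon."""
    rows = {}
    for line in lines:
        label, chunk = line.split(None, 1)
        # Whitespace inside a chunk carries no meaning here, so drop it.
        rows[label] = rows.get(label, "") + "".join(chunk.split())
    return rows


def full_distances(lower):
    """Expand a lower triangle without diagonal into a full symmetric matrix."""
    taxa = list(lower)
    full = {taxon: [0] * len(taxa) for taxon in taxa}
    for i, taxon in enumerate(taxa):
        for j, dist in enumerate(lower[taxon]):
            full[taxon][j] = dist      # entry below the diagonal
            full[taxa[j]][i] = dist    # mirrored entry above the diagonal
    return full


sequences = deinterleave(["t1 10", "t2    01", "t3    00", "t1 0", "t2 0", "t3 1"])
print(sequences)  # {'t1': '100', 't2': '010', 't3': '001'}

distances = full_distances({"t1": [], "t2": [1.0], "t3": [2.0, 3.0]})
print(distances)  # {'t1': [0, 1.0, 2.0], 't2': [1.0, 0, 3.0], 't3': [2.0, 3.0, 0]}
```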

In addition, after normalisation, the following assumptions hold:

  • All commands start on a new line.

  • All command names (not block names) are in uppercase and contain no in-name comments (so nothing like “MA[c]TRiX”).

  • The “;” terminating MATRIX commands is on a separate line, allowing simpler parsing of matrix rows.
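These guarantees mean that matrix rows in a normalised file can be read with plain string operations. The following is a minimal sketch under those assumptions; the function name matrix_rows is hypothetical, the input is a hand-written text in normalised layout, and taxon labels are assumed to contain no whitespace.

```python
NORMALISED = """#NEXUS
BEGIN CHARACTERS;
DIMENSIONS NCHAR=3;
FORMAT DATATYPE=STANDARD MISSING=? GAP=- SYMBOLS="01";
MATRIX
t1 100
t2 010
t3 001
;
END;
BEGIN DISTANCES;
DIMENSIONS NTAX=3;
FORMAT TRIANGLE=BOTH MISSING=?;
MATRIX
t1 0 1.0 2.0
t2 1.0 0 3.0
t3 2.0 3.0 0
;
END;"""


def matrix_rows(text, block):
    """Return {label: row} for the MATRIX of the first block named `block`."""
    rows, in_block, in_matrix = {}, False, False
    for line in text.splitlines():
        line = line.strip()
        # Block names are not guaranteed to be uppercase, so compare loosely.
        if line.upper() == "BEGIN {};".format(block.upper()):
            in_block = True
        elif in_block and line == "MATRIX":
            in_matrix = True
        elif in_matrix and line == ";":
            break  # the terminating ";" sits on its own line
        elif in_matrix and line:
            label, row = line.split(None, 1)
            rows[label] = row
        elif in_block and line.upper() == "END;":
            break  # block ended without a MATRIX command
    return rows


characters = matrix_rows(NORMALISED, "characters")
print(characters)  # {'t1': '100', 't2': '010', 't3': '001'}

distances = {
    label: [float(value) for value in row.split()]
    for label, row in matrix_rows(NORMALISED, "DISTANCES").items()
}
print(distances["t3"])  # [2.0, 3.0, 0.0]
```

Without normalisation, such a reader would also have to cope with interleaved chunks, comments inside command names and a “;” attached to the last row.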

commonnexus.tools.normalise.normalise(nexus, data_to_characters=False, strip_comments=False, remove_taxa=None, rename_taxa=None)[source]

Normalise a Nexus object as described above.

Parameters:
  • nexus (commonnexus.nexus.Nexus) – A Nexus object to be normalised in-place.

  • data_to_characters (bool) – Flag signaling whether DATA blocks should be converted to CHARACTERS blocks.

  • strip_comments (bool) – Flag signaling whether to remove all non-command comments.

  • remove_taxa (typing.Optional[typing.Container[str]]) – Container of taxon labels specifying taxa to remove from relevant blocks.

  • rename_taxa (typing.Union[typing.Callable[[str], str], typing.Dict[str, str], None]) – Specification of taxa to rename; either a dict, mapping old names to new names, or a callable, accepting the old name as sole argument and returning the new name.

Return type:

commonnexus.nexus.Nexus

Returns:

The modified Nexus object.

Warning

remove_taxa and rename_taxa only operate on TAXA, CHARACTERS/DATA, DISTANCES and TREES blocks. Thus, normalisation with these options may result in an inconsistent NEXUS file if the file contains other blocks that reference taxa (e.g. NOTES).
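Since rename_taxa accepts either a dict or a callable, both forms can be reduced to a single rename function before use. The sketch below shows one way to do that on a plain list of labels. It is not the library's code: the helper names as_renamer and apply_to_labels are hypothetical, and two behaviours are assumptions of this sketch, namely that labels missing from the dict stay unchanged and that removal is applied before renaming.

```python
def as_renamer(rename_taxa):
    """Turn a dict, a callable or None into a single rename function."""
    if rename_taxa is None:
        return lambda label: label
    if callable(rename_taxa):
        return rename_taxa
    # Labels missing from the dict are left unchanged (an assumption).
    return lambda label: rename_taxa.get(label, label)


def apply_to_labels(labels, remove_taxa=None, rename_taxa=None):
    """Drop the labels in remove_taxa, then rename the remaining ones."""
    rename = as_renamer(rename_taxa)
    removed = remove_taxa or ()
    return [rename(label) for label in labels if label not in removed]


labels = ["t1", "t2", "t3"]
with_dict = apply_to_labels(labels, remove_taxa={"t2"}, rename_taxa={"t1": "Taxon_1"})
print(with_dict)      # ['Taxon_1', 't3']
with_callable = apply_to_labels(labels, rename_taxa=str.upper)
print(with_callable)  # ['T1', 'T2', 'T3']
```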

>>> from commonnexus import Nexus
>>> from commonnexus.tools import normalise
>>> print(normalise(Nexus('''#NEXUS
... BEGIN CHARACTERS;
... DIMENSIONS NCHAR=3;
... FORMAT DATATYPE=STANDARD MISSING=x GAP=- SYMBOLS="01" INTERLEAVE;
... MATRIX
...     t1 10
...     t2    01
...     t3    00
...     t1 0
...     t2 0
...     t3 1;
... END;
... BEGIN DISTANCES;
... DIMENSIONS NTAX=3;
... FORMAT NODIAGONAL MISSING=?;
... MATRIX
...     t1
...     t2    1.0
...     t3    2.0 3.0;
... END;
... BEGIN TREES;
... TRANSLATE a t1, b t2, c t3;
... TREE 1 = (a,b,c);
... END;''')))
#NEXUS
BEGIN TAXA;
DIMENSIONS NTAX=3;
TAXLABELS t1 t2 t3;
END;
BEGIN CHARACTERS;
DIMENSIONS NCHAR=3;
FORMAT DATATYPE=STANDARD MISSING=? GAP=- SYMBOLS="01";
MATRIX
t1 100
t2 010
t3 001
;
END;
BEGIN DISTANCES;
DIMENSIONS NTAX=3;
FORMAT TRIANGLE=BOTH MISSING=?;
MATRIX
t1 0 1.0 2.0
t2 1.0 0 3.0
t3 2.0 3.0 0
;
END;
BEGIN TREES;
TREE 1 = (t1,t2,t3);
END;

matrix

CHARACTERS matrices are arguably the most complex objects in NEXUS files. Thus, manipulation of such matrices is implemented in a separate module.

Tools to manipulate matrices as returned by commonnexus.blocks.characters.Characters.get_matrix().

class commonnexus.tools.matrix.CharacterMatrix[source]

A wrapper for the nested ordered dicts returned by commonnexus.blocks.characters.Characters.get_matrix(), providing simpler access to some properties of the data and some conversion functionality.

iter_rows()[source]

Iterate lists of states per taxon.

Return type:

typing.Generator[typing.List[typing.Union[None, str, typing.Set[str], typing.Tuple[str]]], None, None]

iter_columns()[source]

Iterate lists of states per character.

Return type:

typing.Generator[typing.List[typing.Union[None, str, typing.Set[str], typing.Tuple[str]]], None, None]
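The two iteration orders can be pictured on a plain nested dict of the shape described above (taxon label mapped to character label mapped to state). The following is a standalone sketch, not the class's implementation; the function names rows and columns are hypothetical, and the sample states (None, a set, a tuple and "-" for a gap) are chosen only to match the type annotation.

```python
from collections import OrderedDict

matrix = OrderedDict([
    ("t1", OrderedDict([("c1", "0"), ("c2", None), ("c3", {"0", "1"})])),
    ("t2", OrderedDict([("c1", "1"), ("c2", "-"), ("c3", ("0", "1"))])),
])


def rows(matrix):
    """Yield one list of states per taxon."""
    for states in matrix.values():
        yield list(states.values())


def columns(matrix):
    """Yield one list of states per character.

    Assumes every taxon lists the same characters in the same order.
    """
    for column in zip(*rows(matrix)):
        yield list(column)


by_taxon = list(rows(matrix))
by_character = list(columns(matrix))
print(len(by_taxon), len(by_character))  # 2 3
print(by_character[0])                   # ['0', '1']
```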

property taxa: List[str]

The list of taxa (labels or numbers) in a matrix.

property characters: List[str]

The list of characters (labels or numbers) in a matrix.

property distinct_states: Set[None | str | FrozenSet[str] | Tuple[str]]

The set of distinct states in a matrix (including missing and gap, if found).

property symbols: Set[str | FrozenSet[str]]

The set of state symbols, excluding missing and gap.
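What taxa, characters and distinct_states report can be sketched on a plain nested dict. This is an illustration only, not the class's code: the function names are hypothetical, "-" standing for a gap is an assumption, and converting sets to frozensets is inferred from the FrozenSet in the annotation (a plain set is not hashable and could not be a member of another set).

```python
matrix = {
    "t1": {"c1": "0", "c2": None, "c3": {"0", "1"}},
    "t2": {"c1": "1", "c2": "-", "c3": ("0", "1")},
}


def taxa(matrix):
    """Taxon labels, in matrix order."""
    return list(matrix)


def characters(matrix):
    """Character labels, read from the first row (matrix must not be empty)."""
    return list(next(iter(matrix.values())))


def distinct_states(matrix):
    """All distinct states, with sets frozen so that they are hashable."""
    states = set()
    for row in matrix.values():
        for state in row.values():
            states.add(frozenset(state) if isinstance(state, set) else state)
    return states


print(taxa(matrix))        # ['t1', 't2']
print(characters(matrix))  # ['c1', 'c2', 'c3']
states = distinct_states(matrix)
print(len(states))         # 6
```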

classmethod multistatised(matrix, multicharlabel=None)[source]

Convert character data of the form 0010000 to a single multi-state character. This kind of data may be obtained from coding wordlist data as “word belongs to cognate set” vectors.

If 26..52 characters are given, RESPECTCASE is added to FORMAT, and A-Za-z is used as symbol set.

Parameters:
  • matrix (typing.OrderedDict[str, typing.OrderedDict[str, typing.Union[None, str, typing.Set[str], typing.Tuple[str]]]]) –

  • multicharlabel (typing.Optional[str]) –

Return type:

commonnexus.tools.matrix.CharacterMatrix
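The conversion can be pictured as replacing each binary vector by the symbol for the position of its "1". The sketch below is a standalone illustration, not the method's implementation. Its behaviour is assumed throughout: it uses A-Za-z for any number of characters (which symbols the library picks for fewer than 26 characters is not stated here), it maps a vector without any "1" to None, and it maps a vector with several "1"s to a tuple. The name to_multistate is hypothetical.

```python
import string

SYMBOLS = string.ascii_uppercase + string.ascii_lowercase  # A-Z, then a-z


def to_multistate(matrix, label="multi"):
    """Collapse binary characters into one multi-state character per taxon."""
    result = {}
    for taxon, row in matrix.items():
        if len(row) > len(SYMBOLS):
            raise ValueError("at most 52 binary characters are supported")
        hits = [SYMBOLS[i] for i, state in enumerate(row.values()) if state == "1"]
        if not hits:
            state = None         # no "1" at all: treated as missing (assumption)
        elif len(hits) == 1:
            state = hits[0]
        else:
            state = tuple(hits)  # several "1"s: kept as a tuple (assumption)
        result[taxon] = {label: state}
    return result


vectors = {"t1": "0010000", "t2": "1000000", "t3": "0000000", "t4": "0100100"}
binary = {
    taxon: {str(i): bit for i, bit in enumerate(bits, start=1)}
    for taxon, bits in vectors.items()
}
multi = to_multistate(binary, label="cognate")
print(multi["t1"])  # {'cognate': 'C'}
print(multi["t4"])  # {'cognate': ('B', 'E')}
```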

classmethod from_characters(matrix, drop_chars=None, inverse=False, drop_uncertain=False, drop_polymorphic=False, drop_missing=False, drop_gapped=False, drop_constant=False)[source]
Parameters:
  • inverse (bool) –

  • matrix (typing.OrderedDict[str, typing.OrderedDict[str, typing.Union[None, str, typing.Set[str], typing.Tuple[str]]]]) –

  • drop_chars (typing.Optional[typing.Iterable[str]]) –

  • drop_uncertain (bool) –

  • drop_polymorphic (bool) –

  • drop_missing (bool) –

  • drop_gapped (bool) –

  • drop_constant (bool) –

Return type:

commonnexus.tools.matrix.CharacterMatrix

Returns:

A new matrix constructed as a copy, omitting specified characters.
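The parameters above carry no descriptions, so the following sketch only illustrates the general idea of building a filtered copy. It is not the method's implementation: the function name drop_characters is hypothetical, and "constant" is taken here to mean that all plain string states of a character are identical, with missing and non-string states ignored, which is an assumption of this sketch.

```python
def drop_characters(matrix, drop_chars=None, drop_constant=False):
    """Return a copy of a {taxon: {character: state}} dict without some characters."""
    drop = set(drop_chars or [])
    if drop_constant:
        for char in list(next(iter(matrix.values()), {})):
            # Only plain string states are compared (assumption).
            states = {row[char] for row in matrix.values() if isinstance(row[char], str)}
            if len(states) <= 1:
                drop.add(char)
    return {
        taxon: {char: state for char, state in row.items() if char not in drop}
        for taxon, row in matrix.items()
    }


matrix = {
    "t1": {"c1": "0", "c2": "1", "c3": "0"},
    "t2": {"c1": "0", "c2": "0", "c3": "1"},
    "t3": {"c1": "0", "c2": None, "c3": "1"},
}
reduced = drop_characters(matrix, drop_chars=["c3"], drop_constant=True)
print(reduced)  # {'t1': {'c2': '1'}, 't2': {'c2': '0'}, 't3': {'c2': None}}
```

The input dict is left untouched, in line with the method returning a new matrix.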

to_fasta()[source]
Return type:

str

Returns:

The character matrix serialized in the FASTA format.

classmethod from_fasta(fasta)[source]
>>> from commonnexus import Nexus
>>> from commonnexus.blocks import Data
>>> from commonnexus.tools.matrix import CharacterMatrix
>>> print(Nexus.from_blocks(Data.from_data(CharacterMatrix.from_fasta(
...     '> t1\nABA BAA\n> t2\nBAB ABA'))))
#NEXUS
BEGIN DATA;
    DIMENSIONS NCHAR=6;
    FORMAT DATATYPE=STANDARD MISSING=? GAP=- SYMBOLS="AB";
    MATRIX
    t1 ABABAA
    t2 BABABA;
END;
Parameters:

fasta (str) –

Return type:

commonnexus.tools.matrix.CharacterMatrix
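The FASTA input used in the example above can be turned into a nested dict with a few lines of plain Python. This is a minimal sketch of the parsing idea, not the code behind from_fasta(): the function name parse_fasta is hypothetical, and numbering characters from "1" as string labels is an assumption.

```python
def parse_fasta(fasta):
    """Parse FASTA text into {taxon: {character number: state}}."""
    sequences, label = {}, None
    for line in fasta.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A header line starts a new record; the rest is the taxon label.
            label = line[1:].strip()
            sequences[label] = ""
        elif line and label is not None:
            # Sequence lines may contain whitespace, which is dropped.
            sequences[label] += "".join(line.split())
    return {
        taxon: {str(i): state for i, state in enumerate(seq, start=1)}
        for taxon, seq in sequences.items()
    }


parsed = parse_fasta("> t1\nABA BAA\n> t2\nBAB ABA")
print("".join(parsed["t1"].values()))  # ABABAA
print(len(parsed["t2"]))               # 6
```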