NEXUS Files

Reading NEXUS data

Since NEXUS is an Extensible File Format, its natural habitat is the file system. Thus, to instantiate a Nexus object, we typically read a file to access NEXUS data:

>>> from commonnexus import Nexus
>>> nex = Nexus.from_file('tests/fixtures/ape_random.trees')
>>> for name in nex.blocks:
...     print(name)
...
TAXA
TREES
class commonnexus.nexus.Config(hyphenminus_is_text=True, asterisk_is_text=True, validate_newick=False, ignore_unsupported=True, encoding='utf8', no_default_matchchar=False, strict=False)[source]

The global behaviour of a Nexus instance can be configured. The available configuration options are set and accessed from an instance of Config.

Parameters:
  • hyphenminus_is_text (bool) –

  • asterisk_is_text (bool) –

  • validate_newick (bool) –

  • ignore_unsupported (bool) –

  • encoding (str) –

  • no_default_matchchar (bool) –

  • strict (bool) –

hyphenminus_is_text: bool = True

Specifies whether “-”, aka ASCII hyphen-minus, is treated as regular text (True) or as punctuation (False).

asterisk_is_text: bool = True

Specifies whether “*”, aka asterisk, is treated as regular text (True) or as punctuation (False).

validate_newick: bool = False

Specifies whether Newick nodes for TREEs are constructed by parsing the Newick string or from the Nexus tokens. The latter is slightly faster but will bypass some input validation.

ignore_unsupported: bool = True

Specifies whether unsupported NEXUS commands/options are ignored or raise an error. Note that this option may only take effect when a block or command is accessed.

encoding: str = 'utf8'

Specifies the text encoding of a NEXUS file.

no_default_matchchar: bool = False

The NEXUS spec does not explicitly state a default value for the MATCHCHAR directive in the FORMAT command of a CHARACTERS block. commonnexus - in agreement with many NEXUS files encountered “in the wild” - assumes a default of “.”. To force no default value for MATCHCHAR, e.g. because matrix data uses “.” as a regular state symbol, set no_default_matchchar to True.
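To illustrate why the default matters, here is a minimal pure-Python sketch of what a match character means in a matrix: a cell holding the match character takes the state of the first row in the same column. The helper expand_matchchar is hypothetical and is not part of commonnexus; it only mimics the idea on plain strings.

```python
def expand_matchchar(rows, matchchar="."):
    """Replace each matchchar cell with the first row's state in that column.

    Hypothetical helper for illustration only, not commonnexus code.
    Passing matchchar=None leaves every row untouched.
    """
    if not rows:
        return []
    reference = rows[0]
    expanded = [reference]
    for row in rows[1:]:
        expanded.append("".join(
            ref if state == matchchar else state
            for ref, state in zip(reference, row)
        ))
    return expanded


rows = ["ACGT", "..C.", "A..G"]
with_default = expand_matchchar(rows)                    # "." is a match character
without_default = expand_matchchar(rows, matchchar=None)  # "." is an ordinary symbol
```

With the “.” default, the second and third rows become ACCT and ACGG. Without a match character the rows stay as written, which is the behaviour one would want when “.” is a genuine state symbol.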

strict: bool = False

Sometimes the NEXUS spec is not followed entirely by files found in the wild. If a somewhat lax interpretation does not lead to ambiguities, that’s what commonnexus does. To force stricter adherence to the spec, set strict to True.

class commonnexus.nexus.Nexus(s=None, block_implementations=None, config=None, **kw)[source]

A NEXUS object implemented as a list of commands with methods to read and write blocks.

From the spec:

The tokens in a NEXUS file are organized into commands, which are in turn organized into blocks.

This is reflected in the Nexus object. The Nexus object is just a list of Commands, and has a property Nexus.blocks() giving access to commands grouped by block:

>>> nex = Nexus('#NEXUS BEGIN myblock; mycmd a b c; END;')
>>> nex[0].__class__
<class 'commonnexus.nexus.Command'>
>>> len(nex.blocks['MYBLOCK'])
1

Note

NEXUS is for the most part case-insensitive. commonnexus reflects this by giving all blocks and commands uppercase names. Thus, even if a command or block has a lowercase or mixed-case name in the file, the corresponding Command or Block object must be addressed using the uppercase name.

Parameters:
  • s (typing.Union[typing.Iterable, typing.List[commonnexus.command.Command], None]) –

  • block_implementations (typing.Optional[typing.Dict[str, commonnexus.blocks.base.Block]]) –

  • config (typing.Optional[commonnexus.nexus.Config]) –

__init__(s=None, block_implementations=None, config=None, **kw)[source]
Parameters:
  • s (typing.Union[typing.Iterable, typing.List[commonnexus.command.Command], None]) – The NEXUS content.

  • block_implementations (typing.Optional[typing.Dict[str, commonnexus.blocks.base.Block]]) – Custom implementations for non-public blocks.

  • config (typing.Optional[commonnexus.nexus.Config]) – Configuration.

  • kw – If no Config object is passed as config, keyword parameters will be interpreted as configuration options. Thus,

>>> nex = Nexus(encoding='latin')

is a shortcut for

>>> nex = Nexus(config=Config(encoding='latin'))
classmethod from_file(p, config=None, **kw)[source]

Instantiate a Nexus object from the contents of a NEXUS file.

Parameters:
  • p (typing.Union[str, pathlib.Path]) – Path of the file.

  • config (typing.Optional[commonnexus.nexus.Config]) – An optional configuration object.

  • kw – Configuration options, if no Config object is passed in.

Return type:

commonnexus.nexus.Nexus

Returns:

A Nexus instance.

property blocks: Dict[str, List[Block]]

A dict mapping uppercase block names to lists of instances of these blocks ordered as they appear in the NEXUS content.

For a shortcut to access blocks which are known to appear just once in the NEXUS content, see Nexus.__getattribute__().

__getattribute__(name)[source]

NEXUS does not make any prescriptions regarding how many blocks with the same name may exist in a file. Thus, the primary way to access blocks is by looking up the list of blocks for a given name in Nexus.blocks(). If it can be assumed that just one block for a name exists, or only the first block with that name is of interest, this block can also be accessed as Nexus.<BLOCK_NAME>, i.e. using the uppercase block name as attribute of the Nexus instance.

>>> nex = Nexus('#NEXUS begin block; cmd; end;')
>>> nex.BLOCK.name
'BLOCK'
>>> len(nex.BLOCK.commands)
1
__str__()[source]

The string representation of a Nexus object is just its NEXUS content.

>>> nex = Nexus()
>>> nex.append_block(Block.from_commands([]))
>>> print(nex)
#NEXUS
BEGIN BLOCK;
END;
to_file(p)[source]

Write the NEXUS content of a Nexus object to a file.

Parameters:

p (typing.Union[str, pathlib.Path]) –

property comments: List[str]

Comments may appear anywhere in a NEXUS file. Thus, they are the only kind of tokens not really grouped into a command.

While comments in commands can also be accessed from the command, comments preceding any command (and all others) can be accessed via this property.

>>> nex = Nexus("#nexus [created by commonnexus] begin block; cmd [does nothing]; end;")
>>> nex.BLOCK.CMD.comments
['does nothing']
>>> nex.comments[0]
'created by commonnexus'
get_numbers(object_name, items)[source]

Determine object numbers suitable for inclusion in a set spec.

resolve_set_spec(object_name, spec, chars=None)[source]

Resolve a set spec to a list of included items, specified by label or number.

Parameters:
  • object_name

  • spec


property characters: Block | None

Shortcut to get around the DATA/CHARACTERS ambiguity.

I.e. if one is interested in the characters matrix of a NEXUS file no matter whether this is included in a DATA or CHARACTERS block, Nexus.characters.get_matrix() can be used rather than (Nexus.DATA or Nexus.CHARACTERS).get_matrix().

Returns:

The first DATA or CHARACTERS block.

property taxa: List[str] | None

Shortcut to retrieve the list of taxa a NEXUS file provides data on.

Returns:

The list of taxa labels used in a NEXUS file.

Note

There are various ways to encode taxa labels in a NEXUS file. This method looks up different places ordered by explicitness, i.e.

  1. A TAXLABELS command in a TAXA block.

  2. A TAXLABELS command in a DATA or CHARACTERS block.

  3. Taxa labels given implicitly as labels in a MATRIX command.

  4. A TAXLABELS command in a DISTANCES block.

  5. Taxa labels given implicitly as labels in a DISTANCES.MATRIX command.

  6. Taxa labels given as mappings in the TRANSLATE command of a TREES block.

  7. Taxa labels given implicitly as node names in the Newick representation of a tree in a TREE command in a TREES block.

Warning

Taxa descriptions in NEXUS may be inconsistent, e.g. a NEXUS file might contain a TAXA block, but introduce new taxa via NEWTAXA/TAXLABELS in a CHARACTERS block. commonnexus does not make an effort to check for consistency.

Writing NEXUS data

commonnexus provides functionality to write NEXUS by manipulating commonnexus.nexus.Nexus objects, which can then be written to a file.

>>> nex = Nexus()
>>> nex.to_file('test.nex')

will write a minimal NEXUS file containing just the text #NEXUS.

Since blocks are the somewhat self-contained units of information in NEXUS, the main ways to manipulate a Nexus object are

Nexus.append_block(block)[source]
Parameters:

block (commonnexus.blocks.base.Block) –

Nexus.remove_block(block)[source]
Parameters:

block (commonnexus.blocks.base.Block) –

Nexus.replace_block(old, new)[source]
Parameters:
  • old (commonnexus.blocks.base.Block) –

  • new (typing.Union[commonnexus.blocks.base.Block, typing.List[typing.Tuple[str, str]]]) –

The methods to add blocks accept Block instances as argument. Such instances can be obtained by calling the generic factory method

classmethod Block.from_commands(commands, nexus=None, name=None, comment=None, TITLE=None, LINK=None, ID=None)[source]

Generic factory method for blocks.

This method will create a block with the uppercase name of the cls as name (or the explicitly passed block name). The (name str, payload str) tuples from commands are simply passed to commonnexus.command.Command.from_name_and_payload() to assemble the commands in the block.

This method should be used to create custom, non-public NEXUS blocks, while for public blocks the from_data method of the class implementing the block should be preferred, because the latter will make sure that consistent, valid block data is written.

Parameters:
  • commands (typing.Iterable[typing.Union[str, typing.Tuple[str, str], typing.Tuple[str, str, str]]]) – The commands to be inserted in the body of the block. A command can be specified as a single string, which is taken as the name of the command, a pair (name, payload) or a triple (name, payload, comment).

  • nexus (typing.Optional[commonnexus.nexus.Nexus]) – A Nexus instance to look up global config options.

  • name (typing.Optional[str]) – Explicit name of the block to be created.

Return type:

commonnexus.blocks.base.Block

Returns:

The instantiated Block object.

>>> from commonnexus import Nexus, Block
>>> nex = Nexus()
>>> nex.append_block(Block.from_commands([('mycommand', 'with data')], name='myblock'))
>>> print(nex)
#NEXUS
BEGIN myblock;
    mycommand with data;
END;
>>> str(nex.MYBLOCK.MYCOMMAND)
'with data'
Parameters:
  • comment (typing.Optional[str]) –

  • TITLE (typing.Optional[str]) –

  • LINK (typing.Optional[str]) –

  • ID (typing.Optional[str]) –

or specific implementations of Block.from_data, such as commonnexus.blocks.characters.Characters.from_data() or commonnexus.blocks.trees.Trees.from_data()

Comments

The from_data methods of blocks accept a keyword argument comment to add a comment to a block construct.

To add a comment to the top of a NEXUS file, one can proceed as follows:

>>> nex = Nexus('#NEXUS\n[{}]\n'.format('Comment goes here'))
>>> nex.append_block(Block.from_commands([], name='theblock'))
>>> print(nex)
#NEXUS
[Comment goes here]

BEGIN theblock;
END;