Parse a Nexus or TNT file, reading character states and names.

ReadCharacters(
  filepath,
  character_num = NULL,
  encoding = "UTF8",
  session = NULL
)

ReadTntCharacters(
  filepath,
  character_num = NULL,
  type = NULL,
  session = NULL,
  encoding = "UTF8"
)

ReadNotes(filepath, encoding = "UTF8")






filepath: Character string specifying the location of the file, or a connection to the file.


character_num: Index of the character(s) to return. NULL, the default, returns all characters.


encoding: Character encoding of the input file.


session: (Optional) A Shiny session with a numericInput named character_num whose maximum should be updated.


type: Character vector specifying the categories of data to extract from the file. Setting type = c('num', 'dna') will return only characters following a &[num] or &[dna] tag in a TNT input file, with num character blocks listed before dna characters. Leave as NULL (the default) to return all characters in their original sequence.


Parameters to pass to Read[Tnt]Characters().


List of taxa and characters: a list of sequences, each made of a single character vector and named with the taxon name.
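The list format described above can be built from a taxa-by-characters matrix, such as one returned by ReadCharacters(), with a few lines of base R. This is a minimal sketch under that assumption: MatrixToSequenceList() is a hypothetical helper, not a TreeTools function, and the toy matrix is invented. For matrices, TreeTools' MatrixToPhyDat() is the more direct route to a phyDat object.

```r
# Hypothetical helper (not part of TreeTools): convert a taxa-by-characters
# matrix into a list holding one character vector per taxon, named by taxon.
MatrixToSequenceList <- function(mat) {
  stopifnot(is.matrix(mat), !is.null(rownames(mat)))
  # as.character() coerces any numeric states to strings and drops names
  seqs <- lapply(seq_len(nrow(mat)), function(i) as.character(mat[i, ]))
  names(seqs) <- rownames(mat)
  seqs
}

# Invented toy data: two taxa scored for three characters
mat <- matrix(c("0", "1", "?",
                "1", "1", "0"),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("taxon_a", "taxon_b"), NULL))
seqs <- MatrixToSequenceList(mat)
str(seqs)
```

Each element of seqs is a plain character vector of state tokens, which is the shape this argument expects.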


ReadCharacters() and ReadTNTCharacters() return a matrix whose row names correspond to tip labels and whose column names correspond to character labels, with the attribute state.labels listing the state labels for each character; or, if the call is unsuccessful, a list of length one containing a character string explaining why. ReadAsPhyDat() and ReadTntAsPhyDat() return a phyDat object. ReadNotes() returns a list in which each entry corresponds to a single character and is itself a list with two elements:

  1. A single character string listing any notes associated with the character.

  2. A named character vector listing the notes associated with each taxon for that character, named with the name of each note-bearing taxon.
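Because each entry of the ReadNotes() result pairs a character-level note with a named vector of taxon-level notes, the taxon-level notes can be flattened into a single table. This is a minimal sketch that assumes exactly the two-element structure described above; TaxonNotesTable() is a hypothetical helper, and the notes object is invented toy data rather than real ReadNotes() output.

```r
# Hypothetical helper (not part of TreeTools). Assumes `notes` is a list with
# one entry per character, each a list of two elements:
#   [[1]] a character-level note; [[2]] a named vector of taxon-level notes.
TaxonNotesTable <- function(notes) {
  rows <- lapply(seq_along(notes), function(i) {
    taxonNotes <- notes[[i]][[2]]
    if (length(taxonNotes) == 0) return(NULL)  # no taxon notes for character i
    data.frame(char = i,
               taxon = names(taxonNotes),
               note = unname(taxonNotes),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # rbind() skips the NULL entries
}

# Invented toy input mimicking the documented structure
notes <- list(
  list("Scored from photographs", c(taxon_a = "Specimen damaged")),
  list("", character(0)),
  list("", c(taxon_b = "Inferred", taxon_c = "Uncertain"))
)
TaxonNotesTable(notes)
```

The char column holds the character's position in the list; if the real result is named by character, names(notes)[i] could be used instead.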


Tested with matrices downloaded from MorphoBank, but should also work more widely; please report incompletely or incorrectly parsed files.

Matrices must contain only continuous or only discrete characters, with at most one matrix per file. Continuous characters are read as strings (i.e. base type 'character').

The encoding of an input file will be automatically determined by R. Errors pertaining to an invalid multibyte string, or to a string invalid in the current locale, indicate that R has failed to detect the appropriate encoding. Either re-save the file in a supported encoding (UTF-8 is a good choice) or specify the file encoding following the example below. You can find a file's encoding by, for example, opening it in Notepad++ and checking which option is highlighted in the "Encoding" menu.
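Re-saving a file as UTF-8 can also be done from R itself. The sketch below assumes you already know the file's current encoding (Latin-1 here, purely as an example); ResaveAsUtf8() is a hypothetical helper built on base R's readLines(), iconv() and writeLines(), not part of TreeTools.

```r
# Hypothetical helper (not part of TreeTools): copy a text file, converting it
# from a known source encoding `from` (e.g. "latin1", "CP1252") to UTF-8.
ResaveAsUtf8 <- function(inPath, outPath, from) {
  lines <- readLines(inPath, warn = FALSE)         # bytes read without translation
  utf8 <- iconv(lines, from = from, to = "UTF-8")  # reinterpret bytes as `from`
  if (anyNA(utf8)) {
    stop("Some lines are not valid in encoding '", from, "'")
  }
  writeLines(utf8, outPath, useBytes = TRUE)       # write UTF-8 bytes unchanged
  invisible(outPath)
}

# Example: a one-line Latin-1 file containing "café" (byte 0xE9 for "é")
inPath <- tempfile(fileext = ".nex")
outPath <- tempfile(fileext = ".nex")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), inPath)
ResaveAsUtf8(inPath, outPath, from = "latin1")
readBin(outPath, "raw", n = 10)  # "é" is now the two UTF-8 bytes c3 a9
```

Converting explicitly with iconv() and writing with useBytes = TRUE keeps the result independent of the session's locale; the converted file could then be passed to ReadCharacters() with its default UTF8 encoding.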


  • PhyDat: A convenient wrapper for phangorn's phyDat(), which converts a list of morphological characters into a phyDat object. If your morphological characters are in the form of a matrix, perhaps because they have been read using read.table(), try MatrixToPhyDat() instead.


Maddison DR, Swofford DL, Maddison WP (1997). “Nexus: an extensible file format for systematic information.” Systematic Biology, 46, 590–621. doi:10.1093/sysbio/46.4.590.

See also


Martin R. Smith (


fileName <- paste0(system.file(package = 'TreeTools'),
#>         Character one Character two lots-of-punctuation, and "so on"!
#> taxon_a "0"           "0"           "0"                              
#> taxon_b "0"           "0"           "0"                              
#> taxon_c "1"           "1"           "1"                              
#> taxon_d "1"           "1"           "1"                              
#> taxon_e "1"           "1"           "1"                              
#>         Character n Character 5 Character 6 final character
#> taxon_a "0"         "0"         "0"         "0"            
#> taxon_b "0"         "0"         "0"         "0"            
#> taxon_c "1"         "?"         "0"         "0"            
#> taxon_d "?"         "?"         "1"         "1"            
#> taxon_e "1"         "?"         "1"         "1"            
#> attr(,"state.labels")
#> attr(,"state.labels")[[1]]
#> [1] "absent"  "present"
#> attr(,"state.labels")[[2]]
#> [1] "absent"  "present"
#> attr(,"state.labels")[[3]]
#> [1] "here"       "there"      "everywhere"
#> attr(,"state.labels")[[4]]
#> [1] "a long description" "present"           
#> attr(,"state.labels")[[5]]
#> [1] "simple"                "more complex"          "with (parentheses)"   
#> [4] "more complex, 6 still"
#> attr(,"state.labels")[[6]]
#> [1] "this one has"   "multiple lines"
#> attr(,"state.labels")[[7]]
#> [1] "absent"  "present"

fileName <- paste0(system.file(package = 'TreeTools'),

continuous <- ReadCharacters(fileName, encoding = 'UTF8')

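# To convert from strings to numbers while keeping the matrix structure
# (dimensions, taxon names and other attributes):
at <- attributes(continuous)
# as.numeric() drops attributes; suppressWarnings() silences the
# "NAs introduced by coercion" warning for entries that are not numbers
continuous <- suppressWarnings(as.numeric(continuous))
attributes(continuous) <- at
continuous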
#>            [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
#> A_taxon   1.111 1.000 1.330 1.444 1.555 1.666
#> B_alienus 2.111 2.222 2.333    NA 2.550 2.666
#> C_andinus 3.111 3.222 3.333 3.444 3.555 3.666