Balanced parenthesis sequences

In this page we provide the balanced parentheses representation of several trees. The balanced parentheses representation of a tree consists of a depth-first preorder traversal of the tree, writing an opening parenthesis when visiting a forward edge (parent to child), and a closing parenthesis when visiting a backward edge (child to parent). Thus, a balanced parentheses representation of a tree with n nodes has 2n parentheses.

To read the datasets as bitarrays, use this code.

Wikipedia (13 MB)

Download dataset

Balanced parentheses representation of the XML of a Wikipedia dump (January 12, 2015). This dataset has 498,753,914 parentheses.

Proteins (82 MB)

Download dataset

Balanced parentheses representation of the suffix tree of the protein data from the Pizza&Chili corpus. The suffix tree was constructed using this code. This dataset has 670,721,006 parentheses.

DNA (135 MB)

Download dataset

Balanced parentheses representation of the suffix tree of the DNA data from the Pizza&Chili corpus. The suffix tree was constructed using this code. This dataset has 1,154,482,174 parentheses.

Complete tree (18 MB)

Download dataset

Balanced parentheses representation of a complete binary tree of depth 30. This dataset has 2,147,483,644 parentheses.

OpenStreetMap (76 MB)

Download dataset

Balanced parentheses representation of the XML of an OpenStreetMap dump (January 10, 2015). This dataset has 4,675,776,358 parentheses.