An algorithm to parse arrays.

A possible way to parse array structures from textual data.

Code

source

Motivation

Recently at work, I was working on this problem which included parsing a nested array type structure.

Example 
.data = "{
    {
        name1,
        {
            phone_number_1, 
            address_1,
            roll_number_1
        }
    },
    {
        name2,
        {
            phone_number_2, 
            address_2,
            roll_number_2
        }
    }    
}"

The crux of the problem is to parse this .data into a list([]) which can be indexed, modified, appended to later.

Possible Approaches

  • For illustration purposes, lets take .data = "{name,{name,{jalotra, shivam, machine}}}"

  • First one that came to mind is to use ast.literal_eval() that ast module gives out of the bat.

    • Just as it is .data will give errors but if we change values to string types, it would work.

My Solution

  • Is to make a tree out of the nested arrays and then modify this tree in-place(equivalent to modifying some array value) or delete nodes(equivalent to deleting some array values), adding new node(equivalent to adding new array values) and finally return the parsed array.

  • Algorithm

    • We start with a root node.
    • And for every open_parentheses a new node in this tree rooted at Root is created.
    • Now the structure that you describe for this node can be used recursively. I call this structure NArrayTree
    • And you can keep on adding the current value (string) this NArrayTree node can hold, like if the example was { jalotra, { } } first you encounter a opening bracket { and you will create a new Node NArrayTree([]) and till jalotra, you will append chars to this node.
    • Finally, your original node will look like NArrayTree(['j', 'a', 'l', 'o', 't', 'r', 'a']) and then once more you encounter a { do same for child node recursively.

    • After making this tree, you are left will parsing this tree, which is a trivial algo that uses dfs or give me the answer for children.

        # Parses tree starting from self recursively.
        def parse_tree(self):       
            res = []
            if len(self.value) > 0:
                res = self.get_value()
      
            for child in self.children:
                res.append(child.parse_tree())
            return res
      

Result

    • data = "{name}"
    • result = ["name"]


    • data = "{name, section}"
    • result = ["name", "section"]


    • data = "{name, {section}}"
    • result = ["name", ["section"]]


    • data = "{name, {section, gender, {medadata, {XII, India}}}}"
    • result = ['name', ['section', 'gender', ['medadata', ['XII', 'India']]]]