An algorithm to parse arrays.
A possible way to parse array structures from textual data.
Photo by Marcus Urbenz on Unsplash
Table of contents
Code
Motivation
Recently at work, I was working on this problem which included parsing a nested array type structure.
Example
.data = "{
{
name1,
{
phone_number_1,
address_1,
roll_number_1
}
},
{
name2,
{
phone_number_2,
address_2,
roll_number_2
}
}
}"
The crux of the problem is to parse this .data
into a list([])
which can be indexed, modified, appended to later.
Possible Approaches
For illustration purposes, lets take
.data = "{name,{name,{jalotra, shivam, machine}}}"
First one that came to mind is to use
ast.literal_eval()
that ast module gives out of the bat.- Just as it is .data will give errors but if we change values to string types, it would work.
My Solution
Is to make a tree out of the nested arrays and then modify this tree in-place(equivalent to modifying some array value) or delete nodes(equivalent to deleting some array values), adding new node(equivalent to adding new array values) and finally return the parsed array.
Algorithm
- We start with a root node.
- And for every open_parentheses a new node in this tree rooted at Root is created.
- Now the structure that you describe for this node can be used recursively. I call this structure
NArrayTree
- And you can keep on adding the current value (string) this
NArrayTree
node can hold, like if the example was{ jalotra, { } }
first you encounter a opening bracket{
and you will create a new NodeNArrayTree([])
and tilljalotra,
you will appendchars
to this node. Finally, your original node will look like
NArrayTree(['j', 'a', 'l', 'o', 't', 'r', 'a'])
and then once more you encounter a{
do same for child node recursively.After making this tree, you are left will parsing this tree, which is a trivial algo that uses
dfs or give me the answer for children
.# Parses tree starting from self recursively. def parse_tree(self): res = [] if len(self.value) > 0: res = self.get_value() for child in self.children: res.append(child.parse_tree()) return res
Result
data = "{name}"
result = ["name"]
data = "{name, section}"
result = ["name", "section"]
data = "{name, {section}}"
result = ["name", ["section"]]
data = "{name, {section, gender, {medadata, {XII, India}}}}"
result = ['name', ['section', 'gender', ['medadata', ['XII', 'India']]]]