Hi Vicențiu, Vladislav and the community,
I'm ready to answer the questions asked in the last email:
a. How are arrays sorted if the values inside them are a mix of objects, arrays, literals, numbers.
The order of the values in a JSON array is significant and must be preserved (see https://stackoverflow.com/a/7214312/547065, thanks to Vladislav Vaintroub), so there is no need to sort arrays. We only need to parse each element of the array and sort any objects inside it recursively.
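A rough sketch of this recursion, written in Python only for brevity (the real implementation would be C++ inside the server). The helper name sort_objects is made up for this illustration; it is not an existing function anywhere.

```python
import json

def sort_objects(value):
    """Return a copy of value with object keys sorted and array order untouched."""
    if isinstance(value, dict):
        # Objects: order members by key, then recurse into each member value.
        return {key: sort_objects(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Arrays: keep the element order, but still recurse into each element.
        return [sort_objects(item) for item in value]
    # Scalars (numbers, strings, true/false/null) are returned unchanged.
    return value

doc = json.loads('{"D": 2, "B": [{"E": 2, "C": 1}, {"D": 2, "A": 0}], "A": 0}')
result = json.dumps(sort_objects(doc), separators=(",", ":"))
print(result)  # {"A":0,"B":[{"C":1,"E":2},{"A":0,"D":2}],"D":2}
```

Note that the two objects inside the array keep their positions; only the keys inside each object are reordered.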
b. How do you define a sorting criteria between two JSON objects in this case?
One option is to sort the keys in ASCIIbetical order (https://en.wikipedia.org/wiki/ASCII#Character_order), since that is easier to implement in C++ than other sorting criteria.
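To make the ordering concrete: in ASCII, digits come before uppercase letters, which come before the underscore, which comes before lowercase letters. A small Python illustration follows; the helper name asciibetical is hypothetical, and comparing the UTF-8 encoded bytes is an assumption about how a plain byte-wise comparison in C++ would behave.

```python
def asciibetical(keys):
    # Compare keys by their encoded bytes rather than by any locale or
    # case-insensitive collation.
    return sorted(keys, key=lambda k: k.encode("utf-8"))

ordered = asciibetical(["a", "B", "D", "_x", "1"])
print(ordered)  # ['1', 'B', 'D', '_x', 'a']
```

One consequence is that "B" sorts before "a", which is why the keys of the first test case come out as B, D, a.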
c. JSON is represented as text, however one can use it to store floating point values. How do you plan to compare doubles and how would those values be sorted? For example: 1 vs 1.0 vs 1.00 or 1000 vs 1e3?
That is indeed a problem to solve. I plan to convert every number to long double, round it to a fixed number of decimals (say, 8 digits after the decimal point?), and then convert it back to a string, so that equivalent numbers end up with the same text representation.
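A minimal sketch of this unification step in Python. Two assumptions to keep in mind: Python's float is a 64-bit double rather than a long double, and unify_number is a name invented for this example.

```python
def unify_number(text, digits=8):
    # Parse the JSON number text, then print it with a fixed number of
    # decimals so that equal values get identical strings.
    return format(float(text), f".{digits}f")

samples = ["1", "1.0", "1.00", "1000", "1e3"]
unified = [unify_number(s) for s in samples]
print(unified)
# ['1.00000000', '1.00000000', '1.00000000', '1000.00000000', '1000.00000000']
```

A side effect of fixed rounding is that two numbers that differ only beyond the chosen precision would be treated as equal, so the number of digits is a trade-off that still needs to be decided.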
d. What's the priority of null values, are they first, last?
As described in a., arrays are not sorted, so this question no longer applies.
Here are some test cases that apply these ideas:
TEST CASE #1
'{"a": 0, "B": {"C": 1}, "D": 2}', '{"A": 7, "C": 9, "B": 8}'
JSON_NORMALIZE
Returns '{"B":{"C":1.00000000},"D":2.00000000,"a":0.00000000}', '{"A":7.00000000,"B":8.00000000,"C":9.00000000}' respectively
JSON_EQUALS returns 0
TEST CASE #2
'{"a": 0, "B": {"C": 100}, "D": 2}', '{"B": {"C": 1e2}, "a": 0.0, "D": 2.00}'
JSON_NORMALIZE
Returns '{"B":{"C":100.00000000},"D":2.00000000,"a":0.00000000}', '{"B":{"C":100.00000000},"D":2.00000000,"a":0.00000000}' respectively
JSON_EQUALS returns 1
TEST CASE #3
'{"A": 0, "B": [{"C": 1, "E": 2}, {"A": 0, "D": 2}], "D": 2}', '{"A": 0, "B": [{"A": 0, "D": 2}, {"C": 1, "E": 2}], "D": 2}'
JSON_NORMALIZE
Returns '{"A":0.00000000,"B":[{"C":1.00000000,"E":2.00000000},{"A":0.00000000,"D":2.00000000}],"D":2.00000000}', '{"A":0.00000000,"B":[{"A":0.00000000,"D":2.00000000},{"C":1.00000000,"E":2.00000000}],"D":2.00000000}' respectively
JSON_EQUALS returns 0
TEST CASE #3b
'{"A": 0, "B": [{"A": 0, "D": 2}, {"C": 1, "E": 2}], "D": 2}', '{"A": 0, "B": [{"A": 0, "D": 2}, {"C": 1, "E": 2}], "D": 2}'
JSON_NORMALIZE
Returns '{"A":0.00000000,"B":[{"A":0.00000000,"D":2.00000000},{"C":1.00000000,"E":2.00000000}],"D":2.00000000}', '{"A":0.00000000,"B":[{"A":0.00000000,"D":2.00000000},{"C":1.00000000,"E":2.00000000}],"D":2.00000000}' respectively
JSON_EQUALS returns 1
TEST CASE #4
'[null,1,[2,3],true,false]', '[null,1,[2],false]'
JSON_NORMALIZE
Returns '[null,1.00000000,[2.00000000,3.00000000],true,false]', '[null,1.00000000,[2.00000000],false]' respectively
JSON_EQUALS returns 0
TEST CASE #5
'{}', '{}'
JSON_NORMALIZE
Returns '{}', '{}' respectively
JSON_EQUALS returns 1
TEST CASE #6
'[]', '[]'
JSON_NORMALIZE
Returns '[]', '[]' respectively
JSON_EQUALS returns 1
In addition, I've also looked at the pandas Python library (https://github.com/pandas-dev/pandas/blob/master/pandas/io/json/_normalize.p...) and noticed that it uses the json_normalize function to normalize semi-structured JSON data into a flat table. This gives me another idea: JSON_NORMALIZE could work the same way and generate a flat table (with the row names in ASCIIbetical order), producing a row name vector, a column number counter and a matrix that stores the values. JSON_EQUALS would then first compare the column number counters, then the row name vectors, and finally the value matrices, which should give a fast and efficient algorithm for comparing JSON arrays.
E.G.
JSON data:
[{"state": "Florida",
  "shortname": "FL",
  "info": {"governor": "Rick Scott"},
  "counties": [{"name": "Dade", "population": 12345},
               {"name": "Broward", "population": 40000},
               {"name": "Palm Beach", "population": 60000}]},
 {"state": "Ohio",
  "shortname": "OH",
  "info": {"governor": "John Kasich"},
  "counties": [{"name": "Summit", "population": 1234},
               {"name": "Cuyahoga", "population": 1337}]}]
Table:
   info.governor  name        population  state    shortname
0  Rick Scott     Dade        12345       Florida  FL
1  Rick Scott     Broward     40000       Florida  FL
2  Rick Scott     Palm Beach  60000       Florida  FL
3  John Kasich    Summit      1234        Ohio     OH
4  John Kasich    Cuyahoga    1337        Ohio     OH
Column Number Counter: 5
Row Name Vector: ["info.governor","name","population","state","shortname"]
Data Matrix:
Rick Scott   Dade        12345  Florida  FL
Rick Scott   Broward     40000  Florida  FL
Rick Scott   Palm Beach  60000  Florida  FL
John Kasich  Summit      1234   Ohio     OH
John Kasich  Cuyahoga    1337   Ohio     OH
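The three-stage comparison (counter, then names, then values) could look roughly like the Python sketch below. Everything here is hypothetical naming for illustration: FlatTable, make_table and tables_equal do not exist in pandas or MariaDB, and the input is assumed to be records that have already been flattened into one dict per table row.

```python
from dataclasses import dataclass

@dataclass
class FlatTable:
    column_count: int   # the "column number counter"
    row_names: list     # the "row name vector"
    matrix: list        # one list of cell values per table row

def make_table(records):
    # Collect every name that occurs in any record and sort the names,
    # so two tables built from equal data always use the same layout.
    names = sorted({name for rec in records for name in rec})
    matrix = [[rec.get(name) for name in names] for rec in records]
    return FlatTable(len(names), names, matrix)

def tables_equal(a, b):
    # Cheapest check first: a single integer comparison.
    if a.column_count != b.column_count:
        return False
    # Then the names, which are usually far fewer than the cells.
    if a.row_names != b.row_names:
        return False
    # Only if both match, compare every cell.
    return a.matrix == b.matrix

rows = [
    {"info.governor": "Rick Scott", "name": "Dade", "population": 12345,
     "state": "Florida", "shortname": "FL"},
    {"info.governor": "John Kasich", "name": "Summit", "population": 1234,
     "state": "Ohio", "shortname": "OH"},
]
same = tables_equal(make_table(rows), make_table(list(rows)))
changed = [dict(rows[0], population=99999), rows[1]]
different = tables_equal(make_table(rows), make_table(changed))
print(same, different)  # True False
```

The early exits only help when the inputs differ in shape; for inputs with the same columns, the full matrix comparison still dominates the cost.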
That's all for my ideas so far. Please correct me if I've made any mistakes.
Cheers!
Songlin
------------------ Original ------------------
From: "Hollow Man"