The c14n package included in GOBL is inspired by the work of others and aims to define a simple, standardized approach to canonical JSON that could be implemented easily in other languages.
GOBL JSON C14n
GOBL considers the following JSON values as explicit types:
- a string
- a number, which extends the JSON spec and is split into:
  - an integer
  - a float
- an object
- an array
- a boolean
- null
JSON in canonical form:
- Must be encoded in valid UTF-8. A document with invalid character encoding will be rejected.
- Must not include superfluous or non-semantic whitespace.
- Must order the attributes of objects lexicographically by the code points of their names.
- Must remove attributes from objects whose value is null.
- Must not remove null values from arrays.
- Must represent numbers that are mathematically integers (i.e., those with a zero-valued fractional part) using the canonical JSON integer form. These numbers must not be represented with:
  - a leading minus sign when the value is zero (i.e., use 0, not -0);
  - a decimal point (e.g., 3, not 3.0);
  - exponent notation (e.g., 1000, not 1e3);
  - leading zeroes (e.g., 42, not 042), as already prohibited by the JSON specification.
- Must represent floating-point numbers in exponential notation, adhering to the following format:
  - A nonzero single-digit integer part to the left of the decimal point (e.g., 1.23E-3, not 12.3E-4);
  - A nonempty fractional part to the right of the decimal point (e.g., 1.2E-3, not 1.E-3);
  - No trailing zeroes in the fractional part, unless required to satisfy the condition above;
  - A capital E as the exponent separator (not lowercase e);
  - No plus sign (+) in either the mantissa or the exponent;
  - No leading zeroes in the exponent (e.g., 1.2E-3, not 1.2E-003).
- Must represent all strings, including object attribute keys, in their minimal-length UTF-8 encoding:
  - using two-character escape sequences where possible for characters that require escaping, specifically:

    | Character | Escape Sequence | Unicode |
    | --- | --- | --- |
    | " Quotation Mark | \" | U+0022 |
    | \ Reverse Solidus (backslash) | \\ | U+005C |
    | ⌫ Backspace | \b | U+0008 |
    | ⇥ Character Tabulation (tab) | \t | U+0009 |
    | ␊ Line Feed (newline) | \n | U+000A |
    | ␌ Form Feed | \f | U+000C |
    | ↵ Carriage Return | \r | U+000D |

  - using six-character \u00XX uppercase hexadecimal escape sequences for control characters that require escaping but lack one of the two-character sequences listed above, and
  - rejecting any string containing invalid encoding.
The c14n package uses the streaming methods of Go's encoding/json library to parse and recreate a document in memory. A simplified object model maps the JSON structures so that they are ready to be converted into canonical JSON.