The `c14n` package included in GOBL is inspired by the works of others and aims to define a simple, standardized approach to canonical JSON that could be implemented easily in other languages.
GOBL JSON C14n
GOBL considers the following JSON values as explicit types:
- a string
- a number, which extends the JSON spec and is split into:
  - an integer
  - a float
- an object
- an array
- a boolean
- null
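The type list above can be illustrated with a small Go sketch that decodes a value and names its kind. This is only an illustration: `kindOf` and `classify` are hypothetical helpers, not part of the c14n package, and the integer test assumes the "zero-valued fractional part" criterion used by the canonical number rules.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"strings"
)

// kindOf names the explicit type of a decoded JSON value.
// Hypothetical helper for illustration; not part of the c14n package API.
func kindOf(v interface{}) string {
	switch t := v.(type) {
	case nil:
		return "null"
	case bool:
		return "boolean"
	case string:
		return "string"
	case json.Number:
		// Assumption: a number is an integer when its fractional part is zero.
		if f, err := t.Float64(); err == nil && f == math.Trunc(f) {
			return "integer"
		}
		return "float"
	case []interface{}:
		return "array"
	case map[string]interface{}:
		return "object"
	}
	return "unknown"
}

// classify decodes a single JSON value and reports its kind.
func classify(src string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(src))
	dec.UseNumber() // keep the number literal instead of converting to float64
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	return kindOf(v), nil
}

func main() {
	for _, src := range []string{`"a"`, `42`, `4.5`, `{}`, `[]`, `true`, `null`} {
		kind, err := classify(src)
		if err != nil {
			fmt.Println(src, "-> error:", err)
			continue
		}
		fmt.Println(src, "->", kind)
	}
}
```

`UseNumber` is what makes the integer/float split possible here, since the default decoder turns every number into a `float64`.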
A document in canonical form follows these rules:
- Must be encoded in valid UTF-8. A document with invalid character encoding will be rejected.
- Must not include superfluous or non-semantic whitespace.
- Must order the attributes of objects lexicographically by the code points of their names.
- Must remove attributes from objects whose value is `null`.
- Must not remove `null` values from arrays.
- Must represent numbers that are mathematically integers (i.e., those with a zero-valued fractional part) using the canonical JSON integer form. These numbers must not be represented with:
  - a leading minus sign when the value is zero (i.e., use `0`, not `-0`);
  - a decimal point (e.g., `3`, not `3.0`);
  - exponent notation (e.g., `1000`, not `1e3`);
  - leading zeroes (e.g., `42`, not `042`), as already prohibited by the JSON specification.
- Must represent floating-point numbers in exponential notation, adhering to the following format:
  - A nonzero single-digit integer part to the left of the decimal point (e.g., `1.23E3`, not `12.3E2`);
  - A nonempty fractional part to the right of the decimal point (e.g., `1.2E3`, not `1.E3`);
  - No trailing zeroes in the fractional part, unless required to satisfy the condition above (e.g., `1.0E30`);
  - A capital `E` as the exponent separator (not lowercase `e`);
  - No plus sign (`+`) in either the mantissa or the exponent;
  - No leading zeroes in the exponent (e.g., `1.2E3`, not `1.2E003`).
- Must represent all strings, including object attribute keys, in their minimal-length UTF-8 encoding:
  - using two-character escape sequences where possible for characters that require escaping, specifically:

    | Character | Escape Sequence | Unicode |
    | --- | --- | --- |
    | `"` Quotation Mark | `\"` | U+0022 |
    | `\` Reverse Solidus (backslash) | `\\` | U+005C |
    | ⌫ Backspace | `\b` | U+0008 |
    | ⇥ Character Tabulation (tab) | `\t` | U+0009 |
    | ␊ Line Feed (newline) | `\n` | U+000A |
    | ␌ Form Feed | `\f` | U+000C |
    | ↵ Carriage Return | `\r` | U+000D |

  - using six-character `\u00XX` uppercase hexadecimal escape sequences for control characters that require escaping but lack a two-character sequence described previously, and
  - rejecting any string containing invalid encoding.
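The number and string rules above can be sketched in Go as follows. This is a minimal illustration, not the package's implementation: `canonNumber` and `canonString` are hypothetical names, the input literal is assumed to be valid JSON already, and numbers are assumed to fit in a `float64` (so very large integers would lose precision).

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
	"unicode/utf8"
)

// canonNumber formats a JSON number literal using the integer and float rules.
// Assumption: the value fits in a float64.
func canonNumber(lit string) (string, error) {
	f, err := strconv.ParseFloat(lit, 64)
	if err != nil {
		return "", err
	}
	if math.IsInf(f, 0) || math.IsNaN(f) {
		return "", errors.New("not a finite JSON number")
	}
	if f == 0 {
		return "0", nil // covers -0 and 0.0
	}
	if f == math.Trunc(f) {
		// Integer: plain digits, no decimal point or exponent.
		return strconv.FormatFloat(f, 'f', -1, 64), nil
	}
	// Float: Go emits forms such as "1.2305E+03" or "5E-01".
	mant, expStr, _ := strings.Cut(strconv.FormatFloat(f, 'E', -1, 64), "E")
	if !strings.Contains(mant, ".") {
		mant += ".0" // the fractional part must not be empty
	}
	exp, err := strconv.Atoi(expStr) // drops the plus sign and leading zeroes
	if err != nil {
		return "", err
	}
	return mant + "E" + strconv.Itoa(exp), nil
}

// canonString quotes s using the shortest escape available for each character.
func canonString(s string) (string, error) {
	if !utf8.ValidString(s) {
		return "", errors.New("invalid UTF-8 encoding")
	}
	short := map[rune]string{
		'"': `\"`, '\\': `\\`, '\b': `\b`, '\t': `\t`,
		'\n': `\n`, '\f': `\f`, '\r': `\r`,
	}
	var b strings.Builder
	b.WriteByte('"')
	for _, r := range s {
		if esc, ok := short[r]; ok {
			b.WriteString(esc)
		} else if r < 0x20 {
			fmt.Fprintf(&b, `\u%04X`, r) // remaining control characters, uppercase hex
		} else {
			b.WriteRune(r) // everything else is written as raw UTF-8
		}
	}
	b.WriteByte('"')
	return b.String(), nil
}

func main() {
	for _, lit := range []string{"-0", "3.0", "1e3", "0.5", "1230.5"} {
		out, err := canonNumber(lit)
		fmt.Println(lit, "->", out, err)
	}
	out, err := canonString("tab\there")
	fmt.Println(out, err)
}
```

Going through `float64` keeps the sketch short; an implementation that must preserve arbitrarily large integers would need to work on the literal's digits directly.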
The `c14n` package uses the `encoding/json` library's streaming methods to parse and recreate a document in memory. A simplified object model maps the JSON structures so that they are ready to be converted into canonical JSON.