escape data strings according to WebAssembly spec, Fixes #37 #947

Open
RayanVSS wants to merge 2 commits into OCamlPro:main from RayanVSS:main

@RayanVSS

Fix for Issue #37: WebAssembly Data Pretty Printing

The Initial Problem

Issue #37 reported a problem in the pretty-printing of the data instruction in the WebAssembly Text Format.

The original code used OCaml's %S tag to display the initialization string:

pf fmt {|(data%a %a %S)|} pp_id_opt d.id Mode.pp d.mode d.init

Why Was This a Problem?

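OCaml's %S tag escapes characters according to OCaml conventions, not according to WebAssembly Text Format conventions. For instance, OCaml writes a non-printable byte as a three-digit decimal escape (\ddd), whereas the WebAssembly text format expects a two-digit hexadecimal escape (\hh), so the printed string can be read back as different bytes.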

Concrete Example

With the old code, a string containing special characters like \n, \r, \t, etc. was not escaped correctly, making it impossible to re-read the file.
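
The mismatch can be reproduced with the standard library alone. The snippet below is a minimal sketch, not code from the PR; the names ocaml_style and wasm_style are only illustrative. It prints the byte 0xff once with %S and once in the two-digit hexadecimal form used by the WebAssembly text format.

```ocaml
(* Byte 0xff printed with OCaml's %S: a three-digit decimal escape. *)
let ocaml_style = Printf.sprintf "%S" "\xff"

(* The same byte in WebAssembly text form: a two-digit hex escape.
   Illustrative only; this is not how the PR builds its output. *)
let wasm_style = Printf.sprintf "\"\\%02x\"" 0xff

let () =
  print_endline ocaml_style; (* prints "\255" *)
  print_endline wasm_style   (* prints "\ff" *)
```

A WebAssembly text parser would read "\255" as the byte 0x25 followed by the character 5, which is why the %S output does not round-trip.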

Modifications in src/ir/text.ml

I added two new pretty-printing functions, pp_name_inner and pp_name, that comply with the WebAssembly Text Format specification. Both are described under "Added Functions".

New Test Files in test/fmt/

To validate the fix, I added three new test files and updated an existing one:

  1. data_special_chars.wat: Test with special characters (\n, \t, \r, ", ', \)
  2. data_bytes.wat: Test with raw bytes and non-printable characters
  3. data_roundtrip.t: Round-trip test to verify format stability
  4. print.t (modified): Added test cases for the first two files above

These tests are automatically executed with dune runtest test/fmt.

Added Functions

1. pp_name_inner - Correct Escaping

let pp_name_inner fmt s =
  let pp_hex_char fmt c = pf fmt "\\%02x" (Char.code c) in
  let pp_char fmt = function
    | '\n' -> string fmt "\\n"
    | '\r' -> string fmt "\\r"
    | '\t' -> string fmt "\\t"
    | '\'' -> string fmt "\\'"
    | '"' -> string fmt "\\\""
    | '\\' -> string fmt "\\\\"
    | c ->
      let ci = Char.code c in
      if 0x20 <= ci && ci < 0x7f then char fmt c else pp_hex_char fmt c
  in
  let pp_unicode_char fmt = function
    | (0x09 | 0x0a) as c -> pp_char fmt (Char.chr c)
    | uc when 0x20 <= uc && uc < 0x7f -> pp_char fmt (Char.chr uc)
    | uc -> pf fmt "\\u{%02x}" uc
  in
  String.iter (fun c -> pp_unicode_char fmt (Char.code c)) s

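This function implements WebAssembly escaping rules. The inner pp_char helper handles:

  • Special characters: \n, \r, \t, \', \", \\ are explicitly escaped
  • Printable ASCII characters (0x20-0x7E): displayed as-is
  • Other characters: escaped in two-digit hexadecimal notation \xx

pp_unicode_char decides which path each byte takes: tab (0x09), newline (0x0a), and printable ASCII are forwarded to pp_char, and any other code is printed in \u{xx} notation. This is why a carriage return appears as \u{0d} in the expected output.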

2. pp_name - Wrapper with Quotes

let pp_name fmt s = pf fmt {|"%a"|} pp_name_inner s

Simple wrapper that surrounds the result of pp_name_inner with double quotes.

3. Modification to Data.pp

let pp fmt (d : t) =
  pf fmt {|(data%a %a %a)|} pp_id_opt d.id Mode.pp d.mode pp_name d.init

Change: %S is replaced by %a with pp_name, so that our custom escaping function is used.

Specification

The solution follows the WebAssembly Text Format specification. The implemented escaping rules exactly match the spec.

Inspiration

PR #391 (which attempted to solve this problem) referenced a PR in the official WebAssembly repo. This implementation aligns with those changes.

Non-regression Tests

All existing tests pass:

dune runtest test/fmt

Stability Property (Idempotence)

The round-trip test verifies the idempotence property:

format(format(x)) = format(x)
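
As a rough illustration of what this property checks, here is a minimal sketch. The helper is_stable and the two toy functions are hypothetical stand-ins; neither is owi's real formatter, which the actual test invokes through the owi fmt command.

```ocaml
(* [is_stable fmt x] holds when formatting the output a second time
   changes nothing, i.e. fmt (fmt x) = fmt x. *)
let is_stable (fmt : string -> string) (x : string) : bool =
  let once = fmt x in
  String.equal (fmt once) once

(* Toy stand-ins for a formatter (assumptions, not owi code). *)
let toy_format = String.trim   (* trimming twice equals trimming once *)
let not_stable s = s ^ " "     (* grows on every pass, so never stable *)

let () =
  assert (is_stable toy_format "  (module)  ");
  assert (not (is_stable not_stable "(module)"))
```

A formatter that escaped strings incorrectly would typically fail this kind of check, because the second pass would re-read different bytes than the first pass wrote.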

Case Analysis

I manually tested these cases:

| Input | Expected Output |
|---|---|
| \n | \n |
| \r | \u{0d} |
| \t | \t |
| " | \" |
| ' | \' |
| \ | \\ |
| Characters 0x20-0x7E | Identical |
| Other bytes (< 0x20 or > 0x7E) | \xx |

Modified and Created Files

Modified Files

  1. src/ir/text.ml: Added pp_name_inner and pp_name functions, modified Data.pp
  2. test/fmt/dune: Added new test files to dependencies
  3. test/fmt/print.t: Added tests for special characters and raw bytes

New Test Files

  1. test/fmt/data_special_chars.wat: Test with special characters
  2. test/fmt/data_bytes.wat: Test with raw bytes
  3. test/fmt/data_roundtrip.t: Round-trip test

Comment thread test/fmt/data_roundtrip.t Outdated
@@ -0,0 +1,8 @@
test data special chars round-trip:
$ owi fmt data_special_chars.wat > /tmp/owi_test_output.wat
Member

Putting the file in /tmp is likely going to cause various issues. Doing ... > ./owi_test_output.wat is fine (dune takes care of having the files in the right place).

Comment thread test/fmt/data_roundtrip.t Outdated
(memory 1)
(data (memory 0) (offset i32.const 0) "hello\n\t\u{0d}\"\'\\world")
)
$ rm /tmp/owi_test_output.wat
Member

no need to do this if you put it in the current directory

Comment thread src/ir/text.ml Outdated
| '\\' -> string fmt "\\\\"
| c ->
let ci = Char.code c in
if 0x20 <= ci && ci < 0x7f then char fmt c else pp_hex_char fmt c
Member

Can be rewritten as:

| '\x20' .. '\x7e' as c -> char fmt c
| c -> pp_hex_char fmt c 

Comment thread src/ir/text.ml Outdated
in
let pp_unicode_char fmt = function
| (0x09 | 0x0a) as c -> pp_char fmt (Char.chr c)
| uc when 0x20 <= uc && uc < 0x7f -> pp_char fmt (Char.chr uc)
Member

@redianthus Mar 22, 2026

Same here, you could use a character interval for the pattern (and it's likely not necessary to use integers for this function, using char should be enough)
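
As a rough illustration of both suggestions, the escaping could be written directly over char with interval patterns. This is a standalone sketch that uses only the standard library's Buffer instead of the project's Fmt combinators. The name escape_wasm_string and the choice to emit a two-digit hex escape for every remaining byte are assumptions, and this is not the code that ended up in the PR.

```ocaml
(* Hypothetical standalone helper; not the PR's final code. *)
let escape_wasm_string (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (function
      | '\t' -> Buffer.add_string buf "\\t"
      | '\n' -> Buffer.add_string buf "\\n"
      | '\r' -> Buffer.add_string buf "\\r"
      | '"' -> Buffer.add_string buf "\\\""
      | '\'' -> Buffer.add_string buf "\\'"
      | '\\' -> Buffer.add_string buf "\\\\"
      (* Printable ASCII, written as a character interval. *)
      | '\x20' .. '\x7e' as c -> Buffer.add_char buf c
      (* Assumption: every remaining byte becomes a \hh escape. *)
      | c -> Buffer.add_string buf (Printf.sprintf "\\%02x" (Char.code c)))
    s;
  Buffer.contents buf

let () = print_endline (escape_wasm_string "hi\n\x00\xff")
(* prints hi\n\00\ff *)
```

Matching on char keeps the special cases and the printable range in a single function, so no conversion through Char.code and Char.chr is needed.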

@RayanVSS
Author

Thanks for the review, I've made the suggested changes.
