Optimise `unescape` by andyundso · Pull Request #457 · ruby-rdf/rdf

andyundso · 2026-03-30T15:05:25Z

Can be considered a follow-up to #453.

I did some more profiling and noticed that a lot of time was still spent in `unescape`. The main problem is that the double `gsub` allocates two copies of the string, in addition to a potential third copy when the encoding is not UTF-8. This is solved by combining the two regexes into one. As only `UCHAR` contains capture groups, it is clear within the `gsub` block which kind of match is being replaced.
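To make the allocation argument concrete, here is a minimal sketch of the two shapes. The patterns and the escape table are simplified stand-ins (every name ending in `_SKETCH` is hypothetical), not the constants actually defined in `RDF::NTriples::Reader`.

```ruby
# Stand-in patterns for illustration only; the real UCHAR and
# ESCAPE_CHARS_ESCAPED_REGEXP in RDF::NTriples::Reader may differ.
UCHAR_SKETCH = /\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})/
ESCAPES_SKETCH = {
  '\\t' => "\t", '\\n' => "\n", '\\r' => "\r", '\\"' => '"', '\\\\' => '\\'
}.freeze
ESCAPES_REGEXP_SKETCH = Regexp.union(ESCAPES_SKETCH.keys)
COMBINED_SKETCH = Regexp.union(UCHAR_SKETCH, ESCAPES_REGEXP_SKETCH)

# Two passes: each gsub returns a new String, so an intermediate copy
# is built and then thrown away.
def two_pass(string)
  string
    .gsub(UCHAR_SKETCH) { [($1 || $2).hex].pack('U*') }
    .gsub(ESCAPES_REGEXP_SKETCH) { |match| ESCAPES_SKETCH[match] }
end

# One pass: only the UCHAR alternative has capture groups, so a non-nil
# $1 or $2 identifies a \uXXXX / \UXXXXXXXX match; anything else is a
# plain backslash escape looked up in the table.
def one_pass(string)
  string.gsub(COMBINED_SKETCH) do |match|
    hex = $1 || $2
    hex ? [hex.hex].pack('U*') : ESCAPES_SKETCH[match]
  end
end

input = 'D\u00FCrst\n' # single quotes keep the backslashes literal
puts one_pass(input) == two_pass(input)
```

For this input both methods return the same string; the difference is that `one_pass` builds a single result string while `two_pass` builds two.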

The second optimization is to add a `match?` check for an early return. This means the `UNESCAPE_COMBINED` regex is executed twice when there is a match, but since most parsed strings won't contain any escape sequences, the early return is a net benefit for most parsing operations.
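The effect of the guard can be seen in isolation with a small sketch. `GUARD_SKETCH`, `SIMPLE_ESCAPES_SKETCH` and `unescape_guarded` are hypothetical names with a simplified pattern, assumed here only for illustration; they are not the code in this PR.

```ruby
# Hypothetical, simplified stand-in for UNESCAPE_COMBINED.
GUARD_SKETCH = /\\u([0-9A-Fa-f]{4})|\\([tnr"\\])/
SIMPLE_ESCAPES_SKETCH = {
  't' => "\t", 'n' => "\n", 'r' => "\r", '"' => '"', '\\' => '\\'
}.freeze

def unescape_guarded(string)
  # String#match? returns a boolean without building a MatchData object
  # or setting $~, so the common "nothing to unescape" case stays cheap.
  return string unless string.match?(GUARD_SKETCH)

  string.gsub(GUARD_SKETCH) do
    $1 ? [$1.hex].pack('U*') : SIMPLE_ESCAPES_SKETCH[$2]
  end
end

plain = 'Hello world!'

# With the guard, the very same object comes back: no new String.
puts unescape_guarded(plain).equal?(plain)

# Without the guard, gsub returns a fresh copy even when nothing matches.
puts plain.gsub(GUARD_SKETCH) { '' }.equal?(plain)
```

The cost is that strings which do contain an escape are scanned twice, once by `match?` and once by `gsub`.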

Benchmark script:

$:.unshift(File.expand_path(File.join(File.dirname(__FILE__), 'lib')))
require 'benchmark/ips'
require 'rdf'

Benchmark.ips do |x|
  x.report('without') do
    RDF::NTriples::Reader.unescape("D\u00FCrst")
    RDF::NTriples::Reader.unescape("Hello world!")
  end

  if ENV['WITH_MODULE'] == 'true'
    module RDF::NTriples
      class Reader
        UNESCAPE_COMBINED = Regexp.union(UCHAR, ESCAPE_CHARS_ESCAPED_REGEXP).freeze

        def self.unescape(string)
          # Note: avoiding copying the input string when no escaping is needed
          # greatly reduces the number of allocations and the processing time.
          string = string.dup.force_encoding(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8

          # Early return when nothing to unescape: avoids string allocation entirely.
          return string unless string.match?(UNESCAPE_COMBINED)

          # Single pass handles both \uXXXX/\UXXXXXXXX and backslash escape chars.
          string.gsub(UNESCAPE_COMBINED) do |match|
            ($1 || $2) ? [($1 || $2).hex].pack('U*') : ESCAPE_CHARS_ESCAPED[match]
          end
        end
      end
    end
  end

  x.report('with') do
    RDF::NTriples::Reader.unescape("D\u00FCrst")
    RDF::NTriples::Reader.unescape("Hello world!")
  end
  x.hold! 'temp_results'
  x.compare!
end

Results:

ruby 4.0.2 (2026-03-17 revision d3da9fec82) +PRISM [x86_64-linux]
Warming up --------------------------------------
                with   363.408k i/100ms
Calculating -------------------------------------
                with      3.654M (± 3.6%) i/s  (273.68 ns/i) -     18.534M in   5.081023s

Comparison:
                with:  3653882.0 i/s
             without:  1521371.4 i/s - 2.40x  slower

coveralls · 2026-03-30T15:08:51Z

coverage: 91.805% (+0.003%) from 91.802%
when pulling 130e396 on andyundso:optimise-unescape
into d6dd27d on ruby-rdf:develop.

Optimise unescape

130e396
