diff --git a/docs/embedding-extending.md b/docs/embedding-extending.md index 97a3199b..bd17b853 100644 --- a/docs/embedding-extending.md +++ b/docs/embedding-extending.md @@ -6,10 +6,12 @@ sidebar_label: Embedding and Extending JSONata ## API -### jsonata(str) +### jsonata(str[, options]) Parse a string `str` as a JSONata expression and return a compiled JSONata expression object. +`options`, if present, is used to control certain aspects of the evaluator, and can be used to protect the server from expressions that take longer to execute than expected. See [Configuring Guardrails](guardrails) for more details. + ```javascript var expression = jsonata("$sum(example.value)"); ``` diff --git a/docs/guardrails.md b/docs/guardrails.md new file mode 100644 index 00000000..d4dd478f --- /dev/null +++ b/docs/guardrails.md @@ -0,0 +1,159 @@ +--- +id: guardrails +title: Configuring Guardrails +sidebar_label: Configuring Guardrails +--- + +## Guardrails + +This page contains information relating to the JavaScript [reference implementation](https://github.com/jsonata-js/jsonata) of JSONata, and not the JSONata expression language itself. + +JSONata is a Turing-complete expression language, and as such, it is possible to write unbounded or infinite loops. This can be a potential problem if an application using JSONata is exposing the ability for client users to input expressions that are evaluated on the server. A user could accidentally or maliciously provide an expression that, if evaluated unchecked, could cause a denial-of-service situation. + +This JSONata library provides a set of configurable 'guardrails' that limit the compute and memory resources that a single expression can consume. If this library is being used in a hosted environment to allow end users to provide their own expressions, then it would be prudent to set constraints. The following sections describe each of the guardrails and how to configure them. This page does not provide recommended values or defaults.
+ +### Stack overflow + +In common with other functional languages, JSONata supports looping by writing [recursive functions](https://en.wikipedia.org/wiki/Functional_programming#Recursion). The JSONata evaluator processes an expression using a set of mutually recursive functions (eval-apply cycle). When a function is invoked (by itself or by another function), the call stack in the host JavaScript runtime will grow. If this stack grows too deep, the evaluator could exhaust the memory of the host process, causing it to crash. + +The JSONata evaluator can be configured with a maximum stack[^stack] limit to prevent an expression from doing this by specifying the `stack` option. Error `D1011` will be thrown if the expression grows the stack beyond the specified limit. + +```javascript +const jsonata = require('jsonata'); + +const data = {JSON: data}; +const options = { + stack: 500 +}; + +(async () => { + const expression = jsonata('', options); + const result = await expression.evaluate(data); +})() +``` + + +As an example, the [Ackermann function](https://en.wikipedia.org/wiki/Ackermann_function) could be implemented in JSONata using: + +``` +( + $ack := function($m, $n) { + $m = 0 ? $n + 1 : + $n = 0 ? $ack($m - 1, 1) : + $ack($m - 1, $ack($m, $n - 1)) + }; + + $ack(3, 4) +) +``` + +Invoking `$ack(3, 4)` quickly evaluates to `125`. However, `$ack(4, 3)`, although theoretically computable, will readily hit the configured stack guardrail before causing any problems to the host server. + +[^stack]: The term 'stack' is a slight misnomer here; it actually limits the number of times round the eval-apply cycle, which is related to the JavaScript stack depth. + +### Excessive execution time + +It's possible (and desirable) to write [tail recursive](programming#tail-call-optimization-tail-recursion) functions that don't grow the stack at all. For these types of functions, a [stack guardrail](#stack-overflow) would not be sufficient to protect against unbounded loops.
+ +The JSONata evaluator can be configured with a maximum time limit to protect against runaway expressions by specifying the `timeout` option. Error `D1012` will be thrown if the expression runs for longer than the specified timeout (in milliseconds). + +It's good practice to specify both `stack` and `timeout`. + +```javascript +const jsonata = require('jsonata'); + +const data = {JSON: data}; +const options = { + stack: 500, + timeout: 1000 // in milliseconds +}; + +(async () => { + const expression = jsonata('', options); + const result = await expression.evaluate(data); +})() +``` + +As an example, an infinite loop could be written in JSONata: + +``` +( + $inf := function() { + $inf() + }; + + $inf() +) +``` + +This is tail recursive, and would run forever without the timeout guardrail. + +### Excessive sequence length + +It's possible to write expressions that result in excessively long result sequences. This could ultimately lead to memory exhaustion in the host server. The `sequence` option can be set to specify the maximum sequence length that can be created by an expression, including any intermediate sequences created by sub-expressions. Error `D2015` will be thrown if, during the evaluation of an expression, the evaluator attempts to generate a sequence exceeding this upper limit. + + +```javascript +const jsonata = require('jsonata'); + +const data = {JSON: data}; +const options = { + sequence: 1e6 // maximum of one million items in a sequence +}; + +(async () => { + const expression = jsonata('', options); + const result = await expression.evaluate(data); +})() +``` + +As an example, the following JSONata expression attempts to generate a sequence of 100 million numbers. The guardrail configured above would prevent this. + +``` +[1..10000].([1..10000]) +``` + +### Rogue regular expressions + +A number of functions use [regular expressions](regex) to process strings. 
Alongside the power and flexibility that regexes provide, there are situations whereby badly crafted or malicious expressions could cause the processing engine to take an [excessive amount of time](https://en.wikipedia.org/wiki/ReDoS) (exponential in the input string length). Since the regex processing is not implemented in the core JSONata (eval-apply) evaluator, the `timeout` guardrail cannot protect against this. + +It is possible to specify which regex processor is invoked by the JSONata evaluator. This is configured using the `RegexEngine` option. When this is not set, the evaluator will use the default JavaScript [RegExp](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp) class. + +The [packaged version of JSONata](https://www.npmjs.com/package/jsonata) has no runtime dependencies on other packages, but it is possible to use the `RegexEngine` option to invoke a third-party ReDoS detection library whenever a regular expression is encountered in a JSONata expression. + +The following code shows how this is done using the [redos-detector](https://github.com/tjenkinson/redos-detector) module: + +```javascript +const jsonata = require('jsonata'); +const redos = require('redos-detector'); + +// Simple wrapper that invokes redos-detector before delegating +// to built-in RegExp class +const SafeRegExp = function(regex) { + if (!redos.isSafe(regex).safe) { + throw { + code: 'U1001', + stack: (new Error()).stack, + value: regex, + message: 'Rejecting regex (potential ReDoS): ' + regex + }; + } + this.regex = regex; + }; + +SafeRegExp.prototype.exec = function(str) { + return this.regex.exec(str); +} + +const data = {JSON: data}; +const options = { + RegexEngine: SafeRegExp +}; + +(async () => { + const expression = jsonata('', options); + const result = await expression.evaluate(data); +})() +``` + +Other similar libraries are available. This is not an endorsement of any particular one.
The developer should choose one according to their requirements. diff --git a/src/functions.js b/src/functions.js index 2c6ae16a..a3dbbe36 100644 --- a/src/functions.js +++ b/src/functions.js @@ -12,7 +12,6 @@ const functions = (() => { var isNumeric = utils.isNumeric; var isArrayOfStrings = utils.isArrayOfStrings; var isArrayOfNumbers = utils.isArrayOfNumbers; - var createSequence = utils.createSequence; var isSequence = utils.isSequence; var isFunction = utils.isFunction; var isLambda = utils.isLambda; @@ -382,7 +381,7 @@ const functions = (() => { }; } - var result = createSequence(); + var result = this.createSequence(); if (typeof limit === 'undefined' || limit > 0) { var count = 0; @@ -1490,7 +1489,7 @@ const functions = (() => { return undefined; } - var result = createSequence(); + var result = this.createSequence(); // do the map - iterate over the arrays, and invoke func for (var i = 0; i < arr.length; i++) { var func_args = hofFuncArgs(func, arr[i], i, arr); @@ -1516,7 +1515,7 @@ const functions = (() => { return undefined; } - var result = createSequence(); + var result = this.createSequence(); for (var i = 0; i < arr.length; i++) { var entry = arr[i]; @@ -1659,18 +1658,18 @@ const functions = (() => { * @returns {Array} Array of keys */ function keys(arg) { - var result = createSequence(); + var result = this.createSequence(); if (Array.isArray(arg)) { // merge the keys of all of the items in the array var merge = {}; - arg.forEach(function (item) { - var allkeys = keys(item); + for(var ii = 0; ii < arg.length; ii++) { + var allkeys = keys.call(this, arg[ii]); allkeys.forEach(function (key) { merge[key] = true; }); - }); - result = keys(merge); + } + result = keys.call(this, merge); } else if (arg !== null && typeof arg === 'object' && !isFunction(arg)) { Object.keys(arg).forEach(key => result.push(key)); } @@ -1687,9 +1686,9 @@ const functions = (() => { // lookup the 'name' item in the input var result; if (Array.isArray(input)) { - result = 
createSequence(); + result = this.createSequence(); for(var ii = 0; ii < input.length; ii++) { - var res = lookup(input[ii], key); + var res = lookup.call(this, input[ii], key); if (typeof res !== 'undefined') { if (Array.isArray(res)) { res.forEach(val => result.push(val)); @@ -1720,7 +1719,7 @@ const functions = (() => { } // if either argument is not an array, make it so if (!Array.isArray(arg1)) { - arg1 = createSequence(arg1); + arg1 = this.createSequence(arg1); } if (!Array.isArray(arg2)) { arg2 = [arg2]; @@ -1747,13 +1746,13 @@ const functions = (() => { * @returns {*} - the array */ function spread(arg) { - var result = createSequence(); + var result = this.createSequence(); if (Array.isArray(arg)) { // spread all of the items in the array - arg.forEach(function (item) { - result = append(result, spread(item)); - }); + for(var ii = 0; ii < arg.length; ii++) { + result = append.call(this, result, spread.call(this, arg[ii])); + } } else if (arg !== null && typeof arg === 'object' && !isLambda(arg)) { for (var key in arg) { var obj = {}; @@ -1819,7 +1818,7 @@ const functions = (() => { * @returns {Array} - the resultant array */ async function each(obj, func) { - var result = createSequence(); + var result = this.createSequence(); for (var key in obj) { var func_args = hofFuncArgs(func, obj[key], key, obj); @@ -2020,7 +2019,7 @@ const functions = (() => { return arr; } - var results = isSequence(arr) ? createSequence() : []; + var results = isSequence(arr) ? 
this.createSequence() : []; for(var ii = 0; ii < arr.length; ii++) { var value = arr[ii]; diff --git a/src/jsonata.js b/src/jsonata.js index c21eb1e2..4e54a8a4 100644 --- a/src/jsonata.js +++ b/src/jsonata.js @@ -27,7 +27,6 @@ var jsonata = (function() { var isNumeric = utils.isNumeric; var isArrayOfStrings = utils.isArrayOfStrings; var isArrayOfNumbers = utils.isArrayOfNumbers; - var createSequence = utils.createSequence; var isSequence = utils.isSequence; var isFunction = utils.isFunction; var isLambda = utils.isLambda; @@ -50,6 +49,8 @@ var jsonata = (function() { async function evaluate(expr, input, environment) { var result; + environment.base.depth++; + environment.base.guardrails(); var entryCallback = environment.lookup(Symbol.for('jsonata.__evaluate_entry')); if(entryCallback) { await entryCallback(expr, input, environment); @@ -141,6 +142,8 @@ var jsonata = (function() { } + environment.base.depth--; + return result; } @@ -160,7 +163,7 @@ var jsonata = (function() { inputSequence = input; } else { // if input is not an array, make it so - inputSequence = createSequence(input); + inputSequence = environment.base.createSequence(input); } var resultSequence; @@ -201,7 +204,7 @@ var jsonata = (function() { // tuple stream is carrying ancestry information - keep this resultSequence = tupleBindings; } else { - resultSequence = createSequence(); + resultSequence = environment.base.createSequence(); for (ii = 0; ii < tupleBindings.length; ii++) { resultSequence.push(tupleBindings[ii]['@']); } @@ -211,7 +214,7 @@ var jsonata = (function() { if(expr.keepSingletonArray) { // if the array is explicitly constructed in the expression and marked to promote singleton sequences to array if(Array.isArray(resultSequence) && resultSequence.cons && !resultSequence.sequence) { - resultSequence = createSequence(resultSequence); + resultSequence = environment.base.createSequence(resultSequence); } resultSequence.keepSingleton = true; } @@ -249,7 +252,7 @@ var jsonata = 
(function() { return result; } - result = createSequence(); + result = environment.base.createSequence(); for(var ii = 0; ii < input.length; ii++) { var res = await evaluate(expr, input[ii], environment); @@ -263,7 +266,7 @@ var jsonata = (function() { } } - var resultSequence = createSequence(); + var resultSequence = environment.base.createSequence(); if(lastStep && result.length === 1 && Array.isArray(result[0]) && !isSequence(result[0])) { resultSequence = result[0]; } else { @@ -316,7 +319,7 @@ var jsonata = (function() { result = await evaluateSortExpression(expr, tupleBindings, environment); } else { var sorted = await evaluateSortExpression(expr, input, environment); - result = createSequence(); + result = environment.base.createSequence(); result.tupleStream = true; for(var ss = 0; ss < sorted.length; ss++) { var tuple = {'@': sorted[ss]}; @@ -330,7 +333,7 @@ var jsonata = (function() { return result; } - result = createSequence(); + result = environment.base.createSequence(); result.tupleStream = true; var stepEnv = environment; if(tupleBindings === undefined) { @@ -384,12 +387,12 @@ var jsonata = (function() { * @returns {*} Result after applying predicates */ async function evaluateFilter(predicate, input, environment) { - var results = createSequence(); + var results = environment.base.createSequence(); if( input && input.tupleStream) { results.tupleStream = true; } if (!Array.isArray(input)) { - input = createSequence(input); + input = environment.base.createSequence(input); } if (predicate.type === 'number') { var index = Math.floor(predicate.value); // round it down @@ -486,7 +489,7 @@ var jsonata = (function() { result = evaluateStringConcat(lhs, rhs); break; case '..': - result = evaluateRangeExpression(lhs, rhs); + result = evaluateRangeExpression(lhs, rhs, environment); break; case 'in': result = evaluateIncludesExpression(lhs, rhs); @@ -510,6 +513,9 @@ var jsonata = (function() { async function evaluateUnary(expr, input, environment) { var 
result; + var focus = { + createSequence: environment.base.createSequence + }; switch (expr.value) { case '-': result = await evaluate(expr.expression, input, environment); @@ -541,7 +547,7 @@ var jsonata = (function() { if(item.value === '[') { result.push(value); } else { - result = fn.append(result, value); + result = fn.append.call(focus, result, value); } } } @@ -571,7 +577,10 @@ var jsonata = (function() { */ function evaluateName(expr, input, environment) { // lookup the 'name' item in the input - return fn.lookup(input, expr.value); + var focus = { + createSequence: environment.base.createSequence + }; + return fn.lookup.call(focus, input, expr.value); } /** @@ -589,8 +598,11 @@ var jsonata = (function() { * @param {Object} input - Input data to evaluate against * @returns {*} Evaluated input data */ - function evaluateWildcard(expr, input) { - var results = createSequence(); + function evaluateWildcard(expr, input, environment) { + var focus = { + createSequence: environment.base.createSequence + }; + var results = focus.createSequence(); if (Array.isArray(input) && input.outerWrapper && input.length > 0) { input = input[0]; } @@ -599,7 +611,7 @@ var jsonata = (function() { var value = input[key]; if(Array.isArray(value)) { value = flatten(value); - results = fn.append(results, value); + results = fn.append.call(focus, results, value); } else { results.push(value); } @@ -636,9 +648,9 @@ var jsonata = (function() { * @param {Object} input - Input data to evaluate against * @returns {*} Evaluated input data */ - function evaluateDescendants(expr, input) { + function evaluateDescendants(expr, input, environment) { var result; - var resultSequence = createSequence(); + var resultSequence = environment.base.createSequence(); if (typeof input !== 'undefined') { // traverse all descendants of this object/array recurseDescendants(input, resultSequence); @@ -900,9 +912,12 @@ var jsonata = (function() { var result = {}; var groups = {}; var reduce = input && 
input.tupleStream ? true : false; + var focus = { + createSequence: environment.base.createSequence + }; // group the input sequence by 'key' expression if (!Array.isArray(input)) { - input = createSequence(input); + input = focus.createSequence(input); } // if the array is empty, add an undefined entry to enable literal JSON object to be generated if (input.length === 0) { @@ -941,7 +956,7 @@ var jsonata = (function() { } // append it as an array - groups[key].data = fn.append(groups[key].data, item); + groups[key].data = fn.append.call(focus, groups[key].data, item); } else { groups[key] = entry; } @@ -955,7 +970,7 @@ var jsonata = (function() { var context = entry.data; var env = environment; if (reduce) { - var tuple = reduceTupleStream(entry.data); + var tuple = reduceTupleStream(entry.data, environment); context = tuple['@']; delete tuple['@']; env = createFrameFromTuple(environment, tuple); @@ -974,15 +989,18 @@ var jsonata = (function() { return result; } - function reduceTupleStream(tupleStream) { + function reduceTupleStream(tupleStream, environment) { if(!Array.isArray(tupleStream)) { return tupleStream; } var result = {}; + var focus = { + createSequence: environment.base.createSequence + }; Object.assign(result, tupleStream[0]); for(var ii = 1; ii < tupleStream.length; ii++) { for(const prop in tupleStream[ii]) { - result[prop] = fn.append(result[prop], tupleStream[ii][prop]); + result[prop] = fn.append.call(focus, result[prop], tupleStream[ii][prop]); } } return result; @@ -994,7 +1012,7 @@ var jsonata = (function() { * @param {Object} rhs - RHS value * @returns {Array} Resultant array */ - function evaluateRangeExpression(lhs, rhs) { + function evaluateRangeExpression(lhs, rhs, environment) { var result; if (typeof lhs !== 'undefined' && !Number.isInteger(lhs)) { @@ -1033,6 +1051,13 @@ var jsonata = (function() { value: size }; } + if(environment.base.options && size > environment.base.options.sequence) { + throw { + code: "D2015", + stack: (new 
Error()).stack, + value: size + }; + } result = new Array(size); for (var item = lhs, index = 0; item <= rhs; item++, index++) { @@ -1101,8 +1126,8 @@ var jsonata = (function() { * @param {Object} expr - expression containing regex * @returns {Function} Higher order function representing prepared regex */ - function evaluateRegex(expr) { - var re = new jsonata.RegexEngine(expr.value); + function evaluateRegex(expr, input, environment) { + var re = new environment.base.RegexEngine(expr.value); var closure = function(str, fromIndex) { var result; re.lastIndex = fromIndex || 0; @@ -1365,7 +1390,6 @@ var jsonata = (function() { async function evaluateApplyExpression(expr, input, environment) { var result; - var lhs = await evaluate(expr.lhs, input, environment); if(expr.rhs.type === 'function') { // this is a function _invocation_; invoke it with lhs expression as the first argument @@ -1514,7 +1538,8 @@ var jsonata = (function() { } else if (proc && proc._jsonata_function === true) { var focus = { environment: environment, - input: input + input: input, + createSequence: environment.base.createSequence }; // the `focus` is passed in as the `this` for the invoked function result = proc.implementation.apply(focus, validatedArgs); @@ -1611,11 +1636,11 @@ var jsonata = (function() { }; } if (isLambda(proc)) { - result = partialApplyProcedure(proc, evaluatedArgs); + result = partialApplyProcedure(proc, evaluatedArgs, environment); } else if (proc && proc._jsonata_function === true) { - result = partialApplyNativeFunction(proc.implementation, evaluatedArgs); + result = partialApplyNativeFunction(proc.implementation, evaluatedArgs, environment); } else if (typeof proc === 'function') { - result = partialApplyNativeFunction(proc, evaluatedArgs); + result = partialApplyNativeFunction(proc, evaluatedArgs, environment); } else { throw { code: "T1008", @@ -1670,9 +1695,9 @@ var jsonata = (function() { * @param {Array} args - Arguments * @returns {{lambda: boolean, input: *, 
environment: {bind, lookup}, arguments: Array, body: *}} Result of partially applied procedure */ - function partialApplyProcedure(proc, args) { + function partialApplyProcedure(proc, args, environment) { // create a closure, bind the supplied parameters and return a function that takes the remaining (?) parameters - var env = createFrame(proc.environment); + var env = createFrame(proc.environment || environment); var unboundArgs = []; proc.arguments.forEach(function (param, index) { var arg = args[index]; @@ -1698,7 +1723,7 @@ var jsonata = (function() { * @param {Array} args - Arguments * @returns {{lambda: boolean, input: *, environment: {bind, lookup}, arguments: Array, body: *}} Result of partially applying native function */ - function partialApplyNativeFunction(native, args) { + function partialApplyNativeFunction(native, args, environment) { // create a lambda function that wraps and invokes the native function // get the list of declared arguments from the native function // this has to be picked out from the toString() value @@ -1711,7 +1736,7 @@ var jsonata = (function() { var bodyAST = parser(body); bodyAST.body = native; - var partial = partialApplyProcedure(bodyAST, args); + var partial = partialApplyProcedure(bodyAST, args, environment); return partial; } @@ -1729,7 +1754,8 @@ var jsonata = (function() { }); var focus = { - environment: env + environment: env, + createSequence: env.base.createSequence }; var result = proc.apply(focus, args); if (isPromise(result)) { @@ -1783,7 +1809,7 @@ var jsonata = (function() { input = focus; // if the input is a JSON array, then wrap it in a singleton sequence so it gets treated as a single input if(Array.isArray(input) && !isSequence(input)) { - input = createSequence(input); + input = this.createSequence(input); input.outerWrapper = true; } } @@ -1863,6 +1889,7 @@ var jsonata = (function() { if(framePushCallback) { framePushCallback(enclosingEnvironment, newFrame); } + newFrame.base = 
enclosingEnvironment.base; } @@ -1991,6 +2018,8 @@ var jsonata = (function() { "D1009": "Multiple key definitions evaluate to same key: {{value}}", "D1010": "Attempted to access the Javascript object prototype", // Javascript specific "T1010": "The matcher function argument passed to function {{token}} does not return the correct object structure", + "D1011": "Stack overflow. Check for non-terminating recursive function. Consider rewriting as tail-recursive", + "D1012": "Evaluation timeout after {{value}} milliseconds. Check for infinite loop", "T2001": "The left side of the {{token}} operator must evaluate to a number", "T2002": "The right side of the {{token}} operator must evaluate to a number", "T2003": "The left side of the range operator (..) must evaluate to an integer", @@ -2004,7 +2033,8 @@ var jsonata = (function() { "T2011": "The insert/update clause of the transform expression must evaluate to an object: {{value}}", "T2012": "The delete clause of the transform expression must evaluate to a string or array of strings: {{value}}", "T2013": "The transform expression clones the input object using the $clone() function. This has been overridden in the current scope by a non-function.", - "D2014": "The size of the sequence allocated by the range operator (..) must not exceed 1e6. Attempted to allocate {{value}}.", + "D2014": "The size of the sequence allocated by the range operator (..) must not exceed 1e7. 
Attempted to allocate {{value}}.", + "D2015": "The maximum sequence length of {{value}} was exceeded.", "D3001": "Attempting to invoke string function on Infinity or NaN", "D3010": "Second argument of replace function cannot be an empty string", "D3011": "Fourth argument of replace function must evaluate to a positive number", @@ -2078,6 +2108,9 @@ var jsonata = (function() { * @param {Object} options * @param {boolean} options.recover: attempt to recover on parse error * @param {Function} options.RegexEngine: RegEx class constructor to use + * @param {Integer} options.timeout: evaluation timeout + * @param {Integer} options.stack: max stack depth + * @param {Integer} options.sequence: max sequence length * @returns {{evaluate: evaluate, assign: assign}} Evaluated expression */ function jsonata(expr, options) { @@ -2102,12 +2135,6 @@ var jsonata = (function() { return timestamp.getTime(); }, '<:n>')); - if(options && options.RegexEngine) { - jsonata.RegexEngine = options.RegexEngine; - } else { - jsonata.RegexEngine = RegExp; - } - return { evaluate: async function (input, bindings, callback) { // throw if the expression compiled with syntax errors @@ -2137,13 +2164,69 @@ var jsonata = (function() { // the $now() and $millis() functions will return this value - whenever it is called timestamp = new Date(); exec_env.timestamp = timestamp; + exec_env.options = options; + + exec_env.createSequence = function() { + var sequence = []; + if (options && options.sequence) { + sequence.push = function(...items) { + if(sequence.length + items.length > options.sequence) { + throw { + code: "D2015", + stack: (new Error()).stack, + value: options.sequence + }; + } + return Array.prototype.push.apply(sequence, items); + }; + } + sequence.sequence = true; + if (arguments.length === 1) { + sequence.push(arguments[0]); + } + return sequence; + } + // if the input is a JSON array, then wrap it in a singleton sequence so it gets treated as a single input if(Array.isArray(input) && 
!isSequence(input)) { - input = createSequence(input); + input = exec_env.createSequence(input); input.outerWrapper = true; } + if (options && (options.timeout || options.stack)) { + const time = Date.now(); + exec_env.guardrails = function() { + if (options.stack > 0 && exec_env.depth > options.stack) { + // stack too deep + throw { + code: 'D1011', + value: options.stack, + stack: (new Error()).stack + }; + } + if (options.timeout > 0 && Date.now() - time > options.timeout) { + // expression has run for too long + throw { + code: 'D1012', + value: options.timeout, + stack: (new Error()).stack + }; + } + + } + } else { + exec_env.guardrails = function() {}; + } + exec_env.base = exec_env; + exec_env.depth = 0; + + if(options && options.RegexEngine) { + exec_env.RegexEngine = options.RegexEngine; + } else { + exec_env.RegexEngine = RegExp; + } + var it; try { it = await evaluate(ast, input, exec_env); diff --git a/src/utils.js b/src/utils.js index 39d824b6..15cb9792 100644 --- a/src/utils.js +++ b/src/utils.js @@ -54,19 +54,6 @@ const utils = (() => { return result; } - /** - * Create an empty sequence to contain query results - * @returns {Array} - empty sequence - */ - function createSequence() { - var sequence = []; - sequence.sequence = true; - if (arguments.length === 1) { - sequence.push(arguments[0]); - } - return sequence; - } - /** * Tests if a value is a sequence * @param {*} value the value to test @@ -204,7 +191,6 @@ const utils = (() => { isNumeric, isArrayOfStrings, isArrayOfNumbers, - createSequence, isSequence, isFunction, isLambda, diff --git a/test/implementation-tests.js b/test/implementation-tests.js index 784d9b67..3c2f292f 100644 --- a/test/implementation-tests.js +++ b/test/implementation-tests.js @@ -1001,12 +1001,15 @@ describe("Test that yield platform specific results", () => { describe("Tests that include infinite recursion", () => { describe("stack overflow - infinite recursive function - non-tail call", function() { it("should throw 
error", function() { - var expr = jsonata("(" + " $inf := function($n){$n+$inf($n-1)};" + " $inf(5)" + ")"); - timeboxExpression(expr, 1000, 300); + const options = { + 'timeout': 1000, + 'stack': 300 + } + const expr = jsonata("($inf := function($n){$n+$inf($n-1)}; $inf(5))", options); expect(expr.evaluate()).to.eventually.be.rejected.to.deep.contain({ token: "inf", - position: 32, - code: "U1001", + position: 30, + code: "D1011", }); }); }); @@ -1014,11 +1017,96 @@ describe("Tests that include infinite recursion", () => { describe("stack overflow - infinite recursive function - tail call", function() { this.timeout(5000); it("should throw error", function() { - var expr = jsonata("( $inf := function(){$inf()}; $inf())"); - timeboxExpression(expr, 1000, 500); + const options = { + 'timeout': 1000, + 'stack': 500 + } + const expr = jsonata("( $inf := function(){$inf()}; $inf())", options); expect(expr.evaluate()).to.eventually.be.rejected.to.deep.contain({ token: "inf", - code: "U1001", + code: "D1012", + }); + }); + }); + + describe("stack overflow - infinite recursive function - tail call (no stack guardrail)", function() { + this.timeout(5000); + it("should throw error", function() { + const options = { + 'timeout': 1000 + } + const expr = jsonata("( $inf := function(){$inf()}; $inf())", options); + expect(expr.evaluate()).to.eventually.be.rejected.to.deep.contain({ + token: "inf", + code: "D1012", + }); + }); + }); + + describe("guardrails on Ackermann function", function() { + this.timeout(5000); + const ackermann = (m, n) => ` + ( + $ack := function($m, $n) { + $m = 0 ? $n + 1 : + $n = 0 ? 
$ack($m - 1, 1) : + $ack($m - 1, $ack($m, $n - 1)) + }; + + $ack(${m}, ${n}) + )`; + + it("should complete for small parameters", async function() { + const options = { + 'timeout': 1000, + 'stack': 500 + } + const expr = jsonata(ackermann(3, 4), options); + const result = await expr.evaluate(); + expect(result).to.equal(125); + }); + + it("larger inputs cause stack overflow", function() { + const options = { + 'stack': 500 + } + const expr = jsonata(ackermann(4, 4), options); + expect(expr.evaluate()).to.eventually.be.rejected.to.deep.contain({ + token: "ack", + code: "D1011", + }); + }); + + it("larger inputs cause stack overflow", function() { + const options = { + 'stack': 500 + } + const expr = jsonata(ackermann(4, 4), options); + expect(expr.evaluate()).to.eventually.be.rejected.to.deep.contain({ + token: "ack", + code: "D1011", + }); + }); + }); + + describe("guardrails on sequence length", function() { + it("prevents large ranges", function() { + const options = { + 'sequence': 1000 + } + const expr = jsonata('[0..1001]', options); + expect(expr.evaluate()).to.eventually.be.rejected.to.deep.contain({ + code: "D2015", + }); + }); + + it("prevents large intermediate sequences", function() { + const options = { + 'sequence': 1000 + } + const expr = jsonata('[0..100].([0..100]) ~> count()', options); + expect(expr.evaluate()).to.eventually.be.rejected.to.deep.contain({ + code: "D2015", }); }); }); @@ -1038,46 +1126,3 @@ describe("Tests that use internal frame push callbacks", () => { }); }); }); - -/** - * Protect the process/browser from a runnaway expression - * i.e. 
Infinite loop (tail recursion), or excessive stack growth - * - * @param {Object} expr - expression to protect - * @param {Number} timeout - max time in ms - * @param {Number} maxDepth - max stack depth - */ -function timeboxExpression(expr, timeout, maxDepth) { - var depth = 0; - var time = Date.now(); - - var checkRunnaway = function() { - if (depth > maxDepth) { - // stack too deep - throw { - message: - "Stack overflow error: Check for non-terminating recursive function. Consider rewriting as tail-recursive.", - stack: new Error().stack, - code: "U1001" - }; - } - if (Date.now() - time > timeout) { - // expression has run for too long - throw { - message: "Expression evaluation timeout: Check for infinite loop", - stack: new Error().stack, - code: "U1001" - }; - } - }; - - // register callbacks - expr.assign(Symbol.for('jsonata.__evaluate_entry'), function() { - depth++; - checkRunnaway(); - }); - expr.assign(Symbol.for('jsonata.__evaluate_exit'), function() { - depth--; - checkRunnaway(); - }); -} diff --git a/test/parser-pluggable-regex.js b/test/parser-pluggable-regex.js index efea9b04..390a71ea 100644 --- a/test/parser-pluggable-regex.js +++ b/test/parser-pluggable-regex.js @@ -5,6 +5,7 @@ var assert = require('assert'); var chai = require("chai"); var chaiAsPromised = require("chai-as-promised"); chai.use(chaiAsPromised); +var expect = chai.expect; describe('Invoke parser with custom RegexEngine param', function() { @@ -27,3 +28,45 @@ describe('Invoke parser with custom RegexEngine param', function() { assert.deepEqual(regexEvalSpy, "foo"); }); }); + +describe('Detect and repel ReDoS attack', function() { + const SafeRegExp = function(regex) { + // Perform static analysis on the regex before it's used. 
+ // Trivial check for test purposes - use a suitable ReDoS library + if (regex.toString().includes('(a+)+')) { + throw { + code: 'U1001', + stack: (new Error()).stack, + value: regex, + message: 'Rejecting regex (potential ReDoS): ' + regex + }; + } + this.regex = regex; + }; + + SafeRegExp.prototype.exec = function(str) { + return this.regex.exec(str); + } + + it('should successfully process a safe regex', async function() { + const safeExpr = jsonata('$contains(data, /^a+$/)', {RegexEngine: SafeRegExp}); + const data = {data: 'aaaaaaaaaaaaaaa'}; + const result = await safeExpr.evaluate(data); + expect(result).to.be.true; + }) + + it('should behave like the built-in regex processor (copied from regex test case)', async function() { + const expr = jsonata('$match("ababbabbcc",/a(b+)/, 1)'); + const result = await expr.evaluate(); + const expected = { match: "ab", index: 0, groups: ["b"] }; + expect(result).to.deep.equal(expected); + }) + + it('should reject a potential ReDoS attack', async function() { + const safeExpr = jsonata('$contains(data, /^(a+)+$/)', {RegexEngine: SafeRegExp}); + const data = {data: 'aaaaaaaaaaaaaaa'}; + expect(safeExpr.evaluate(data)).to.eventually.be.rejected.to.deep.contain({ + code: 'U1001', + }); + }) +})
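
The guardrail mechanics in the patch above can be tried in isolation, without loading JSONata. The following is a minimal sketch modelled on `exec_env.createSequence` and `exec_env.guardrails` from the `src/jsonata.js` hunk. The names `makeGuardedSequence`, `makeGuardrails`, `descend`, and `tryRun` are hypothetical and exist only for this illustration, as does the injectable `now` clock. The thrown objects reuse the `D1011`, `D1012`, and `D2015` codes but omit the `stack` property for brevity.

```javascript
// Standalone sketch; it does not import jsonata and all helper names are hypothetical.

// A sequence whose push() refuses to grow beyond `limit` items,
// in the spirit of exec_env.createSequence in the patch.
function makeGuardedSequence(limit, ...initial) {
    const sequence = [];
    if (limit > 0) {
        sequence.push = function (...items) {
            if (sequence.length + items.length > limit) {
                throw { code: 'D2015', value: limit };
            }
            return Array.prototype.push.apply(sequence, items);
        };
    }
    sequence.sequence = true;
    if (initial.length === 1) {
        sequence.push(initial[0]);
    }
    return sequence;
}

// A checker to call on every evaluation step. `now` is injectable
// (an assumption of this sketch) so the timeout can be shown deterministically.
function makeGuardrails(options, state, now = Date.now) {
    const start = now();
    return function check() {
        if (options.stack > 0 && state.depth > options.stack) {
            throw { code: 'D1011', value: options.stack };
        }
        if (options.timeout > 0 && now() - start > options.timeout) {
            throw { code: 'D1012', value: options.timeout };
        }
    };
}

// Non-tail recursion that increments the depth on entry and
// decrements it on normal exit, like evaluate() in the patch.
function descend(n, state, check) {
    state.depth++;
    check();
    const result = n === 0 ? 0 : 1 + descend(n - 1, state, check);
    state.depth--;
    return result;
}

// Run fn and report either its value or the code of the thrown object.
function tryRun(fn) {
    try {
        return { value: fn() };
    } catch (err) {
        return { code: err.code };
    }
}

const state = { depth: 0 };
const check = makeGuardrails({ stack: 50, timeout: 1000 }, state);
const shallow = tryRun(() => descend(10, state, check));   // stays under the limit
state.depth = 0;
const deep = tryRun(() => descend(100, state, check));     // exceeds the stack limit

const seq = makeGuardedSequence(3);
seq.push(1, 2, 3);
const overflow = tryRun(() => seq.push(4));                // a fourth item is refused

// A fake clock makes the timeout path deterministic.
let fakeTime = 0;
const timedCheck = makeGuardrails({ timeout: 100 }, { depth: 0 }, () => fakeTime);
fakeTime = 500;
const timedOut = tryRun(timedCheck);

console.log(shallow, deep, overflow, timedOut);
```

As in the patch, the depth counter is not restored when a guardrail throws, so this sketch resets `state.depth` by hand before the second run.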