Number value literal lookahead restrictions

At a glance

Identifier: #601
Stage: RFC3: Accepted
Champion: @leebyron
PR: Number value literal lookahead restrictions

Timeline

Added to 2019-10-10 WG agenda
Mentioned in 2019-10-10 WG notes
Added to 2019-09-12 WG agenda
Mentioned in 2019-09-12 WG notes
Added to 2019-08-01 WG agenda
Spec PR created on 2019-07-30 by leebyron
Commit pushed: RFC: Number value literal lookahead restrictions on 2019-07-30 by @leebyron

This RFC proposes adding a lookahead restriction to the IntValue and FloatValue lexical grammars to not allow following a number with a letter.

Problem:

Currently there are some language ambiguities and underspecification for lexing numbers which each implementation has handled slightly differently.

Because commas are optional and white space isn't required between tokens, these two snippets are equivalent: [123, abc], [123abc]. This may be confusing to read, but it should parse correctly. However the opposite is not true, since digits may belong in a Name, the following two are not equivalent: [abc, 123], [abc123]. This could lead to mistakes.

Ambiguity and underspecification enter when the Name starts with "e", since "e" indicats the beginning of an exponent in a FloatValue. 123efg is a lexical error in GraphQL.js which greedily starts to lex a FloatValue when it encounters the "e", however you might also expect it to validly lex (123, efg) and some implementations might do this.

Further, other languages offer variations of numeric literals which GraphQL does not support, such as hexidecimal literals. The input 0x1F properly lexes as (0, x, 1, F) however this is very likely a confusing syntax error. A similar issue exists for some languages which allow underscores in numbers for readability, 1_000 lexes a 1 and _ but fails when 000 is not a valid number.

Proposed Solution:

Add a lookahead restriction to IntValue and FloatValue to disallow any NameStart character (including letters and _) to follow.

This makes it clear that 1e5 can only possibly be one FloatValue and not three tokens, makes lexer errors specified clearly to remove ambiguity, and provides clear errors for mistaken input.

Precedent

Javascript applies this same restriction for similar reasons, I believe originally to produce an early error if C-style typed literals were used in a Javascript program.

https://www.ecma-international.org/ecma-262/10.0/index.html#sec-literals-numeric-literals

Cost of change

While this is technically a breaking change to the language grammar, it seeks to restrict cases that are almost certainly already producing either syntax or validation errors.

This is different from the current implementation of GraphQL.js and I believe other parsers, and will require minor implementation updates.

At a glance​

Timeline​

At a glance

Timeline