
Embedding.from_config does not validate that input_dim and `output_d#22716

Closed
cantenesse wants to merge 4 commits into keras-team:master from cantenesse:swe/session-4-bug-embedding-from-config-accepts-float

@cantenesse

Problem

Embedding.from_config does not validate that input_dim and output_dim are Python ints before constructing the layer. Float values (e.g. 3.7) are silently accepted as valid configuration, which can lead to silent truncation or unexpected behavior when the layer is built.

Approach

Add explicit type validation in Embedding.from_config (keras/src/layers/core/embedding.py) to check that input_dim and output_dim in the config dict are Python int instances before calling super().from_config(config). Raise a descriptive ValueError matching the style of the existing validation in __init__ if either value is not an int. Add corresponding test cases in embedding_test.py that call Embedding.from_config with float values and assert a ValueError is raised.
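
The approach can be sketched without Keras. This is a minimal illustration and not the PR's actual diff: `TinyEmbedding` is a hypothetical stand-in for the layer, and the error text follows the wording quoted elsewhere in this thread.

```python
class TinyEmbedding:
    """Hypothetical stand-in for a layer; not the Keras Embedding class."""

    def __init__(self, input_dim, output_dim):
        self.input_dim = input_dim
        self.output_dim = output_dim

    def get_config(self):
        return {"input_dim": self.input_dim, "output_dim": self.output_dim}

    @classmethod
    def from_config(cls, config):
        # Validate before constructing, so a float such as 3.7 is rejected
        # up front instead of being accepted silently.
        for dim_name in ("input_dim", "output_dim"):
            value = config.get(dim_name)
            if not isinstance(value, int):
                raise ValueError(
                    f"`{dim_name}` must be a Python int. "
                    f"Received: {dim_name}={value!r} (of type {type(value)})"
                )
        return cls(**config)
```

With this sketch, `TinyEmbedding.from_config({"input_dim": 3.7, "output_dim": 2})` raises a ValueError that names `input_dim`, while a config with two ints round-trips through `get_config()` unchanged.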

Review comments

  • WARN keras/src/layers/core/embedding.py:348: If input_dim or output_dim is absent from the config dict, config.get(dim_name) returns None, causing the validator to raise "input_dim must be a Python int. Received: input_dim=None (of type <class 'NoneType'>)". A missing required key is a different error condition from a wrong type, and the current message is misleading. Consider checking if dim_name not in config first and raising a KeyError or a more descriptive ValueError, or at minimum noting the missing-key case in the message.

  • NIT keras/src/layers/core/embedding.py:348: Python's bool is a subclass of int, so isinstance(True, int) returns True. input_dim=True or output_dim=False would silently pass validation as 1 and 0 respectively. This is consistent with the __init__ validation added in earlier sessions, but if stricter rejection is desired, extend each check to also reject bool, e.g. not isinstance(value, int) or isinstance(value, bool).

  • NIT keras/src/layers/core/embedding_test.py:882: test_from_config_accepts_valid_int_dims passes a bare minimal config {"input_dim": 4, "output_dim": 3} missing many keys that from_config may expect (e.g. quantization_config deserialization path). If serialization_lib.deserialize_keras_object(None) or subsequent layer construction raises for other reasons, this test fails for reasons unrelated to type validation. Consider deriving the config from Embedding(4, 3).get_config() to test the real round-trip path.
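
One way the first two comments could be addressed together is sketched below. `check_dim` is a hypothetical helper, not part of Keras, and the wording of the missing-key message is an assumption; the point is only that a missing key and a wrong type produce different errors, and that bool is excluded explicitly.

```python
def check_dim(config, dim_name):
    """Hypothetical helper: validate one dimension entry of a config dict."""
    # A missing key is reported separately, so the caller does not see a
    # misleading "Received: ...=None" type error.
    if dim_name not in config:
        raise ValueError(f"`{dim_name}` is missing from the config.")
    value = config[dim_name]
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(
            f"`{dim_name}` must be a Python int. "
            f"Received: {dim_name}={value!r} (of type {type(value)})"
        )
    return value
```

Whether a missing key should raise KeyError or ValueError is left open in the review; the sketch picks ValueError only to stay close to the existing validation style.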


Generated by swe session #4

@google-cla

google-cla Bot commented Apr 18, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces explicit type validation for input_dim and output_dim in the Embedding layer, ensuring they are Python integers during both initialization and deserialization via from_config. Corresponding unit tests have been added to verify that non-integer values trigger a ValueError. The review feedback points out that because booleans are a subclass of int in Python, the current isinstance checks will silently accept True or False as valid dimensions. Suggestions were provided to explicitly exclude booleans from the validation logic and to include boolean edge cases in the test suite.

Comment on lines +106 to +115
if not isinstance(input_dim, int):
    raise ValueError(
        "`input_dim` must be a Python int. "
        f"Received: input_dim={input_dim!r} (of type {type(input_dim)})"
    )
if not isinstance(output_dim, int):
    raise ValueError(
        "`output_dim` must be a Python int. "
        f"Received: output_dim={output_dim!r} (of type {type(output_dim)})"
    )

medium

In Python, bool is a subclass of int, so isinstance(value, int) returns True for booleans. This allows True or False to be silently accepted as valid dimensions (e.g., input_dim=True becomes 1), which is almost certainly a user error. To ensure robust validation of "Python int" dimensions, booleans should be explicitly excluded.

Suggested change
- if not isinstance(input_dim, int):
-     raise ValueError(
-         "`input_dim` must be a Python int. "
-         f"Received: input_dim={input_dim!r} (of type {type(input_dim)})"
-     )
- if not isinstance(output_dim, int):
-     raise ValueError(
-         "`output_dim` must be a Python int. "
-         f"Received: output_dim={output_dim!r} (of type {type(output_dim)})"
-     )
+ if not isinstance(input_dim, int) or isinstance(input_dim, bool):
+     raise ValueError(
+         "`input_dim` must be a Python int. "
+         f"Received: input_dim={input_dim!r} (of type {type(input_dim)})"
+     )
+ if not isinstance(output_dim, int) or isinstance(output_dim, bool):
+     raise ValueError(
+         "`output_dim` must be a Python int. "
+         f"Received: output_dim={output_dim!r} (of type {type(output_dim)})"
+     )
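
To see how the suggested condition behaves, the short check below evaluates it for a few values. It is only an illustration: `rejected` is a hypothetical name wrapping the same boolean expression as the suggestion.

```python
def rejected(value):
    # Same condition as the suggested change. `not` binds tighter than `or`,
    # so this reads as (not isinstance(value, int)) or isinstance(value, bool).
    return not isinstance(value, int) or isinstance(value, bool)


# bool is a subclass of int, which is why the extra clause is needed.
print(issubclass(bool, int))  # True

for value in (3, 3.7, True, False, None):
    print(repr(value), rejected(value))
```

Only the plain int 3 passes; the float, both booleans, and None are all rejected.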

Comment on lines +866 to +872
def test_input_dim_output_dim_must_be_int(self):
    with self.assertRaisesRegex(ValueError, "input_dim"):
        layers.Embedding(input_dim=3.7, output_dim=2)
    with self.assertRaisesRegex(ValueError, "output_dim"):
        layers.Embedding(input_dim=3, output_dim=2.0)
    # Valid ints should not raise
    layers.Embedding(input_dim=3, output_dim=2)

medium

It is recommended to add test cases for boolean values to ensure they are correctly rejected, as they are technically subclasses of int in Python but invalid as layer dimensions.

Suggested change
- def test_input_dim_output_dim_must_be_int(self):
-     with self.assertRaisesRegex(ValueError, "input_dim"):
-         layers.Embedding(input_dim=3.7, output_dim=2)
-     with self.assertRaisesRegex(ValueError, "output_dim"):
-         layers.Embedding(input_dim=3, output_dim=2.0)
-     # Valid ints should not raise
-     layers.Embedding(input_dim=3, output_dim=2)
+ def test_input_dim_output_dim_must_be_int(self):
+     for val in [3.7, True, False]:
+         with self.assertRaisesRegex(ValueError, "input_dim"):
+             layers.Embedding(input_dim=val, output_dim=2)
+     for val in [2.0, True, False]:
+         with self.assertRaisesRegex(ValueError, "output_dim"):
+             layers.Embedding(input_dim=3, output_dim=val)
+     # Valid ints should not raise
+     layers.Embedding(input_dim=3, output_dim=2)

@hertschuh
Collaborator

Please accept the CLA and Contributor Agreement

@hertschuh
Collaborator

Fixed by #22718

@hertschuh hertschuh closed this Apr 20, 2026