Hi ITN: Implement Roman class #443

Open
mayuris-00 wants to merge 2 commits into NVIDIA:staging/hi_itn_v3 from mayuris-00:hi-itn-roman

@mayuris-00

What does this PR do?

Implement a Hindi inverse text normalization (ITN) Roman numeral class. It converts spoken numbers that follow a small, fixed set of context key words (chapter / section / class numbering) into Roman numerals.

The conversion is deliberately restricted to these predictable contexts. Regnal, papal and product names (e.g. भास्कर-II) are not converted; this is a documented limitation, because outside the listed contexts the same spoken number is ambiguous between Arabic and Roman form.

अध्याय तीन    -> अध्याय III
कक्षा बारह    -> कक्षा XII
खंड पाँच      -> खंड V
अध्याय निन्यानवे -> अध्याय XCIX
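
The mapping above can be illustrated with a minimal plain-Python sketch. This is not the PR's implementation (the PR adds a tagger under `nemo_text_processing/inverse_text_normalization/hi/taggers/roman.py`); the names `CONTEXT_WORDS`, `HINDI_CARDINALS`, `to_roman` and `convert` are hypothetical, and the cardinal table only covers the four examples shown here.

```python
# Illustrative sketch only; names and tables below are assumptions,
# not code taken from the PR.

# Context key words listed in the PR's commit message.
CONTEXT_WORDS = {"अध्याय", "खंड", "खण्ड", "कक्षा"}

# Hypothetical, deliberately tiny cardinal lookup (just the PR's examples).
HINDI_CARDINALS = {"तीन": 3, "पाँच": 5, "बारह": 12, "निन्यानवे": 99}

# Value/symbol pairs in descending order, including subtractive forms.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Convert an integer in 1..3999 to a Roman numeral string."""
    if not 0 < n < 4000:
        raise ValueError(f"cannot represent {n} as a Roman numeral")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def convert(text: str) -> str:
    """Rewrite a known cardinal as a Roman numeral only when it
    directly follows one of the context key words."""
    tokens = text.split()
    out = []
    for i, token in enumerate(tokens):
        follows_context = i > 0 and tokens[i - 1] in CONTEXT_WORDS
        if follows_context and token in HINDI_CARDINALS:
            out.append(to_roman(HINDI_CARDINALS[token]))
        else:
            out.append(token)
    return " ".join(out)


for spoken in ["अध्याय तीन", "कक्षा बारह", "खंड पाँच", "अध्याय निन्यानवे"]:
    print(spoken, "->", convert(spoken))
```

Because the number is rewritten only when the preceding token is a context key word, a bare cardinal with no such word before it is left unchanged, which mirrors the restriction described above.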

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unit tests finish successfully before sending the PR? (pytest --cpu tests/nemo_text_processing/hi/test_roman.py)
  • If you are adding a new feature: Have you added test cases for both pytest and Sparrowhawk?
  • Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • Have you added the correct license header to all newly added Python files?

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

Add an inverse-text-normalization Roman numeral class for Hindi that
converts spoken numbers following a fixed set of context key words
(अध्याय / खंड / खण्ड / कक्षा) into Roman numerals.

  अध्याय तीन  -> अध्याय III
  कक्षा बारह  -> कक्षा XII

Conversion is restricted to these predictable contexts; regnal/papal and
product names are a documented limitation. Mirrors the TN Roman class
structure (data/__init__.py, license headers, pytest + sparrowhawk tests).

Signed-off-by: Mayuri S <mayuris@nvidia.com>
Comment thread nemo_text_processing/inverse_text_normalization/hi/taggers/roman.py Outdated
Signed-off-by: Mayuri S <mayuris@nvidia.com>