find_diseases returns mismatched MeSH UI <-> heading (e.g.: “Breast Neoplasms” reported as D000072656) #5

@4th05

Description

medical_named_entity_recognition.find_diseases (medical-named-entity-recognition==0.4) apparently returns incorrect MeSH ID <-> heading mappings. For example, the text mention "breast cancer" is returned with name "Breast Neoplasms" and mesh_id D000072656, and "cancer" is returned with name "Neoplasms" and mesh_id D009362. These IDs do not match the official MeSH headings (see links below), so disease normalization is unreliable.

MeSH Browser evidence:

Environment

  • Package: medical-named-entity-recognition==0.4
  • OS: macOS (Apple Silicon)
  • Python: 3.11.9
  • Install method: pip

How to Reproduce

  1. Run the following code:
    from medical_named_entity_recognition import find_diseases
    import re
    
    RE_TOKENISE = re.compile(r"((?:\w|'|’)+)")
    
    text = "mouse models of human breast cancer"
    tokens = RE_TOKENISE.findall(text.lower())
    print(tokens)
    
    for d, i, j in find_diseases(tokens):
        print(d, i, j)
  2. Observe the returned dicts for matching_string = “breast cancer” and “cancer”.
    ['mouse', 'models', 'of', 'human', 'breast', 'cancer']
    
    {'mesh_id': 'D000072656', 'name': 'Breast Neoplasms', 'matching_string': 'breast cancer', ...}
    {'mesh_id': 'D009362', 'name': 'Neoplasms', 'matching_string': 'cancer', ...}
    

Expected Behaviour

  • The mention “breast cancer” / heading “Breast Neoplasms” should map to MeSH ID D001943 (not D000072656).
  • The heading “Neoplasms” should map to MeSH ID D009369 (not D009362).
  • More generally, returned mesh_id values should correspond to the MeSH Browser heading for that ID.
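The expectation above could be turned into a small regression check. The following is only a sketch: `find_mismatches` and `EXPECTED_MESH_IDS` are hypothetical names, not part of the package, and it assumes `find_diseases` yields `(entity_dict, start, end)` tuples, which is the shape unpacked in the reproduction script. The sample data is copied from the reported output; the token offsets are placeholders, since the issue does not show them.

```python
# Expected heading -> MeSH ID pairs, taken from the "Expected Behaviour" list.
EXPECTED_MESH_IDS = {
    "Breast Neoplasms": "D001943",
    "Neoplasms": "D009369",
}


def find_mismatches(results, expected=EXPECTED_MESH_IDS):
    """Return (name, returned_id, expected_id) for entities with a wrong mesh_id.

    `results` is assumed to be an iterable of (entity_dict, start, end) tuples.
    Entities whose name is not in `expected` are skipped, not flagged.
    """
    mismatches = []
    for entity, _start, _end in results:
        name = entity.get("name")
        expected_id = expected.get(name)
        if expected_id is not None and entity.get("mesh_id") != expected_id:
            mismatches.append((name, entity.get("mesh_id"), expected_id))
    return mismatches


# Sample data copied from the reported output; offsets are placeholders.
reported = [
    ({"mesh_id": "D000072656", "name": "Breast Neoplasms",
      "matching_string": "breast cancer"}, 0, 0),
    ({"mesh_id": "D009362", "name": "Neoplasms",
      "matching_string": "cancer"}, 0, 0),
]

for name, got, want in find_mismatches(reported):
    print(f"{name}: got {got}, expected {want}")
```

In a real check, `reported` would be replaced by `find_diseases(tokens)` from the reproduction script, and the expected table would need to cover more headings than the two named in this issue.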

Labels: bug