

Function normalize_sentence 

pub fn normalize_sentence(s: &str) -> String

Normalize a sentence for map_store lookup. Rules:

  • ASCII alphanumerics are kept verbatim (case preserved).
  • ' and , are dropped without inserting a space, so contractions (we've → weve) and thousands separators (1,000 → 1000) stay one token.
  • Any other character (whitespace, _, -, ., !, non-ASCII, …) is treated as a word boundary; runs collapse to a single space and leading/trailing spaces are trimmed.
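The rules above can be illustrated with a minimal single-pass sketch. It assumes only what the list states (case is preserved, only ASCII alphanumerics count as word characters) and is not the crate's actual source. The `pending_space` flag is an illustrative device for collapsing boundary runs and trimming the ends.

```rust
/// Illustrative sketch of the documented normalization rules.
/// Not the crate's real implementation; behavior is inferred from the doc text only.
pub fn normalize_sentence(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    // Set when at least one boundary character has been seen since the last kept character.
    let mut pending_space = false;

    for c in s.chars() {
        if c.is_ascii_alphanumeric() {
            // Emit one space for a whole run of boundaries, but never before
            // the first kept character (this trims leading boundaries).
            if pending_space && !out.is_empty() {
                out.push(' ');
            }
            pending_space = false;
            out.push(c); // kept verbatim, case preserved
        } else if c == '\'' || c == ',' {
            // Dropped without creating a boundary, so "we've" -> "weve", "1,000" -> "1000".
        } else {
            // Whitespace, '_', '-', '.', '!', non-ASCII, ...: a word boundary.
            pending_space = true;
        }
    }
    // A trailing boundary run never emits a space, so no trailing trim is needed.
    out
}
```

For example, `normalize_sentence("We've sold 1,000!")` returns `"Weve sold 1000"`. Deferring the space until the next kept character handles both collapsing and trimming without a second pass over the output.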