Wordbreakingisacinchwithdata

Many different approaches exist for the task of word-breaking.  Today we’re writing about a purely data-driven approach, and it’s actually quite straightforward: all we do is consider every character boundary as a potential word boundary and compare the relative joint probabilities of the resulting segmentations, with no insertion penalty applied.  A data-driven approach is great…
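
To make the idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions of ours, not the exact model described in this post: it scores each candidate segmentation with a simple unigram model (the joint probability is the product of the individual word probabilities), the counts in `COUNTS` are made up, and the fallback for unseen strings in `log_prob` is a hypothetical smoothing choice. Every character boundary is tried as a possible word boundary, and no per-boundary penalty is added to the score.

```python
import math

# Toy unigram counts. These numbers are invented for illustration only;
# a real system would use counts from a large corpus.
COUNTS = {
    "word": 50, "breaking": 20, "is": 300, "a": 500, "cinch": 5,
    "with": 200, "data": 60, "break": 30, "king": 10, "in": 400, "at": 150,
}
TOTAL = sum(COUNTS.values())


def log_prob(word):
    """Log-probability of a single word under the toy unigram model."""
    count = COUNTS.get(word)
    if count is None:
        # Assumption: unseen strings get a tiny probability that shrinks
        # with length, so the search never dead-ends on unknown text.
        return -math.log(TOTAL) - len(word) * math.log(10)
    return math.log(count / TOTAL)


def segment(text, max_word_len=20):
    """Return the most probable segmentation of `text` into words.

    best[i] holds (log-probability, words) for the best way to split
    text[:i]. The score is only the sum of word log-probabilities,
    so no insertion penalty is applied for adding a boundary.
    """
    best = [(0.0, [])] + [(-math.inf, [])] * len(text)
    for end in range(1, len(text) + 1):
        # Treat every character boundary before `end` as a candidate split.
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            score = best[start][0] + log_prob(word)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[len(text)][1]


if __name__ == "__main__":
    print(" ".join(segment("wordbreakingisacinchwithdata")))
```

Working in log space keeps the products of small probabilities from underflowing, and the dynamic program avoids enumerating all of the exponentially many segmentations explicitly.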
