Python - Canonicalisation of usernames
What is the best way to get a canonical representation of a username that is idempotent?
I want to avoid having the same issue as Spotify: http://labs.spotify.com/2013/06/18/creative-usernames/
I'm looking for a library in Python. I would prefer not to do what Spotify ended up doing (running the canonicalisation twice to test whether it is idempotent), and importing the Twisted project is a tad overkill. Is there a stand-alone library for this?
Would using email addresses instead be preferred when it comes to usernames? How do major sites/companies deal with this?
First, you should read Wikipedia's article on Unicode equivalence. It explains the caveats and the normalization methods that exist to represent a Unicode string in a canonical form.
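To make the equivalence problem concrete, here is a small sketch (the variable names are purely illustrative). The same visible character can be stored as different code point sequences, and those compare unequal until both are normalized to the same form.

```python
import unicodedata

# Two spellings of the same visible character "é".
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# As raw code point sequences they differ.
raw_equal = composed == decomposed

# After normalizing both to the same form (NFC here) they match.
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

print(raw_equal, nfc_equal)  # prints: False True
```

Two usernames that look identical on screen can therefore be distinct strings unless you normalize before comparing or storing them.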
Then you can use Python's built-in module unicodedata to normalize a Unicode string to your preferred normalization form.
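A code example:
>>> import unicodedata
>>> # NFKC folds compatibility characters (ligatures, Roman numerals, superscripts) to plain ones
>>> unicodedata.normalize('NFKC', u'ffñⅨffi⁵kaÅéᴮᴵᴳᴮᴵᴿᴰ')
'ffñIXffi5kaÅéBIGBIRD'
>>> # lower-case the normalized result for case-insensitive comparison
>>> unicodedata.normalize('NFKC', u'ffñⅨffi⁵kaÅéᴮᴵᴳᴮᴵᴿᴰ').lower()
'ffñixffi5kaåébigbird'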
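If you want to wrap this up in one place, something like the following could be a starting point. It is only a sketch under assumptions: `canonical_username` and `is_stable` are hypothetical names, not part of any library, and it uses `str.casefold()` (Python 3.3+) in place of `.lower()`. It does not prove idempotence for every possible input; `is_stable` is meant as a guard for your test suite or at signup time.

```python
import unicodedata


def canonical_username(name):
    """Hypothetical helper: return a case-insensitive canonical form."""
    # Fold compatibility characters (ligatures, superscripts, ...) first,
    # then apply Unicode case folding.
    folded = unicodedata.normalize("NFKC", name).casefold()
    # Case folding can produce text that is no longer normalized,
    # so normalize once more afterwards.
    return unicodedata.normalize("NFKC", folded)


def is_stable(name):
    """Return True if canonicalising a second time changes nothing."""
    once = canonical_username(name)
    return canonical_username(once) == once


sample = "ffñⅨffi⁵kaÅéᴮᴵᴳᴮᴵᴿᴰ"
print(canonical_username(sample))
print(is_stable(sample))
```

One possible policy is to reject any username for which `is_stable` returns False, so that the stored form is always a fixed point of the canonicalisation.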