python - Canonicalisation of usernames -


what best way canonical representation of username idempotent?

i want avoid having same issue spotify: http://labs.spotify.com/2013/06/18/creative-usernames/

i'm looking library in python. prefer not spotify ended doing (running canonicalisation twice test if idempotent), , importing twisted project tad overkill, there stand-alone library this?

would using email addresses instead preferred when comes usernames? how major sites/companies deal this?

first should read wikipedia's article on unicode equivalence. explains caveats , normalization methods there represent unicode string in canonical form.

then can use python's built-in module unicodedata normalization of unicode string preferred normalization form.

a code example:

>>> import unicodedata >>> unicodedata.normalize('nfkc', u'ffñⅨffi⁵kaÅéᴮᴵᴳᴮᴵᴿᴰ') 'ffñixffi5kaÅébigbird' >>> unicodedata.normalize('nfkc', u'ffñⅨffi⁵kaÅéᴮᴵᴳᴮᴵᴿᴰ').lower() 'ffñixffi5kaåébigbird' 

Comments

Popular posts from this blog

html5 - What is breaking my page when printing? -

html - Unable to style the color of bullets in a list -

c# - must be a non-abstract type with a public parameterless constructor in redis -