filefield - Django python-magic identifies ppt, docx, word uploaded files as 'application/zip'


I am trying to identify the file type of uploaded files. After some searching, I plan to use python-magic to check the MIME types of the files.

A FileField is used in my models, and a ModelForm is used to save the files.

After the files have been uploaded, I check the MIME type in the Python shell.

I find that using

magic.from_file("path_to_the_file", mime=True)

gives the expected MIME type for the image, txt, and pdf files that have been saved.

However, for docx, ppt, and excel files, it identifies them as 'application/zip'.

Can anyone explain why this is happening (does Django automatically save MS files as zip?), and is there a way to make magic identify docx, ppt, and excel files as what they are?

Thank you very much.

I came across this issue recently. python-magic uses the same library as the Unix command file (libmagic), which uses a database file to identify documents (see man file). By default, the database does not include instructions on how to identify the .docx, .pptx, and .xlsx file types.

You can give the file command the additional information it needs to identify these types by adding instructions to /etc/magic (see https://serverfault.com/a/377792).

This should then work:

magic.from_file("path_to_the_file.docx", mime=True)

returns 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
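As background for the 'application/zip' result: .docx, .pptx, and .xlsx files are ZIP archives of XML parts, so a generic ZIP match is accurate, just not specific. If editing /etc/magic is not an option, a rough alternative is to look inside the archive with Python's standard zipfile module. The sketch below is not part of python-magic. The function name refine_zip_mime and the marker table are made up for illustration, and it assumes the conventional part names that Office writes (word/document.xml, ppt/presentation.xml, xl/workbook.xml).

```python
import zipfile

# Hypothetical lookup table: the main part conventionally found in each
# Office Open XML format, mapped to that format's MIME type.
OOXML_MARKERS = {
    "word/document.xml":
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "ppt/presentation.xml":
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xl/workbook.xml":
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def refine_zip_mime(path_or_file):
    """Return an OOXML MIME type if the archive looks like one, else None.

    Accepts a filesystem path or a seekable binary file object.
    """
    if not zipfile.is_zipfile(path_or_file):
        return None
    with zipfile.ZipFile(path_or_file) as archive:
        names = set(archive.namelist())
    # Every OOXML package carries this part at the archive root.
    if "[Content_Types].xml" not in names:
        return None
    for marker, mime in OOXML_MARKERS.items():
        if marker in names:
            return mime
    return None
```

It could be called only when magic reports 'application/zip', keeping magic's answer for everything else. Note that it would not distinguish macro-enabled variants such as .docm, which contain the same parts.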

One thing to note: the usage instruction from the python-magic README on GitHub, which reads only the first 1024 bytes, does not seem to work for .docx, .pptx, and .xlsx file types (even with the additional information in /etc/magic). Applied to a .docx file:

magic.from_buffer(open("path_to_the_file.docx", "rb").read(1024), mime=True)

returns 'application/zip'

It seems you need to give it more data to correctly identify these file types:

magic.from_buffer(open("path_to_the_file.docx", "rb").read(2000), mime=True)

returns 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'

I'm not sure of the exact amount needed.
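Since the required amount is unknown, one option is to pass a more generous buffer. The following is a minimal sketch under that assumption: read_head and sniff_mime are hypothetical helper names, and the 8192-byte default is a guess rather than a documented threshold. It expects a seekable binary file object, which should include an already opened upload as well as a file opened with open(path, "rb").

```python
def read_head(fileobj, size=8192):
    """Return the first `size` bytes of a seekable binary file object.

    The original position is restored afterwards, so later code
    (for example, saving the file) still sees the stream where it was.
    """
    position = fileobj.tell()
    fileobj.seek(0)
    head = fileobj.read(size)
    fileobj.seek(position)
    return head


def sniff_mime(fileobj, size=8192):
    """Guess the MIME type from the first `size` bytes using python-magic."""
    # Imported here so read_head stays usable without python-magic installed.
    import magic
    return magic.from_buffer(read_head(fileobj, size), mime=True)
```

If a larger buffer still yields 'application/zip', falling back to magic.from_file on the saved path avoids the question of buffer size entirely.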

