FileField - Django python-magic identifies uploaded ppt, docx and Word files as 'application/zip'
I'm trying to identify the file type of uploaded files. After searching, I plan to use python-magic to check the MIME types of the files.
The FileField is used in my models, and a ModelForm is used to save the files.
After the files have been uploaded, I check the MIME type in the Python shell. I find that using
magic.from_file("path_to_the_file", mime=True)
gives the expected MIME type for the image, txt and pdf files that have been saved.
However, docx, ppt and Excel files are identified as 'application/zip'.
Can anyone explain why this is happening (does Django automatically save MS Office files as zip?), and is there a way to make magic identify docx, ppt and Excel files as what they are?
Thanks very much.
I came across this issue recently. python-magic is a wrapper around libmagic, the library behind the Unix file command, which uses a database file to identify documents (see man file). The default database on my system did not include instructions on how to identify the .docx, .pptx and .xlsx file types.
You can give file the additional information it needs to identify these types by adding instructions to /etc/magic (see https://serverfault.com/a/377792).
With that in place, this should work:
magic.from_file("path_to_the_file.docx", mime=True)
returns 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
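For background: .docx, .pptx and .xlsx files are Office Open XML packages, which are ZIP archives by design, so Django is not zipping them; a magic database without OOXML rules can only see the ZIP signature. (The older binary .doc, .ppt and .xls formats are not ZIP files.) If editing /etc/magic is not an option, one possible workaround is to post-process an 'application/zip' result by looking inside the archive. This is a minimal sketch using only the standard zipfile module, not part of python-magic; the helper name refine_zip_mime and the folder-to-MIME mapping (word/, ppt/, xl/, the layout Office normally writes) are assumptions.

```python
import zipfile

# Assumed mapping: the top-level folder Office conventionally uses for
# the main part of each OOXML package, mapped to its MIME type.
OOXML_MIME_BY_PREFIX = {
    "word/": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "ppt/": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xl/": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def refine_zip_mime(path, detected_mime):
    """Hypothetical helper: replace a generic 'application/zip' result with
    a specific OOXML MIME type when the archive looks like an Office file."""
    # Leave every non-ZIP result (and anything that is not a real ZIP) alone.
    if detected_mime != "application/zip" or not zipfile.is_zipfile(path):
        return detected_mime
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    # OOXML packages carry a [Content_Types].xml part at the archive root.
    if "[Content_Types].xml" not in names:
        return detected_mime
    for prefix, mime in OOXML_MIME_BY_PREFIX.items():
        if any(name.startswith(prefix) for name in names):
            return mime
    # A ZIP with a content-types part but no known folder: keep the generic type.
    return detected_mime
```

Usage would be something like refine_zip_mime(path, magic.from_file(path, mime=True)). Checking the archive's entry names rather than trusting the file extension keeps the decision based on content, which is the point of using python-magic in the first place.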
One thing to note: the python-magic usage instructions on GitHub do not seem to work for .docx, .pptx and .xlsx files (even with the additional information in /etc/magic):
magic.from_buffer(open("path_to_the_file.docx", "rb").read(1024), mime=True)
returns 'application/zip'
It seems you need to give it more data to identify these file types correctly:
magic.from_buffer(open("path_to_the_file.docx", "rb").read(2000), mime=True)
returns 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
I'm not sure of the exact amount needed.
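One plausible reason the amount varies is that the distinguishing entry names (such as word/document.xml) are not at the start of the archive: they come after one or more earlier entries, such as [Content_Types].xml, whose data can be of any size. To get a rough idea for a particular file, you can walk the ZIP local file headers in the first bytes and find where the first word/, ppt/ or xl/ entry name ends. This is a sketch under assumptions: the helper name bytes_needed_for_ooxml_marker is hypothetical, it assumes the detection rule must at least see that entry name (libmagic's actual rule may need somewhat more), and it gives up on entries whose sizes are stored in a trailing data descriptor or in ZIP64 fields.

```python
import struct

LOCAL_HEADER_SIG = b"PK\x03\x04"
OOXML_PREFIXES = (b"word/", b"ppt/", b"xl/")


def bytes_needed_for_ooxml_marker(data):
    """Hypothetical helper: return how many leading bytes of `data` cover the
    name of the first word/, ppt/ or xl/ entry, or None if unknown."""
    offset = 0
    while data[offset:offset + 4] == LOCAL_HEADER_SIG:
        if len(data) < offset + 30:
            return None  # the fixed 30-byte local header is truncated
        (flags,) = struct.unpack_from("<H", data, offset + 6)
        (comp_size,) = struct.unpack_from("<I", data, offset + 18)
        name_len, extra_len = struct.unpack_from("<HH", data, offset + 26)
        name_end = offset + 30 + name_len
        name = data[offset + 30:name_end]
        if len(name) < name_len:
            return None  # the entry name is cut off
        if name.startswith(OOXML_PREFIXES):
            return name_end
        if flags & 0x08 or comp_size == 0xFFFFFFFF:
            # Sizes are in a data descriptor or ZIP64 extra field: cannot skip ahead.
            return None
        # Skip the extra field and the compressed data to reach the next header.
        offset = name_end + extra_len + comp_size
    return None
```

For example, reading a generous head such as open("path_to_the_file.docx", "rb").read(65536) and passing it to this function shows roughly where the marker sits in that file; the 65536 figure is an arbitrary assumption, not a recommended value.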