How to convert Word Document files into plain-text files

To use the contents of a Word document (“.doc” or “.docx” extension) in a concordancer, you must first convert it to, or save it as, a plain-text file (“.txt” extension). I outline two different ways you can do this below.

Method 1 (recommended)

  1. open the document in Word,
  2. do a “select all” (Ctrl+A),
  3. “copy” (Ctrl+C),
  4. open Notepad (found in Start > All Programs > Accessories),
  5. “paste” (Ctrl+V) the content into Notepad,
  6. save the file.

Method 2

  1. open the document in Word,
  2. do a “Save As” in Word (go to File > Save As),
  3. set “Save as type” to “Plain Text”,
  4. click “Save”,
  5. when the dialogue box appears (for non-English OSs), check “Allow character substitution” and then click “OK”.

This can be tedious, however, if you have many files to convert. There are freeware programs that can automate this task, but please be careful: some of the programs available may be malicious, that is, adware, malware or spyware.
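
If you are comfortable running a short script, the batch conversion can also be sketched in Python using only its standard library. This is a minimal sketch and not a tested tool: it relies on the fact that a “.docx” file is a zip archive containing `word/document.xml`, it does not handle the older “.doc” format, and the function names `docx_to_text` and `convert_folder` are made up for this illustration. Tables, footnotes and text boxes may not come out the way you expect.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path):
    """Return the paragraphs of a .docx file as one plain-text string."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for para in root.iter(W + "p"):
        parts = []
        # Only look inside runs, so tab-stop definitions are not
        # mistaken for tab characters.
        for run in para.iter(W + "r"):
            for node in run:
                if node.tag == W + "t":        # ordinary text
                    parts.append(node.text or "")
                elif node.tag == W + "tab":    # tab character
                    parts.append("\t")
                elif node.tag in (W + "br", W + "cr"):  # line break
                    parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines)


def convert_folder(folder):
    """Write a UTF-8 .txt file next to every .docx file in `folder`."""
    written = []
    for docx_path in sorted(Path(folder).glob("*.docx")):
        if docx_path.name.startswith("~$"):
            continue  # skip the lock files Word creates for open documents
        txt_path = docx_path.with_suffix(".txt")
        txt_path.write_text(docx_to_text(docx_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

Calling `convert_folder("my_corpus")` (a hypothetical folder name) would write one “.txt” file beside each “.docx” file in that folder. Because nothing goes through the clipboard, this approach should also cope with very long documents.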

39 responses to “How to convert Word Document files into plain-text files”

  1. Use regex or a find and replace to do this.

  2. If you need to open a .docx file, try OpenOffice. It will let you open .docx files. Good luck.

  3. So many thank-yous. This doesn’t help if you have received a .docx file but you don’t have Word on your computer. “Open Word”: God must hate me. Everywhere I look on the internet gives that as the first step. […]

  4. You are talking about ‘regex’ or ‘regular expressions’, which can match the characters pertaining to the layout and formatting of texts. Do a search for these terms and you will find your answer. It can be done in Microsoft Word but it takes a bit of getting used to.

  5. I would like to have some pseudo formatting in the text file, like empty lines between paragraphs, underlines as a second line of dashes, and a character in front of each line of a list. Does anybody know a tool that does this? Example:

    This is an underlined Header
    ----------------------------

    See this list:

    * entry 1
    * entry 2
    * entry 3
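
One way to get this kind of pseudo formatting from a “.docx” file can be sketched in Python with only the standard library. It is a rough sketch under several assumptions: headings are recognised by a paragraph style id starting with “Heading” (as in English-language Word; other templates may use other ids), list entries are recognised by numbering set directly on the paragraph (so numbered lists also get a “*”), and the function names `classify`, `read_blocks` and `render` are made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def classify(para):
    """Return 'heading', 'bullet' or 'para' for a paragraph element."""
    props = para.find(W + "pPr")
    if props is not None:
        style = props.find(W + "pStyle")
        # Assumption: heading style ids look like "Heading1", "Heading2", ...
        if style is not None and style.get(W + "val", "").startswith("Heading"):
            return "heading"
        # Numbering properties mean the paragraph is part of a list.
        if props.find(W + "numPr") is not None:
            return "bullet"
    return "para"


def read_blocks(docx_path):
    """Return a list of (kind, text) pairs for the non-empty paragraphs."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    blocks = []
    for para in root.iter(W + "p"):
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if text.strip():
            blocks.append((classify(para), text))
    return blocks


def render(blocks):
    """Turn (kind, text) pairs into plain text with pseudo formatting."""
    lines = []
    previous = None
    for kind, text in blocks:
        # Empty line between blocks, but keep list entries together.
        if previous is not None and not (kind == "bullet" and previous == "bullet"):
            lines.append("")
        if kind == "heading":
            lines.append(text)
            lines.append("-" * len(text))   # underline as a line of dashes
        elif kind == "bullet":
            lines.append("* " + text)
        else:
            lines.append(text)
        previous = kind
    return "\n".join(lines) + "\n"
```

With these pieces, `render(read_blocks("report.docx"))` (a hypothetical file name) returns the text with underlined headings, starred list entries and an empty line between paragraphs.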

  6. I worked as a proposal desktop publisher for years, through all the different versions of Word. Lots of version compatibility problems! It seems that, as soon as I learn one version and all the tricks about how to handle it, here comes another updated Word. Sigh.

  7. You can try something like SpiceLogic. It is the only one I have tried, and it worked, but that was a while back. The document format has also changed since, so it may not have kept pace.

    Otherwise do a “doc-to-txt” search on your favourite search engine to find the latest. Good luck.

  8. wow!

    just kidding.

    Now, for real, how do you convert 50 Word documents into .txt files at once?

  9. The results of Method 1 and Method 2 are not the same. Only Method 1 gives true plain text (try saving a table with Method 2 to see the difference).

  10. David,
    Thanks for this.

    When I wrote this I had been talking about English corpus linguistics. Many of the problems people had were to do with unwanted Japanese Unicode-encoded punctuation (apostrophes were a big problem), which is why I showed this method. So I guess I need to write another post to clarify this point.
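
For anyone who wants to clean up such punctuation after conversion, here is a minimal Python sketch that replaces a handful of “smart” and full-width punctuation characters with plain ASCII ones. The list of characters is an assumption about what typically causes trouble, not a complete inventory, and the name `normalize_punctuation` is made up for this illustration.

```python
# Map punctuation that word processors and Japanese input methods
# often insert to plain ASCII equivalents. Extend the table as needed.
PUNCTUATION_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (curly apostrophe)
    "\uff07": "'",    # full-width apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
})


def normalize_punctuation(text):
    """Return `text` with the mapped characters replaced by ASCII ones."""
    return text.translate(PUNCTUATION_TO_ASCII)
```

Running the converted text through `normalize_punctuation` before loading it into a concordancer should keep a word such as “don’t” from being split or displayed oddly because of a curly apostrophe.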

  11. To preserve international characters (with accents, etc.), save as Unicode, not as ASCII (“Text”). ASCII is the original coding dating back to the dawn of the computer age; it uses one byte per character (7 of the 8 bits), for a maximum alphabet of 128 possible characters. (Mac and Windows each started using the other 128 for differing sets of special characters, which is why “curly quotes” on one machine will come up as odd characters on the other.)

    Unicode, in its common two-byte form (UTF-16), uses two bytes for most characters, which gives 65,536 basic characters, ample for all the alphabetic languages in the world. (That’s a lot of characters for a typeface designer!)
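
The difference between these encodings can be seen in a few lines of Python. This is only an illustrative sketch: the sample text and the function names `describe` and `windows_quote_on_mac` are made up, and “cp1252” and “mac_roman” stand in for the Windows and Mac character sets mentioned above.

```python
def describe(text):
    """Encode `text` three ways so the byte counts can be compared."""
    return {
        # ASCII cannot represent accents or curly quotes; each one
        # is replaced by "?" here instead of raising an error.
        "ascii (lossy)": text.encode("ascii", errors="replace"),
        "utf-8": text.encode("utf-8"),
        "utf-16-le": text.encode("utf-16-le"),
    }


def windows_quote_on_mac(ch="\u201c"):
    """Encode a curly quote the Windows way, then misread it as Mac Roman."""
    return ch.encode("cp1252").decode("mac_roman")


if __name__ == "__main__":
    for name, data in describe("caf\u00e9 \u201cquoted\u201d").items():
        print(name, len(data), data)
    print(repr(windows_quote_on_mac()))
```

The ASCII version loses the accent and the quotes, while both Unicode encodings keep them. Characters outside the basic 65,536, such as many emoji, take four bytes in UTF-16 rather than two.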

  12. There is also an online Word converter available, with which you can create and convert your document to any other format.

  13. It should be there (look for ‘.txt’). Otherwise use the other method.

  14. What about Microsoft Word 2007? There is no plain text option.

  15. If you need access to Word or Excel files, you can use the software suite from openoffice.org, which will first let you open them and then save them as .txt or .csv files. You can then read them in a text editor like Notepad.

    Hope that helps.

  16. I no longer have Microsoft Word, but I have a lot of Excel comma-separated values files. Can I convert them to text files?

  17. You’re welcome. Glad this page can help so many people.

  18. Mirana Nightshade

    Thank you so so so much!!!!!!!!!!!!!!!!!!!mwah.. it really helped me.. it’s funny but it took me 2 days on how to figure out this !! hehe ;D

  19. Ashok,
    Yes, but I am talking about English only here. Sorry, but I cannot help with other languages.

  20. Hello,

    If we copy and paste from Word to a .txt file, some of the special characters are lost.
    For example, try copying an acute-accented character from Word to a .txt file.

  21. Glad to have solved your problem. It isn’t a pretty method but it works.

  22. Thanks so much! With the Newgrounds redesign, all of my stories were looking ridiculous, because it kept changing my special characters to jumbled up nonsense!

  23. Thank you for pointing out what should have been obvious. I got it as soon as I read the instructions.

  24. Sorry I don’t know anything about C. But there is a program which I have been using recently called Zilla Word to Text which can convert multiple files simply. The output is usable but some characters don’t convert right for my purposes.

    Hope that helps.

  25. How can I do this programmatically, using C/C++?

  26. It’s a really good tip for me, because when I copy something from a Word file into WordPress it really makes a mess.

    Thanks for sharing it.

  27. Nayyara,
    Pasting into Notepad will work … if you wait long enough. From personal experience, I have waited a whole night for a file to paste. Your computer may seem to have hung, but it hasn’t. It is working hard to paste everything. To see that it is working, open Task Manager (press ‘Ctrl’, ‘Alt’ and ‘Delete’ simultaneously, ONCE ONLY; Vista then requires you to click the link to Task Manager). Click the ‘CPU’ column to sort the processes by how hard they are working. You should see Notepad and/or Word working hard. This is a good sign: it means your wait will not be in vain.

    Pasting into Notepad will give you a cleaner result than saving in Word. Also, pasting in Vista is much faster than in XP.

    Good luck.

  28. Neither method is working for me because I have a very large file: a Word document of more than 6600 pages (of Unicode text, moreover). The problem is that if I use Method 1, MS Word just hangs. Method 2 does not work either: when I try to paste into the text file, it pastes nothing. I think my data is too big for the clipboard. I can copy and paste the text in parts, but it takes far too long. Do you have any idea how to handle the issue?

  29. I am glad this page is of help. I never expected it to be. But this page is the most popular page on Corpora by a long way.

    To be honest, I only use Method Two.

    The reason I recommended Method One is that I thought most people who use Word would find it easier and more understandable, as it involves less software, fewer external tools and less know-how (how many people can’t fathom Ctrl-A, -C and -V?).

    I was wrong.

    The truth is Method Two is more accurate and, in my opinion, better.

    I am sure you too have found Method Two more accurate. This is because Notepad cannot and does not handle meta-information, which is the root of the problem.

    It goes to show I should have stayed with my initial judgement.

    I have changed my recommendation as I now understand people have also found (as I had) the Word-method error-ridden and annoying … and really not better at all.

  30. Thanks a lot. I had to use Method 2. It works well.

  31. Hey, this is so simple. Thank you very, very much. I wasted so much of my time without knowing this. Once again, thank you.

  32. Thanks so much! I had to do Method 2. So appreciate your help :)

  33. I thank you so much for this info; I can’t convey to you how frustrating it is to see a job you are fully qualified for and not be able to send a “readable” resume to a prospective employer.
    Your quick and easy tutorial really helped me out!

    Steve

  34. Thank you so much. I have been trying to figure this out for months and your explanation made it as easy as possible. Again my thanks!

  35. A thank you also. I had to put a Word doc into a folder for someone and it had to be in .txt format. I didn’t know how. Now I do. Thanks again.

  36. My pleasure, Green Frog.

  37. Thank you so much! I had spent hours trying to figure out how to convert document to plain text. Method 2 worked. I did not have the option in the first method. Again – Thanks!

  38. It is so simple yet I couldn’t figure out how to do it. Let alone not wanting to pay for some cheesy program to do it for me. Thanks
