Cleaning up a Word doc

This applies to MS Word or OpenOffice Writer. The goal is a .doc file that is well formatted. For now it is better to keep this file as .doc and not .docx.

I am going to assume that the .doc file you are starting with is poorly formatted: bold has been applied by hand, indents are done with tabs, and text is centered manually.

The first thing we need to do is create a totally virgin document.

  1. Open your document in Word
  2. ctrl-a to select the whole document
  3. ctrl-c to copy the whole document
  4. open Notepad
  5. in Notepad, ctrl-v to paste
  6. in Notepad, File -> Save As
  7. save this file somewhere as mycleandocument.txt, or whatever you want to call it
  8. close Notepad
  9. just to be safe, close Word
  10. open Word, and in Word open this file mycleandocument.txt
  11. File -> Save As, mycleandocument.doc
  12. you now have a clean text version of your original document.

What we need to do now in this document is get rid of all the old bad formatting.

  1. remove all double spaces.
    Edit -> Find & Replace, find “  ” [double space], replace “ ” [single space]
    (do not include the speech marks)
    You may need to run this several times until you get no new replacements.
  2. put a space after every period
    Edit -> Find & Replace, find “.”, replace “. ”
    Do this twice. We now have at least two spaces after every period, which makes for easy reading. Now we need to get rid of any sentences that end with three spaces.
    Edit -> Find & Replace, find “.   ” [.spacespacespace], replace “.  ” [.spacespace]
    (do not include the speech marks; this is true for all finds and replaces)
    You may need to run this several times until you get no new replacements.
  3. remove all spaces at the end of a paragraph
    Edit -> Find & Replace, find “ ^p” [space & a paragraph marker], replace “^p”
  4. remove all double line breaks
    Edit -> Find & Replace, find “^p^p” [special paragraph marker], replace “^p”
    Repeat until you get no new replacements.
  5. remove all double tabs. If a tab is needed there should only be one, and these should be formatted by the style
    Edit -> Find & Replace, find “^t^t” [special tab marker], replace “^t”
  6. remove all tabs at the beginning of a paragraph. This should be formatted via styles
    Edit -> Find & Replace, find “^p^t”, replace “^p”
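
If you would rather run these passes on the plain text file before opening it in Word, the six steps above can be sketched as a short Python script. This is only a sketch under some assumptions: the function name clean_text is made up for this example, paragraphs are assumed to end with a plain newline (the equivalent of Word's ^p), and each regular expression does in one pass what the steps above repeat by hand. Like step 2 itself, it also puts spaces after periods inside numbers and web addresses, so check the result.

```python
import re


def clean_text(text: str) -> str:
    """Apply the six clean-up steps to plain text (hypothetical helper)."""
    # 1. Collapse any run of spaces to a single space.
    text = re.sub(r" {2,}", " ", text)
    # 2. Put exactly two spaces after every period. Blunt, like the manual
    #    step: "3.14" and "example.com" get spaces too.
    text = re.sub(r"\. *", ".  ", text)
    # 3. Remove spaces at the end of each paragraph.
    text = re.sub(r" +$", "", text, flags=re.MULTILINE)
    # 4. Collapse runs of paragraph breaks (empty lines) into one break.
    text = re.sub(r"\n{2,}", "\n", text)
    # 5. Collapse runs of tabs into a single tab.
    text = re.sub(r"\t{2,}", "\t", text)
    # 6. Remove tabs at the beginning of a paragraph.
    text = re.sub(r"^\t+", "", text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    sample = "Hello   world.Next  sentence.   \n\n\n\t\tIndented line. \nLast\t\tcol"
    print(repr(clean_text(sample)))
```

To use it on the file from the first list, read mycleandocument.txt, pass its contents through clean_text, and write the result back out before opening it in Word.
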
To get the hang of editing your text and making it clean, play around with the Find & Replace command for a while and see what you can do. The list above gives you a pretty good start on cleaning up your document.

Next we will look at formatting your document ONLY via styles.

OpenOffice find and replace equivalents (with regular expressions turned on):

  • Tab mark: \t
  • Paragraph mark: $
  • Empty line: ^$

Styles

The way to format a document is via styles and not manually.

Want the beginning of a chapter to always start on a new page? Use styles.
Want the chapter name to be bold, right aligned and in Trebuchet font? Use styles.
Have a special comment every so often that you want to stand out? Use styles, don’t manually edit.
Using quotes from other sources that you want specially formatted? Use styles.
Having bullet points? Use styles.

The document I have been working on is a thirty-chapter book for a friend. It has been previously published, so I had a pretty good Word document to work from. It still needed to go through all the processes above, but now I had it in a clean format.

Looking at the book, I went through and considered the styles it would need, using the original printed book as a guideline.

  • Heading – I used this style only on the opening cover page, bold, centered, Trebuchet MS 16.
  • Small – used in a number of places for very small text that we need but no one ever reads. Arial 10, plain, left justified.
  • Heading 1 – this also becomes <h1> on a website. Times New Roman, 22, color grey, bottom border, 2cm left indent, left justified.
  • Scripture – used for references to the Bible, Segoe UI Symbol, 10, right justified.
  • Scripture Ref – used for the chapter and verse reference, Segoe UI Symbol, 8, right justified.
  • Heading 2 – becomes <h2>, Trebuchet MS 13, left justified, 10cm wide, bold.
  • Text Body – the main text of the document. Times New Roman, 12, left justified, 0.15cm before and after paragraph.

There are more settings to these styles than listed here, but this gives you a good idea.

Now to apply these styles to the correct paragraphs.
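
Once every paragraph carries a style name instead of hand-applied formatting, turning the document into a web page becomes a simple lookup. Here is a rough Python sketch of that idea. Only the Heading 1 to <h1> and Heading 2 to <h2> mappings come from the list above; the other tags, the CSS class names, and the names STYLE_TO_TAG and render_html are assumptions made up for illustration.

```python
from html import escape

# Style name -> (HTML tag, CSS class). Only the two heading mappings come
# from the style list; everything else here is an assumed example.
STYLE_TO_TAG = {
    "Heading 1": ("h1", None),
    "Heading 2": ("h2", None),
    "Text Body": ("p", None),
    "Scripture": ("p", "scripture"),
    "Scripture Ref": ("p", "scripture-ref"),
    "Small": ("p", "small"),
}


def render_html(paragraphs):
    """Turn (style name, text) pairs into HTML, one element per paragraph."""
    lines = []
    for style, text in paragraphs:
        # Unknown styles fall back to a plain paragraph.
        tag, css_class = STYLE_TO_TAG.get(style, ("p", None))
        attr = f' class="{css_class}"' if css_class else ""
        lines.append(f"<{tag}{attr}>{escape(text)}</{tag}>")
    return "\n".join(lines)


if __name__ == "__main__":
    book = [
        ("Heading 1", "Chapter One"),
        ("Scripture", "In the beginning..."),
        ("Text Body", "The main text of the chapter goes here."),
    ]
    print(render_html(book))
```

The point of the sketch is that the look of each element lives in one place (the style, or on a website the stylesheet), so changing a heading font later means changing one definition and not thirty chapters.
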

You may need more styles than this to give your document the look you want. The idea is to ONLY use styles to change the look and feel, and to be very sparing with manual formatting.

 

About howlmc

50 something geek, who has owned way too many computers.