1 August 2000

How to index a book semi-automatically

THIS IS NOW IMPROVED AND REPLACED - See here

 =========================
You have finished your magnum opus, and your publisher wants an index.
He refuses to pay a professional indexer, and expects you to do it.
What do you do? Save this email for that eventuality.
These notes will assume you are using Word 2003. Other word processors are similar.
You can index a book the hard way, or the easier way, or the easy way.

THE HARD WAY: MARK EACH WORD MANUALLY
- i.e. go through your document and mark every word which you want indexed.
- to mark a word, highlight it and click on Insert: Reference: Index and Tables...: Mark Entry...: Mark All
- or press Alt-Shift-X and click on Mark All
- do that for every word which you want in the index
- if you are clever you will make a macro to bypass all those clicks, but you still have to pick out all the words.

AN EASIER WAY: USE AUTO-MARK
- i.e. make a concordance of words you want indexed and use Auto-Mark
- but making a concordance is not easy. You have to collect all the words yourself.
- the Word Help files tell you how to make a concordance, but it is hard work, so...

AN EASY WAY: USE A CONCORDANCE PROGRAM AND AUTO-MARK
- a concordance program collects all the words in your book
- you delete the words you don't want indexed, and tidy it up
- then use the resultant Concordance file for Auto-Mark
- it is a bit complicated, so I have laid out step-by-step instructions


STEP-BY-STEP INSTRUCTIONS
========================

INSTALL "SIMPLE CONCORDANCE PROGRAM" (SCP)
(This is a freeware program available from various sites, including Tyndale)
- download the latest version from www.textworld.com/scp

or get an older copy from the Tyndale site
at http://www.tyndalehouse.co.uk/Download/SCP32x40.zip
On a Mac you can use Conc which does roughly the same job as SCP on a PC.
I haven't used it personally, but you can read about it and download it from:
http://www.indiana.edu/~letrs/help-services/QuickGuides/about-conc.html


After you've installed it, you might like to read the Getting Started instructions.
- It is easy to use, but you probably have never used anything like it before.
- it is a really amazing program. If Young or Strong had one of these, they might still be young and strong
- if you hate reading software manuals, follow these instructions:

MAKE TEXT-ONLY VERSIONS OF THE FILE(S) YOU WANT TO INDEX
(SCP cannot read Word documents, so you have to make .txt versions)
- load the book file(s) into Word and save as text file(s)
- to do this, click on File: Save as. Then change the 'Save as type' to 'Plain Text...txt' and save
(ignore the warnings about losing formatting - that is what you want to do)

MAKE AN SCP FILE
(SCP uses a special format of file, which it can create from a text file)
- start up SCP and click on File: New
- in the top-right browse for the folder in which you saved your chapters
- in the left-hand box, highlight the Plain Text file(s) of your book and click on 'Add selected files'
- change the Title in the top-left from Project1 to something like MyBook (or whatever you want)
- tick "Build Vocabulary" and "Separate by capitalization"
- click on Save; after a good deal of processing, it will offer to save the results under a name you choose.
(the program may appear to do nothing for a very long time, so be patient)
- click on OK to return to main SCP program

MAKE A CONCORDANCE LIST IN SCP
- load the .scp file which you just made by clicking on File: Open
- click on the tab 'Word List'
- change the order to 'Decreasing Frequency Order'
- change the layout to One Column
- remove the tick from 'Frequencies'
- click on the button "Word List" to produce the list
- save it by clicking on File: Save (it saves in .rtf format)
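
If you are curious what a word list like this amounts to, here is a minimal Python sketch. It is not how SCP works internally (I don't know that); the function name and the sample sentence are made up for illustration. It collects every distinct word, keeps "Sacrifice" and "sacrifice" apart, and lists the most frequent words first.

```python
import re
from collections import Counter

def word_list(text):
    """Return the distinct words in text, most frequent first.

    Capitalised and lower-case forms are counted separately,
    mirroring the 'Separate by capitalization' option.
    """
    # A word is a run of letters, optionally with an internal apostrophe (Paul's).
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    counts = Counter(words)
    # Decreasing frequency first, then alphabetical so the order is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ordered]

# Made-up sample text, for illustration only.
sample = "The sacrifice of Paul. Sacrifice was the theme the apostle chose."
print("\n".join(word_list(sample)))
```

The common words float to the top, which is why the list is easy to prune in the next step.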

EDIT THE CONCORDANCE FILE
- open the SCP concordance file in Word, by clicking on File: Open
- change 'Files of type' to 'Rich Text Format'
- find the file you made (it is probably in the SCP folder)
- double-click on the file.
- remove all the words which you don't want to include
(ie most of the words at the top of the list, which are the most frequent ones)
- if a word occurs twice, once in lower case and once starting with a capital (as used at the start of a sentence), keep both versions, because Word's index markup is case-sensitive
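
The pruning can be sketched in Python as follows. This is only an illustration under my own assumptions: the stop-word list and the function name are hypothetical, and you would still want to read through the result by eye. Note that it drops unwanted words whatever their case, but keeps both capitalised and lower-case forms of the words you do want.

```python
# Hypothetical list of words not worth indexing; extend it for your own book.
STOP_WORDS = {"the", "and", "of", "a", "in", "to", "was"}

def prune(words, stop_words=STOP_WORDS):
    """Drop stop words (compared without regard to case) and keep
    everything else exactly as it was spelled, capitals included."""
    return [w for w in words if w.lower() not in stop_words]

words = ["the", "The", "Sacrifice", "sacrifice", "of", "Paul"]
print(prune(words))  # ['Sacrifice', 'sacrifice', 'Paul']
```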

MAKE A WORD CONCORDANCE TABLE
- the Word concordance needs to be a two column table
- highlight all the text by pressing Control-A
- click on Table: Convert text to table...
- accept the default (1 column, separated by paragraphs) and click OK
- click the cursor just outside the right-hand edge of the table
- click on Table: Select column, then click on Table: Insert columns
- save the file

REFINE YOUR WORD CONCORDANCE TABLE
- the words searched for are in the left hand column
- don't change to upper or lower case, because Word's automatic index is case-sensitive
- in the right hand column put the entry you want in the index
- eg 'Paul', 'Paul's' and 'pauline' might all have the index entry 'Paul'
- and 'Bauckham' should be 'Bauckham, Richard'
- do not remove duplicates such as "Sacrifice" and "sacrifice", because Word needs to be told to mark up both the instance at the start of a sentence and the one inside a sentence.
(If you want more than one index, with Modern Authors separately, see below)
- this large task is easier if you sort the table alphabetically:
- click on Table: Select Table: then on Table: Sort: OK
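
The two-column idea can be sketched in Python like this. The mapping, the function names and the tab-separated output are my own assumptions for illustration, not something Word or SCP produces. The left column is the word as it appears in the text, and the right column is the entry wanted in the index.

```python
import csv
import io

# Hypothetical mapping: word as found in the text -> entry wanted in the index.
ENTRIES = {
    "Paul": "Paul",
    "Paul's": "Paul",
    "pauline": "Paul",
    "Sacrifice": "sacrifice",
    "sacrifice": "sacrifice",
}

def concordance_rows(entries):
    """Return (search word, index entry) pairs, grouped by index entry."""
    return sorted(entries.items(), key=lambda kv: (kv[1].lower(), kv[0]))

def to_tab_separated(rows):
    """Render the rows as text with one tab between the two columns."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()

print(to_tab_separated(concordance_rows(ENTRIES)))
```

Text laid out like this, with a tab between the columns, can be turned into a two-column table in Word by choosing Tabs as the separator when converting text to a table.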

USE THE WORD AUTO-MARK FUNCTION
- load the whole document to be indexed into Word
- make a copy of it (ie one without index marks, so you can start again if necessary)
- click on Insert: Reference: Index and tables...: AutoMark...:
- when it asks you to find the AutoMark file:
- change 'Files of type' to 'Rich Text Format'
- find the file you made (it may be in the SCP folder)
- double-click on the file
- wait while your text is automatically marked up
- to remove Index marks (eg from the Contents), look for: XE "*"
(with Hidden and Wildcard turned on) and delete the codes.
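
To show what that search pattern is matching, here is a small Python sketch. It works on an ordinary string, not on a Word file, and it assumes that an index mark looks like { XE "entry" } when field codes are displayed; the names are mine.

```python
import re

# Assumption: a displayed index mark has the form { XE "entry" }.
XE_FIELD = re.compile(r'\{\s*XE\s+"[^"]*"\s*\}')

def strip_index_marks(text):
    """Remove every { XE "..." } mark from a plain string."""
    return XE_FIELD.sub("", text)

sample = 'Contents{ XE "Contents" } Paul{ XE "Paul" } wrote letters.'
print(strip_index_marks(sample))  # Contents Paul wrote letters.
```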


GENERATE THE INDEX
- to insert the index, move the cursor to the end (where you want the Index)
- click again on Insert: Reference: Index and Tables... and click on OK


NOW YOU CAN FORMAT YOUR INDEX
No doubt it is not as you would like it.
To pick a different template: right-click on the index and click on Edit field: Index
Either pick a pre-defined Format, or pick the format "Template"
and click on "Modify", so that you can change each level independently.
You can carry on editing the document, and re-index at any time by right-clicking on the index and clicking on "Update Field".

HOW TO MAKE MORE THAN ONE INDEX IN WORD:
(this is normally impossible. Word can only make one index per document)
- make a copy of the folder which has your document file(s)
- put the indexing concordance file in both folders
- edit the index files so one contains all names, and one has no names
- make an index on both documents, then copy and paste the second index
- OR, make an index concordance file from your Bibliography
- to make a separate Scripture Ref index, you will have to mark up the texts manually
Note: If you are still editing the book, make sure you know which one you edit, and re-copy it into the other folder before you re-index.

Update sent out Aug.2000
Several interesting emails from scholars contained significant nuggets which should be added to the previous email about indexing.
I won't mention names, in case I miss people out.

HOW TO INDEX ON A MAC
There is a Mac concordance program called Conc available from SIL
- http://www.sil.org/computing/catalog/conc.html
Apparently this works in a similar way to SCP. There are instructions at
http://www.sil.org/computing/conc/tutorial.html

HOW TO INDEX PUBLISHER'S PROOFS
If you are not supplying Camera Ready Copy, and you have to index your publisher's proofs, you can't simply supply an index of your files, because the page numbers will be wrong.
A rough workaround is to produce a copy of your files with the same pagination as the proofs. You can make the pages match either by setting a page size which is almost exactly the same as the proofs, or by forcing a New Page wherever the proofs have a new page. The page layout will look terrible, but it doesn't matter, because you just want to produce an index. (Use Control-Enter to force a New Page).
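
The reason matching page breaks is enough can be seen in a short Python sketch. It is only an illustration under my own assumptions: a form feed stands in for each forced page break, the sample text and names are made up, and the matching is whole-word and case-sensitive. An index is just a list of entries with the pages they occur on, so if the breaks fall in the same places, the page numbers come out the same.

```python
import re
from collections import defaultdict

def build_index(pages, concordance):
    """pages: page texts in order (page 1 first).
    concordance: search word -> index entry, matched case-sensitively."""
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word, entry in concordance.items():
            # Match the word only when it is not part of a longer word.
            pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
            if re.search(pattern, text):
                index[entry].add(page_number)
    return {entry: sorted(nums) for entry, nums in sorted(index.items())}

# A form feed ("\f") stands in for each forced page break.
book = "Paul wrote letters.\fSacrifice is central.\fPaul's view of sacrifice."
pages = book.split("\f")
concordance = {
    "Paul": "Paul",
    "Paul's": "Paul",
    "Sacrifice": "sacrifice",
    "sacrifice": "sacrifice",
}
print(build_index(pages, concordance))
# {'Paul': [1, 3], 'sacrifice': [2, 3]}
```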

CONSIDER EMPLOYING A PROFESSIONAL INDEXER
The DIY indexing which I described will always be inferior to that done by a professional. A concordance file only finds the words you have used, and doesn't list the concepts. Also, you need to edit the concordance file rigorously so that you link words with the concepts they represent. So, if you can afford it, employ a professional. You can find one at:
http://www.socind.demon.co.uk/ - an on-line version of Indexers Available 2000, searchable by subject (religion and theology currently has 22 entrants)
http://www.sfep.demon.co.uk/ - an on-line version of the SFEP Directory, currently being updated to 2000/2001. The 1999/2000 printed version has numerous entries under religion and theology: there are 36 entries in bold type that indicate specialists.

WARNINGS ABOUT FOOTNOTES ON WORD 97 & 98
There is a problem with footnotes on PC Word 97 and Mac Word 98. Sometimes a footnote may be split or moved onto the next page, even when there is room on the correct page.
For the Mac problems, see http://www.macfixit.com/ultimate/Forum2/HTML/001011.html
For Camera Ready Copy, it may be safer to print the file from PC Word 95 or Mac Word 5.1.
You will have to use 'Save As..' to save it in the older format, then re-open it in the older word processor.
Apparently this has been fixed in Word 2000, but you need to turn off the check box under Tools\Options\Compatibility for 'layout footnotes like Word 97'.
OR you can try the following fix:
Set the paragraph formatting of your body text to Exact line spacing. If you currently use Single line spacing, set it to around 115% of the point size. (The actual % to use depends on the font in use).
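
The arithmetic is simple enough, but here it is as a Python sketch. The function name is mine, and 1.15 is only the rough starting factor mentioned above, so adjust it for your font.

```python
def exact_line_spacing(point_size, factor=1.15):
    """Suggested exact line spacing, in points, for text of the given size."""
    return round(point_size * factor, 1)

# 12 pt text at 115% gives 13.8 pt line spacing.
print(exact_line_spacing(12))
```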
