Proofing of the East Florida Papers Calendar Database
Suggest aiming for 1,000 entries (cards) per person per month
Would like to have the first 10,000 entries cleaned up by April 15, 2005
Comes out to fixing about 50 entries per day
Would like to have all 50,000 entries cleaned up by December 31, 2005
What we need to do
Clean-up is more a matter of "scrubbing" than of proofing. Some
entries and sections will take a lot more work than others.
There are three main types of error
TYPOGRAPHICAL: entry has misspellings or mistakes in
names, words, numbers, or places. The program had a terrible
time with capital "M"s. Often it substituted "ivi" or just
"IvI". It also garbled "C"s and "G"s, and for some reason "in"
often came out as "lp", so "Spain" was often "Spalp." The Spanish
preposition "de" was also changed to "d" or "b" in a lot of cases.
Occasionally entries are in Spanish. For some reason, the
program seemed to do better with these than with the English ones.
GOBBLEDYGOOK: entry is so bad it needs to be retyped from the card.
NO PATTERN MATCH: most of the entry is missing and has
to be typed in from the card.
The really bad entries tend to occur in batches of between 100 and 200.
In the rest, there are just a lot of typos.
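The typographical errors described above are regular enough that a script could point out which entries to check first. The following is a minimal Python sketch, assuming the entries can be read as plain strings (for example, from a text or CSV export of the spreadsheet). The pattern list and the name `flag_entry` are hypothetical, and a match only means "check the card," not that the entry is definitely wrong.

```python
import re

# Hypothetical review hints built from the OCR confusions described above.
# A match means "check this entry against its card"; it is not an automatic
# correction, because some hits (for example a lone "d") may be legitimate.
SUSPECT_PATTERNS = [
    ("'ivi'/'IvI' for 'M'", re.compile(r"\bivi|IvI")),   # word-initial "ivi" skips "civil"
    ("'lp' for 'in'", re.compile(r"\blp\b")),            # "lp" standing alone as a word
    ("'Spalp' for 'Spain'", re.compile(r"\bSpalp\b")),
    ("'d'/'b' for 'de'", re.compile(r"\b[db]\b")),       # lowercase "d" or "b" alone
]


def flag_entry(entry: str) -> list[str]:
    """Return the label of every suspect pattern found in one entry."""
    return [label for label, pattern in SUSPECT_PATTERNS if pattern.search(entry)]


if __name__ == "__main__":
    # Hypothetical garbled entry, not taken from the database.
    print(flag_entry("Letter from ivianuel Lopez b Spalp"))
```

For the sample entry this reports the "ivi", "Spalp", and "d"/"b" hints. The patterns are kept narrow (for example, "ivi" only at the start of a word) so that ordinary words such as "civil" are not flagged.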
Possible ways to proceed
Card by card: checking each card against its entry
General proofing: doing the easy fixes first, then going through a
second time to deal with the real problems, referring to the cards when necessary.
I would suggest going card by card first. Once you get familiar with
the spreadsheet, you can probably figure out shortcuts.
We need to fix errors in all spellings, but especially DATES, NAMES of
people, places, ships, etc., and NUMBERS. Also, for military titles, be
aware that "Lieut" for "lieutenant" often came out as LT, It, IT, it, etc.
Just use "Lt." "Captain" should be "Cpt." or "Capt." "Colonel" should
be "Col." "Sergeant" should be "Sgt." "Corporal" should be "Cpl." If
the words are spelled out, that's fine, as long as they are spelled correctly.
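The garbled "Lieut" forms listed above follow a pattern that a script can look for. This is a minimal Python sketch, assuming entries are plain strings. The name `suggest_lt` is hypothetical, and so is the rule that a garbled title is followed by a capitalized surname. The output is a suggestion to compare against the card, not a safe automatic replacement, since a sentence beginning "It Was..." would also match.

```python
import re

# Hypothetical rule: one of the garbled forms reported in this memo
# (LT, It, IT, it), optionally followed by a period, then whitespace and a
# capitalized word that is assumed to be a name. The lookahead keeps the
# name itself out of the match so only the title is rewritten.
GARBLED_LIEUT = re.compile(r"\b(LT|It|IT|it)\.?\s+(?=[A-Z][a-z])")


def suggest_lt(entry: str) -> str:
    """Return the entry with each likely garbled 'Lieut' rewritten as 'Lt. '."""
    return GARBLED_LIEUT.sub("Lt. ", entry)


if __name__ == "__main__":
    # Hypothetical garbled entries, not taken from the database.
    print(suggest_lt("Report of IT Juan Perez on the fort"))
    print(suggest_lt("send it to Capt. Smith"))  # lowercase "to" follows, so unchanged
```

Correct forms such as "Lt." and "Capt." are left alone because they are not in the list of garbled spellings.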
Places where numerals could be wrong: section numbers; the day and year
of dates; microfilm reel numbers; amounts of money. I've tried to
correct the section and microfilm numbers, and the years. It's hard to know if
a day or an amount is wrong without checking the cards. Sometimes
the software read a "2" as a "3" or an "8", and vice versa.
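An impossible date can be caught without the card, even though a plausible wrong one cannot. This minimal Python sketch assumes the day, month, and year have already been pulled out of an entry as integers. The name `date_problems` and the year bounds are hypothetical; the bounds should be set to the date range the collection actually covers.

```python
import datetime


def date_problems(year: int, month: int, day: int,
                  first_year: int, last_year: int) -> list[str]:
    """List reasons a transcribed date looks wrong; an empty list means plausible.

    first_year and last_year are assumed bounds for the collection,
    not values given in this memo.
    """
    problems = []
    if not first_year <= year <= last_year:
        problems.append(f"year {year} outside {first_year}-{last_year}")
    try:
        # datetime.date raises ValueError for days that do not exist
        # (for example March 38 or February 30).
        datetime.date(year, month, day)
    except ValueError:
        problems.append(f"no such date: {year}-{month:02d}-{day:02d}")
    return problems


if __name__ == "__main__":
    # Illustrative bounds only.
    print(date_problems(1790, 3, 38, 1700, 1900))
```

With those illustrative bounds, a day read as "38" is reported as `no such date: 1790-03-38`. A "3" read as a "2" that still gives a real date passes the check, which is why the cards are still needed for days and amounts.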
Less important (fix only if convenient)
Wrong or extra punctuation at the end of sentences and phrases
Also punctuation after abbreviations. "Jan" or "Jan." is okay. It doesn't
matter. Don't bother adding periods if they are missing.
Extra characters at the end of entries
Capitalization. Correct if convenient.
Standardization. Don't standardize. If it says "St. Aug." instead of "St.
Augustine," leave it. If it says "Gov." instead of "Governor," leave it.
Spanish versus English. If a place name is in Spanish instead of
English, leave it.
Really bad sections
If you hit a section that seems hopeless or more than you can deal
with, enter "$$$$$" and let me know. I will go find that section and
deal with it. The first really bad section starts around line 3000, so I
will take that section.
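If the spreadsheet is exported to CSV, the rows marked with "$$$$$" can be listed automatically instead of searched for by hand. This is a minimal Python sketch, assuming a CSV export whose row numbers line up with the spreadsheet's (that is, no cell contains a line break). The name `marked_rows` is hypothetical.

```python
import csv
import io

MARKER = "$$$$$"  # the flag proofers enter for a section they cannot handle


def marked_rows(csv_text: str) -> list[int]:
    """Return the 1-based row numbers whose cells contain MARKER."""
    reader = csv.reader(io.StringIO(csv_text))
    return [
        row_number
        for row_number, row in enumerate(reader, start=1)
        if any(MARKER in cell for cell in row)
    ]


if __name__ == "__main__":
    # Hypothetical three-row export; the marker is on rows 2 and 3.
    sample = "Section 1,Letter to Gov.\n$$$$$,unreadable\nSection 2,notes $$$$$\n"
    print(marked_rows(sample))
```

To read a real export, open the file with `newline=""` and pass its contents to `marked_rows`; the resulting numbers can then be matched against line numbers such as the "around line 3000" mentioned above.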
Backing up work
Please save your corrections every day or every couple of days onto a
disk. I will make master files from the disks and keep them current so
that we always have an archive of the most recent version of work.