Litigation Support: What not to do
Scanning documents in support of Litigation is challenging and a great responsibility. Errors cost everyone involved time, may create mistrust and can even affect the final judgment.
Much can be learned from examples of incorrect output. The following is a summary compiled to aid Litigation Support scanning vendors and the firms that use them.
Media Labels
Misspellings: Client names, matter names or client-matter numbers are misspelled. This can be especially
frustrating when the Firm's client wishes to see or get a copy of the vendor product. If our client's name is
SMITH, a CD labeled SITH or SMATH or SMYTHE instantly calls into question the quality of the content of
the media.
Handwritten: Not only is handwriting hard to read, but it also lends itself to missing information.
File / Folder / Volume Name Conventions
- Tilde or otherwise truncated file or folder names. For example, AAA0000001.TIF versus AAA000~1.TIF and
D:\PROGRAM FILES\ versus D:\PROGRA~1\. Whenever possible, volume, file and folder names should be no
longer than eight (8) characters, with a suffix no longer than three (3) characters. In technical circles, this is
known as the "16-bit" or "MS-DOS 8.3" naming convention.
- Use of spaces or any characters in a load file that Windows does not allow in a file or folder name. This seems
obvious, but we have received deliveries from vendors who used characters in the database that were not valid
in the filename. This resulted in files that would either not copy to the server or would copy with strange
naming results. We don't know what kind of software or operating system this vendor used to create their
product, but they certainly never tried to load it themselves.
- Missing or empty folders are a big red flag. If your image folder contains 3 subfolders named 012, 014 and 015,
your first inclination is to ask what happened to 001-011 and 013.
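The naming and folder checks above can be automated before a delivery leaves the vendor. The following Python sketch is illustrative only: the function names is_8_3_name and missing_folders are hypothetical, the allowed character set is an assumption (letters, digits, underscore and hyphen), and folder names are assumed to be zero-padded numbers starting at 001.

```python
import re

# 8.3 convention: base name of 1-8 characters, optional suffix of 1-3 characters.
# The character set is an assumption for this sketch; it rejects spaces and tildes.
NAME_8_3 = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9]{1,3})?")


def is_8_3_name(name):
    """Return True if a single file or folder name fits the 8.3 convention."""
    return NAME_8_3.fullmatch(name) is not None


def missing_folders(folder_names, width=3):
    """Return the zero-padded folder numbers absent between 1 and the highest found."""
    present = {int(name) for name in folder_names if name.isdigit()}
    if not present:
        return []
    return [str(n).zfill(width) for n in range(1, max(present) + 1) if n not in present]
```

Run against the examples in this section, is_8_3_name rejects both AAA0000001.TIF (too long) and AAA000~1.TIF (tilde), and missing_folders(["012", "014", "015"]) reports 001 through 011 plus 013.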
Database
- The date field should only include the date. An example of a valid date is "01/01/2004". An example of an
invalid date is "01/01/2004 12:01:01PM".
- Dates should have 4-digit years. "01/01/2004" is valid where "01/01/04" is not.
- OCR and full text from electronic discovery should maintain original formatting. Some EDD and OCR
applications replace spaces, soft returns and hard returns with characters other than spaces, soft returns and hard
returns. If the original text is "Best Practices", then the database OCR field should never contain:
"BestPractices", "Best/Practices" or "Best@Practices".
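The two date rules above are simple to test mechanically. This is a minimal Python sketch, assuming the MM/DD/YYYY layout used in the examples; the function name is_valid_date_field is hypothetical.

```python
import re
from datetime import datetime

# Exactly two digits, two digits, four digits: no time component, no 2-digit year.
DATE_ONLY = re.compile(r"\d{2}/\d{2}/\d{4}")


def is_valid_date_field(value):
    """Return True only for a real calendar date written as MM/DD/YYYY."""
    if DATE_ONLY.fullmatch(value) is None:
        return False
    try:
        # Rejects impossible dates such as 02/30/2004.
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return False
    return True
```

The pattern check comes first because date parsers are often lenient about field widths; the parse step then catches values that look right but are not real dates.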
Media Content
- Each CD should be "self-contained". If 5 CDs arrive and the load files for all 5 CDs reside on CD #5, then that
is wrong. The idea here is to be able to reload any CD as quickly as possible. Sometimes collections become
separated over time. It is conceivable that CD #5, with all the database and image load files, could be lost. This
means CD #1 through #4 are now incomplete. Each CD should be self-contained.
- For a given project, all load files (Concordance .DAT and Opticon .LOG) should use the same field names,
ordering and structure as the first delivery.
- A "synch" file provides the text to go with the video. That term is generic. There are multiple file types to
consider. If you are a Sanction user, you want an .MDB. The vendor cannot tell what formats the Litigation
Support person uses unless they are told or request that information.
Load Files
Concordance Load Files
- A .DAT load file without a supporting file showing: field structure, field size and field sequence.
- The first line of the .DAT file should be the field names. When loading a .DAT file, this is the simplest way to
see if the data loaded correctly.
- Badly formatted body Meta-Data. The spaces and returns must match the original text. No odd characters, such
as a semi-colon, should appear in lieu of a soft-return or a space. These kinds of problems not only make the
text hard to read, but they also interfere with searching.
- More than one document per database record. This kind of error can cost the Firm hours or days, or even a case.
When the review team identifies all the documents to produce, a ratio other than 1:1 will result in the wrong
documents getting produced along with the right documents.
- Databases and load files should open sorted by "Bates" or "docno". Concordance displays records in the same
order that they were loaded. Therefore a disordered load file results in a disordered database.
- Duplicates, overlaps or gaps in "Bates" or "docno" fields.
- Bates / Docno prefix contains characters other than A...Z.
- Bates / Docno suffix contains letters or is not zero-padded to four places (.0001).
- Bates / Docno contains a space, such as "AA 00001".
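Several of the Bates / Docno problems listed above can be caught with a short script before loading. The Python sketch below is a simplified illustration, not a Concordance feature: it assumes each record carries a begin and end number, that values are a prefix of A to Z followed by digits, and it ignores the ".0001" style suffix. The names parse_docno and check_ranges are hypothetical.

```python
import re

# Assumed shape for this sketch: prefix of A-Z only, then digits, no spaces.
DOCNO = re.compile(r"([A-Z]+)(\d+)")


def parse_docno(docno):
    """Split 'AA00001' into ('AA', 1); return None if the value is malformed."""
    match = DOCNO.fullmatch(docno)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def check_ranges(records):
    """Check (begin, end) pairs, in load order; return (record_index, problem) tuples."""
    problems = []
    last_end = None  # (prefix, number) ending the previous well-formed record
    for index, (begin, end) in enumerate(records):
        first, last = parse_docno(begin), parse_docno(end)
        if first is None or last is None:
            problems.append((index, "malformed"))
            continue
        if first[0] != last[0] or last[1] < first[1]:
            problems.append((index, "bad range"))
            continue
        if last_end is not None and last_end[0] == first[0]:
            if first[1] <= last_end[1]:
                problems.append((index, "duplicate or overlap"))
            elif first[1] > last_end[1] + 1:
                problems.append((index, "gap"))
        last_end = last
    return problems
```

Because the check walks records in load order, it also flags a disordered load file: an out-of-sequence record shows up as an overlap or a gap.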
OCR
- When there is bad OCR, an appropriate error code and warning to the firm is required. Things such as
handwriting and graphics will not provide good OCR results. As such, the vendor must warn the firm and
Litigation Support about these issues and the associated "<>" text. In this fashion, the law firm
can tell a legitimate error from a missed problem. Otherwise, QC looking for errors may report a
"false positive".
- Vendor must use Auto-Rotate on every image. This ensures the 5 - 10% of images facing sideways or upside-
down get quality OCR. Documents such as hierarchical employee charts are almost always designed landscape
instead of portrait. All of these names and titles should be easy to OCR, unless auto-rotate is off.
Opticon Load Files
- Image key "A001" and filename "001.TIF" do not match
(ex: A001,[VOLUME],D:\[VOLUME]\IMAGES\001\001.TIF,Y,,,)
- The first page of a document is missing a page count
(ex: A001,[VOLUME],D:\[VOLUME]\IMAGES\001\001.TIF,Y,,,)
- This page is missing the ","s and possibly the begin document "Y" and page count
(ex: A002,[VOLUME],D:\[VOLUME]\IMAGES\001\A002.TIF)
- Opticon load file extensions should be .LOG, .TXT or .RXF. Some software vendors used to create the log file
output with an extension of .OPT. Opticon does not look for .OPT when displaying potential load files.
- Image Cross-Reference File - Filename Mismatch. The filename inside of the cross-reference file does not
match the actual filename. Again, this could be a hiccup in processing. This is caught when we run our QC tests
to make sure every file listed is actually on the server.
- Only images belong in the Opticon load file. Sometimes vendors will put the OCR files into the same folder as
the images. This has, on occasion, resulted in a load file that references both the images and the OCR files. In
the following example, lines 2 and 4 should not be included:
- SMI0001,SMI001,D:\IMAGES\SMI0001.tif,Y,,,1
- SMI0001.TXT,SMI001,D:\IMAGES\SMI0001.TXT,,,,
- SMI0002,SMI001,D:\IMAGES\SMI0002.tif,Y,,,1
- SMI0002.TXT,SMI001,D:\IMAGES\SMI0002.TXT,,,,
Every import line for every delivery should be formatted the same, irrespective of the technician who generated
the load file. Right or wrong, at least the delivery is wrong in a consistent fashion from CD to CD. If the path
information isn't "plug and play", Litigation Support has to modify the associated load files. Did the vendor not
know or not care that their CDs contained inconsistent information?
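Most of the Opticon problems in this section can be detected line by line. The Python sketch below is an assumption-laden illustration: it treats each line as seven comma-separated fields (image key, volume, path, document break, two further break flags, page count), as in the examples above, and assumes deliveries are TIFF-only. The function name check_opticon_line is hypothetical.

```python
from pathlib import PureWindowsPath

IMAGE_EXTENSIONS = {".tif", ".tiff"}  # assumption: only TIFF images are delivered


def check_opticon_line(line):
    """Return a list of problems found in one Opticon load file line."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) != 7:
        return ["expected 7 comma-separated fields, found %d" % len(fields)]
    key, _volume, path, doc_break, _flag5, _flag6, page_count = fields
    problems = []
    # PureWindowsPath parses D:\... paths on any operating system.
    image = PureWindowsPath(path)
    if image.suffix.lower() not in IMAGE_EXTENSIONS:
        problems.append("not an image file")
    elif image.stem != key:
        problems.append("image key does not match filename")
    if doc_break == "Y" and not page_count.isdigit():
        problems.append("first page of document is missing a page count")
    return problems
```

Checking that every listed path actually exists on the server is a separate step that depends on where the images were copied, so it is left out here.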
Image Format
- Multi-Page TIFFs. There are two major problems with multi-page TIFFs. The main issue is the inability to
easily divide one document into two. Selecting the "logical bindings" option in scanning along with use of slip-
sheets is a great way to ensure the required one document to one record division in the database.
- Unless otherwise specified, we do not want Bates stamps or any other type of stamp applied to our images.
- TIFF images of Excel spreadsheets where columns are too narrow causing cell content to appear as "#######"
instead of the actual value.
- TIFF images of Excel spreadsheets where the cells show the formula instead of the resulting value. An example
of this would be a summing cell that should show the grand total for a column but instead shows something
such as "=sum(A1..A10)".
Transcripts
- Transcript is in WordPerfect format or some legacy word processing format such as Wang or Wordstar.
- Transcript requires manual editing due to extremely irregular formatting.
- Gaps in text or pages.
- Control characters in transcript text file.
- Each line of text ends with a "line wrap" instead of a "hard return". (Note: UltraEdit, a text editor, can fix this.)
- Delivering transcripts on floppy instead of CD. (CD is safer and the media is cheaper.)
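Control characters in a transcript text file are easy to locate with a script. This Python sketch is only an illustration: the function name find_control_characters is hypothetical, and it assumes that tabs and carriage returns are acceptable while every other control character is not.

```python
ALLOWED = {"\t", "\r"}  # assumption: tabs and carriage returns are acceptable


def find_control_characters(text):
    """Return (line_number, column, code) for each unexpected control character."""
    hits = []
    # Split on "\n" only; str.splitlines() would also split on form feeds
    # and similar characters, hiding exactly what this check looks for.
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, char in enumerate(line, start=1):
            code = ord(char)
            if (code < 32 or code == 127) and char not in ALLOWED:
                hits.append((line_number, column, code))
    return hits
```

Line and column numbers are 1-based so they match what a text editor displays.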
General Errors / Issues
- Databases and Opticon load files where every document is one page. While it is possible, and quite likely, to have a
single-page document, a database comprised entirely of 13,000 one-page documents is highly unlikely.
- While a document containing 13,000 pages is possible, it is unlikely. A database with several 13,000-page
documents is extremely unlikely. This could be a physical versus logical document break issue.
- Do not create a new image sub-folder for each document. A CD with 300 1-page documents should result in 1
folder of 300 images. 300 folders each containing 1 page is incorrect.
- When generating electronic or paper documents, the vendor should never add their company information to the header or footer.
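The page-count red flags described in this section can be screened for once page counts per document are known. The Python sketch below is a rough illustration; the function name page_count_warnings and the 1,000-page threshold are hypothetical and should be tuned to the collection.

```python
def page_count_warnings(page_counts, max_pages=1000):
    """Return warnings about suspicious document page counts.

    page_counts holds one integer per document; max_pages is an arbitrary
    threshold chosen for this sketch.
    """
    warnings = []
    if len(page_counts) > 1 and all(count == 1 for count in page_counts):
        warnings.append("every document is one page")
    oversized = sum(1 for count in page_counts if count > max_pages)
    if oversized:
        warnings.append("%d document(s) exceed %d pages" % (oversized, max_pages))
    return warnings
```

A warning here is a prompt to ask the vendor about physical versus logical document breaks, not proof of an error.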
Copyright (c) 2006 Ad Litem Consulting, Inc. This material may be distributed only subject to
the terms and conditions set forth in the document license