
BIODQ

Permanent Link: http://ufdc.ufl.edu/UFE0021491/00001

Material Information

Title: BIODQ: A Model for Data Quality Estimation and Management in Biological Databases
Physical Description: 1 online resource (175 p.)
Language: English
Creator: Martinez, Alexandra Maria
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2007

Subjects

Subjects / Keywords: architecture, biodq, biological, biologists, biology, classification, data, databases, dimensions, estimation, genbank, hammer, integration, management, martinez, measures, metadata, model, ncbi, prototype, quality, refseq, repositories, study
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: We present BIODQ, a model for estimating and managing the quality of biological data in genomics repositories. BIODQ uses our new Quality Estimation Model (QEM), which has been implemented as part of the Quality Management Architecture (QMA). The QEM consists of a set of quality dimensions and their quantitative measures. The QMA combines a series of software components that support the integration of the QEM with existing biological repositories. We describe a research study conducted among biologists, which provides insights into the process of quality assessment in the biological context and is the basis of our evaluation. The evaluation results show that the QEM dimensions and estimations are biologically relevant and useful for discriminating high-quality from low-quality data. Additionally, the evaluation performed on a subset of the National Center for Biotechnology Information's databases validates the benefits of QMA as a quality-aware interface to genomics repositories. We expect BIODQ to benefit biologists and other users of genomics repositories by providing them with accurate information about the quality of the data returned by their queries.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Alexandra Maria Martinez.
Thesis: Thesis (Ph.D.)--University of Florida, 2007.
Local: Adviser: Hammer, Joachim.
Local: Co-adviser: Dobra, Alin.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2007
System ID: UFE0021491:00001



202621cc5950b9aa506a4516a5e03eca
23c87093dc75d7fd7440c9631967f75d735d82fb
2291 F20101114_AADUPP martinez_a_Page_125.txt
d598ca29eb9cd5c79d41eaec392406c9
3f6813ed6610051c4e9b0f89482cade6995ae346
56531 F20101114_AADTMN martinez_a_Page_033.jpg
b76e313887c9c4c059ee443bd2ac2c59
e2ebfe680d82cb9d180b1a9e7ea024286a67a595
108224 F20101114_AADTLZ martinez_a_Page_017.jpg
aa0461dafaeb715c003b3b2828012107
f7d1cf423c9e62e5352faa26337a6791aa6b1e3b
121099 F20101114_AADTNC martinez_a_Page_048.jpg
5e89f6c1e0ea5b6e55bad72186e210d7
b532cf8be3d20807dad2ec463d367412e6e7c35e
698 F20101114_AADUQF martinez_a_Page_143.txt
bd916fa81be870119969457aac91867f
fd45192238cfe906d5ac91104592ccceddfee12b
2265 F20101114_AADUPQ martinez_a_Page_126.txt
1d95551ef9395199e7c8d63605f66bd3
6099ac4cd90a4ccbca133f37df10e2eba32dbd5a
101241 F20101114_AADTMO martinez_a_Page_034.jpg
c35eb70d69c2e58e6642c10e8db4e7e1
ae2ace9f5d666e04ddf642af6e74fa9d294303db
113347 F20101114_AADTND martinez_a_Page_049.jpg
356f124d343892e85de19dd756abf0b2
95be4b7299a88c5688d641c8d06bfd3799d3d24f
1059 F20101114_AADUPR martinez_a_Page_127.txt
5f6698e2ddb0767f6bfc32169740adab
917afda3d571a5ef8357f1898ae007f456559864
101792 F20101114_AADTMP martinez_a_Page_035.jpg
975c77d600e6e5b1003829b4e7503087
7a8b5ad093ef87fe54f3088536bda7de83efcede
2554 F20101114_AADUQG martinez_a_Page_144.txt
209d5713f9ba15234a5640cdaee7a98d
b944c3b14b2a05abb1e229b0b09636f7429474c4
1023 F20101114_AADUPS martinez_a_Page_128.txt
f5a54f22aebff8dad67668feefa19909
3eec64f17b9c7f418fc3d64a2cd838669e465591
90586 F20101114_AADTMQ martinez_a_Page_036.jpg
32936f89578c068733dcd069c428ed4e
66ac511321b09bee67f1dccd0d767936a7ad58c4
113900 F20101114_AADTNE martinez_a_Page_050.jpg
f2f4c86efdefdb6cd4f49670d0f79f34
970aeaeedc7b61ead6e04eb46fd3e051f9f6581d
948 F20101114_AADUQH martinez_a_Page_145.txt
1f0be9f4494e2ffb4c31ee6de6385ea5
13b75d90dd81130e8d2cd19c6e56ba1e8a184101
1251 F20101114_AADUPT martinez_a_Page_129.txt
7d36133e2cf5b00e7639bcf87b6bafda
9ccbd0265a69ee0e5de84267dc74b5c2175ba8c4
97147 F20101114_AADTMR martinez_a_Page_037.jpg
8bf620c871bf744c05618ed115a7ca91
5dd472e96888143548b9f019d138defb8e9b885c
59309 F20101114_AADTNF martinez_a_Page_051.jpg
1ed203d89d575cb863590ff2b347992f
6f8ff9fdc5c46544f75f2475b24bfd6353e16674
2702 F20101114_AADUQI martinez_a_Page_146.txt
58622562cacdf2d7eb8e6b5176a367f4
f62a26569f1bc6fd639eb58e5a75700c4dfe99c1
1307 F20101114_AADUPU martinez_a_Page_130.txt
534204db23c978570da6f8a217f05c1c
926f397f723010b2ed72c6d6f94ef7aa3886236c
81407 F20101114_AADTMS martinez_a_Page_038.jpg
41b910e7c8c44ce9f2b38a6448020a55
57d3de08372cfc9dba7ef127954e2cda68e2f94a
103484 F20101114_AADTNG martinez_a_Page_052.jpg
4ff85c4e8226b27bff6d913bd7696fde
b0ec35be2b6096ef03eb4a0d0622f2eff868ed7d
1169 F20101114_AADUQJ martinez_a_Page_147.txt
9ce8b096908e31228bb7cbb88a87c436
766ea83afd834361325aa630b7ea37125cacb3e8
899 F20101114_AADUPV martinez_a_Page_131.txt
9d7c279ba548940fb845b8e6e6353124
6246c1dabc69707fb31df3287f084ccc4a996941
77312 F20101114_AADTMT martinez_a_Page_039.jpg
d19d92823615a45c6bf00c13a74053a1
6773b4c276cddec71b1735c7b51d8ba55dd2ede7
104368 F20101114_AADTNH martinez_a_Page_053.jpg
cfa26b6caac849e3e23f06f5ab079fa3
807fe6cb1108408b10b3928a344dd590cfaf799f
3534 F20101114_AADUQK martinez_a_Page_148.txt
c5f80602e3b4cf92b4654c09ee35b275
5de3ced7b46bbea1c82ce7a4c5dbfe069c3e316a
549 F20101114_AADUPW martinez_a_Page_132.txt
95516287a345f743a29763f2e3f8d12b
98e57fb93ba537a1432decf51ebe8b5138181bfe
94864 F20101114_AADTMU martinez_a_Page_040.jpg
c48006b2000dcf7dea9a9a9c21c90986
90dd2999f7734a8ae50e095e5ef21e36b1573c14
79687 F20101114_AADTNI martinez_a_Page_054.jpg
766e0ee9cab60dfad516ff8b0f8cbc83
11c4e9a9a99e1856620f4a6b7bb43c2f33f24e34
1871 F20101114_AADURA martinez_a_Page_167.txt
14ee31c6a4bff7334fcf03ea65cf027e
39418de201666ba22e52f2b4c88badd577232d54
3314 F20101114_AADUQL martinez_a_Page_150.txt
6ad377558f24e91b7b0aac33eb8c2a92
c20b83a52ff32e18ea77bb9e17fc3f2459e1426a
582 F20101114_AADUPX martinez_a_Page_133.txt
fb363bfc5b9158adb1e6ecb33c0b47eb
4d9a4663fb5afc46dd3a33c622b88f053726c8c2
107143 F20101114_AADTMV martinez_a_Page_041.jpg
16e201f8bb8c0b02a1a511bd1c5293a7
f423dd2e115f0a7af0506bcf1df1743616be4b5a
84160 F20101114_AADTNJ martinez_a_Page_055.jpg
6ebcd188c1eef3c46d231501e0f8893e
2994d4127a56c6768bde3648161c23cf3a74d772
1821 F20101114_AADURB martinez_a_Page_168.txt
93930bd9f9be5af4ef813668aab96610
99cfa3addaf0c02be860365772f46eb6e257d206
457 F20101114_AADUQM martinez_a_Page_153.txt
7dfb103f8d62c5d013e4a9b1a2e2029b
fb8888f9684379554ea5ff3646f8e26c31751b50
558 F20101114_AADUPY martinez_a_Page_136.txt
c28cd904a98a256f7316a0216d93739f
d8fcdd16b47f84f3dc44c2fa3e4b57149f87aa28
91525 F20101114_AADTMW martinez_a_Page_042.jpg
9899a2548293981eaa9e089bb70b1d40
c453f29848a3301fc70e42be43cb367ce70d1be8
109065 F20101114_AADTNK martinez_a_Page_056.jpg
99a0f8bf127e548cdcee6e0dd4442d72
53b1e000454ed388ecac37b2c606dd627d40ceff
981 F20101114_AADURC martinez_a_Page_169.txt
2e58f549f50df14107d9e89d72bc28c1
7419745a8a7668ed7af8cfaaee982b2f32abd52f
1440 F20101114_AADUQN martinez_a_Page_154.txt
0674509a51f55bbec73352f208fbca1c
74196fbf11de2dbe7734924759062843afe9cd17
1470 F20101114_AADUPZ martinez_a_Page_137.txt
7e6b10722f95049f2a49ae3a371d3139
809c58bf61a3447a2cdf429607d48458f37fd9b9
100580 F20101114_AADTMX martinez_a_Page_043.jpg
1c8658203857d9ef1f6b4e4f59c2a500
fd4303a3eee0b35b0ce16686d8a9a309b59f3838
119091 F20101114_AADTOA martinez_a_Page_072.jpg
932dba5070592281d91569b06205f5a6
5e11666b05f202c8f8ad69286d7b1665d5b16d05
95602 F20101114_AADTNL martinez_a_Page_057.jpg
b205058407efd4930d169c75a350c2d8
665ca558057b7379a4db5e0e6fffa11a48bf35df
2353 F20101114_AADURD martinez_a_Page_170.txt
2e5416d93b7a078cfdf20938f5ad8b10
6a1447c36f6c3cee7c869bf6d8a58ff66f6de3f6
1848 F20101114_AADUQO martinez_a_Page_155.txt
7b8b773c2467902e2ca0283bead39d69
4b58e472e9f49007e7ab0cdb72d1df3a14b9d549
109149 F20101114_AADTOB martinez_a_Page_073.jpg
10adbc5957da65daffa0808c47141197
6460bff1a26d75b2ed0dde39c9f43b2580c0ba9d
107341 F20101114_AADTNM martinez_a_Page_058.jpg
b5007df7c869f75d4b2ebbe0676b3b92
0551879d7a6e213a1a2d3d9a2093a1607a5d9dd6
92397 F20101114_AADTMY martinez_a_Page_044.jpg
dd3bc53d810fc96955570d5b17a661b1
dc6c2fce6aa8e2556571084a443c331d1abaadc3
2550 F20101114_AADURE martinez_a_Page_171.txt
0d9652fbf10384e2445b6bcff603d220
0715be0ad9957102b88c7220eaf075c5aaad53e3
2484 F20101114_AADUQP martinez_a_Page_156.txt
40209fb3c1ce2109dec877e6cac088ca
8ce9aea66af87ee361af5009cfe656cc4b13231f
107750 F20101114_AADTOC martinez_a_Page_074.jpg
78a6f9f6f4f2061c6237c5dce0828aa8
26ce7946aa49a4cb86d39228a0053f1a3e61c546
69580 F20101114_AADTNN martinez_a_Page_059.jpg
6c753dd26fbecf7f8381cc5628895974
f5c32c7a83e59f2eb2a91b47c90337c18eddd873
98393 F20101114_AADTMZ martinez_a_Page_045.jpg
36b1c53ae4cd38a47f93eb18f0db2c3c
c1e38c2c74d3d01d7100d038614838184ab3cbd6
2532 F20101114_AADURF martinez_a_Page_172.txt
5f65fc3c32120dcd9b2d73efa2e58294
46434cc8efaca1ba5ed2677d5bf1d2a054f281ec
2001 F20101114_AADUQQ martinez_a_Page_157.txt
7c21b2d8451b8b81fd3c5d8c877d41c7
aad1acacc197d99b0c5811277eb3133dfa622fc3
92701 F20101114_AADTOD martinez_a_Page_075.jpg
874414b6add0e447b979b392e66546a4
444438f435e1d3d71edb1797741142e6d45b4dbf
80947 F20101114_AADTNO martinez_a_Page_060.jpg
e35b0de83d84698ebb9c1e2056fea6cf
401116e42a312225879fe6877efc1439f283b795
2506 F20101114_AADURG martinez_a_Page_173.txt
9d44e500816cae427c0dd3498b3dc512
9f0ea4fde0c4fd35fff8ea7189e805ccfd9a3e7b
1849 F20101114_AADUQR martinez_a_Page_158.txt
1f319eb66a9394a1ce2c4e0b4dac0d25
e9196515335d17847b2020f3190413d1572f9886
92617 F20101114_AADTOE martinez_a_Page_076.jpg
5281bb3b157e24d3a328172d4e4d49a0
7d56f4ffa24fe381217394d4e560a9fe09270522
106156 F20101114_AADTNP martinez_a_Page_061.jpg
7cff054ae8ee5d898434a875c9e072a5
77751665e26acb6cf963e832eaee3b17c3f7ac7d
1533 F20101114_AADUQS martinez_a_Page_159.txt
3106e9da9bd99c8fee78c673822b50f7
75358f36655d995e7d9d45b0137f5937b9134960
116141 F20101114_AADTNQ martinez_a_Page_062.jpg
4d79a781d7a7b92c71de3ac580ac0c72
a5966284feb062445a44fc3738c49ac1bfdbc072
1915 F20101114_AADURH martinez_a_Page_174.txt
dcb6e41ed345331c9837bd4f41dfb578
83920957d70d644ab170fd66f4d1bad7af488653
1603 F20101114_AADUQT martinez_a_Page_160.txt
958206a9202984c5a8c9082e26187d66
c066ce1cb4c9a52e7785c48fb6842219ef9f63f2
104159 F20101114_AADTOF martinez_a_Page_077.jpg
ffa891ad2f07b4b12989115f70444dce
5c8cb3a89b45d8b54e01fc4d4e1762d8c499df50
105930 F20101114_AADTNR martinez_a_Page_063.jpg
00aaba92032eba3d8a7fe8ee8e0a861a
c64206d155756eb62a77124bbd32adfdeffdd7df
1191 F20101114_AADURI martinez_a_Page_175.txt
020edff55d9558d2fddbf33c0caca306
2ede93b0395acb019287ef3e2ab5addc7d06ab2d
1806 F20101114_AADUQU martinez_a_Page_161.txt
5985a887c4dc1553a6c5d7304204c20a
10488027b0712a77635f046ba3e34b70c7290ce7
106809 F20101114_AADTOG martinez_a_Page_078.jpg
aebd061a797160b93a4ceb9308c814ea
b0cbd76402631f34c614450f98739aa2fe3eebb6
107916 F20101114_AADTNS martinez_a_Page_064.jpg
e197095e7bd42aaa78aa74a0cee9a419
88f7a2ccee1c9ab43ac1a93429e0c6f0a5e66ba0
2053 F20101114_AADURJ martinez_a_Page_001thm.jpg
72441826e7277662fce5680a0f90cd8c
8d3e82cba2ca5f0092d3c39efdc6142c8635729b
523 F20101114_AADUQV martinez_a_Page_162.txt
a14d9e77b3929cab7b2262e456e22711
260e0c735c48a857e26a5723e38a27d0df119d3c
107200 F20101114_AADTOH martinez_a_Page_079.jpg
af5b4b74f04ba068b9a4d16f37261cda
12aa6683a19233efe495acf684791b7ca9c78f7f
103486 F20101114_AADTNT martinez_a_Page_065.jpg
b1fab2015e0eab5f4e7bd965a72913aa
9160fbd5f7e60128c0067746c7116dc1dce35aab
1643122 F20101114_AADURK martinez_a.pdf
036499721228ad2cd52da7b0fc8a2301
18458356ae85a24632f8068dbaf93d97077f626c
1276 F20101114_AADUQW martinez_a_Page_163.txt
d84f68638579d031f41e743662c54593
84544c973330bab1a492735f1d15d49e0c182e86
111722 F20101114_AADTOI martinez_a_Page_080.jpg
8b8759bdef5e8943d45492663a648cd7
3f85ba95324375ab692254955453161b41f22614
113368 F20101114_AADTNU martinez_a_Page_066.jpg
3461d2037cb06a680396151a0ebfea65
85f1fcc96e5b4a1678282506b697a4fc6b637c85
35472 F20101114_AADUSA martinez_a_Page_010.QC.jpg
4ef7b9e1a08db9762b940dfbd3cc62e2
5d7b09fc65d3f38a3a97324cc73ddedb8a47f179
8589 F20101114_AADURL martinez_a_Page_001.QC.jpg
6be2cbb742230911759e04c1603b4b81
11d455628ee2d174fac40bfc6578e6c9e944c3a1
2120 F20101114_AADUQX martinez_a_Page_164.txt
c7e44e430cf41907b105ec5055e1c4ba
8e4fb3a60ba8b3ed2c7ebde1b6f0f9a7017229a3
106372 F20101114_AADTOJ martinez_a_Page_081.jpg
49aa7bf1d76d9959954ae080f0b4eb0d
7fd7297da8cd73276186c83a96cd1239f29178ad
85194 F20101114_AADTNV martinez_a_Page_067.jpg
a220b272384ba9c9813c908f59967fb8
f57ac0eb9b662b4946416d7453d201d1f0195983
5793 F20101114_AADUSB martinez_a_Page_011thm.jpg
06e9a64cd40c885da2697a363383cbf9
03f7c39124f1fd5fc97f9f082725de6cd2d1874c
534 F20101114_AADURM martinez_a_Page_002thm.jpg
239bd45ba015428f30430ed27fa8c719
16b194d13350cb455ca563c2bc6e090b89fd6b11
2324 F20101114_AADUQY martinez_a_Page_165.txt
13f0802139bfcfc4e6aab7aa942574c3
266d98848aaa0142d7bffdad3d1e94c0b147d16d
86991 F20101114_AADTOK martinez_a_Page_082.jpg
7af24cf1d4bb77f21ef4751e25661351
d7dbd6f7331e85af0293eaf5216c1d5c85580d3c
64832 F20101114_AADTNW martinez_a_Page_068.jpg
ce664c8020203b5a4550288df57fb2f3
71c6de97dab4e4406ed9af05a2f0af8805f18432
23946 F20101114_AADUSC martinez_a_Page_011.QC.jpg
91a8e09104e9207b6ea1bcd6589a515e
64eeb3b82a78ad96b42a9c3b15477e95e5f24cc5
1221 F20101114_AADURN martinez_a_Page_002.QC.jpg
f4f2c8dcd3291eff05dd447c3271015c
bf53738e019cd97df72e121f4be70feea666de90
F20101114_AADUQZ martinez_a_Page_166.txt
55cf184413ac7ce582922e21a5969248
751b3a2af535d174f7e21a207f562d087891b409
82085 F20101114_AADTOL martinez_a_Page_083.jpg
d0a6c24112c31214728bc69ff8166ae3
ab5366fbd83df96caf855f3f8c409075143b4d26
100572 F20101114_AADTNX martinez_a_Page_069.jpg
64d6b76dcda0361c191b2325644f7d4e
c167305de38217520653da896ed095e1f80d5bfb
98795 F20101114_AADTPA martinez_a_Page_099.jpg
6cfb035d75f8995133642d37d95e903f
7e0ee86d3e1ff1f206a041aff48f01a108605672
4765 F20101114_AADUSD martinez_a_Page_012thm.jpg
7e02fa86a73d1904e133ebab224c6105
07d9e9618019118c1e9b910cbd86b1f528836d96
458 F20101114_AADURO martinez_a_Page_003thm.jpg
3759edd0b815407572c633972cebab6d
876558e197a9bd8685f67f515ce86e644316c6d8
114350 F20101114_AADTOM martinez_a_Page_084.jpg
9fb0680430ef468b45b5cf3dff9e3f7d
7b22fc8b2d910c210b2449cbfd4338ab3ad8d64c
109265 F20101114_AADTNY martinez_a_Page_070.jpg
2e6de8e437accb3ba11028335bbfd127
4c1c889ac32d396858e84aa31c40c44ee6a4bfd8
75237 F20101114_AADTPB martinez_a_Page_100.jpg
c847a7529c00418a58c2c73f9ca4619a
cca326cae2204f51e8ae31baaef903c1b6b0f11a
18288 F20101114_AADUSE martinez_a_Page_012.QC.jpg
c474c6095cd72a4b23e0954650a1fdeb
b478d6eed142a5f7bb029599111409b1d80473b5
991 F20101114_AADURP martinez_a_Page_003.QC.jpg
a72c03e511659afeacafe4db2aaa5530
9b70ad016e1f96d131c08d794a87a5890e5e47be
103013 F20101114_AADTON martinez_a_Page_085.jpg
5cf076ccd72c9c47f4d848319e1cf9d2
a7aed4cd4e4e67d627c63f90a12819e93ffc2a80
159267 F20101114_AADTNZ martinez_a_Page_071.jpg
722375e007d83933a0fe31b5cd9a20d9
6b5e03354e30fb6713e39c95cb1034310ec630a1
57823 F20101114_AADTPC martinez_a_Page_101.jpg
8dd5958ac79ce8c9c6f337976dbe2e0b
5dd3032364e744db73ee75647339e125f23f5a49
1508 F20101114_AADUSF martinez_a_Page_013thm.jpg
3944486492c617996bf1203edf389fb4
f7f0ebba37182717b7505c199d4ab19d67ab287d
3171 F20101114_AADURQ martinez_a_Page_004thm.jpg
19cb02dd6b5f08973ed424afb8e97bc1
5a7fffaffdf4f4f1dc551d7bab7c4ec315334e09
107859 F20101114_AADTOO martinez_a_Page_086.jpg
f252de6b142644f01735020c43968cc9
a6815087904b41040e85e285259736aabf1bbed4
115797 F20101114_AADTPD martinez_a_Page_102.jpg
00992ffbd5b32786c87a887ffdadc96b
8dcb4507b79e52cea673aa1fcb119b1cce5f21e3
5434 F20101114_AADUSG martinez_a_Page_013.QC.jpg
89d3e9f5afef87b551acbccecc7b403f
3bf4ba3e7f21b2d3d8bf36d1c58df5dd49072ee2
12085 F20101114_AADURR martinez_a_Page_004.QC.jpg
6f37c170d7c3c41c9bab865d7428d496
b9ab6e5066880e6d25d914d7019986e6533c0f34
110230 F20101114_AADTOP martinez_a_Page_087.jpg
4c0f267e756eb4b77fde124e48b6ba9a
58515e48a596af3ba7861388bd50f91689d571f0
90478 F20101114_AADTPE martinez_a_Page_103.jpg
7f2653907eb36f47f5b448fd3f364d95
48fe3c4d93b383a31479b9781791398325c09f7c
7186 F20101114_AADUSH martinez_a_Page_014thm.jpg
c8ee90c3c6e45c953136225af5dacf3b
04687579e42172907fc720d2bfb5ef7e2cb75f7b
7294 F20101114_AADURS martinez_a_Page_006thm.jpg
6601075c9e9b442ac832a7c5022da930
278987a8de993cd8b9e5e5256ace0a749e89a440
107275 F20101114_AADTOQ martinez_a_Page_088.jpg
7c313f632cad564ee9c6102b089f1691
07a4e1bef69638d9abd3085d5a6c876257e35d03
112309 F20101114_AADTPF martinez_a_Page_104.jpg
3976ed28cd70e21df163dcfcd16e7540
0d53a464344599ada0c9bd37da106e726778f4bd
32644 F20101114_AADURT martinez_a_Page_006.QC.jpg
e3dff017c79e9ee83ff2173d97f3bec5
6b0690c72b9361f51c55f8d62e04e677b7f59e25
109411 F20101114_AADTOR martinez_a_Page_089.jpg
59708ba75899a653c46f74bee8bff9c5
eca251e8ad3814cb718912ef80dab4375cfa411d
27669 F20101114_AADUSI martinez_a_Page_014.QC.jpg
bd33b4e1653b1dbddff78a31e1f32e4f
b7e5fc10818d79bd3e3ec4d330b2a9c2a4259a03
29066 F20101114_AADURU martinez_a_Page_007.QC.jpg
e9a05ad2d143ad2909ccff1d45af885a
798d941ac575d97b9b13bb7b0f67b6db9ad9c958
39685 F20101114_AADTOS martinez_a_Page_090.jpg
25a9d5d0519f4aafb52ea9834df19e79
1e44cd8c15fa7f8ca7b86ab3cf8670bfc8998f17
102425 F20101114_AADTPG martinez_a_Page_105.jpg
adb751eafbc003d77a3c16c68feac4ea
224789ce2c70749a1bbfe100b26aa55cecd9fc1c
7582 F20101114_AADUSJ martinez_a_Page_015thm.jpg
788bec081dee3b092e44464ba3f4d5a0
3107b42d3316605285a6a96db6baef6d116f3897
8694 F20101114_AADURV martinez_a_Page_008thm.jpg
a6083cb2353cb0575787724bb52cb51e
ed444a7e721d1a670efd285a5f07b78fb3dc6676
40095 F20101114_AADTOT martinez_a_Page_091.jpg
71633e6d98bd4e95cdff79caed6a95f7
2e5e3157fc392a834fa8654d20d5680065f50556
114583 F20101114_AADTPH martinez_a_Page_106.jpg
b3d745eec65984d586263b3e3194f1ab
b4ab155ed384a6637ec425d26db62eb489a70c41
31102 F20101114_AADUSK martinez_a_Page_015.QC.jpg
edd1b3ea1f7cb69738ecea5883356ec1
4ea67c95466cade1f668631b7848d1fd530b4403
35942 F20101114_AADURW martinez_a_Page_008.QC.jpg
1bea92c44501a05f130b867601cf562f
1b926c08a058b4233d12e000e7a415202f33eef9
40758 F20101114_AADTOU martinez_a_Page_092.jpg
d26a54d272dcfc88652eb54bce53c753
801726d4797fd9c8ef34748e18de86b54c3908b8
92951 F20101114_AADTPI martinez_a_Page_107.jpg
842c5f3add4c96d58d8c72b5bfe77472
9c240a4ba8739db8128904d57910c75f4e67955c
36081 F20101114_AADUTA martinez_a_Page_023.QC.jpg
26eaabafb46edb1c38f9e245dad40019
42febd064491ed6ff29df20447533437ffbf9d73
8695 F20101114_AADUSL martinez_a_Page_016thm.jpg
ee98deb6bdc74eddde317675ef01711c
1d80606ed80c9be52a19298363e0c3b088e4e7fe
3424 F20101114_AADURX martinez_a_Page_009thm.jpg
9bbcc89b363e45518b847567c9e22d5a
d1913e52e115e1a76f2bbb02c0687063b36ac829
38443 F20101114_AADTOV martinez_a_Page_094.jpg
06e495712f0d7b68becb0c6f26a820be
be45a2acb0ed746e04c100e0e85dc2c58feee6ab
107634 F20101114_AADTPJ martinez_a_Page_108.jpg
dd361294f1bae7b9d226fdc13a170fd2
1cfbf470a3800b27eef9b245a2985d659bcd5970
4254 F20101114_AADUTB martinez_a_Page_024thm.jpg
9ca462efa32c9651303d00a3256457c7
886a51be77cd353c5a9ac85a459e61eb1b902d6b
34631 F20101114_AADUSM martinez_a_Page_016.QC.jpg
3c9ba890a559035380e97f8aae07d4ae
b20d28bdeb199dba9155aa27ea1c8f9578411505
13854 F20101114_AADURY martinez_a_Page_009.QC.jpg
5688c351ca8d282486780cb67a6f24b3
ba41b7f584ea93b3d4e7b28ff4bfdfc5c3c27d28
39370 F20101114_AADTOW martinez_a_Page_095.jpg
ee2054f3d0b1fe38a8644f0b1e2d1299
a1ed8b86ea832065b0da636fbc845ab940d33a93
106492 F20101114_AADTPK martinez_a_Page_109.jpg
d1cbfd384ca2ed373663dedfb580ec30
fbb8f80016def95ff728ea1b152f11e008c48cd1
17371 F20101114_AADUTC martinez_a_Page_024.QC.jpg
6ccd738d0ed17438f444aba0a38658a8
8df670f8d9d0fdc1ade9af985decb102a872b459
8607 F20101114_AADUSN martinez_a_Page_017thm.jpg
0aef88f5a6c321cd12fe1ae086a297e9
7768280d46e7fbc9a5edde64ff3054d73d36d419
8997 F20101114_AADURZ martinez_a_Page_010thm.jpg
5112a4547e0d5f7e9f04cf23b2f4bfaa
5e078b46d76dedb361f4f9aa5780e72312cf7bff
68429 F20101114_AADTOX martinez_a_Page_096.jpg
29c5d0dc326d95c4c3100f49e01bcd6c
5693a9ccb25be2971f3d0be2c1d0ac0f6a3d5f3f
67566 F20101114_AADTQA martinez_a_Page_128.jpg
0701ec7b4bc96ecfc515902e10ded69d
f1a12242391e4197434672ca568ce781383293f6
97199 F20101114_AADTPL martinez_a_Page_111.jpg
54e4955345167982c9ffb512934943d1
a9049aaebfdfc0a801bc19a516237aca8b6a7798
8207 F20101114_AADUTD martinez_a_Page_025thm.jpg
289b80153691e37cab7951bd2527ad2e
0ac0c6d2d296d919ceed147b5a8431891dca7ef9
34723 F20101114_AADUSO martinez_a_Page_017.QC.jpg
045f3cc3365d840fa63e409f52a14856
a6dee08295592165a96cf85c9dd33e31259966ae
91658 F20101114_AADTOY martinez_a_Page_097.jpg
f745b39d8474431a8a8acd182fb5b00b
dbae7c437567e50419655baba5e0dcff3e86a95f
73157 F20101114_AADTQB martinez_a_Page_129.jpg
ccfb6be0693dee9b553521500b1688fc
d17c9026d0f9ecea1d4cdbbfdfc6c59b6af07d82
68507 F20101114_AADTPM martinez_a_Page_112.jpg
d58fecf14fad1fa6ab5b9e13e91e75be
71bccda57b2a6d02f46834c8946c6218354483d2
8594 F20101114_AADUTE martinez_a_Page_026thm.jpg
dfc555e88dfeffe703d8b018801931f1
186ab6b837d8f013105d1a45b4cb238cea95b4ad
7941 F20101114_AADUSP martinez_a_Page_018thm.jpg
572cfb589a4a35ac27a05658361b7e58
6d32ad61d6cc97ab2e845e73b6c198116b35719b
111611 F20101114_AADTOZ martinez_a_Page_098.jpg
d9a5483b363c9fce367bf84c0179353c
ba1b22d10593cd727fcc36d8656ab6c23131c5cf
76366 F20101114_AADTQC martinez_a_Page_130.jpg
47994af2818fdc73b415e7e8e4b66a78
ac977a5261e22006bdc6e175981b47c121181037
93131 F20101114_AADTPN martinez_a_Page_113.jpg
2b8176c602f317f372cae33cae799ba6
9282a8de07773b96268855d4d1f1570fedeb6c89
35482 F20101114_AADUTF martinez_a_Page_026.QC.jpg
080705e6e33149ebfb1957b68bc4793c
158310c45b9e94382594234e2577e199a78e1b35
31045 F20101114_AADUSQ martinez_a_Page_018.QC.jpg
c1ffc0d6b8a94ccb743225fad2d06883
94cdda79d9a9256e333e66cd195a964b3a90decd
61638 F20101114_AADTQD martinez_a_Page_131.jpg
c304bd8cc7fecac3ba712cd1f909fe5d
6d69eb87ae47fae62e1a1feb8a8a30fb5a6f9dc5
69859 F20101114_AADTPO martinez_a_Page_114.jpg
c67f363a8f751cf5406c19a4bce8f4fa
04d62db2b10c4310768178a8b09cdf61d923b0fb
8309 F20101114_AADUTG martinez_a_Page_027thm.jpg
dc1a11b3ff58b3b660003d2d40480086
6bdd54b98b3b2505376ab4c70f3966f659cfaf76
8384 F20101114_AADUSR martinez_a_Page_019thm.jpg
821f2ee23d1d5b77256dcdacac77f8e6
081be67148e69f9458b6e73384ecd7c3a8d61a8e
45347 F20101114_AADTQE martinez_a_Page_132.jpg
f66f60aabe81ce4822a11c3e8f713215
e98db829e29ac24412600eb5dda19d67c4d7c67c
103182 F20101114_AADTPP martinez_a_Page_115.jpg
78430d39322a2d841473fe452a0c0346
a97b5dfde9d911f47b1cb6e326f2c59421dee195
32568 F20101114_AADUTH martinez_a_Page_027.QC.jpg
b15eaf63e8b4974e531e6496b7e7b4b1
12eb8027e71c61c0ddff4c89fc0d69c755ec6d88
33523 F20101114_AADUSS martinez_a_Page_019.QC.jpg
57bbbb163218d736b9ab7027dd080bb6
343e185e7291af09b8325cf55eb9201b0853e20a
45126 F20101114_AADTQF martinez_a_Page_133.jpg
1021ad4e9af1b5f826c75579969c98ad
73ab48e06ba203e47fbb44d04e3a8ce60dc37bf4
112699 F20101114_AADTPQ martinez_a_Page_117.jpg
7a6d1c149b7c3b4e3117e8d584b5354b
81d299fc59b9333fe6fd8b3b2cd64c75d546a427
8256 F20101114_AADUTI martinez_a_Page_028thm.jpg
0b439d3a6b0416d6c208f6e51f42a0c1
9eb38cc64117e8d4c60d8678b178427476992d99
3817 F20101114_AADUST martinez_a_Page_020thm.jpg
916ae9ab34d2a130a0326a27ba88db59
ae728ad51f62f01f2866910d4536991c612e2386
81652 F20101114_AADTQG martinez_a_Page_134.jpg
3ec4353514d8e5731448ac1b2702fc7e
5b502400ce4abf7b23793947f9faadbd39387e3f
109923 F20101114_AADTPR martinez_a_Page_118.jpg
276c5f2090949b2454cbc14e45a62060
b1dc96a6179d600a283d1792955f00913094f74b
14226 F20101114_AADUSU martinez_a_Page_020.QC.jpg
faaf1033bdc15e10bfd8b52434609a4d
8c69818052e369f88a008ffe01c09a99a88227df
81500 F20101114_AADTPS martinez_a_Page_119.jpg
8ad21c72ab943e01a910c0b2276ff5f3
cfb738c510157d60c49e94daa2be3530cc348c31
34245 F20101114_AADUTJ martinez_a_Page_028.QC.jpg
b5ed5825f34c033d6208672ae2d45cf3
62b80fe55800723e198961d8d46fe476f71289d7
8231 F20101114_AADUSV martinez_a_Page_021thm.jpg
4abfdbb3bea1f14736d4a88d1fea81db
719e8f7460f6f6bdb8bab83b719e12c68ccf7680
69001 F20101114_AADTQH martinez_a_Page_135.jpg
c5a3519feb002b583e76300e00fc7154
33809157e80cc7bf4510519c465ef87ee6709716
114939 F20101114_AADTPT martinez_a_Page_120.jpg
389c7bd20a1da58c9246a98c2819be5a
6629938e45bdaa522bdc82ce3858e972616ece43
8833 F20101114_AADUTK martinez_a_Page_029thm.jpg
99eb0e449f255e0dbe5ba6a644f9a68b
1c650dce9e520cfff6009502714485cf75f932fd
34093 F20101114_AADUSW martinez_a_Page_021.QC.jpg
80542337c7152900852c54a1295c475b
3cee2e9ec35382fa03802ce41e78f2831a49c08b
34613 F20101114_AADTQI martinez_a_Page_136.jpg
ac52d27d2447565d0fb569d610f7e579
254567daa836420e6bfbc5e140f23bb77b22c715
103586 F20101114_AADTPU martinez_a_Page_121.jpg
2b06d3c830313a217048387069693d03
799459af9e5c1ba5e63c98e37cdbeb86345a781a
7228 F20101114_AADUUA martinez_a_Page_038thm.jpg
60d4b7c06727309341f0fb5c2e4d9f9a
bcee9f5447bdf49db5e75a7ac2a16c346ef7ac13
37191 F20101114_AADUTL martinez_a_Page_029.QC.jpg
d45f31f95413e033759081ae192d855e
ae710a6d78bc1a694653ee39cdfbb5631cfedfd7
8861 F20101114_AADUSX martinez_a_Page_022thm.jpg
8118a9123809a36880006263e646a844
87c048d688d4052c8d05f55bdf0241f3cb2b196a
77528 F20101114_AADTQJ martinez_a_Page_138.jpg
a2d8b8ef64411d2e5cfa6341fe2a6bbf
3a0c3c9b714cd584ea5f99c32f0f2e76f9546bae
100954 F20101114_AADTPV martinez_a_Page_122.jpg
c73c85e8a966aacce88a9408076a69ff
c2db2cb2cbeb989ecb919194206cb5130649bfec
26270 F20101114_AADUUB martinez_a_Page_038.QC.jpg
7353702d072d2bf688b14bac8504f251
739d2c49825fc9eb5b8ffea665c72c4343f6c594
8475 F20101114_AADUTM martinez_a_Page_030thm.jpg
7ba0899ec9168fa998073e142a0b5855
ca53c6c7730cad4d393897bbfc3788606eead880
37408 F20101114_AADUSY martinez_a_Page_022.QC.jpg
9d0dcf906576286b97b59d6f9700122c
74ce9ba341d4e94e244f7c286f2d9ec36d4782a3
25497 F20101114_AADTQK martinez_a_Page_139.jpg
41406c07bdb607d6a03b32c4c5e3811f
06e6798832b06dd1d67eff8af0517c6222eaae75
104205 F20101114_AADTPW martinez_a_Page_124.jpg
5ed2edc0fa35ffa5237a1074f140e1b6
7203f39c2eb77b164987316682d369f81b5e9f54
6429 F20101114_AADUUC martinez_a_Page_039thm.jpg
3669c3a45835664c8a0ff4116cefd607
6d584b26e8879879e22569c2b76154961b7a81b6
33235 F20101114_AADUTN martinez_a_Page_030.QC.jpg
df181f4d1d70c7dbcb91741e3fdddbd0
38e2c6cb80c46bf132961935f524aa5157f798a0
8919 F20101114_AADUSZ martinez_a_Page_023thm.jpg
f923f7a8c2843d2a47f8b74c68bb84b7
42cb55e893b9e77597f0d010e168ec32dceedcd1
120530 F20101114_AADTRA martinez_a_Page_156.jpg
0909fbc57f3a0298165d49ce01533d01
a184f8190cf1d4a880695185b5db670af2f3e4f6
122339 F20101114_AADTQL martinez_a_Page_140.jpg
2df476caa7e8370ed1c8aeb2774df12b
f9df231806298c74e8712066c41c14db5e189060
116785 F20101114_AADTPX martinez_a_Page_125.jpg
cfe0f29ff1c5a9967a69c264890c86fd
5ed86d703472b997e6ec72c786135654d39ef7cb
25114 F20101114_AADUUD martinez_a_Page_039.QC.jpg
ab75094b3207c44559321fcfb5c49af7
d76e71c0eca838d91013221a0a21b858f300d66a
8613 F20101114_AADUTO martinez_a_Page_031thm.jpg
3e7d92a58ebc27728a9cdaca08bbd89a
160ff88e88c39a386a554e8bf168441db78971f4
96606 F20101114_AADTRB martinez_a_Page_157.jpg
8c7ff4d156eb9bc0fae16c8905d67829
69b2226fe29dc92337fa47c6f212ebbf41986882
143759 F20101114_AADTQM martinez_a_Page_141.jpg
1e0c8f0ab9f56955e2fc816a5a551dab
9ad95edeba2796b689927bc20f0565da9ad49ff4
112258 F20101114_AADTPY martinez_a_Page_126.jpg
b92ccf3c3d0a6fb05782e1b9181959d2
71f2fd19eb1374363995f4b8e4664f5bb98aa52d
7754 F20101114_AADUUE martinez_a_Page_040thm.jpg
89f03dfe827d3c599356e9a489b4f572
e6a39eeb0fec8d929837036d3f209b3fa40bf9f1
34457 F20101114_AADUTP martinez_a_Page_031.QC.jpg
e6ceb087352931b9ea3ca0826f3c6876
314e13033c2e61ecdf04a64708589ffe3683e43d
90847 F20101114_AADTRC martinez_a_Page_158.jpg
0e97258a4d0c1bbb84bdbbc029c713a7
a1fd957dff2496d5c215ca2705e6ef489d4df31d
46709 F20101114_AADTQN martinez_a_Page_142.jpg
b4a22c5753cc21798e30ecebfb601e54
eb33b5717b398e196a625d6014445ca3cb1367ce
56153 F20101114_AADTPZ martinez_a_Page_127.jpg
c1b90704d3a95e950c59f7670efbd3be
6e21df09028cd105a26d25e009f432a04cff4a06
30421 F20101114_AADUUF martinez_a_Page_040.QC.jpg
ce8309f559af6e5cc7c904052e84e67f
5de5cbd67053b53ed382f36b1f99a9fcbaa40417
5610 F20101114_AADUTQ martinez_a_Page_033thm.jpg
226ec01290cbb84be3d1e3739c13ed72
d5605fbec1fafccd64c934a5ce6863e724f3fe44
75814 F20101114_AADTRD martinez_a_Page_159.jpg
da30eb04adfde1ad50aebfcc79d8512e
b114fe6052b56941b2adfda4778d935aad5c3eb2
51311 F20101114_AADTQO martinez_a_Page_143.jpg
2f75d03e8934b93ce076f225bd44933f
22796e9c46c279b32123f030575409856cdc60eb
8414 F20101114_AADVAA martinez_a_Page_122thm.jpg
3778a7daaabc3d12512bf4349cab89f7
a452efc844c489ff7a4342ad7114d42a7406efda
8668 F20101114_AADUUG martinez_a_Page_041thm.jpg
0246d71e867815ad44068319dd3f6fe6
431b20d1e26f799ed197e745dd4a0ba25e47dc55
18620 F20101114_AADUTR martinez_a_Page_033.QC.jpg
8c9eaba9bff931d0fa44aac4491a6d39
9e2b1a95c863a5b8b2ea9e4a0fe5d8b9cbd2e166
72669 F20101114_AADTRE martinez_a_Page_160.jpg
2ca5a08c550fa555e21b5ff15de24f2e
9e0a92433f74c1db0d60fb5964fb4ae329959168
60513 F20101114_AADTQP martinez_a_Page_145.jpg
7a3cc3023b980133d9c9215c8211ee9d
b1fafd9c550cef9dcdc052420226b3d5920cbb52
33611 F20101114_AADVAB martinez_a_Page_122.QC.jpg
9f3310ac12ff697b9864d0ab401c7648
beff29ddc5f90b0e57ddf7e21a6d7afd4134a6dc
35687 F20101114_AADUUH martinez_a_Page_041.QC.jpg
ccec621555001cb2295d8582308f9c3b
fa7cce24be74ab76033cd9e69ebb6a7774981bcb
8053 F20101114_AADUTS martinez_a_Page_034thm.jpg
fc048af53eb999fa36b1d7d15b05f885
b3f1cc1c734a6031d0c7425dcb2f4ac5286794c8
84910 F20101114_AADTRF martinez_a_Page_161.jpg
53c04d375b7ac3b942e51ac34b205350
6fef141295f9b39ffa912a250b48028b416ef905
135583 F20101114_AADTQQ martinez_a_Page_146.jpg
b537922f0a10eab3ec82fdaffe4dda16
d7995b150f10ed1cc1058874feebde7159581bb6
2005 F20101114_AADVAC martinez_a_Page_123thm.jpg
208841ca6ee2e9677f2aea53faecb037
0467ee6f36aa847ad07397c9da38b1ae482229e8
7702 F20101114_AADUUI martinez_a_Page_042thm.jpg
b7d415076ac2b0dd5eeeaeb1af3f0d44
0d606bc91b567e92b6b468eebcbc3ea51761885f
32737 F20101114_AADUTT martinez_a_Page_034.QC.jpg
35d119faf09da5a387c70343e65bc7f1
4aa1d5f6ab7d5e350f5fbdddda15fa92925a8da8
05f445bad284396900bf3c17307ada3c
6db46c3d62375c59461ddcdfad0a57e370358835
8774 F20101114_AADUXR martinez_a_Page_089thm.jpg
d484c76ab9fe0a88ac5d9d850bcce11a
7d291bef109525e0972635524a78fae2d594fb7f
F20101114_AADUAK martinez_a_Page_060.tif
682e9097ea0801324edc1fd072a6e88a
42217b136d8e497a3cd6b7ef9535ffc04ed6e382
409936 F20101114_AADTVE martinez_a_Page_092.jp2
811954acafe00ab375d728fa8e0f3187
d9bfb6527295ec716f33abecf39528b6d78cc3ae
111288 F20101114_AADTUP martinez_a_Page_077.jp2
761d1d5c2cf24525298a39193ca11f50
1de85af511a7e06d7a9ea04fe82303c1590e15dd
15232 F20101114_AADVDN martinez_a_Page_169.QC.jpg
e9116b31b28562fadef017e0a7a386da
59de66306e4af8bac57b16cd42fa2d47e42799c5
2270 F20101114_AADVCY martinez_a_Page_162thm.jpg
5ba852771c1a32490799981a54c277b2
f34b435ccf5f70383ba1a46386d9567f1a090f64
8911 F20101114_AADUYH martinez_a_Page_098thm.jpg
3046580c1c912bcb836431db26962b6a
021fa0ccf7273217feed6e4c8d89b14e45522698
34995 F20101114_AADUXS martinez_a_Page_089.QC.jpg
bf021d8df99ba931a55c968487cb45f9
9921606181cd6963c27b8ecc1470885cfda0e5b2
F20101114_AADUAL martinez_a_Page_061.tif
5a5166429d443129b5cbcfd9d8ad0bde
8fd1e37b0e4a33a103be5b11b96e85ffd8fd6640
391204 F20101114_AADTVF martinez_a_Page_093.jp2
fb442855dae8fee6e1d9916457dd161b
1333a6c4478d4783b8d8d003a769677c7fa2ef7f
113484 F20101114_AADTUQ martinez_a_Page_078.jp2
9c11708ee8d793b8bd3753df2f4ad450
751729c6dc3f070ae74867f506f097d097f95287
8511 F20101114_AADVDO martinez_a_Page_170thm.jpg
e3bbca5e7ecadca44c9a929f4c4cb2f7
b4856e5a816fabffe81ed8800c54185b20c7ea9a
8407 F20101114_AADVCZ martinez_a_Page_162.QC.jpg
6afc3199f4734bf53817b6c89986defd
ea7c347529b71716943c4049181c01ba72328c3b
36102 F20101114_AADUYI martinez_a_Page_098.QC.jpg
e3d61c95b9c391af8761679dc3e9f7d0
2fd0ac5fa270841e92557c034839ded718c32207
3955 F20101114_AADUXT martinez_a_Page_090thm.jpg
a4caa953fbfe1d19533b12faa5927bc5
4d155147f081b6017449918cd0a3c1a074c2c72a
F20101114_AADUAM martinez_a_Page_062.tif
e84c0f12aa6d49c6d503ca50646dfc47
9bf640c803de1e62d4b884790e6e5dd31b2b34d8
389094 F20101114_AADTVG martinez_a_Page_094.jp2
1efbca9b57ec70de9a99b6a9b6a8fbf5
15f6ac1e45b4f523ea3f050ca69297271de26931
114148 F20101114_AADTUR martinez_a_Page_079.jp2
631da0e0d7a7bbdc11ba73977fbc6a1c
e71aae69e2ccd96b8a68b09ba730bad02a041068
F20101114_AADUBA martinez_a_Page_076.tif
eb6199a3f50ce0b776fd91f174dfed9f
bbb605393a34b72d6f8ff14e47d609e001a94b9b
35404 F20101114_AADVDP martinez_a_Page_170.QC.jpg
6a3faa36fce9fe0a464ab0751004dead
de798533e701fe2e8b1a57d9c95d9e9527f28b6f
6418 F20101114_AADUYJ martinez_a_Page_099thm.jpg
5f216959d6c4f5e6a713915ac1f05c84
4f378c1eda1b82653243f15f2680fb2a42449253
3988 F20101114_AADUXU martinez_a_Page_091thm.jpg
715baf399a72f487c25e6aec0760cb20
b7c89a0124632aa73bcb84fa2699ccd9e94a87cf
F20101114_AADUAN martinez_a_Page_063.tif
753b1d20c3f91a6e005823a3297f2215
fb17a281f1cfd8613dfa054dd24a0c9552cbacba
400100 F20101114_AADTVH martinez_a_Page_095.jp2
4e6e00286b148338b5b4b5768bdf78d7
926f6f54da6fcda8f5aa0d19b7782dffb22aa151
117316 F20101114_AADTUS martinez_a_Page_080.jp2
0dd6bdab5292a9b6157069d885c6b278
63e6f387755a125b56c37ba9428e9237c925f325
F20101114_AADUBB martinez_a_Page_077.tif
d7159099b266a5199363afe6950e50e9
12b8c9ca2506797dc111b6b8c17cc44fddcd55ad
8652 F20101114_AADVDQ martinez_a_Page_171thm.jpg
d35058b05a0aebc5fa1981cd7212b26e
c1d64e369d504fa3736649de6a3445cc0a3291d9
27088 F20101114_AADUYK martinez_a_Page_099.QC.jpg
516d220722ae2cf10ac35790acbfdedc
e37edc2296e4a4fa17b2c3101bd7674107ed11f5
12711 F20101114_AADUXV martinez_a_Page_091.QC.jpg
8d79458bafcc6138439ac66ff406b1be
5c3692ba1ad81c73ed61e506ad2a5f6fef556c27
F20101114_AADUAO martinez_a_Page_064.tif
211c6b2e70d56154f85703411665434e
fcba9c57b08bc2b3eba961ded568c71a1068f9b7
1051972 F20101114_AADTVI martinez_a_Page_096.jp2
695ad5395632e37fe5b987df14da8eed
3a46a8e2003a69465116bb9623a6052ddce69db3
111667 F20101114_AADTUT martinez_a_Page_081.jp2
58ff54b15b02479e8a7e28e81b0b7380
bb0aec92e1470294cc428ce8b3975590f234044a
F20101114_AADUBC martinez_a_Page_078.tif
97179fb38f61f7d4a5ff60d2a9115376
34f40618d6414dc94624ed98f867aa825671ed75
9019 F20101114_AADVDR martinez_a_Page_172thm.jpg
ef978c0fb6910bc651fbe14ba0421688
0b50961ae0bc55900b09d46d468eeef9c37a8e23
6346 F20101114_AADUYL martinez_a_Page_100thm.jpg
1661396c5d175238df1045056186043f
5052085f58bc035b77ae25ef8d5e867cb2f5ea12
4081 F20101114_AADUXW martinez_a_Page_092thm.jpg
e8f47b06748cefaab708e2f25f5a2a9e
6d23563f10b31b0d6a64522e99a35ccc3c9a358f
F20101114_AADUAP martinez_a_Page_065.tif
0fa294efc3d949fa728cc1f13c2f6399
3b644e240a22503a71cc679b530b9a100e16a8f4
93629 F20101114_AADTVJ martinez_a_Page_097.jp2
576085c286bd72dbd45a0fffbc61a6d7
c658bb6b0f8f12d327ab09a9b7971986caf20de2
993389 F20101114_AADTUU martinez_a_Page_082.jp2
b4b9a61e9675d6fbfa6c13aff305e2da
c08d1a1a87e09a8b799527b9345a0c48a5815c3b
F20101114_AADUBD martinez_a_Page_079.tif
8933fa079f4ec92f8251e1b94a59c8f2
51ef06e6de8b315af281ff77208af4b6d67c7ef1
37960 F20101114_AADVDS martinez_a_Page_172.QC.jpg
33ff7b192dfbbf8b729bc486225776dc
13223fcd933a02b8096f6d8178a671cdc962026a
35805 F20101114_AADUZA martinez_a_Page_108.QC.jpg
af2becb8164fb2dd9ffb7291d634a6ee
5e634c4ad65ac6cb23dc3d236105b5c9f0312606
23792 F20101114_AADUYM martinez_a_Page_100.QC.jpg
0a474146cc3f10d075c64e7417403c4c
604b14bacfd137318f5f8277bd1ec2dc9ac8c8bc
12842 F20101114_AADUXX martinez_a_Page_092.QC.jpg
671129ec1adaae3075f92c351070fdbf
16627ae571924abf75cbbc80820e8133eeb59bc5
F20101114_AADUAQ martinez_a_Page_066.tif
8daa19b7f5b6d15c4e36894e4e7f85b3
7f58d391e3df65656595d4a140f37499b2e505b1
119441 F20101114_AADTVK martinez_a_Page_098.jp2
363b011e85be55dc3e169be73d4f027e
496089916f5ecf7cee537ca619c2ca5207f3fc11
808943 F20101114_AADTUV martinez_a_Page_083.jp2
f5a7b22ba6e073414bbbf5dabe99600f
8ff16ff540055a0be588c6e6674deba10aa9a58b
F20101114_AADUBE martinez_a_Page_080.tif
3f0d2fed46789c52da14fc3f77278253
909887f70c82b2e5b91eeaeb88cf239ccf10c7f2
8816 F20101114_AADVDT martinez_a_Page_173thm.jpg
80e46cb5925b8f356bbbe0c60fa78e49
52f739b11cf7693983172d3ab8656e42c5a201b0
8521 F20101114_AADUZB martinez_a_Page_109thm.jpg
5609d37353c67ce4562527aa5ea7f6ce
61fbacc2371d43a12933b74fd2a4643228fb852e
4806 F20101114_AADUYN martinez_a_Page_101thm.jpg
4e490d9d143ab33597d87b0b416531ab
3d17a3d810ae6f372d179813eaac2457d2da576b
3857 F20101114_AADUXY martinez_a_Page_093thm.jpg
cbd693ffdb42e978b5fead039a6fab10
7eac9651bab06e97ef4092787026f2f9ebc5d285
F20101114_AADUAR martinez_a_Page_067.tif
b49ba6be6e9b7868614e28694a981cb7
b3917c82d1a0b6f5b0b5376551f94587da5d711b
1051968 F20101114_AADTVL martinez_a_Page_099.jp2
ee00a00d186fc9a0eaee781be9140c6b
069934a44721e9cc905e2f110eb8dfac5688cb48
119873 F20101114_AADTUW martinez_a_Page_084.jp2
02d8039ce52a52a44ec0511f31a06f65
7bd08d6185410ca0541c6f517a451aa12f3a016d
F20101114_AADUBF martinez_a_Page_081.tif
30459fa497f128f6d1f517f9fcd83426
7de020a4f33f9f988107d337787a193b270ca1cd
36815 F20101114_AADVDU martinez_a_Page_173.QC.jpg
7a7451b8b81c0934b20b717ae240bff5
2dd53284a29822209e547fc2449ad2df78a0924b
33445 F20101114_AADUZC martinez_a_Page_109.QC.jpg
50b9e0490d9309e4e204a1ac470a19f1
bc977e0fcf649949dc6bec135ad74fb256d5cecf
12247 F20101114_AADUXZ martinez_a_Page_093.QC.jpg
1b1822aa03ad2182637e1502ae5f6b42
a96d798e2d60db45cbba71b9a582a889f004727f
109410 F20101114_AADTWA martinez_a_Page_115.jp2
d815b0328b829b81e3565730da377ee0
572fb18eeb91cda93af189f74d6c8590f21f8872
8423998 F20101114_AADUAS martinez_a_Page_068.tif
785fff1d619ddf0f80da66f44fda3be8
09186150dfbeefbc27d45c30a0b6a708d6dabe57
109102 F20101114_AADTUX martinez_a_Page_085.jp2
6104c203ac8a3bcde035fa83f1f66b5e
eb2fecd52fc865d5bbfc03b537eacb0fd24da0d4
F20101114_AADUBG martinez_a_Page_082.tif
8390d89bd539335460ff433b2a895734
c22fa62985dfceac82d6f68d70e4ab65f718aadb
7112 F20101114_AADVDV martinez_a_Page_174thm.jpg
097dacec09cf6da23e03e80a9840e2d4
01dfd4ae94f4289e282807e5b1da9f556200da88
9061 F20101114_AADUZD martinez_a_Page_110thm.jpg
1dfdc2b7b32aeee92cebd49f01604e56
ce966a4e8d07a32aa8aebbe501223eaa5516e740
17994 F20101114_AADUYO martinez_a_Page_101.QC.jpg
9c8bb674a801ed80360deabcd96ed62b
18524aab87c5e221aa8fc0cdee4775feb49f6045
115296 F20101114_AADTWB martinez_a_Page_116.jp2
a321ff2d4c7a0c278771a178bb9687d3
22d360074d91c298a76c164f5d8d25d05a693189
F20101114_AADUAT martinez_a_Page_069.tif
76fc9b8a5bcbff7a1be3e1ff691d2b59
44b0ab1dc1d81da3289e4da2d1f0dd38e4d25caa
812659 F20101114_AADTVM martinez_a_Page_100.jp2
a42a14106bd279995dd486a331059f8b
9912a716ce3aaecc4a5ffc8c07da6e59c7906a11
115015 F20101114_AADTUY martinez_a_Page_086.jp2
5ad037b06067ee2b1cb4a3a97c962d0d
902b6be1d85e2803b30da06ca5d89c9328930c02
F20101114_AADUBH martinez_a_Page_084.tif
e6239b02e710749ba953ac31912751e0
9e2ea01dabf2012a0a348baa87fc3fcc51a7e5aa
28355 F20101114_AADVDW martinez_a_Page_174.QC.jpg
554ced8d94986fc836b90ef306ed86ec
004661f065974e2bcc94e406d9744e6b3434809d
35939 F20101114_AADUZE martinez_a_Page_110.QC.jpg
f2a56af905a56a4f67aaf0c1296ad5a3
869b9693b41dd5065064e8ca17068d5ccaaf698d
9260 F20101114_AADUYP martinez_a_Page_102thm.jpg
4879311058708d92ca0f3f8af6dfc19b
6958b2728c6742cfd81d9f7401743d248a226206
118101 F20101114_AADTWC martinez_a_Page_117.jp2
87cbc71d79228c554b0f916fb7d8056d
c5469d70bcca2f18423a1106b5835f72e39e9403
F20101114_AADUAU martinez_a_Page_070.tif
a78eb3c67a81f236555943700c16bc40
de4cea10676eeef0dddadf28567e0a4ecce6fe29
623108 F20101114_AADTVN martinez_a_Page_101.jp2
8269ba57883ac33bb53215aa5f0f9a6c
2e2b1e9d9633a719b6175e7d40ef8672785531c9
116350 F20101114_AADTUZ martinez_a_Page_087.jp2
70831fcfa20560a098cbb0df886a1172
6eb8268ad4a2f7761452323218d1f4b193d83124
F20101114_AADUBI martinez_a_Page_085.tif
055dcf5a8239cfb0ec0c36b8c1529685
df383c47484be46467ba733f9d1c546a2b334428
5129 F20101114_AADVDX martinez_a_Page_175thm.jpg
bf03f685eeec855cabec823646c884d4
1ee8e8b0b6becdce1bc14ad5ec63ac63e2f67632
7907 F20101114_AADUZF martinez_a_Page_111thm.jpg
098c2ba71ea2140aecce36f225404306
ece40288c7814e42107ff1ee31e211cb1640f84e
38083 F20101114_AADUYQ martinez_a_Page_102.QC.jpg
733bc4b9c57864db1ac97c44792998b9
9a7b807d4bc1bb056506f0529d329b54e2049eee
116404 F20101114_AADTWD martinez_a_Page_118.jp2
fef98af1f94192dcedc1b51ffb72501b
6c07369e47263c1b1c16f1e8d32f4892d611eac2
F20101114_AADUAV martinez_a_Page_071.tif
8ab6c37c1d82425f0f8e0e86ed2aa64c
13f4d551b4d514effc77dbd3b390e8059901036f
121281 F20101114_AADTVO martinez_a_Page_102.jp2
e2c8d217d6f79b5d2891e77e56a679f9
8fd085fb245c640979a4412f053c8692992c89b4
F20101114_AADUBJ martinez_a_Page_086.tif
2251fac313c0a85d4bc1cefee9c718f2
a4bc0479f8622223c96c2e41432044b13b884124
30766 F20101114_AADUZG martinez_a_Page_111.QC.jpg
7740ce9d18cff192f1b7cb0b9c7f6c72
37ac784b95bdc8cd0cad4fabe0702d38b6a66baa
7633 F20101114_AADUYR martinez_a_Page_103thm.jpg
aec7ef3cc008f4369f2027141a8280e0
c59b216959d198475730558aba100fc484255be0
935762 F20101114_AADTWE martinez_a_Page_119.jp2
bc2ed99a3376291f432d425d99279e67
388abc5d1cb3358e33cf60edc60af89cbdcefe6d
956608 F20101114_AADTVP martinez_a_Page_103.jp2
543ec0f6627bce34f4643e339c2c2926
141e1664b3cc10817a7e6b63ca55bcfcf7239674
F20101114_AADUBK martinez_a_Page_087.tif
bb664148782a4d1c14e2ccf32313a00f
744e561e5f75fda711664e3361799f6eee93d2a0
20831 F20101114_AADVDY martinez_a_Page_175.QC.jpg
7451e046b742efbff850e39545e8a174
5784b4cfa7908cc3742921274f5218ee351489a4
5643 F20101114_AADUZH martinez_a_Page_112thm.jpg
f9eb460a832a2471894ad798909c1639
3a1b7fcaa7e6de7c78e0620e207e0c46b471bad3
9044 F20101114_AADUYS martinez_a_Page_104thm.jpg
fdac29d78ce483701b2fc6f4708dd302
49eced4b63e7cdd77b3d38e2824bc12667ea866f
1051937 F20101114_AADTWF martinez_a_Page_120.jp2
36dd000ff3735ba8140897a4c2192424
0d41957a090d31acb73525becef9601bd3e056cd
F20101114_AADUAW martinez_a_Page_072.tif
291a45bf14292471c774425901efbcc8
e2562c1d02aba5a9ff7180f9035bce6f29efabde
117542 F20101114_AADTVQ martinez_a_Page_104.jp2
4a525e61c7bc741a1e575754c49991e5
2be9070fc30ef26254443a404341845e6f6e8f5e
F20101114_AADUBL martinez_a_Page_088.tif
c07df1aceb5447833d4e665f18c5f016
e4219accf36f2fa0a645d9f238d3db47149bf306
203247 F20101114_AADVDZ UFE0021491_00001.mets
911d3ef083f78d679d8cdcd7a3f33ccd
74aabe2934a35f49c8a1f5c42bf7bc1d959a37f5
20142 F20101114_AADUZI martinez_a_Page_112.QC.jpg
d188e9aa6f3dcf67e431b65065c28090
d667b05b7d0a68ea0bbcdce961f2bce1a360fe17
36270 F20101114_AADUYT martinez_a_Page_104.QC.jpg
3c7f46df8823201a22e9c055f4e61e14
d01156b1f773c54e63d62d1bd9c0f93ed52dd124
107208 F20101114_AADTWG martinez_a_Page_121.jp2
0961875fda8499ffdce7b2df15e4b195
9901f15aae4a1a98d8611b838c505f77e7e5e718
F20101114_AADUAX martinez_a_Page_073.tif
bf41c2af5d1f12527bc12eeab0bcbb05
fef1e5154b23d2846d6a4629e1d77b201fbf9ea2
106834 F20101114_AADTVR martinez_a_Page_105.jp2
f06103cebeb97e04831d2425a2b48111
b2fb754a5c3fb2c62823e8ce0ef99b88f3bb6ec8
F20101114_AADUCA martinez_a_Page_104.tif
026a27609585b3c41edaed14dfbe63fb
34107a950db7fc3f41644b11cefed50624c6a0a2
F20101114_AADUBM martinez_a_Page_090.tif
9af3b337d6f553e3dadb93843bd97a67
a376baa773a2fe409b9c14be7e3fa9d0350b998c
8144 F20101114_AADUZJ martinez_a_Page_113thm.jpg
4f8c5d2db05cd1b71fd98b88f9bace55
887d1779e60c1c2e225d98c5d5de20a0779976c7
7951 F20101114_AADUYU martinez_a_Page_105thm.jpg
568fd73bf3cef709c20a2e17e9124c28
3f775a3d43d14946e17867e4ce546120e4071b3c
105431 F20101114_AADTWH martinez_a_Page_122.jp2
b13394e76e21696a6670f385be9c0a31
4fa248d06af62060d7433207e594eb443bc94d24
F20101114_AADUAY martinez_a_Page_074.tif
555604f0b9d0e55261326e0cf1317ad7
f8acbbcafd42cea53c2e5623b21ce2ae1dab6181
119702 F20101114_AADTVS martinez_a_Page_106.jp2
26fa49b9e19c2c5d8911d3e762650f77
1f7631ba619d7ab525b341e0637af5118972e2a9
F20101114_AADUCB martinez_a_Page_105.tif
3f6642963de52e7c18ef453f7a97dccf
d1c855cf22527a81e29a556027fff0e4f61ace6e
F20101114_AADUBN martinez_a_Page_091.tif
133d159451af154f060942e53ab679cf
4c820873df5627351ad4cacce6a435472e19544d
29333 F20101114_AADUZK martinez_a_Page_113.QC.jpg
0ad5feb6f5b049a3a906da31d1f68efd
cb18571675675e8d24a7ffe861e3f8b2f9043def
31879 F20101114_AADUYV martinez_a_Page_105.QC.jpg
d3dae31932a77cdf66408bc280aa1ffb
17bc8ee53e45c0bbfa2adc8345963c4e4b551d5c
23170 F20101114_AADTWI martinez_a_Page_123.jp2
7360c4af4ae8d36da98c8d301220611f
628b78d7e083f83a6c2ef5057c0fdccb0097b282
F20101114_AADUAZ martinez_a_Page_075.tif
d6aeb83b93fe248cb9f92119134675b8
75972e6db6f4f3ca3d66a4a85128974ff07dbe65
116015 F20101114_AADTVT martinez_a_Page_108.jp2
452cfb519a78f9ea9d6ade5ee1da454a
e7c0e321a28a3e021d772811c56172f3ce1ac6fc
F20101114_AADUCC martinez_a_Page_106.tif
3e3757c0c02ec57621127ef7ab86e080
65c271e8ed7aae701601e6bdbce06e9760d75f1c
F20101114_AADUBO martinez_a_Page_092.tif
f202c9bb32095e90a3f89bf97b6307d5
daba66b59086569f9a1fbba82eca0bf03fbef857
5873 F20101114_AADUZL martinez_a_Page_114thm.jpg
b3483301bdd78a80670f2ee8239cc94c
262fd28f0f548412056ca53d6aed7a652030c38a
9187 F20101114_AADUYW martinez_a_Page_106thm.jpg
d68b5ad30496d3fb4028ccdf19f8f2d6
e1cef1998d0b53dd29a4a4eea2dc34573d3cb583
110208 F20101114_AADTWJ martinez_a_Page_124.jp2
9eb431d36ccc30747f52f21305e86830
db8587a32d651bc8a4e6609697d63ccb6e303c23
109837 F20101114_AADTVU martinez_a_Page_109.jp2
9dec4ee4e509484f7e4ead3353e0d64b
826b1a2297e033bb28e67febe5efcd90e65131ca
F20101114_AADUCD martinez_a_Page_107.tif
942031e321d06c0b245bcb863ea53499
5acf2625917216336232d4a9dbc2de9531e22e78
F20101114_AADUBP martinez_a_Page_093.tif
c9cee96aeb4397c8ff7e7bde31a7facc
ed391e6c7b0b6fa3dd5171cbdd0d6df00e539aef
20665 F20101114_AADUZM martinez_a_Page_114.QC.jpg
15bde1075177720196d4771ce24a4c88
58ace4106314c89183d3b0ca47238aa9289a05dd
37037 F20101114_AADUYX martinez_a_Page_106.QC.jpg
5aea572616c995f4bf65d40316b4e1b6
11d238c364cbc5f1fe97942eef8052725364abc1
123473 F20101114_AADTWK martinez_a_Page_125.jp2
e2f8faad0ca1309fb22bb48f0e7a94da
26d5eb6fd81e82940255ae3378946d01f41d64b4
116609 F20101114_AADTVV martinez_a_Page_110.jp2
3a3400a29f3b11e2b1750b790984c0b5
02b1ec2a0b49de6907d09af4a1b6c05fc2d28d66
F20101114_AADUCE martinez_a_Page_108.tif
9b0e6b6e4b21246c03af92abe89292ff
1570351e4e1d8f3fe303e0492ac364465ce8bc3f
F20101114_AADUBQ martinez_a_Page_094.tif
85674da45adb973d2dc3fc45949004d5
20fe906e0b47d61fdb5d1410e5fd6998a2e35a2f
8688 F20101114_AADUZN martinez_a_Page_115thm.jpg
80913eb7f0144c13d7a140791c13f4c2
fb860e02d359c6447a079e3778a0fa4d6771606e
7694 F20101114_AADUYY martinez_a_Page_107thm.jpg
b52934690959b0523a6c4c8ad0f347ef
e8202fbf9cb9b64b388d8255d9385df4b5fb36fb
120486 F20101114_AADTWL martinez_a_Page_126.jp2
8a8a7aee20962f04844c4ee39d6331e9
8cfa6ac656144c81bb417b93f38568ce9e824a87
1007729 F20101114_AADTVW martinez_a_Page_111.jp2
b4b1f041ba68fdd7a0c7309442c99dc8
11f874ada3585400fc0693a96f59e6eb36c4a0e2
F20101114_AADUCF martinez_a_Page_109.tif
7236e9450c895c55bb8ca10f972dc871
abfd92673fec286959422e5c49e312b8eb89c286
F20101114_AADUBR martinez_a_Page_095.tif
7b63e3784013ba479960bae5c776ba87
5e92faac2bc80a34b2a77c34bfffb978514cf189
34516 F20101114_AADUZO martinez_a_Page_115.QC.jpg
2293cd686b5581410a41d3a6a22ea9dc
2ef5940d65a6d7ba7f22689fb75ad43e27a2b941
9043 F20101114_AADUYZ martinez_a_Page_108thm.jpg
3d1e4c44933f777275abfcd871d4f1f6
4f514aea75d5840a8e0bc31f5be55f7334b0a5f4
59015 F20101114_AADTWM martinez_a_Page_127.jp2
33a68fdcb71409d97473b49678416067
a7e5ef676b9e2b3bf61a9173a922e681c662ef10
961244 F20101114_AADTVX martinez_a_Page_112.jp2
98d63ec8c21d7aa018afc614c6f94ef7
f0097e78bfdf535da4c15302bbf9476ae7367d58
F20101114_AADUCG martinez_a_Page_110.tif
8007bc79469ce7288430354ac35c2657
d335603386a8a745b3cebba5d09b711ae4417ebd
126896 F20101114_AADTXA martinez_a_Page_144.jp2
0a77e6cd775fd83e7d139570ec088a90
ad4289c639ace53f8973d3f2c7eeb4c0da4cc877
F20101114_AADUBS martinez_a_Page_096.tif
c084075d4d2a432bf8efe8b32b06774a
feaddb66dde2b2d358648dbfb222985dacac7b72
950144 F20101114_AADTVY martinez_a_Page_113.jp2
6baad84f44807aa28253d893f8456d8d
b57b4def9194cfe52729c93e5caf268ac2d66703
F20101114_AADUCH martinez_a_Page_111.tif
c294f0bb80e2594a099c2fcffeaef072
2b20dad541e886c1b913d77bfbbb2a601afccdc9
54272 F20101114_AADTXB martinez_a_Page_145.jp2
5b17ae336c440f94744ddff00f239d56
7547f5e6a54906876e1ed964566ad0a001589aa5
F20101114_AADUBT martinez_a_Page_097.tif
9ea89c492737312fc635ba69b820e609
53c44c9e05bb2f7091bffebfa502611b00455167
8725 F20101114_AADUZP martinez_a_Page_116thm.jpg
f30f8a990939058912419d8bd213494d
f6e7a64923833b5271d654072c7dae866577ede9
740291 F20101114_AADTWN martinez_a_Page_129.jp2
065df7620e524adc5faf37b75b2ce435
9d6d11a9a9c7f26999dfe4044c05d02cbb36d1ac
1051921 F20101114_AADTVZ martinez_a_Page_114.jp2
67016b901264b2869cd4f6516f97659b
b20eafd9aa01c9b6a1f5327dff910a66921112a0
F20101114_AADUCI martinez_a_Page_112.tif
7c5fb0d04dac0354776bb22f33ea7bb1
edf2b8158c411cdef9d434cc1d8683e601ddffcf
141132 F20101114_AADTXC martinez_a_Page_146.jp2
6c47a65f7bfe9170979f50099d601034
9de424e942c5e588cc80700f40d865c04d8e4d66
F20101114_AADUBU martinez_a_Page_098.tif
5f2478d7b7e3184a26a28dc5d47eec53
7223e997b8f0303468a2cbde87c080270245ef26
36113 F20101114_AADUZQ martinez_a_Page_116.QC.jpg
3a273108424a96438ff551fd0304f2d0
eab452d21f9334e6b0dad4215a4e8e44744be45d
788007 F20101114_AADTWO martinez_a_Page_130.jp2
278093b43cacc3ad757d2c1ff892a87b
f0a238a6f1380a8389ee21c0c56fea5547c9f960
F20101114_AADUCJ martinez_a_Page_113.tif
5fcc298023c733f8582696e248042e5a
81bc051c2ef3d6d6c31e867aed62b6d1366aa1b2
54004 F20101114_AADTXD martinez_a_Page_147.jp2
44088da8b56421771e34986b0070a26d
2231464071036b990d5a22f9e4160e98b86ac571
F20101114_AADUBV martinez_a_Page_099.tif
dc9546cb57889259359cb239aec4658e
0984dff9e6dde27851215d4549a0c2672fb25b8f
8947 F20101114_AADUZR martinez_a_Page_117thm.jpg
e8a139f4da56f98b0c2e1d3ca6252511
ef8e43dbcd9955ec46134cd90fb346cef28b371a
455301 F20101114_AADTWP martinez_a_Page_132.jp2
154f8dd7b9effb550dcbf9a6c48b0825
8c55d9dae63b6fb3f1c8d31a9349c44ad0223617
F20101114_AADUCK martinez_a_Page_114.tif
1227c3e3812f0a0882bcf16ddeca0d90
5f572fcfe0926de74c24c58f385f3076f806cbbd
154825 F20101114_AADTXE martinez_a_Page_148.jp2
7b8458ad85e01d5f086fc2d3b7a81d8f
234c38d1afdc5a37ceb10a1437829e247ed00e74
F20101114_AADUBW martinez_a_Page_100.tif
279b595d0c043316d8b4572de51b8e0e
21325c59ceeba43672bd8bbf5b3dd80a00ac2d8d
37019 F20101114_AADUZS martinez_a_Page_117.QC.jpg
2934a2966d68d03e83ac133c3a57ea11
4df7234258413fadd3a580b1eb5eb19c65f0a82f
426665 F20101114_AADTWQ martinez_a_Page_133.jp2
9ef8b3deff8a24f5cbaf09b5a31ac516
9d91cb8d53025ca2cbacf8cca31ef2202762e837
F20101114_AADUCL martinez_a_Page_115.tif
3b8681f0e555551bb909203b08f56a14
dc40e776d647ef3d8bd11decf5f8f1f79d6267d8
36676 F20101114_AADTXF martinez_a_Page_149.jp2
24f132d0704d1c4cdadeb567374d9c2a
9af0751d31d76606ceb667be3b1a2c8dcadb9a29
8799 F20101114_AADUZT martinez_a_Page_118thm.jpg
71a62f488a3be9ba71c74945f4d1885a
53a73893459389010d680c01741e07ae5852c858
87132 F20101114_AADTWR martinez_a_Page_134.jp2
eb372b95f0180f809d07f1acad9a19f7
7cc66bd380c70da554ec85be352c8327eb377ce7
F20101114_AADUDA martinez_a_Page_132.tif
575bdc07e094f9689d9ccb84ccc0f0a6
1e94bc16a06f32aff5fdc6e4f2175350f6235c5b
F20101114_AADUCM martinez_a_Page_116.tif
71bafe786aecd5889651414e07c08db4
ea914635a4cc369fe867272fe30c21e8737f1130
149801 F20101114_AADTXG martinez_a_Page_150.jp2
b13c9abaded990d65dd92d5a23208228
66d3a57fd85356529eb622c96b64c40ec53f940b
F20101114_AADUBX martinez_a_Page_101.tif
7702841c87d8261191616e70747ee359
e7f28847921a0a71c9df473b58a9353d6c13b711
37030 F20101114_AADUZU martinez_a_Page_118.QC.jpg
8abe807003d487371cc8f85d51c8fc82
18e14c75156bc1b92b4d5fd61ea0d7329d69455e
F20101114_AADUDB martinez_a_Page_133.tif
1951199d2be51ee9a8d249234228aa54
5934c19b3a86fbe0df98bacb84a45809fab7f1f9
F20101114_AADUCN martinez_a_Page_117.tif
23709ac90ebaaf412c301eec0bea8b74
55b7dea8c6f8c630a03790b29e5f2403b8f3e405
136110 F20101114_AADTXH martinez_a_Page_151.jp2
cace4acf22cea62350f6b8658317f966
048f7d0831e1d71010554957167ba2f8b73740a2
F20101114_AADUBY martinez_a_Page_102.tif
b6636be30a3829c99b050eb237b3a3f6
701a075b6a0319b05d0afd2c8f85cbf1d2408bbe
71478 F20101114_AADTWS martinez_a_Page_135.jp2
9529ec6de2dd1be821f275d1fb5d5b08
c3a39bfc55e0864cca93195fc4cc362f62cab5be
6984 F20101114_AADUZV martinez_a_Page_119thm.jpg
e2974da41c7e7d1a9bfb63e7ffc27e7a
de3ebc7c34c0130f5f734692250d8efac8ce423a
F20101114_AADUDC martinez_a_Page_134.tif
1ee03f20e130e9e37446b2b66e4bd60e
3ee63868d662cd595d36e7bb2bbb7bb7112f294b
F20101114_AADUCO martinez_a_Page_118.tif
eadbed6691803c5523025e2a4c56aea1
9a7d1d68c1715b1650e1ab7546eec1c73015f4f7
135044 F20101114_AADTXI martinez_a_Page_152.jp2
af83814a88fb445ce00b13e8a38c2abb
2250b16dbe501f751da29323800a9d978f6ef314
F20101114_AADUBZ martinez_a_Page_103.tif
cf36afb5ce795366d58fefa56ac2d51c
a00b4c1cec7d40d15f587cc60febb8ffaa85fb4c
35528 F20101114_AADTWT martinez_a_Page_136.jp2
d5f040dc79ee6d4ac92c0edc44c4a637
34e406ba10f7f9f269ae2b121b5638ba676f2f66
23654 F20101114_AADUZW martinez_a_Page_119.QC.jpg
8fc90487ab47d766fc6a82721e3bafb8
4671dfd2a4fdc84c477b01348c104b6db2050d0d
F20101114_AADUDD martinez_a_Page_135.tif
bf5cf2757b191e678227bf80f5d74197
be14e8625448fc8a2eb1412d531078e260aad51d
F20101114_AADUCP martinez_a_Page_119.tif
a24eba5848fdb6cbaf6f77019ee44b76
e9fe4ab6a5bae59a56799671906d30b59a5d81e7
24743 F20101114_AADTXJ martinez_a_Page_153.jp2
7c9f182920dfd06873d80e3f717f3236
4fdf2b947de2d6914d813028d90cf86cbe3dac91
82047 F20101114_AADTWU martinez_a_Page_137.jp2
09bc99f6f6a4f2c83c97bd3b36fb6b85
d279f46aa0661154210fbcea9b4aa547a2c4ceac
9188 F20101114_AADUZX martinez_a_Page_120thm.jpg
9c91cb59562bbecdf4d424e2641bad80
798e2a6bd9a9d18454a174d3246290b60451f289
F20101114_AADUDE martinez_a_Page_136.tif
d34a731593db182487dbe6ae0ccb85e7
0132a2b5a50aee0602bc1648e0a0b23ae05bb31b
F20101114_AADUCQ martinez_a_Page_120.tif
7ed6683ede344b888dbd389880a7f7ca
35e318dcc56f25a53bf4e4f083340aec3bf3d09c
89733 F20101114_AADTXK martinez_a_Page_154.jp2
1d82357b3cc5ed9cba51234834903e64
e79da578d92bf1d046b51ede754879311b9bb169
26652 F20101114_AADTWV martinez_a_Page_139.jp2
75fc9145944c49a456110cc620ccaaa5
4e813f868bdf49fa5f96a1d2a85c37140eae56cd
37902 F20101114_AADUZY martinez_a_Page_120.QC.jpg
1fab78e40e43bc6d90f22e6c7281b4bb
9e4af54cb4cd2c3a9ce10a5ee0be78ac3e835977
F20101114_AADUDF martinez_a_Page_137.tif
a68401b2afa1205affbfc1c5507200db
533786bb5bae4c2ee6f92cfe25d4ddbd07642b83
F20101114_AADUCR martinez_a_Page_121.tif
ab5ad2b6a157266cbb7f1aec70e26a3a
f729283103349fd973e5b13da9fc93fb89ebc7c3
99229 F20101114_AADTXL martinez_a_Page_155.jp2
96ac75f428ad317bfe4e30a515ad307e
51c341af3e1b19a677f91ea379f9c61adaae0eec
127529 F20101114_AADTWW martinez_a_Page_140.jp2
bf588366b23a9111e821eb20cb64402d
e486d60a070bb38118eff5afd270dd6162462765
33514 F20101114_AADUZZ martinez_a_Page_121.QC.jpg
1ed464cdfc9d7f1f84f254aa90c67fc4
7c28cfdf65d69cd460aa96356b3c83c7d3ea5800
F20101114_AADUDG martinez_a_Page_138.tif
b8e724e791d328d76cb8b5db33704c74
73a048314552a0a647a344fe0a687578998b6f98
133481 F20101114_AADTYA martinez_a_Page_171.jp2
6c3f297ac1c40001607758fa8ed7b9ac
c6e8fe39c002f048178b672c2133e319c4d3a723
F20101114_AADUCS martinez_a_Page_122.tif
9192d846759fca27431d05f1556dc49a
897ef3ef6c2a03876e8dbb5fafb025240f8e3b2f
127519 F20101114_AADTXM martinez_a_Page_156.jp2
89d2e2383a39ebcd5f8525f27dc7a4f0
000b07afecb008da7de875e6baa48199bb935bfc
148458 F20101114_AADTWX martinez_a_Page_141.jp2
ff1be9beedbc0525a3e640c89f0f41df
a88552caae98356509c67ef9a40dda4bb170c501
F20101114_AADUDH martinez_a_Page_139.tif
05992c7161f103c4ca38b45d4b85f489
9883cf29658106191fe8291cfb69667a30d2ef44
133027 F20101114_AADTYB martinez_a_Page_172.jp2
5b65524841f31cfcbb2093968b459c0f
7624cac7beebe8d447293b7fa940315d9ce373b3
F20101114_AADUCT martinez_a_Page_123.tif
deb7965b79575783769cd28a91f92ab8
1e492ccfe68b27fb756eafcff5f0aa3e502d8abd
103309 F20101114_AADTXN martinez_a_Page_157.jp2
a32a5fc1327eba1dc5ff32ee23ec21a7
7af367889ae06f40fb942fd87a1086a58f6bf2f6
47299 F20101114_AADTWY martinez_a_Page_142.jp2
66c4833079dbf0adb107400ea7835700
e60c510a2e36e00ac074e2bb541b755586bcc4fe
F20101114_AADUDI martinez_a_Page_140.tif
b77ad754501a0db6e29d07910e0e6e0f
f439401e7d19c90afa4d5fd1ff7a57855000aa55
131772 F20101114_AADTYC martinez_a_Page_173.jp2
cc224daa18ffdabfd85e1300ef648c9d
90e7bd937ddc570618e337bc3086120fb722e495
F20101114_AADUCU martinez_a_Page_124.tif
a2b19796317f8ee6f94937070c837156
4fff415f9da09a986781fd83e2690d2c373ecee9
43796 F20101114_AADTWZ martinez_a_Page_143.jp2
4ee9a82f10feac164b7b50e1033517d3
353fb7580ba229fc8f8849395b9475cbd7ee791b
F20101114_AADUDJ martinez_a_Page_141.tif
9d103e9f3c5aeab572a83bf93d20c25d
eb30835a0290c55c0050e755e9392ca09f8ecb48
103450 F20101114_AADTYD martinez_a_Page_174.jp2
a2c76e352d1cad90aec526a71a0dba83
ba76770087305bb5d622b72da449c7460d90b7d7
F20101114_AADUCV martinez_a_Page_125.tif
fde087280fae1217ea9b32644111f3e3
b2c0450b9146a83d7d4fdfcaccf9b4f3952181f7
96367 F20101114_AADTXO martinez_a_Page_158.jp2
3dedc69d5addd39cc50e80d5fb1f8408
b15386df9e1daad93e04d5eb27a1a63730e39334
F20101114_AADUDK martinez_a_Page_142.tif
0ed2084e9ff71d78e3626b17741e31e8
e291e97ae10331bc7cebf3a2a648926032f2669b
66971 F20101114_AADTYE martinez_a_Page_175.jp2
dd246e3594bf5f97329db7fddb4e769a
aab8076794c385ac4c83fccfa8d083e5f346bfa4
F20101114_AADUCW martinez_a_Page_127.tif
a9514ee793358306fa9876db13e491fe
25f53763d5827b5996eda9d7e07b366f6e73fff0
80525 F20101114_AADTXP martinez_a_Page_159.jp2
2a30093d15d6e25a710b9d14865a0bbc
6171b560d6a3dc1d7461baed516ba4402ec5ae1c
F20101114_AADUDL martinez_a_Page_143.tif
5e41d429d3c79ad39d0f4c526404fa9e
e3464c11dc94374b21c6da1dd139299731889ec2
F20101114_AADTYF martinez_a_Page_001.tif
a01e5d172fa7f94e697651969526e687
9916175f343aefa6d1d82d6ef80dddf8538c9b08
F20101114_AADUCX martinez_a_Page_128.tif
0c44d767fc873b28305dd40cb5021d8d
10b71668f8f88bc3d56db06a55f48a52f15d3b4d
77359 F20101114_AADTXQ martinez_a_Page_160.jp2
5debf8c8200039049250ad8d0076792b
5e4dd6946dd65a8f3a498d398adc05523408d3f5
F20101114_AADUDM martinez_a_Page_144.tif
f1f9aa1711e1408a146b987a2edc833a
7617b17a06d95ce0f04a452552396cf16b97906d
F20101114_AADTYG martinez_a_Page_002.tif
c6b77c44040c0b6426fabeadb991604e
44460b681179ab251a1637eb09ed84aeee85a32f
31925 F20101114_AADTXR martinez_a_Page_162.jp2
e18cdf6e59e657f8e42fde472231b8e4
cdca2f7b8cc8953160c43688e4ef50196ae80e76
F20101114_AADUEA martinez_a_Page_158.tif
1399a9e510f888b77117ba7d95e3e05f







BIODQ: A MODEL FOR DATA QUALITY ESTIMATION AND MANAGEMENT IN
BIOLOGICAL DATABASES



















By

ALEXANDRA MARTINEZ


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2007





































© 2007 Alexandra Martinez


































To my parents









ACKNOWLEDGMENTS

I thank God for guiding my life path. I thank my parents for their unconditional support
and love throughout my life, my husband for his love, patience, and help during my doctoral
studies, and my daughter for being my inspiration. I especially thank the chair, co-chair, and
members of my supervisory committee for their valuable mentoring. I also thank the participants
of our research study, and our collaborators from the University of Florida Interdisciplinary
Center for Biotechnology Research, the University of Florida Health Science Center Libraries,
and the National Center for Biotechnology Information.












TABLE OF CONTENTS



ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

ABSTRACT

CHAPTER

1 INTRODUCTION

1.1 The Need for Quality Information in Biological Data Sources
1.1.1 Biological Data Sources Currently do not Manage Quality Well
1.1.1.1 Shortage of quality information
1.1.1.2 High data generation to curation ratio
1.1.1.3 Lack of quality-driven interfaces
1.1.2 Benefits of Adding Quality Information
1.2 Motivating Example
1.3 Our Contributions

2 LITERATURE REVIEW

2.1 General Works on Data Quality
2.2 Work on Data Quality for Cooperative Information Systems
2.3 Work on Biological Data Quality
2.4 Other Quality-Related Work
2.5 Work on Semistructured Data

3 QUALITY ESTIMATION MODEL

3.1 Definitions
3.2 Quality Dimensions
3.2.1 Per-Record Dimensions
3.2.1.1 Density
3.2.1.2 Freshness
3.2.1.3 Age
3.2.1.4 Stability
3.2.1.5 Uncertainty
3.2.2 Cross-Record Dimensions
3.2.2.1 Linkage
3.2.2.2 Redundancy ............... ...............3 0...
3.2.3 Possible Extensions .............. ...............3 1....
3.2.3.1 Query pattern............... ...............3 1
3.2.3.2 Correctness ................. ...............3 1...........
3.3 Measures for the Quality Dimensions .............. ...............32....
3.3.1 Underlying Data Model ........._...... ...............32....__._. ..
3.3.2 M measures ................. ...............34......... ....
3.3.2. 1 Density measure .............. ...............3 4....
3.3.2.2 Freshness measure ........._...... ...............36........... ..
3.3.2.3 Age measure ............... ...............36....
3.3.2.4 Stability measure ............... ...............37....
3.3.2.5 Uncertainty measure............... ...............39
3.3.2.6 Linkage measure .............. ...............40....
3.3.2.7 Redundancy measure............... ...............41
3.3.3 Complexity Analysis of the Measures ................. ...............41......___....
3.3.3.1 Complexity of the per-record measures .........._.... .........................42
3.3.3.2 Complexity of the cross-record measures .............. ....................4
3.4 Quality-Aware Operations............... ...............4
3.4.1 Query Operations............... ... .................4
3.4.1.1 Selecting a node and returning its contents ..........._..__.. .... ..__............45
3.4.2 Maintenance Operations ........._..... ...._... ...............45....
3.4.2.1 Inserting a node .............. ...............46....
3.4.2.2 Deleting a node............... ...............47..
3.4.2.3 Updating a node .............. ... ...............47...
3.4.3 Complexity Analysis of the Operations .............. ...............48....
3.4.3.1 Complexity of the query operations ...._.._.._ ..... .._._. ......_.._........49
3.4.3.2 Complexity of the maintenance operations ....._____ ..... ... ..__............49


4 QUALITY MANAGEMENT ARCHITECTURE

4.1 Usage Scenario
4.2 Overview of Reference Architecture
4.2.1 External Data Source
4.2.2 Quality Metadata Engine
4.2.2.1 Local cache
4.2.2.2 Metadata source
4.2.2.3 Quality layer
4.3 Architecture Implementation
4.3.1 Data Model: XML and Schema
4.3.1.1 Base XML format
4.3.1.2 Modified XML format
4.3.2 External Data Source: the NCBI's Nucleotide and Protein Databases
4.3.2.1 Entrez Retrieval System and Interface
4.3.2.2 FTP Interface
4.3.3 Local Cache: an XML Database
4.3.3.1 XML schema registration
4.3.3.2 Table creation
4.3.3.3 Cache replacement strategy
4.3.4 Metadata Source: a Relational Database
4.3.4.1 Relational schema
4.3.4.2 Index creation
4.3.5 Quality Layer: Java Classes
4.3.6 Macro Level Operations Implemented in the QMA
4.3.6.1 Bulk-loading
4.3.6.2 Maintenance
4.3.6.3 Querying

5 EVALUATION

5.1 Objectives
5.2 Evaluation of the Quality Estimation Model
5.2.1 Challenges of the Evaluation
5.2.2 Experiments and Results
5.2.2.1 Research study
5.2.2.2 Experiment 1
5.2.2.3 Experiment 2
5.2.2.4 Experiment 3
5.3 Evaluation of the Quality Management Architecture
5.3.1 Relevant Capabilities of the QMA
5.3.1.1 Non-intrusive quality metadata augmentation
5.3.1.2 Support for quality-aware queries
5.3.2 Operational Cost for the Prototype QMA System
5.3.2.1 Cost of metadata retrieval
5.3.2.2 Cost of metadata computation

6 CONCLUSIONS AND FUTURE WORK

6.1 Conclusions
6.2 Contributions
6.3 Future Work


APPENDIX


A SURVEY QUESTIONNAIRE

B SURVEY RESPONSES

C ORIGINAL INSDSeq XML SCHEMA

D INSDSeq_QM XML SCHEMA

LIST OF REFERENCES

BIOGRAPHICAL SKETCH










LIST OF TABLES


Table

3-1 Ambiguity codes for nucleotide sequences
3-3 Time complexity of the per-record measures
3-4 Time complexity of the cross-record measures
3-5 Time complexity of the quality-aware operations
5-1 Participants' assessment of the usefulness of the initial quality dimensions
5-2 Comparison between the experts' quality assessment criteria and our quality dimensions
5-3 Wilcoxon rank sum test over the standardized scores for the high and low quality sets of the Expert data set
5-4 Classifier performance over the Expert data set using a 10-fold cross-validation
5-5 Classifier's prediction rate over the HQ and LQ sets of the Expert data set
5-6 Effect of threshold value on the prediction rate and data selectivity over the Expert data set
5-7 Most relevant dimensions for classifying the HQ and LQ sets of the Expert data set
5-8 Comparison of the classification error obtained when using different sets of quality dimensions over the Expert data set
5-9 Participants' quality rankings for six NCBI databases
5-10 Classifier performance over the GenBank-EST data set
5-11 Classifier performance over the RefSeq-EST data set
5-12 Per-record retrieval times for data and quality metadata
5-13 Per-record bulk-load times for data versus data and quality metadata
5-14 Per-record maintenance times for data versus data and quality metadata
B-1 Answers to Question 1, Part I of Survey Questionnaire
B-2 Answers to Question 2, Part I of Survey Questionnaire
B-3 Answers to Question 1, Part II of Survey Questionnaire
B-4 Answers to Question 2, Part II of Survey Questionnaire
B-5 Answers to Question 3, Part II of Survey Questionnaire
B-6 Answers to Question 4, Part II of Survey Questionnaire
B-7 Answers to Question 1, Part III of Survey Questionnaire
B-8 Answers to Question 2, Part III of Survey Questionnaire
B-9 Answers to Question 3, Part III of Survey Questionnaire
B-10 Answers to Question 4, Part III of Survey Questionnaire
B-11 Answers to Question 5, Part III of Survey Questionnaire











LIST OF FIGURES


Figure

3-1 Examples of semistructured representations of data
3-2 Illustration of an update operation
4-1 Reference Quality Metadata Architecture
4-2 Example of a record represented in the INSDSeq XML format
4-3 Example of a record represented in the INSDSeq_QM XML format
4-4 Relational schema for the quality metadata in the Metadata Source
5-1 Assessment of the usefulness of the initial quality dimensions by the participants
5-2 Percent of participants whose criteria for quality assessment of genomic data matched our chosen quality dimensions
5-3 Distribution of standardized scores for the Density dimension over the Expert data set
5-4 Distribution of standardized scores for the Features dimension over the Expert data set
5-5 Distribution of standardized scores for the Publications dimension over the Expert data set
5-6 Distribution of standardized scores for the Freshness dimension over the Expert data set
5-7 Distribution of standardized scores for the Age dimension over the Expert data set
5-8 Distribution of standardized scores for the Stability dimension over the Expert data set
5-9 Distribution of standardized scores for the Uncertainty dimension over the Expert data set
5-10 Distribution of standardized scores for the Redundancy dimension over the Expert data set
5-11 Distribution of standardized scores for the Literature-Links dimension over the Expert data set
5-12 Distribution of standardized scores for the Gene-Links dimension over the Expert data set
5-13 Distribution of standardized scores for the Structure-Links dimension over the Expert data set
5-14 Distribution of standardized scores for the Other-Links dimension over the Expert data set
5-15 Three-dimensional view of the Expert data set along selected dimensions
5-16 Class-membership scores for each fold of the cross-validation over the Expert data set
5-17 ROC curves for each fold of the cross-validation over the Expert data set. HQ is set as the positive class
5-18 Class-membership scores across all cross-validation folds over the Expert data set
5-19 ROC curves of two classifiers over the GenBank-EST test set. HQ is set as the positive class
5-20 Three-dimensional view of the GenBank-EST test set along selected key dimensions
5-21 ROC curves of two classifiers over the RefSeq-EST test set
5-22 Three-dimensional view of the RefSeq-EST test set along selected key dimensions
5-23 Retrieval time of quality metadata (per-record) for different number of dimensions









LIST OF ABBREVIATIONS


API Application Programming Interface.

AUC Area under the ROC curve.

BLAST Basic Local Alignment Search Tool.

CLOB Character Large Object.

DBMS Database Management System.

DDBJ DNA Data Bank of Japan.

DMSI Data Manager Service Interface.

DTD Document Type Definition.

EMBL European Molecular Biology Laboratory.

EUtils Entrez Programming Utilities.

FIFO First In, First Out.

GenBank Genetic sequence database.

HQ High Quality.

IQ Information Quality.

IQR Interquartile range.

JDBC Java Database Connectivity.

LQ Low Quality.

LRU Least Recently Used.

MMSI Metadata Manager Service Interface.

MRU Most Recently Used.

NCBI National Center for Biotechnology Information.

QEM Quality Estimation Model.

QM Quality Metadata.

QMA Quality Metadata Architecture.









RefSeq Reference Sequence collection.

ROC Receiver operating characteristic.

SAX Simple API for XML.

SQL Structured Query Language.

W3C World Wide Web Consortium.

XML Extensible Markup Language.









Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

BIODQ: A MODEL FOR DATA QUALITY ESTIMATION AND MANAGEMENT IN
BIOLOGICAL DATABASES

By

Alexandra Martinez

December 2007

Chair: Joachim Hammer
Major: Computer Engineering

We present BIODQ, a model for estimating and managing the quality of biological data in

genomics repositories. BIODQ uses our new Quality Estimation Model (QEM) which has been

implemented as part of the Quality Management Architecture (QMA). The QEM consists of a set

of quality dimensions and their quantitative measures. The QMA combines a series of software

components that provide support for the integration of the QEM with existing biological

repositories. We describe a research study conducted among biologists, which provides insights

into the process of quality assessment in the biological context, and is the basis of our evaluation.

The evaluation results show that the QEM dimensions and estimations are biologically relevant

and useful for discriminating high quality from low quality data. Additionally, the evaluation

performed on a subset of the National Center for Biotechnology Information's databases

validates the benefits of QMA as a quality-aware interface to genomics repositories. We expect

BIODQ to benefit biologists and other users of genomics repositories by providing them with

accurate information about the quality of the information that is returned as part of their queries.









CHAPTER 1
INTRODUCTION

The rapid accumulation of biological information, as well as its widespread usage by

scientists to carry out research, poses new challenges in determining and managing the

quality of data in public biological repositories. GenBank [35, 5], RefSeq [39, 49], and Swiss-Prot

[58, 7] are prominent examples of public repositories extensively used by genomics researchers

and practitioners, bioinformaticians, and biologists in general. At the least, analysis and

processing of low-quality data results in wasted time and resources. In the worst case, the usage

of low-quality data may lead scientists to false conclusions or inferences, thus hampering

scientific progress.

Several quality models and assessment methodologies have been proposed in the literature,

but most are utilized in the context of enterprise data warehousing and aim to solve quality

problems existing in the business domain. These methodologies do not fit naturally into the

genomics context because biological data is more complex and less structured than typical

business data. In addition, the increasing data generation and usage rates limit the kind of quality

assessments that can realistically be performed. For example, a common approach for assessing

information quality has been to gather user appraisals about the perceived quality of information

products or processes, mostly via questionnaires. This approach, like any other manual

approach to quality assessment, becomes impractical in the context of genomics due to its lack of

scalability and inability to produce timely quality estimates. We therefore believe that there is a

need for automated quality assessment techniques that provide users of genomics data sources

with objective and quantitative estimates of the quality of available data.









1.1 The Need for Quality Information in Biological Data Sources

1.1.1 Biological Data Sources Currently do not Manage Quality Well

We studied some of the popular public repositories of biological data with the purpose of

discovering how they manage quality. Most of the problems we found were common to all the

repositories; hence we focused our work on the databases of the National Center for

Biotechnology Information (NCBI) [61] because of their widespread use by the scientific

community. The three major quality-related problems we found are described next.

1.1.1.1 Shortage of quality information

Currently, biological data sources provide minimal information about the quality of the

stored data. Some repositories offer base-calling scores, which are quality indicators for the

sequence data only. Typically, however, genomic records contain a significant amount of

annotations about the sequence data, which should be accounted for in a comprehensive

evaluation of records.

Several challenges must be overcome when addressing the shortage of quality information

provided by the data sources. First, comprehensive quality assessments need to be formulated,

which consider the entire contents of a record (i.e., sequence and annotations). Second, different

quality aspects of the stored data should be available in order to accommodate the large variation

in usage and quality perception by users of the data sources. Consequently, quality must be

evaluated from a multidimensional perspective. Third, a mechanism to represent and store

quality information about the underlying biological data needs to be devised.

1.1.1.2 High data generation to curation ratio

Most public biological repositories have a curation process in place, which consists of

cleaning, standardizing, and annotating the submitted data with the aim of improving its value

and quality. Even though this curation process has been partially automated, a significant amount









of human effort (and time) is still required. On the other hand, large amounts of biological data

coming from multiple sequencing centers and laboratories are loaded into the repositories on a

daily basis. Hence, the ratio of data generation to data curation is increasing, and this trend will

continue as sequencing technologies are improved. For this reason, most biological data sources

publish their newly acquired data before it can be completely curated, thus raising concern over

the quality of available data.

One approach for addressing the high data generation to curation ratio problem is to work

towards a fully automated curation process of the submitted data. However, until this becomes a

viable option, an alternative approach is to provide indicators of the quality of the available data

that help users recognize data of different curation (and quality) levels.

1.1.1.3 Lack of quality-driven interfaces

Current query interfaces of biological data sources do not support user-specified quality

criteria as part of queries. Without such capability, the identification of high-quality records from

the query results becomes a time-consuming task even for experienced users. While experienced

users can generally glance at a record and roughly estimate its quality level, when a query

retrieves a large number of records, examining each record individually is not convenient.

Moreover, users who are new to one of these repositories would need to become familiar with

the implicit quality indicators embedded in the data, before they can interpret and use them in a

quality assessment. In addition, the criteria used to evaluate the retrieved records are

subjective and largely depend on user expertise. Hence, an automated way to present the query

results ranked by quality score would be preferred.

We believe that biological data sources should provide query interfaces that allow users to

(i) selectively request quality information over the retrieved records, (ii) filter out data whose









quality does not meet the expectations specified by the user, and (iii) order query results with

respect to a given quality dimension.
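
The three interface capabilities listed above can be illustrated with a minimal Python sketch. This is not the interface of any existing repository; the names `ScoredRecord` and `quality_aware_query`, the placeholder accessions, and the scores are all hypothetical, and the sketch assumes that every dimension is scored so that a higher value means better quality.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class ScoredRecord:
    """A retrieved record with hypothetical per-dimension quality scores."""
    accession: str
    quality: Dict[str, float] = field(default_factory=dict)


def quality_aware_query(
    records: Iterable[ScoredRecord],
    requested: Optional[List[str]] = None,
    min_scores: Optional[Dict[str, float]] = None,
    order_by: Optional[str] = None,
) -> List[ScoredRecord]:
    """Apply user-specified quality criteria to a set of query results.

    Assumption: higher scores are better for every dimension.
    """
    thresholds = min_scores or {}
    # (ii) Filter out records that miss any user-specified threshold;
    # a record lacking a score for a thresholded dimension is dropped.
    kept = [
        r for r in records
        if all(r.quality.get(dim, float("-inf")) >= t
               for dim, t in thresholds.items())
    ]
    # (iii) Order the remaining results by one dimension, best first.
    if order_by is not None:
        kept.sort(key=lambda r: r.quality.get(order_by, float("-inf")),
                  reverse=True)
    # (i) Return only the quality information the user asked for.
    if requested is not None:
        kept = [
            ScoredRecord(r.accession,
                         {d: r.quality[d] for d in requested if d in r.quality})
            for r in kept
        ]
    return kept


# Placeholder records with made-up scores, for illustration only.
sample = [
    ScoredRecord("REC_A", {"Density": 0.9, "Freshness": 0.2}),
    ScoredRecord("REC_B", {"Density": 0.4, "Freshness": 0.8}),
    ScoredRecord("REC_C", {"Density": 0.7, "Freshness": 0.5}),
]
results = quality_aware_query(sample, requested=["Density"],
                              min_scores={"Density": 0.5}, order_by="Density")
print([r.accession for r in results])  # ['REC_A', 'REC_C']
```

Keeping the three criteria as independent optional arguments mirrors the idea that users select quality information, filter, and rank separately, depending on their needs.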

1.1.2 Benefits of Adding Quality Information

The benefits of adding quality information to existing biological repositories are numerous

and their impact is broad. Here we outline some of the major benefits we foresee.

First, the value and utility of existing repositories will be enhanced by providing users with

quality information about the retrieved data, which will in turn help the users decide what data

best fits their specific needs. Second, biologists and other users of current repositories will be

able to work more effectively when using a quality-aware interface that allows them to filter out

query results below a certain quality threshold, and to rank the retrieved data based on different

quality scores. This will help users quickly discriminate high-quality records from a candidate

set, and decide what data best fits their specific needs. Third, the data curation process will be

facilitated by providing preliminary estimates for the quality of records submitted to the

database, which can in turn help curators prioritize records for further revision.

1.2 Motivating Example

The Reference Sequence (RefSeq) collection [39, 49], a public database of biological

sequences maintained by the National Center for Biotechnology Information, is considered one

of the high-quality databases currently available. NCBI's RefSeq database consists of curated,

non-redundant nucleotide and protein sequences from numerous organisms (3774 as of Release

19) [49]. The curation process is performed by collaborating groups and by NCBI staff, and

records are annotated with a curation status: model, predicted, inferred, provisional, validated,










and reviewed. 'Reviewed' and 'Validated' are the two highest curation levels, followed by

'Provisional', and then by 'Predicted', 'Inferred', and 'Model'.¹

Some could argue that the shortage of quality information can be easily solved by using the

curation status terms as quality categories of the data. While this could be used as a first

approach to categorize records from the RefSeq database, it does not constitute a general and

comprehensive solution. First, this quality categorization derived from RefSeq would neither

apply to databases that have a different curation process in place, nor to those whose curation

process is rather unstructured, making it hard to even identify curation levels. Second, relying on

a curation status as a quality indicator assumes the existence of a curation process, but such an

assumption may not hold true for some data sources (including GenBank), which are mainly

archival datasets. Third, records at the same curation level may not necessarily be considered of

similar quality by domain experts (according to our research study, described in Appendixes A

and B). For example, the RefSeq records identified by accessions NM_181070 and NM_174874

are both annotated with the 'Provisional' curation status (an intermediate curation level);

however, the former was classified as a 'good' quality record while the latter was classified as a

'poor' quality record, by one of the research participants. This shows that the curation level alone

is not sufficient to determine the quality of data stored in a biological repository, and serves to

illustrate that assessing the quality of data in the biological context is a difficult problem.

1.3 Our Contributions

The main contributions of this work are:

* The identification of a set of measurable quality dimensions fit for genomic data.




¹ Information obtained partially from NCBI collaborators and partially from documentation available at the RefSeq
website http://www.ncbi.nlm.nih.gov/RefSeq/key.htm~tts










* The formulation of quantitative measures for the quality dimensions, which can be
systematically computed in a semistructured data model.

* The definition of a core set of quality-aware maintenance and query operations for a
semistructured data model.

* The design, implementation, and validation of a quality management architecture, which
enables the integration of our quality model with existing biological data sources.

* The development of a research study among domain experts (biologists), which is the basis
for the experimental evaluation of our model.

* The evaluation of the significance and usefulness of the chosen quality dimensions and
measures in assessing the quality of biological data.

* The evaluation of the usefulness of the quality management architecture with respect to its
functional capabilities and operational costs.









CHAPTER 2
LITERATURE REVIEW

2.1 General Work on Data Quality

Numerous models, evaluation methodologies, and improvement techniques have been

developed in the area of Information Quality (IQ) [23, 24, 28, 56, 60]. IQ researchers often

regard quality as "fitness for use" [3], so the user's perception of quality and the intended use of

the data prevail in these approaches. Wang et al. [60] proposed an attribute-based model to tag

data with quality indicators. They suggest a hierarchy of data quality dimensions with four major

dimensions: accessibility, interpretability, usefulness, and believability. These dimensions are in

turn split into other factors such as availability, relevancy, accuracy, credibility, consistency,

completeness, timeliness, and volatility. Mihaila et al. [28] identified four Quality of Data

parameters: completeness, recency, frequency of updates, and granularity. Lee et al. [23]

distinguished five dimensions of data quality: accessibility, relevancy, timeliness, completeness,

and accuracy; each considered a performance goal of the data production process. Lee et al. [24]

developed a methodology for IQ assessments and benchmarks called AIMQ. AIMQ is based on

a set of intrinsic, contextual, representational, and accessibility IQ dimensions, which are important

to information consumers. These dimensions were first devised by Strong et al. [56] as categories

for high-quality data. Naumann and Rolker [45] proposed an assessment-oriented classification

of IQ criteria based on three sources of IQ, namely the user, the source, and the query process.

More recently, Naumann and Roth [46] analyzed how well modern (relational) DBMS meet user

demands based on a set of IQ criteria. All these works offer valuable contributions toward a better

understanding of data quality problems and challenges, but they fail to provide quantitative

measures for the quality dimensions or indicators proposed.









2.2 Work on Data Quality for Cooperative Information Systems

Data Quality has also been studied in the context of Cooperative Information Systems,

where more pragmatic approaches have emerged [26, 29, 44, 53]. Mecella et al. [26] describe a

service-based framework for managing data quality in cooperative information systems, based on

an XML model for representing and exchanging data and data quality. Scannapieco et al. [53]

developed the DaQuinCIS architecture and the D2Q (Data and Data Quality) model for

managing data quality in cooperative information systems. Naumann et al. [44] presented a

model for determining the completeness (i.e., a combination of density and coverage) of a source

or combination of sources. Missier et al. [29] defined the notions of quality offer and quality

demand within cooperative information systems, and modeled quality profiles as

multidimensional data cubes. Bouzeghoub and Peralta [9] analyzed existing definitions and

metrics for data freshness in the context of a Data Integration System. All of these works address

quality issues that arise in the presence of multiple sources, in particular problems related to data

exchange, data integration, and notification services among the sources. In our work, we are

primarily concerned with the quality of data within a single source, hence most such issues are

neither applicable to us nor addressed by our model. Yet we believe our model complements

works in Cooperative Information Systems and Data Integration Systems because they do not

typically provide solutions for measuring the inner quality of sources.

2.3 Work on Biological Data Quality

Some work has been done in the context of biological data quality. In particular, the

research by Müller et al. [33] identifies the main errors involved in the process of genome data

production as well as their corresponding data cleansing challenges. A thorough examination of

the quality of the human genome DNA sequence was described by Schmutz et al. [54]. Both

focus on assessing the quality of the sequenced data, but our approach is also concerned with the









annotations about the sequenced data. Recently, the management of data quality has been studied in

the context of the life sciences [48, 30]. Missier et al. [30] proposed the Qurator system, which

allows the specification of users' personal quality functions in so-called "quality views".

These views are compiled into Web services that can then be embedded within the data

processing environment. Preece et al. [48] also describe a framework, based on the Qurator

project, for managing information quality in e-Science, using ontologies, semantic annotation of

resources, and data bindings. It allows scientists to define the quality characteristics that are of

importance in their particular domain, rather than specifying generic, domain-independent

quality characteristics. Our approach differs from these works in that it aims to define rather

general and objective quality dimensions that can be computed in an automated way, i.e., without

requiring user input.

2.4 Other Quality-Related Work

Research efforts in the areas of Quality of Service and Digital Libraries have also explored

the characteristics and the role of quality [55, 6, 57, 4]. Quality of Service has mainly been

developed to support distributed multimedia applications, which transmit and process

audiovisual data streams. Quality of Service comprises the quality specifications, mechanisms,

and architecture necessary to ensure that user and/or application requirements are fulfilled [55,

6]. In the context of Digital Libraries, Sumner et al. [57] analyzed the dimensions of educators'

perceptions of quality in digital library collections for classroom use. They found that metadata

influences how the quality of the collections is perceived. Beall [4] describes the main types of

errors in digital libraries, both in metadata and in actual documents; and offers suggestions for

managing digital library data quality. The quality problems that have been investigated in the

contexts of Quality of Service and Digital Libraries differ from those existing in biological

databases.









2.5 Work on Semistructured Data

Finally, works on semistructured data modeling are also relevant to us because such data

models have been extensively used in the biological domain, and because our quality model uses

an underlying semistructured data model. Most of the models proposed for semistructured data

[1, 11, 12, 25] share a common underlying representation, which is either a graph or a tree with

labels on the nodes or on the edges. Abiteboul et al. [1] use an edge-labeled graph to represent

semistructured data. UnQL and LORE are based on an edge-labeled tree representation [11, 25].

Calvanese et al. [12] use the basic data model for semi-structured data (called BDFS) in which

both databases and schemas are represented as graphs. The work by Scannapieco et al. [53]

provides a good example of the usage of a semistructured data model (in particular, XML) to

represent both data and quality metadata.









CHAPTER 3
QUALITY ESTIMATION MODEL

We present a new model for estimating the quality of biological data in genomics

repositories. The model comprises a set of measurable quality dimensions, and a set of

quantitative measures that can be systematically computed to provide a score for each quality

dimension. Quality dimensions and measures are integrated into a semistructured data model,

which is suitable for representing both data and quality metadata, and can accommodate a wide

variety of data models.

3.1 Definitions

We define Data Quality as a measure of the value of the data. Since value is a rather

intangible concept, we decompose it along different quantifiable dimensions.

Quality dimensions are aspects of the quality of data which either the user or the data

provider is interested in measuring. Since we aim for quality dimensions that can be quantified,

we need to specify how the quality dimensions will be measured. The particular formula or

algorithm by which each dimension is assigned a score is called a measure.

We refer to the set of quality dimensions of a data item as its quality metadata, and we

represent it as a vector where each entry contains the data item's score on a quality dimension,

e.g., $Q = [d_1, d_2, \ldots, d_n]$, with $d_1, d_2, \ldots, d_n$ being the scores for the $n$ quality dimensions.

3.2 Quality Dimensions

In order to identify suitable quality dimensions for our model, we looked for dimensions

that met the following criteria. First, the dimension could be objectively measured, meaning that

no subjective appraisal or interpretation was needed to assess a score for the dimension. Second,

the dimension could be efficiently computed, meaning that the computation of new and updated

scores for the dimension should be fast enough to allow its use in real scenarios where the










underlying biological data is constantly being updated. Third, the dimension was biologically

relevant, meaning that it effectively captured criteria directly or indirectly used by biologists

when assessing the quality of data. The relevancy for biology was preliminary judged by the

authors, then validated by field-experts, and lastly confirmed experimentally.

Using the criteria described above, a set of seven quality dimensions was selected, namely

Density, Freshness, Age, Stability, Uncertainty, Linkage, and Redundancy. The first five of these

dimensions are per-record dimensions and the last two are cross-record dimensions. Per-record

dimensions consider quality aspects of a single record only (i.e., they assess

records on an individual basis). Cross-record dimensions consider quality

aspects across a set of records collectively (i.e., they assess the interactions among records).

Next we provide an informal description of the selected quality dimensions, which will be

formalized later when we present their measures. Hereafter, we use the term "data item" to

denote either logical data units with structure such as records or fields of a record, or the value of

any such logical data unit.

3.2.1 Per-Record Dimensions

3.2.1.1 Density

Density is a spatial dimension that provides an assessment of the amount of information

conveyed by a data item d. The amount of information can be measured as the number of

(possibly nested) data items within the data item d. This recursive definition takes a concrete and

natural form under the chosen data model. The intuition behind this dimension is as follows. A

data item which consists of many other data items is considered "dense" because much

information is comprised in it, hence its density score would be high. On the other hand, a data

item which consists of just a few other data items is deemed as "light" since little information is

comprised in it, and therefore its density score would be low.










Initially, Density was deemed sufficient for capturing the notion of "amount of

information" in a biological record, but later we found that two other aspects related to density

were also relevant to the biologists, namely the feature and publication information. By feature

information we refer to the features annotated in the Feature Table [21] of a genomic record,

which may describe genes, gene products, and regions of biological significance in the sequence.

By publication information we refer to the references contained in a genomic record, which may

cite a journal article, book chapter, book, thesis, monograph, proceedings

chapter, proceedings from a meeting, or patent. These two information dimensions, named

Features and Publications, were incorporated into our model as sub-dimensions of Density.

Although both are encompassed by the Density dimension, feedback gathered from biologists

suggested that they carry enough biological meaning to justify their separate treatment.

Both Features and Publications dimensions can be measured by counting the number of

feature key and reference elements, respectively. The intuition for these new dimensions is that

the more feature keys and the more references a record has, the higher its features and

publications scores are.

3.2.1.2 Freshness

Freshness is a temporal dimension that indicates how up to date the contents of a data item

d are. It can be measured as a function of the time elapsed since the last update of the data item

d, using an exponential decay. The intuition behind the Freshness dimension is as follows. If a

data item has recently been updated, it is considered to be "up to date", and its freshness score

would therefore be high. Conversely, if a data item has remained unchanged for a long period of

time it is considered "outdated" and its freshness score would then be low.









3.2.1.3 Age

The Age dimension indicates how old the contents of a data item d are, so it is a temporal

dimension, too. It can be measured as a function of the time elapsed since the creation of the data

item d, using an exponential decay. The intuition behind this dimension is as follows. If a data

item was created a long time ago, it is considered "old" and its age score would therefore be high.

Conversely, if a data item has been recently created, it is considered "young", and its age score

would hence be low.

3.2.1.4 Stability

The Stability dimension captures information about changes in the contents of a data item

d, which can be obtained from the version history available in main biological repositories. This

is both a temporal and a provenance dimension since it keeps track of changes applied to the data

through time. Stability can be measured as the magnitude of the changes undergone by the data

item relative to its size, and weighted by a function of the time elapsed since the change

occurred. This weighting function diminishes the influence of older updates in favor of recent

ones. The intuition behind the Stability dimension is the following. If a data item has not

undergone large changes during a recent period of time, it is regarded as "stable" and its stability

score would therefore be high. Conversely, if a data item has undergone considerably large

changes recently, it is considered "unstable" and its stability score would then be low.

3.2.1.5 Uncertainty

The Uncertainty dimension is an indicator of the lack of evidence for the contents of a data

item d. In the biological context, this typically refers to ambiguities associated with the particular

experimental procedure or method used to obtain the data. More specifically, in our target

databases the ambiguities or uncertainties come primarily from the sequence data, and are

expressed as degenerate bases. The Uncertainty dimension can hence be measured as the fraction









of ambiguous values (i.e., degenerate bases), relative to the total length of the sequence. The

intuition behind the Uncertainty dimension is as follows. If a data item contains many ambiguous

values, we regard it as an "uncertain" item, and its uncertainty score would therefore be high. On

the contrary, if a data item has no ambiguous values, we would consider it "certain" and its

uncertainty score would then be low.

3.2.2 Cross-Record Dimensions

3.2.2.1 Linkage

Linkage is a spatial dimension that provides information about the incoming and outgoing

links of a data item d (a record, in this context). In biological databases, records can be linked to

other relevant records, published articles, etc. Such information is typically represented as an

interaction graph that consists of a set of nodes representing records, and a set of directed edges

or links between nodes representing relationships between records. For example, a link between

a record in the RefSeq database [39, 49] and an entry in the PubMed database [43,61] indicates

that the corresponding PubMed article describes or uses the information in the RefSeq record.

Initially, we considered Linkage as a single dimension, but after examining the rich

information offered by different link types within the NCBI databases, we decided to split the

Linkage dimension into four mutually exclusive sub-dimensions, namely Literature Links, Gene

Links, Structure Links, and Other Links. The Literature Links dimension comprises links to or

from literature databases, specifically NCBI's PubMed, NCBI's Online Mendelian Inheritance in

Man (OMIM), and NCBI's Online Mendelian Inheritance in Animals (OMIA). The Gene Links

dimension accounts for links to or from gene and genome databases, in particular NCBI's Gene,

NCBI's HomoloGene, and NCBI's Genomes. The Structure Links dimension contains links to or

from structure and domain databases, specifically NCBI's Structure (MMDB), NCBI's 3D

Domains, and NCBI's Conserved Domains (CDD). Lastly, the Other Links dimension covers all









other links not included in any of the previous dimensions; for example, links to related

sequences across databases. This division, which we believe represents important but different

biological characteristics of the genomic data, was refined with the help of expert collaborators

at our university.

All of the linkage dimensions can be measured as a link count over the respective target

databases. The intuition behind the Linkage dimension (and sub-dimensions) is that a data item

with many links in its interaction graph would have a high linkage score whereas one with few

links would have a low linkage score.

3.2.2.2 Redundancy

Redundancy is a spatial dimension that captures information about the number of

redundant data items (records, in this context) with respect to a data item d (a record, too). In

biological databases, two records are considered redundant (with respect to each other) if their

sequence similarity is significantly high. Annotations about the sequence data could also be

incorporated into a general notion of redundancy (which was our initial approach), but the

problem with this approach is that measuring the redundancy at the annotations level would lead

to expensive string comparisons among records in the database, which we cannot afford. Even

measuring the redundancy at the sequence level would be computationally costly if no extra

information is provided by the data source. Luckily, our target database runs BLAST over the

stored records periodically, and pre-computes the "neighbors" or related sequences for every

record. Using this information, the Redundancy dimension of a data item d can be measured as

the number of neighbors of d. The intuition behind this dimension is that a data item with many

neighbors would have a high redundancy score whereas one with few neighbors would have a

low redundancy score.









3.2.3 Possible Extensions

It is worth mentioning that the proposed set of quality dimensions (described above) could

be extended to include other aspects of quality that may also be relevant in the biological

context. The selection of an appropriate set of dimensions is, in fact, one of the biggest

challenges in the area of data quality, and in our context this decision is influenced by what

information is currently offered at the data sources. As an example, we next describe two quality

dimensions that we intended to include but decided to drop because we did not have enough

information to effectively measure them yet. When such information becomes available, our

model can be extended to accommodate these (and other) new dimensions.

3.2.3.1 Query pattern

This dimension would capture information about the query pattern of a data item (e.g., how

many times the data item has been queried, when those queries occurred, etc.). Such query

information could be used similarly to the way in which the update history is used in the Stability

dimension. The intuition behind the query pattern dimension of a data item would be as follows.

If the data item has been frequently queried (i.e., used by scientists), its score would be high

based on the assumption that scientists normally use data they trust to be correct. On the other

hand, if no queries are issued to the data item for a long period of time, its score would be low.

Although this dimension seemed useful according to our predefined criteria, we were not

able to include it in our final set of dimensions because the database we were using did not

provide information about the queries made by users.

3.2.3.2 Correctness

This dimension would indicate the correctness (or accuracy) of a data item. Although

measuring the correctness of a biological data item is not a trivial task, we could reduce the

scope of this dimension to evaluating the sequence data of a biological record, which indeed is a










key aspect for biologists. If we restrict the Correctness dimension in this way, we could then use

the quality scores (i.e., base-calling scores) stored in our target database to compute a score for

this dimension. The problem was that such quality scores were not consistently available for all

the data stored in the database, hence we decided not to include it in our final set of dimensions.

3.3 Measures for the Quality Dimensions

3.3.1 Underlying Data Model

Before we can formulate measures for the quality dimensions, we need to select a data

model in which the underlying biological data is to be represented. In this work, we chose the

semistructured data model. Semistructured data is commonly described as "schemaless" or

"self-describing" data [1, 10] because the schema of the data is contained within the data. A

semistructured data model generally represents data hierarchically (i.e., in a tree-like structure²),

with actual data represented at the leaf nodes and schema information encoded in upper layers of

the hierarchy (i.e., internal nodes). Here, leaf nodes store atomic data items or values, which can

be either strings or numbers. Internal nodes represent complex data items, which are collections

of other data items. The Abstract Syntax Notation One (ASN.1) [18] and the Extensible

Markup Language (XML) [62] are two examples of semistructured data models.
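
As a rough illustration of the tree-shaped model just described, a record fragment can be sketched with nested Python containers. The field names and the helper functions `is_atomic` and `children` are hypothetical and chosen only for this illustration; they do not reproduce the ASN.1 or XML schema of any actual repository.

```python
# Hypothetical record fragment: dicts and lists play the role of complex data
# items (internal nodes); strings and numbers are atomic data items (leaves).
record = {
    "accession": "NM_128079",
    "division": "PLN",
    "molecule_type": "mRNA",
    "organism": {"genus": "Arabidopsis", "species": "thaliana"},
    "references": [{"title": "example title", "year": 2001}],
}


def is_atomic(item):
    """Return True for leaf values (strings or numbers)."""
    return isinstance(item, (str, int, float))


def children(item):
    """Return the direct descendants of a data item (empty for atomic items)."""
    if isinstance(item, dict):
        return list(item.values())
    if isinstance(item, list):
        return list(item)
    return []
```

In this sketch the keys of a dictionary act as the schema information carried by internal nodes, while the values at the leaves hold the actual data.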

Figure 3-1A shows a fragment of a genomic record using the semistructured model

proposed by Abiteboul et al. [1], and Figure 3-1B sketches a semistructured representation of a

database, where the root of the tree represents the entire database, nodes immediately below the

root represent records in the database, and nodes below them represent data items within records.

It is worth noting that our quality estimation model can be used with any semistructured data

model that adheres to the principles stated above, hence our conceptual data model is very


2 Strictly speaking, the semistructured data model allows for cycles in the data, hence a graph representation should
be used instead of a tree. However, when the nature of the data is acyclic, a tree-like structure can be assumed.

Figure 3-1. Examples of semistructured representations of data. A) Fragment of a genomic record
represented in the semistructured model by Abiteboul et al. B) Sketch of a
semistructured representation of an example database with four records.










generic. When we discuss the implementation of the quality model, we will specify the format

and syntax of the particular data model chosen.

Several reasons justify the selection of a semistructured data model in this context. First, a

vast amount of biological data is currently available as semistructured data (e.g., GenBank [3 5,

5], EMBL [17, 22], and DDBJ [42] publish their data in XML). Second, semistructured models

have proven useful at representing biological data and its intrinsic complexities, which is

demonstrated by the increasing number of XML-based languages developed for biology (e.g.,

BioML, BSML, AGAVE, GeneXML). Third, a semistructured data model can seamlessly

represent both data and metadata, which is a desirable feature in our quality model. Finally, it

can accommodate a variety of other data models, thus making it possible to estimate the quality

of a wide variety of repositories that use different data representations. Using the semistructured

data model allows us to measure the different quality dimensions in a bottom-up fashion, which

is described next.

3.3.2 Measures

3.3.2.1 Density measure

Our previous informal description of the density measure involved counting the number of

(possibly nested) data items for a given data item. We now refine that definition. Under the

adopted hierarchically-structured data model, a complex data item is a collection of other data

items (i.e., it contains nested data items), which can be either complex or atomic data items. Also

under this model, an atomic data item cannot contain nested data items; it can only have data

values like strings or numbers. Hence, the recursive definition of counting data items within data

items becomes natural when we apply it to complex and atomic data items of our data model.

Recursion thus stops at the atomic data items.










The density score of an atomic data item d is defined as 1, for any d. This means that each

atomic data item, regardless of its size, has the same contribution to the total amount of

information of a record. For example, if the value of the atomic data item $d_1$ in the data tree is a

large string $s_1$, and the value of the atomic data item $d_2$ is a short string $s_2$, then both $d_1$ and $d_2$

will have a density score of 1, meaning that each represents one unit of information.

Once we know the density score for atomic data items at the bottom level of the tree, we

can recursively compute the density score for complex data items in upper levels of the tree. The

measure for the Density dimension of a complex data item d is given in Equation 3-1, where n is

the number of components (i.e., direct descendants) of d, and $D_i$ is the density score of the ith

component of d in the data tree. This measure is equivalent to the size (i.e., number of nodes) of

the subtree whose root node is d.

$$D = 1 + \sum_{i=1}^{n} D_i \qquad \text{(3-1)}$$





The density score can take on values from the interval $[1, \infty)$, where 1 represents the

minimum density value, and there is no upper limit on the density value.
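
The recursive density measure can be sketched in a few lines of Python. The sketch assumes that complex data items are represented as nested dictionaries or lists and that everything else is atomic; this representation and the function name `density` are illustrative assumptions, not the format used by the implementation.

```python
def density(item):
    """Density of a data item: 1 for an atomic item, and 1 plus the sum of
    the children's densities for a complex item (i.e., the subtree size)."""
    if isinstance(item, dict):
        parts = item.values()
    elif isinstance(item, list):
        parts = item
    else:
        return 1  # atomic item: one unit of information, whatever its size
    return 1 + sum(density(part) for part in parts)


# Hypothetical record fragment used only to exercise the function.
record = {
    "accession": "NM_128079",
    "organism": {"genus": "Arabidopsis", "species": "thaliana"},
}
print(density(record))  # 5: root, accession, organism, genus, species
```

Because each node is visited exactly once, the score of a record is obtained in time proportional to the number of nodes in its tree.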

Measures for sub-dimensions. We provide here the measures for the Features and

Publications sub-dimensions of Density. These measures differ from the density measure in two

ways: first, they do not count nested data items; second, they only apply to certain complex data

items of interest (hence, we do not provide a measure for atomic data items in this case). Given a

complex data item d representing a record r, the Features score T of item d is defined as the

number of complex data items representing feature keys in r, and the Publications score P of d is

defined as the number of complex data items representing references in r. Each complex data

item representing either a feature key or a reference thus contributes one unit to the

corresponding measure.









3.3.2.2 Freshness measure

We previously described the freshness measure as a function of the time elapsed since the

last update of the data item d, using an exponential decay. This notion is formalized in Equation

3-2, which specifies how to obtain the freshness score for an atomic data item d.


$$F = e^{-\sigma \frac{t - u}{f}} \qquad \text{(3-2)}$$

In Equation 3-2, t is the current time, u is the time when d was last updated, f is the

frequency of update of the database (represented in the same time units as the subtraction in the

numerator), and $\sigma$ is a parameter that controls the decay rate of the freshness score. The role of f

is to scale the time elapsed since last update to units that reflect the rate at which the database

gets updated. The exponential decay gives more weight to recent past than to distant past, and

also ensures that the freshness score takes values between 0 and 1.

For a complex data item d, the freshness score is defined as the average of the freshness

scores of its components (i.e., direct descendants of d in the data tree).

The freshness score can take on values from the interval [0, 1], where 0 and 1 denote the

minimum and maximum freshness values, respectively.
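
A minimal sketch of the freshness computation follows. It assumes that the exponent has the form $\sigma (t - u) / f$, that leaves are dictionaries carrying a `last_update` timestamp, and that complex items list their components under a `children` key; these names and the default value of `sigma` are assumptions made for illustration only.

```python
import math


def freshness_atomic(now, last_update, update_period, sigma=1.0):
    """Exponentially decaying freshness; 1.0 means the item was just updated.

    Assumed form: exp(-sigma * (now - last_update) / update_period).
    """
    elapsed = max(0.0, now - last_update)
    return math.exp(-sigma * elapsed / update_period)


def freshness(item, now, update_period, sigma=1.0):
    """Freshness of a tree node: a leaf uses its own timestamp, a complex
    item averages the scores of its direct descendants (assumed non-empty)."""
    if "children" not in item:
        return freshness_atomic(now, item["last_update"], update_period, sigma)
    scores = [freshness(child, now, update_period, sigma)
              for child in item["children"]]
    return sum(scores) / len(scores)


# Two leaves: one updated just now, one updated a full update period ago.
record = {"children": [{"last_update": 100.0}, {"last_update": 40.0}]}
score = freshness(record, now=100.0, update_period=60.0)
```

With these inputs the first leaf scores 1 and the second scores $e^{-1}$, so the record's freshness is their average.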

3.3.2.3 Age measure

The age measure was previously described as a function of the time elapsed since the

creation of the data item d. This notion is formalized in Equation 3-3, which specifies the

measure for the Age dimension of an atomic data item d.


$$A = 1 - e^{-\rho \frac{t - c}{f}} \qquad \text{(3-3)}$$

In Equation 3-3, t is the current time, c is the time when d was created, f is the frequency of

update of the database (represented in the same time units as the subtraction in the numerator),

and $\rho$ is a parameter that controls the decay rate of the age score. The role of f is to scale the time

elapsed since creation to units that are in accordance with the database update rate. The

transformation applied to this scaled time produces large increases in age at the beginning and

then slows down as time passes by. This also ensures that the age score is between 0 and 1.

For a complex data item d, the age score is defined as the average of the age scores of its

components (i.e., direct descendants of d in the data tree).

The age score can take on values from the interval [0, 1], where 0 and 1 denote the

minimum and maximum age values, respectively.
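
To give a sense of scale, the following worked values assume that the age measure has the form $A = 1 - e^{-\rho (t - c)/f}$, with the illustrative choices $\rho = 1$ and $f = 30$ days. Both the exact placement of $\rho$ in the exponent and the parameter values are assumptions made for this example.

```latex
\begin{align*}
t - c = 30 \text{ days}:  &\quad A = 1 - e^{-1}  \approx 0.632 \\
t - c = 60 \text{ days}:  &\quad A = 1 - e^{-2}  \approx 0.865 \\
t - c = 300 \text{ days}: &\quad A = 1 - e^{-10} \approx 0.99995
\end{align*}
```

Under these assumptions the score rises quickly during the first few update periods and then saturates near 1, which matches the behavior described above.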

3.3.2.4 Stability measure

In our informal description of the stability measure in the previous section, we suggested

quantifying the magnitude of the updates applied to a data item and using a time-dependent weighting

function to reduce the effect of older updates. We formalize this notion in Equation 3-4, which

specifies the measure for the Stability dimension of an atomic data item d.


$$S = 1 - \sum_{i=1}^{n} \Delta(d(i-1), d(i)) \int_{t_i}^{t_{i-1}} \mu e^{-\mu t}\, dt \qquad \text{(3-4)}$$

In Equation 3-4, n is the number of intervals at which we measure the stability of d, $t_i$ is the

time elapsed since the ith interval (with $t_0 = \infty$), d(i) is the state³ of d at interval i, and $\mu > 0$ is a

free parameter. The function $\Delta$ measures the fraction of d that changed between two consecutive

intervals. The integral of the exponential function applies a time-decaying weight to the changes

undergone by d, effectively giving more weight to recent changes than to old ones. Note that S is

initially 0 since $\Delta(d(0), d(1)) = 1$ for any data item d (the default type of any data item at time $t_0$

is null, and $\Delta(\text{null}, d(1)) = 1$ for any $d(1) \neq \text{null}$, so the integral evaluates to 1).


3 The state of a data item is defined by type and contents.









It is possible to express the stability measure from Equation 3-4 in an incremental way

(Equations 3-5 and 3-6). The stability score S of a data item d can be obtained from the

instability score I of d as in Equation 3-5, where the instability score I of d at time $t_k$ is given by

Equation 3-6, and I at time $t_0$ is defined to be 1. Equation 3-6 shows that the instability score at

time $t_k$ is determined only by the score at time $t_{k-1}$, meaning that we only need to know what the

previous score was (rather than all previous scores since $t_0$). Note that Equation 3-6 is an

exponential moving average with memory depth $e^{-\mu t_{k-1}}$, where $t_{k-1}$ is, as in Equation 3-4, the

time elapsed since interval k-1. The stability measure can therefore be computed incrementally.

$$S_{t_k} = 1 - I_{t_k} \qquad \text{(3-5)}$$

$$I_{t_k} = e^{-\mu t_{k-1}}\, I_{t_{k-1}} + \left(1 - e^{-\mu t_{k-1}}\right) \Delta(d(k-1), d(k)) \qquad \text{(3-6)}$$
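
The equivalence between the summed form and the incremental form can be checked numerically. The sketch below assumes that each change is weighted by the integral of a normalized exponential, $\mu e^{-\mu t}$, over the period it covers, and that the incremental update is an exponential moving average whose decay factor depends on the gap between consecutive measurements. The function names, timestamps, change fractions, and the value of `mu` are all hypothetical.

```python
import math


def instability_batch(changes, times, mu):
    """Instability at the last measurement time, summed over all changes.

    changes[i] is the changed fraction observed at measurement i and
    times[i] its (increasing) timestamp; the first change is weighted
    back to the infinite past.
    """
    now = times[-1]
    total = 0.0
    for i, delta in enumerate(changes):
        newer = math.exp(-mu * (now - times[i]))
        older = math.exp(-mu * (now - times[i - 1])) if i > 0 else 0.0
        total += delta * (newer - older)  # integral of mu * exp(-mu * t)
    return total


def instability_step(previous, delta, gap, mu):
    """One exponential-moving-average update after `gap` time units."""
    keep = math.exp(-mu * gap)
    return keep * previous + (1.0 - keep) * delta


changes = [1.0, 0.2, 0.0, 0.5]  # first entry: creation (null -> value)
times = [0.0, 3.0, 4.0, 9.0]
mu = 0.3

score = 1.0  # instability is taken to be 1 at the start
for k in range(1, len(changes)):
    score = instability_step(score, changes[k], times[k] - times[k - 1], mu)

assert abs(score - instability_batch(changes, times, mu)) < 1e-12
stability = 1.0 - score
```

The incremental version keeps only the previous score per data item, so an update costs constant time once the changed fraction is known.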

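The function $\Delta(d_1, d_2)$ for atomic data items $d_1$ and $d_2$ is defined in Equation 3-7. Note that

$0 \le \Delta(d_1, d_2) \le 1$ for any pair $(d_1, d_2)$. If $d_1$ and $d_2$ are numbers, this formula assumes that they are

positive. It is also possible to use an approximation to the Edit Distance function if efficiency is a

major concern.

$$\Delta(d_1, d_2) = \begin{cases} \dfrac{\text{editDist}(d_1, d_2)}{\max\{\text{length}(d_1), \text{length}(d_2)\}} & \text{if } d_1, d_2 \text{ are strings} \\[1ex] \dfrac{|d_1 - d_2|}{\max\{d_1, d_2\}} & \text{if } d_1, d_2 \text{ are numbers} \\[1ex] 1 & \text{otherwise} \end{cases} \qquad \text{(3-7)}$$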



The stability S of a complex data item d is defined as the average of the stability scores of

its components (i.e., direct descendants of d in the tree).

The stability score can only assume values from the range [0, 1], where 0 represents

minimum stability and 1 represents maximum stability.
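
The change function for atomic data items can be sketched as follows. The sketch assumes that the edit distance is the Levenshtein distance, that a null value is represented by `None`, and that two empty strings or two zero values count as unchanged; these choices and the function names are assumptions, since the text does not fix them.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program,
    which takes O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def change_fraction(old, new):
    """Fraction of an atomic data item that changed, in [0, 1]."""
    if isinstance(old, str) and isinstance(new, str):
        longest = max(len(old), len(new))
        return edit_distance(old, new) / longest if longest else 0.0
    numeric = (int, float)
    if isinstance(old, numeric) and isinstance(new, numeric):
        largest = max(old, new)  # positive numbers assumed, as in the text
        return abs(old - new) / largest if largest else 0.0
    return 1.0  # the type changed, or one side is null (None)
```

For example, `change_fraction("ACGT", "ACGA")` is 0.25 (one substitution over four characters), and `change_fraction(None, "ACGT")` is 1.0, which corresponds to the creation of a data item.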









3.3.2.5 Uncertainty measure

In the previous section, we described the uncertainty measure as the fraction of ambiguous

values relative to the total length of the record's sequence. Since this measure is based on the

sequence information, the uncertainty score is only computed for the atomic data item d

representing the sequence data (i.e., string containing the nucleotide or amino acid sequence);

other data items (complex and atomic) would have a null uncertainty score by default.

Equation 3-8 defines the uncertainty measure for an atomic data item d.

$$U = \frac{\text{degenerateCount}(d)}{\text{length}(d)} \qquad \text{(3-8)}$$

In Equation 3-8, degenerateCount(d) is a function that counts the total number of

degenerate bases in d (assuming that d is the atomic data item containing the sequence string),

and length(d) is the size of the string represented by d (i.e., the total number of bases in the

sequence). Table 3-1 lists the ambiguity codes for degenerate bases in nucleotide sequences, and

Table 3-2 shows the ambiguity codes for degenerate bases in amino acid (i.e., protein)

sequences.



Table 3-1. Ambiguity codes for nucleotide sequences.
Code   Meaning (Base)
R      purine (G or A)
Y      pyrimidine (T or C)
K      keto (G or T)
M      amino (A or C)
S      strong (G or C)
W      weak (A or T)
B      not A (G or T or C)
D      not C (G or A or T)
H      not G (A or C or T)
V      not T (G or C or A)
N      any base (A or G or C or T)










Table 3-2. Ambiguity codes for amino acid sequences.
Code Residue
B aspartate or asparagine
Z glutamate or glutamine
X any residue



The uncertainty score can only take values in the range [0, 1], where 0 and 1 represent

minimum and maximum uncertainty, respectively.
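
The uncertainty measure amounts to a single scan of the sequence string. In the sketch below, the two code sets are taken from Tables 3-1 and 3-2; the function name, the case-insensitive matching, and the choice to return `None` for an empty sequence are assumptions made for illustration.

```python
NUCLEOTIDE_AMBIGUITY = set("RYKMSWBDHVN")  # codes from Table 3-1
AMINO_ACID_AMBIGUITY = set("BZX")          # codes from Table 3-2


def uncertainty(sequence, ambiguity_codes=NUCLEOTIDE_AMBIGUITY):
    """Fraction of ambiguous symbols in a sequence string (one O(s) scan)."""
    if not sequence:
        return None  # no sequence data: treated here as a null score
    degenerate = sum(1 for symbol in sequence.upper()
                     if symbol in ambiguity_codes)
    return degenerate / len(sequence)
```

For instance, `uncertainty("ACGTNNRY")` is 0.5, because four of the eight bases (N, N, R, Y) are degenerate.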

3.3.2.6 Linkage measure

Since Linkage is a cross-record dimension, it is relevant only to data items that

represent records; hence we focus only on the measure for complex data items representing

records. Previously we mentioned that the four linkage sub-dimensions could be measured as the

link count over the target databases encompassed by each sub-dimension; thus there really is just

one linkage measure (i.e., link count), which is applied over different databases for each sub-

dimension.

Measures for sub-dimensions. Given a complex data item d representing a record r, the

Literature-Links score L of item d is defined as the number of links from r to entries in NCBI's

literature databases (PubMed, OMIM, OMIA), the Gene-Links score G of d is defined as the

number of links from r to entries in NCBI's gene-related databases (Gene, HomoloGene,

Genomes), the Structure-Links score C of d is defined as the number of links from r to entries in

NCBI's structure-related databases (MMDB, 3D Domains, CDD), and the Other-Links score O

of d is defined as the number of links from r to entries in all other NCBI databases. Each link

thus contributes one unit to the link count. In the context of our target biological databases,

all links are two-way links; hence these linkage scores effectively reflect the number

of both outgoing and incoming links of record r.










The score for each linkage dimension can take on values from the interval $[0, \infty)$, where 0

means that no link exists. There is no upper limit on the value of the linkage score.
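
Since all four sub-dimensions share the same link-count measure, they can be computed in one pass over a record's links. The sketch below assumes that each link is described by the identifier of its target database and that links to the record's own database (the neighbors used for Redundancy) have already been filtered out. The database identifiers in `SUB_DIMENSION` are hypothetical labels, not the names used by any actual NCBI interface.

```python
from collections import Counter

# Hypothetical database identifiers, grouped as in the text.
SUB_DIMENSION = {
    "pubmed": "literature", "omim": "literature", "omia": "literature",
    "gene": "gene", "homologene": "gene", "genome": "gene",
    "structure": "structure", "domains": "structure", "cdd": "structure",
}


def linkage_scores(link_targets):
    """Count links per sub-dimension; link_targets holds one database
    identifier per link of the record. Unknown databases count as 'other'."""
    counts = Counter(SUB_DIMENSION.get(db, "other") for db in link_targets)
    return {name: counts.get(name, 0)
            for name in ("literature", "gene", "structure", "other")}
```

A record with links to `["pubmed", "pubmed", "gene", "taxonomy"]` would score 2 on Literature Links, 1 on Gene Links, 0 on Structure Links, and 1 on Other Links.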

3.3.2.7 Redundancy measure

Previously, the redundancy measure was described as the number of "neighbors" of a data

item representing a record. Given that Redundancy is a cross-record dimension, we focus only on

the measure for complex data items that represent records.

The Redundancy score R of a complex data item d representing a record r is defined as the

number of links from r to distinct entries in the same database as r (e.g., nucleotide-nucleotide or

protein-protein relationships). Each of these links thus contributes one unit to the total link count.

This link count effectively counts the neighbors of r (i.e., redundant data items of d).

The score for the Redundancy dimension can take on values from the interval $[0, \infty)$,

where 0 means that no redundant data items exist. There is no upper limit on the value of the

redundancy score.

3.3.3 Complexity Analysis of the Measures

Based upon the chosen semistructured data model, we define n as the number of nodes in the

tree representing a record r (see Figure 3-1B), l as the number of leaf nodes in the tree, c as the

number of child nodes (i.e., direct descendants) of an internal node of the tree, d as the length of

the largest data string stored at a leaf node, and s as the length of the sequence string of a record

r. Also, let $l_l$, $l_g$, $l_s$, $l_o$, and $l_n$ be the number of literature links, gene links, structure links, other

links, and neighbor links of record r, respectively. We first present the complexity analysis for

the measures of the per-record dimensions and then for the measures of the cross-record

dimensions. Our analysis distinguishes between atomic and complex data items, as well as

between initialization and update times. Initialization time refers to the time when new biological

data (usually in the form of a record) is added to the database, so the scores of the quality









Table 3-3. Time complexity of the per-record measures.
Type of data item   Measure            Initialization   Update
Atomic              Uncertainty        O(s)             O(s)
                    Stability          O(1)             O(d^2)
                    All others         O(1)             O(1)
Complex             Uncertainty        O(1)             O(1)
                    All others         O(c)             O(c)


Table 3-4. Time complexity of the cross-record measures.
Type of data item   Measure            Initialization   Update
Complex             Redundancy         O(l_n)           O(l_n)
                    Literature Links   O(l_l)           O(l_l)
                    Gene Links         O(l_g)           O(l_g)
                    Structure Links    O(l_s)           O(l_s)
                    Other Links        O(l_o)           O(l_o)


dimensions should be given an initial value. Update time refers to the time when the biological

data is updated (parts of a record are modified), so the score of each quality dimension needs to

be updated to reflect the change in the underlying data. The main results obtained from the

following analysis are shown in Tables 3-3 and 3-4.

3.3.3.1 Complexity of the per-record measures

Initialization Time. At initialization time, any of the per-record measures except

Uncertainty can be computed in constant time, i.e., O(1), for an atomic data item. The Uncertainty

measure takes O(s) time, since the entire sequence string of a record needs to be scanned at

initialization time.

On the other hand, computing the initial per-record measures for a complex data item takes

time proportional to the number of child nodes of the complex data item, i.e., O(c).

For a given record r, initializing the scores of the per-record dimensions can be done in

O(n + s) time where n results from a post-order traversal of the tree representing r (n is the

aggregate of c over all internal nodes), and s results from scanning the sequence string of the









record, which can be significantly large for some records (especially those that represent

complete genomes).
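
The O(n) term can be justified with a short counting argument, sketched here under the assumption that the record is a tree (no cycles). The symbol $c_v$, the number of children of internal node $v$, is introduced only for this note.

```latex
\sum_{v \,\in\, \text{internal}} O(c_v)
  = O\Big(\sum_{v \,\in\, \text{internal}} c_v\Big)
  = O(n - 1)
  = O(n),
\qquad
T_{\text{init}}(r)
  = \underbrace{O(l)}_{\text{leaves}}
  + \underbrace{O(n)}_{\text{internal nodes}}
  + \underbrace{O(s)}_{\text{sequence scan}}
  = O(n + s).
```

Every node except the root is the child of exactly one internal node, so the child counts add up to n - 1; the O(l) term for the leaves is absorbed because l is at most n.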

Update Time. At update time, any of the per-record measures except Stability and

Uncertainty can be computed in constant time for atomic data items. Regarding Stability, its

worst-case complexity occurs when the type of the atomic data item is string, since the function

$\Delta$ computes the edit distance between the old and new values of the string. If $d_1$ and $d_2$ are the

lengths of the old and new strings, respectively, then computing the edit distance takes $O(d_1 \cdot d_2)$ time,

and the worst-case complexity for the Stability measure becomes $O(d^2)$. On the other hand,

the complexity for the Uncertainty measure at update time is O(s).

Updating the per-record scores of a complex data item merely involves re-computing an

average or similar aggregate over the complex item's child nodes, which requires O(c) time. The

only exception is Uncertainty, which is defined to be null for all complex data items, hence it

takes O(1) time.

For a given record r and its previous version r', updating the scores of the per-record

dimensions can be done in $O(r.n + \max\{r.d, r'.d\}^2 \cdot r.l)$ time, where r.n results from a

post-order traversal of the tree representing r (n is the aggregate of c over all internal nodes), and

$\max\{r.d, r'.d\}^2 \cdot r.l$ results from updating the stability score for all the strings at the leaves of the

tree.

3.3.3.2 Complexity of the cross-record measures

The time complexity analysis in this section concerns only complex data items that

represent records, since we are dealing with measures across records.

Initialization Time. At initialization time, the Redundancy measure for a given record r

takes time proportional to the number of neighbors of r, i.e., O(ln). Likewise, for a given record r

the measure for the Literature-Links dimension takes time proportional to r's number of links

to/from literature databases, i.e., O(ll); the Gene-Links dimension takes time proportional to r's

number of links to/from gene databases, which is O(lg); the Structure-Links dimension takes time

proportional to r's number of links to/from structure databases, which is O(ls); and the Other-Links

dimension takes time proportional to r's number of links to/from other databases, which is

O(lo).

Hence, initializing the scores of all the cross-record dimensions for a given record r can be

done in O(ln + ll + lg + ls + lo) time.

Update Time. At update time, the complexity of the three cross-record measures is the

same as for initialization time. Hence, updating the scores of the cross-record dimensions for a

given record r can be done in O(ln + ll + lg + ls + lo) time.

3.4 Quality-Aware Operations

Since we are primarily concerned with biological data, we must consider a scenario where

data is constantly being updated and queried. Thus, we need to address the issues of how the

quality measures described above are affected by data manipulation operations (e.g., insertions,

deletions, and updates of fields or records), and how the result of these operations is extended to

include the quality scores. For this purpose, we consider a core set of operations over

hierarchically-structured data. Such set includes query operations such as selection, and

maintenance operations such as insertion, update, and deletion.

For the subsequent discussion, let v1, v2, ..., vk (with vk = v) be the sequence of adjacent

vertices (or nodes) from the root of the tree to the node of interest v. We refer to v1, v2, ..., vk as the

"path" of v in the tree, and to the set {v1, v2, ..., vk-1} as the "ancestors" of v in the tree.










3.4.1 Query Operations

We consider selection queries here. In the context of hierarchically structured data, a select

operation consists of navigating a path given by the user and then returning all or part of the

contents of the node located at the end of the path.

3.4.1.1 Selecting a node and returning its contents

The Select operation takes as input parameter a path p = v1, v2, ..., vk. The Select operator

navigates path p until it reaches its last node vk, and returns the contents of this node. If vk is a

leaf node, its contents refer to the atomic data item (i.e., data value) stored at vk. If vk is an

internal node, its contents refer to the subtree rooted at vk. Optionally, the Select operator can

take a second input parameter filter, which specifies which of the quality dimensions to retrieve

along with the data. If such argument is omitted, the default behavior is to retrieve all the quality

dimensions.

Under a select operation, the quality scores of the target node (vk in this case) will not be

affected since this is a 'read' operation (i.e., no changes are made to the contents of the node).

This operation returns both the contents and quality metadata of vk (i.e., scores for the

requested quality dimensions).
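A minimal sketch of the Select operation as described above, under stated assumptions: the tree is held in memory, a path is a list of node names starting at the root, and sibling names are unique. The names `Node`, `navigate`, and `select`, and the score values used with them, are hypothetical and serve only to show the read-only behavior and the optional filter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple, Union


@dataclass
class Node:
    name: str
    value: Optional[str] = None                         # set on leaf nodes only
    quality: Dict[str, float] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def navigate(root: Node, path: Sequence[str]) -> Node:
    """Walk the path v1, ..., vk from the root, one node per path element."""
    if not path or path[0] != root.name:
        raise KeyError("path must start at the root")
    node = root
    for name in path[1:]:
        matches = [c for c in node.children if c.name == name]
        if not matches:
            raise KeyError(f"no child named {name!r} under {node.name!r}")
        node = matches[0]
    return node


def select(root: Node, path: Sequence[str],
           dimensions: Optional[Sequence[str]] = None
           ) -> Tuple[Union[str, Node, None], Dict[str, float]]:
    """Return (contents, quality scores) of vk without modifying anything."""
    node = navigate(root, path)
    contents = node if node.children else node.value   # subtree or data value
    if dimensions is None:                              # default: all dimensions
        scores = dict(node.quality)
    else:
        scores = {d: node.quality[d] for d in dimensions if d in node.quality}
    return contents, scores
```

Calling `select(root, ["Bioseq", "Org-ref", "Taxname"], ["Stability"])` on a suitable tree would return the leaf value together with only the Stability score; omitting the last argument returns every stored dimension.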

3.4.2 Maintenance Operations

We consider three types of maintenance operations: inserts, deletes, and updates. In the

context of hierarchically structured data, each of these operations navigates an input path, and

performs the corresponding insertion, deletion or update on the last node of the path. Figure 3-2

illustrates an update operation (other operations work in a similar way): the state of the database

at times ti and ti+1 (when the update occurs) is shown. A leaf node (colored in red) is updated at

time ti+1, which causes its quality metadata to be updated. Then the quality metadata of the leaf's

ancestors is also updated (colored in blue).
















[Figure 3-2 diagram: the data tree (Database Level and Record Level, with nodes Bioseq, Org-ref, Annot, and Taxname) drawn at two points along a Time axis, with a DATA UPDATE at a leaf and a METADATA UPDATE above it. Leaf values shown include "isoform 2", 1356, 1401, and "Homo sapiens".]

Figure 3-2. Illustration of an update operation. The data stored at a leaf node (shown in red) is
updated at time ti+1, causing the quality metadata of the leaf and its ancestors to be
updated (changes are propagated bottom-up).




3.4.2.1 Inserting a node

The Insert operation takes two input parameters: the node v to be inserted and the path

p = v1, v2, ..., vk at the end of which node v will be inserted. The Insert operator navigates path p

until it reaches its last node vk, and inserts node v as a child of vk. Since v is a new node, the

scores of all its quality dimensions need to be initialized. Initial scores are specified in the

previous Measures section. Next we sketch the steps involved in computing the quality scores

under a data insertion.










* Step 1. If v is a single node (i.e., an atomic data item), compute v's quality scores as
described in the Measures Section.

* Step 2. If v is the root of a subtree (i.e., a complex data item), recursively compute the
quality scores for all descendants of v and then for v, as described in the Measures Section.

* Step 3. Add v as a child node of vk.

* Step 4. Propagate the effect of this insertion to the ancestors of v so that their quality
scores get updated, too.

This operation returns both the path to the recently inserted node and the quality metadata

associated with this new node.
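The four insertion steps can be sketched as follows. This is an illustration under simplifying assumptions: one score per node, an arithmetic mean as the aggregate for internal nodes, and the target node vk passed in directly rather than located by navigating a path. The names `Node`, `init_scores`, and `insert` are hypothetical.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, name: str, score: float = 0.0,
                 children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.score = score                  # one illustrative quality score
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []
        for child in children or []:
            child.parent = self
            self.children.append(child)


def average(scores: List[float]) -> float:
    return sum(scores) / len(scores)


def init_scores(node: Node) -> None:
    """Steps 1-2: leaves keep their initial score; internal nodes are
    scored after their descendants (post-order)."""
    for child in node.children:
        init_scores(child)
    if node.children:
        node.score = average([c.score for c in node.children])


def insert(target: Node, new: Node) -> Tuple[List[str], float]:
    init_scores(new)                        # Steps 1 and 2
    new.parent = target                     # Step 3: attach v under vk
    target.children.append(new)
    ancestor: Optional[Node] = target       # Step 4: bottom-up propagation
    while ancestor is not None:
        ancestor.score = average([c.score for c in ancestor.children])
        ancestor = ancestor.parent
    path, node = [], new                    # rebuild the path of the new node
    while node is not None:
        path.append(node.name)
        node = node.parent
    return path[::-1], new.score
```

The return value mirrors the description above: the path to the inserted node plus its (here, single) quality score.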

3.4.2.2 Deleting a node

The Delete operation takes as input parameter a path p = v1, v2, ..., vk. The Delete operator

navigates path p to its last node vk, and deletes this node. When the node vk is deleted from the

hierarchical data model, the quality scores of vk's parent (vk-1) need to be recomputed to reflect

the deletion. Next we sketch the steps involved in updating the quality scores under a data

deletion.

* Step 1. If vk is a leaf node (atomic data item), delete vk.

* Step 2. If vk is an internal node (complex data item), we need to distinguish two cases: A)
single node deletion, and B) subtree deletion.

* Step 2A. Single node deletion case: Move vk's child nodes to path v1, v2, ..., vk-1 so that they
become direct descendants of vk-1, and keep their quality scores unchanged. Then delete vk.

* Step 2B. Subtree deletion case: Delete vk and all its descendants.

* Step 3. Update the quality scores of node vk-1 and propagate this update to all its ancestors,
as described in the Measures Section.

This operation returns the updated quality metadata of the deleted node's parent, vk-1.
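The two deletion cases can be sketched as below, again assuming a single averaged score per node and passing the node vk directly instead of its path. The `keep_children` flag (a hypothetical name) selects single-node deletion (Step 2A) over subtree deletion (Step 2B). A parent left without children keeps its previous score in this sketch, since the text does not say what should happen in that case.

```python
from typing import List, Optional


class Node:
    def __init__(self, name: str, score: float = 0.0,
                 children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.score = score
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []
        for child in children or []:
            child.parent = self
            self.children.append(child)


def delete(node: Node, keep_children: bool = False) -> float:
    parent = node.parent
    if parent is None:
        raise ValueError("this sketch does not delete the root")
    position = parent.children.index(node)
    parent.children.remove(node)            # Steps 1 and 2B: drop vk (and subtree)
    node.parent = None
    if keep_children:                       # Step 2A: re-attach vk's children to vk-1
        for child in node.children:
            child.parent = parent           # their own scores stay unchanged
        parent.children[position:position] = node.children
    ancestor: Optional[Node] = parent       # Step 3: update vk-1, then propagate
    while ancestor is not None:
        if ancestor.children:
            total = sum(c.score for c in ancestor.children)
            ancestor.score = total / len(ancestor.children)
        ancestor = ancestor.parent
    return parent.score                     # updated metadata of vk-1
```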

3.4.2.3 Updating a node

The Update operation takes two input parameters: a node vknew, which is the updated

version of node vk, and a path p = v1, v2, ..., vk. The Update operator navigates path p to its last









node vk, and updates this node with its newer version, vknew. The scores of the quality dimensions

of vk then need to be updated to reflect the change in the contents of vk. Next we sketch the steps

involved in updating the quality scores under a data update.

* Step 1. If vk is a leaf node (atomic data item), update vk according to vknew, and recompute
the quality scores of vk as described in the Measures section.

* Step 2. If vk is an internal node (complex data item), update vk according to vknew, and find
a correspondence F(A,B) between the set A of direct descendants of vknew and the set B of
direct descendants of vk. Then, for all pairs of nodes (ci, cj) in F where ci is a child of vknew
and cj is a child of vk, recursively call the Update operation with parameters ci and path(cj).
For all child nodes ci of vknew that do not map to child nodes in vk, call the Insert operation
with parameters ci and path(ci). For all child nodes cj of vk that do not map to child nodes
in vknew, call the Delete operation with parameters cj and path(cj).

* Step 3. Propagate the effect of the update to all ancestors of vk, as described in the
Measures section.

This operation returns the updated quality metadata of the updated node.
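The recursive structure of Steps 1 and 2 can be sketched as follows. One loud assumption: children are matched by element name, which stands in for the correspondence F(A,B); the dissertation's own heuristic is not specified in this section. Score recomputation and the propagation of Step 3 are left out, and the operations that would be triggered are simply recorded in a log. The names `Node` and `update` are hypothetical.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, name: str, value: Optional[str] = None,
                 children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.value = value
        self.children: List["Node"] = children or []


def update(old: Node, new: Node, log: List[Tuple[str, str]]) -> None:
    """Bring `old` in line with `new`, recording which operation each node needs."""
    if not old.children and not new.children:
        # Step 1: leaf node; a changed value would trigger a score recomputation.
        if old.value != new.value:
            old.value = new.value
            log.append(("update", old.name))
        return
    # Step 2: match children by name (assumed stand-in for F(A,B)).
    old_by_name = {c.name: c for c in old.children}
    new_by_name = {c.name: c for c in new.children}
    for name, new_child in new_by_name.items():
        if name in old_by_name:
            update(old_by_name[name], new_child, log)      # matched pair: recurse
        else:
            old.children.append(new_child)                 # only in vknew: Insert
            log.append(("insert", name))
    for name, old_child in old_by_name.items():
        if name not in new_by_name:
            old.children.remove(old_child)                 # only in vk: Delete
            log.append(("delete", name))
```

Name-based matching breaks down when siblings share a name, which is presumably why a more careful heuristic is needed in practice.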

3.4.3 Complexity Analysis of the Operations

We now provide the complexity analysis of the quality-aware operations described

previously. Let p = v1, v2, ..., vk be the path to a node v in the data tree with v = vk, and pl be the

length of path p measured as the number of nodes along the path (pl = k). Also, let n be the total

number of nodes in the tree, cmax be the maximum number of child nodes of a node in the tree,

and tread(v) be the time required to read the contents (data values) of node v. We reuse some of

the definitions from the Complexity Analysis of the Measures section, in particular: l (the

number of leaf nodes in the tree), c (the number of child nodes of an internal node of the tree), d

(the length of the largest data string stored at a leaf node), s (the length of the sequence string of

record r), ll (the number of literature links of r), lg (the number of gene links of r), ls (the number

of structure links of r), lo (the number of other links of r), and ln (the number of neighbor links of

r). For the following analysis, we assume that v is the node on which the operations will be

performed, and its path is denoted by p.









3.4.3.1 Complexity of the query operations

Select. The Select operation takes time O(pl + tread(v)) since we need to traverse the path p

to reach node v and read its contents. Our analysis does not include the time required to actually

search for node v (i.e., find its path) in the tree because we assume that all operations are given

the path of v as input and therefore v is found in O(pl). Consequently, v has to be searched for

(and its path found) before any operation can be invoked. Since this is a preprocessing step

common to all our operations but is not part of the operations themselves (as defined here), we

intentionally exclude its time complexity, which is O(n), from the analysis.

3.4.3.2 Complexity of the maintenance operations

The time needed to perform each Insert, Update, and Delete operation is nav + op + prop,

where nav is the time needed to navigate through path p to node v, op is the time needed to

perform a given operation (e.g., insert, update, or delete) on node v, and prop is the time needed

to propagate the changes up the tree to the ancestors of v (see Figure 3-2 for an illustration of a

maintenance operation). The nav time is O(pl) for all operations. The prop time is O(pl * cmax) for

all operations. The op time is analyzed next.

Insertion of v. The time to perform an insert operation at node v depends on whether we

insert a single node or a subtree. If a single node is inserted and this node is not the sequence

node, insertion time is O(1) since the initialization of the quality measures of an atomic data item

can be done in constant time (see the Complexity of the Measures Section). However, if the node

being inserted is the sequence node (i.e., the atomic data item representing the sequence data),

insertion time is O(s) because the uncertainty measure needs to be computed in this case. If on

the other hand a subtree s is inserted, the insertion time is O(ns), where ns is the number of nodes

in s. This is because we have to initialize the quality measures of each node within the s subtree.

Moreover, if s contains the sequence node, an additional cost of O(s) is incurred. In summary, for

insertions not involving the sequence node, op is O(1) when a single node is inserted, and O(ns)

when a subtree is inserted. For insertions involving the sequence node, op is O(s) when a single

node is inserted, and O(ns + s) when a subtree is inserted. In the special case when the inserted

subtree represents a record, op is O(n + s + ln + ll + lg + ls + lo).

Deletion of v. The time to perform a delete operation at node v is constant since we do not

update the quality metadata of v or any of its descendants; hence most of the time taken by this

operation is spent in the propagation phase. For deletions, thus, op is O(1).

Update of v. The time to perform an update operation at node v depends on whether v is a

leaf node or internal node. If v is a leaf node, the update time is Oldnew* dold) with dew and dold

being the newer and older contents of v, which results from the worst-case time for the stability

measure (see the Complexity of the Measures section). Instead, if v is a subtree s, the update time

is O(c;;m' c;;m h,) where h, is the height of subtree s, and c;,,a' and c;;m are the newer and

older values of the maximum number of child nodes for an internal node in s, respectively. Such

time complexity is driven by our heuristic algorithm4 for finding a correspondence between the

direct descendants of s' (where s' denotes the updated subtree) and the direct descendants of s.

Moreover, if s contains the sequence node, an additional cost of O(s 2) is incurred. In summary,

op is Oldnew* dold) if a leaf node is updated, and O(c;;a '*c;;mxh,) if an internal node is updated.

For updates involving the sequence node, op is O(s 2 C+a C,,nax,,h,) if an internal node is

updated. In the special case when the updated internal node represents a record, op is O(s 2

c;max *c;;;*hs +1,,+l;+lg+1s+lo).

Combining the nav, op, and prop times calculated above, the final time complexity for

each maintenance operation is obtained. These results are shown in Table 3-5. Note that the


4 This algorithm uses specifics of the XML schema used, knowledge about the "key" elements for identification
purposes (in the presence of parent elements with same name), and an approximation to the edit distance.









Table 3-5. Time complexity of the quality-aware operations.

Operation  Type of node         Time for     Time for operation           Time for       Final complexity
                                navigation                                propagation
Insert     Leaf node (non-seq)  O(pl)        O(1)                         O(pl * cmax)   O(pl * cmax)
Insert     Leaf node (seq)      O(pl)        O(s)                         O(pl * cmax)   O(s + pl * cmax)
Insert     Subtree (non-seq)    O(pl)        O(ns)                        O(pl * cmax)   O(ns + pl * cmax)
Insert     Subtree (seq)        O(pl)        O(ns + s)                    O(pl * cmax)   O(ns + s + pl * cmax)
Insert     Subtree (record)     O(pl)        O(n + s + ln + ll + lg       O(1)           O(n + s + ln + ll + lg
                                             + ls + lo)                                  + ls + lo)
Delete     Any node             O(pl)        O(1)                         O(pl * cmax)   O(pl * cmax)
Update     Leaf node            O(pl)        O(dnew * dold)               O(pl * cmax)   O(dnew * dold + pl * cmax)
Update     Subtree (non-seq)    O(pl)        O(cmax' * cmax * hs)         O(pl * cmax)   O(cmax' * cmax * hs + pl * cmax)
Update     Subtree (seq)        O(pl)        O(s^2 + cmax' * cmax * hs)   O(pl * cmax)   O(s^2 + pl * cmax + cmax' * cmax * hs)
Update     Subtree (record)     O(1)         O(s^2 + cmax' * cmax * hs    O(1)           O(s^2 + cmax' * cmax * hs
                                             + ln + ll + lg + ls + lo)                   + ln + ll + lg + ls + lo)



length pl of a path p is always upper-bounded by the height of the tree. Assuming that the tree is

roughly balanced, we have pl < log(n).
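As a worked instance of the nav + op + prop decomposition, the following takes the simplest row (inserting a non-sequence leaf) and applies the balanced-tree assumption pl < log(n) stated above. The subscripted symbols are just typeset forms of pl and cmax.

```latex
\begin{align*}
T_{\text{insert leaf}}
  &= \underbrace{O(p_l)}_{\text{nav}}
   + \underbrace{O(1)}_{\text{op}}
   + \underbrace{O(p_l \cdot c_{\max})}_{\text{prop}}
   = O(p_l \cdot c_{\max}) \\
  &\subseteq O(c_{\max} \log n)
   \qquad \text{when } p_l < \log n .
\end{align*}
```

The same substitution applies to every row whose propagation term is O(pl * cmax).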









CHAPTER 4
QUALITY MANAGEMENT ARCHITECTURE

The Quality Management Architecture (QMA) enables the integration of our Quality

Estimation Model (QEM) with an existing biological repository. The key constraint in the design

of the QMA was to minimize the changes that need to be made to the existing biological

repository.

4.1 Usage Scenario

The usage scenario described here serves as a basis for the design and development of our

quality management architecture, and illustrates what we predict to be the common integration

approach of our quality estimation model with existing biological databases. The two main

components we address are: 1) the source of biological data, and 2) the software layer that

embeds our Quality Metadata Engine (or QM-engine, for short).

We assume that the biological information of interest resides in an external data source

managed by an independent entity. Access to its contents is limited to what the source offers

through its API. In addition, the owner of the data source may impose restrictions on when to

query the database and how frequently. This external data source has its own user and

administrator interface, through which data is queried and modified on a regular basis.

On the other hand, the QM-engine is managed separately by the quality providers and is

considered to be 'local' to them (i.e., it runs on their servers). The QM-engine computes, stores,

and updates the quality assessments of the underlying biological data. Such quality assessments

or metadata are stored locally in a metadata source. Likewise, a local cache of the external data

source is maintained by the QM-engine. A cache coherency strategy needs to be implemented

such that changes to the original data in the external data source are replicated in the local cache.

For this purpose, periodic update requests are sent to the external data source by the QM-engine.










Query processing is also part of the functionality of the QM-engine; it would enable the creation

of a quality-augmented interface that allows users to quickly and easily retrieve quality metadata

along with the biological data of interest. The User Interface is separate from the QM-engine, but

it should generally be able to support quality specifications given by a user (e.g., rank results

based on a particular quality dimension, show only a subset of the quality dimensions, and filter

out results whose quality score is lower than a certain threshold).
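The three kinds of quality specification mentioned above (ranking, showing a subset of dimensions, and threshold filtering) can be sketched as a single post-processing function. This is an illustration only: the result layout, the function name `apply_quality_spec`, its parameter names, and the convention that a higher score means better quality are all assumptions, not part of the described interface.

```python
from typing import Dict, List, Optional, Sequence

Result = Dict[str, object]   # {"accession": str, "quality": {dimension: score}}


def apply_quality_spec(results: List[Result],
                       rank_by: Optional[str] = None,
                       show: Optional[Sequence[str]] = None,
                       min_scores: Optional[Dict[str, float]] = None
                       ) -> List[Result]:
    """Filter, rank, and project query results according to a user's spec."""
    min_scores = min_scores or {}
    # Filter: drop results scoring below any requested threshold.
    kept = [r for r in results
            if all(r["quality"].get(dim, 0.0) >= threshold
                   for dim, threshold in min_scores.items())]
    # Rank: best score first on the chosen dimension (assumes higher is better).
    if rank_by is not None:
        kept.sort(key=lambda r: r["quality"].get(rank_by, 0.0), reverse=True)
    # Project: keep only the dimensions the user asked to see.
    return [{"accession": r["accession"],
             "quality": {d: s for d, s in r["quality"].items()
                         if show is None or d in show}}
            for r in kept]
```

The input list is left untouched, so the same result set can be re-ranked under a different specification without querying again.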

Following good software engineering practice, we modularize the QM-engine such that

small modules handle specific tasks and interact with other modules through well-defined service

interfaces.

4.2 Overview of Reference Architecture

The reference Quality Management Architecture is shown in Figure 4-1. The two main

components of the usage scenario described in the previous section are represented by the

External Data Source and the Quality Metadata Engine (QM-engine), respectively. The QM-

engine contains a repository for the biological data that is locally cached (Local Cache) and a

repository for the quality metadata that is locally-maintained (Metadata Source). At the core of

the QM-engine is the Quality Layer, which handles data and metadata initialization, loading,

refreshing, and maintenance. The Quality Layer is divided into five main modules: Data

Manager, Metadata Manager, Maintenance Manager, XML Wrapper, and Query Processor.

Next, we describe each of the conceptual components of the QMA and provide the functional

specification for the modules in the Quality Layer.

4.2.1 External Data Source

The External Data Source is a repository of biological information with its own native API.

Typically, there exists an Information Retrieval System with its own User Interface for providing

queriable access to the source data.












[Figure 4-1 diagram: only the labels "admin", "user", "maintenance operations", and "results" survive from the original figure.]

Figure 4-1. Reference Quality Management Architecture.



4.2.2 Quality Metadata Engine

4.2.2.1 Local cache

The Local Cache stores a subset of the data from the External Data Source in a local

database. Data in the Local Cache is represented using the XML model, regardless of what the

original representation of the biological data is at the External Data Source. Although we wanted

to keep the QEM as general as possible, we adopted XML as its model since most of the popular

genomics repositories provide support for XML. The conversion between the original data

format and the XML format used in the Local Cache is handled by the XML Wrapper. The Local










Cache provides an XML API to the Quality Layer directly above. We have more to say about

the specific XML implementation in Section 4.3.

4.2.2.2 Metadata source

The Metadata Source is a database that stores the quality metadata of the biological data in

the Local Cache. It can be either a relational or an XML-enabled database since both

representations can accommodate the quality metadata well. The Metadata Source provides an

API to the Quality Layer directly above.

4.2.2.3 Quality layer

The function of the Quality Layer is to manage a set of interacting software components

which together maintain the quality metadata associated with the locally cached data, and

provide access to both data and metadata. The Quality Layer therefore interacts with the API of

the Local Cache and with the API of the Metadata Source.

The external inputs of this component are:

* A user request filtered by the User Interface.
* Data fetched from the External Data Source.

The external output of this component is a response to the user request to be displayed by

the User Interface.


Data Manager. The function of the Data Manager is to control access to the Local Cache.

Data can be queried or changed only through operations specified in the Data Manager Service

Interface (DMSI).

The Data Manager interacts with the following components:

* With the Local Cache through the Cache's XML API.
* With the Query Processor and Maintenance Manager via the DMSI.

The tasks to be performed by the Data Manager are:










* On Query Processor's request, retrieve data from the Local Cache.
* On Maintenance Manager's request, update data in Local Cache.
* On Maintenance Manager's request, add data to Local Cache. If the Cache is full, evict a
record, notify the Metadata Manager of the deletion, and insert the new data in the Cache.
* On Maintenance Manager's request, delete data from Local Cache.
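The Data Manager tasks listed above can be sketched as a small class that owns the cache and reports evictions through a callback. Several things here are assumptions made for illustration: the class and method names, the in-memory `OrderedDict` standing in for the Local Cache, and the oldest-first eviction policy, which the text does not specify.

```python
from collections import OrderedDict
from typing import Callable, Optional


class DataManager:
    """Sole gateway to the Local Cache (a stand-in for the DMSI operations)."""

    def __init__(self, capacity: int,
                 notify_deleted: Callable[[str], None]) -> None:
        self._capacity = capacity
        self._notify_deleted = notify_deleted   # e.g. a Metadata Manager hook
        self._records: "OrderedDict[str, str]" = OrderedDict()

    def retrieve(self, accession: str) -> Optional[str]:
        return self._records.get(accession)     # None signals a cache miss

    def add(self, accession: str, xml: str) -> None:
        is_new = accession not in self._records
        if is_new and len(self._records) >= self._capacity:
            evicted, _ = self._records.popitem(last=False)   # oldest record
            self._notify_deleted(evicted)       # metadata must be dropped too
        self._records[accession] = xml

    def update(self, accession: str, xml: str) -> None:
        if accession not in self._records:
            raise KeyError(accession)
        self._records[accession] = xml

    def delete(self, accession: str) -> None:
        del self._records[accession]
```

Passing the notification as a callback keeps the Data Manager unaware of how the Metadata Manager is implemented, in line with the module boundaries described above.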

Metadata Manager. The function of the Metadata Manager is to control access to the

Metadata Source. Quality metadata can be queried or changed only through operations specified

in the Metadata Manager Service Interface (MMSI).

The Metadata Manager interacts with the following components:

* With the Metadata Source through the Source's XML API.
* With the Maintenance Manager and Query Processor via the MMSI.

The tasks to be performed by the Metadata Manager are:

* On Query Processor's request, retrieve quality metadata from the Metadata Source.
* On Maintenance Manager's request, update quality metadata in the Metadata Source.
* On Maintenance Manager's request, add quality metadata to the Metadata Source.
* On Maintenance Manager's request, delete quality metadata from the Metadata Source.
* On Maintenance Manager's request, age quality metadata in the Metadata Source.

Maintenance Manager. The function of the Maintenance Manager is to maintain

coherency between the Local Cache and the External Data Source, and to enforce consistency

between contents of Local Cache and Metadata Source. It also handles cache misses.

The Maintenance Manager interacts with the following components:

* With the XML Wrapper.
* With the Data Manager through the DMSI.
* With the Metadata Manager through the MMSI.

The tasks to be performed by the Maintenance Manager are:

* Synchronize the Local Cache with the External Data Source on a regular basis with the
help of the XML Wrapper.
* Handle Local Cache misses.
* Notify the Metadata Manager of any data updates (or deletions) to the Local Cache.
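One synchronization pass of the Maintenance Manager might look like the sketch below. It assumes the cache can be treated as a mapping from accession to XML text, that the XML Wrapper is reachable as a function returning the current XML for an accession, and that change detection is a plain string comparison. The names `synchronize`, `fetch_xml`, and `notify_update` are hypothetical.

```python
from typing import Callable, Dict, List


def synchronize(cache: Dict[str, str],
                fetch_xml: Callable[[str], str],
                notify_update: Callable[[str], None]) -> List[str]:
    """Refresh every cached record and report which ones changed."""
    changed = []
    for accession in list(cache):
        fresh = fetch_xml(accession)        # XML Wrapper: fetch and convert
        if fresh != cache[accession]:
            cache[accession] = fresh        # keep the cache coherent
            notify_update(accession)        # metadata must be recomputed
            changed.append(accession)
    return changed
```

A real deployment would also have to respect the source's limits on when and how often it may be queried, as noted in the usage scenario.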










XML Wrapper. The function of the XML Wrapper is to fetch data from the External Data

Source as needed. The external input of this component is the data fetched from the External

Data Source. The internal interaction of this component is with the Data Manager. The task to be

performed by this component is to fetch, upon Maintenance Manager's request, data from the

External Data Source into a local working space and convert it to XML format.

Query Processor. The function of the Query Processor is to determine what data and

metadata is needed to answer the user query, to request this information from the Data Manager

and Metadata Manager, to combine the retrieved data and metadata to answer the user query, and

to send back this response to the User Interface for display. The external input of this component

is a user request filtered by the User Interface. The external output of this component is a

response to the user request to be displayed by the User Interface.

The Query Processor interacts with the following components:

* With the User Interface.
* With the Data Manager through the DMSI.
* With the Metadata Manager through the MMSI.

The tasks to be performed by the Query Processor are:

* Serve query requests from the User Interface.
* Send data and metadata requests to the Data Manager and Metadata Manager,
respectively, in order to execute a query.
* Combine the retrieved data and metadata in a meaningful way so as to answer the user
request.


4.3 Architecture Implementation

4.3.1 Data Model: XML and Schema

In the previous section it was mentioned that data would be stored as XML in the Local

Cache. Here, we provide the details of the specific XML format used in our implementation of

the architecture.









4.3.1.1 Base XML format

The format we use to store data in the Local Cache is a modified version of the INSDSeq

XML format. INSDSeq is the officially supported XML format of the International Nucleotide

Sequence Database Collaboration (INSD) [20], formed by DDBJ, EMBL, and GenBank. Figure

4-2 shows a fragment of a genomics record represented in this format. The INSD specifies the

INSDSeq format using a Document Type Definition (DTD)⁵. However, an equivalent XML

schema⁶ can be obtained by means of the NCBI's DATATOOL [34], which is a utility program

designed to convert ASN.1⁷ specifications into XML DTD or XML schema, and DTD into XML

schema. The latest version of the INSDSeq's schema (Version 1.4, 19 September 2005)

automatically generated using DATATOOL (version 1.8.1, 18 January 2007) is also available

online from the NCBI website [38]. The complete XML schema of the INSDSeq data format that

we used as base for our XML format is presented in Appendix C.

4.3.1.2 Modified XML format

The original INSDSeq format was modified to fit the needs of our Quality Management

Architecture implementation. In particular, the following four amendments were made: 1) the

addition of an id attribute to every element node, 2) the exclusion of the INSDSet element, 3) the

inclusion of schema annotations into the XML schema, and 4) the replacement of the DTD

declaration with the schema declaration in every record document. We briefly describe the

motivation behind each of these modifications next. Figure 4-3 shows an example record




5 A DTD provides a grammar for a class of documents [62].

6 XML Schema provides a means for defining the structure, content and semantics of XML documents (W3C
Recommendation 28 October 2004) [63].

7 ASN.1, or Abstract Syntax Notation One, is an International Organization for Standardization (ISO) standard for
describing structured data; it is a formal language for the specification of abstract data types [18].











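[Figure 4-2 listing: the XML markup was lost in extraction. Surviving content, in document order: DTD reference "http://www.ncbi.nlm.nih.gov/dtd/INSD_INSDSeq.dtd"; element values AF030859, 4065, double, DNA, linear, PLN, 21-SEP-1998, 22-SEP-1998, source, 1..4065, 1, 4065, AF030859.1.]

Figure 4-2. Example of a record represented in the INSDSeq XML format (only a fragment is
shown).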












[Figure 4-3 listing: the XML markup was lost in extraction. Surviving root-element attributes: xmlns="http://biodq.cise.ufl.edu", xsi:schemaLocation="http://biodq.cise.ufl.edu http://biodq/schemas/INSDSeq_QM.xsd", id="i1". Surviving element values, in document order: AF030859, 4065, double, DNA, linear, PLN, 21-SEP-1998, 22-SEP-1998, source, 1, 4065, AF030859.1.]

Figure 4-3. Example of a record represented in the INSDSeq_QM XML format (only a fragment
is shown).










represented in the modified INSDSeq format, and Appendix D contains the modified XML

schema.

The first modification was needed in order to associate the quality metadata stored in the

Metadata Source to the data stored in the Local Cache. Specifically, an identifier was required

for every element node in the data tree (XML document) of every record in the Local Cache, so

that the same identifier could be attached to the corresponding quality metadata in the Metadata

Source. Since records from the NCBI databases already possess an identifier called accession

number⁸, we used a combination of the accession number and a consecutive number assigned to

every node as our element identifier. For this purpose, we added 'id' attributes that contain

the consecutive number associated with each element node in the document. The type of these

identifiers is xs:ID, as defined by the W3C Recommendation of September 9, 2005 [64].

Inclusion of the 'id' attributes affects both the schema and instance⁹ documents (see Figure 4-3

and Appendix D).
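The identifier scheme described above can be sketched with Python's standard `xml.etree.ElementTree`. The "i" prefix follows the id="i1" value visible in Figure 4-3 (an xs:ID value cannot begin with a digit, which is a likely reason for the prefix). The document-order numbering, the function names `add_ids` and `metadata_key`, and the "accession:id" key format are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def add_ids(root: ET.Element) -> int:
    """Give every element an id of the form i1, i2, ... in document order."""
    count = 0
    for count, elem in enumerate(root.iter(), start=1):
        elem.set("id", f"i{count}")
    return count                            # number of elements labeled


def metadata_key(accession: str, elem: ET.Element) -> str:
    """Combine the record's accession number with the element's local id."""
    return f"{accession}:{elem.get('id')}"


if __name__ == "__main__":
    record = ET.fromstring(
        "<INSDSeq><INSDSeq_locus>AF030859</INSDSeq_locus>"
        "<INSDSeq_length>4065</INSDSeq_length></INSDSeq>")
    add_ids(record)
    print(ET.tostring(record, encoding="unicode"))
    print(metadata_key("AF030859", record[0]))
```

Because the local ids restart at i1 in every record, only the combination with the accession number identifies a node across the whole cache.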

The second modification, which consisted of the exclusion of the INSDSet element, was

not strictly required but yielded a cleaner data model. In the original format (see Appendix C),

the INSDSet element simply acts as a container for INSDSeq elements, which are the

relevant elements because each represents a record. When an XML document contains a single

record, the presence of the INSDSet element obscures the fact that there is only one record since

one would expect to see more than one component element (of type INSDSeq) in the set element

(of type INSDSet). In that case, it seems more natural to remove the INSDSet element and leave

the INSDSeq element as the principal element (i.e., the document's root element). We consider



8 The accession number is the record identifier used across the NCBI databases.

9 Instance documents are XML documents that conform to a particular schema, i.e., can be validated against it.










that the INSDSet element is useful for grouping records but it is not so convenient in our context,

where the manipulation (i.e., insertion, deletion) of records will be done on an individual basis.

Given that no group or set of records will be inserted as an entity per se in the Local Cache, we

decided to remove the INSDSet element from our schema. Deletion of this element affected both

the schema and instance documents (see Figures 4-2, 4-3 and Appendix D).
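Removing the INSDSet container amounts to promoting each INSDSeq child to the root of its own document. A minimal sketch using the standard `xml.etree.ElementTree` follows; the function name `split_records` is hypothetical, and the sketch assumes un-namespaced element names as in the original DTD-based format.

```python
import xml.etree.ElementTree as ET
from typing import List


def split_records(xml_text: str) -> List[str]:
    """Return one serialized INSDSeq document per record."""
    root = ET.fromstring(xml_text)
    # A document may already have INSDSeq as its root; otherwise take the
    # INSDSeq elements that are direct children of the INSDSet container.
    records = [root] if root.tag == "INSDSeq" else root.findall("INSDSeq")
    return [ET.tostring(record, encoding="unicode") for record in records]
```

Each returned string can then be inserted into, or deleted from, the Local Cache on its own, which matches the per-record manipulation described above.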

The third modification, namely the insertion of schema annotations into the XML schema,

was necessary for customizing the generation of objects, types, and tables during the schema

registration process in Oracle XML DB¹⁰ (Oracle XML DB was chosen as the DBMS for the

Local Cache) [2]. The schema annotations also control how instance XML documents get

mapped to the database. Details of the annotations will not be discussed here, but can be found in

Appendix D. Inclusion of these annotations has a direct effect on the schema document only.

The fourth and last modification, namely the replacement of the DTD declaration with the

schema declaration in every record document, was made in order to use the structured storage of

XML documents in Oracle XML DB. Using the structured storage option, an XML document is

shredded and stored as a set of SQL objects (the exact mapping depends on the options specified

when the XML schema is registered) rather than as a Character Large Object (CLOB), and this

has several advantages including optimized memory management, reduced storage requirements,

B-tree indexing, and in-place updates [2, 16]. Since structured storage is only available when the

XMLType¹¹ table has been constrained to an XML schema, the XML documents we insert must

necessarily be instance documents of the schema associated with the Oracle table, and as such,

should declare the schema they conform to. Hence, there is no room for the DTD in this context,


10 Oracle XML DB is a set of Oracle Database technologies related to high-performance XML storage and retrieval
[2, 16].
11 XMLType is a datatype used in Oracle to define columns or tables that contain XML.










and we have to replace the DTD definition line in every record document for the appropriate

schema declaration. This modification affected instance documents only. See Figures 4-2 and 4-3

for an illustration of how a record in the original INSDSeq format refers to a DTD while one in the

modified format refers to a schema.

4.3.2 External Data Source: the NCBI's Nucleotide and Protein Databases

The External Data Source was previously described as a repository of biological

information with its own programmatic and user interface. In the context of our QMA

implementation, we selected the NCBI's Nucleotide and Protein databases as our External Data

Source. The widespread use of these databases by the scientific community, the availability of

versioning information (in the form of revision history) about the stored records, and the

availability of relatively friendly programmatic interfaces were the reasons that influenced our

decision.

The Nucleotide database contains all the sequence data from GenBank, EMBL, and DDBJ,

the members of the International Nucleotide Sequence Database Collaboration. The Nucleotide

database is divided into three main subsets: EST (Expressed Sequence Tags), GSS (Genome

Survey Sequences), and CoreNucleotide (which comprises the remaining nucleotide sequences

not in EST or GSS). On the other hand, the Protein database contains sequence data from

translated coding regions from DNA sequences in GenBank, EMBL, and DDBJ as well as

protein sequences submitted to Protein Information Resource, SWISS-PROT, Protein Research

Foundation, and Protein Data Bank [41].

4.3.2.1 Entrez Retrieval System and Interface

Records from the major NCBI databases (including the nucleotide and protein databases)

are accessible via Entrez [61, 36, 40], the NCBI retrieval system. The Entrez system covers

about 91 million nucleotide and protein sequence records from several sources (as of 2006) [61].









Records can be retrieved from Entrez in different formats (e.g., XML, ASN.1, FASTA, flat file)

and downloaded singly or in batches. There is a Web-based interface to the Entrez system named

Global Query, which is the default search engine on the NCBI homepage [61]. There

is also a programmatic interface to Entrez, provided by the Entrez Programming Utilities (a.k.a.

E-Utilities or eUtils) [61, 40], which is a suite of eight server-side programs used to search, link

between, and download from, the Entrez databases.

4.3.2.2 FTP Interface

Many of the resources (data and tools) that NCBI provides are available by FTP. Complete

bimonthly releases and daily updates of the GenBank (nucleotide) and RefSeq (nucleotide and

protein) databases are available from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/) [5]. GenBank

releases are distributed in compressed flat-file and ASN.1 formats. RefSeq releases are

distributed in compressed flat-file format as well as in binary format.

4.3.3 Local Cache: an XML Database

Previously, the Local Cache was described as a database capable of handling

semistructured data. Here we provide details about how the Local Cache is actually

implemented. We chose Oracle XML DB (10g Release 2) [47] as the DBMS for the Local Cache

after examining freely-available databases that provided XML support, including eXist [27],

Montag [8], and MonetDB [13]. Oracle XML DB was selected because it offered many options for

storage, APIs, and content management, as well as full support for XQuery^12, SQL/XML^13, and SQL

functions for updating XML [2]. As opposed to newer XML databases, Oracle XML DB

represented a relatively mature product with stable releases, comprehensive documentation, and

the support of one of the world's leading database companies.

12 XQuery is an XML query language (W3C Recommendation 23 January 2007) [65].

13 SQL/XML is an SQL standard for XML (ISO/IEC 9075-14:2005 (E) draft, May 2005).









4.3.3.1 XML schema registration

Since the data managed by the Local Cache has an associated XML schema (see Appendix

D), we use the structured storage option available in Oracle XML DB, which persists XML data

(i.e., instance documents associated to a schema) into a set of relational tables created when the

schema is registered with Oracle. Our INSDSeq_QM schema was registered with Oracle under

the XML schema URL14 http:~~"~ /w;J st/inus~ /INSDSeq_Q2~xsd, which has to be referenced by

every instance document that we insert in the database. Such reference is contained in the

schema declaration line of the instance documents, specifically in the xsi:schemaLocation

attribute (see Figure 4-3) which is used when the referenced schema declares a targetNamespace

(in our case, the targetNamespace of the schema is http:~~l,~ /w;J el/f/ up e).

4.3.3.2 Table creation

After registering our XML schema with the database, we created an XMLType table named

CACHE, constrained to the global element INSDSeq defined by the INSDSeq_QM registered

schema. In Oracle XML DB, when a table or column of type XMLType has been constrained to a

particular element and a particular XML schema, it can only contain documents that are

compliant with the schema definition of that element. In our case, the CACHE table can only

store instance documents of the INSDSeq_QM schema.

4.3.3.3 Cache replacement strategy

By definition, the Local Cache is of limited size, which in most cases does not permit the

storage of the entire source database. The cache size is a parameter that can be adjusted

according to the expected system usage. We have to address the issue of what cache replacement

strategy to use whenever the Local Cache becomes full and new data needs to be added. In our



14 The XML schema URL is a unique identifier for an XML schema, internally used by Oracle XML DB.









current implementation, the Clock Algorithm (also known as Second Chance) is used because it

is an efficient approximation to the Least Recently Used (LRU) strategy [19].

The Clock Algorithm works as follows. The records (buffers or pages, in other contexts)

are arranged in a circle (in practice, a circular list), and each has an associated "flag" (or bit)

which is either 0 or 1. When a record is added to the cache, its flag is set to 1. Similarly, when a

record is accessed, its flag is set to 1. There is a "hand" that always points to one of the records,

and rotates clockwise when searching for a victim record (to throw out of the cache). If a victim

record is needed (i.e., if the cache is full and a new record needs to be inserted), the hand looks

for the first record with a 0 flag, rotating clockwise. When it sees flags equal to 1, it sets them to

0. In this way, a record is only thrown out of the cache if it remains unaccessed for the time it

takes the hand to make a complete rotation to set the record's flag to 0 and then make another

complete rotation to find the record's 0 flag unchanged.
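
The Clock Algorithm described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the QMA source code: the class name ClockCache, its method names, and the use of accession-like strings as keys are hypothetical, and the sketch tracks only which records are cached rather than their XML content.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch of a Clock (Second Chance) cache; names are hypothetical. */
class ClockCache {
    private final String[] slots;   // circular list of cached record keys
    private final boolean[] flags;  // one flag (bit) per slot: true = 1, false = 0
    private final Map<String, Integer> index = new HashMap<>(); // key -> slot
    private int hand = 0;           // position of the clock hand
    private int size = 0;           // number of occupied slots (filled in order)

    ClockCache(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        slots = new String[capacity];
        flags = new boolean[capacity];
    }

    /** Sets the flag of a cached record to 1; returns false if it is not cached. */
    boolean access(String key) {
        Integer slot = index.get(key);
        if (slot == null) return false;
        flags[slot] = true;
        return true;
    }

    /** Inserts a record; returns the evicted key, or null if nothing was evicted. */
    String insert(String key) {
        if (access(key)) return null;       // already cached: only refresh its flag
        if (size < slots.length) {          // cache not yet full: no victim needed
            place(size++, key);
            return null;
        }
        while (flags[hand]) {               // flags equal to 1 are reset to 0
            flags[hand] = false;
            hand = (hand + 1) % slots.length;
        }
        String victim = slots[hand];        // first record found with a 0 flag
        index.remove(victim);
        place(hand, key);
        hand = (hand + 1) % slots.length;   // next search starts after this slot
        return victim;
    }

    private void place(int slot, String key) {
        slots[slot] = key;
        flags[slot] = true;                 // a newly added record gets flag 1
        index.put(key, slot);
    }

    boolean contains(String key) {
        return index.containsKey(key);
    }
}
```

With a capacity of three, inserting A, B, C and then D evicts A (the hand clears all three flags during one rotation and finds A unflagged on the second pass). If B is then accessed before E is inserted, B survives and C is evicted instead, which is the "second chance" behavior.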

Alternative cache replacement strategies include First In, First Out (FIFO), Most Recently

Used (MRU), LRU and random. Each of them has advantages and disadvantages with respect to

efficiency (algorithmic complexity), book-keeping overhead, and thrashing behavior. The Clock

Algorithm chosen for the QMA has the advantage of being efficient, simple, and having low

overhead (only one 'bit' per record is needed), but its worst case (or thrashing) behavior occurs

for sequential scans of the data. In Chapter 6 we mention that a direction for future research is

the development of cache replacement strategies optimized for the typical access patterns of

biological data.

4.3.4 Metadata Source: a Relational Database

The Metadata Source was previously described as a database for storing the quality

metadata of the biological data. Here, we give details regarding the type of database and the

logical representation of the quality metadata within this database. In our implementation, Oracle










Database 10g [14] was chosen as the DBMS for the Metadata Source. We selected it for

consistency across DBMS of the Local Cache and Metadata Source; it was not convenient for us

to have two different DBMS within the QM-engine, especially if we could accomplish the same

task with only one. Although both the Metadata Source and the Local Cache are managed by the

Oracle DBMS, a key difference between these databases is that the former uses a relational

schema for the data whereas the latter uses an XML schema. Hence the Metadata Source is

implemented as a relational database in Oracle.

4.3.4.1 Relational schema

The relational schema used for the quality metadata (shown in Figure 4-4) consists of two

tables named QM_NODES and QM_DOC. The QM_NODES table contains the quality metadata

of XML nodes stored in the Local Cache, so that a row in this table corresponds to a node from a

cached instance XML document. On the other hand, the QM_DOC table contains the quality

metadata associated to instance documents; in particular, it contains the quality metadata of the

document's root node, which summarizes the metadata of all other nodes in the tree. It also

contains metadata that is applicable to the document only (not to the nodes that compose it).

Next, we describe in more detail what each of these tables contains.

The QM_NODES table. A row in this table represents the quality metadata of a node in

an instance document stored in the Local Cache. This table contains the following eight fields or

columns:















QM_NODES            QM_DOC

Accession           Accession
NodeNumber          QmLastUpdate
QmLastUpdate        Density
NumberOfChildren    Freshness
Density             AgeComplement
Freshness           Instability
AgeComplement       Uncertainty
Instability         Redundancy
                    Publications
                    OtherLinks
                    Rank
                    SeqNodeNumber

Figure 4-4. Relational schema for the quality metadata in the Metadata Source.



* Accession: accession number identifying the instance document to which the node belongs.

* NodeNumber: number that identifies the node within the instance document; it is the value
of the node's id attribute.

* QmLastUpdate: date on which the quality metadata of the node was last updated.

* NumberOfChildren: number of child nodes that the node has.

* Density: score for the Density dimension of the node (defined in Chapter 3).

* Freshness: score for the Freshness dimension of the node (defined in Chapter 3).

* AgeComplement: complement of the age score of the node, i.e., one minus the age score.

* Instability: complement of the stability score of the node, i.e., one minus the stability score.










The primary key of the QM_NODES table is composed of three fields: Accession,

NodeNumber, and QmLastUpdate. A node in an instance document can be identified using

Accession and NodeNumber alone, but QmLastUpdate is needed to differentiate among quality

metadata entries (for the same node) computed on different dates, which facilitates update

operations and recovery at the application level.

The QM_DOC table. A row in this table represents the quality metadata of an instance

document stored in the Local Cache. This table contains the following twelve fields or columns:

* Accession: accession number identifying the instance document.

* QmLastUpdate: date on which the quality metadata of the document was last updated.

* Density: score for the Density dimension of the document's root node.

* Freshness: score for the Freshness dimension of the document's root node.

* AgeComplement: complement of the age score of the document's root node.

* Instability: complement of the stability score of the document's root node.

* Uncertainty: score for the Uncertainty dimension of the document.

* Redundancy: score for the Redundancy dimension of the document.

* Publications: score for the Publications dimension of the document.

* OtherLinks: score for the OtherLinks dimension of the document.

* Rank: overall ranking or score of the document.

* SeqNodeNumber: identifier of the sequence node within the document; value of the
sequence node's id attribute.

The primary key of the QM_DOC table is composed of two fields: Accession and

QmLastUpdate. An instance document can be identified using Accession alone, but

QmLastUpdate is needed, as in QM_NODES, to differentiate among quality metadata entries

computed on different dates (for the same document), which facilitates update operations and









recovery at the application level. The QM_DOC and QM_NODES tables are related through a

foreign key that consists of the fields Accession, QmLastUpdate, and SeqNodeNumber. Recall

that in our context the sequence node is a leaf node, and it contains the nucleotide or protein

sequence of the biological record. We keep a pointer (foreign key) to the quality

metadata of this particular node from the document's quality metadata because domain experts

regard the sequence data as a highly valuable piece of information, and they pay much attention

to changes applied to the sequence. Therefore, based on expert input, we decided that it was

convenient to have a quick way to retrieve the quality metadata associated to this special node

from the document's metadata (since users may query this information often).
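
The relationship between the two keys can be illustrated with a small in-memory model. The sketch below is an assumption-laden illustration rather than the actual schema code: the class names NodeKey, DocRow, and MetadataKeys are hypothetical, a Java map stands in for the QM_NODES table, and only the Instability column is modeled.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.Objects;

/** Composite primary key of a QM_NODES row: (Accession, NodeNumber, QmLastUpdate). */
final class NodeKey {
    final String accession;
    final int nodeNumber;
    final LocalDate qmLastUpdate;

    NodeKey(String accession, int nodeNumber, LocalDate qmLastUpdate) {
        this.accession = accession;
        this.nodeNumber = nodeNumber;
        this.qmLastUpdate = qmLastUpdate;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof NodeKey)) return false;
        NodeKey k = (NodeKey) o;
        return nodeNumber == k.nodeNumber
                && accession.equals(k.accession)
                && qmLastUpdate.equals(k.qmLastUpdate);
    }

    @Override
    public int hashCode() {
        return Objects.hash(accession, nodeNumber, qmLastUpdate);
    }
}

/** The columns of a QM_DOC row that take part in its keys (scores omitted). */
final class DocRow {
    final String accession;
    final LocalDate qmLastUpdate;
    final int seqNodeNumber;

    DocRow(String accession, LocalDate qmLastUpdate, int seqNodeNumber) {
        this.accession = accession;
        this.qmLastUpdate = qmLastUpdate;
        this.seqNodeNumber = seqNodeNumber;
    }

    /** Foreign key into QM_NODES: (Accession, QmLastUpdate, SeqNodeNumber). */
    NodeKey sequenceNodeKey() {
        return new NodeKey(accession, seqNodeNumber, qmLastUpdate);
    }
}

class MetadataKeys {
    /** Follows the foreign key to the Instability score of the sequence node. */
    static Double sequenceNodeInstability(DocRow doc, Map<NodeKey, Double> instabilityByNode) {
        return instabilityByNode.get(doc.sequenceNodeKey());
    }
}
```

Because QmLastUpdate is part of both keys, two metadata entries for the same node computed on different dates can coexist, and a document row always points at the sequence-node entry computed on the same date.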

4.3.4.2 Index creation

We created two indexes, QM_NODES_accession_index and QM_DOC_accession_index,

over the Accession column of the QM_NODES and QM_DOC tables, respectively. The aim of

these indexes was to speed up accesses to quality metadata associated to a biological record

identified by an accession number. When an update is executed or a user query is processed, the

QM-engine typically needs to retrieve the quality metadata of nodes belonging to an instance

document (record) identified by a particular accession number, so having an index over the

accession number in each table can speed up these frequent operations.

4.3.5 Quality Layer: Java Classes

We previously mentioned that the Quality Layer manages a set of interacting software

components that together maintain the quality metadata of the data cached locally. Here we

describe how this layer and its components are implemented.

The Quality Layer is implemented by a set of Java classes which closely match the

components outlined in Figure 4-1. The JRE version used to build and run our project was

version 1.6.0-b105. The classes implementing the Quality Layer are:










* QualityLayerService: this class encloses various component classes that interact with the
purpose of providing quality information about the data cached from the External Data
Source. This class provides methods for starting the service, stopping the service, as well
as for handling a user's request.

* DbObject: this class encapsulates a real database and offers a standard set of JDBC (Java
Database Connectivity) operations to interact with the database, as well as a special
method for handling XML data. It provides methods for connecting to the database,
executing queries, executing updates (inserts, updates, or deletes), closing and resetting the
database connection, setting the commit mode, as well as commit and rollback methods.

* DataManager: this class controls access to the database object representing the Local
Cache. This database object is set to auto-commit by default. The DataManager class
provides methods for inserting, deleting, and retrieving XML instance documents to/from
the database. It also has start and shutdown methods.

* MetadataManager: this class controls access to the database object representing the
Metadata Source. This database object has the auto-commit property turned off by default
since we want to be able to manage insert and update operations as atomic operations, i.e.,
either we insert/update the quality metadata of all nodes within an instance document, or
we do not carry out the operation. So we need to handle transactions and be able to commit
or roll back operations at the application level. The MetadataManager class provides
methods for inserting, deleting, and retrieving quality metadata to/from the database. It
also has start and shutdown methods. Finally, it provides commit and rollback methods.

* MaintenanceManager: this class is in charge of updating the Local Cache and Metadata
Source upon changes in the External Data Source. It provides methods for performing
daily maintenance (including update and aging of records), and initial loading of records
(which needs to retrieve all previous versions of records). It also has start and shutdown
methods.

* QueryProcessor: this class processes and executes queries by calling methods from the
Data Manager and Metadata Manager to retrieve the necessary information. It provides
methods for processing queries, start and shutdown.

* XMLWrapper: this class acts as the interface between the External Data Source and the
Quality Layer by fetching data and converting it to the appropriate format for internal use
within the Quality Layer. It provides methods for retrieving data (with and without version
history) and links from the NCBI database, as well as checking for daily updates.

* EUtils: this class handles calls to the External Data Source's interface, E-Utilities, and
passes the results from these calls to the XMLWrapper. It provides methods for calling the
eSearch, eFetch, eLink, eGQuery, and eSummary programs.

* QualityMetadata: this class represents the quality information that is computed and stored
about a node from an XML instance document. It has accessor methods (i.e., get and set
methods) for all the quality dimensions included in the quality metadata of a node.










* QualityMetadataDoc: this class represents the quality information that is computed and
stored about an XML instance document. It has accessor methods (i.e., get and set
methods) for all the quality dimensions included in the quality metadata of a document.

* BatchXMLProcessor: this class parses the XML documents that contain batches of records
from the External Data Source, and produces a single XML document per record. These
smaller documents are slightly modified so that they become instance documents of our
INSDSeq_QM XML schema, and then passed on to the MaintenanceManager for further
processing. It uses SAX (Simple API for XML), which is an event-driven push model for
processing XML, and a de facto standard.

* LinkXMLProcessor: this class parses the XML link data output by the NCBI's ELink
utility program, and extracts counts of publication links, neighbors, and other-links. It uses
SAX.

* Utils: this is a utility class where public constants and generic methods are defined so that
they become available to other classes.
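
To illustrate the event-driven SAX model used by the two XML processor classes above, the following sketch walks a batch document and collects the primary accession of every record it contains. It is a simplified stand-in, not the thesis code: the class name BatchAccessionHandler is hypothetical, the element names INSDSet, INSDSeq, and INSDSeq_primary-accession are assumed to follow the INSDSeq XML format, and the sketch only reads accessions instead of writing one document per record.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: collect the accession of each record in a batch using SAX callbacks. */
class BatchAccessionHandler extends DefaultHandler {
    // Assumed element name for the accession inside each INSDSeq record.
    private static final String ACCESSION = "INSDSeq_primary-accession";

    final List<String> accessions = new ArrayList<>();
    private StringBuilder text;  // non-null only while inside an accession element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (ACCESSION.equals(qName)) text = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append each chunk.
        if (text != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (ACCESSION.equals(qName)) {
            accessions.add(text.toString().trim());
            text = null;
        }
    }

    /** Parses a batch held in memory and returns the accessions in document order. */
    static List<String> parse(String xml) {
        BatchAccessionHandler handler = new BatchAccessionHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse batch", e);
        }
        return handler.accessions;
    }
}
```

The parser pushes events to the handler as it reads, so a large batch never has to be held in memory as a tree; this is the property that makes SAX attractive for splitting batches into per-record documents.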

4.3.6 Macro Level Operations Implemented in the QMA

4.3.6.1 Bulk-loading

Bulk-loading takes an input data set from the external data source (in our case, the NCBI

databases) and loads it into the QMA system. This process would typically be run during the

early stage of the integration with the external data source, but it can also be used to load

additional data sets after the initial integration. Bulk-loading involves the computation of all

quality scores, the storage of this quality metadata in the local metadata source, and the storage

of the biological data in the local data cache.

The steps involved in bulk-loading a subset of the NCBI repository into the prototype

QMA system are described next. First, a file containing the accession numbers of the records to

bulk load is read by the Maintenance Manager component of the QMA. Then, the Maintenance

Manager asks the XML Wrapper to retrieve these records (and their previous versions, if

available) from the NCBI repository. The Wrapper sends batch queries to the NCBI database

server, via the NCBI's E-Utils programming interface, until all records are retrieved. Once the

data has been downloaded, the Maintenance Manager starts processing it, reading one batch file









at a time. If various versions of the same record are available, the records are processed in

ascending order of version number (i.e., oldest version first).

The processing of a record varies depending on whether it is the first version of a record or

not. If the record to process is the first version of a record, the Initialize function is used;

otherwise the Update function is used. The Initialize function parses a record and computes its

initial quality scores. Then, the record is passed to the Data Manager for insertion into the Local

Cache, and its quality scores are passed to the Metadata Manager for insertion into the Metadata

Source. On the other hand, the Update function first has to locate the previous version (and

associated metadata) of the record to update, and to do this it sends requests to the Data Manager

and Metadata Manager, which search for the pertinent data in the Local Cache and Metadata

Source, respectively. After both data and metadata from the previous version of the record are

found, the Update function parses the older and newer versions of the record, comparing their

contents, and updating the quality scores as needed. Then, the updated record is passed to the

Data Manager for insertion into the Local Cache, and its updated quality scores are passed to the

Metadata Manager for insertion into the Metadata Source. The Data Manager deletes the older

version of the record and inserts the new one such that only the most recent version of the record

is kept in the cache. Likewise, the Metadata Manager deletes the older metadata of the record

and inserts the updated metadata that corresponds to the newer version of the record. This

processing of records continues until all downloaded records have been loaded into the QMA.
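
The control flow of this record-processing loop can be sketched as follows. The sketch makes several assumptions that are not in the text: RecordVersion and BulkLoader are hypothetical names, "first version" is approximated as "no earlier version of this accession is in the cache", and the Initialize and Update functions only log what they would do instead of parsing XML and computing quality scores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One downloaded version of a record (stand-in for the parsed XML document). */
final class RecordVersion {
    final String accession;
    final int version;

    RecordVersion(String accession, int version) {
        this.accession = accession;
        this.version = version;
    }
}

class BulkLoader {
    /** Stand-in for the Local Cache: only the most recent version per accession. */
    final Map<String, RecordVersion> cache = new HashMap<>();
    /** Trace of which function handled each version, in processing order. */
    final List<String> log = new ArrayList<>();

    void load(List<RecordVersion> downloaded) {
        // Process versions of the same record in ascending order (oldest first).
        List<RecordVersion> ordered = new ArrayList<>(downloaded);
        ordered.sort(Comparator.comparing((RecordVersion r) -> r.accession)
                .thenComparingInt(r -> r.version));
        for (RecordVersion r : ordered) {
            RecordVersion previous = cache.get(r.accession);
            if (previous == null) {
                initialize(r);        // first version: compute initial scores
            } else {
                update(previous, r);  // later version: compare and update scores
            }
            cache.put(r.accession, r); // the older version is replaced in the cache
        }
    }

    private void initialize(RecordVersion r) {
        log.add("initialize " + r.accession + "." + r.version);
    }

    private void update(RecordVersion older, RecordVersion newer) {
        log.add("update " + older.accession + "." + older.version
                + " -> " + newer.accession + "." + newer.version);
    }
}
```

Sorting before dispatch guarantees that Update always sees the immediately preceding version, which is what allows the quality scores to be carried forward version by version.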

Finally, the records are "aged" so that the quality scores reflect the time elapsed since the

records were last updated. The Aging function (not to be confused with the quality dimension

named Age) updates the scores of the temporal dimensions only, i.e., Freshness, Stability, and










Age. This aging process is also referred to as metadata refreshing, to differentiate it from the

metadata update.

4.3.6.2 Maintenance

Maintenance refers to the process of keeping the contents of the QMA updated with

respect to changes in the external data source (the NCBI repository, in our case). Ideally, the data

and metadata stored in the QMA should be updated as frequently as the external data source is.

In our prototype QMA system, maintenance is performed on a daily basis since the NCBI

databases we use (GenBank and RefSeq) provide incremental daily updates, available from the

NCBI FTP site. Maintenance involves the detection of data changes in the external data source,

the update of data in the local cache to enforce consistency with respect to the external database,

and the update of the corresponding quality scores in the metadata source.

The steps involved in performing the daily maintenance of the prototype QMA system are

described next. First, the Maintenance Manager asks the XML Wrapper to check if updates are

available. The Wrapper then connects to the NCBI's FTP site and checks if daily updates are

available for download. If no updates are found, maintenance proceeds to the aging phase. If

updates are available, the Wrapper downloads them, uncompresses them, and converts them to the

appropriate XML format (GenBank's incremental updates are in ASN.1 format, and RefSeq's

incremental updates are in binary format). Next, the Maintenance Manager starts processing the

downloaded updates.

Incremental updates are scanned for accession numbers of records currently stored in the

QMA. If no match is found, i.e., if none of the records stored in the Local Cache was updated in

the NCBI repository, then the Maintenance Manager proceeds to the aging step. Otherwise,

if one or more matches are found, the Update function (described in the previous section) is

called on each such record to handle the update of both data and quality scores.









Once all updates are processed, the aging phase follows. Records that were not updated are

aged. Aging or refreshing, as previously described, consists in updating the quality scores of the

temporal quality dimensions of a record to reflect the time elapsed since the record was last

updated.
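
The split between updated and aged records during daily maintenance amounts to partitioning the cached accession numbers against those found in the incremental update. The sketch below shows only that partitioning step; the class name MaintenancePlan and its fields are hypothetical, and the actual Update and Aging computations are left out.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch: decide, per cached record, whether it is updated or only aged. */
class MaintenancePlan {
    /** Cached records that also appear in the daily update: run Update on these. */
    final Set<String> toUpdate = new TreeSet<>();
    /** Cached records untouched by the daily update: run Aging on these. */
    final Set<String> toAge = new TreeSet<>();

    MaintenancePlan(Set<String> cachedAccessions, Set<String> accessionsInDailyUpdate) {
        for (String accession : cachedAccessions) {
            if (accessionsInDailyUpdate.contains(accession)) {
                toUpdate.add(accession);
            } else {
                toAge.add(accession);
            }
        }
        // Accessions that appear in the daily update but are not cached are
        // ignored: maintenance only concerns records already stored in the QMA.
    }
}
```

When the daily update contains no cached accession, toUpdate is empty and every cached record falls into toAge, which matches the case where maintenance proceeds directly to the aging phase.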

Although updates represent the core of the QMA maintenance process, the aging of

metadata is also an important part of maintenance. Depending on the number of records stored in

the QMA, daily maintenance could be expensive due to the refreshing of non-updated records

(even if no records are updated in the external data source). However, we expect the size of

the Local Cache used in practice to be small enough to avoid this concern. Furthermore, the cost of

aging a record is much smaller than the cost of updating it because only the temporal dimensions

are changed (hence avoiding expensive computations like the ones required for the Stability

measure, for example).

4.3.6.3 Querying

Querying in the QMA refers to the processing of user requests to retrieve specific data and

metadata. The steps involved in processing queries in the prototype QMA system are

described next.

First, the Query Processor determines what metadata needs to be retrieved based on the

query filter, an optional parameter that specifies which quality dimensions are requested by the

user (if the filter is omitted, all quality dimensions are retrieved by default). The Query Processor

then forwards appropriate requests to the Data and Metadata Managers in order to retrieve the

requested data and metadata. Finally, the information returned by the Data and Metadata

Managers is combined and sent back to the user.
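
The effect of the optional query filter can be sketched as a projection over a record's quality metadata. The names below are hypothetical (QueryFilter, select), the metadata is modeled as a simple map from dimension name to score, and an absent filter is represented by null or an empty set; none of this is taken from the prototype's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch: apply an optional filter of quality dimensions to a record's metadata. */
class QueryFilter {
    /**
     * Returns only the requested dimensions, keeping the original column order.
     * A null or empty filter means "no filter", so all dimensions are returned.
     */
    static Map<String, Double> select(Map<String, Double> metadata, Set<String> filter) {
        if (filter == null || filter.isEmpty()) {
            return new LinkedHashMap<>(metadata);
        }
        Map<String, Double> selected = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : metadata.entrySet()) {
            if (filter.contains(entry.getKey())) {
                selected.put(entry.getKey(), entry.getValue());
            }
        }
        return selected;
    }
}
```

Returning a copy in both branches keeps the caller from accidentally modifying the metadata held by the Metadata Manager.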









CHAPTER 5
EVALUATION

5.1 Objectives

The main purpose of this evaluation is to determine the biological significance of the

Quality Estimation Model and to validate the usefulness of the Quality Management

Architecture. In particular, we focus on the following aspects:

* The usefulness of the chosen quality dimensions in assessing the quality of genomic data.

* The ability of the quality estimates produced by the Quality Estimation Model to
discriminate high versus low quality data.

* The relevancy of the capabilities offered by the Quality Management Architecture.

* The usefulness of the implemented QMA prototype system with respect to its operational
cost.

5.2 Evaluation of the Quality Estimation Model

5.2.1 Challenges of the Evaluation

The evaluation of our Quality Estimation Model was challenging in the following ways.

The first challenge we faced was the lack of a standard benchmark for testing our quality

estimates of biological data. To the best of our knowledge, there are no public genomic data sets

with comprehensive quality assessments comparable to ours, thus making the evaluation of our

model difficult. Biologists are also mainly unaware of the existence of genomic databases with

quality scores (see Table B-6 of Appendix B), as found in our research study that is described in

Section 5.2.2.1. The only two exceptions are the Greengenes [15] and SGN [32] databases,

which offer quality information but are limited to only a few organisms and to very specific

sequences (16S rRNA genes in the case of Greengenes and markers in the case of SGN). To

overcome this problem and be able to obtain quality assessments for a variety of genomic data,










we consulted several experts in the fields of biology and genomics about the quality of genomic

records.

The second challenge was to understand the biologists' perception of data quality, and to

bridge the gap between their perspective and our needs for the purpose of the model evaluation.

In a preliminary attempt to evaluate our model, we recruited five volunteer Ph.D. students

majoring in biology-related fields at our university, and asked them to provide scores along four

quality dimensions for 20 genomic records (selected by a collaborator from our university such

that both high and low quality records were represented in the sample). This attempt was

unsuccessful because only three of the subjects responded, and they did not provide the

responses we sought. In particular, only one subject provided scores for the quality dimensions,

and several records were assigned the same score (e.g., 0, 0.25, 0.5, etc.), indicating the use of an

underlying discrete scale or categorization. The other two subjects responded that it was difficult

to assign fine-grained scores along the given dimensions, and that the given set of records was

not part of their area of expertise.

The lessons learned from this first attempt were: i) biologists tend to express their

quality assessments in a discrete scale rather than in a continuous scale, and ii) it is difficult for

biologists to confidently evaluate the quality of unfamiliar records (i.e., not from their area of

expertise). These lessons greatly influenced the design of a second research study that we

performed (see Section 5.2.2.1), in particular the use of a binary scale for quality assessments,

and the freedom to choose the records to be assessed (rather than using a fixed set of records for

all the participants). The challenge still remaining was devising a way to compare the continuous

quality scores produced by our model with the discrete quality assessments provided by the

experts.









The third challenge was due to the practical limitation in the size of the data set for which

quality assessments could be obtained from domain experts. A single expert can reasonably be

asked to evaluate the quality of only a handful of records given the time it takes to do so.

Evaluating the quality of genomic records is a time consuming task because of the number of

aspects that are considered by experts when making such assessments (examples can be found in

Table B-3 from Appendix B). Aspects commonly considered are: amount of annotations, number

of publications, quality of referenced journals, consistency of annotations, completeness of the

sequence, number of revisions, and specific information about the gene such as exon/intron

regions, areas of expression, structure, and function. On the other hand, finding domain experts

willing to provide quality assessments even for a small number of records proved to be

challenging. Hence, we were restricted in the number of domain experts and in the number of

records evaluated per expert. However, in order to obtain meaningful results, it was necessary to

evaluate our model over a reasonably large data set for which quality assessments were needed.

5.2.2 Experiments and Results

A series of experiments were conducted, some of them derived from findings in previous

experiments; therefore we present each experiment along with its data set, results, and discussion

separately. The core of our experimental evaluation is a research study conducted among several

domain experts, which we describe next.

5.2.2.1 Research study

We conducted a research study among sixteen field experts from universities in the United

States during the months of March to June, 2007. This study was approved by the University of

Florida Institutional Review Board on March 26, 2007 as Protocol #2007-U-0304, titled

"Comparative Study of Automated and Human-based Quality Assessments over Genomic Data."










Purpose of the study. The purpose of the research study was to collect experts'

assessments about the quality of genomic records. This information was to be used to evaluate

the validity and usefulness of the quality estimation model for genomics data.

Participants of the study. Sixteen participants were recruited for our research study. All

of them met the following eligibility requirements: 1) were 20 years old or older, 2) were Ph.D.

students, post-docs, professors, or researchers in a university, 3) had background in areas related

to biology or genomics, and 4) had experience using NCBI databases. Participation in this study

was completely voluntary, and each participant received $100 compensation after completing the

study questionnaire.

Description of the study. The research study questionnaire consisted of two parts. In the

first part, participants were asked to choose a total of 24 records from NCBI's nucleotide or

protein databases, divided in two sets as follows. The first set had to contain 12 records whose

quality the participant considered to be "good" (above average). The second set had to contain 12

records whose quality the participant considered to be "poor" (below average). The participant

provided the accession numbers of the records they chose (not the actual record contents). In the

second part, participants were asked to answer a few questions regarding the criteria they used

for evaluating the quality of genomics records, the relative quality of different databases from

NCBI, and the suitability of quality assessment factors in the context of genomics.

When was the study performed? The recruiting of participants started in April 2007 and

continued until June 2007. We collected responses from the participants during the period of

May to June 2007.

How was the study conducted? The questionnaire of the research study (also referred to

as survey questionnaire) was made available online to facilitate its access by the participants and









to automate the collection of responses. Our survey questionnaire was hosted at the

SurveyMonkey website (URL http://www.surveymonkey.com), and access to it was restricted by

password. A PDF version of our web-based survey is shown in Appendix A, with each page

shown as a separate image.

Responses collected from the study. The responses given by the participants of our study

can be found in Appendix B. The responses to each question are shown in separate tables, and

each question is reproduced (from Appendix A) in the table heading for convenience. For

confidentiality purposes, the participants were assigned a number ranging from 1 to 16, and the

answers were associated with this identifier rather than with the name of the participants.

5.2.2.2 Experiment 1

Our initial set of quality dimensions included Density (without its two sub-dimensions),

Stability, Freshness, Linkage (without its four sub-dimensions), and Redundancy (defined

slightly differently). The relevance of these five dimensions was preliminarily verified by expert

collaborators from our university, but a more systematic approach was sought. This experiment

aimed to evaluate the relevancy of our set of quality dimensions for the biologists.

Methodology. To evaluate the biological relevance of our initial set of quality dimensions,

we asked our research participants to evaluate the usefulness of those dimensions in assessing the

quality of genomic data. This was done in Part III of the survey questionnaire (see Appendix A),

where participants had to classify each dimension in one of the following three categories: very

useful, somewhat useful, or not useful. They also had to briefly comment on their choice. A

description and an illustrative example of each dimension were provided to the participants so

that there was common understanding about the dimensions being evaluated.

Additionally, we wanted to know if there were other relevant dimensions that could be

added to the initial set of quality dimensions. For this purpose, we asked our research










participants to describe the criteria they used when evaluating the quality of a genomic record.

This was done in question 1 of Part II of the survey questionnaire (see Appendix A). To avoid

influencing the participants with our choice of dimensions, this question was asked before

introducing our dimensions to them.

Results and Discussion. The responses to the questions in Part III of the survey

questionnaire are shown in Tables B-7, B-8, B-9, B-10, and B-11 from Appendix B. Summarized

results are presented in Table 5-1 and Figure 5-1. Table 5-1 shows the percent of responses in

each category (very useful, somewhat useful, and not useful) for the five quality dimensions.

Figure 5-1 illustrates the usefulness ratings from Table 5-1 using a stacked bar chart.

In Figure 5-1 we observe that at least 50% of the participants considered the Linkage,

Density, and Freshness dimensions to be very useful in assessing the quality of genomic data.

Linkage was the best-rated dimension, while Redundancy was the worst-rated dimension.

Another important observation is that 80% or more of the participants considered all but one of

the dimensions (namely Redundancy) to be at least somewhat useful in assessing genomic data

quality, which validates the relevancy of these dimensions (i.e., Linkage, Density, Freshness, and

Stability). The usefulness of the Redundancy dimension is not strongly supported by the

participants, since only 57% of them considered it to be at least somewhat useful. However,

rather than dropping this dimension from the initial set, we revised its definition so that it would

better reflect the biological notion of redundancy at the sequence level (rather than at the

annotations level).
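The "at least somewhat useful" figures quoted in this paragraph follow from adding the V and S rows of Table 5-1. The short Python sketch below reproduces that arithmetic. The dictionary layout and variable names are assumptions made for illustration only; the percentages are those reported in Table 5-1.

```python
# Percent of responses per category, copied from Table 5-1.
ratings = {
    "Linkage":    {"V": 81, "S": 13, "N": 6},
    "Density":    {"V": 50, "S": 38, "N": 13},
    "Freshness":  {"V": 50, "S": 31, "N": 19},
    "Stability":  {"V": 44, "S": 38, "N": 19},
    "Redundancy": {"V": 13, "S": 44, "N": 44},
}

# "At least somewhat useful" = very useful (V) + somewhat useful (S).
at_least_somewhat = {dim: r["V"] + r["S"] for dim, r in ratings.items()}

for dim, pct in at_least_somewhat.items():
    print(f"{dim}: {pct}%")
```

Because the table entries are rounded, the sums may differ by a point from values computed on the raw responses.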

Now we discuss the results for the second part of the experiment. The responses to

question 1 in Part II of the survey questionnaire are shown in Table B-3. A summarized

interpretation of the textual responses from Table B-3 is presented in Table 5-2 and Figure 5-2.










Table 5-1. Participants' assessment of the usefulness of the initial quality dimensions. The
usefulness of the dimensions was evaluated in a 3-point scale: very useful (V),
somewhat useful (S), and not useful (N). The percent of responses obtained in each
category is shown.
Category  Linkage  Density  Freshness  Stability  Redundancy
V         81%      50%      50%        44%        13%
S         13%      38%      31%        38%        44%
N         6%       13%      19%        19%        44%




[Figure 5-1: stacked bar chart. Vertical axis ticks: 0% to 100%. Horizontal axis: Dimension (L, D, F, S, R). Legend: Not useful, Somewhat useful, Very useful.]


Figure 5-1. Assessment of the usefulness of the initial quality dimensions by the participants.
The usefulness of the dimensions was evaluated in a 3-point scale: very useful,
somewhat useful, or not useful. L stands for Linkage, D for Density, F for Freshness,
S for Stability, and R for Redundancy.



Table 5-2 shows a comparison between the participants' criteria for quality assessment of

genomic data and our extended set of quality dimensions. Stars represent matches between

experts (across columns) and dimensions (across rows). The last column indicates the total

support for each dimension, based on the number of experts whose criteria matched the

dimension. Some dimensions were grouped together because although we deem them different,










Table 5-2. Comparison between the experts' quality assessment criteria and our quality
dimensions. Experts are the participants of our research study. Dimensions connected
with a slash symbol are related and hence considered a single grouped dimension
here. Stars along rows indicate an expert's criteria matching a quality dimension. The
last column is the total support received by each dimension, based on the number of
experts whose criteria matched the dimension. This is a summarized interpretation of
the original textual responses shown in Table B-3 of Appendix B.
[Per-expert star columns (expert identifiers 1 to 16) are not recoverable from the source text.]
Dimension                          Support
Density                            11
Features                           10
Publications / Literature Links    6
Gene Links                         3
Structure Links                    2
Other Links                        4
Stability / Age / Freshness        3
Redundancy                         4
Uncertainty                        5


[Figure 5-2: bar chart. Vertical axis ticks: 0% to 100%. Horizontal axis: Dimension (D, T, L/P, G, C, O, S/F/A, R, U).]


Figure 5-2. Percent of participants whose criteria for quality assessment of genomic data
matched our chosen quality dimensions. D stands for Density, T for Features, L for
Literature-Links, P for Publications, G for Gene-Links, C for Structure-Links, O for
Other-Links, S for Stability, F for Freshness, A for Age, R for Redundancy, and U for
Uncertainty.










they refer to similar quality characteristics of the data (for example, age, freshness, and stability

are all temporal measures), and the criteria provided by the experts did not allow for a clear

distinction between these dimensions. Also, it should be noted that the participants provided

more criteria than the ones shown in this table, but we did not include them because they were

not general enough, their computation could not be done efficiently, or they were subjective.

Figure 5-2 presents the percent of expert support for each dimension.
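The percentages discussed in the following paragraphs can be derived from the Support column of Table 5-2 by dividing each count by the sixteen study participants. The Python sketch below shows this conversion. The variable names are illustrative assumptions; the counts are those listed in Table 5-2.

```python
N_EXPERTS = 16  # number of participants in the research study

# Support counts from the last column of Table 5-2.
support = {
    "Density": 11,
    "Features": 10,
    "Publications / Literature-Links": 6,
    "Gene-Links": 3,
    "Structure-Links": 2,
    "Other-Links": 4,
    "Stability / Age / Freshness": 3,
    "Redundancy": 4,
    "Uncertainty": 5,
}

# Percent of experts whose criteria matched each (grouped) dimension.
percent_support = {dim: 100.0 * n / N_EXPERTS for dim, n in support.items()}

for dim, pct in percent_support.items():
    print(f"{dim}: {pct:.2f}%")
```

For example, 5 of 16 experts gives 31.25% for Uncertainty, and 10 of 16 gives 62.5% for Features.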

The first dimension to include was Uncertainty, since it had good expert support (over

31%) and encompassed an aspect of quality that none of the other dimensions covered. Next, the

Publications and Literature-Links dimensions (grouped in Table 5-2) were included as sub-

dimensions of Density and Linkage, respectively, having over 37% expert support. They were

treated as sub-dimensions because Publications is correlated with Density, and Literature-Links

is correlated with Linkage. The problem with the Publications and Literature-Links dimensions

is that conceptually they refer to the same aspect of quality (i.e., number of published works in

the literature associated to a biological record), but in practice they need to be separated. Ideally,

a count over the literature links of a record would suffice, but in reality such links are not always

present, and furthermore, they can only link to literature databases within NCBI. Hence, we also

need to count the number of textual references in a record to accurately measure the publication

dimension. We do not drop the Literature-Links dimension because we (and

biologists) see value in having a link as opposed to having simply a textual reference (see

responses in Table B-10 from Appendix B). In Figure 5-2 we observe that the expert support for

the joint Literature-Links / Publications dimension is about 37%.

Next, the Features dimension was included as a sub-dimension of Density, with an expert

support of 62% (the highest among the new dimensions). As mentioned in Chapter 3, the









Features dimension refers to the amount of feature key annotations [21] in a genomic record,

which are data of biological significance (clearly evident from the high expert support found).

The two sub-dimensions added to Density (i.e., Features and Publications) are mainly unrelated

to one another, but they both overlap with Density. We cannot drop Density because there are

portions of the record information that are not accounted for by either of the sub-dimensions. Besides,

we believe Density has its own merit for being a general dimension that can be applied to atomic

or complex data items within a record (not the case for its sub-dimensions), so it can give an

estimate of the amount of information contained in a given part of the data tree (representing a

record).

The next group of dimensions added to our set consisted of Gene-Links, Structure-Links,

and Other-Links, all sub-dimensions of Linkage. As mentioned in Chapter 3, this division was

made in order to exploit existing types of links, and was refined based on feedback gathered

from expert collaborators at our university. From Figure 5-2 we observe that there is moderately

good support for the Gene-Links (18%) and Other-Links (25%) dimensions, but only little

support (12%) for the Structure-Links dimension. Nevertheless, we decided to keep the current

division of Linkage dimensions until further experimental evidence was obtained either against

or in favor of these dimensions.

Finally, the Age dimension, which is grouped with the Freshness and Stability dimensions

in Figure 5-2, was added to our set of quality dimensions because it had moderately good (18%)

expert support, and we felt that this temporal aspect of quality was missing (it was certainly not

well represented by either Freshness or Stability). As a final remark, we can see in Figure 5-2

that the initial dimensions have an expert support of at least 18%.









In summary, our initial set of five quality dimensions was extended to a total of twelve

dimensions (including 6 sub-dimensions), based on findings from this experiment. The results

from this experiment show that i) biologists deem our set of dimensions useful in assessing the

quality of genomic data, and ii) biologists' own criteria for evaluating the quality of genomic

data are partially matched by our dimensions. We believe both findings support the

biological relevancy of the chosen dimensions.

5.2.2.3 Experiment 2

The suitability of our set of quality dimensions was shown in Experiment 1. The next step

is to evaluate the ability of the quality estimates (set of scores for all dimensions) to discriminate

high versus low quality data. A related purpose is additionally sought: to find the key dimensions

(or combinations of dimensions) for classifying data of high and low quality. A 2-point scale

(with categories "high" and "low" quality) is used to make it less challenging for the experts to

provide quality assessments and to reduce bias resulting from a finer scale (levels on a 5-point

scale may have subjective interpretations).

Data Set. The Expert data set used for this experiment was directly obtained from domain

experts through the research study we conducted (see Section 5.2.2), and consists of 371 records

from the NCBI's Nucleotide and Protein databases, along with quality assessments for each of

them (consisting of a category label: "good quality" or "poor quality"). A total of 187 records in

this data set were classified by the study participants as having good quality, while a total of 184

were classified as having poor quality. We should note that although a total of 384 accession

numbers (record identifiers) were collected in the study (see Tables B-1 and B-2 of Appendix B),

13 could not be included in the data set that was effectively used in the experiments. Reasons

included repeated accession numbers, invalid accession numbers (did not correspond to any









record in the NCBI databases), and null scores for the Uncertainty dimension after being loaded into the

QMA (records without a sequence have a null uncertainty score).

Methodology. We loaded the data from the Expert data set into the prototype QMA,

extracted the quality estimates, applied a logarithmic transformation to the scores that

represented counts (Density, Features, Publications, Redundancy, and Linkage sub-dimensions),

and standardized all the scores. To evaluate the ability of the quality estimates to discriminate

data of high versus low quality, we first examined the distribution of quality scores for both the

data classified as "good quality" (hereafter high quality set) and the data classified as "poor

quality" (hereafter low quality set). Next, we built a binary classifier on top of our quality

estimates and compared the predictions made by this classifier with the quality labels given by

the experts. We describe the details of these steps next.

The logarithmic transformation was applied to reduce the influence of extreme values in

the subsequent standardization process (computation of the mean and standard deviation), and to

facilitate the inspection of the score distributions. The dimensions to which this transformation

was applied (i.e., Density, Features, Publications, Redundancy, Literature-Links, Gene-Links,

Structure-Links, and Other-Links) exhibited a prominent skew to the right of the distribution,

which was lessened after the logarithmic transformation.
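To illustrate the two preprocessing steps described above, the following Python sketch applies a logarithmic transformation to count-valued scores and then standardizes them. It is only a sketch under stated assumptions: the text does not specify the logarithm base or how zero counts were handled, so the use of log(1 + x) and of the sample standard deviation are assumptions, and the input values are hypothetical.

```python
import math
import statistics

def log_transform(counts):
    # Assumption: log(1 + x), so that zero counts remain defined. The
    # source does not state the base or the treatment of zeros.
    return [math.log1p(c) for c in counts]

def standardize(scores):
    # z-score: subtract the mean and divide by the standard deviation.
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (assumption)
    return [(s - mean) / sd for s in scores]

# Hypothetical right-skewed count scores with one extreme value.
raw = [0, 1, 2, 3, 5, 8, 400]
z_raw = standardize(raw)
z_log = standardize(log_transform(raw))

# The extreme record sits closer to the rest after the log transform.
print(max(z_raw), max(z_log))
```

In this toy example the extreme value dominates the mean and standard deviation less once the counts are log-transformed, which is the effect the transformation is intended to have.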

The inspection of the quality scores distributions for both the high and low quality sets is

done using boxplots. For all quality dimensions, we show the boxplot of each quality set, side by

side, with whiskers set at 3 times the interquartile range (IQR). Also, a statistical test is

performed to test whether the distributions for the high and low quality sets are equal. The test

used is the nonparametric Wilcoxon rank sum test, which tests the hypothesis of equal medians

for two independent samples [52]. We chose this test over the two-sample t test primarily









because it makes no assumption about the distribution of the data (in particular, it does not

assume normality, which does not hold for many of our dimensions' scores), and also because it

is more robust to outliers in the data. Additional 3-dimensional plots are shown to further

illustrate differences in the score distributions of the high and low quality sets, along selected

dimensions.
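For readers unfamiliar with the Wilcoxon rank sum test, the Python sketch below shows one common way to compute it: rank the pooled scores, sum the ranks of one sample, and compare that sum with its expected value under the null hypothesis using a normal approximation. This is an illustrative sketch, not the procedure used to produce the results reported here. The function names, the tie-corrected normal approximation without continuity correction, and the sample data are all assumptions.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    w = sum(ranks[:n1])                  # rank sum of the first sample
    mean_w = n1 * (n + 1) / 2.0          # expected rank sum under H0
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:                       # all pooled values identical
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p

# Hypothetical standardized scores for a high and a low quality set.
hq = [0.9, 1.1, 1.4, 2.0, 2.3, 2.8]
lq = [-1.2, -0.8, -0.5, 0.1, 0.3, 1.0]
w, z, p = rank_sum_test(hq, lq)
print(w, round(z, 3), round(p, 4))
```

With a 5% significance level, the null hypothesis is rejected when the returned p-value is below 0.05. For small samples an exact test is usually preferred over the normal approximation.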

The classifier used in this experiment was C4.5 [50], a well known and publicly-available

decision tree classifier. We chose it because one can usually analyze its output (rules and

decision tree) to gain insight into the classification problem at hand. C4.5 learns classification

rules and builds a decision tree from a given training set; it then uses the generated decision tree

to classify unseen test data. The minobjs parameter of C4.5, which specifies the minimum node

size required to further partition a node, was set to 5 (default value) in all runs.

We used k-fold cross-validation [31, 59] (with k = 10) to evaluate how well the model

learned by the classifier predicts high and low quality data. In k-fold cross-validation, the data set

is partitioned into k disjoint subsets of equal size. Then, the classifier is run k times, using one of

the k subsets in turn as the test set and the remaining subsets as the training set. In this way, the

classifier is tested on each of the k subsets exactly once, and the results from the k folds can be

averaged to produce an estimator of the generalization error. In this experiment, the use of cross-

validation was necessary due to the small size of our data set. For each of the 10 folds, we

obtained the decision tree built by C4.5 over the training data, as well as the class-membership

scores for both HQ and LQ classes. The decision trees were later analyzed to find the key

dimensions for classification. The class-membership scores were used to make ROC curves and

score plots, which support the discussion. Class-membership scores are the predicted posterior










probabilities for each class (HQ and LQ), and they were computed with Tanagra [51], a publicly

available data mining software for academic and research purposes.
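The partitioning step of k-fold cross-validation described above can be sketched in Python as follows. This is a minimal illustration and not the Tanagra implementation: the function names, the random shuffling with a fixed seed, and the absence of class stratification are assumptions. Since 371 is not divisible by 10, the sketch produces folds of nearly equal size.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Split indices 0..n_samples-1 into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

def cross_validation_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices), using each fold once as test set."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 371 records, as in the Expert data set, split into 10 folds.
splits = list(cross_validation_splits(371, 10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Each record appears in exactly one test set, so averaging the per-fold errors uses every record once for testing.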

Results and Discussion. Figures 5-3 to 5-14 show the distribution of standardized scores

for each quality dimension, over the high and low quality sets, using boxplots. We can observe

that the distributions of the high and low quality sets for some dimensions have different central

and dispersion tendencies (e.g., Figures 5-3, 5-4, 5-5, 5-6, and 5-8). In Figure 5-15, various

three-dimensional plots of the quality scores for the high and low quality sets along selected

dimensions are shown. Although these plots contain regions clearly dominated by one of the two

quality classes, they also contain regions where data from both quality classes strongly overlap.

To quantify the observed differences in the medians of the two sets, we use the results of

the Wilcoxon rank sum test (at the 5% significance level) shown in Table 5-3. In this table we

observe that the Wilcoxon test's null hypothesis is rejected for five of our twelve dimensions:

Density, Features, Freshness, Publications, and Stability; thus indicating that the high and low

quality sets attain significantly different quality scores under these five dimensions.

The main problem faced when attempting to examine the discriminative capacity of our

quality model's estimates is that although there is statistical evidence that the scores of some

dimensions have different distributions for the high and low quality sets, there is still significant

overlap between the distributions of the two quality classes, which makes it difficult to determine

whether an individual dimension will be able to accurately discriminate high from low quality

data.

To address the problem mentioned above, we used our quality estimates to build a

classifier for the Expert data set, which was tested using a 10-fold cross-validation. The use of a

decision tree classifier allowed us to evaluate the discriminating (or predictive) capability of our


















Figure 5-3. Distribution of standardized scores for the Density dimension over the Expert data
set. The HQ boxplot shows the scores for the high quality records while the LQ
boxplot shows scores for the low quality records. Boxplot whiskers are at 3 IQR.






















Figure 5-4. Distribution of standardized scores for the Features dimension over the Expert data
set. The HQ boxplot shows the scores for the high quality records while the LQ
boxplot shows scores for the low quality records. Boxplot whiskers are at 3 IQR.





Figure 5-5. Distribution of standardized scores for the Publications dimension over the Expert
data set. The HQ boxplot shows the scores for the high quality records while the LQ
boxplot shows scores for the low quality records. Boxplot whiskers are at 3 IQR.




Figure 5-6. Distribution of standardized scores for the Freshness dimension over the Expert data
set. The HQ boxplot shows the scores for the high quality records while the LQ
boxplot shows scores for the low quality records. Boxplot whiskers are at 3 IQR.



















Figure 5-7. Distribution of standardized scores for the Age dimension over the Expert data set.
The HQ boxplot shows the scores for the high quality records while the LQ boxplot
shows scores for the low quality records. Boxplot whiskers are at 3 IQR.


Figure 5-8. Distribution of standardized scores for the Stability dimension over the Expert data
set. The HQ boxplot shows the scores for the high quality records while the LQ
boxplot shows scores for the low quality records. Boxplot whiskers are at 3 IQR.




Figure 5-9. Distribution of standardized scores for the Uncertainty dimension over the Expert
data set. The HQ boxplot shows the scores for the high quality records while the LQ
boxplot shows scores for the low quality records. Boxplot whiskers are at 3 IQR.


Figure 5-10. Distribution of standardized scores for the Redundancy dimension over the Expert
data set. The HQ boxplot shows the scores for the high quality records while the LQ
boxplot shows scores for the low quality records. Boxplot whiskers are at 3 IQR.






























Figure 5-11. Distribution of standardized scores for the Literature-Links dimension over the
Expert data set. The HQ boxplot shows the scores for the high quality records while
the LQ boxplot shows scores for the low quality records. Boxplot whiskers are at 3
IQR.




Figure 5-12. Distribution of standardized scores for the Gene-Links dimension over the Expert
data set. The HQ boxplot shows the scores for the high quality records while the LQ
boxplot shows scores for the low quality records. Boxplot whiskers are at 3 IQR.





























Figure 5-13. Distribution of standardized scores for the Structure-Links dimension over the
Expert data set. The HQ boxplot shows the scores for the high quality records while
the LQ boxplot shows scores for the low quality records. Boxplot whiskers are at 3
IQR.




Figure 5-14. Distribution of standardized scores for the Other-Links dimension over the Expert
data set. The HQ boxplot shows the scores for the high quality records while the LQ
boxplot shows scores for the low quality records. Boxplot whiskers are at 3 IQR.



























Figure 5-15. Three-dimensional view of the Expert data set along selected dimensions (scores
are standardized). A) Plot along the Density, Age, and Publications dimensions. B)
Plot along the Age, Density, and Features dimensions. C) Plot along the Uncertainty,
Age, and Features dimensions. D) Plot along the Age, Features, and Literature-Links
dimensions E) Plot along the Uncertainty, Density, and Publications dimensions. F)
Plot along the Publications, Density, and Literature-Links dimensions.












Table 5-3. Wilcoxon rank sum test over the standardized scores for the high and low quality sets
of the Expert data set. A 5% significance level was used for each of the quality
dimensions. The median of each dimension's scores over the two data sets (HQ and
LQ) is shown as reference. The last two columns contain the p-value and outcome of
each test, respectively. The outcome indicates whether the null hypothesis H0 was
rejected. Results are sorted in ascending order of p-value.
Dimension         HQ median  LQ median  P-value  H0 rejected
Density           -0.171     -0.429     0.000    yes
Features          -0.301     -0.321     0.000    yes
Freshness         -0.407     -0.420     0.030    yes
Publications      -0.205     -0.217     0.032    yes
Stability          0.520      0.741     0.035    yes
Uncertainty       -0.167     -0.167     0.052    no
Age               -0.430      0.128     0.100    no
Redundancy        -0.356     -0.356     0.476    no
Other Links       -0.326     -0.326     0.660    no
Gene Links        -0.276     -0.276     0.892    no
Literature Links  -0.236     -0.236     0.920    no
Structure Links   -0.273     -0.273     0.998    no


Table 5-4. Classifier performance over the Expert data set using a 10-fold cross-validation.
Fold  Generalization error (%)  AUC
1     35.1                      0.63
2     13.5                      0.83
3     35.1                      0.75
4     29.7                      0.71
5     29.7                      0.78
6     27.0                      0.78
7     51.4                      0.47
8     18.9                      0.83
9     37.8                      0.63
10    34.2                      0.67
Mean  31.3                      0.71


combined quality estimates. The classifier selects a combination of relevant dimensions for

classifying the training data, where relevancy is based on the information gain measure [31].
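As an illustration of the information gain measure mentioned above, the Python sketch below computes the reduction in class entropy obtained by splitting records on a score threshold. It is a simplified sketch rather than the C4.5 implementation: the function names and data are hypothetical, only binary threshold splits on a single dimension are considered, and the published C4.5 algorithm additionally normalizes this quantity (gain ratio), which is not shown here.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(scores, labels, threshold):
    """Entropy reduction from splitting records into score <= threshold and > threshold."""
    left = [l for s, l in zip(scores, labels) if s <= threshold]
    right = [l for s, l in zip(scores, labels) if s > threshold]
    n = len(labels)
    # Weighted entropy of the non-empty partitions after the split.
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder

# Hypothetical standardized scores of one dimension with expert labels.
scores = [-1.0, -0.5, -0.2, 0.3, 0.8, 1.2]
labels = ["LQ", "LQ", "LQ", "HQ", "HQ", "HQ"]

print(information_gain(scores, labels, 0.0))  # split separates the classes
print(information_gain(scores, labels, 2.0))  # split separates nothing
```

A dimension whose best threshold yields a high gain separates the two quality classes well, which is why such dimensions tend to appear near the root of the decision tree.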

Table 5-4 shows two performance measures for the classifier built in each fold of the

cross-validation, namely the Generalization error and the area under the ROC curve (AUC). We









observe that the average generalization (prediction) error obtained with the C4.5 classifier across

the 10 folds of cross-validation is 31.3%. This corresponds to an average prediction accuracy of

about 69%. A similar result is found when averaging the AUC scores across the 10 cross-

validation folds, which gives 0.71. We also notice that there is a large variance in the results

from different folds. The two most extreme cases correspond to folds 2 and 7, which exhibit the

smallest and largest error percent, respectively.
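The summary figures in this paragraph can be checked against the per-fold values of Table 5-4 with the short Python sketch below. The variable names are illustrative assumptions. Note that the sketch takes an unweighted mean of the rounded per-fold values, which gives about 31.2% and agrees with the reported 31.3% up to rounding.

```python
# Per-fold values copied from Table 5-4.
generalization_error = [35.1, 13.5, 35.1, 29.7, 29.7, 27.0, 51.4, 18.9, 37.8, 34.2]
auc_per_fold = [0.63, 0.83, 0.75, 0.71, 0.78, 0.78, 0.47, 0.83, 0.63, 0.67]

mean_error = sum(generalization_error) / len(generalization_error)
mean_accuracy = 100.0 - mean_error  # accuracy is the complement of the error
mean_auc = sum(auc_per_fold) / len(auc_per_fold)

# Folds are numbered from 1, as in the table.
best_fold = min(range(10), key=lambda i: generalization_error[i]) + 1
worst_fold = max(range(10), key=lambda i: generalization_error[i]) + 1

print(f"mean error {mean_error:.1f}%, mean accuracy {mean_accuracy:.1f}%")
print(f"mean AUC {mean_auc:.2f}, best fold {best_fold}, worst fold {worst_fold}")
```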

We also evaluated the classifier by analyzing the class-membership scores, which are the

predicted posterior probabilities of both the HQ (high quality) and LQ (low quality) classes.

Ideally, the HQ-class scores of records in the high quality set should be close to 1, indicating a

high probability of membership to the HQ class. Likewise, the LQ-class scores of records in the

low quality set should be close to 1, indicating a high probability of membership to the LQ class.

These class-membership scores are relevant because they represent all of our quality scores in a

single figure that can be directly used for the classification of high versus low quality data (by

setting a cutoff at 0.5 to make the class prediction).

Figure 5-16 shows the class-membership plots for each of the 10 folds of the cross-

validation, over the Expert data set. Each class-membership plot contains either the HQ-class

scores (blue line with circles) or the LQ-class scores (magenta line with crosses), depending on

what the true class of the data samples is, and has the scores sorted in descending order (from left

to right). When class-scores are above the 0.5 line, the classifier correctly predicts the class;

when class-scores are below the 0.5 line, the classifier makes a prediction error. Hence, in a

class-membership plot, crossing of the 0.5 line and the score line occurs further to the right (or

never occurs) when the classifier performance is good; whereas crossing occurs further to the left

when the classifier performance is poor. Similarly, if we are interested in high-confidence










[Figure 5-16, panels A to L: class-membership score plots (HQ and LQ) for folds 1 to 6. Horizontal axis: Sample.]


Figure 5-16. Class-membership scores for each fold of the cross-validation over the Expert data
set. The blue line with circles represents HQ-class scores; the magenta line with
crosses represents LQ-class scores. A) Scores for true HQ data in the test set of fold
1. B) Scores for true LQ data in the test set of fold 1. C) Scores for true HQ data in
the test set of fold 2. D) Scores for true LQ data in the test set of fold 2. E) Scores for
true HQ data in the test set of fold 3. F) Scores for true LQ data in the test set of fold
3. G) Scores for true HQ data in the test set of fold 4. H) Scores for true LQ data in
the test set of fold 4. I) Scores for true HQ data in the test set of fold 5. J) Scores for
true LQ data in the test set of fold 5. K) Scores for true HQ data in the test set of fold
6. L) Scores for true LQ data in the test set of fold 6. M) Scores for true HQ data in
the test set of fold 7. N) Scores for true LQ data in the test set of fold 7. O) Scores for
true HQ data in the test set of fold 8. P) Scores for true LQ data in the test set of fold
8. Q) Scores for true HQ data in the test set of fold 9. R) Scores for true LQ data in
the test set of fold 9. S) Scores for true HQ data in the test set of fold 10. T) Scores
for true LQ data in the test set of fold 10.










[Figure 5-16, panels M to T: class-membership score plots (HQ and LQ) for folds 7 to 10. Horizontal axis: Sample.]


Figure 5-16. Continued.




Table 5-5. Classifier's prediction rate over the HQ and LQ sets of the Expert data set. Results
are shown for each fold of the cross-validation, as percentages.
Fold Prediction rate for HQ (%) Prediction rate for LQ (%)
1 78.9 50.0
2 90.5 81.3
3 73.7 55.6
4 68.8 71.4
5 61.9 81.3
6 78.9 66.7
7 38.9 57.9
8 80.0 82.4
9 64.7 60.0
10 70.6 61.9
Total 71.1 66.3




predictions, class-scores should be compared against the 0.7 line, following the same principles

outlined above. Table 5-5 shows the percent of correct predictions made by the classifier at each

fold of the cross-validation for the high and low quality sets.
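The prediction rates of Table 5-5 correspond to the fraction of records whose true-class membership score lies above the chosen line (0.5 by default, or 0.7 for high-confidence predictions). The Python sketch below illustrates this computation. The function name and the score values are hypothetical, and treating a score exactly equal to the cutoff as an error is an assumption.

```python
def prediction_rate(true_class_scores, cutoff=0.5):
    """Percent of records whose true-class score exceeds the cutoff."""
    correct = sum(1 for s in true_class_scores if s > cutoff)
    return 100.0 * correct / len(true_class_scores)

# Hypothetical class-membership scores for the test set of one fold:
# HQ-class scores for true HQ records, LQ-class scores for true LQ records.
hq_scores = [0.95, 0.90, 0.80, 0.75, 0.60, 0.55, 0.40, 0.20]
lq_scores = [0.85, 0.72, 0.65, 0.45, 0.30]

print(prediction_rate(hq_scores), prediction_rate(lq_scores))
print(prediction_rate(hq_scores, cutoff=0.7), prediction_rate(lq_scores, cutoff=0.7))
```

Raising the cutoff can only lower or preserve the rate computed this way, since fewer scores clear the higher line.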

In the class-membership plots from Figure 5-16, we see that the classifier's predictive

behavior varies moderately across folds. For example, in fold 7 the classifier has a prediction rate



















[Figure 5-17, panels A to J: ROC curves, one per cross-validation fold. Horizontal axis: False Positive Rate, 0 to 1.]




Figure 5-17. ROC curves for each fold of the cross-validation over the Expert data set. HQ is set

as the positive class. A) ROC curve for test set in fold 1. B) ROC curve for test set in

fold 2. C) ROC curve for test set in fold 3. D) ROC curve for test set in fold 4. E)

ROC curve for test set in fold 5. F) ROC curve for test set in fold 6. G) ROC curve

for test set in fold 7. H) ROC curve for test set in fold 8. I) ROC curve for test set in

fold 9. J) ROC curve for test set in fold 10.









of less than 58% for both high and low quality sets (see Figure 5-16 parts M and N; and Table

5-5). On the other hand, in folds 2 and 4 the classifier has a prediction rate higher than 68% for

both high and low quality sets (see Figure 5-16 parts C, D, G, H; and Table 5-5). We can also see

marked differences in the confidence with which the classifier makes predictions (i.e., reflected

in the magnitude of the class-score). For example, the classifier of fold 4 makes higher

confidence predictions than the classifier of fold 6, especially for the low quality set (see Figure

5-16 parts G, H, K, L). Differences in the predictive behavior of the classifier across folds can

also be observed in the ROC curves from Figure 5-17.

In Table 5-5 we also observe that the classifier correctly predicts more than 70% of the

high quality data in 6 of the 10 folds, while it correctly predicts more than 70% of the low quality

data in 4 folds. On average, the C4.5 classifier can correctly predict 71% of the high quality data

and 66% of the low quality data. This is also illustrated in Figure 5-18, which shows the

combined class-membership scores from all the cross-validation folds (union over disjoint test

sets). From this figure, we observe that the point where the class-scores and the 0.5 line cross is

located at approximately the 70th percentile of the samples (towards the right of the plot).

We also explored the use of different thresholds for the classification of our data set.

The default threshold used by the classifier is 0.5 since the class-membership scores are actual

probabilities. If we choose to use a higher threshold (e.g., 0.7) with the purpose of making only

high-confidence predictions, it is possible that some data samples fall below that threshold (i.e.,

none of the class-scores is higher than 0.7). For such samples, we will assume that the classifier

cannot confidently predict whether they belong to the HQ or LQ class (hence they will not be

classified). The effect of different threshold values on the prediction rate and data selectivity

(percent of the data that can be classified above the threshold) is shown in Table 5-6, for the high




























Figure 5-18. Class-membership scores across all cross-validation folds over the Expert data set.
The blue line with circles represents the HQ-class scores; the magenta line with
crosses represents the LQ-class scores. A) Scores for the high quality set (true HQ
records, as labeled by experts). B) Scores for the low quality set (true LQ records, as
labeled by experts).



Table 5-6. Effect of threshold value on the prediction rate and data selectivity over the Expert
data set. Results are shown for the HQ and LQ data sets, and for the entire data set
(last two columns).
Threshold   HQ data          HQ prediction   LQ data          LQ prediction   Total data       Total prediction
            classified (%)   rate (%)        classified (%)   rate (%)        classified (%)   rate (%)
0.50        100.0            71.1            100.0            66.3            100.0            68.7
0.55        98.4             70.7            99.5             66.7            98.9             68.7
0.60        97.9             70.5            98.4             66.9            98.1             68.7
0.65        89.3             70.1            93.5             67.4            91.4             68.7
0.70        80.7             70.2            87.5             66.5            84.1             68.3
0.75        69.5             74.6            79.3             65.8            74.4             69.9
0.80        60.4             72.6            73.9             68.4            67.1             70.3
0.85        43.3             71.6            58.7             71.3            50.9             71.4
0.90        33.7             73.0            40.2             70.3            36.9             71.5


[Figure 5-18 plot area: panels A and B; x-axis: Sample (0 to 150).]


and low quality sets as well as for the entire data set. We can observe from this table that the

prediction rate for the HQ and LQ sets does not exhibit a monotonic increase with higher

threshold values. However, the overall prediction rate shows a roughly monotonic increase when









higher threshold values are used. Regarding data selectivity, we clearly see a decrease in the

amount of data that can be classified when the threshold increases. However, it is interesting to

note that when the threshold is increased up to 0.6, most of the data (98.1%) can still be

classified, but as we further increase the threshold, the percent of data that is classified drops

rapidly. The key finding from Table 5-6 is that the gain in prediction accuracy is only marginal

compared to the loss in data selectivity, when the classification threshold is increased.
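The thresholding scheme described above can be sketched as follows. This is a minimal illustration, not the implementation used in our experiments: the function names, the sample scores, and the choice of treating a score equal to the threshold as classifiable are all assumptions made for the example. It presumes two class-scores that sum to 1, so only the HQ score is stored.

```python
# Hypothetical (HQ class-score, true label) pairs; not taken from the Expert data set.
SAMPLES = [
    (0.95, "HQ"), (0.80, "HQ"), (0.60, "LQ"), (0.55, "HQ"),
    (0.20, "LQ"), (0.10, "LQ"), (0.45, "HQ"), (0.35, "LQ"),
]


def classify_with_threshold(hq_score, threshold=0.5):
    """Return 'HQ', 'LQ', or None when neither class-score reaches the threshold."""
    lq_score = 1.0 - hq_score  # assumption: the two class-scores sum to 1
    if hq_score >= threshold and hq_score >= lq_score:
        return "HQ"
    if lq_score >= threshold:
        return "LQ"
    return None  # sample is left unclassified


def evaluate(samples, threshold):
    """Return (data selectivity %, prediction rate % over the classified samples)."""
    predictions = [(classify_with_threshold(s, threshold), label) for s, label in samples]
    classified = [(p, label) for p, label in predictions if p is not None]
    selectivity = 100.0 * len(classified) / len(samples)
    if not classified:
        return selectivity, None
    correct = sum(1 for p, label in classified if p == label)
    return selectivity, 100.0 * correct / len(classified)


for t in (0.5, 0.7):
    print(t, evaluate(SAMPLES, t))
```

Raising the threshold in this toy example removes the low-confidence samples from the classified set, which is the trade-off between prediction rate and data selectivity that Table 5-6 reports for the real data.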

So far we have established that it is possible to build a classifier based on all our quality

estimates that distinguishes the expert-defined high and low quality sets with an accuracy of

approximately 70%. Next we show that some of the dimensions in our original set of quality

dimensions are key for the classification task at hand.

Table 5-7 shows the quality dimensions that are ranked among the top five split-attributes

chosen by the decision tree classifier across the ten folds of the cross-validation. The second

column of this table shows the number of supporting folds for each dimension (data in the table

is sorted in descending order of supporting folds). A fold supports a dimension if the decision

tree classifier built in this fold selected the dimension among its first five split-attributes. The

third column of Table 5-7 shows the average rank obtained by each dimension, across the

supporting folds. The rank of a dimension in a fold is defined by the order in which the

dimension was selected by the decision tree classifier of that fold.

Table 5-7 therefore summarizes our analysis of the ten decision trees built in each of the

cross-validation folds, regarding the most relevant dimensions for classification. The first seven

quality dimensions shown in Table 5-7 (Uncertainty, Density, Age, Features, Literature-Links,

Publications, and Other-Links) have support from at least half of the cross-validation folds,

which makes them candidate key dimensions. The last two dimensions shown in Table 5-7 have










Table 5-7. Most relevant dimensions for classifying the HQ and LQ sets of the Expert data set.
Dimensions ranked among the top five split-attributes selected by the decision trees
across the 10 cross-validation folds are shown. Each row shows a dimension, the
number of supporting folds for this dimension, and the average rank obtained by this
dimension across its supporting folds.
Dimension Number of supporting folds Average rank
Uncertainty 10 1.0
Density 10 2.3
Age 10 3.9
Features 9 4.1
Literature Links 7 4.1
Publications 7 5.0
Other Links 5 3.2
Structure Links 2 3.5
Freshness 1 4.0




support from less than half of the folds, so we do not consider them as candidate key dimensions.

The other three dimensions that are not present in Table 5-7 are dimensions that did not make it

among the top-five split attributes selected by the classifier, in any of the folds. We proceed to

analyze the seven candidate dimensions further.

The first three dimensions from Table 5-7 (Uncertainty, Density, and Age) are supported

by all the cross-validation folds (i.e., by all decision trees). Remarkably, the Uncertainty

dimension is selected as the first split-attribute in all ten trees, which is very strong evidence of

the usefulness of this dimension for classification. Density is selected as the second split-attribute

in seven of the trees and as the third split-attribute in the remaining three trees, which is also

strong evidence of its usefulness. Age is selected as the third or fourth split-attribute in eight of

the trees, and as the fifth split-attribute in the remaining two trees. We can thus see a clear

pattern of usage of these dimensions within the decision trees, in terms of which ones are used

first in the classification process (clearly shown by the average rank for these dimensions in

Table 5-7). Therefore, Uncertainty, Density, and Age become the initial set of key dimensions.










The next candidate dimension to consider is Features, the fourth dimension in Table 5-7.

This dimension has support from nine of the ten folds, being selected as the fourth split-attribute

in eight of the decision trees and as the fifth one in the remaining tree, for an average rank of 4.1.

We thus have evidence for the usefulness of the Features dimension in the classification task

(this dimension clearly becomes relevant after the top-three dimensions have been used), and we

add it to our set of key dimensions. The next two dimensions from Table 5-7, Literature-Links

and Publications, are both supported by seven folds, but differ in their average rank. Literature-Links

is selected as the second split-attribute in two of the decision trees, and as the fifth

split-attribute in the remaining five trees, for an average rank of 4.1. Publications is selected as the fifth

split-attribute in all seven trees, for an average rank of 5.0. Finally, the Other-Links dimension is

supported by five folds, and has an average rank of 3.2.
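The support and average-rank figures quoted above can be checked with a short calculation. The sketch below is only an illustration: the rank lists are transcribed from the per-tree counts stated in the text, and the function name is hypothetical. Age and Other-Links are left out because the text does not give their exact per-tree ranks.

```python
from statistics import mean

# Rank at which each dimension was selected, one entry per supporting fold,
# transcribed from the counts given in the text.
RANKS = {
    "Uncertainty": [1] * 10,
    "Density": [2] * 7 + [3] * 3,
    "Features": [4] * 8 + [5] * 1,
    "Literature-Links": [2] * 2 + [5] * 5,
    "Publications": [5] * 7,
}


def summarize(rank_lists):
    """Return (dimension, supporting folds, average rank) rows, most supported first."""
    rows = [(dim, len(ranks), round(mean(ranks), 1)) for dim, ranks in rank_lists.items()]
    # Sort by descending support, breaking ties by average rank.
    rows.sort(key=lambda row: (-row[1], row[2]))
    return rows


for dim, support, avg_rank in summarize(RANKS):
    print(f"{dim:<18}{support:>3}{avg_rank:>6}")
```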

Before deciding on the usefulness of the last three dimensions, we tested the classifier's

performance when each of these dimensions was given as input. For this purpose, we ran three

different 10-fold cross-validations over the Expert data set using i) five dimensions: Uncertainty,

Density, Age, Features, and Literature-Links, ii) six dimensions: Uncertainty, Density, Age,

Features, Literature-Links, and Publications, and iii) seven dimensions: Uncertainty, Density,

Age, Features, Literature-Links, Publications, and Other-Links. Table 5-8 shows the

generalization error for three classifiers built over the Expert data set using different sets of

candidate key dimensions. The results in Table 5-8 indicate that using more dimensions reduces

the error. However, the reduction in the generalization error of the classifier when going from 6

to 7 dimensions is small compared to the reduction obtained when going from 5 to 6 dimensions.

We thus decided to add both Literature-Links and Publications to our set of key dimensions since

their inclusion as input attributes improved the classifier's generalization ability. On the other










Table 5-8. Comparison of the classification error obtained when using different sets of quality
dimensions over the Expert data set. The generalization error of three classifiers built
using 5, 6, and 7 candidate key dimensions (respectively) is shown as percentage.
Number of dimensions used by classifier
Fold 5 6 7
1 43.2 37.8 37.8
2 18.9 21.6 13.5
3 35.1 40.5 37.8
4 32.4 32.4 29.7
5 29.7 18.9 29.7
6 40.5 32.4 29.7
7 48.7 48.7 51.4
8 29.7 27.0 18.9
9 35.1 37.8 37.8
10 39.5 31.6 36.8
Mean 35.3 32.9 32.3
Stdev 8.3 8.9 10.7




hand, we decided to drop the Other-Links dimension from the set of candidates since its

inclusion as an input attribute did not much improve the classifier's generalization ability. In

summary, our final set of key dimensions thus consists of Uncertainty, Density, Age, Features,

Literature-Links, and Publications. The usefulness of these key dimensions for classifying high

and low quality data is demonstrated by comparing the classifier's prediction error when using

only the key dimensions versus using all the twelve dimensions (Tables 5-4 and 5-8). When

using all the dimensions the average prediction error of the classifier is 31.3% (with standard

deviation of 10.4%), whereas when using only the key dimensions the average prediction

error of the decision tree classifier is 32.9% (with standard deviation of 8.9%).
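As a check on the summary rows of Table 5-8, the per-fold errors can be aggregated directly. The sketch below uses Python's standard statistics module; the use of the sample standard deviation (dividing by n - 1) is an assumption, chosen because it reproduces the values printed in the table.

```python
from statistics import mean, stdev

# Per-fold generalization error (%) from Table 5-8, keyed by number of dimensions.
ERRORS = {
    5: [43.2, 18.9, 35.1, 32.4, 29.7, 40.5, 48.7, 29.7, 35.1, 39.5],
    6: [37.8, 21.6, 40.5, 32.4, 18.9, 32.4, 48.7, 27.0, 37.8, 31.6],
    7: [37.8, 13.5, 37.8, 29.7, 29.7, 29.7, 51.4, 18.9, 37.8, 36.8],
}

# stdev() is the sample standard deviation (n - 1 in the denominator).
SUMMARY = {k: (round(mean(v), 1), round(stdev(v), 1)) for k, v in ERRORS.items()}

for dims, (avg, sd) in SUMMARY.items():
    print(f"{dims} dimensions: mean {avg}%, stdev {sd}%")
```

The drop in mean error from 5 to 6 dimensions (2.4 percentage points) is larger than the drop from 6 to 7 (0.6 points), which is the basis for keeping Publications and dropping Other-Links.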

5.2.2.4 Experiment 3

The previous experiment showed that a classifier based on all our quality estimates was

able to distinguish (with 69% prediction accuracy) high versus low quality data labeled by

experts. It also showed that similar results are obtained when using a smaller set of key









dimensions. The major drawback of this experiment was the use of a rather small data set (recall

that one of our challenges was to obtain a reasonably large labeled data set from the experts). To

address this problem we performed one more experiment, this time over a much larger data set

with labels derived from experts' assessment of the overall quality of certain databases. The

purpose of this experiment is to validate the usefulness of both the entire set of quality

dimensions and the smaller set of key quality dimensions, for discriminating high from low

quality data, over a large data set.

Data Sets. Three data sets were used for this experiment. We describe each of them next.

The EST data set consists of 780 records randomly sampled from the NCBI's EST (Expressed

Sequence Tags) database. The EST database contains only nucleotide records; specifically,

single-pass cDNA sequences (or Expressed Sequence Tags), from a number of organisms [37].

The reason for choosing EST as the source database for one of our samples was that domain

experts pointed out that most records in the EST database were of low quality. This was

confirmed in our research study, as shown in Table 5-9, where the average rank given to the EST

database indicates that it is regarded as a low quality database, at least compared to the other five

databases considered. Hence, a "low quality" label was assigned to all records in our EST sample

data set.

The GenBank data set consists of 780 records randomly sampled from the NCBI's

GenBank database. The sample was specifically obtained from the CoreNucleotide division of

GenBank, which essentially contains all nucleotide records that are not in the EST (Expressed

Sequence Tags) or GSS (Genome Survey Sequence) databases. GenBank CoreNucleotide was

chosen as the source database for one of our samples because the average rank given to this

database by the study participants (see Table 5-9) indicated that it was regarded as the highest









Table 5-9. Participants' quality rankings for six NCBI databases. The last row shows the
average rank of each database (lower ranks indicate higher quality). All other rows
show the number of participants who assigned the rank in column one to the
corresponding database. This is a summary of the original responses shown in Table
B-4 of Appendix B.
Rank dbEST dbGSS dbSTS GenBank HTC RefSeq
1 0 1 0 9 1 6
2 5 2 2 4 2 2
3 1 2 2 2 2 3
4 2 3 0 0 0 0
5 2 0 1 1 1 0
6 0 1 0 0 0 2
N/A 6 7 11 0 10 3
Avg. rank 3.1 3.2 3.0 1.8 2.7 2.4



quality database, among the six presented. Hence, a "high quality" label was assigned to all

records in our GenBank sample data set.

The RefSeq data set consists of 780 records randomly sampled from the NCBI's RefSeq

database. Although RefSeq contains both nucleotide and protein sequences, only nucleotide

records were considered for this sample with the purpose of maintaining consistency across data

sets used in the experiments. RefSeq was chosen as the source database for one of our samples

because domain experts indicated that most records from this database were of high quality

(since this is a curated database). Results from our research study (see Table 5-9) confirmed that

RefSeq is regarded as a high quality database (based on the average rank given to this database).

Hence, a "high quality" label was assigned to all records in our RefSeq sample data set.
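The average ranks in Table 5-9, on which the labels of the three data sets rest, follow from the participant counts as a weighted mean over the ranks 1 through 6 (N/A responses excluded). The sketch below reproduces that calculation; the function and variable names are ours, introduced only for this illustration.

```python
# Number of participants assigning ranks 1..6 to each database (Table 5-9, N/A row excluded).
RANK_COUNTS = {
    "dbEST":   [0, 5, 1, 2, 2, 0],
    "dbGSS":   [1, 2, 2, 3, 0, 1],
    "dbSTS":   [0, 2, 2, 0, 1, 0],
    "GenBank": [9, 4, 2, 0, 1, 0],
    "HTC":     [1, 2, 2, 0, 1, 0],
    "RefSeq":  [6, 2, 3, 0, 0, 2],
}


def average_rank(counts):
    """Weighted mean of ranks 1..len(counts), weighted by the number of participants."""
    respondents = sum(counts)
    weighted = sum(rank * n for rank, n in enumerate(counts, start=1))
    return weighted / respondents


AVERAGES = {db: average_rank(c) for db, c in RANK_COUNTS.items()}
for db, avg in sorted(AVERAGES.items(), key=lambda item: item[1]):
    print(f"{db:<8}{avg:.2f}")
```

Sorting by average rank places GenBank first and RefSeq second, with dbEST and dbGSS last, consistent with the high and low quality labels assigned above.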

Methodology. We loaded the data from the EST, GenBank, and RefSeq data sets into the

QMA, extracted the quality estimates corresponding to the key dimensions, applied a logarithmic

transformation to scores representing counts (Density, Features, Publications, and Literature

Links), and standardized these scores. Then, the ability of the key dimensions for discriminating

data of high versus low quality was tested using i) the GenBank and EST data sets, and ii) the









RefSeq and EST data sets. Each of these two pairs includes one high quality set and one

low quality set (the GenBank and RefSeq data sets are high quality sets on average, while the

EST data set is a low quality set on average).
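The preprocessing of count-valued scores can be sketched as below. This is an illustration under stated assumptions rather than the code used in the QMA: the text does not specify the exact logarithmic transformation, so log(1 + x) is assumed here to keep zero counts finite, and the population standard deviation is assumed for standardization. The sample counts are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_transform(counts):
    """Apply log(1 + x) to each count (assumed form of the logarithmic transformation)."""
    return [math.log1p(c) for c in counts]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


# Hypothetical Publications counts for six records.
publications = [0, 1, 1, 3, 12, 40]
z_scores = standardize(log_transform(publications))
print([round(z, 2) for z in z_scores])
```

The logarithm compresses the long right tail typical of count data before standardization, so that a few records with very large counts do not dominate the scale.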

The steps involved in this experiment closely resemble the ones used in Experiment 2,

namely building a binary classifier over the data set of interest, and measuring its discriminating

capability (using the generalization error, AUC, ROC curves, and class-scores plots). However,

the way in which we selected the test set for the classifier is different in this experiment: instead

of using cross-validation, we separated 30% of the data (randomly chosen) for testing and used

the remaining 70% for training. This could be done because the size of the data sets used in this

experiment was much larger than the size of the Expert data set used in Experiment 2. The

composition of the randomly selected GenBank-EST test set was: 238 high quality (HQ) records

and 223 low quality (LQ) records. The composition of the randomly selected RefSeq-EST test

set was: 237 high quality (HQ) records and 232 low quality (LQ) records. Additionally, we

created some 3-dimensional plots using various combinations of key dimensions to show that

high and low quality data were mostly separable.
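A random 70/30 holdout split of the kind described above can be sketched as follows. The record identifiers, the seed, and the function name are hypothetical. Note that the reported test sets contain 461 and 469 records, which is close to but not exactly 30% of 1560, so the sampling procedure actually used may differ in detail from this sketch.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and return (training set, test set)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers standing in for 780 HQ and 780 LQ records.
records = [(f"HQ{i:03d}", "HQ") for i in range(780)]
records += [(f"LQ{i:03d}", "LQ") for i in range(780)]

train, test = holdout_split(records)
n_hq = sum(1 for _, label in test if label == "HQ")
print(len(train), len(test), n_hq, len(test) - n_hq)
```

Because the split is not stratified, the numbers of HQ and LQ records in the test set are only approximately equal, as is also the case for the test sets reported above.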

Results and Discussion. We first discuss the results for the GenBank-EST data set. Table

5-10 shows the generalization error and AUC measures of performance for the classifier built

over all dimensions and for the classifier built over the key dimensions only. These measures are

computed over the GenBank-EST test data set. Figure 5-19A shows the ROC curve over the

GenBank-EST test set, for the classifier built using all twelve quality dimensions. Figure 5-19B

shows the ROC curve over the GenBank-EST test set, for the classifier built using only the six

key quality dimensions. Figure 5-20 shows six three-dimensional views of the GenBank-EST

test set, along select combinations of key dimensions.










Table 5-10. Classifier performance over the GenBank-EST data set. The generalization error
and AUC measures are shown for the classifier built using all twelve quality
dimensions, and for the classifier built using only the six key quality dimensions.
Dimensions Generalization error (%) AUC
All 0.64 0.999
Key 0.64 0.991




[Figure 5-19 plot area: panels A and B; x-axis: False Positive Rate (0 to 1).]


Figure 5-19. ROC curves of two classifiers over the GenBank-EST test set. HQ is set as the
positive class. A) ROC curve for the classifier built using all twelve quality
dimensions. B) ROC curve for the classifier built using the six key quality
dimensions.




From Table 5-10 we observe that the generalization (prediction) error of the C4.5 classifier

over the GenBank-EST data set is 0.6%, which corresponds to a prediction accuracy of 99.4%. A

similar result is found with the AUC measure, which is close to 1 (see Table 5-10). Interestingly,

the generalization error is the same for the classifier built over all the twelve dimensions as for

the classifier built over the key dimensions. Likewise, the AUC values for these two classifiers

are essentially the same. This can also be observed from the classifiers' ROC curves in Figure 5-

19. Given that there is no significant difference in the predictive performance of the classifiers

built over all dimensions and the classifier built over the key dimensions, we continue the

experiment using only the key dimensions (plots in Figure 5-20 are based on key dimensions














[Figure 5-20 plot area: six three-dimensional plots; legible axis labels include Density, Age, Uncertainty, Features, and Literature-Links.]


Figure 5-20. Three-dimensional view of the GenBank-EST test set along selected key
dimensions (scores are standardized). A) Plot along the Density, Age, and Features
dimensions. B) Plot along the Age, Density, and Literature-Links dimensions. C) Plot
along the Age, Density, and Publications dimensions. D) Plot along the Density,
Uncertainty, and Literature-Links dimensions. E) Plot along the Density, Uncertainty,
and Publications dimensions. F) Plot along the Features, Literature-Links, and
Publications dimensions.










alone). In Figure 5-20 parts B, D, E, F we observe that the high quality data is clearly separable

from the low quality data. In Figure 5-20 parts A and C we can see that there is a small overlap

between the high and low quality data, but still they are mostly separable. Hence, combinations

of key dimensions like the ones shown in Figure 5-20 are useful in separating the high quality

from the low quality data.

We now discuss the results for the RefSeq-EST data set. Table 5-11 shows the

generalization error and AUC measures of performance for the classifier built over all

dimensions and for the classifier built over the key dimensions only. These measures are for the

RefSeq-EST test data set. Figure 5-21A shows the ROC curve over the RefSeq-EST test set, for




Table 5-11. Classifier performance over the RefSeq-EST data set. The generalization error and
AUC measures are shown for the classifier built using all twelve quality dimensions,
and for the classifier built using only the six key quality dimensions.
Dimensions Generalization error (%) AUC
All 2.4 0.994
Key 0.4 0.999




[Figure 5-21 plot area: panels A and B; x-axis: False Positive Rate (0 to 1).]


Figure 5-21. ROC curves of two classifiers over the RefSeq-EST test set. HQ is set as the
positive class. A) ROC curve for the classifier built using all twelve quality
dimensions. B) ROC curve for the classifier built using the six key quality
dimensions.























[Figure 5-22 plot area: six three-dimensional plots; legible axis labels include Density, Publications, and Uncertainty.]


Figure 5-22. Three-dimensional view of the RefSeq-EST test set along selected key dimensions
(scores are standardized). A) Plot along the Age, Density, and Features dimensions.
B) Plot along the Age, Density, and Literature-Links dimensions. C) Plot along the
Density, Age, and Publications dimensions. D) Plot along the Density, Publications,
and Features dimensions. E) Plot along the Density, Uncertainty, and Age dimensions.
F) Plot along the Density, Uncertainty, and Publications dimensions.









the classifier built using all twelve quality dimensions. Figure 5-21B shows the ROC curve over

the RefSeq-EST test set, for the classifier built using only the six key quality dimensions. Figure

5-22 shows six three-dimensional views of the RefSeq-EST test set, along select combinations of

key dimensions.

From Table 5-11 we observe that the generalization error of the classifier built using all

twelve dimensions is 2.4% (corresponding to a prediction accuracy of 97.6%), whereas the

generalization error of the classifier built using the six key dimensions is 0.4% (corresponding to

a prediction accuracy of 99.6%). An interesting finding is that the classification error is higher

when all the twelve dimensions are used by the classifier than when only the key dimensions are

used. Nonetheless, the AUC measures for the two classifiers are essentially the same (see Table

5-11), which can also be inferred from the ROC curves in Figure 5-21. Since the performance of

the classifier built over all dimensions is at best comparable to the performance of the classifier

built over the key dimensions, the remaining parts of the experiment were carried out using only

the key dimensions. From the plots in Figure 5-22 we see that the high quality data is clearly

separable from the low quality data (i.e., there is no major overlap between the two sets, in any of

the plots). Hence, combinations of key dimensions like the ones shown in Figure 5-22 are useful

in separating the high quality from the low quality data.

5.3 Evaluation of the Quality Management Architecture

As outlined in Chapter 4, the Quality Management Architecture (QMA) enables the

integration of our Quality Estimation Model with existing biological repositories. The purpose of

this evaluation is to highlight the most relevant capabilities of the QMA, and to validate the

usefulness of our prototype QMA system based on its operational costs.









5.3.1 Relevant Capabilities of the QMA

5.3.1.1 Non-intrusive quality metadata augmentation

The QMA makes it possible to augment an existing biological data source with quality

metadata in a non-intrusive way, by acting as a middleware layer between the original biological

repository and the end users. One of the main advantages of the QMA is that it delivers its

quality-aware services with minimal impact on the existing repository. By minimal impact we

mean that no changes are made to the original data schema, format, or storage. The only changes

attributable to the QMA are an increased query load for periodically polling data updates, and an

enhanced quality-aware interface. The QMA makes this minimum-change feature possible by:

* Caching data from the existing repository into a local database, and enforcing coherency
between the cache and the external data source.

* Providing an XML wrapper for the existing repository's native data format.

* Storing quality metadata in a separate repository from the data repository.

* Providing maintenance operations that refresh and update the quality information in the
metadata source whenever changes to the data are detected.

We believe that the non-intrusive augmentation of quality metadata is a relevant feature of

the QMA because it enables a smooth transition from current to quality-aware data sources. If

the quality metadata augmentation were done more intrusively in exchange for better performance, it is

unlikely that current repositories would readily adopt such a model, especially if many changes to

the existing implementation were required.

5.3.1.2 Support for quality-aware queries

Although the User Interface (UI) is beyond the scope of the QMA, we still had to imagine

what the UI would generally look like and especially what new functionalities would be needed

in order to allow users to take full advantage of the quality information available. As a concrete

example, let us take the NCBI's web interface system: Global Query. This UI allows general









text-based queries, queries by accession number (record identifier), and a variety of options or

constraints for each of these queries. If we want to allow users of this system to have access to

the underlying quality metadata (assume it has been already integrated into the NCBI system),

two new functionalities are needed, namely i) to allow the user to choose whether to retrieve

quality information along with the data being queried, and ii) to allow the user to choose which

quality information to bring along with the data. A more refined functionality would additionally

allow the user to specify how the quality information should be used during query processing;

for example, the user could specify a filtering condition based on the value of one or more

quality scores (our quality metadata is in the form of scores along various dimensions), or

could ask to sort (rank) the results based on a particular quality dimension. The addition of

these new functionalities will convert the original interface into a quality-aware interface. All we

need then is support from the QMA to do the actual query processing.
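The filtering and ranking functionality described above can be illustrated with a small sketch. Everything in it is hypothetical: the record layout, the accession strings, the scores, and the function name are invented for the example, and it assumes that a higher score means higher quality along a dimension.

```python
def quality_aware_select(records, min_scores=None, sort_by=None, descending=True):
    """Filter records by minimum quality scores, then optionally rank by one dimension."""
    min_scores = min_scores or {}
    missing = float("-inf")  # a record lacking a dimension fails any filter on it
    kept = [
        r for r in records
        if all(r["quality"].get(dim, missing) >= low for dim, low in min_scores.items())
    ]
    if sort_by is not None:
        kept.sort(key=lambda r: r["quality"].get(sort_by, missing), reverse=descending)
    return kept


# Hypothetical query result with quality scores attached to each record.
RESULTS = [
    {"accession": "REC001", "quality": {"Density": 0.9, "Age": 0.2}},
    {"accession": "REC002", "quality": {"Density": 0.4, "Age": 0.8}},
    {"accession": "REC003", "quality": {"Density": 0.7, "Age": 0.5}},
]

selected = quality_aware_select(RESULTS, min_scores={"Density": 0.5}, sort_by="Age")
print([r["accession"] for r in selected])
```

A user interface would expose the two arguments as an optional filtering condition and an optional sort criterion on top of the usual text or accession query.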

Inside the QMA, query processing is done by the Query Processor component. In the

prototype QMA that we implemented, the Query Processor handles only queries based on

accession number, but it supports both queries that retrieve only data and queries that retrieve

data plus quality metadata. It also supports queries that retrieve a subset of the quality metadata

by using an optional filter that indicates what dimensions to retrieve. We believe that enabling

quality-aware queries is one of the most relevant capabilities of the QMA because it has a direct

impact on both the way users interact with the biological data source, and the ability to succeed

in obtaining the most valuable data for the task at hand.
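The behavior of the prototype Query Processor described above can be summarized in a minimal sketch. It is not the prototype's code: the class, method, and field names are assumptions, and two in-memory dictionaries stand in for the local data cache and the separate quality metadata repository.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class QueryProcessor:
    # accession number -> cached record (e.g., its XML representation)
    data_cache: Dict[str, str] = field(default_factory=dict)
    # accession number -> {dimension name: score}, stored apart from the data
    quality_store: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def query(self, accession: str, with_quality: bool = False,
              dimensions: Optional[Iterable[str]] = None) -> dict:
        """Retrieve a record by accession number, optionally with quality metadata."""
        if accession not in self.data_cache:
            raise KeyError(f"record {accession!r} is not loaded")
        result = {"accession": accession, "data": self.data_cache[accession]}
        if with_quality:
            scores = self.quality_store.get(accession, {})
            if dimensions is not None:
                wanted = set(dimensions)  # optional filter on which dimensions to return
                scores = {d: s for d, s in scores.items() if d in wanted}
            result["quality"] = dict(scores)
        return result


qp = QueryProcessor(
    data_cache={"REC001": "<record/>"},
    quality_store={"REC001": {"Density": 0.9, "Age": 0.2, "Uncertainty": 0.1}},
)
print(qp.query("REC001"))
print(qp.query("REC001", with_quality=True, dimensions=["Density"]))
```

Keeping the two stores separate mirrors the non-intrusive design of Section 5.3.1.1: a data-only query never touches the quality metadata.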

5.3.2 Operational Cost for the Prototype QMA System

The operational costs of our prototype QMA system are divided into metadata retrieval and

metadata computation costs. QM retrieval cost refers to the cost increase associated with processing










queries that involve quality metadata. QM computation cost refers to the cost increase associated

with loading and maintaining the quality metadata in the QMA system.

Platform. The time measurements shown in this part of the evaluation were obtained using

a 3.2GHz Pentium 4 machine with 2GB of physical memory and a single IDE disk, running

Microsoft Windows XP Professional. The database system used was Oracle 10g Release 2

Enterprise Edition with buffer pool size set to 100MB. The Java Runtime Environment used was

JRE 1.6.0.

5.3.2.1 Cost of metadata retrieval

We ran four different types of queries in our prototype QMA system: i) data-only queries,

ii) data and metadata queries using all 12 dimensions, iii) data and metadata queries using 6

dimensions, iv) data and metadata queries using 1 dimension. The dimension used for the latter

type of queries was Density, while the dimensions used for the 6-dimension queries were the

"key" dimensions found in Experiment 2 (see Section 5.2.2.3), namely Uncertainty, Density,

Age, Features, Literature-Links, and Publications. A total of 300 records were queried and the

measurements for data and metadata retrieval time were averaged across records. We queried

only records that were already loaded in the QMA system to avoid cache misses. Results of this

experiment are shown in Table 5-12 and Figure 5-23.

Table 5-12 compares the retrieval time for data and quality metadata, per-record. Time

measurements are performed for both record-level and node-level quality metadata, when using

1, 6, and 12 dimensions. Record-level quality metadata refers to the metadata associated with the

root node of the XML tree (document) representing a record. Node-level quality metadata refers

to the metadata associated with every node in the XML tree of a record. In Table 5-12 we observe

that the cost of metadata retrieval is negligible with respect to the cost of data retrieval, even

when using the node-level metadata. Particularly, the amount of time needed for retrieving the













Table 5-12. Per-record retrieval times for data and quality metadata. Time measurements (in
milliseconds) are performed for both record-level and node-level quality metadata,
when using different number of dimensions.
            QM 1 dimension        QM 6 dimensions       QM 12 dimensions
Data        Record   Node         Record   Node         Record   Node
2080        3.0      [missing]    4.3      17.9         4.4      22.4

[Figure 5-23 plot area: retrieval time (ms) for 1, 6, and 12 dimensions; legend: Record-level QM, Node-level QM.]

Figure 5-23. Retrieval time of quality metadata (per-record) for different number of dimensions.
Time measurements (in milliseconds) are performed for both record-level and node-
level quality metadata.



quality metadata is on average less than 1.1% of the amount of time needed for retrieving the

actual biological data. This shows that the increased cost of quality-aware queries (involving

metadata retrieval) is small.
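The relative overhead quoted above can be recomputed from the per-record times in Table 5-12. The sketch below is only an arithmetic check; the dictionary layout is ours. The node-level time for 1 dimension is omitted because it is not recoverable from this copy of the table.

```python
DATA_MS = 2080.0  # per-record data retrieval time (ms), Table 5-12

# Per-record quality metadata retrieval time (ms), keyed by (level, number of dimensions).
QM_MS = {
    ("record", 1): 3.0,
    ("record", 6): 4.3,
    ("node", 6): 17.9,
    ("record", 12): 4.4,
    ("node", 12): 22.4,
}

# Metadata retrieval time as a percentage of data retrieval time.
OVERHEAD_PCT = {key: 100.0 * ms / DATA_MS for key, ms in QM_MS.items()}

for (level, dims), pct in sorted(OVERHEAD_PCT.items(), key=lambda item: item[1]):
    print(f"{level}-level, {dims:>2} dimension(s): {pct:.2f}% of data retrieval time")
```

Even the most expensive case listed (node-level metadata with 12 dimensions) stays below 1.1% of the data retrieval time.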

Figure 5-23 compares the per-record metadata retrieval time of queries involving 1, 6, and

12 quality dimensions, for both record-level and node-level quality metadata. This is an

illustration of the data shown in Table 5-12 (data retrieval time is omitted). We can see in Figure

5-23 that the record-level QM retrieval time increases by 44% when going from 1 to 6









dimensions, while it only increases 14% when going from 6 to 12 dimensions. On the other

hand, the node-level QM retrieval time clearly increases with the number of dimensions with a

constant increase factor of 2.2 both when going from 1 to 6 dimensions and from 6 to 12

dimensions. Another observation is that the retrieval time of node-level metadata when using 1

dimension is 2.7 times larger than the retrieval time of record-level metadata; when using 6

dimensions it is 4.2 times larger, and when using 12 dimensions it is 5.1 times larger. In

summary, we found that the cost for record-level quality metadata retrieval is small compared to

the cost for node-level quality metadata retrieval, and that reducing the number of dimensions

has significant savings in time when node-level metadata is requested.
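The relative costs discussed above can be recomputed from the per-record times in Table 5-12. The following Python sketch does so for the 6- and 12-dimension measurements; the function and variable names are illustrative only and are not part of the QMA prototype.

```python
# Per-record retrieval times in milliseconds, as reported in Table 5-12.
DATA_MS = 2080.0
QM_MS = {
    6: {"record": 4.3, "node": 17.9},
    12: {"record": 4.4, "node": 22.4},
}


def overhead_percent(metadata_ms, data_ms=DATA_MS):
    """Metadata retrieval time as a percentage of data retrieval time."""
    return 100.0 * metadata_ms / data_ms


def node_to_record_ratio(dimensions):
    """How many times larger node-level retrieval time is than record-level."""
    times = QM_MS[dimensions]
    return times["node"] / times["record"]


if __name__ == "__main__":
    for dims, times in QM_MS.items():
        print(
            dims,
            round(overhead_percent(times["record"]), 2),
            round(overhead_percent(times["node"]), 2),
            round(node_to_record_ratio(dims), 1),
        )
```

Even the most expensive setting (node-level metadata with 12 dimensions) stays close to 1% of the data retrieval time.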

5.3.2.2 Cost of metadata computation

Cost of bulk-loading. We bulk-loaded a total of 300 records from the NCBI databases into

our prototype QMA system and measured the download time (i.e., data and link retrieval from

the NCBI system) as well as the processing and storage time (i.e., computation and storage of

quality metadata scores, as well as local caching of data). Bulk-loading in the context of our

QMA system involves both data and metadata. However, for comparative purposes we bulk-

loaded the system using only data (i.e., no metadata computation or storage was performed).

Results of this experiment are shown in Table 5-13.

Table 5-13. Per-record bulk-load times for data versus data and quality metadata. Time
measurements (in milliseconds) for the two main parts of the bulk-load process
(download, and processing plus storage) are shown.
                          Data    Data and Metadata
Download                  1844    12346
Processing and Storage     762     2752
Total                     2605    15097

Table 5-13 compares the per-record bulk-load time of data and metadata with the bulk-load
time of data only. Bulk-load time is divided into two parts: download time, and processing plus
storage time. Table 5-13 shows that when both data and metadata are bulk-loaded (versus
data only), the download time increases by a factor of 7, and the processing and storage time
increases by a factor of 4. The total bulk-load time increases by a factor of 6. Interestingly,
the download time is a significant part of the total bulk-load time: it accounts for 71% of the
data bulk-load time and for 82% of the data plus metadata bulk-load time. We believe the
download time could be greatly reduced if we were not limited by the "3-seconds rule"
imposed by the NCBI administrators. This rule restricts the frequency of queries to the NCBI
databases to no more than one every 3 seconds.
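To illustrate how a query-frequency limit of this kind constrains download time, the sketch below shows one way a client could space its requests. It is a minimal example that assumes the rule is enforced on the client side by waiting between requests; the class and function names are hypothetical, and this is not the code used in the QMA prototype.

```python
import time


class MinIntervalLimiter:
    """Blocks so that successive requests are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Sleep if the previous request was too recent, then record this one."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def lower_bound_download_seconds(num_queries, min_interval=3.0):
    """Minimum wall-clock time needed to issue `num_queries` queries under the rule."""
    return max(0, num_queries - 1) * min_interval
```

A caller would invoke `wait()` immediately before each remote query. Under this assumption, the time spent waiting grows linearly with the number of queries, regardless of how fast each individual response is.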

Although the cost of bulk-loading data and metadata (15 seconds per record) is large relative
to the cost of bulk-loading data only (2.6 seconds per record), it is a one-time cost that is
incurred only when the existing biological repository is first migrated to the quality-aware
system enabled by the QMA.

Cost of maintenance. A total of 57 records were used, for which updates (i.e., multiple

versions) were available. These records were first loaded into the prototype QMA system. Then,

maintenance cost was measured for subsequent versions in terms of download time (data and

link retrieval from NCBI) as well as processing and storage time (update and storage of quality

metadata and data). In the context of our QMA system, maintenance involves both data and

metadata; however, for comparative purposes we updated the system using only data (i.e., simply

deleting the previous version of the record and storing the newer one into the data cache).

Results of this experiment are shown in Table 5-14.

Table 5-14. Per-record maintenance times for data versus data and quality metadata. Time
measurements (in milliseconds) for two parts of the maintenance process (download,
and processing plus storage) are shown.
                          Data    Data and Metadata
Download                    15    10620
Processing and Storage    2544     4603
Total                     2560    15223

Table 5-14 compares the per-record maintenance times for data and metadata. Maintenance
time is divided into two parts: download time, and processing plus storage time. The results in
Table 5-14 show that when metadata maintenance is performed, the download time increases by
a factor of approximately 700 with respect to data-only maintenance. The processing and
storage time, on the other hand, increases only by a factor of approximately 2. The total
maintenance time increases by a factor of 6 for data and metadata versus data only. This cost
increase is similar to the one obtained for bulk-loading because the link-download step, which
is common to both the maintenance and bulk-load processes, has a large cost. When metadata
is involved, this step accounts on average for 70% of the maintenance time and for 68% of the
bulk-load time. During maintenance, the data-download time is negligible with respect to the
link-download time because data updates are downloaded from the NCBI ftp site in a
compressed format, which allows for a low per-record cost. We restate that the download time
is negatively affected by the practical limitation imposed by NCBI regarding query frequency
(see the description of the "3-seconds rule" given earlier).

It is important to mention that the results of this experiment are based on the atypical
scenario in which all records in the cache are updated on the same day. Usually, we expect
only a small fraction of the cached records to be updated on any single day, which would
reduce the overall maintenance cost. For GenBank and RefSeq, the percentage of records that
are updated daily is about 0.05% and 0.03%, respectively (see footnote 15). Assuming the same
update rate for records in the QMA, the amortized maintenance cost per record is 1.2 seconds,
which is 13 times smaller than the cost suggested in Table 5-14.
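The increase factors and download shares quoted for bulk-loading and maintenance can be checked directly against Tables 5-13 and 5-14. The Python sketch below recomputes them; the dictionary layout and function names are our own illustrative choices.

```python
# Per-record times in milliseconds: (data only, data and metadata).
BULK_LOAD = {
    "download": (1844, 12346),
    "processing_storage": (762, 2752),
    "total": (2605, 15097),
}
MAINTENANCE = {
    "download": (15, 10620),
    "processing_storage": (2544, 4603),
    "total": (2560, 15223),
}


def increase_factor(table, part):
    """Ratio of the data-and-metadata time to the data-only time for one part."""
    data_only, with_metadata = table[part]
    return with_metadata / data_only


def download_share(table, column):
    """Fraction of the total time spent downloading.

    column 0 selects data only; column 1 selects data and metadata.
    """
    return table["download"][column] / table["total"][column]


if __name__ == "__main__":
    for name, table in (("bulk-load", BULK_LOAD), ("maintenance", MAINTENANCE)):
        for part in table:
            print(name, part, round(increase_factor(table, part), 1))
```

The download share computed here for maintenance includes the data-download time, which the text reports as negligible next to the link-download time.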














































15 This estimate was obtained using five daily updates downloaded in the month of October, 2007.









CHAPTER 6
CONCLUSIONS AND FUTURE WORK

6.1 Conclusions

We developed a model for estimating the quality of data in biological databases. This

model defines various quality dimensions and their respective measures. Unlike previous works

in the area, our model is based on quality dimensions that can be quantitatively measured using

data already stored in data sources. Part of the novelty of our approach resides in the way our

quality estimates are systematically computed for biological data represented in a

semistructured model.

Along with the quality estimation model, we developed the quality management

architecture, which enables the integration of the quality model into an existing biological

database. The key principle of this architecture is to minimize the changes to the existing system,

which we believe is necessary for a smooth transition from current to quality-aware data sources.

The biological significance of the Quality Estimation Model was evaluated using expert

feedback, gathered through a research study that we conducted among 16 biologists. The

usefulness of the generated quality estimates to discriminate high versus low quality data was

experimentally tested using various data sets from the NCBI databases, for which quality

assessments were either directly obtained from experts or derived from general ratings of the

source databases by the experts. Results of this evaluation show that it is possible to build a

classifier, based on our quality estimates, that distinguishes high quality from low quality data

(labeled by experts) with a prediction accuracy of 69%. Results from the semi-automatically

labeled data sets show that the high quality and low quality data are clearly separable, which is

evident from the prediction accuracy of 0.6% and 0.4% obtained by the classifiers built over the

GenBank-EST and RefSeq-EST data sets, respectively.









The usefulness of the Quality Management Architecture was also evaluated with respect to

its functional capabilities and operational costs (for query, bulk-load, and maintenance

operations), using the implemented QMA prototype. A subset of the NCBI databases was used in

this case as the biological repository to be integrated with the QEM through the QMA. This

evaluation shows that two of the most relevant and beneficial capabilities of the QMA are the

non-intrusive quality augmentation, and the support for quality-aware queries. Evaluation results

of the operational costs of the QMA show that the increased cost of metadata retrieval (in

queries) is less than 1.1% of the cost for data retrieval. For bulk-loading and maintenance costs,

there is an increase factor of 6 for data and metadata versus data only.

6.2 Contributions

We believe our work represents an important contribution to the area of Data Quality in the

context of biological databases. First, we identified a set of quality dimensions that are objective,

measurable, and biologically-relevant. Second, for each quality dimension, we formulated a

quantitative measure that can be computed when data is represented in a semistructured model.

Third, we defined a core set of maintenance and query operations for managing the quality

metadata associated with the biological data, under a semistructured data model. Fourth, we

designed and implemented a quality management architecture that enables the integration of our

quality model (dimensions, measures, and operations) with existing biological repositories. Fifth,

we conducted a research study among sixteen domain experts (biologists) with the purpose of

collecting experts' assessments about the quality of genomic records, which were used during the

evaluation of our quality model. Sixth, we performed an experimental evaluation of our quality

model using the responses collected from our research study, as well as data from the NCBI

databases (a public genomics repository widely used by the scientific community). Results from

this evaluation show that our set of dimensions is useful in assessing the quality of genomic










data, and that our quality estimates can be used to build a classifier that discriminates high

quality data from low quality data. Additionally, we evaluated the usefulness of the prototype

quality management architecture that was implemented by showing that a smooth migration from

current to quality-aware data sources will be facilitated by the QMA, and that users of current

data sources will benefit from the quality-aware queries enabled by the QMA.

6.3 Future Work

We now outline some possible directions for future work. One area in which we foresee

future work is the development of benchmarks for biological data quality. In fact, the lack of

such benchmarks was one of the main challenges we faced during the evaluation phase of our

work. We overcame it by means of a research study conducted among expert biologists, but this

study was limited both in the number of participants (only 16 experts) and in the number of

quality assessments that were collected (only 24 per expert). We believe more studies like this

are needed so that more test data is available to future researchers.

A second future research direction is the exploration of new cache replacement strategies

that are optimized for the typical access pattern of biological data. Example queries or sequences

of queries (e.g., data workflow) would need to be collected from users of current repositories in

order to design query profiles that help design cache algorithms. We believe that the link

structure of biological records from sources like NCBI could be exploited to either pre-fetch or

evict data in the cache. As a pre-fetching strategy, data linked to already cached data can be

loaded into the cache before it is actually requested by a user. The hypothesis here is that users

normally expand their initial set of results by searching data linked from the initial set. As a

victim selection strategy, link information can be used to evict the record with the smallest number of

links (the hypothesis here being that highly-linked records are more likely to be accessed in the

future, so it is better to keep them in the cache as long as possible).
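The two link-based ideas above (pre-fetching linked records and evicting the least-linked record) can be sketched as follows. This is only an illustration under simplifying assumptions: the number of outgoing links stands in for "number of links", `fetch` is a caller-supplied function returning a record's data and the accessions it links to, and the capacity is at least 1. None of these names come from the QMA prototype.

```python
class LinkAwareCache:
    """Toy cache: evicts the record with the fewest links, pre-fetches linked records."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity      # assumed to be >= 1
        self.fetch = fetch            # fetch(accession) -> (data, linked_accessions)
        self.records = {}             # accession -> (data, linked_accessions)

    def _evict_one(self, protect):
        """Remove the unprotected record with the fewest links; report success."""
        candidates = [acc for acc in self.records if acc not in protect]
        if not candidates:
            return False
        victim = min(candidates, key=lambda acc: len(self.records[acc][1]))
        del self.records[victim]
        return True

    def _load(self, accession, protect=()):
        """Bring a record into the cache, evicting a victim first if the cache is full."""
        if accession in self.records:
            return
        if len(self.records) >= self.capacity and not self._evict_one(protect):
            return
        self.records[accession] = self.fetch(accession)

    def get(self, accession, prefetch=True):
        """Return the record's data, optionally pre-fetching the records it links to."""
        self._load(accession)
        data, links = self.records[accession]
        if prefetch:
            for linked in links:
                # The requested record itself is never evicted by its own pre-fetch.
                self._load(linked, protect={accession})
        return data
```

Whether such a policy actually improves hit rates would depend on the query profiles collected from users, which is precisely the open question raised above.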










Future research could also be directed towards the improvement of the current measures

and/or expansion of the current set of quality dimensions. More efficient metrics and operations

could be investigated, as well as the incorporation of new dimensions based on special

information offered by different biological repositories. Granularity levels at which quality

information is relevant could also be investigated (e.g., record-level may require different quality

dimensions and measures than database-level or sub-record level).

Finally, we envision future research in aspects related to the management of quality. In

particular, our quality management architecture could be improved by the development of a

storage-optimization strategy that exploits the hierarchical structure of the data (in a

semistructured representation) to inherit quality metadata. Savings in metadata storage and

maintenance could be significant if many child nodes have the same quality scores as their parent

node.
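The inheritance idea can be illustrated with a small tree in which a node stores quality scores only when they differ from those of its nearest scored ancestor. The sketch below is one possible realization, not a description of the QMA storage layer; the `Node` class, the `compact` function, and the example score values are all hypothetical.

```python
class Node:
    """A node of a semistructured (tree-shaped) record with optional quality scores."""

    def __init__(self, label, parent=None, scores=None):
        self.label = label
        self.parent = parent
        self.own_scores = scores      # None means "inherit from the parent"
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def scores(self):
        """Return the stored scores, or those of the nearest ancestor that has any."""
        node = self
        while node is not None:
            if node.own_scores is not None:
                return node.own_scores
            node = node.parent
        return None


def compact(node):
    """Drop scores that merely repeat the inherited ones; return how many were dropped."""
    removed = 0
    inherited = node.parent.scores() if node.parent is not None else None
    if node.own_scores is not None and node.own_scores == inherited:
        node.own_scores = None
        removed += 1
    for child in node.children:
        removed += compact(child)
    return removed
```

For example, a record node with scores `{"density": 0.8}` and a child carrying the same scores would keep only the parent's copy after `compact`, while a child with different scores would keep its own.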










APPENDIX A
SURVEY QUESTIONNAIRE

Online Survey Questionnaire presented to the participants of the research study

"Comparative Study of Automated and Human-based Quality Assessments over Genomic Data".






Please provide the information requested below.

1. Choose from the NCBI's nucleotide or protein databases 12 records whose
quality you consider to be "Good" (i.e., above average), and write the accession
number of each record in the space provided below (order is not important).
Please choose records from different projects and publications.


2. Choose from the NCBI's nucleotide or protein databases 12 records whose
quality you consider to be "Poor" (i.e., below average), and write the accession
number of each record in the space provided below (order is not important).
Please choose records from different projects and publications.

















Please answer each of the following questions to the best of your knowledge.

1. What criteria do you use when evaluating the quality of a genomic record? If
you include several criterions, please rank them in order of importance (1 being
the most important, 2 being the second most important, etc). Also, please
include a description of each quality factor you provide.

2. How would you rank the overall quality of the following NCBI data sources?
Use ranks from 1 (highest quality) to 6 (lowest quality), or N/A (if you have
never used the database). You may use the same rank for two or more
databases if you consider them to be equivalent in terms of their quality.
Ranking
dbEST
dbGSS
dbSTS
GenBank
HTC
RefSeq

3. When using genomics databases, do you usually work with a well defined set
of records on a regular basis or do you work with a different set of records
every time? Explain briefly.

4. Do you know of any genomics database that provides quality scores for its
data? If so, please mention it here.






















Evaluate the usefulness of the following factors in assessing the quality of genomic records. For each factor,
select a category and briefly justify your choice. Also, feel free to add any comments relevant to how these
factors may be enhanced or extended.

1. STABILITY

Description: A record is stable if its contents (both sequence and annotations)
remain mostly unchanged during various database updates. Conversely, a
record is unstable if either a significant fraction of its contents changed recently,
or if a series of recent updates affected smaller portions of its contents. An
example of a stable record is AF053747.

How useful do you think STABILITY would be in assessing the quality of genomic
records? Briefly justify your choice and add comments (if any) in the space
provided below.
O Very useful
O Somewhat useful
O Not useful

2. DENSITY

Description: A record is dense if it contains a large amount of information
(especially annotations). An example of a dense record is AM270298.

How useful do you think DENSITY would be in assessing the quality of genomic
records? Briefly justify your choice and add comments (if any) in the space
provided below.
O Very useful
O Somewhat useful
O Not useful

3. FRESHNESS

Description: A record is fresh if it has recently been updated, regardless of the
extent of the update (i.e., what fraction of the record's contents changed). An
example of a fresh record is NM_001033493.

How useful do you think FRESHNESS would be in assessing the quality of
genomic records? Briefly justify your choice and add comments (if any) in the
space provided below.
O Very useful
O Somewhat useful
O Not useful

4. LINKAGE

Description: A record that is linked to/from many other records, publications, or
resources has high linkage. An example of a record with high linkage is
NM_001083617.

How useful do you think LINKAGE would be in assessing the quality of genomic
records? Briefly justify your choice and add comments (if any) in the space
provided below.
O Very useful
O Somewhat useful
O Not useful

5. REDUNDANCY

Description: A record is redundant if other records in the database contain
similar information about the sequence represented by the record. An example
of a set of redundant records is DX598901, DX598902, DX598903, DX598904,
DX598905, and DX598906.

How useful do you think REDUNDANCY would be in assessing the quality of
genomic records? Briefly justify your choice and add comments (if any) in the
space provided below.
O Very useful
O Somewhat useful
O Not useful
















You have finished completing the survey questionnaire.
Now we just need to collect your name and mailing address so that we can send you the compensation
check.


1. Please enter your name below. **We will address the compensation check to
the name you provide here.**
Note: This information will be used for compensation purposes only.




2. Please enter your mailing address below. **We will mail your compensation
check to the address you provide here.**
Note: This information will be used for compensation purposes only.





















APPENDIX B
SURVEY RESPONSES

Table B-1. Answers to Question 1, Part I of Survey Questionnaire: Choose from the NCBI's
nucleotide or protein databases 12 records whose quality you consider to be "Good"
(i.e., above average), and write the accession number of each record in the space
provided below (order is not important). Please choose records from different
projects and publications.
Expert's identifier   Expert's answer
1                     1-AFO30859            7-AF305913
                      2-CID416297           8-170480
                      3-Q9SIM9              9-AJ000759
                      4-S52036              10-AF394914
                      5-S75487              11-11164038
                      6-X77231              12-Z93766
2                     1-EFO51116            7-AF311734
                      2-AJ812277            8-DQ144621
                      3-AFO24715            9-AF271234
                      4-AJ489258            10-118192966
                      5-AJA409201           11-AY530931
                      6-ACY594174           12-X15656
3                     1-AM700587            7-AM236040
                      2-AJ784829            8-174 364020
                      3-AM421461            9-AUB302211
                      4-AM707022            10-XM_001465100
                      5-AB302131            11-124 001028104
                      6-NM_006926           12-NM_153171
4                     1-X{h 969059.1        7-XM_001437591.1
                      2-XM_001468499.1      8-X{h 001444703.1
                      3-X{h 001468429.1     9-XM_001458802.1
                      4-XM_001468594.1      10-BC140538.1
                      5-124 001468299.1     11-btd 128106.3
                      6-XM_001466914.1      12-NM_181070.4
5                     1-AM270121            7-DN956224
                      2-BC139879            8-CK348291
                      3-C10234118           9-CID579131
                      4-BC140635            10-DN956222
                      5-AACKW01000063       11-AJ639612
                      6-ES228856            12-AW670836
6                     1-NP_032304           7-AAli83149
                      2-NP_033792           8-NP_000851
                      3-NM_008042           9-NM_008969
                      4-ABL01512            10-NP_001034220
                      5-NM_011198           11-NM_198052
                      6-NM_009660           12-NM_008517










Table B-1. Continued.
Expert's identifier   Expert's answer
7                     1-DQ332556            7-NW_001589540
                      2-EMQ200544           8-AM503065
                      3-EFO59394            9-AFO95794
                      4-NM_000059           10-AF309689
                      5-NM_009844           11-AADC01011023
                      6-DD322688            12-EFO62220
8                     1-191 173095          7-191 005514
                      2-NM_001042503        8-NM_006188
                      3-NM_066627           9-NM_153673
                      4-NM_065114           10-AY379549
                      5-NM_066249           11-NM_009324
                      6-AFO29303            12-NM_007305
9                     1-EF555725            7-AY3 873 13
                      2-EFO32775            8-EFl56505
                      3-EQ979194            9-AUB267447
                      4-ACY861780           10-ACY207060
                      5-ACY519200           11-AJ270473
                      6-DQ125864            12-EMQ979064
10                    1-AF104363            7-AF115532
                      2-AF104364            8-AF373692
                      3-AF104365            9-AF373693
                      4-AF104366            10-AF373694
                      5-AF115530            11-ACY273808
                      6-AF115531            12-EMQ288271
11                    1-NM_011891           7-NM_031504
                      2-NM_023324           8-EF529813
                      3-XM_747199           9-EF589044
                      4-NM_014294           10-NC_009511
                      5-EFl39429            11-NM_003616
                      6-NM_001002964        12-DiQ151655
12                    1-NT_033779           7-AA392812
                      2-NM_003508           8-AukFl3280
                      3-bf 033777           9-AAUC37178
                      4-NP_649897           10-AAB59378
                      5-NP_722812           11-NM_169733
                      6-AAC26105            12-NP_611323
13                    1-196972              7-AFl35043
                      2-U15194              8-U17165
                      3-193171              9-K01569
                      4-1080145             10-1040227
                      5-)13449              11-127841
                      6-107101445           12-D38129





Table B-1. Continued.
Expert's identifier   Expert's answer
14                    1-AY666164            7-NM_001086442
                      2-NM_008356           8-EF529813
                      3-/B279705            9-EF418587
                      4-DQ863513            10-EFO77626
                      5-NM_073736           11-NM_002231
                      6-AL110479            12-EF615587
15                    1-ACY562383           7-Z49966.
                      2-118167815.          8-NM_000879.
                      3-NC_009501.          9-AY390507.
                      4-EFO35538.           10-NM_204321.
                      5-10030402.           11-ACY329443.
                      6-EF418587            12-EFl85297.
16                    1-Pl7561              7-C/0330452
                      2-Q9UM22              8-CAA25855
                      3-AF353717            9-NP_001013128
                      4-Q25472              10-NM_214575
                      5-PO4734              11-191 214538
                      6-CAG33109            12-NM_204920










Table B-2. Answers to Question 2, Part I of Survey Questionnaire: Choose from the NCBI's
nucleotide or protein databases 12 records whose quality you consider to be "Poor"
(i.e., below average), and write the accession number of each record in the space
provided below (order is not important). Please choose records from different
projects and publications.
Expert's identifier   Expert's answer
1                     1-ACKO25477           7-ACY148428
                      2-A3(043232           8-AJ306826
                      3-ACY222857           9-X97853
                      4-Z73946              10-AF239989
                      5-AJ001681            11-DQ847149
                      6-AFO26267            12-AFl23508
2                     1-AM 69175            7-DQ845786
                      2-EF 429311           8-AM498281
                      3-DiQ132859           9-AM1f76568
                      4-AJ867487            10-DQ631892
                      5-AB014347            11-DQ358913
                      6-TYU65089            12-EF101929
3                     1-EFl56409            7-DQ118014
                      2-CS542023            8-AUB258994
                      3-AC203221            9-AB231801
                      4-EF486657            10-/JB264051
                      5-EF535228            11-EFl92191
                      6-U61190              12-AB290840
4                     1-BM864850.2          7-AJ235805.1
                      2-AUL436983.1         8-NM_004562.1
                      3-CZl69617.1          9-XM_001033076.2
                      4-NC_009356.1         10-NM_080562.4
                      5-AFO205 75 .1        11-CS543085.1
                      6-L12674.1            12-191 174874.3
5                     1-X66057              7-1000096
                      2-NC_001146           8-AE014073
                      3-NC_001142           9-U80836
                      4-NC_001136           10-Z75712
                      5-NM_009458           11-BA000007
                      6-NC_001147           12-C]R382129
6                     1-NM_007940           7-035936
                      2-lBAAO5120           8-AAX28625
                      3-AAJ325016           9-AAli24742
                      4-AAB24352            10-AAQ89268
                      5-AukB22287           11-NG_001397
                      6-AAA48742            12-NP_001009963










Table B-2. Continued.
Expert's identifier   Expert's answer
7                     1-AI975940            7-XM_001357913
                      2-T14408              8-DS185054
                      3-AMh698098           9-NC_007645
                      4-BFl36800            10-U66338
                      5-EFO58453            11-Al975600
                      6-BG932368            12-Auk999407
8                     1-M92315              7-EMQ438499
                      2-M96401              8-EMQ009322
                      3-L36968              9-/JB241668
                      4-X67761              10-DD377352
                      5-X70955              11-AZ935200
                      6-AQ989463            12-ABO74509
9                     1-ACY935163           7-707185509
                      2-EMQ924857           8-AF016523
                      3-ACY389859           9-707098488
                      4-107096175           10-EFO67828
                      5-EQ219795            11-EF555516
                      6-DQ520876            12-DQ170852
10                    1-AF455256            7-AF104371
                      2-AF455258            8-AY386254
                      3-AF455259            9-ACY386255
                      4-DQ272153            10-KY386257
                      5-L24953              11-AF213323
                      6-L24929              12-?????
11                    1-2LY813643           7-CP000713
                      2-NM_001044933        8-NC_009513
                      3-AM159109            9-EF589960
                      4-DP000195            10-ACY676118
                      5-191 001047841       11-AF310681
                      6-Ahil83808           12-NC_003070
12                    1-D0(404004           7-AAAQ75417
                      2-AL427955            8-XP001122518
                      3-CAA58508            9-NP_035274
                      4-2LY402753           10-ABL63586
                      5-XP001236998         11-CAJ19243
                      6-ABC66167            12-CV224730
13                    1-11162348            7-EMQ455749
                      2-lBAJ84184           8-lF408908
                      3-ALAM61760           9-118128863
                      4-X64147              10-ACY455665
                      5-199646              11-AF544975
                      6-Z49180              12-EF408901
14                    1-AMh270351           7-NT_166530
                      2-NC_004329           8-AY584752
                      3-NM_008337           9-NM_001033511 JCL 539918
                      4-XM_001031006        10-EF571582
                      5-DQ060000            11-NC_009512
                      6-XM_001440354        12-EF589953










Table B-2. Continued.
Expert's identifier   Expert's answer
15                    1-Z50116.             7-EF108313
                      2-AF286313.           8-EF000001.
                      3-EF202098            9-EFO27094
                      4-DQ388337.           10-DQ978772.
                      5-EFl20880.           11-EF210470.
                      6-ACY152759.          12-EMQ084886.
16                    1-EMQ033715           7-AukZ43086
                      2-XP_418830           8-CIC817283
                      3-AY278559            9-XP_549467
                      4-107027861           10-(TV863989
                      5-)P 786460           11-CR559896
                      6-CAF88218            12-AAli00686










Table B-3. Answers to Question 1, Part II of Survey Questionnaire: What criteria do you use
when evaluating the quality of a genomic record? If you include several criterions,
please rank them in order of importance (1 being the most important, 2 being the
second most important, etc). Also, please include a description of each quality factor
you provide.
Expert's Expert's answer
identifier
1 1. Total number of information: tissue where it was isolated, information included in
the project title.
2. How clear is the name of the gene. May I straightforward associate a record to
another gene?
3. Update of the publications
4. Presence and information about the UTR, often polyA site or signal is not
specifically indicated.
2 I work with viruses whose genomic sequence is around three thousand base pairs so it
is easy to deal with the genome. I have a general look and define according to the
organization of the supplied sequence and how it fits with literature I know off. For
example some people define the primers they use and then you find a partial sequence
that is outside of these primers as part of the sequence you are dealing with so you
know it is a vector and not to be trusted. Other sequences define regions not
according to literature and create there own annotation of the genes. Thus I get away
from these peoples sequences.
3 1. Detail description of gene or proteins when possible.
2. Detail description of the organism or sample (taxonomy, strain, collection site and
date, country, etc).
3. Includes information about publication or several publication that cited the
sequence.
4. Details of the programs that were used to predict gene structure.
4 Nucleotide quality (N, W, etc.), partial vs. whole, annotations, reading frame.
5 I was concerned with the revisions. The good records contained 1-2 revisions vs. the
bad records contained 20+ revisions.
6 1. The amount of information available of that protein or gene sequence. Says a lot
about how reliable the information is and how useful it has been to others.
2. Publication of the whole sequences. Usually incomplete sequences tells us about
feasibility of working with that specific protein and/or suggests multiple variants of
the same enzyme, suggesting that not everything about that protein is known.
7 Quality Length intron/exon genomic or mRNA.









Table B-3. Continued.
Expert's Expert's answer
identifier
8 1. If there are large numbers of n's/unknown nucleotides in the data (that counts as
poor quality) especially if there are sequential stretches of n's or n's at the very end of
a sequence, which could easily be cut off when editing, so it indicates perhaps the
sequence was not well edited.
2. The amount of information provided about the sequence:
2a. Phylogenetic classification (where expected e.g. if there are nearly exact
genbank species matches yet the sequence is simply classified as "uncultured
bacterium," with no attempt at any level of taxonomic classification, then it is poor
quality. This is compounded if it is part of a series of hundreds of similar clones that
can crowd out the sequence information that has relevant data on it)
2b. Functional information about the gene
1. What it effects/reacts with
2. Related homologs/orthologs
3. Areas of expression in the organism
4. Phenotypic effects
3. Number of publications/if it has been published. The quality of the publication
journal can add (ex. Nature), and if it was published by different authors (not
overlapping authors in all the groups) that improves the quality as well.
4. Whether it has been recently updated/updated at all, which would indicates the
authors are putting in effort to make the information in the entry accurate (this
increases the quality)(For example one of the sequences I listed as being poor due to
it having outdated information that placed it in an old taxonomic category which
conflicted with 98%+ matching sequences listed as the new taxonomic category,
making the sequence identification confusing and difficult to cite in a paper). This
somewhat overlaps with 2a, but applies more to the area where it mentions if the
sequence has been updated.
9 Source of sequence-location or experiment type.
10 Completeness- are there ambiguity characters and is sequence complete in length.
11 Having a complete sequence with minimal "N"'s is the most important thing I look
for when evaluating quality.
12 1. Inter-species comparison/conservation
2. Consistency with online database/algorithmic predictions
3. Author concurrence
4. Literature confirmation of experimental data and expansion of/ characterization
based on previously published data
5. Personal/local experimental data. thoroughness and depth of data
6. Inter-collegial support.
13 1. Information about the gene structure: if the specifically structures of each
framework are clearly showed up that I can choose the exactly sequences I need.
2. For human gene, if their chromosome site is clearly written, that will be nice.
14 1. Easy access to the nucleotide sequence.
2. An explanation of the function and or relationship to other genes or proteins of the
gene or protein in question.









Table B-3. Continued.
Expert's Expert's answer
identifier
15 Generally, the more informative the record the better so that it is clear what the
submission is and what organism and where it came from. The good records were
published, included detailed information about the gene or region submitted and
exactly what they are, included information about where the organism was from
including host species (for parasitic organisms), included translation information
when appropriate.
16 1. Sequence containing undetermined nucleotides (Ns or Xs)
2. Amino acid sequences with frame shifts
3. Chimera sequences (including stretches of amino acids that are not part of the
protein)
4. Sequences without a properly assigned name (unnamed protein product), even
when their orthologs are known.





























































Table B-4. Answers to Question 2, Part II of Survey Questionnaire: How would you rank the
overall quality of the following NCBI data sources? Use ranks from 1 (highest
quality) to 6 (lowest quality), or N/A (if you have never used the database). You may
use the same rank for two or more databases if you consider them to be equivalent in
terms of their quality.
Expert's dbEST dbGSS dbSTS GenBank HTC RefSeq
identifier
1 2 3 3 1 3 2
2 2 2 3 1
3 5 5 3 6

16 The N/A choice is represented as a blank cell in this table.









Table B-5. Answers to Question 3, Part II of Survey Questionnaire: When using genomics
databases, do you usually work with a well defined set of records on a regular basis
or do you work with a different set of records every time? Explain briefly.
Expert's Expert's answer
identifier
1 I generally switch records because I work on several projects.
2 Well defined set of records.
3 I worked with a defined group of organisms and one specific gene but not a specific
group of records. I worked with environmental samples, all of them are from the
same kingdom and same gene. I compared my data with many different papers or
records in GenBank.
4 Different.
5 I work with a different set every time. I do this because it I want to set up in silico
experiments with random samples like I do with microarray samples.
6 I usually work with a defined set of records related to my research.
7 Typically I work with a defined set of records based on the project. However, I do
work with different sets of records for each project.
8 I usually work with a well-defined set of records, such as cyanobacteria/bacteria and
c. elegans, with occasional branches out to look for homologs of interesting genes,
usually in more complex organisms (mice, humans). Basically the set will focus on
the organism/taxa of interest, and only change if I change major projects (every few
years).
9 I'm probing diversity of my sequences so generally different records.
10 Different databases and genes.
11 I work with an organism which doesn't have a fully sequenced genome so I usually
work without a well defined set of records.
12 Try to compare and contrast multiple databases/literatures for consistency.
13 I usually work with a well defined set of records because I check the sequences of
other species of my gene from the published paper. The work from the well-known
paper are always believable and useful.
14 Usually well defined set of records.
15 For most work, I have multiple sets of defined records that I work with on a daily
basis. These are sequences that I use as reference sequences.
16 I work with a well defined set of records. But a new sequence for my favorite protein
is available from different organism; I must analyze it to determine its quality and if it
is truly a new member of the protein family.










Table B-6. Answers to Question 4, Part II of Survey Questionnaire: Do you know of any
genomics database that provides quality scores for its data? If so, please mention it
here.
Expert's identifier  Expert's answer
1 SGN provides quality scores for the markers.
2 No.
3 No.
4 No.
5 Most of the time I deal with p-values.
6 No.
7 I don't think any of the databases I use on a regular basis provide quality scores. If
they do I'm not sure I understand them.
8 None that I know of.
9 Greengenes collects a subset of "good" 16s environmental sequences from Genbank
and RDP.
10 No.
11 I am unaware of any databases which do provide quality scores.
12 Some databases, e.g., yeast databases provide p-values for comparison accuracy.
13 NCBI is the mainly database I work with, I believe in it most of time.
14 No I don't.
15 No.
16 No, I not aware of that kind of database.















Table B-7. Answers to Question 1, Part III of Survey Questionnaire: How useful do you think
STABILITY would be in assessing the quality of genomic records? 17 Briefly justify
your choice and add comments (if any) in the space provided below.
Expert's identifier  Expert's rating  Expert's comments
1 S A database that is quickly updated is a lot better than one in which records
are fossilised.
2 The sequence is much more informative and should be stable but the
annotation may change as more knowledge becomes available and it is
important to have a standard criteria for the annotation with the availability
of more sequences and knowledge. Thus in order to keep a sequence
informative the gathering of knowledge and references in one place is very
important.
3 It is impossible to have only stable records because people obtain new
information about their data (which is good) but a record with changes in
primary information such as organism, collection sites, strains is not
useful.
4 Genome sequencing tends to change the database and retire genes; it
would be nice if all sequences were stable, but lets face it some areas are
just hard to sequence thru.
5 V As long as a curator does not take away any old information from records I
consider them to be quite stable.
6 V It suggests the information is reliable and that it has been corroborated by
different investigators.
7 V It is important to know when information has changed as it might affect
the results of any project you might be using the record for.
8 N Although I think it is important to know whether or not a record has been
updated/changed, if it has changed, that would be a reflection on the
quality of the old record, not necessarily the new one. It would be good to
have for evaluating older papers/references that might have used the old
data, but would not necessarily be of use to the new sequence. Also, how
stable it was would to some degree depend more on how much new
research was being done on that sequence/organism rather than the relative
quality of the sequence itself (e.g. a sequence could be very low quality,
but if no one else researches the organism for 10 years, it may be very
stable).
9 V Good records are complete and don't need continual updating.
10 V Stability would be very useful, if things change often, especially sequence,
then the results of projects could change.
11 V Working with unstable accession ID's can be confusing and time
consuming.


17 The usefulness was assessed in a 3-point scale: very useful (V), somewhat useful (S), and not useful (N).










Table B-7. Continued.
12 S The fact that the data has not been contested/redefined in several years
suggests that it is accurate; however, if no significant progress or analysis
has been carried out for great time periods, i.e. within 2-3 years or more, it
is indicative of lack of analysis rather than stability of data.
13 S It depends. As long as the information is right at that time, updating is ok.
14 V If a record stays unchanged it shows the quality of the sequencing.
15 N If a record does not change it often means that there is no work being done
and thus no errors ARE found. In this case, the "stable" record could be
worse.
16 S The usefulness of stability for sequence records will depend on the specific
records studied. I'm glad that several records that I knew had some
problems were corrected in later updates of GenBank. But others have
stayed unchanged for almost 7 years, even though I have evidence that
they are incorrect.










Table B-8. Answers to Question 2, Part III of Survey Questionnaire: How useful do you think
DENSITY would be in assessing the quality of genomic records? 18 Briefly justify
your choice and add comments (if any) in the space provided below.
Expert's identifier  Expert's rating  Expert's comments
1 S That record is too chaotic. It may be good for a first screen but the specific
gene sections are more important.
2 V The more the sequence knowledge is available for the user the more it is to
look at and make use of it and this provides like a comfortable or relaxing
feeling that you do not have to look into the details of the sequence to
determine what you need especially in the case of chromosome sequences.
3 V Detail information (lots of annotations) is specially important in cases
when the paper has not been published and it is very useful for
comparisons with other datasets (e.g. studies of environmental samples).
4 N Though it would be nice to have and extra link to get to 'all' the
information, extremely dense records seem unnecessary.
5 V I would consider this to be very useful in assessing the quality of genomic
records. I believe that any type of ancillary data, especially extra literature,
to be very useful as opposed to very little information.
6 V It suggests the sequence is well known and characterized, which makes
experiment design easier. It gives you a clearer idea as to whether the
sequence you are dealing with is unique or could have variants.
7 N I think these dense records would be better if you could link to the
information from one page, rather than scrolling for miles to find the
information you need.
8 V The more publications and information on the gene there is, almost
certainly the more useful and higher quality the sequence, since it has been
presumably checked over multiple times by different researchers and
reviewed by various journals. It is also of greater use to a researcher
attempting to obtain information about the gene/its location/its homologs,
etc. It might be nice if there could be some redundancy of author check as
well, since if there were 5 publications all with one author in it, I would
consider that lower quality than 5 different authors publishing the same
sequence.
9 S Some dense records just have useless excess info.
10 V This would be nice because it shows papers and projects that have used the
sequence.
11 S More information can be valuable or overwhelming.
12 S Multiple data sets can be helpful in identifying multiple
isoforms/transcripts of individual genes. Conversely, they may indicate
potential for more confined, specific sequence requirement for true
expression, or for tissue specific expression, etc. Or the data could
represent conflicting reports/views/data sets from competing laboratories.


18 The usefulness was assessed in a 3-point scale: very useful (V), somewhat useful (S), and not useful (N).









Table B-8. Continued.
Expert's identifier  Expert's rating  Expert's comments
13 S Information is surely useful, but sometimes too much information in one
record will make user lost.
14 S Density can be good to obtain information on the record and how it was
obtained, but it can get cumbersome to understand.
15 V Dense usually means that a lot of information is included, and much of it is
likely useful.
16 V Density is very useful because most bibliographic and sequence
information is linked to the record. In addition, you can know in a glance
the function, and special features of the gene/protein.









Table B-9. Answers to Question 3, Part III of Survey Questionnaire: How useful do you think
FRESHNESS would be in assessing the quality of genomic records? 19 Briefly justify
your choice and add comments (if any) in the space provided below.
Expert's identifier  Expert's rating  Expert's comments
1 V It is necessary when there are similar sequences: are they sequencing
errors? SNPs or different genes or whatelse? The updates as presented in
the acc number presented are perfect.
2 N It is not useful because the updating of all sequences in the gene bank is
not practical so it depends on the field you are working in and how much
this update is fresh. But in general this is not the bases.
3 N It is good to have the last version of a record but it is not essential to
evaluate the quality of the record.
4 N Seems unnecessary.
5 V I use freshness as additions to old information mostly for updates.
6 V Things always change and people make mistakes. Updates are always
good and very informative.
7 V It is important to have all the information available.
8 S I think it would be an important factor to know if a sequence had been
updated recently, because, as stated before, it would indicate that effort
was being put into to keeping the record up-to-date and accurate. However,
this measure could put initially very high-quality records at a disadvantage.
9 S This type of updating isn't common in the records that I generally use so
I'm not sure how useful this would be.
10 V Knowing when things have been updated is critical.
11 S Intuitively it seems that a record with more freshness would help in
determining something's quality.
12 S Previously characterized data has, on occasion, been shown to be
misinterpreted, in which case maintaining a fresh database can be
extremely valuable to researchers who might be struggling for answers to
mysteriously unanswerable questions. However, data that was posted in
the past, say five years ago or more, could be either because data analysis
has experimentally stood the test of time, or, by contrast, it could indicate
that the molecule has remained unstudied since initial publication, that no
experiments have been conducted to prove, disprove, or further
characterize the molecule.
13 V The information is always updated, so to show the update information is
very important to user.
14 V If a record has changed it should be stated.
15 S I think this could be useful because it indicates that work is ongoing and
theoretically the information is increasing or getting better.
16 V Freshness is very useful because genes not previously linked protein
domains or recent literature can be identified.


19 The usefulness was assessed in a 3-point scale: very useful (V), somewhat useful (S), and not useful (N).









Table B-10. Answers to Question 4, Part III of Survey Questionnaire: How useful do you think
LINKAGE would be in assessing the quality of genomic records? 20 Briefly justify
your choice and add comments (if any) in the space provided below.
Expert's identifier  Expert's rating  Expert's comments
1 V It helps a lot. I think they are complementary.
2 S What it is linked this.
3 V This is very useful because you can obtain a lot of information about that
record.
4 N I would like to see a separate link to access that stuff.
5 V I always teach my students to include everything. New information should
always be used to qualify them in papers, posters, and reports.
6 V Same as question 2. [It suggests the sequence is well known and
characterized, which makes experiment design easier. It gives you a
clearer idea as to whether the sequence you are dealing with is unique or
could have variants].
7 V This demonstrates how many people have done work on this sequence,
which in turn relays the importance and accuracy of the sequences.
8 V I think a record with high linkage is almost always of greater quality and
usefulness than one of low linkage, again because it has been evaluated
multiple times and presumably has more information attached to it to assist
a researcher with learning about it.
9 V Well linked records indicate through prep of the sequences/metadata.
10 V This would be nice, but have not used it.
11 V Having quick links readily accessible enhance quality assessment.
12 V The more publications about a gene/protein, the more research has been
conducted, and the more is known about its characteristics, expression
patterns, and functions. More extensive knowledge could be of tremendous
benefit to medicine, even if more recent data negate initial conclusions. It
may also be useful in that a gene/protein that receives more attention may
be indicative of necessity to obtain detailed answers swiftly and precisely.
13 V This is the most part I like. I really would like to check the publications
and resources conveniently. That helps my work a lot.
14 V Very useful to know what other publications are linked to one record.
15 S This may be useful because it indicates the authors have examined many
similar items rather than just one.
16 V Linkage is of fundamental importance, because most of what is know
about a record can be assessed in a glance.


20 The usefulness was assessed in a 3-point scale: very useful (V), somewhat useful (S), and not useful (N).





Table B-11. Answers to Question 5, Part III of Survey Questionnaire: How useful do you think
REDUNDANCY would be in assessing the quality of genomic records? 21 Briefly
justify your choice and add comments (if any) in the space provided below.
Expert's identifier  Expert's rating  Expert's comments
1 N If they are exactly the same they are useless and troublesome.
2 S The sequence if there and is redundant this means that the sequencing
process is to be trusted. However this is extremely important in some cases
for the study of genetic diversity and population genetics who look at point
mutations but from a practical point of view for genes and working with
them it is somehow important to be sure from several sequences especially
in case of pathogens or important conserved domains that they are stable
and the sequence is repetitive.
3 If it is exactly the same sequence, redundancy makes searches really
complex. It would be better to have links instead getting pages and pages
of the same hits.
4 I would like to see it all combined, and I would appreciate it if all the
sequences that link together for one gene were submitted in one file.
5 N N/A
6 V It helps corroborates the published data. Although it could be misleading.
7 N I think redundant records should be combined until information is
provided that the record can stand alone.
8 S As with any scientific data, the greater the ability to replicate, the higher
the quality of the data. However, I again would be concerned if the set of
records was all from one author or not; I've found many sets of hundreds
of similar sequences all from one project which I would not necessarily
rank as highly in adding to the quality of a sequence (though it would still
be better than having no replicates).
9 N Not useful.
10 S Don't know.
11 N Redundancy seems redundant.
12 V Redundancy could potentially indicate gene duplications that resulted in
multi-tissue expression, with either redundant, novel, or pleiotropic
functions.
13 It depends. Meaningless repeat is just a waste and sometimes just disturb
user's eyes. But sometimes similar records exist and maybe bring user
information.
14 Usually good to know how consistent a sequence is.
15 This may be useful to allow you to compare the redundant records, but I do
not think it indicates much about quality itself.


21 The usefulness was assessed in a 3-point scale: very useful (V), somewhat useful (S), and not useful (N).










Table B-11. Continued.
Expert's identifier  Expert's rating  Expert's comments
16 S Redundancy is useful if it comes from cDNA sequencing from several
tissues, because you can determine if there are tissue specific variants, or if
they represent several readings of a gene, and you can construct a contig
from them. Redundancy is not useful if the tissue used was the same.









APPENDIX C
ORIGINAL INSDSEQ XML SCHEMA

We append here the original INSDSeq XML Schema (Version 1.4, 19 September 2005), automatically generated by the DATATOOL utility (version 1.8.1, 18 January 2007).





::DATATOOL:: Generated from "insdseq.asn"
::DATATOOL:: by application DATATOOL version 1.8.1
::DATATOOL:: on 01/18/2007 23:07:18


xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.ncbi.nlm.nih.gov"
targetNamespace="http://www.ncbi.nlm.nih.gov"
elementFormDefault="qualified"
attributeFormDefault="unqualified">


























INSDReference_position contains a string value indicating the
basepair span(s) to which a reference applies. The allowable
formats are:

X..Y : Where X and Y are integers separated by two periods,
X >= 1, Y <= sequence length, and X <= Y

Multiple basepair spans can exist, separated by a
semi-colon and a space. For example : 10..20; 100..500

sites : The string literal 'sites', indicating that a reference
provides sequence annotation information, but the specific
basepair spans are either not captured, or were too numerous
to record.

The 'sites' literal string is singly occurring, and
cannot be used in conjunction with any X..Y basepair spans.

References that lack an INSDReference_position element apply
to the entire sequence.
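
The format rules above can be illustrated with a minimal sketch of a checker. It is not part of the INSDSeq schema or of BIODQ; the function name `parse_reference_position` and its `seq_length` parameter are hypothetical, and the sketch assumes that spans are separated by exactly a semi-colon and one space, as in the example `10..20; 100..500`.

```python
import re
from typing import List, Tuple, Union

# One basepair span: two integers separated by two periods.
_SPAN = re.compile(r"(\d+)\.\.(\d+)")


def parse_reference_position(
    value: str, seq_length: int
) -> Union[str, List[Tuple[int, int]]]:
    """Return 'sites' or a list of (X, Y) spans; raise ValueError otherwise."""
    # The 'sites' literal must occur alone, never mixed with X..Y spans.
    if value == "sites":
        return "sites"
    spans = []
    for part in value.split("; "):
        match = _SPAN.fullmatch(part)
        if match is None:
            raise ValueError(f"not an X..Y span: {part!r}")
        x, y = int(match.group(1)), int(match.group(2))
        # Enforce X >= 1, Y <= sequence length, and X <= Y.
        if not (1 <= x <= y <= seq_length):
            raise ValueError(f"span out of range: {part!r}")
        spans.append((x, y))
    return spans
```

A value such as `sites; 1..2` is rejected by this sketch because `sites` is not itself an X..Y span, which matches the rule that the literal cannot be combined with basepair spans.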










































INSDXref provides a method for referring to records in
other databases. INSDXref_dbname is a string value that
provides the name of the database, and INSDXref_id
is a string value that provides the record's identifier
in that database.
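
As a small illustration of this structure, the following sketch builds one cross-reference with Python's standard `xml.etree.ElementTree` module. The helper name `make_xref` and the sample values are hypothetical; the child element names INSDXref_dbname and INSDXref_id are taken to be those of the INSDSeq schema, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def make_xref(dbname: str, record_id: str) -> ET.Element:
    """Build an INSDXref element holding a database name and a record identifier."""
    xref = ET.Element("INSDXref")
    ET.SubElement(xref, "INSDXref_dbname").text = dbname
    ET.SubElement(xref, "INSDXref_id").text = record_id
    return xref


# Illustrative values only; they are not taken from a record in this study.
example = make_xref("taxon", "9606")
serialized = ET.tostring(example, encoding="unicode")
```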











INSDFeature_operator contains a string value describing
the relationship among a set of INSDInterval within
INSDFeature_intervals. The allowable formats are:

join : The string literal 'join' indicates that the
INSDInterval intervals are biologically joined
together into a contiguous molecule.

order : The string literal 'order' indicates that the
INSDInterval intervals are in the presented
order, but they are not necessarily contiguous.

Either 'join' or 'order' is required if INSDFeature_intervals
is comprised of more than one INSDInterval.
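
The operator rule can be sketched as a simple check. This is an illustration under our own assumptions, not code from the schema or from BIODQ: the function name `check_feature_operator` is hypothetical, intervals are represented as plain (from, to) pairs, and an absent operator is represented by `None`.

```python
from typing import Optional, Sequence, Tuple

ALLOWED_OPERATORS = ("join", "order")


def check_feature_operator(
    intervals: Sequence[Tuple[int, int]], operator: Optional[str]
) -> bool:
    """Return True when the operator is consistent with the interval count."""
    # Any operator that is present must be one of the two string literals.
    if operator is not None and operator not in ALLOWED_OPERATORS:
        return False
    # An operator is required once there is more than one interval.
    if len(intervals) > 1 and operator is None:
        return False
    return True
```

A feature with a single interval passes without an operator, while a feature with two intervals and no operator is flagged.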





































































INSDInterval_iscomp is a boolean indicating whether
an INSDInterval_from / INSDInterval_to location
represents a location on the complement strand.
When INSDInterval_iscomp is TRUE, it essentially
confirms that a 'from' value which is greater than
a 'to' value is intentional, because the location
is on the opposite strand of the presented sequence.
INSDInterval_interbp is a boolean indicating whether
a feature (such as a restriction site) is located
between two adjacent basepairs. When INSDInterval_interbp
is TRUE, the 'from' and 'to' values must differ by
exactly one base.
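
The two flags can be illustrated with a minimal consistency check. The sketch rests on two assumptions of ours: that the one-base constraint in the last sentence belongs to the interbp flag, and that a 'from' value greater than the 'to' value without iscomp deserves a warning (the comment only says that iscomp confirms such a location is intentional). The names `Interval` and `interval_problems` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    start: int              # INSDInterval_from
    end: int                # INSDInterval_to
    iscomp: bool = False    # location is on the complement strand
    interbp: bool = False   # feature lies between two adjacent basepairs


def interval_problems(iv: Interval) -> List[str]:
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    # Assumption: from > to is only expected on the complement strand.
    if iv.start > iv.end and not iv.iscomp:
        problems.append("from > to without iscomp")
    # Assumption: the one-base rule applies when interbp is set.
    if iv.interbp and abs(iv.start - iv.end) != 1:
        problems.append("interbp requires from and to to differ by exactly one")
    return problems
```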









































































APPENDIX D
INSDSEQ_QM XML SCHEMA

We append here the XML Schema used by the Local Cache in the Quality Management Architecture, named INSDSeq_QM.xsd. This is a modified version of the INSDSeq schema shown in Appendix C. We have removed all the lengthy comments that appeared in the original version of Appendix C for the sake of brevity.





::DATATOOL:: Generated from "insdseq.asn"
::DATATOOL:: by application DATATOOL version 1.8.1
::DATATOOL:: on 01/18/2007 23:07:18



Modified on 06/18/2007 by Alexandra Martinez
Changes were made to adapt this schema for
the BIODQ Project's Quality Metadata Engine


xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xdb="http://xmlns.oracle.com/xdb"
xmlns="http://biodq.cise.ufl.edu"
targetNamespace="http://biodq.cise.ufl.edu"
elementFormDefault="qualified"
attributeFormDefault="unqualified">


$Revision: 1.6 $


ASN.1 and XML for the components of a GenBank/EMBL/DDBJ sequence record
The International Nucleotide Sequence Database (INSD) collaboration
Version 1.4, 19 September 2005






COMPLEX TYPES ADDED for BIODQ Project








































<xs:element name="INSDSeq_locus" type="stringWithId"/>
<xs:element name="INSDSeq_length" type="integerWithId"/>
<xs:element name="INSDSeq_strandedness" type="stringWithId" minOccurs="0"/>
<xs:element name="INSDSeq_moltype" type="stringWithId"/>
<xs:element name="INSDSeq_topology" type="stringWithId" minOccurs="0"/>
<xs:element name="INSDSeq_division" type="stringWithId"/>
<xs:element name="INSDSeq_update-date" type="stringWithId"/>
<xs:element name="INSDSeq_create-date" type="stringWithId" minOccurs="0"/>
<xs:element name="INSDSeq_update-release" type="stringWithId" minOccurs="0"/>
<xs:element name="INSDSeq_create-release" type="stringWithId" minOccurs="0"/>
<xs:element name="INSDSeq_definition" type="stringWithId"/>
<xs:element name="INSDSeq_primary-accession" type="stringWithId" minOccurs="0"/>
<xs:element name="INSDSeq_entry-version" type="stringWithId" minOccurs="0"/>
<xs:element name="INSDSeq_accession-version" type="stringWithId" minOccurs="0"/>
<xs:element name="INSDSeq_other-seqids" minOccurs="0">



































































































































































































































































sequence data at the flatfile-format level of detail. Further
documentation regarding the content and conventions of those formats
can be found at:

URLs for the DDBJ, EMBL, and GenBank Feature Table Document:
http://www.ddbj.nig.ac.jp/FT/full_index.html
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html

URLs for DDBJ, EMBL, and GenBank Release Notes:
ftp://ftp.ddbj.nig.ac.jp/database/ddbj/ddbjrel.txt
http://www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html
ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt

Because INSDSeq is a compromise, a number of pragmatic decisions have
been made:

In pursuit of simplicity and familiarity a number of fields do not
have full substructure defined here where there is already a
standard flatfile format string. For example:

Dates: DD-MON-YYYY (eg 10-JUN-2003)

Author: LastName, Initials (eg Smith, J.N.)
or Lastname Initials (eg Smith J.N.)

Journal: JournalName Volume (issue), page-range (year)
or JournalName Volume(issue):page-range(year)
eg Appl. Environ. Microbiol. 61 (4), 1646-1648 (1995)
Appl. Environ. Microbiol. 61(4):1646-1648(1995).

FeatureLocations are represented as in the flatfile feature table,
but FeatureIntervals may also be provided as a convenience

FeatureQualifiers are represented as in the flatfile feature table.

Primary has a string that represents a table to construct
a third party (TPA) sequence.

other-seqids can have strings with the "vertical bar format" sequence
identifiers used in BLAST for example, when they are non-INSD types.

Currently in flatfile format you only see Accession numbers, but there
are others, like patents, submitter clone names, etc which will
appear here
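
Because dates are kept as flatfile strings rather than as structured fields, a consumer of INSDSeq records has to parse them. The following is a minimal sketch of such a parser for the DD-MON-YYYY convention; it is our illustration only, the name `parse_flatfile_date` is hypothetical, and the sketch assumes three-letter English month abbreviations.

```python
from datetime import date

# Three-letter English month abbreviations, as in the example 10-JUN-2003.
_MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_flatfile_date(text: str) -> date:
    """Convert a DD-MON-YYYY string into a datetime.date."""
    parts = text.strip().split("-")
    if len(parts) != 3 or parts[1].upper() not in _MONTHS:
        raise ValueError(f"not a DD-MON-YYYY date: {text!r}")
    day, month, year = parts
    # date() itself rejects impossible days such as 31-FEB.
    return date(int(year), _MONTHS[month.upper()], int(day))
```

A month table is used here instead of locale-dependent parsing so that the result does not depend on the system language settings.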

There are also a number of elements that could have been more exactly
specified, but in the interest of simplicity have been simply left as
optional. For example:

All publicly accessible sequence records in INSDSeq format will
include accession and accession.version. However, these elements are
optional in INSDSeq so that this format can also be used for
non-public sequence data, prior to the assignment of accessions and
version numbers. In such cases, records will have only "other-seqids".

sequences will normally all have "sequence" filled in. But contig records
will have a "join" statement in the "contig" slot, and no "sequence".
We also may consider a retrieval option with no sequence of any kind
and no feature table to quickly check minimal values.

Four (optional) elements are specific to records represented via the EMBL
sequence database: INSDSeq_update-release, INSDSeq_create-release,
INSDSeq_entry-version, and INSDSeq_database-reference.

One (optional) element is specific to records originating at the GenBank
and DDBJ sequence databases: INSDSeq_segment.

******** -->

170 LIST OF REFERENCES 1. Abiteboul, S., Buneman P., Suciu, D.: Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann P ublishers, San Francisco, CA (2000) 2. Adams, D.: Oracle XML DB Developer’s Guide, 10g Release 2 ( 10.2). Oracle (2005). http://download.oracle.com /docs/cd/B19306_01/appdev.102/b14259.pdf 3. Ballou, D., Madnick, S., Wang, R.: Assu ring Information Quality. J. Management Information Systems 20(3), 9-11 (2004) 4. Beall, J.: Metadata and Data Quality Problems in the Digital Library. J. Digital Information 6(3), No. 355 (2005) 5. Benson, D.A., Karsch-Mizrachi, I., Lipman D.J., Ostell, J., Wheeler, D.L.: GenBank. Nucleic Acids Res. 35(Databa se issue), D21-D25 (2007) 6. Bochmann, G., Hafid, A.: Some Principles for Quality of Service Management. Distributed Systems Engineering J. 4(1), 16-27 (1997) 7. Boeckmann, B., Bairoch, A., Apweiler, R., Bl atter, M.C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O' Donovan, C., Phan, I., Pilbout, S., Schneider, M.: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31(1), 365-370 (2003) 8. Bossa, S.: Montag (August 2007). http://montag.sourceforge.net/index.html 9. Bouzeghoub, M., Peralta, V.: A Framework for Analysis of Data Freshness. In: Proceedings of the International Workshop on Information Quality in Information Systems (IQIS), pp. 59-67 (2004) 10. Buneman, P.: Semistructured Data. In: Pr oceedings of the ACM Symposium on Principles of Database Systems (P ODS), pp. 117-121 (1997) 11. Buneman, P., Davison, S., Hillebrand, G., and Suciu, D.: A query language and optimization techniques for unstructured data. In: Proceedings of the ACM SIGMOD International Conference on Manage ment of Data, pp. 505-516 (1996) 12. Calvanese, D., De Giacomo, G., Lenzerin i, M.: Modeling and Querying Semi-Structured Data. Networking and Informati on Systems J. 2(2), 253-273 (1999) 13. Centrum voor Wiskunde en Informatica: MonetDB (August 2007). 
http://monetdb.cwi.nl/project s/monetdb/Home/index.html 14. Cyran, M.: Oracle Database Concep ts, 10g Release 2 (10.2). Oracle (2005). http://download.oracle.com/ docs/cd/B19306_01/server.102/b14220.pdf

PAGE 171

171 15. DeSantis, T.Z., Hugenholtz, P., Larsen, N., Ro jas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P., Andersen, G.L.: Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB. Appl Environ Mi crobiol. 72(7), 50695072 (2006) 16. Drake, M.: Oracle XML DB White Pa per. Oracle (2005). http://downloaduk.oracle.com/otndocs/tech/xml/x mldb/TWP_XML_DB_10gR2_long.pdf 17. European Bioinformatics Institute: EMBL Nucleotide Sequence Database (February 2006). http://www.ebi.ac.uk/embl/ 18. France Telecom S.A.: Introduction to ASN.1 (August 2007). http://asn1.elibel.tm.fr/en/introduction/ 19. Garcia-Molina, H., Ullman, J., Widom, J. Database Systems: The Complete Book. Prentice Hall, New Jersey (2002) 20. International Nucleotide Sequence Data base Collaboration: INSDC (July 2007). http://www.insdc.org/page.php?page=home 21. International Nucleotide Sequence Databa se Collaboration: INSDC Feature Table Definition Document (September 2007). http:/ /www.insdc.org/files/feature_table.html 22. Kulikova, T., Akhtar, R., Aldebert, P., Althor pe, N., Andersson, M., Baldwin, A., Bates, K., Bhattacharyya, S., Bower, L., Browne, P., Castro, M., Cochrane, G., Duggan, K., Eberhardt, R., Faruque, N., Hoad, G., Kanz, C ., Lee, C., Leinonen, R., Lin, Q., Lombard, V., Lopez, R., Lorenc, D., McWilliam, H., Mukherjee, G., Nardone, F., Pastor, M.P., Plaister, S., Sobhany, S., Stoe hr, P., Vaughan, R., Wu, D., Zh u, W., Apweiler, R.: EMBL Nucleotide Sequence Database in 2006. Nuclei c Acids Res. 35(Database issue), D16-D20 (2007) 23. Lee, Y.W., Strong, D.M.: Knowing-Why A bout Data Processes and Data Quality. J. Management Information Systems 20(3), 13-39 (2003) 24. Lee, Y.W., Strong, D.M., Kahn, B. K., Wang, R.Y.: AIMQ: A Methodology for Information Quality Assessment. Information and Management 40(2), 133-146 (2002) 25. McHug, J., Abiteboul, S., Goldman, R., Quass, D., Widom, J. Lore: A Database Management System for Semistructured Da ta. 
ACM SIGMOD Rec. 26(3), 54-66 (1997) 26. Mecella, M., Scannapieco, M., Virgillito, A., Baldoni, R., Catarci, T., Batini, C.: Managing Data Quality in Cooperative Information System s. J. Data Semantics. Lecture Notes in Computer Science, vol. 2800, pp. 208-232. Spri nger-Verlag, Berlin Heidelberg New York (2003) 27. Meier, W.: eXist (August 2007) http://exist.sourceforge.net/

PAGE 172

28. Mihaila, G., Raschid, L., Vidal, M.E.: Querying “Quality of Data” Metadata. In: Proceedings of the IEEE MetaData Conference, pp. 526-531 (1999)
29. Missier, P., Batini, C.: A Multidimensional Model for Information Quality in Cooperative Information Systems. In: Proceedings of the International Conference on Information Quality (ICIQ), pp. 25-40 (2003)
30. Missier, P., Embury, S., Greenwood, M., Preece, A., Jin, B.: Quality views: capturing and exploiting the user perspective on data quality. In: Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 977-988 (2006)
31. Mitchell, T.: Machine Learning. McGraw Hill (1997)
32. Mueller, L.A., Solow, T.H., Taylor, N., Skwarecki, B., Buels, R., Binns, J., Lin, C., Wright, M.H., Ahrens, R., Wang, Y., Herbst, E.V., Keyder, E.R., Menda, N., Zamir, D., Tanksley, S.D.: The SOL Genomics Network. A Comparative Resource for Solanaceae Biology and Beyond. Plant Physiol. 138(3), 1310-1317 (2005)
33. Müller, H., Naumann, F., Freytag, J.C.: Data Quality in Genome Databases. In: Proceedings of the International Conference on Information Quality (ICIQ), pp. 269-284 (2003)
34. National Center for Biotechnology Information: DATATOOL NCBI data conversion tool (August 2007). http://www.ncbi.nlm.nih.gov/data_specs/NCBI_data_conversion.html
35. National Center for Biotechnology Information: GenBank Overview (February 2006). http://www.ncbi.nlm.nih.gov/Genbank/
36. National Center for Biotechnology Information: Entrez, The Life Sciences Search Engine (August 2007). http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
37. National Center for Biotechnology Information: Expressed Sequence Tags database (Sep 2007). http://www.ncbi.nlm.nih.gov/dbEST/
38. National Center for Biotechnology Information: Index of schema (August 2007). http://www.ncbi.nlm.nih.gov/data_specs/schema/
39. National Center for Biotechnology Information: RefSeq (January 2006). http://www.ncbi.nlm.nih.gov/RefSeq/
40. National Center for Biotechnology Information: The NCBI Handbook (2003). http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books
41. National Center for Biotechnology Information: The NCBI Help Manual (2006). http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books
42. National Institute of Genetics: DDBJ-DNA Data Bank of Japan (January 2006). http://www.ddbj.nig.ac.jp/


43. National Library of Medicine: PubMed (July 2007). http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed
44. Naumann, F., Freytag, J.C., Leser, U.: Completeness of integrated information sources. Information Systems 29(7), 583-615 (2004)
45. Naumann, F., Rolker, C.: Assessment Methods for Information Quality Criteria. In: Proceedings of the International Conference on Information Quality (ICIQ), pp. 148-162 (2000)
46. Naumann, F., Roth, M.: Information Quality: How Good Are Off-The-Shelf DBMS? In: Proceedings of the International Conference on Information Quality (ICIQ), pp. 260-274 (2004)
47. Oracle: Oracle XML DB, 10g (October 2007). http://www.oracle.com/technology/tech/xml/xmldb/index10_2.html
48. Preece, A.D., Jin, B., Pignotti, E., Missier, P., Embury, S.M., Stead, D., Brown, A.: Managing Information Quality in e-Science Using Semantic Web Technology. In: Proceedings of the European Semantic Web Conference (ESWC), pp. 472-486 (2006)
49. Pruitt, K.D., Tatusova, T., Maglott, D.: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35(Database issue), D61-D65 (2007)
50. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Francisco, CA (1993)
51. Rakotomalala, R.: TANAGRA: a free software for research and academic purposes. In: Proceedings of EGC, RNTI-E-3, vol. 2, pp. 697-702 (2005)
52. Rice, J.A.: Mathematical Statistics and Data Analysis, 2nd edn. Duxbury Press, Belmont, CA (1995)
53. Scannapieco, M., Virgillito, A., Marchetti, M., Mecella, M., Baldoni, R.: The DaQuinCIS Architecture: A Platform for Exchanging and Improving Data Quality in Cooperative Information Systems. Information Systems 29(7), 551-582 (2004)
54. Schmutz, J., Wheeler, J., Grimwood, J., Dickson, M., Yang, J., Caoile, C., Bajorek, E., Black, S., Chan, Y.M., Denys, M., Escobar, J., Flowers, D., Fotopulos, D., Garcia, C., Gomez, M., Gonzales, E., Haydu, L., Lopez, F., Ramirez, L., Retterer, J., Rodriguez, A., Rogers, S., Salazar, A., Tsai, M., Myers, R.M.: Quality assessment of the human genome sequence. Nature 429(6990), 365-368 (2004)
55. Steinmetz, R., Wolf, L.C.: Quality of service: where are we? In: Proceedings of the International Workshop on Quality of Service (IWQoS), New York, USA (1997)


56. Strong, D., Lee, Y., Wang, R.: Data Quality in Context. Communications of the ACM 40(5), 103-110 (1997)
57. Sumner, T., Khoo, M., Recker, M., Marlino, M.: Understanding Educator Perceptions of "Quality" in Digital Libraries. In: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 269-279 (2003)
58. Swiss Institute for Bioinformatics and European Bioinformatics Institute: SwissProt (November 2006). http://www.expasy.org/sprot/
59. Tan, P., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison Wesley (2005)
60. Wang, R.Y., Reddy, M.P., Kon, H.B.: Toward Quality Data: An Attribute-Based Approach. Decision Support Systems 13(3-4), 349-372 (1995)
61. Wheeler, D.L., Barret, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Edgar, R., Federhen, S., Geer, L.Y., Kapustin, Y., Khovayko, O., Landsman, D., Lipman, D.J., Madden, T.L., Maglott, D.R., Ostell, J., Miller, V., Pruitt, K.D., Schuler, G.D., Sequeira, E., Sherry, S.T., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusov, R.L., Tatusova, T.A., Wagner, L., Yaschenko, E.: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 35(Database issue), D5-D12 (2007)
62. World Wide Web Consortium: Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation 6 October 2000 (August 2007). http://www.w3.org/TR/2000/REC-xml-20001006
63. World Wide Web Consortium: XML Schema (August 2007). http://www.w3.org/XML/Schema
64. World Wide Web Consortium: xml:id Version 1.0, W3C Recommendation 9 September 2005 (July 2007). http://www.w3.org/TR/xml-id/
65. World Wide Web Consortium: XQuery 1.0: An XML Query Language, W3C Recommendation 23 January 2007 (August 2007). http://www.w3.org/TR/xquery/


BIOGRAPHICAL SKETCH

Alexandra Martínez was born on December 28, 1978 in San José, Costa Rica. She grew up with her older sister and parents in Heredia, a city close to San José. She graduated from the Escuela Cubujuquí (Cubujuquí Elementary School) in 1990 and from the Colegio Científico Costarricense (Costa Rican Scientific High School) in 1995. She earned her B.S. in computer and information science from the University of Costa Rica in 2000. Upon graduation, she joined ArtinSoft, a leading software migration company in Costa Rica. In 2001, she decided to pursue graduate studies abroad after receiving a fellowship from the École Polytechnique Fédérale de Lausanne, Switzerland, to enroll in the Pre-Doctoral Program in computer science. Upon completing this program in 2002, she moved to Florida, where she continued her master's and doctoral studies in computer engineering at the University of Florida. She obtained her M.S. in 2006 and her Ph.D. in 2007.

Alexandra has been married to Arturo Camacho since 2002, and they have a lovely daughter, Melissa Maria, who is 18 months old.