Model for the fast estimation of basis set superposition error in biomolecular systems
Publisher's URL: http://aip.org/
Permanent Link: http://ufdc.ufl.edu/IR00000631/00001
 Material Information
Title: Model for the fast estimation of basis set superposition error in biomolecular systems
Series Title: J. Chem. Phys. 135, 144110 (2011); doi: 10.1063/1.3641894
Physical Description: Journal Article
Creator: Faver, John
Publisher: American Institute of Physics
Place of Publication: THE JOURNAL OF CHEMICAL PHYSICS
Publication Date: 10/12/2011
 Notes
Abstract: Basis set superposition error (BSSE) is a significant contributor to errors in quantum-based energy functions, especially for large chemical systems with many molecular contacts such as folded proteins and protein-ligand complexes. While the counterpoise method has become a standard procedure for correcting intermolecular BSSE, most current approaches to correcting intramolecular BSSE are simply fragment-based analogues of the counterpoise method which require many (two times the number of fragments) additional quantum calculations in their application. We propose that magnitudes of both forms of BSSE can be quickly estimated by dividing a system into interacting fragments, estimating each fragment’s contribution to the overall BSSE with a simple statistical model, and then propagating these errors throughout the entire system. Such a method requires no additional quantum calculations, but rather only an analysis of the system’s interacting fragments. The method is described herein and is applied to a protein-ligand system, a small helical protein, and a set of native and decoy protein folds.
Acquisition: Collected for University of Florida's Institutional Repository by the UFIR Self-Submittal tool. Submitted by John Faver.
Publication Status: Published
 Record Information
Source Institution: University of Florida Institutional Repository
Holding Location: University of Florida
Rights Management: All rights reserved by the submitter.
System ID: IR00000631:00001


Full Text


Model for the fast estimation of basis set superposition error in biomolecular systems
John C. Faver, Zheng Zheng, and Kenneth M. Merz
Citation: J. Chem. Phys. 135, 144110 (2011); doi: 10.1063/1.3641894
View online: http://dx.doi.org/10.1063/1.3641894
Published by the American Institute of Physics.
Downloaded 25 Oct 2011 to 128.227.40.106. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/about/rights_and_permissions


THE JOURNAL OF CHEMICAL PHYSICS 135, 144110 (2011)

Model for the fast estimation of basis set superposition error in biomolecular systems

John C. Faver, Zheng Zheng, and Kenneth M. Merz, Jr. a)
Quantum Theory Project, The University of Florida, 2328 New Physics Building, P.O. Box 118435, Gainesville, Florida 32611-8435, USA
(Received 14 June 2011; accepted 1 September 2011; published online 12 October 2011)

Basis set superposition error (BSSE) is a significant contributor to errors in quantum-based energy functions, especially for large chemical systems with many molecular contacts such as folded proteins and protein-ligand complexes. While the counterpoise method has become a standard procedure for correcting intermolecular BSSE, most current approaches to correcting intramolecular BSSE are simply fragment-based analogues of the counterpoise method which require many (two times the number of fragments) additional quantum calculations in their application. We propose that magnitudes of both forms of BSSE can be quickly estimated by dividing a system into interacting fragments, estimating each fragment's contribution to the overall BSSE with a simple statistical model, and then propagating these errors throughout the entire system. Such a method requires no additional quantum calculations, but rather only an analysis of the system's interacting fragments. The method is described herein and is applied to a protein-ligand system, a small helical protein, and a set of native and decoy protein folds.
© 2011 American Institute of Physics. [doi: 10.1063/1.3641894]

INTRODUCTION

The application of quantum chemistry to large molecular systems is a challenging endeavor that is complicated by several factors. First, the high number of degrees of freedom makes orbital and conformational optimization very computationally demanding, which has led to novel linear-scaling algorithms such as FMO,¹ MFCC,² and divide-and-conquer schemes.³⁻⁵ In addition, large molecular systems contain many different types of chemical interactions, all of which need to be accurately modeled by the energy function in order to reliably estimate the energy of the composite system.⁶,⁷ Efforts have been made to estimate and correct for these fragment-based interaction energy errors as well.⁸,⁹ Compact molecular systems with many inter- and intramolecular contacts introduce yet another source of error in quantum chemical calculations on large systems: basis set superposition error (BSSE). BSSE is a consequence of using incomplete basis sets, and stems from the fact that fragment A of a system can use basis functions from a proximal nonbonded fragment B to variationally (and artificially) lower A's contribution to the electronic energy and, in the end, overestimate the strength of the nonbonded molecular interaction between fragments A and B. The counterpoise procedure has commonly been utilized to correct for BSSE in the intermolecular case.¹⁰ In the procedure, the energies of systems A and B are evaluated both with and without the basis functions of the partner system. The sum of energy differences between the calculations with (E*_A and E*_B) and without (E_A and E_B) the additional basis functions is the magnitude of artificial stabilization due to BSSE (ΔE_BSSE). ΔE_BSSE from Eq. (1) is always negative and should be subtracted from the calculated interaction energy between A and B:

$$\Delta E_{\mathrm{BSSE}} = \left(E_A^{*} - E_A\right) + \left(E_B^{*} - E_B\right). \qquad (1)$$

a) Author to whom correspondence should be addressed. Electronic mail: merz@qtp.ufl.edu.

Intramolecular BSSE (IBSSE) has been observed in molecules as small as benzene, for which a nonplanar optimum geometry is observed when using small Pople-style basis sets with MP2.¹¹,¹² A main concern about IBSSE is that it affects the ability to compare different conformations of the overall system. Balabin's estimation of IBSSE in small peptides suggests that IBSSE can often be equal to or even greater in magnitude than the relative energies between small peptide conformations,¹³ which might prohibit quantum-based energy functions containing IBSSE from producing reliable results in any computational study requiring accurate potential energy surfaces, such as free energy calculations, molecular dynamics simulations, or even simple geometry optimization.

Most current methods of estimating IBSSE are intramolecular analogues of the counterpoise method for intermolecular systems. The overall system is broken down into molecular fragments (or in some cases individual atoms¹⁴), which are then analyzed with and without neighboring basis functions to estimate the energy differences due to IBSSE. These methods (with the exception of the atom-based method) require input from the user about how to fragment the overall system, which is non-unique. Furthermore, these methods require additional quantum calculations for each fragment, leading to a total of 2N + 1 calculations, where N is the number of fragments (unless the isolated fragment or atomic energies are stored in a database, in which case there would be N + 1 calculations). The generated fragments may be left as radicals or saturated with hydrogen link atoms, either of which may alter the electronic environment of the fragment and yield uncertainties in the estimation of the fragment's contribution to IBSSE.

FIG. 1. Distributions of BSSE magnitudes (at MP2/6-31G*) of interactions in our protein fragment database. (a) Distribution of all types of interactions (N = 997). (b) Distribution of polar interactions including backbone-backbone hydrogen bonds and charged interactions (N = 643). (c) Distribution of nonpolar interactions (N = 354).

STATISTICAL MODEL OF FRAGMENT-BASED CONTRIBUTIONS TO BSSE

Recently we have introduced a method of estimating errors in energy functions for large molecules by estimating fragment-based errors and propagating these errors over the interacting fragments of the entire system.⁸,⁹ The fragment-based error estimates are derived from a database of common interacting fragments found in proteins and protein-ligand complexes: error probability density functions are constructed by comparing the fragments' energies calculated with a given method against accurate reference energies (e.g., CCSD(T)/CBS). By assuming that the fragment-based interactions contain errors that are independent from one another (this seems to be an acceptable assumption for largely electron-localized systems such as proteins),¹⁵ each fragment's contribution to the overall error can be estimated with the appropriate probability density function and then propagated throughout the overall system to yield an overall estimate of both systematic and random error.

In order to apply these methods to the problem of BSSE, we have generated thousands of interacting molecular fragments from high-resolution (< 2.0 Å) crystal structures from the Protein Data Bank (PDB) with an in-house fragmentation program. Each PDB structure was first saturated with hydrogen atoms with the program REDUCE (Ref. 16), followed by an optimization of the hydrogen positions with ff99sb (Ref. 17) in AMBER (Ref. 18) before fragmentation. A description of the fragmentation algorithm is given in the supplementary material.²¹ A random sample of nearly 1000 interacting fragments was selected and categorized by interaction type: backbone-backbone hydrogen bonds (312), charged (107), polar (224), and nonpolar (354) interactions.
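The counterpoise correction of Eq. (1), which also supplies the reference BSSE values for a fragment database like the one described above, reduces to a few lines of arithmetic once the four monomer energies are known. The sketch below is illustrative only: the function names and the monomer energies (in hartree) are hypothetical placeholders, not values from this study. In practice each energy comes from a separate quantum chemistry calculation, with E*_X computed in the full dimer basis (ghost functions placed on the partner).

```python
HARTREE_TO_KCAL = 627.509  # kcal/mol per hartree (rounded conversion factor)


def counterpoise_bsse(e_a_star, e_a, e_b_star, e_b):
    """Eq. (1): Delta E_BSSE = (E*_A - E_A) + (E*_B - E_B).

    e_a_star, e_b_star: monomer energies computed with the partner's basis
    functions added as ghost functions; e_a, e_b: monomer energies in their
    own basis only. The borrowed functions artificially lower each monomer's
    energy, so the result is negative.
    """
    return (e_a_star - e_a) + (e_b_star - e_b)


def cp_corrected_interaction(e_int_raw, delta_bsse):
    """Subtract the (negative) BSSE from the raw interaction energy."""
    return e_int_raw - delta_bsse


if __name__ == "__main__":
    # Hypothetical monomer energies in hartree (illustration only).
    d_bsse = counterpoise_bsse(-76.0215, -76.0200, -40.5108, -40.5100)
    d_bsse_kcal = d_bsse * HARTREE_TO_KCAL
    print(f"Delta E_BSSE = {d_bsse_kcal:.2f} kcal/mol")
    # A hypothetical raw interaction energy of -5.00 kcal/mol becomes less
    # favorable once the artificial stabilization is removed.
    corrected = cp_corrected_interaction(-5.00, d_bsse_kcal)
    print(f"corrected interaction energy = {corrected:.2f} kcal/mol")
```

Because ΔE_BSSE is negative, subtracting it always makes the corrected interaction weaker than the raw one, which is the sign convention stated with Eq. (1).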
The interacting fragments were analyzed for gas-phase electronic interaction energy with MP2/6-31G* (an arbitrary example energy model sure to yield significant BSSE) with and without the counterpoise correction in order to determine the BSSE magnitudes. The calculations were performed using the GAUSSIAN 09 program.¹⁹ The distributions of BSSE magnitudes (kcal/mol) are plotted in Figure 1, which shows a clear distinction between the van der Waals/nonpolar and hydrogen-bonded/polar fragment pairs.

In our first attempt to estimate BSSE for large systems, we proposed to use the same strategy as described previously, involving the construction of Gaussian probability density functions (pdf's) describing the likely magnitudes of BSSE between database fragments. An example pdf is given in Figure 2 for backbone-backbone hydrogen bonds. Simply using the mean and standard deviation of these functions to predict BSSE between fragment interactions may be a very fast method, but it has some disadvantages. First, BSSE always increases the stability of dimers and thus will only lie on one side of zero on the real number line. Therefore, the BSSE values cannot be truly normally distributed. Second, for each interaction type, the approximate normal distributions are all very wide, which would yield imprecise BSSE estimates and large propagated random error bars. Finally, in each of the interaction type distributions, there were several outliers with extremely high values of BSSE, which likely have arisen from poorly refined contacts (steric clashes) in the crystal structures. Either further optimization of the protein structures or some sort of database filtering criteria would be required to address this problem.

FIG. 2. Using Gaussian probability density functions to describe and predict BSSE in backbone-backbone hydrogen bonds in proteins.

FIG. 3. Distance dependence of calculated and predicted basis set superposition error (at MP2/6-31G*) in a pair of hydrogen-bonded backbone peptide fragments (red) and a nonpolar complex (blue) taken from our protein fragment database. The measured BSSE values are plotted as circles along lines, and the present model's predictions (Eq. (2)) are shown as squares with their respective error bars. Asterisks mark the intermolecular distances found in the PDB structures. Since the model was parameterized with PDB geometries (i.e., near equilibrium), the BSSE model has more success with interactions at near-equilibrium distances than at very close intermolecular distances.

FIG. 4. (a) Model for predicting basis set superposition error trained with 312 protein backbone-backbone hydrogen bonding fragment interactions. The data were fit with a linear regression model using a bimolecular proximity descriptor as an independent variable and had an R² of 0.89. (b) The model for BSSE trained with 354 nonpolar complexes from the protein fragment database yielded an R² of 0.85.

Since BSSE has a strong dependence on the geometric orientation of two interacting fragments, we introduce a simple geometry-dependent model to estimate fragment contributions to BSSE rather than a Gaussian pdf. In order to build our model, we introduce a bimolecular proximity descriptor to quickly and roughly measure the proximity P of two fragments A and B:

$$P_{AB} = a + b\sum_{i}^{N_A}\sum_{j}^{N_B} e^{-c\,r_{ij}^{2}}, \qquad (2)$$

where N_A and N_B are the numbers of heavy (non-hydrogen) atoms in fragments A and B; a, b, and c are positive, real, optimizable parameters; and r_ij is the distance between heavy atoms i and j. Thus, only the proximal non-hydrogen atoms on two different fragments contribute significantly to the overall proximity score. The score has a few desirable properties: it takes on only positive values, is small for non-interacting fragments, and qualitatively models the exponential-like decay in actual distance dependence curves (see Figure 3). In addition, the BSSE estimator in Eq. (2) should be less sensitive to the molecular partitioning scheme than a counterpoise-based method for IBSSE, since the atomic contributions to BSSE fall off quickly with distance. In other words, including additional distant atoms in a fragment interaction will have little effect on the sum in Eq. (2). The parameters a, b, and c could be made to depend on the atoms i and j considered, since different fragment contributions to BSSE may have different distance dependencies (e.g., aliphatic vs. aromatic or ionic fragments), but in our initial investigations we used one parameter set for each type of interaction.

FIG. 5. Distribution of errors in BSSE predictions for the fragments making up the HIV-2 protease/indinavir complex. Although two of the individual errors were quite large, we observed favorable cancellation of these errors toward zero, which allowed for our close estimate of overall BSSE for the complex.

We trained Eq. (2) to fit the computed BSSE (at MP2/6-31G*) of the 312 hydrogen-bonded backbone-backbone systems at their PDB geometries and found the best agreement with the calculated values when a = 0.254, b = 3.88, and c = 0.191. We investigated varying the power of r_ij in Eq. (2) and found no significant improvements from using r_ij rather than r_ij² in the backbone-backbone hydrogen bond complexes. Our best model had a coefficient of determination of R² = 0.89 (Figure 4(a)). The same function parameterized for the nonpolar (van der Waals) complexes yielded the optimal parameters a = 0.522, b = 9.11, and c = 0.285, and an R² of 0.85 (Figure 4(b)). The nonpolar complexes were more of a challenge to fit than the backbone-backbone hydrogen bonds because of their higher chemical diversity. The charged systems were divided into positively and negatively charged interactions. The final parameter sets are shown in Table I.

TABLE I. Parameterization of Eq. (2) for four different interaction types.

Type                  N     a      b       c       R²
Nonpolar             354   0.254   3.883   0.1907  0.85
Hydrogen bond        312   0.522   9.105   0.2847  0.89
Positively charged    44   0.983  29.35    0.4226  0.68
Negatively charged    63   1.57   29.28    0.3456  0.77

By using the fragment BSSE data as a reference set, we can now predict systematic and random errors due to BSSE in large biomolecular systems. After a total energy has been calculated for a system, the system is fragmented according to the same rules used in designing the reference database. Each resulting fragment is then labeled according to interaction type. For any fragment with multiple interaction types, a hierarchy was used, determined by the relative contributions to BSSE from the four different interaction classes. Negatively charged moieties take the highest precedence, being the highest BSSE-contributing interaction class. In the absence of negative charges, positive charges are sought, followed by hydrogen bonds. In the absence of all these features the interaction is considered nonpolar. The predicted "systematic error" (BSSE) then comes directly from evaluation of Eq. (2) (using the appropriate parameter set from Table I), and the random error comes from the linear regression model as evaluated by Eq. (3) below, where t is the Student's t-value, which depends on the population size N and the desired confidence limit; x̂ is the newly estimated BSSE value; x_i and y_i are the database predicted and measured BSSE values; and x̄ is the mean predicted BSSE value in the reference set:

$$\mathrm{Error}_{\mathrm{Random}} = t\,S^{1/2}\sqrt{1 + \frac{1}{N} + \frac{(\hat{x} - \bar{x})^{2}}{\sum_{i}^{N}(x_i - \bar{x})^{2}}}, \qquad (3)$$

where

$$S = \frac{\sum_{i}^{N}\left[y_i - (b\,x_i + a)\right]^{2}}{N - 2}. \qquad (4)$$

By assuming additivity of fragment contributions, the overall systematic error (total BSSE estimate) is then the arithmetic sum of the predicted fragment BSSE contributions, and the overall random error (total error bar) is the Pythagorean sum of the random error estimates.
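The per-fragment estimator described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the (a, b, c) values are copied as printed in Table I, heavy-atom coordinates are assumed to be in angstroms (the text does not state the unit of r_ij), and the Student's t-value is supplied by the caller rather than computed.

```python
import math

# (a, b, c) for Eq. (2), copied as printed in Table I.
TABLE_I = {
    "nonpolar": (0.254, 3.883, 0.1907),
    "hydrogen_bond": (0.522, 9.105, 0.2847),
    "positive": (0.983, 29.35, 0.4226),
    "negative": (1.57, 29.28, 0.3456),
}


def classify(has_negative, has_positive, has_hbond):
    """Hierarchy from the text: negative > positive > H-bond > nonpolar."""
    if has_negative:
        return "negative"
    if has_positive:
        return "positive"
    if has_hbond:
        return "hydrogen_bond"
    return "nonpolar"


def proximity_bsse(heavy_a, heavy_b, a, b, c):
    """Eq. (2): a + b * sum_ij exp(-c * r_ij**2) over heavy-atom pairs.

    heavy_a, heavy_b: (x, y, z) coordinates of the non-hydrogen atoms of
    fragments A and B (angstroms assumed).
    """
    total = 0.0
    for pa in heavy_a:
        for pb in heavy_b:
            r2 = sum((u - v) ** 2 for u, v in zip(pa, pb))
            total += math.exp(-c * r2)
    return a + b * total


def random_error(x_new, xs, ys, a, b, t):
    """Eqs. (3) and (4): regression prediction error for a new estimate.

    xs, ys: predicted and measured BSSE values of the reference set;
    a, b: intercept and slope of the line y = b*x + a;
    t: Student's t-value for the desired confidence level.
    """
    n = len(xs)
    s = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys)) / (n - 2)  # Eq. (4)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return t * math.sqrt(s) * math.sqrt(1.0 + 1.0 / n + (x_new - x_bar) ** 2 / sxx)


if __name__ == "__main__":
    # Hypothetical heavy-atom coordinates for a hydrogen-bonded pair.
    frag_a = [(0.0, 0.0, 0.0), (1.23, 0.0, 0.0)]
    frag_b = [(2.9, 0.0, 0.0), (3.8, 1.0, 0.0)]
    kind = classify(has_negative=False, has_positive=False, has_hbond=True)
    estimate = proximity_bsse(frag_a, frag_b, *TABLE_I[kind])
    print(kind, f"{estimate:.2f} kcal/mol")
```

Passing t in keeps the sketch dependency-free; a statistics library's Student's t quantile function could supply it for a given confidence level.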


Applications

To demonstrate the use of these models, we first examined intermolecular BSSE in the case of protein-ligand binding by studying the HIV-2 protease/indinavir complex (PDB ID: 1HSG). The fragments studied were the same as those used in a previous study.⁸ The 21 fragments were evaluated for BSSE at MP2/6-31G* and were classified according to interaction type. We predicted the fragment contributions to BSSE with Eq. (2) and Table I and compared our predictions with the measured BSSE values. The results are listed in Table II. Over all 21 fragment interactions, we predicted the total BSSE to within an error of 1.02 kcal/mol, which lay within our estimated error bar of 2.26 kcal/mol (68% confidence). We observed that some of the individual fragment BSSE predictions were off by a significant amount (our model predicted fragment 4 to be 2.14 kcal/mol too high in BSSE), but overall the population of individual errors seemed to cancel favorably toward zero (Figure 5).

We then examined intramolecular BSSE in the case of a small, synthetic, helical protein with a known crystal structure (Figure 6, PDB ID: 1AL1). The structure was saturated with hydrogen atoms, followed by an optimization of their positions with the ff99sb force field. The structure was then partitioned with our in-house fragmentation program into 10 backbone-backbone hydrogen-bonded complexes and 3 nonpolar complexes from side chain-side chain interactions. By analyzing the intramolecular interactions making up the overall system, using the models built from Eq. (2), and propagating error estimates, we estimated the overall IBSSE at MP2/6-31G* to be 30.94 ± 0.74 kcal/mol (68% confidence). For comparison, we also separated the individual chemical fragments and measured the BSSE between them with the traditional intermolecular counterpoise method. The sum of fragment-based contributions was 31.60 kcal/mol, which is close to the estimate from the statistical model and lies within the estimated error bar.

TABLE II. Results from predicting the BSSE of 21 independent chemical fragments involved in the binding of indinavir to HIV-2 protease. The fragments are identified by system number and are labeled by interaction type: np: nonpolar, p: polar, pc: positively charged, and nc: negatively charged. The last two rows contain arithmetic and Pythagorean sums for the propagated systematic and random errors. Units are kcal/mol.

Number            Type  Predicted BSSE  Measured BSSE  Predicted error  Measured error (predicted − measured BSSE)
1                 np     1.49            2.05          0.30             −0.56
2                 np     0.83            0.90          0.30             −0.07
3                 pc     1.66            1.45          0.38              0.21
4                 nc    10.40            8.27          1.01              2.14
5                 nc     6.97            6.90          0.93              0.07
6                 nc     2.10            3.47          0.93             −1.37
7                 p      2.62            3.07          0.17             −0.44
8                 np     1.45            2.13          0.30             −0.68
9                 np     1.41            1.98          0.30             −0.56
10                np     2.08            1.90          0.30              0.19
11                np     1.26            1.33          0.30             −0.06
12                np     1.26            1.66          0.30             −0.40
13                np     1.43            1.72          0.30             −0.30
14                np     0.91            0.70          0.30              0.21
15                np     0.84            1.10          0.30             −0.26
16                np     0.73            0.63          0.30              0.11
17                np     1.51            1.15          0.30              0.36
18                np     1.33            1.28          0.30              0.05
19                np     0.88            1.05          0.30             −0.18
20                np     1.62            1.57          0.30              0.05
21                nc     3.26            2.77          0.92              0.49
Arithmetic sum          46.06           47.08          ...              −1.02
Total error bar          ...             ...           2.26              ...

FIG. 6. Helical protein fragment structure (PDB ID: 1AL1) used in the demonstration of the presented model for intramolecular basis set superposition error. The fragment-based model predicted 30.94 ± 0.74 kcal/mol of overall IBSSE. The sum of calculated BSSE values for the interacting fragments was 31.60 kcal/mol.
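The propagation rule from the statistical model (arithmetic sum for the systematic part, Pythagorean sum for the random part) can be checked against Table II. This is a minimal sketch using only the rounded values printed in the table; the names TABLE_II and propagate are hypothetical. The measured column sums to the printed 47.08 kcal/mol and the quadrature sum of the per-fragment errors reproduces the 2.26 kcal/mol error bar, while summing the rounded predicted entries gives 46.04 rather than the printed 46.06, presumably because the individual entries are rounded.

```python
import math

# (type, predicted BSSE, measured BSSE, predicted random error) in kcal/mol,
# copied from Table II (HIV-2 protease/indinavir, MP2/6-31G*).
TABLE_II = [
    ("np", 1.49, 2.05, 0.30), ("np", 0.83, 0.90, 0.30), ("pc", 1.66, 1.45, 0.38),
    ("nc", 10.40, 8.27, 1.01), ("nc", 6.97, 6.90, 0.93), ("nc", 2.10, 3.47, 0.93),
    ("p", 2.62, 3.07, 0.17), ("np", 1.45, 2.13, 0.30), ("np", 1.41, 1.98, 0.30),
    ("np", 2.08, 1.90, 0.30), ("np", 1.26, 1.33, 0.30), ("np", 1.26, 1.66, 0.30),
    ("np", 1.43, 1.72, 0.30), ("np", 0.91, 0.70, 0.30), ("np", 0.84, 1.10, 0.30),
    ("np", 0.73, 0.63, 0.30), ("np", 1.51, 1.15, 0.30), ("np", 1.33, 1.28, 0.30),
    ("np", 0.88, 1.05, 0.30), ("np", 1.62, 1.57, 0.30), ("nc", 3.26, 2.77, 0.92),
]


def propagate(rows):
    """Systematic error: arithmetic sum; random error: Pythagorean sum."""
    total_pred = sum(r[1] for r in rows)
    total_meas = sum(r[2] for r in rows)
    error_bar = math.sqrt(sum(r[3] ** 2 for r in rows))
    return total_pred, total_meas, error_bar


if __name__ == "__main__":
    pred, meas, bar = propagate(TABLE_II)
    print(f"predicted {pred:.2f}, measured {meas:.2f}, error bar {bar:.2f} (68%)")
    # prints: predicted 46.04, measured 47.08, error bar 2.26 (68%)
```

The quadrature sum is what keeps the total error bar (2.26 kcal/mol) much smaller than the plain sum of the 21 per-fragment errors, consistent with the independence assumption stated earlier.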
The last test of our method involved the investigation of a set of 9 native NMR and 33 decoy folds of the Pin1 WW domain (PDB ID: 1I6C). A common way of testing score functions and methods of protein folding prediction is to compare native and decoy protein folds and attempt to energetically separate them. The free energy differences between native and non-native protein folds are typically on the order of 10 to 20 kcal/mol, so accurate energy computation is very important for successful discrimination between native and decoy folds. FMO-MP2/6-31G* + PCM energies of this particular set were evaluated previously,²⁰ but they were unable to discriminate between native and decoy folds. To examine the effect of IBSSE on this result, we estimated the magnitudes of IBSSE in each fold according to the presently described method. As a validation step, we computed the sum of measured fragment BSSE values for one of the native NMR models, which was 97.7 kcal/mol. Our estimated value using the statistical model was 93.28 ± 3.85 kcal/mol (95% confidence). Over the whole set of decoys, we observed that the native NMR models had tighter intramolecular packing and therefore yielded generally higher IBSSE estimates than the non-native folds. We also observed that the spread in IBSSE estimates was around 70 kcal/mol, which was unexpectedly large. BSSE is usually thought of as a systematic error in that it always overestimates stability, and these errors are hoped to largely cancel when comparing different conformations of the same system. However, we observe in this set a very wide distribution of IBSSE values in the same protein system, implying that much of the error would not cancel when comparing conformational energies.
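Why a wide IBSSE spread does not cancel can be shown with a toy calculation. The fold names, relative energies, and IBSSE magnitudes below are invented for illustration and are not data from the Pin1 WW domain set. The sketch assumes only what the text states: IBSSE is reported as a positive magnitude of artificial stabilization, so removing it raises each fold's energy by that amount.

```python
def rank_folds(folds):
    """Return fold names sorted from lowest to highest energy."""
    return [name for name, _ in sorted(folds.items(), key=lambda kv: kv[1])]


def ibsse_corrected(energies, ibsse):
    """Remove artificial stabilization: E_corr = E + |IBSSE| for each fold."""
    return {name: e + abs(ibsse[name]) for name, e in energies.items()}


if __name__ == "__main__":
    # Hypothetical relative energies and IBSSE magnitudes (kcal/mol).
    energies = {"native": -12.0, "decoy_1": -6.0, "decoy_2": 0.0}
    ibsse = {"native": 95.0, "decoy_1": 70.0, "decoy_2": 40.0}
    print("before:", rank_folds(energies))
    print("after: ", rank_folds(ibsse_corrected(energies, ibsse)))
```

If every fold carried the same IBSSE, the correction would shift all energies equally and the ordering would be unchanged; it is the fold-to-fold variation in IBSSE that reorders them.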


FIG. 7. (a) The estimated intramolecular basis set superposition errors of a set of native and decoy folds of a small protein fragment, the Pin1 WW domain (PDB ID: 1I6C). The native NMR models are highlighted with asterisks. (b) The relative FMO-MP2/6-31G* + PCM energies plotted against all-atom root mean square deviation from a reference NMR structure. Uncorrected energies are shown in black, while BSSE-corrected energies are shown in red with their estimated error bars. All folds contained a significant amount of BSSE, but the variance in the BSSE magnitudes leads to a different ordering of folds by energy after BSSE corrections.

This wide spread in IBSSE leads to a different ranking of folds by energy before and after IBSSE corrections (Figure 7).

CONCLUSIONS

We have presented a simple parameterized model using a novel bimolecular proximity descriptor to quickly estimate the basis set superposition error of small molecular fragments constituting large biomolecules. These fragment-based BSSE estimates can be propagated over a large biomolecule or complex to estimate inter- or intramolecular BSSE. The method has the advantage of requiring no additional quantum calculations; instead, it requires an analysis of the constituent molecular interactions and relies on fitted statistical models that assume additivity of fragment contributions to overall IBSSE. Along with an estimate for overall BSSE, the method can also generate error bars, allowing researchers to introduce confidence limits in their results when attempting to distinguish between protein folds or ligand poses. The method could easily be extended for use with other chemical systems, quantum methods, or basis sets by replacing the training set data.
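Replacing the training set means refitting a, b, and c in Eq. (2). The paper does not say which optimizer was used, so the sketch below makes its own choice, labeled here as an assumption: a grid search over c, with a and b obtained for each trial c by ordinary least squares on the proximity sum. This reduces Eq. (2) to the linear regression shown in Fig. 4. Each training complex is represented by its list of heavy-atom pair distances r_ij. The function names and the synthetic data are hypothetical, and unlike the text (which requires positive a, b, and c) the sketch does not constrain their signs.

```python
import math


def proximity_sum(distances, c):
    """Sum of exp(-c * r**2) over the heavy-atom pair distances of one complex."""
    return sum(math.exp(-c * r * r) for r in distances)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_eq2(training, bsse, c_grid):
    """Grid search over c with closed-form a, b; returns (a, b, c, R^2)."""
    best = None
    for c in c_grid:
        xs = [proximity_sum(d, c) for d in training]
        a, b, sse = fit_line(xs, bsse)
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    a, b, c, sse = best
    y_bar = sum(bsse) / len(bsse)
    sst = sum((y - y_bar) ** 2 for y in bsse)
    return a, b, c, 1.0 - sse / sst


if __name__ == "__main__":
    # Synthetic training set generated from known parameters (illustration only).
    training = [[3.0], [3.5, 4.0], [2.8, 3.2, 5.0], [4.5], [3.0, 3.0]]
    bsse = [0.3 + 4.0 * proximity_sum(d, 0.2) for d in training]
    grid = [0.05 * k for k in range(1, 21)]
    print(fit_eq2(training, bsse, grid))
```

Because a and b enter Eq. (2) linearly, only c needs a nonlinear search. A finer grid or a one-dimensional minimizer over c could replace the coarse grid used here.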


ACKNOWLEDGMENTS

This work was funded by the National Institutes of Health (http://nih.gov): GM044974 and GM066689. Funding for open-access publication of this article was provided by the University of Florida Open-Access Publishing Fund. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

1. K. Fukuzawa, K. Kitaura, M. Uebayasi, K. Nakata, T. Kaminuma, and T. Nakano, J. Comput. Chem. 26(1), 1 (2005).
2. X. H. Chen and J. Z. H. Zhang, J. Chem. Phys. 125, 044903 (2006).
3. W. Yang and T.-S. Lee, J. Chem. Phys. 103, 5674 (1995).
4. S. L. Dixon and K. M. Merz, Jr., in Encyclopedia of Computational Chemistry, edited by P. v. R. Schleyer (Wiley & Sons Ltd, Baffins Lane, Chichester, 1998), Vol. 1, p. 762.
5. X. He and K. M. Merz, J. Chem. Theory Comput. 6(2), 405 (2010).
6. K. A. Dill, J. Biol. Chem. 272(2), 701 (1997).
7. K. M. Merz, J. Chem. Theory Comput. 6(5), 1769 (2010).
8. J. C. Faver, M. L. Benson, X. He, B. P. Roberts, B. Wang, M. S. Marshall, M. R. Kennedy, D. C. Sherrill, and K. M. Merz, J. Chem. Theory Comput. 7(3), 790 (2011).
9. J. C. Faver, M. L. Benson, X. He, B. P. Roberts, B. Wang, M. S. Marshall, D. C. Sherrill, and K. M. Merz, PLoS ONE 6(4), e18868 (2011).
10. S. F. Boys and F. Bernardi, Mol. Phys. 19(4), 553 (1970).
11. D. Moran, A. C. Simmonett, F. E. Leach, W. D. Allen, P. V. Schleyer, and H. F. Schaefer, J. Am. Chem. Soc. 128(29), 9342 (2006).
12. D. Asturiol, M. Duran, and P. Salvador, J. Chem. Phys. 128, 144108 (2008).
13. R. M. Balabin, J. Chem. Phys. 132, 231101 (2010).
14. F. Jensen, J. Chem. Theory Comput. 6(1), 100 (2010).
15. M. N. Ucisik, D. S. Dashti, J. C. Faver, and K. M. Merz, J. Chem. Phys. 135, 085101 (2011).
16. J. M. Word, S. C. Lovell, J. S. Richardson, and D. C. Richardson, J. Mol. Biol. 285(4), 1735 (1999).
17. V. Hornak, R. Abel, A. Okur, B. Strockbine, A. Roitberg, and C. Simmerling, Proteins 65(3), 712 (2006).
18. D. A. Case, T. A. Darden, I. T. E. Cheatham, C. L. Simmerling, J. Wang, R. R. E. Duke, and R. C. W. Luo, AMBER 11, University of California, San Francisco, 2010.
19. M. J. Frisch, G. W. Trucks, H. B. Schlegel et al., GAUSSIAN 03, Revision E.01, Gaussian, Inc., Wallingford, CT, 2004.
20. X. He, L. Fusti-Molnar, G. L. Cui, and K. M. Merz, J. Phys. Chem. B 113(15), 5290 (2009).
21. See supplementary material at http://dx.doi.org/10.1063/1.3641894 for a description of the molecular fragmentation algorithm used in this study.