Server Architecture: UF Servers for the UF Digital Collections and SobekCM
Trusted Repository Audit Checklist (TRAC) for Trusted Digital Repository (TDR) Status
This documentation is scheduled to be updated during 2015-2016 as part of the TRAC project. For more on TRAC, please see:
The UF Libraries hold the full UF digital library collections, along with hosted partner collections, within a single SobekCM instance. All of these are sometimes referred to simply as UFDC (the UF Digital Collections, not to be confused with the UF Data Center or East Campus Data Center) rather than SobekCM@UF.
The UF Libraries use the centralized UF Data Centers and their hosting support for secure, highly available, and highly reliable systems. More information on UF's commodity hosting and related services is available on the UFIT website.
For UFDC/SobekCM@UF, the UF Libraries utilize the SobekCM Open Source Repository Software and UFIT services including:
INFORMATION BELOW PENDING UPDATE FROM TRAC PROJECT, ETA 2015-2016
UFDC Server Architecture
At UF, most web services and data services run on two servers. The data server hosts the SQL database, the caching service, and an instance of Solr/Lucene. A separate Aware JPEG2000 server provides zoomable access to JPEG2000 images of the digital resources. In addition, because of the number of pages (over seven million as of early 2011), a separate file server houses all of the digital resource files.
In the image above, the data server interacts directly with the file server because the SobekCM Bulk Loader runs continuously on the data server, monitoring various drop boxes and FTP folders for new incoming packages to load into the library. Although the web application also supports self-submission and direct file upload, the Bulk Loader allows quick, large-scale loading of items from local directories or remote servers.
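The drop-box monitoring described above can be illustrated with a minimal polling loop. This Python sketch is not the SobekCM Builder/Bulk Loader code; it assumes that each incoming package arrives as one subdirectory of a drop box, and `load_package` is a hypothetical callback standing in for the actual loading step.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Iterable
from pathlib import Path


def find_new_packages(drop_boxes: Iterable[Path], seen: set[Path]) -> list[Path]:
    """List package folders in the drop boxes that have not been loaded yet.

    Assumption: each incoming package is one subdirectory of a drop box.
    """
    new_packages: list[Path] = []
    for box in drop_boxes:
        if not box.is_dir():
            continue  # drop box missing or not mounted; check again next cycle
        for entry in sorted(box.iterdir()):
            if entry.is_dir() and entry not in seen:
                new_packages.append(entry)
    return new_packages


def poll_drop_boxes(
    drop_boxes: list[Path],
    load_package: Callable[[Path], None],
    interval_seconds: float = 60.0,
    max_cycles: int | None = None,
) -> set[Path]:
    """Repeatedly scan the drop boxes and pass each new package to load_package.

    With max_cycles=None this loops forever, like a continuously running service.
    """
    seen: set[Path] = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for package in find_new_packages(drop_boxes, seen):
            load_package(package)  # hypothetical loader callback
            seen.add(package)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)  # wait before the next scan
    return seen
```

A production loader would also need to confirm that a package has finished arriving (for example, over FTP) before loading it; this sketch leaves that check out for brevity.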
UFDC Server Details
All three application servers described below run as virtual machines on the CNS VMware vSphere 4 infrastructure. This configuration currently supports over 10 million page images, 400,000 volumes, and 5.5 million human hits a month, plus ten search engine robot hits a second.
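The traffic figures above mix a monthly rate (human hits) with a per-second rate (robot hits). Converting both to the same units makes them comparable. This sketch assumes a 30-day month and traffic spread evenly over it, so real peak rates will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(hits_per_month: float, days_per_month: float = 30.0) -> float:
    """Average rate, assuming hits are spread evenly over the month."""
    return hits_per_month / (days_per_month * SECONDS_PER_DAY)


def per_month(hits_per_second: float, days_per_month: float = 30.0) -> float:
    """Monthly total for a steady per-second rate."""
    return hits_per_second * days_per_month * SECONDS_PER_DAY


human_rate = per_second(5_500_000)  # human visitors, average hits per second
robot_total = per_month(10)         # search engine robots, hits per month
print(f"humans: {human_rate:.2f} hits/s; robots: {robot_total:,.0f} hits/month")
# prints "humans: 2.12 hits/s; robots: 25,920,000 hits/month"
```

On these assumptions, robot traffic (about 25.9 million hits a month) is roughly 4.7 times the human traffic, which is worth keeping in mind when sizing the web and data servers.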
Web Server
This server runs the ASP.NET web application that supports UFDC, dLOC, and several other named instances/portals. It is therefore publicly accessible over HTTP and runs the IIS 7.0 web server.
Data Server
Access to this server is tightly restricted: only the web server and several desktops used for development and testing can reach it. It runs Microsoft SQL Server 2008 Enterprise, along with Apache Tomcat to host the Solr/Lucene indexes used for full-text searching. This server also runs the SobekCM Builder/Bulk Loader service. It previously hosted the caching service as well, but that service is no longer used because of changes to the web application's memory profile.
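The restricted access described above amounts to a default-deny allow-list: a connection to the data server is accepted only from the web server or from the development/testing desktops. This Python sketch models that rule with the standard `ipaddress` module. The addresses are hypothetical, taken from the RFC 5737 documentation ranges; the actual UF addresses and firewall configuration are not described in this document.

```python
from ipaddress import ip_address, ip_network

# Hypothetical addresses (RFC 5737 documentation ranges), not real UF hosts.
WEB_SERVER = ip_network("192.0.2.10/32")
DEV_DESKTOPS = ip_network("198.51.100.0/28")  # 198.51.100.0 - 198.51.100.15
DATA_SERVER_ALLOWED = (WEB_SERVER, DEV_DESKTOPS)


def may_connect_to_data_server(source: str) -> bool:
    """Default deny: allow only sources inside an allow-listed network."""
    addr = ip_address(source)
    return any(addr in net for net in DATA_SERVER_ALLOWED)


print(may_connect_to_data_server("192.0.2.10"))   # the web server
print(may_connect_to_data_server("203.0.113.7"))  # an arbitrary outside host
```

In practice this rule would be enforced by a firewall or network ACL rather than by application code; the sketch only makes the access policy explicit.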
Aware JPEG2000 Server
This server's only purpose is to host the Aware JPEG2000 server and an experimental SobekCM Image Server used for highlighting regions on page images. It is therefore web-accessible, so that users can view the zoomable images, and runs the Apache Tomcat application/web server.
File Server
The file server provides 14 TB of shared space on CNS-managed storage. This space is currently 90% full with all of the digital resource files for the system.
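A quick back-of-the-envelope calculation shows what the 14 TB / 90% figure means in practice. This sketch assumes decimal units (1 TB = 1,000,000 MB), uses the "over 10 million page images" figure from the server details above (so the per-page average is an upper bound), and averages all stored files across all pages rather than measuring any single image format.

```python
from typing import NamedTuple


class StorageSummary(NamedTuple):
    used_tb: float
    free_tb: float
    avg_mb_per_page: float
    pages_of_headroom: float


def summarize_storage(capacity_tb: float, fraction_used: float, page_images: int) -> StorageSummary:
    """Rough capacity figures in decimal units (1 TB = 1,000,000 MB)."""
    used_tb = capacity_tb * fraction_used
    free_tb = capacity_tb - used_tb
    # Average of all stored files divided by page count, not the size of one image file.
    avg_mb_per_page = used_tb * 1_000_000 / page_images
    # Further pages that fit if new material matches the current average.
    pages_of_headroom = free_tb * 1_000_000 / avg_mb_per_page
    return StorageSummary(used_tb, free_tb, avg_mb_per_page, pages_of_headroom)


summary = summarize_storage(capacity_tb=14, fraction_used=0.90, page_images=10_000_000)
print(summary)
```

On these assumptions, about 12.6 TB is in use, at most roughly 1.26 MB per page, leaving about 1.4 TB, which is room for only around 1.1 million more pages at the same average.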