Monthly Usage Statistics

For software-specific information, please see the Sobek Repository Community site: http://sobekrepository.org/sobekcm/monthlystats


Usage statistics are compiled each month from the IIS server logs. In addition, all material within UFDC is analyzed to determine whether it should be linked to a newly registered user. Once the statistics are compiled and stored, and users are linked to their items, a monthly usage email is sent to each user linked to items that received usage during the month.

Relationships between Users and Items

Users can be linked to items in a variety of ways. By default, the SobekCM system includes the following possible relationships:

  1. Submittor
  2. Author
  3. Contributor
  4. ANALYZED; NO RELATION - used when a registered user should not be linked to an item and to suppress later analysis.
  5. Thesis Advisor
  6. Unknown

Users are linked to items in the mySobek_User_Item_Link table in the database.
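
As a minimal sketch, the relationship list above could be represented in a script as an enumeration. The member names and the helper function are hypothetical, and the numbering assumes the relationship IDs in the database follow the order of the list; verify the actual values in your own database before relying on them.

```python
from enum import IntEnum


class UserItemRelationship(IntEnum):
    """Hypothetical names; IDs are ASSUMED to match the numbered list above."""
    SUBMITTOR = 1
    AUTHOR = 2
    CONTRIBUTOR = 3
    ANALYZED_NO_RELATION = 4  # user reviewed, should not be linked
    THESIS_ADVISOR = 5
    UNKNOWN = 6


def is_real_link(relationship_id: int) -> bool:
    """Return True unless the ID is the 'ANALYZED; NO RELATION' marker."""
    relationship = UserItemRelationship(relationship_id)
    return relationship is not UserItemRelationship.ANALYZED_NO_RELATION
```

An unrecognized ID raises ValueError, which is a deliberate choice here so that a typing mistake in the spreadsheet is caught before anything is inserted.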

Monthly Process

The monthly process is detailed below:

  1. Run the SobekCM_Admin_Suggest_User_Item_Links stored procedure in the database, which outputs a spreadsheet. This compares logged user names to the list of authors for all material in the system that is not already linked to that author.
  2. Step through each username in the spreadsheet and decide how to link the user with the material. For certain users within your system, it will be obvious which links belong; for others, some research may be necessary. Then add a relationship ID based on the value from the list above.
    1. The review process is supported by the Curator for Digital Collections and Digital Scholarship Librarian.
  3. Transform the first three columns into insert commands for the mySobek_User_Item_Link table.
  4. At the same time, pull the web logs and build the statistics:
    • Using the SobekCM_Statistics_Reader application, read the IIS web logs and create insert commands to add all the new monthly stats to items, aggregations, etc. (but not to users).
    • Examine the users with the most hits (usually the top 1000) and their 'user agent' information (i.e., which browser, application, or operating system) to determine whether each is actually a search engine robot. If hits logged as non-robot turn out to come from robots, update the search engine robot identification algorithm and re-run the stats collection. Update the web application's search engine robot identification algorithm similarly.
    • Run the generated statistics insert commands against the database.
  5. Once all the usage stats for the month and the new relationships are added to the database, run the SobekCM_Statistics_Aggregate stored procedure, which allows itself to be run only once for each month. This rolls the statistics upward: for example, each aggregation receives the item usage statistics for all the items linked to it. It also marks which users have linked items with statistics, so that their emails are queued and the option is added to the mySobek user menu.
  6. Run the SobekCM_Statistics_Reader application and send all the usage emails for the month(s) in question.
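
Step 3 above can be scripted rather than done by hand in the spreadsheet. The following is only a sketch under stated assumptions: the spreadsheet has been saved as CSV with a header row, its first three columns hold integer user, item, and relationship IDs, and the column names UserID, ItemID, and RelationshipID are hypothetical placeholders for the real mySobek_User_Item_Link column names.

```python
import csv
import io

# HYPOTHETICAL column names; check the real mySobek_User_Item_Link schema.
COLUMNS = ("UserID", "ItemID", "RelationshipID")


def build_insert_statements(csv_text: str) -> list[str]:
    """Turn reviewed spreadsheet rows into SQL insert commands."""
    statements = []
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    for row in reader:
        # Skip blank rows and rows with no relationship ID assigned yet.
        if len(row) < 3 or not row[2].strip():
            continue
        # int() rejects anything that is not a plain integer, so no free
        # text from the spreadsheet ever reaches the generated SQL.
        user_id, item_id, relationship_id = (int(cell) for cell in row[:3])
        statements.append(
            f"insert into mySobek_User_Item_Link ({', '.join(COLUMNS)}) "
            f"values ({user_id}, {item_id}, {relationship_id});"
        )
    return statements
```

Rows left without a relationship ID are skipped rather than guessed at, so an unfinished review never produces a link.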

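The robot review in step 4 can be partly automated before the manual check. The sketch below is not the SobekCM robot identification algorithm; it assumes the hits have already been parsed into (client IP, user agent) pairs, and the pattern, function name, and parameters are all illustrative. It counts hits per client, takes the busiest clients, and reports those not yet identified as robots whose user agent contains a common crawler keyword.

```python
import re
from collections import Counter

# Illustrative keywords only; real robots may not announce themselves.
ROBOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def suspected_robots(hits, known_robot_ips=frozenset(), top_n=1000):
    """Return (ip, hit_count) for top clients that look like unflagged robots.

    hits: iterable of (client_ip, user_agent) pairs, one per logged request.
    known_robot_ips: clients the current algorithm already treats as robots.
    """
    counts = Counter()
    agents = {}
    for ip, agent in hits:
        counts[ip] += 1
        agents.setdefault(ip, set()).add(agent)

    suspects = []
    for ip, hit_count in counts.most_common(top_n):
        if ip in known_robot_ips:
            continue  # already handled by the existing algorithm
        if any(ROBOT_PATTERN.search(agent) for agent in agents[ip]):
            suspects.append((ip, hit_count))
    return suspects
```

Any client this returns is only a candidate. The user agent strings still need a manual look before the identification algorithm is changed and the stats collection is re-run.
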
Future Plans

Future plans include integrating this process into the SobekCM_Builder, rather than using a separate application, and adding an option for it to perform the inserts directly against the database.

In addition, future plans include an auto-suggest shown directly to users when they log on, asking whether recently added material that appears to be linked to them should be linked. Likewise, when a new user registers, it would be good to ask whether the user is linked to any of the existing items.