CHC Omeka in Azure Overview, Integrity Check, and Restoration

Topic Index

The CHC's Omeka Repository in Azure

Omeka is a digital asset management system and web access platform based on the Linux, Apache, MySQL, PHP (LAMP) stack. An in-depth description of the architecture of Omeka can be found in the Omeka documentation, Installing Omeka.

An instance of Omeka is capable of hosting multiple Omeka sites. Currently, CDASH is the only site.

The Historical Commission's installation of Omeka is hosted in Microsoft Azure in a resource group named CHCOmekaIsland, provisioned by the Cambridge IT Department (ITD). The components of the resource group and their interfaces are illustrated below.

Figure 1: Architecture of the CHC Omeka Resource Group in Azure

CHCOmeka Instance -- an Azure Web Application For Linux

The core of the CHCOmeka installation is hosted as an Azure Web Application for Linux. This is a virtual Linux server that can be accessed through the Azure Portal. The portal console provides monitoring tools and access to the Linux command line through the Secure Shell (SSH) interface, which is found in the Development Tools section of the web app's left-hand navigation menu.

Omeka Instances and Sites

An Omeka installation, or instance, is capable of hosting multiple "sites" that can each have a distinctive theme and will appear to users to be different web sites. The distinction between instances and sites can be confusing. In our case, we have two instances of Omeka: the Production Instance and the Stage Instance. Each instance can host multiple sites, but currently each instance hosts only one site: CDASH.

The CHCOmeka Stage Instance

Not shown in the diagram above is a second instance of CHCOmeka that is used for testing and staging new features and content before they are deployed to the production instance. The stage instance, named CHCOmekaStage, is identical in architecture to the production instance. Each instance has its own database schema in the MySQL server and its own persistent file shares.
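The relationship between instances and sites can be summarized as a small data model. The sketch below is illustrative only: the class and field names are hypothetical, the production web app name is assumed to be CHCOmeka, and the file share assigned to each instance is taken from the storage sections later in this document.

```python
from dataclasses import dataclass, field


@dataclass
class OmekaInstance:
    """One Omeka installation; an instance may host several sites."""
    name: str            # web app name (production name is an assumption)
    persist_share: str   # persistent file share mounted on the web app
    sites: list[str] = field(default_factory=list)


# Both instances currently host a single site, CDASH.
production = OmekaInstance("CHCOmeka", "CHCPersist", ["CDASH"])
stage = OmekaInstance("CHCOmekaStage", "CHCAltPersist", ["CDASH"])

for instance in (production, stage):
    print(f"{instance.name}: sites={instance.sites}, share={instance.persist_share}")
```

An instance is the unit that owns a database schema and file shares; a site is only a themed view hosted inside an instance.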

The Docker Registry and Deployment Images

Docker is a popular tool for packaging and deploying web applications. The CHC Omeka instance is built from a Docker container image that contains the Omeka application, the CDASH theme, and all of the modules and configuration files that are necessary to run the CHC's customized version of Omeka. The Docker image is stored in a private Docker registry named CHCRegistry.

Docker Configuration Files

The Docker image is built from a set of instructions contained in a Dockerfile that is stored in the CHCOffline file share. The image is built on a private workstation known as the Development Instance. When the development instance is ready for deployment, a slightly different set of Docker instructions is prepared for running on a desktop workstation for development purposes. Details about how the development Docker instance is set up can be found at Building and Configuring the CDASH Development Instance.

The configuration files for the production instances are stored in the CHCOffline file share in the folder named Docker. The Docker image is built from these files and pushed to the CHCRegistry. Notes on the setup of the production instance are stored in the CHCOffline file share in the Documentation folder. Docker images in the registry can be viewed in the Services/Repositories section of the CHCRegistry resource group. Each image is named with a tag that lists the version of Omeka that is referenced and the date that the image was built. Whenever the web app is restarted, it is initialized with the latest version of the Docker image from the registry.
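As an illustration of the tagging convention, the following sketch builds a tag from an Omeka version and a build date, and picks the most recently built tag from a list. The exact tag format used in CHCRegistry is not documented here: the omeka<version>-<YYYYMMDD> pattern, the version numbers, and the function names are all assumptions made for the example.

```python
from datetime import date


def make_tag(omeka_version: str, built: date) -> str:
    """Build a tag such as 'omeka4.0.4-20240115' (hypothetical format)."""
    return f"omeka{omeka_version}-{built:%Y%m%d}"


def build_date(tag: str) -> str:
    """Return the YYYYMMDD suffix of a tag in the format above."""
    return tag.rsplit("-", 1)[1]


def most_recent(tags: list[str]) -> str:
    """Pick the tag with the latest build date."""
    return max(tags, key=build_date)


tags = [
    make_tag("4.0.4", date(2024, 1, 15)),
    make_tag("4.1.0", date(2024, 6, 3)),
]
print(most_recent(tags))
```

Writing the date as YYYYMMDD means the tags sort chronologically as plain text, which makes the newest image easy to spot when browsing the registry.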

The MySQL Database

The Docker image for the CHCOmeka instance contains the initial state of the application. The application is then configured with modules, themes, and config files, which are maintained in persistent file shares that are mounted on the web app. Additional configuration information, including resource templates and the properties of all of the items added to the repository, is stored in the Azure Database for MySQL server named CHCMySQL.

The MySQL database is accessed through the desktop application SQL Workbench. Administrators should be familiar with SQL Workbench, as it is a critical tool in the maintenance and recovery of the CHC Omeka instance.

Persistent File Shares

Because the Docker image only provides the initial state of an Omeka instance, files that are created or changed in the development and management of sites and collections are stored in persistent file shares that are mounted on the web application.

The CHCPersist Storage Account

Every Omeka-S installation includes five folders that are normally stored in the root folder of the Omeka installation. These are:

  • config: This folder contains configuration files for the Omeka application including database connection information and settings for modules and themes.
  • files: This folder contains all of the media files that have been uploaded to Omeka. These include images, PDFs and other media files that are attached to items in the repository.
  • logs: This folder contains log files that can be useful in diagnosing problems with the application.
  • modules: This folder contains the code for all of the modules that have been installed in the Omeka instance.
  • themes: This folder contains the code for all of the themes that have been installed in the Omeka instance.

In our Azure-based installation, these folders are stored in the CHCPersist file share, which is mounted on the web application as /var/www/html/persist. This means that the config folder is found at /var/www/html/persist/config, and so forth. These folders are represented in the Omeka-S root folder as symlinks that are created in the Docker image.
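The symlink arrangement can be sketched as follows. This is not the actual build step from the Dockerfile, which is not reproduced in this document; it is a minimal illustration that assumes the Omeka-S root is the parent of the persist mount, and it runs against a temporary directory rather than the real web root. The function name is hypothetical.

```python
import tempfile
from pathlib import Path

# The five folders kept on the persistent share.
PERSIST_FOLDERS = ["config", "files", "logs", "modules", "themes"]


def link_persist_folders(omeka_root: Path, persist_mount: Path) -> None:
    """Create omeka_root/<name> -> persist_mount/<name> for each folder."""
    for name in PERSIST_FOLDERS:
        target = persist_mount / name
        target.mkdir(parents=True, exist_ok=True)
        link = omeka_root / name
        # Skip names that already exist, including dangling symlinks.
        if not link.exists() and not link.is_symlink():
            link.symlink_to(target, target_is_directory=True)


# Demonstration in a temporary directory standing in for /var/www/html.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "html"
    root.mkdir()
    link_persist_folders(root, root / "persist")
    linked = sorted(p.name for p in root.iterdir() if p.is_symlink())
    print(linked)
```

With this layout, Omeka reads and writes its usual folder names while the data itself lives on the mounted share and survives a restart of the web app.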

.htaccess, robots.txt and access.log files

The config folder holds the various .ini files that are part of the Omeka setup. In our installation, the config folder also holds the .htaccess, robots.txt and access.log files that are useful for ongoing monitoring and management of the web service.

CHCOffline File Share

Resources related to the setup and management of the CHCOmeka web application that are not shared on the open web are stored in the CHCOffline file share. This file share contains the following folders:

  • Documentation: This folder contains private documentation that may be of interest to the Cambridge IT Department.
  • Docker: This folder contains the files necessary to build the Docker image for the production instance.
  • DB Exports: This folder contains SQL dumps that are taken monthly and at strategic times, such as before and after Omeka version upgrades. These dump files are made and recovered with the SQL Workbench application.
  • Database_Connect: This folder contains connection parameters for connecting to the MySQL servers in the CHCOmekaIsland resource group. These saved connections may not contain the actual login credentials, which may be obtained through the Azure portal.
  • ScanPrep: This folder contains code for preparing scans for bulk import.

CHCAltPersist File Share

The CHCAltPersist file share is a second persistent file share that is used by the stage instance of CHCOmeka.

CHCScans Storage Account

Media files uploaded to Omeka are stored in a Scans file share in the CHCScans storage account. Because of the number of scan files and the relatively low priority for speed, this separate storage account uses a medium access-speed tier.

The Scans file share is mounted in the web root folders on both the production and stage instances. The top level of the Scans file share contains a subfolder for each batch of media files uploaded to Omeka with the CSV Import Module. It also contains a subfolder named Stage, which serves as the side-load folder that is accessed by the Omeka CSV Bulk Upload module.

Even though the import process saves a copy of each of the original media files (renamed with a GUID assigned by Omeka), the original media files are kept on-line in Azure, as they are useful in some troubleshooting and recovery situations.
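One way the retained originals can help in troubleshooting is to confirm that a copy stored by Omeka still matches the original scan. The following is a minimal sketch, assuming both files are reachable as local paths (for example, through the mounted file shares). The function names are hypothetical and are not part of Omeka.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original: Path, stored_copy: Path) -> bool:
    """True when the two files have identical contents."""
    return sha256_of(original) == sha256_of(stored_copy)
```

Because the stored copy is renamed, the comparison has to be made on file contents rather than file names. Reading in chunks keeps memory use low for large scans.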