CHC Omeka in Azure Overview, Integrity Check, and Restoration
A note about Technical Level of Detail: The purpose of this article is to provide a concise overview of the critical components of the CHC Omeka instance in Azure with special attention to how these are backed up and restored if needed. Lots of technical details are omitted but can be learned by exploring the Omeka documentation, Installing Omeka. All of the pages we will visit in the Azure Portal have associated help documentation that can be reviewed for deeper understanding.
Topic Index
- The CHC's Omeka Repository in Azure
- A Dynamic Collection of Information Assets
- Walking the Fences: Tour the Essential Omeka Components and their Backups in the Azure Portal
- The Docker Registry and Deployment Images
- The CHC Omeka Web Application
- Azure File Shares
- MySQL Database
- Building an Empty CDASH Instance
- Recovering a Past State of CDASH
The CHC's Omeka Repository and Publishing Platform
Omeka is an off-the-shelf repository and publishing tool. The CHC Omeka repository is capable of hosting any number of collections. What we think of as CDASH is a collection hosted in the Omeka application. What makes CDASH particularly unique is a customized conceptual schema of related Places, Documents and Folders and a customized Omeka Theme that provides the map-driven user interface.
The Historical Commission's installation of Omeka is implemented as an Azure Web Application for Linux -- a virtual server somewhere on the internet cloud.
The diagram in figure 1 shows that the CHC Omeka instance running in the city's Azure cloud resource group has three primary components and two supplemental file shares:
Primary Components
The primary components are critical to the function of Omeka. These would be the most important to understand for the verification of backup and recovery capabilities.
- Docker Container Image that contains all of the customized Omeka code and modules.
- A CHCPersist file share that contains the CDASH Theme, some config files, and the place where Omeka will store images and other digital media that represent the digitized documents.
- A MySQL database that holds all of the properties of Places, Documents and Folders and the relationships that bind them together.
Supplemental File Shares
- CHCScans: During the item import process, original media files and metadata are archived in the CHCScans file share. The CHCScans file share and its contents will be discussed in a forthcoming web page about importing items.
- CHCOffline: Here, you will find:
  - The files necessary to build the Docker image for the development and production instances.
  - Code for preparing scans for bulk import.
  - Private documentation that may be of interest to the Cambridge IT department.
A Dynamic, Growing Collection of Information Assets
Safe operation of Omeka and understanding of recovery scenarios start with an understanding of how information accumulates in the database and file systems. It accumulates in the following ways:
- Bulk upload of Folder, Place and Document metadata and Media through the customized CDASH CSV Import Module.
- Manual creation of Place and Document items -- with associated Media -- and Folders.
- Enrichment of item properties: experts improving the naming, classification, annotation and locations of CDASH items.
- Modification of the Relationships among folders, places and documents.
- Creation of Pages and Exhibits
All of the changes itemized above are recorded in the MySQL database by the Omeka application.
Media files that are uploaded to Omeka are stored in the persistent file system, CHC_Persist, within the Files subdirectory.
Walking the Fences: Tour the Essential Omeka Components and their Backups in the Azure Portal
Assuming you are a CHC Omeka Administrator, or a member of the Cambridge IT staff, you should be able to launch the Microsoft Azure Portal where you will find yourself sitting at the controls for the CHC_Omeka_Island Resource Group. CHC_Omeka_Island is a domain that holds all of the components for the CDASH production instance and the stage instance. A person with the right permissions can also create new instances here.
Backups and Recovery
Azure has many measures for providing redundancy for recovery from hardware failure. Azure also provides customizable scheduled snapshots that allow for recovery from accidental data loss. It is critical for the CDASH administrator to understand how these work and to regularly verify that they are working. For more information about planning for successful recovery scenarios, see CDASH Disaster and Accident Recovery. The sections below discuss each of the critical CDASH components and their snapshot and recovery provisions.
The Docker Registry and Deployment Images
Omeka is built on a commonly used web server configuration known as LAMP (Linux, Apache, MySQL, and PHP). The Omeka-S Installation documentation explains how preparing an Omeka installation boils down to putting a bunch of files into a properly configured LAMP stack that includes several special modules and config files. Creating instances of Omeka used to involve a lot of set-up for each server instance. Thankfully, our Omeka installation has been set up as a Docker Container Image that will reliably build a running installation of the CHC's particular Omeka site -- with CDASH customizations -- on any computer, physical or virtual.
The original CHC Omeka instance, known as the Development Instance, is hosted on a private workstation. For more information see: Building and Configuring the CDASH Development Instance. When the development instance is ready for deployment, a slightly different set of Docker instructions is prepared for running Omeka in Azure. You can think of Docker Container Images as the installer files for a CHC Omeka instance. The files used to build the container images for the Development and Production instances are copied to the CHCOffline file share.
The Docker container images for the production instance of CHC Omeka are stored in CHCRegistry, a private Docker registry hosted in the CHCOmeka Island resource group.
The CHC Omeka Web Application
The core of the CHC Omeka instance is an Azure Web Application for Linux. You can think of this as the web server that renders pages to web browsers. The web application is built from the Docker image. To see how the web app references the Docker image, go to the CHCOmeka/Deployment Settings.
Backup and Restore of Web Applications
Azure automatically creates a snapshot of web app services every hour. The hourly backups are saved for 3 days, 3-hour snapshots are saved for 14 days, and 6-hour snapshots are saved for 30 days. Snapshots contain all of the settings for the web app. App services are recovered to a deployment slot that is attached to the original app service. See Back up and restore your app in Azure App Service.
The administrator should be aware, however, that recovering the App service will not recover the database or media files. In most recovery scenarios, it would not be necessary to recover the app service. The information stored in Omeka is stored in the CHCPersist file system and in the MySQL database.
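As a rough aid for reasoning about the retention schedule described in this section, the Python sketch below returns the snapshot spacing to expect for a restore point of a given age. The function name and the tier table are illustrative only. The figures are transcribed from this article rather than read from Azure, so they should be checked against the current Azure documentation before being relied on.

```python
from datetime import timedelta

# Retention tiers as described in this article: (maximum age, snapshot spacing).
# These values are copied from the text, not queried from Azure.
RETENTION_TIERS = [
    (timedelta(days=3), timedelta(hours=1)),
    (timedelta(days=14), timedelta(hours=3)),
    (timedelta(days=30), timedelta(hours=6)),
]


def expected_snapshot_spacing(age):
    """Return the spacing between snapshots for a restore point of this age.

    Returns None when the restore point is older than the longest retention.
    """
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    for max_age, spacing in RETENTION_TIERS:
        if age <= max_age:
            return spacing
    return None
```

For example, a restore point from ten days ago falls in the second tier, so the nearest available snapshot may be up to three hours away from the moment you want.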
Azure File Shares
As discussed above, there are three file shares associated with the CHC Omeka installation. The Omeka administrator should be aware of each of these especially with respect to their contents and Backup Status.
The file shares CHCPersist, CHCOffline and AltPersist are contained in the CHCPersist Storage Account. The CHCScans file share is in its own storage account. During our next rebuild of the CHC Omeka Island, we will consolidate these into a single storage account.
To examine the contents of each file share, you should do the following:
- Open the overview page for the storage account.
- Click Data Storage > File Shares
- Then click the name of the share to open the overview page for that share.
To browse the contents of the file share, click Browse.
To check the status of backup snapshots for a file-share, click Operations in the left-hand navigation menu, then click Backup. You should verify that there are 30 days of backups including one taken this morning.
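The 30-day check described above can be made less error-prone with a small script. The sketch below assumes you have copied the snapshot timestamps from the Backup page into a list by hand. It does not talk to Azure, and the function name is hypothetical.

```python
from datetime import date, datetime, timedelta


def missing_backup_days(snapshot_times, today, days=30):
    """Return the calendar days in the last `days` days that have no snapshot.

    snapshot_times: iterable of datetime objects, one per snapshot.
    today: the date on which the check is run (counted as the first day).
    The result is sorted oldest first; an empty list means the check passes.
    """
    covered = {ts.date() for ts in snapshot_times}
    wanted = [today - timedelta(days=offset) for offset in range(days)]
    return sorted(day for day in wanted if day not in covered)
```

If the returned list contains today's date, this morning's backup did not run. Any other date in the list marks a gap in the 30-day history that should be investigated.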
The Omeka Database in MySQL
The last piece of the data storage architecture that we need to look at is the MySQL database where Omeka stores all of the settings and items that accumulate in the CDASH instance. Each instance of Omeka, e.g. the Production Instance and the Stage Instance, has its own tablespace in the same Azure MySQL server. In our case, the Azure MySQL server is named CHCMySQL. Take a look at its configuration. You can see the individual databases for Omeka (production) and Omeka_Alt (stage) as shown below.
Omeka Database Schema
As you can see, the Azure Database for MySQL is actually a host to multiple "databases". Alternate terms for a database in this sense may be Tablespace or Schema. For the purposes of our discussion here we are going to use the term Schema to refer to the MySQL database that an instance of Omeka uses.
You will see that the CHCMySQL server has several schemas. The two of interest for us are:
- Omeka: Schema for the production instance
- Omeka_Alt: Schema for the Stage instance
Connecting Omeka to a Database
Part of setting up or restoring an Omeka instance involves connecting the instance to a MySQL database. If you are setting up a new instance of CDASH, you will need to create a schema in a MySQL server and point to it using the config/database.ini file that exists in the persist/config folder.
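To make the shape of that file concrete, here is a minimal Python sketch that renders the text of a database.ini. The key names (user, password, dbname, host, port) are assumed to match the template that ships with Omeka S, so compare the output with the existing file in persist/config before using it. The function name and every value in the usage example are placeholders, not the real CHC settings.

```python
def render_database_ini(user, password, dbname, host, port=None):
    """Return the text of a database.ini file for one Omeka instance."""
    values = {"user": user, "password": password, "dbname": dbname, "host": host}
    if port is not None:
        values["port"] = str(port)
    lines = []
    for key, value in values.items():
        # Keep the sketch simple: refuse values that would need INI escaping.
        if '"' in value or "\n" in value:
            raise ValueError(f"{key} must not contain quotes or newlines")
        lines.append(f'{key} = "{value}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder credentials and host name, for illustration only.
    print(render_database_ini("stage_user", "change-me", "Omeka_Alt",
                              "mysql-host.example.invalid"))
```

The dbname value is the schema name, so switching an instance between the production and stage schemas comes down to changing that one line.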
Use MySQL Workbench to explore and manage tables and schemas
Azure's tools for backing up and exporting databases are very coarse-grained. If you want to explore tables and import or export individual schemas, you should use MySQL Workbench.
MySQL Backup
It is useful to know that our CHCMySQL database server automatically stores a complete backup every day. The snapshots are saved for 30 days. Any snapshot can be restored to a new Azure MySQL server. The illustration below shows the details and backup / restore functions at CHCMySQL -> Settings -> Backup Restore:
Cloning a new Empty Instance of CHCOmeka
Creating a completely new empty instance of the CHCOmeka web app is a useful exercise and also a way to make a new stage instance.
- Create a new Web App for Linux in Azure
- Use the web app deployment tools to choose the latest CDASH docker image.
- Create a new file share for persistent files. The contents for this file share can be copied from the existing chcpersist file share, but do not copy the files directory. Instead, create an empty Files directory in your new persist file share.
- Mount the new persist file share onto your new app service as /var/www/html/persist.
- Create a new Omeka tablespace in an Azure MySQL Database Server. You can create this by importing the initial CDASH schema that has been saved to the CHCProdOffline file share in the db_exports folder.
Note that the CDASH schema does not need to be named Omeka. Be careful that you don't wipe out the production Omeka schema!
- Modify the database.ini file to point to your new MySQL schema.
- Point your browser at the new CDASH instance. Omeka will ask you to log in. If you can't find your login credentials, check the CDASH memo in your local admin wiki.
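One way to act on the warning about the production schema is to check the name you intend to use before importing anything. The Python sketch below is a hypothetical helper, not part of Omeka or Azure. The protected names are the two schemas named in this article, and the comparison ignores case to stay on the safe side.

```python
# Schemas named in this article that a new instance must never overwrite.
PROTECTED_SCHEMAS = {"omeka", "omeka_alt"}


def check_new_schema_name(name, existing=()):
    """Return `name` if it is safe to create, otherwise raise ValueError.

    existing: names of other schemas already present on the server,
    for example as listed in MySQL Workbench.
    """
    taken = PROTECTED_SCHEMAS | {schema.lower() for schema in existing}
    if name.lower() in taken:
        raise ValueError(f"schema {name!r} already exists; choose another name")
    return name
```

Calling check_new_schema_name("Omeka") raises an error, while a fresh name that is not yet on the server is returned unchanged.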
Recovering a Prior State of CDASH from Azure Backups
Recovering a former state of a CDASH instance is similar to creating a new empty CDASH instance, except that if your recovery state was built on the same docker image that is currently in use, you can save yourself the trouble of creating a brand new web application by simply creating a new Deployment Slot on the existing CDASH web app.
- Create a new deployment slot for the CHCOmeka web app.
- Figure out what date and time you need to restore. If media files have been uploaded in the period covered since the chosen date and time, then you must recover the CHC Persist file share. In particular, you need to recover the contents of the persist/files directory. If you don't need to restore media, then you can skip the next two steps.
- Restore the CHCPersist file share as a new file share in an existing storage account.
- Mount the new persist file share to the CDASH web app as /var/www/html/persist.
- Restore the CHCMySQL database server to a new Azure Database for MySQL.
- Modify the config/database.ini file in your newly restored persist file share to point to the newly restored schema.
- Point your browser at your new deployment slot.
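The decision in the second step above can be sketched in Python. Given the moment you want to return to, the function picks the latest snapshot taken at or before that moment and reports whether the persist file share also needs to be restored. The function name and inputs are hypothetical. The timestamps would be gathered by hand from the Azure Portal and from the upload dates of media in Omeka.

```python
from datetime import datetime


def plan_restore(restore_point, snapshot_times, media_upload_times):
    """Pick the snapshot to restore and decide whether files must be restored.

    restore_point: the datetime whose state you want to recover.
    snapshot_times: datetimes of the available backup snapshots.
    media_upload_times: datetimes at which media files were uploaded.
    """
    usable = [ts for ts in snapshot_times if ts <= restore_point]
    if not usable:
        raise ValueError("no snapshot was taken at or before the restore point")
    # Media uploaded after the restore point means persist/files has changed.
    restore_files = any(ts > restore_point for ts in media_upload_times)
    return {"snapshot": max(usable), "restore_file_share": restore_files}
```

When restore_file_share is False, the two file share steps can be skipped and only the database needs to be restored.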