Data Warehousing and Building Analytics at Cooper Hewitt, Smithsonian Design Museum

A paper I wrote about the collection and analysis of large amounts of data generated by interactive devices in a museum.

“Binders and boxes on shelves in a large archive” by Samuel Zeller on Unsplash

Abstract

Cooper Hewitt, Smithsonian Design Museum (CHSDM) is generating and collecting large amounts of data. This data, generated by the use of an interactive pen (The Pen), presents an opportunity for CHSDM to develop a deeper understanding of its visitors.

To better understand the data, CHSDM has started to develop new analytics tools that let its staff look into all of its datasets at once, generating queries and reports in a more holistic and cohesive way by using a technique known as Data Warehousing.

This paper will use data collected and generated by CHSDM as a case study. At the time of this writing The Pen has been issued to over 130,000 visitors and used to collect over 3 million objects in galleries and through interactive tables. Each time an object is collected, a record is created of the Pen that was used to collect it, the time and date it was collected, and whether it was collected by tapping the Pen to a label near an object or through the use of an interactive table.

CHSDM knows which objects are the most popular at any given time, how long visitors spend in its galleries, and many other quantitative facts about visitor behavior it previously was unable to understand. The museum is beginning to be able to develop a deep understanding around the ways its visitors behave with the Pen and this paper will attempt to explain how it can continue to develop the tools necessary to dig even deeper.

Background

For just about as long as museums have been around, they have been in the business of collecting and storing data. Museums collect data about objects, data about people, and even data about time. One could argue that a museum is in a way a kind of data warehouse.

Initially, museums focused their efforts on understanding and making accessible their collection meta-data. Today, museum collection meta-data is common, and more recently, museums have begun to take steps to make this meta-data available to the public.

The publication of this data has allowed the public, students, scholars and researchers easy access to the knowledge developed by the museum around its holdings. Building rich collections websites allows this knowledge to be more discoverable, offering new ways of learning.

But collection meta-data is only one aspect of the kind of data one might find in and around a place like a museum. More recently, museums have begun collecting and storing meta-data about visits to the museum’s website, data about ticket sales and attendance at public programs, and even the temperature and humidity inside their galleries.

In parallel to this extensive work to generate and collect the data, there is the notion that there is some sort of underlying knowledge living within. The idea being that if institutions like museums are collecting all of this data, they will one day understand more about the museum, more about their visitors, and eventually be able to be more successful as an institution.

In 2014, CHSDM reopened to the public after a long period of renovation and redesign. To coincide with the museum’s reopening, CHSDM launched a re-imagined visitor experience, filled with technology and innovation and built from the ground up. One of the more interesting aspects of this new experience (The New Experience) at CHSDM is certainly the Pen. The Pen is an interactive device that is issued to each visitor allowing one to collect the things they find interesting and store them on a personal web-page for retrieval at a later date.

There have been many writings, reviews, and discussions about the Pen and its implications within the context of a museum. This paper will address the behind the scenes aspects of the Pen, all of the data it generates and how these sources of data may be used to help the museum begin to ask questions around the behavior of its visitors.

Beginning Analytics

Analytics is a set of tools and methods which allow one to understand some aspect of a complex data-set. Formally, Analytics is the systematic or computational analysis of data or statistics (Google Definition). It’s important to understand that analytics is not data, and data is not analytics.

For a long while, museums have been using tools such as Google Analytics to better understand the Internet traffic to their websites. Internet traffic to a museum’s website can offer insight into its audience and offerings, and can help museum staff to make informed decisions and develop a strategy for future efforts. Through the use of Google Analytics museums are able to easily understand who is coming to their websites, what these web visitors are interested in, and what they might find frustrating or confusing. Overall, Google Analytics is an incredibly powerful tool that has been used for years to uncover just one aspect of a museum’s operations.

One of the more interesting side effects of the use of Google Analytics is that it has raised the expectation about what is possible. The tool itself is very easy to use and after a short tutorial can prove very enlightening to even the least tech-savvy employee.

Another major data-set within a museum is its collection meta-data. This is typically the data stored in collection databases such as The Museum System (TMS), a powerful collections management tool created by Gallery Systems. These types of databases were originally designed as repositories of information to help a museum store and locate objects in its collections. Collections management systems were originally meant to behave like inventory management applications, allowing a museum to track its vast collections and the movements of all its objects and holdings.

Eventually, collection meta-data became interesting in another way. Museums began to use collection meta-data as a new way of exploring the shape of their collections. While an individual collection record might tell the story of a single object in the collection, the total set of data becomes useful in telling the story of the museum’s collection as a whole.

One of the most popular analytics tools for exploring collection data is the museum’s collections website. Museums the world over have gone to great lengths to put their collections online, with the usual purpose being to allow easier access for their visitors. The end result is an extensive tool, similar to Google Analytics, which allows anyone to learn about the shape and purpose of a collection by dividing it up into browsable and searchable parts. At CHSDM, this idea of browsability and discovery is at the forefront of the work, often leading the creative direction across the entire visitor experience.

A System of Parts

CHSDM has developed a complex system of parts and services that make the entire museum experience possible. Nearly every aspect of this complex system generates, collects and stores some kind of data.

The web servers powering www.cooperhewitt.org create log messages for every request, registrars record entries in TMS for every movement of an object, and ticketing systems record information about visitors and the tickets they purchase.

The Pen generates data each time a visitor receives one, uses it to collect an object, uses it to create a digital design, uses it to browse their collection at the digital tables and finally when they return it to the visitor services staff. Each system stores this data for safe keeping, and generates log messages to record the events that happen as they unfold. This data is all stored on a long list of servers and systems, each with their own complex configurations and formats.

Application Programming Interfaces

The nexus of all of the systems involved in the experience at CHSDM is its Application Programming Interfaces (APIs). These APIs are the glue that holds the entire system together and allows each service and system to communicate with one another. Tap the Pen to a table, and the data on the Pen is transmitted through an API to a database. Every time that interaction happens, log messages are created and stored.

An API is a set of procedures within an application that can be accessed from another application. One can think of an API like an old telephone operator. The phone is used to send instructions, the operator receives these instructions, and as long as the instructions sent are readable, the operator carries out the call, connecting the caller to whomever they were trying to reach.

In the case of the Pen, CHSDM has developed several APIs that allow all of the necessary components to communicate, so that when a visitor taps their pen at one of the digital tables, the call gets through to its expected destination, and the visitor is instantly connected with the right data on the other end.

Every time this interaction happens, the API makes a record of it in a database. Every time a visitor docks their pen at one of the digital tables, there is a log recorded. Each time an object label is tapped with a Pen and the Pen is read by one of the reader boards, another API method is called, more data is stored, and a log recording this event is saved in a database.

CHSDM now has several APIs. There is an API that deals with the purchasing of tickets at the visitor experience desk and online. Once a ticket has been purchased, the visitor experience staff “pairs” your ticket with a Pen. This act is critical, as it makes sure our systems know which Pen the visitor was using. The pairing procedure calls both the ticketing API and the Pen API, and connects these two sets of data together by storing their respective IDs next to each other in both databases.

Foreign Keys

CHSDM now has two separate databases (one having to do with tickets, and one having to do with the Pen) in two separate systems. The only connection between these two systems is the ID of the ticket and the ID of the Pen.

The data is stored in completely different physical locations, and with completely different underlying systems (MySQL vs. MSSQL) but the connection between the two is there, and this means CHSDM can theoretically connect the two data-sets together if it wants to.

In databases, this is called a Foreign Key, and it is one of the most common methods for creating relationships among data both inside the same database and between completely separate databases.
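
The pairing described above can be sketched as follows. This is a minimal illustration, not CHSDM's code: two in-memory SQLite databases stand in for the separate MySQL and SQL Server systems, and the table names, column names, and ID values are all hypothetical. When the key spans two separate databases, neither one can enforce the relationship, so the link holds only because the pairing step writes both sides.

```python
import sqlite3

# Two independent in-memory databases stand in for the separate
# ticketing and Pen systems (hypothetical schemas, not CHSDM's).
ticket_db = sqlite3.connect(":memory:")
pen_db = sqlite3.connect(":memory:")

ticket_db.execute("CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, pen_id INTEGER)")
pen_db.execute("CREATE TABLE pens (pen_id INTEGER PRIMARY KEY, ticket_id INTEGER)")


def pair(ticket_id, pen_id):
    """Record the pairing on both sides by storing each system's ID in the other."""
    ticket_db.execute("INSERT INTO tickets VALUES (?, ?)", (ticket_id, pen_id))
    pen_db.execute("INSERT INTO pens VALUES (?, ?)", (pen_id, ticket_id))


def pen_for_ticket(ticket_id):
    """Follow the stored key from the ticketing database into the Pen database."""
    row = ticket_db.execute(
        "SELECT pen_id FROM tickets WHERE ticket_id = ?", (ticket_id,)
    ).fetchone()
    if row is None:
        return None
    return pen_db.execute(
        "SELECT pen_id, ticket_id FROM pens WHERE pen_id = ?", (row[0],)
    ).fetchone()


pair(ticket_id=1001, pen_id=42)
print(pen_for_ticket(1001))  # (42, 1001)
```

Because the lookup crosses two connections, the "join" happens in application code rather than in SQL, which is part of what makes a single warehouse attractive.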

Log Files

As mentioned previously, the systems and services at CHSDM create millions of log messages. These messages are written as files on disks and are generated for a wide variety of reasons including the tracking of events happening with the Pen.

As an example, Diagram 01 illustrates a typical log message created when a visitor returns their Pen at the end of their visit.

Diagram — 01

The first message, which is collapsed in Diagram 01, is the “pen-return” log, indicating that the Pen was returned, and all the details about its safe return. The second message, also illustrated in Diagram 01, happened just before the “pen-return” and is an “item-collect” event. This refers to an object that the Pen was used to collect. In the boxes below, most of the details (some blurred for privacy reasons) about the item that this Pen was used to collect are visible. The system knows which object it collected (18796987), what time the object was collected, and which visit it has to do with (also blurred). These log messages are stored on a server within CHSDM’s data infrastructure, but as illustrated here, CHSDM has developed a simple interface so staff can view these logs at any time from an administration area on one of its websites.

At the digital tables, a Near Field Communication (NFC) reader board downloads all the data off the visitor’s Pen. This data is then formatted and sent to the Pen API, which processes the data and stores it before responding by displaying the items that the visitor collected on the screen in front of them. This operation invokes several log messages, recording every step in the process. If something were to go wrong during one of these steps, CHSDM staff would be able to see an error message like the one in Diagram 02, which would allow them to diagnose the problem with the underlying system.

In this case, the API responded with an “invalid user” error. This was probably a configuration problem with one of the underlying systems. A log like this can be used to diagnose such issues, and because CHSDM staff can view this data from within an administration panel, they can resolve these kinds of problems quickly.

Diagram — 02

Logstash

CHSDM has begun to accumulate millions of log files, stored on a multitude of systems. Log files are typically stored on the systems that generate them, and CHSDM has many of these systems. Log files are stored on server instances within the Amazon Web Services cloud, in the data center in Herndon, Virginia, and locally on PCs and Raspberry Pi computers in numerous locations throughout the museum.

In order to better deal with all of these logs, CHSDM has employed an open source system called Logstash. Logstash works as a conduit through which log messages travel. As systems generate log messages, they are sent to Logstash, which formats and processes them into a variety of outputs.

Logstash can be highly customized, allowing CHSDM to store copies of its logs in multiple locations and in multiple formats. At CHSDM logs get processed into text files. Additionally, each log message is formatted into JSON which is then inserted into an ElasticSearch index for ease of access and search-ability. This ElasticSearch index is used to create the administration pages illustrated in Diagram 01 and Diagram 02. These administration pages, leveraging the data generated by the logs, have become one of the main diagnostic and analytics tools for the entire system at CHSDM.
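
To make the "formatted into JSON" step concrete, here is a small Python sketch of that kind of transformation. It is not Logstash configuration and not CHSDM's log format: the `TIMESTAMP EVENT key=value` layout, the field names, and the timestamp and `pen_id` values are assumptions chosen for illustration. Only the object ID is borrowed from the example in Diagram 01.

```python
import json


def log_line_to_json(line):
    """Convert a 'TIMESTAMP EVENT key=value ...' line into a JSON document."""
    timestamp, event, *pairs = line.split()
    doc = {"timestamp": timestamp, "event": event}
    for pair in pairs:
        key, _, value = pair.partition("=")
        doc[key] = value  # values stay strings in this sketch
    return json.dumps(doc, sort_keys=True)


# Hypothetical log line in the assumed format.
line = "2016-01-16T20:03:11Z item-collect pen_id=42 object_id=18796987"
print(log_line_to_json(line))
```

Once each message is a structured document like this, a search index can filter on individual fields such as the event type or object ID instead of scanning raw text.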

Diagram — 03

Logstash comes packaged with an application known as Kibana (illustrated in Diagram 03). Kibana is an analytics tool which allows one to search through data stored in an ElasticSearch index. At CHSDM, Kibana has proven to be a very useful tool. It is easy to set up, and a great way to develop ideas around how one might use log data to gain insights.

However, CHSDM eventually found Kibana a little limited in what it could do. It also became troublesome to expose Kibana on the public Internet so that CHSDM staff could log in from anywhere to investigate a problem. To take control over how logs are analyzed and accessed, CHSDM staff decided to begin building administration pages that would allow one to view the data stored in the Logstash-generated ElasticSearch dataset.

Below is a series of diagrams of example log pages. CHSDM have built in graphic functions to provide a simple data visualization of what has been happening over time. CHSDM staff plan to expand the facility of these administration pages in future iterations.

Administration Diagrams

Diagram — 04

Pictured in Diagram 04 are two graphs representing “visits.” The top graph is “visits by day” where the green line represents the past 28 days, and the blue represents the 28 days prior. Pictured below are visits by hour. With this simple diagnostic tool CHSDM staff can easily see which days and times are the most active in the galleries.

Diagram — 05

Pictured in Diagram 05 is a series of useful administration charts. In these charts CHSDM staff can see data having to do with objects being collected each day. Staff members can see in the top graph, objects collected and created overall by day. Below, data is broken out into “collected with the Pen” and “collected at the tables.”

Diagram — 06

Pictured in Diagram 06, CHSDM staff are able to look at objects collected by hour. In this example staff can see that on January 16th, at 20:00 UTC (3pm in NYC), visitors used their Pens to collect and save 4772 objects. Studying the diagram closely, staff can see that the next day, on the 17th at 19:00 UTC (2pm in NYC), the museum had its daily peak. It is possible that this was simply the busiest time during these two days, explaining the peak, but it allows CHSDM staff to start asking questions. For example, why is this the busiest time? Do these peaks correspond to ticket sales? As illustrated here, CHSDM log files are starting to give its staff insight into its visitors’ behavior.
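
An hourly chart like this comes down to bucketing event timestamps by hour and converting the peak to local time. The sketch below shows one way to do that in Python. The timestamps are made up (including the year), and the fixed UTC-5 offset is an assumption that holds for New York in January but ignores daylight saving time.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset for New York in January (EST); an assumption that
# ignores daylight saving time.
EST = timezone(timedelta(hours=-5))


def collects_by_hour(timestamps):
    """Count events per UTC hour; keys are datetimes truncated to the hour."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        counts[moment.replace(minute=0, second=0)] += 1
    return counts


# Hypothetical "item-collect" timestamps.
events = ["2016-01-16T20:03:11Z", "2016-01-16T20:41:00Z", "2016-01-17T19:15:30Z"]
counts = collects_by_hour(events)
peak_hour, peak_count = counts.most_common(1)[0]
print(peak_hour.isoformat(), peak_count)
print(peak_hour.astimezone(EST).strftime("%H:%M"))  # 15:00
```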

Monitoring & Notifications

Log messages allow CHSDM staff to know when something has gone wrong. In the figures above, CHSDM staff can easily tell when things are working properly. Just as easily, CHSDM staff can start to notice when things are not working the way they should.

CHSDM staff use numerous services and tools to find out about problems quickly, and respond to them (and hopefully fix the problem) as soon as possible. Whenever something fails, it usually means visitors are stuck waiting, becoming frustrated and in general, not having a great experience. These are some of the tools that help CHSDM staff alleviate those frustrations.

  • Supervisord — Monitors many of CHSDM’s services. If something has stopped, it tries to start it back up again.
  • Watchdog — If the service can’t be restarted by Supervisord, Watchdog sends messages. These messages take the form of logs, but also alerts that wind up in CHSDM Slack channels.
  • New Relic — This service monitors the health of CHSDM applications and servers. CHSDM staff can log into a dashboard and inspect all the data having to do with nearly all components of CHSDM systems. If a server is straining to keep up with its load, New Relic sends an alert to Slack so CHSDM staff are alerted to the problem.
  • Pingdom — This service continually checks if a website is working or not. If it’s not, it sends CHSDM staff a text message and posts a message to Slack.
  • Slack — Slack is a tool CHSDM staff are using for internal communications. Additionally, Slack has become a hub for alerts sent to CHSDM staff by many of the systems listed above.

Re-play & Failing Gracefully

Another incredibly useful aspect of logging data has to do with the concept of replay and graceful failure. In a complex system with many moving parts, it’s only a matter of time before something breaks. Typically it’s one small piece of the larger complex system. It would be a shame if one single issue kept CHSDM visitors from having an enjoyable experience, so systems are designed to do their best to “fail gracefully.” This means that if one small part of the system becomes temporarily unavailable, the rest of the parts can continue to operate. Since each system produces log messages for every action it performs, the actions that the temporarily unavailable part missed can usually be “replayed” once it has recovered.

The following example will be used to illustrate this concept.

A visitor arrives at the museum and purchases a ticket. This invokes a series of requests to a collection of different services. First the ticket is created using a Constituent Relationship Management (CRM) system known as Tessitura. Once the ticket has been purchased and printed, it is scanned, and tested to ensure it is valid. Then the ticket is “paired” with a Pen which is handed to the visitor for the duration of their visit. If during this process there is a failure with the system that pairs the visitor’s ticket with their Pen, a problem would occur that would normally not allow the visitor services staff to issue the Pen. This could potentially cause a backup at the visitor services desk, resulting in many frustrated customers.

Instead of issuing an error, the system simply writes the log message to disk, and allows the system to carry on as if everything has worked. Later, when the Pen pairing API is back online, the system can go through these log messages and “re-play” the events that failed to work, ultimately presenting the visitor with a seamless experience.

However, this scenario doesn’t always work as planned. For example, if the visitor’s Pen was not successfully paired with their ticket and they went to visit their personal website, they wouldn’t see any of the things they’d collected during their visit. The experience they are expecting will become available to them once the events have been replayed, ideally before they check.

Diagram — 07

Diagram 07 illustrates what this might mean. In this example one can see that a Pen was used to collect object 18383769. If the visitor got home and didn’t see this object on their website as expected, they would hopefully send CHSDM an email explaining their issue. Because CHSDM has a log of this event taking place, CHSDM staff are able to press the “Re-Play This Activity” button, which would re-process this event, thus allowing the visitor to see the expected result.

Without this type of intensive logging, CHSDM wouldn’t be able to recreate the experience. In this instance CHSDM have the potential for a visitor to become temporarily frustrated, but have also ensured that it can eventually alleviate this frustration by replaying the events and producing the expected outcome.

Constituent Relationship Management

CHSDM’s Constituent Relationship Management system, Tessitura, is housed within the Smithsonian Data Center. Tessitura is an enterprise-class database built on Microsoft SQL Server. Client software is installed on staff members’ desks and at the visitor services stations at the front desk at CHSDM.

The main purpose of Tessitura is to store data about each and every visitor to CHSDM. This data takes the form of address and contact information, memberships and subscriptions purchased, and tickets purchased for general admission to the museum.

Tessitura is an incredible source of information about CHSDM’s constituents, and should be treated with white gloves, making sure that CHSDM and Smithsonian follow strict protocols. It’s critical that CHSDM not only ensure the safety and privacy of its visitors’ data, but that it also maintains the privacy agreement between CHSDM and its visitors.

Since Tessitura stores Personally Identifiable Information (PII) such as a visitor’s address and phone number, this system is kept in a separate “zone” within the Smithsonian Network where every aspect of the network and servers within this zone is scrutinized and kept in compliance with Payment Card Industry (PCI) standards.

Major aspects of CHSDM data are stored in entirely different physical locations, using entirely different system architectures.

All the data and logs about visitors’ Pen activity are stored in the Amazon Web Services (AWS) cloud. This data is stored in MySQL databases, along with log files, which are stored in multiple locations and eventually processed with Logstash, which makes them neatly available to CHSDM in an ElasticSearch index, also built on AWS infrastructure.

PII data generated by Tessitura is stored in the PCI compliant zone of the Smithsonian Data Center. This data is mostly stored in a Microsoft SQL Server database.

How can CHSDM begin to make sense of all of this data if it’s in such a wide variety of forms, and living in such different physical locations? How can the museum connect activity with the Pen with its information about the visitors who are using the Pens? How can CHSDM develop an understanding about visitors who go to their websites after their visit, when that data is stored in Google Analytics and the Pen data and ticketing data is stored elsewhere? How much easier would all of this be if the data were stored all together and in one single format, allowing CHSDM to look at it all at once?

Building the Warehouse

Data warehousing is not a new concept. The idea of gathering together data-sets into a single place, using a single architecture, is common. What is new, however, is the ability to build a data warehouse completely from scratch, scalable at any time, using cloud based services, with the push of an administration console button.

For CHSDM, the staff chose to experiment with Amazon’s Redshift as its data warehouse.

“Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools.”

In practical terms, Redshift is based on PostgreSQL, a powerful and common open source relational database system. However, Redshift turns things around a bit by re-designing the underlying architecture so that it acts as a columnar data-store. Because it keeps a PostgreSQL-compatible interface, one can use a PostgreSQL client (such as psql) to connect to a Redshift server.

Starting up a Redshift cluster takes all of five minutes. One simply needs to log into their AWS console, navigate to the Redshift panel and select from a few options. Like most of the offerings from AWS, one can easily scale their Redshift cluster later if they discover they need more resources. Setting up and connecting to Redshift is the easy part.

Data Transfer and Ingestion

CHSDM experienced difficulty in getting all of its data into Redshift, getting it there safely and securely, and getting it there periodically. Amazon offers a number of tools and tutorials to make this part easier, but this has been one of the biggest hurdles to overcome with using this type of system so far.

The steps are:

  1. Identify the data source
  2. Map the data source to a Redshift compatible schema
  3. Export the data to a CSV file
  4. Upload the CSV file to Amazon S3
  5. Use the Redshift COPY command to import the CSV from S3

Diagram — 08

There is a good deal of engineering to be done around the five steps listed above. Mapping data from multiple databases and log file systems is complex and requires a lot of time and typing. To automate the process defined above, CHSDM staff have developed simple shell scripts that can be run on a nightly basis. Each of these connects to read replicas of live data, which means that the exports do not disturb the live environments.

Pictured in Diagram 08 is an early prototype of a shell script for doing the steps listed above. A few details are blurred out for privacy reasons.
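
For readers without access to the diagram, the five steps above can be sketched in Python. This is not the shell script shown in Diagram 08. SQLite stands in for a read replica, the table and column names are hypothetical, and the bucket name and IAM role in the usage lines are placeholders. Step 4 (the upload to S3) is left out; it would typically be done with the AWS command line tools or an SDK.

```python
import csv
import io
import sqlite3

# SQLite stands in for the read replica (the real sources are MySQL and
# SQL Server). Table and column names are hypothetical.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE visits_items (visit_id INTEGER, object_id INTEGER, collected_at TEXT)"
)
source.execute("INSERT INTO visits_items VALUES (1, 18796987, '2016-01-16 20:03:11')")


def export_csv(conn, table, columns):
    """Steps 1-3: read the mapped columns and serialize them as CSV text.

    `table` and `columns` are trusted constants here, not user input.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(conn.execute(f"SELECT {', '.join(columns)} FROM {table}"))
    return buffer.getvalue()


def copy_statement(table, s3_uri, iam_role):
    """Step 5: build a Redshift COPY statement for a CSV file with a header row."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV IGNOREHEADER 1;"
    )


csv_text = export_csv(source, "visits_items", ["visit_id", "object_id", "collected_at"])
print(csv_text)
# Placeholder bucket and role; substitute real values before running COPY.
print(copy_statement("visits_items", "s3://example-bucket/visits_items.csv",
                     "arn:aws:iam::123456789012:role/ExampleRedshiftRole"))
```

The COPY statement itself would be executed against the cluster by a PostgreSQL client such as psql once the file is in S3.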

CHSDM staff have been experimenting with one-directional hashing of values such as user ids and personal information using bcrypt and other cryptographic techniques. These techniques work but they can be resource intensive.
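
As a rough illustration of one-way hashing before export, the sketch below replaces an identifier with a keyed HMAC-SHA-256 digest from Python's standard library. This is a stand-in technique, not bcrypt and not necessarily what CHSDM uses. It is deterministic, so the same input always maps to the same token and joins across tables still work, and it is far cheaper to compute than bcrypt. The key and the sample value are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Return a hex token for `value` that cannot be reversed without brute force.

    Keyed HMAC-SHA-256 is used here as a stand-in for bcrypt. The key must be
    kept out of the warehouse, or tokens could be recomputed from guesses.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"hypothetical-secret-key"
token = pseudonymize("visitor@example.com", key)
print(len(token))                                          # 64
print(token == pseudonymize("visitor@example.com", key))   # True
```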

Getting all of the above set up is tedious and is still largely a work in progress at CHSDM.

Visualizing the Warehouse

Immediately, one can query Redshift using a simple PostgreSQL-compatible client like psql. psql lets you connect to a Redshift cluster and query the data using standard SQL. The results are sent back in text form. It’s typically the first tool one might use to test a connection and start building basic queries.

However, it’s pretty easy to see that writing SQL on the command line and making sense of results from Redshift can be a little bit of a hassle. Additionally, while it’s pretty easy for a developer or data analyst to log into Redshift via the psql client, it’s pretty much impossible to imagine any other staff member doing the same. The next step in building a data warehouse is to attach some kind of analytics tool on top of it that makes the retrieval of data, generation and sharing of reports much easier.

There are a number of off-the-shelf solutions for working with data in a warehouse like Redshift. CHSDM have taken a close look at many of them.

  1. Periscope is a powerful set of business intelligence tools that can connect to a wide variety of data sources including Redshift. It allows you to build SQL based queries through a web interface and chart results using a series of customizable charts types. The reports can be shared with colleagues and easily updated to reflect live data.
  2. Chart.io is very similar to Periscope.
  3. ModeAnalytics.com is also very similar to Periscope and Chart.io but a little more simplistic and much less expensive. After a trial period CHSDM found Mode to do most of what they wanted to accomplish. The main feature CHSDM were interested in, being able to design an SQL statement and then export the results to Excel, seemed to work perfectly out of the box.

Below is a series of diagrams illustrating some of the reports CHSDM has generated using Mode, connected to its Redshift cluster on AWS. These are mainly first efforts and prototypes, but they illustrate the scenario CHSDM is trying to achieve.

Diagram — 09

Popular Objects — One of the first thoughts CHSDM had was to report on the most popular objects collected with the Pen. “Popular Objects” quickly became the topic of many meetings, and it is now available via a public statistics page.

Diagram 09 shows how Mode can simply generate a table of data. This data can easily be exported to a CSV file or Excel spreadsheet.

Below, pictured in Diagram 10, is the SQL used to generate this report. It’s pretty straightforward, but the main point is the INNER JOIN on collection_objects and “q”, which in this case refers to collection_visits_items. Within the current topology, these two tables are stored in different databases, making a simple join like this impossible outside the warehouse.
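
A query of that shape can be sketched as follows. This is a reconstruction under assumptions, not the SQL from Diagram 10: only the two table names and the “q” alias come from the text, while the columns and sample rows are hypothetical. An in-memory SQLite database stands in for Redshift so the example runs anywhere.

```python
import sqlite3

# SQLite stands in for the warehouse; columns and rows are hypothetical.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
CREATE TABLE collection_objects (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE collection_visits_items (visit_id INTEGER, object_id INTEGER);
INSERT INTO collection_objects VALUES (1, 'Chair'), (2, 'Poster'), (3, 'Teapot');
INSERT INTO collection_visits_items VALUES (10, 1), (11, 1), (12, 2), (13, 1), (13, 2);
""")

# Count collections per object in the subquery "q", then join to the
# collection records to pick up human-readable titles.
POPULAR_OBJECTS = """
SELECT o.id, o.title, q.times_collected
FROM collection_objects AS o
INNER JOIN (
    SELECT object_id, COUNT(*) AS times_collected
    FROM collection_visits_items
    GROUP BY object_id
) AS q ON q.object_id = o.id
ORDER BY q.times_collected DESC, o.id
"""

popular = warehouse.execute(POPULAR_OBJECTS).fetchall()
print(popular)  # [(1, 'Chair', 3), (2, 'Poster', 2)]
```

Objects that were never collected (the teapot here) drop out of the result because of the inner join.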

Diagram — 10

Collected Objects By Department — Here, CHSDM staff have grouped collected objects by Department as seen in Diagram 11. Since the warehouse joined this data with collection data, it’s possible to read the full text description of each department within the report.

Diagram — 11

Collected Objects By Country — Here is another report based on the same data and joined with the collection data. In this case CHSDM is able to group the collected objects by their country of origin. This type of report can be revealing, and helps CHSDM staff to see how visitor behavior, combined with the museum’s curatorial voice, starts to play out and display bias towards one thing or another.

Diagram — 12

Below is the simple SQL required to make this query in Redshift via Mode.

Diagram — 13

Digital Creations — Lastly, CHSDM have created a report having to do with Digital Creations, or the things its visitors have created using the interactive tables. This report only uses the data stored in the Pen database but one can see how CHSDM used simple SQL to modify the report right in Mode so it has labels on the axis and skips unnecessary items.

Diagram — 14
Diagram — 15
Diagram — 16

Finally, the following diagrams (Diagrams 17 and 18) illustrate some of the benefits of using a product like Mode. Here one can see it is possible to easily share reports with other staff members and create a “portfolio” of reports for easy access and sharing between staff.

Diagram — 17
Diagram — 18

Further Experimentation

So far, Mode has proven very effective as a tool for visualizing the data stored in the CHSDM Redshift data warehouse. It has plenty of simple to use tools, and if anything, it makes it really easy to export data to a CSV or to Excel. Mode has proven useful as a prototyping and exploration tool. It’s easy to try out new ideas for queries, and see the results in a graph right away. This type of playful experimentation is what usually leads to discovery of more and more insight into the data at CHSDM.

There are of course many other tools CHSDM would like to explore. Microsoft PowerBI seems to offer a variety of similar tools and utilities, but CHSDM has yet to figure out how to connect PowerBI with its Redshift warehouse. Tessitura’s T-Stats is, in itself, essentially a data warehouse. It seems a little like swimming upstream to try and imagine a method of getting all of CHSDM’s external data into Tessitura, but the one upside would be that it is already stored in Smithsonian’s PCI compliant zone.

Mainly, CHSDM staff are really interested in using tools like Mode to prototype and develop queries quickly. Ultimately CHSDM would like to develop code within the same administration areas of its collections website mentioned in the beginning of this report. The idea here would be that CHSDM would have all of its diagnostics and analytics tools and data available in one convenient place. CHSDM staff would be able to have easy access to these administration areas and could carefully control their design and output. So far CHSDM staff have developed simple connection code between its admin areas and Redshift and a few simple report pages, but this work is in its infancy.

Conclusions

Data warehousing can be an effective method for building analytics within an institution. The concept of collecting ALL types of data and putting it in a single place where the institution can have easy and secure access to it seems like the way forward.

Migrating data in and out of the warehouse can be time consuming and error-prone. Steps need to be taken to ensure the data is mapped accurately between its original source and the warehouse. As well, data sent to the warehouse needs to be securely transferred and stored. The periodic nature of these types of data transfers means that data within the warehouse will always lag behind the real-time data being generated on a daily basis.

Once the data is in the warehouse, it can be easily manipulated, visualized and used to express the answers to many questions. With the warehouse in place, and analytics tools attached, it is possible to prototype visualizations of vast amounts and types of data. This scenario can have a positive impact on museum staff interested in developing questions around visitor behavior.

CHSDM staff are embarking on a year-long study of the museum’s visitors and their behavior related to the Pen and interactive experiences within the museum. Using a data warehouse to store results from these types of surveys should allow CHSDM staff to better analyze results, pairing the raw data from the study with Pen data, ticketing data and more, eventually providing a clearer picture of the visitor experience from start to finish.

References

  1. Cooper Hewitt — New Experience ( http://www.cooperhewitt.org/new-experience/ )
  2. Data Warehousing — Wikipedia ( https://en.wikipedia.org/wiki/Data_warehouse )
  3. The API at the center of the museum — Cooper Hewitt Labs, Seb Chan ( http://labs.cooperhewitt.org/2014/the-api-at-the-center-of-the-museum/ )
  4. Foreign Key — Wikipedia ( https://en.wikipedia.org/wiki/Foreign_key )
  5. Near Field Communication — Wikipedia ( https://en.wikipedia.org/wiki/Near_field_communication )
  6. Payment Card Industry standard — Wikipedia ( https://en.wikipedia.org/wiki/Payment_Card_Industry_Data_Security_Standard )
  7. bcrypt — Wikipedia ( https://en.wikipedia.org/wiki/Bcrypt )
