By Claudia Neuhauser
Back in the days when index cards and laboratory notebooks were the primary places to record data, data acquisition was much more deliberate and slow. Hours in the library or at the bench yielded small data sets that became immensely valuable to the researcher who acquired them. Decisions about what to keep or discard were made at the time of acquisition.
Today, high-throughput technologies, sensors, and the digitization of large collections of literary and artistic works have made data “big.” Inexpensive data storage no longer requires careful consideration of what to keep and what to discard. Data are no longer collected solely to address a specific hypothesis; they are increasingly reused and integrated to explore new ones. Novel analysis and visualization tools open new paths of inquiry.
To respond to the needs emerging from this data-rich environment and to increase informatics capabilities, OVPR established the University of Minnesota Informatics Institute (UMII) in 2014. UMII’s mission is to foster and accelerate data-intensive research across the university system in fields as diverse as agriculture, arts, design, engineering, environment, health, humanities and social sciences. Over the past year, UMII has built up capacity in data analysis, established a number of competitive grant programs, and brought people together within the university and with outside organizations. UMII was also involved in writing a new policy on research data management that clarifies responsibilities for all aspects of data management across the university. Highlights of the policy can be found here.
In my many conversations with faculty, staff and administrators across the university system, I have learned about the difficulties they face in working with complex data sets and in navigating the many services the university offers to manage data, big and small, across the data life cycle. Examples include writing a data management and dissemination plan for a grant, choosing among the sometimes overwhelming number of software packages for data analysis, and determining the most appropriate data storage solution.
To address this broad range of data challenges, UMII works with a very talented team from Design Thinking @ College of Design led by Virajita Singh. Design has much to offer in identifying and addressing the service gaps that our researchers experience, because designers look at services through the eyes of the users. While each service provider focuses on its own business, users experience the services in their entirety and become painfully aware of the gaps between providers.
For our first joint project, we chose to look at the services connected with genomics and proteomics data. These data are primarily generated in two facilities on campus: the University of Minnesota Genomics Center and the Center for Mass Spectrometry and Proteomics. The University of Minnesota Genomics Center sequences genetic material from many different kinds of organisms, including humans, to answer questions such as which genes contribute to a specific disease or which bacteria are present in a soil sample. The Center for Mass Spectrometry and Proteomics has instrumentation to identify molecules that could be used, for instance, as biological markers for diseases in diagnostic tests or as natural preservatives to increase the shelf life of products.
Genomics and proteomics data have similar analysis and data storage needs across research groups. In the fall, we invited thirty-three participants working in these areas across seventeen university units, including many of the service providers, to a four-hour design thinking workshop. The group took on the design challenge of creating a service model that would link the various service providers, make the hand-off of data among them seamless and transparent, and help users navigate the many services more easily. The workshop also identified other needs, such as providing basic analysis as a service, bringing people together to share data management practices, recommending analysis tools, and finding sustainable storage solutions.
UMII is starting to address these needs. For example, UMII hired analysts for genomics, proteomics and imaging to provide basic and intermediate-level analysis to researchers. The analysts are based in the university facilities that produce these kinds of data, which places them right where the data are generated and handed off to researchers. The UMII analysts currently process the raw data, perform quality control, and carry out some of the initial analysis that helps our researchers deal with the enormous amounts of data the facilities produce. UMII also hired a “data wrangler” to help researchers find solutions for their data management needs.
UMII recognized that it could help by taking on some of the routine parts of data analysis, in essence turning some informatics tasks into a commodity. Through this commoditization of informatics analysis, UMII is taking over tasks that can be delivered more efficiently to users as a service, which frees up researchers’ time for research and innovation.
UMII and its partners are now developing the design challenge for the next design thinking workshop, which will focus on the complex data sets that arise in many of the MnDRIVE projects. These data sets are typically spatial and temporal and originate from different sources. The workshop will explore the needs around developing a common platform to aggregate and analyze the data, and around sustaining the various web and mobile applications that have come out of the MnDRIVE projects.
To learn more about UMII’s projects and services, visit our website.
Claudia Neuhauser, director of the University of Minnesota Informatics Institute, is a Distinguished McKnight University Professor, Howard Hughes Medical Institute Professor and Director of Graduate Studies for Biomedical Informatics and Computational Biology. Her current research is in the area of bioinformatics and computational biology where she is developing statistical tools for genomics and other biological data.