Tides of data reveal insights about the nature of life through bioinformatics. Technological advancements seem to exponentially increase our ability to generate, store and sort information. Terms like petabytes sound immense (and they are: one million gigabytes, or 1,000,000,000,000,000 bytes), but they only scratch the surface of the data humans generate each year.
At Baylor, the data sciences were chosen as one of five signature initiatives of Illuminate, the University’s strategic plan that is propelling Baylor toward Research 1/Tier 1 recognition. It is an interdisciplinary, rapidly growing field that can be found in most departments, both expected — computer science, business, statistics — and in those that might initially seem less than a natural fit.
“We all rely on data,” Dr. Stacie Petter, professor of information systems in the Hankamer School of Business and chair of Baylor’s Illuminate Data Sciences committee, says. “We’ve always had data, but now we’re in a world where there’s more — substantially more. With that comes opportunities and questions that Baylor can address. It’s a big topic that each discipline views through a different lens, so it’s important that we all come together — from information systems to ethics and more.”
At the heart of it all, and what elevated the data sciences as a Baylor priority, is a desire to solve problems using the latest tools available. It’s about deciphering how data can be used for the common good and recognizing potential pitfalls along the way.
Meet a few of the Baylor professors using data to address societal challenges and training a new generation to do the same.
Dr. Erich Baker, chair of the Department of Computer Science in the School of Engineering and Computer Science and professor of bioinformatics, sees Baylor’s investment in the data sciences through Illuminate not merely as an important part of the University’s research aspirations, but as an essential element in Baylor’s mission of building leaders for worldwide service.
“I’m very excited to see data science in Illuminate, and not just in my field of bioinformatics,” Baker says. “As I see the trends of bioinformatics, I look across the disciplines and see those trends everywhere else. It’s a necessary step to prepare our students for the digital age, to send them out into the world with the skill sets needed to be competitive with their peers. If you track every discipline represented at Baylor and the academy more broadly, data science covers every one of them.”
When Baker arrived at Baylor in 2002, the University was attractive for its early adoption of bioinformatics, or as he describes, “the information of biology.” As the second undergraduate major of its kind in American universities when it was established in 1999, Baylor bioinformatics advanced partnerships between Baylor’s life sciences in the College of Arts and Sciences and the School of Engineering and Computer Science (ECS). Those connections foreshadowed interdisciplinary work now taking place across campus thanks to technological advancements over the last 20 years.
To help others envision what that looks like, Baker describes a Venn diagram in his field.
“Imagine three areas, all of which have grown at Baylor over the last 20 years: computer science, statistics and math,” Baker says. “Where those three things integrate comprises data science in general. In bioinformatics, we specialize those domain fields into biology or things relevant to the life science.”
As bioinformatics specializes in the domain fields of biology and life science, other departments discover where the intersection of technology and scholarship best comes together for them. Similarly, the technological advancements that have boosted bioinformatics have changed the world and impacted every discipline on campus: cloud computing, data scalability, machine learning, affordability and more. Those combined factors have led to a revolution in the amounts of data people produce, and the amounts of data people can collect and analyze. In many circles, it’s referred to as “big data,” which Baker describes as “about a petabyte, in that range scale. Where many people may think ‘big data’ is bigger than their hard drive, it’s substantially larger than that.”
This is the world students will enter: cloud computing has untethered data from physical hard drives and rapidly multiplied what organizations can generate and store. Machine learning has enabled everyone from business and healthcare to the traffic app on a mobile phone to use existing data to identify patterns and predict future outcomes. And, while data has expanded, Baker says the cost of computing has, in many cases, contracted.
“My particular interests are in functional genomics and heterogeneous data integration. When I came here, the cost of sequencing a genome was around $100,000. Today, we’re down to $1,000.”
To accelerate data science scholarship, an endowed chair position has been established in the data sciences through a transformative gift by Baylor benefactors Mark, BBA ’80 and Jenni, BSEd ’80, McCollum. A $3.5 million gift, made through Give Light, Baylor’s comprehensive philanthropy campaign, will establish The McCollum Family Chair in Data Sciences within Baylor ECS. The gift will help focus work in the complementary fields of bioinformatics, cybersecurity and business analytics.
“I can’t express enough how important it is, with these advancements, to have highly trained professionals thinking about where the data sits and how we access it. How do we store data? How do we integrate it? What standards can we use? What problems can be solved? Finding interesting cross-disciplinary connections will be very exciting here at Baylor as we look ahead.”
When the U.S. Department of Energy chose a multi-disciplinary team of scientists and industry professionals for a competitive $100 million grant aimed at transforming the U.S. water system, it further embedded Dr. Amanda Hering, BS ’99, into the mission of finding new methods of water delivery. Her presence on the team, known as the National Alliance for Water Innovation (NAWI), bolsters an impressive roster charged with an audacious goal: to create a circular water economy in which 90 percent of non-traditional water sources are cost-competitive within a decade through desalination.
Hering’s role is no surprise to those within the discipline. The winner of the Abdul El-Shaarawi Early Investigator Award from The International Environmetrics Society, she has built a reputation as a leading scholar focused on using her training to find solutions to a fundamental issue.
“Over 75 percent of the world experiences water scarcity at least one month out of the year,” Hering, associate professor of statistical science, says. “I don’t think you can get a more meaningful societal question than, ‘What are we going to drink today?’”
Hering, a mathematics major at Baylor, has helped grow her alma mater’s statistical research portfolio while building her longtime work at the intersection of statistics and the environment. She joined the Baylor faculty in 2016 after seven years as a full-time faculty member at the Colorado School of Mines. There, she began research into decentralized water treatment plants.
“This was not initially my area of expertise,” Hering says. “But as I was drawn in, what I’ve realized is there hasn’t been as much work done as one would think on using data-driven methods to improve operation and the water in wastewater treatment plants.”
Decentralized plants represent a different paradigm from traditional municipal water infrastructure. Smaller and closer to wastewater sources in rural and outlying communities, such plants are still in the exploratory phase. Their major potential strength is, for now, also an obstacle to be overcome: without a human operator on site, they require a system capable of identifying problems when they occur to protect the people they serve. With an array of potential problems, from oxygen sensor faults to failures in the membrane that filters solids out of the water, a simple equipment error could lead to long shutdowns, with rivers of data to sort through. Hering and her co-researchers have developed a statistical method to diagnose the fault and help isolate the troublesome variables to get decentralized plants operational more quickly.
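To make the idea of flagging a fault and isolating the responsible variable concrete, here is a minimal Python sketch. It is not the method Hering and her co-researchers developed, which the article does not describe; it substitutes a simple per-variable z-score screen as a stand-in. The sensor names, readings and the threshold of 3.0 are all hypothetical.

```python
from statistics import mean, stdev


def fit_baseline(normal_readings):
    """Summarize each variable's normal behavior as (mean, standard deviation)."""
    return {name: (mean(vals), stdev(vals)) for name, vals in normal_readings.items()}


def diagnose(baseline, reading, threshold=3.0):
    """Return (variable, z-score) pairs that deviate beyond the threshold,
    largest deviation first. The threshold is an illustrative assumption."""
    scores = {
        name: abs(reading[name] - mu) / sd
        for name, (mu, sd) in baseline.items()
    }
    flagged = [(name, round(z, 2)) for name, z in scores.items() if z > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical readings collected while the plant was running normally.
normal = {
    "dissolved_oxygen": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "membrane_pressure": [30.0, 31.0, 29.5, 30.5, 30.0, 29.0],
    "ph": [7.0, 7.1, 6.9, 7.0, 7.2, 6.8],
}
baseline = fit_baseline(normal)

# A new reading in which only the oxygen sensor looks wrong.
new_reading = {"dissolved_oxygen": 0.4, "membrane_pressure": 30.2, "ph": 7.05}
faults = diagnose(baseline, new_reading)
print(faults)
```

In this toy case only the oxygen variable is flagged, which is the kind of narrowing that lets an off-site operator start with one sensor rather than the full data stream. A real plant would likely need a multivariate approach, since its variables are correlated with one another.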
Building on that work, Hering’s role in the National Alliance for Water Innovation focuses on creating autonomous processes in water control systems. As the University’s site director on the team, she will lead Baylor’s contributions to the Alliance, a public-private partnership composed of 18 leading universities and multiple labs. The five-year, $100-million grant will begin with “roadmapping exercises” in which the team will continue its work in identifying key water questions.
“There aren’t many government organizations that fund purely water research,” Hering says. “This is a once-in-a-generation effort to improve water treatment in the U.S. And if we improve it here, it will impact other countries as well through technological improvements.”
As she focuses on water issues, Hering maintains an intense interest in creating opportunities for students. Last year, she was awarded a $1.2 million grant from the National Science Foundation (NSF) to enhance data sciences education through the creation of a data sciences curriculum.
“Often, students won’t take their first real statistics course until their junior year,” Hering says. “The NSF wants to bring more data science curriculum, training and research opportunities to the early undergraduate education stage.”
The Data Science Corps at Baylor, titled Modernizing Water and Wastewater Treatment through Data Science Education and Research, is a partnership between statistical and computer sciences and is funded by the grant. It features an inaugural spring class called Introduction to Data Science, which is open to all students. Hering says the NSF values the curriculum’s impact for students, with a data-centric, inquiry-driven educational model. During the summer, a paid, five-week research experience for undergraduates will include labs and travel opportunities to wastewater plants. The NSF’s goal is to “introduce the principles of data science in with motivating applications” and to build a model to advance the field while enabling students to learn data science as a tool for solutions to societal problems.
Societal challenges are often felt most meaningfully at the local level. For an institution like Baylor, addressing those issues can start with neighbors close to campus.
Dr. Kelly Ylitalo, BA ’04, assistant professor in the department of public health in the Robbins College of Health and Human Sciences, studies the link between physical functioning and aging close to home. A rising star researcher who was awarded a $626,000 Mentored Research Scientist Award (K01) from the National Institutes of Health (NIH) in 2018, Ylitalo and her students can regularly be found at the Waco Family Health Center.
“We know that mobility can decline throughout the aging process, so I became interested in physical activity as a way to facilitate healthy aging and functioning,” Ylitalo says. “My partnership with the Waco Family Health Center allowed me to get involved in some of their prevention activities.”
For the community-minded practitioners at the Waco Family Health Center, broader societal questions always come back to their patients. For two of their more novel approaches to aging and health, the question is simple: are they working? The Health Center boasts a wellness center which provides exercise equipment for community use. Patients at the Waco Family Health Center may find they are prescribed, not medicine, but exercise at the facility. Or, they may be aided in efforts to make healthier eating choices through a prescription to the Veggie Van, a collaboration between the Health Center and Waco’s World Hunger Relief Farm. Ylitalo uses data science to help determine the programs’ effectiveness.
To measure the programs’ impact on health, Ylitalo and her team use a data collection app that eliminates the need to enter data manually. The app, developed by Dr. Matthew Fendt, Baylor lecturer in computer science, measures blood pressure variability and dovetails with information collected through medical records, quantitative surveys and qualitative focus groups, data that would otherwise have been gathered source by source.
“We’re working with thousands and thousands of rows of data from different sources to make the best use of the resources we have available,” Ylitalo says. “Family Health Center wants to make the best use of programs like the wellness center and vegetable prescriptions. As an epidemiologist, I’m interested in what’s going on in an entire population. Together, we look at what’s taking place as a whole, using the data from individual users, so together we can improve health in the community. There is a face behind every number.”
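Combining rows from different sources into one record per person can be sketched in a few lines of Python. This is an illustration only, not the app or pipeline Ylitalo's team uses. The field names (`participant_id`, `systolic`, `weekly_exercise_days`) and the sample rows are hypothetical.

```python
from collections import defaultdict


def merge_sources(*sources):
    """Combine rows from several sources into one record per participant_id.
    Later sources overwrite earlier ones if they share a field name."""
    merged = defaultdict(dict)
    for rows in sources:
        for row in rows:
            merged[row["participant_id"]].update(row)
    return dict(merged)


# Made-up rows standing in for app readings and survey answers.
app_readings = [
    {"participant_id": 1, "systolic": 128},
    {"participant_id": 2, "systolic": 141},
]
survey = [
    {"participant_id": 1, "weekly_exercise_days": 3},
]

records = merge_sources(app_readings, survey)
print(records[1])
```

Keying every source on a shared identifier is what allows population-level questions to be asked of data that began as individual entries. Participants missing from one source, such as participant 2 here, simply keep the fields that are available.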
The unabated growth of data sciences feels thoroughly modern. Yet scholars whose work relies on understandings of the past also find deep value in their application. Humanities scholars at Baylor and throughout the academy more broadly are recognizing the opportunities provided in the Digital Humanities — where technological advancements meet the study of what it means to be human. In so doing, professors in such disciplines as history, religion or English are finding their ability to analyze available materials enhanced far beyond normal human capacity.
Baylor’s commitment to the digital humanities can be found in programs in University Libraries (UL), where humanities faculty are introduced to methods that enhance their own research and advance Baylor’s R1/Tier 1 research portfolio. Last summer, 10 Baylor faculty members participated in the 2019 Fundamentals of Data Research Fellows program, a joint venture between the College of Arts and Sciences and University Libraries. Each faculty fellow applied methods like data mining and data visualization to enhance their research interests.
“University Libraries, like many colleges and schools on campus, are trying to support Illuminate as much as possible,” Joshua Been, UL director of data and digital scholarship and assistant librarian, says. “My role is trying to spearhead the library’s promotion of data sciences on campus.”
High-powered computing within the library provides a tool for Been to assist faculty like Dr. Stephen Reid, professor of Christian Scriptures in George W. Truett Theological Seminary, in accelerating his research. Reid, a Fundamentals of Data Research fellow, used a technique called text mining to analyze hundreds of narratives written by freed slaves to determine their Biblical interpretations for inclusion in an essay on African-Americans and the book of Deuteronomy in the Oxford Handbook on Deuteronomy.
“For this project, I wanted not only modern scholarly opinions but to go back and find out what average writers and readers of Deuteronomy would be looking at,” Reid says. “The best place to find that would be in the so-called ‘slave narratives,’ more recently called ‘freedom narratives’ because they detail the stories of how former slaves were freed. The University of North Carolina has digitized hundreds of them from the 1700s and 1800s. In working with Josh Been, we were able to run around 500 of these digitized narratives through a computer program he designed to mine references to Deuteronomy.”
Been’s program reduced the 500 narratives to 160 referencing Deuteronomy. From there, Reid spent several weeks analyzing the remaining 160, a process he would not have had time to complete without data sciences.
“Mining the texts narrows the data down to a manageable set,” Reid says.
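The narrowing step Reid describes can be illustrated with a short Python sketch. This is not Been's program, whose details the article does not give; it assumes a simple regular-expression search. The pattern, the 40-character context window and the sample texts are all hypothetical.

```python
import re

# Hypothetical pattern: "Deuteronomy" or the abbreviation "Deut.", any letter case.
DEUTERONOMY = re.compile(r"\bDeut(?:eronomy|\.)", re.IGNORECASE)


def find_references(narratives):
    """Return {title: [snippets]} for the narratives that mention Deuteronomy.
    Each snippet keeps up to 40 characters of context on either side of a match."""
    hits = {}
    for title, text in narratives.items():
        snippets = [
            text[max(0, m.start() - 40): m.end() + 40]
            for m in DEUTERONOMY.finditer(text)
        ]
        if snippets:
            hits[title] = snippets
    return hits


# Made-up sample texts, not drawn from the digitized collection.
sample = {
    "Narrative A": "He read to us from Deuteronomy, the sixth chapter.",
    "Narrative B": "We crossed the river at night and walked until dawn.",
    "Narrative C": "As it is written in Deut. 6:5, we were to love the Lord.",
}

matches = find_references(sample)
print(sorted(matches))
```

A search like this only shortens the reading list. Deciding whether a passage quotes Deuteronomy directly or echoes a later citation of it still takes the scholar's judgment, which is the division of labor the article goes on to describe.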
Reid’s use of the data illustrates the necessity of discipline-specific expertise along with the computer program. The program narrowed down the amount of data through which he sifted, saving hundreds of hours. Armed with the more specific data, he probed connections and studied usage to determine which sources, for example, actually quoted Deuteronomy 6:4-9 (best known for the admonition to “love the Lord your God with all your heart and with all your soul and with all your strength”) and which referenced Jesus’ quoting it in the New Testament. He then chose the connections that most advance knowledge in his field of Christian scriptures to share in the Oxford Handbook.
“What’s interesting is, I’ve written a section on Deuteronomy before and picked what I thought were the central texts,” Reid says. “None of those are actually the central texts in the freedom narratives. I thought Deuteronomy 26 would be a central text. It doesn’t show up. Data mining allows us to look at what readers of a different age saw as key, unbiased by what we think of as key texts with 21st century knowledge. This allows us to highlight people who would have otherwise been invisible as Biblical scholars; now they are visible as Biblical interpreters because we can see and compare how they use the text.”
Reid’s next project will have a significantly larger scope than his first foray into the data sciences: African American interpretations of the Psalms.
As the tools available for data collection continue to grow, Baylor has a vital opportunity to speak into the changing world: how do we wield these tools constructively? Ethical questions abound. Issues of privacy, cybersecurity, accessibility, the commoditization of humans as ‘walking data generators’ and more prompt important discussions about data stewardship.
“There is a concern,” Petter says. “We’re using these massive amounts of data, but are we getting better personal outcomes, organizational outcomes or societal outcomes through our analysis? At Baylor, we have to bridge disciplines and contribute insights from the lenses through which we view the world to address pivotal issues, on the foundation of our Christian mission, to build good stewards of the resources we have.”
In addition to the Illuminate committee highlighting opportunities in the Data Sciences pillar, Baylor has formed a data ethics group within Illuminate’s Human Flourishing, Leadership and Ethics committee. Chaired by Dr. Jonathan Tran, associate professor of philosophical theology and The George W. Baines Chair of Religion in the College of Arts and Sciences, the committee gathers professors like Petter and practitioners from across campus to discuss issues from numerous interdisciplinary angles, with a goal of helping students and faculty grow as ethical data science leaders.
From these committees will come ideas and initiatives, undergirded by the University’s Christian mission, that build expertise and spur critical thought in Baylor classrooms and the industries graduates and researchers serve. As Baylor grows in research output and towards R1/Tier 1 recognition, the infusion of the University’s distinct Christian voice in the rarefied air of high-level conversations on data usage can bridge love for neighbor with technological expertise.
“The genie is out of the bottle in regard to data,” Petter says. “It’s here to stay and impacts much of what we do. We need to be part of that conversation and driving that charge when it comes to ethics, to not use data merely as a means to any end. It’s a great place for Baylor to shine.”