Meet a Data Professional: Shanu Sushmita

WiDS Puget Sound is excited to present the next entry in our series, “Meet a Data Professional!”

“Meet a Data Professional” is dedicated to recognizing the amazing women powering the Puget Sound area’s data science community, spotlighting their journey into the field, their incredible accomplishments, and the weighty challenges that they faced along the way. This lies at the heart of WiDS Puget Sound and Data Circles’ mission of inspiring women to enter the data science field by showcasing its many incredible role models.

Do you know any marvelous women in data science? Send us a tip here!

Shanu Sushmita, Assistant Teaching Professor at Khoury College of Computer Sciences, Northeastern University

“My happy place is in my classroom”, says Shanu.

Born and brought up in a small city in India, Shanu Sushmita is the first in her family to earn a PhD. She draws inspiration from Buddha, as she comes from the land where Buddha found his enlightenment. As a child, Shanu was interested in mathematics. After she studied computer science, her curiosity and quest to learn more took her to Glasgow, Scotland, where she did her PhD.

Now an assistant teaching professor at Khoury College of Computer Sciences, Northeastern University, Shanu shares her love of teaching and engaging with students, and how it guided her career from heading a data science team back to the classroom.

Let’s get to know her better and learn about her journey.

After earning an undergraduate degree in computer science in India, Shanu worked as a research assistant while completing her MTech at IIT Delhi. Her passion for mathematics, problem-solving, research, and AI propelled her towards a doctoral degree in information retrieval at the University of Glasgow, Scotland. While doing her PhD, she received an offer from UCLA to be a visiting research scholar, where she explored ways to disambiguate author profiles in digital libraries. Her focus was on building an optimized method for this task.

During her PhD, she investigated users’ search behavior in online health information. The goal was to examine users’ preferences for the type of search results (image, news, video, etc.). In 2012, her doctoral work was recognized as among the most interesting of the year by the ACM SIGIR newsletter.

Her first job after her doctorate was at the University of Washington, where she joined as a post-doctoral research scientist and worked on several projects. She led a graduate student team in a personality prediction project, where the focus was on predicting the personality type of YouTube video bloggers. During this period, she also worked on various healthcare analytics projects addressing problems in healthcare settings, such as estimating individuals' future healthcare costs from their past medical and cost information. The team used various data mining and machine learning methods to predict healthcare costs for both populations and individuals from their prior medical and claims records.

She also worked on building prediction models for the risk of hospital readmission, length of stay, and mortality. The hospital readmission rate within 30 days post-discharge is a widely acknowledged metric for healthcare quality and expenditure in the United States. Estimating hospitalization costs alongside a 30-day risk assessment for such readmissions offers added value for accountable care, a global concern and a cornerstone of the US government's mandate under the Affordable Care Act. Data mining efforts typically focus on either predicting healthcare costs or the risk of hospital readmission, but rarely both. In one paper, Shanu and her team introduced a dual predictive modeling approach that leverages healthcare data to forecast both the risk and the cost of any hospital readmission (referred to as "all-cause"). To achieve this, they explored machine learning algorithms for predicting healthcare costs and the risk of 30-day readmission. Their results in risk prediction for "all-cause" readmission showed promise when compared to the standardized readmission tool (LACE). Furthermore, the techniques proposed for cost prediction consistently outperformed baseline models, demonstrating significantly lower mean absolute error (MAE).
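For readers unfamiliar with the metric, mean absolute error is the average absolute difference between predicted and actual values, so a lower MAE means predictions land closer to the real costs. Below is a minimal sketch of the comparison. The cost figures, the "always predict the mean" baseline, and the variable names are all invented for illustration and are not taken from the study.

```python
def mean_absolute_error(actual, predicted):
    """Return the average of |actual - predicted| across all cases."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical costs in dollars, invented purely for illustration.
actual_costs = [1200.0, 300.0, 4500.0, 800.0]

# An assumed naive baseline: always predict the mean of the actual costs.
baseline_predictions = [1700.0, 1700.0, 1700.0, 1700.0]

# Assumed output of some trained cost model (made-up numbers).
model_predictions = [1000.0, 500.0, 4000.0, 900.0]

baseline_mae = mean_absolute_error(actual_costs, baseline_predictions)
model_mae = mean_absolute_error(actual_costs, model_predictions)

print(f"baseline MAE: {baseline_mae:.2f}")
print(f"model MAE:    {model_mae:.2f}")
```

In this toy example, the model's predictions are off by a smaller amount on average than the baseline's, which is the sense in which one approach "outperforms" another on MAE.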

When a colleague approached Shanu with an offer to serve as a senior consultant for his new company, Shanu enthusiastically accepted, firmly believing that every opportunity holds value. According to her philosophy, "No learning is ever wasted," as there's always something to be gained from each experience. Thus, she joined KenSci as a senior research consultant, providing strategic guidance to the data science team. Her role involved devising machine learning solutions for various challenges in healthcare, including predicting hospital readmission risks, estimating healthcare costs, and detecting fraud in healthcare claims data.

Later, Shanu was offered a full-time position at the company where she was asked to lead the data science team. With her constant thirst for knowledge and curiosity, Shanu eagerly accepted this opportunity. Beginning with just one data scientist, she gradually built up the team to twelve members within three years. Reflecting on her experience, Shanu reveals that her greatest challenge lay in shifting her perspective to align with the client's needs. Unlike in research, where the focus is on delivering the best results possible, in the corporate world, she had to learn to manage tight deadlines and prioritize delivering results, even if they weren't perfect.

Shanu stresses the significance of enjoying data storytelling and data exploration for aspiring data scientists. She points out that storytelling is a skill that takes time to develop, as it requires hands-on experience with data-driven projects. Additionally, she underscores the importance of effectively communicating findings to both technical and non-technical audiences, highlighting storytelling as a crucial yet often overlooked aspect of data science.

She encourages her students to embrace the challenges of working with data and to approach it fearlessly. She believes that maintaining a sense of curiosity and a quest to uncover patterns and relationships within data is essential for success in this field. Shanu advises her students to use resources such as AWS, conferences, and Kaggle data challenges to hone their skills. She emphasizes the limitations of classroom learning alone, stressing the importance of applying concepts to real-life problems and experimenting with algorithms across different types of data. In her view, the more practical experience one gains, the deeper one's understanding of data science concepts becomes.

Shanu expresses her enthusiasm and inquisitiveness about ChatGPT and the development of large language models (LLMs) as avenues for expanding her knowledge. With her team, she built a model called MUGC (Machine Generated vs User Generated Content Detection), which can detect whether a text was written by a human or by ChatGPT. She and her team performed a comparative evaluation of eight traditional machine-learning algorithms to distinguish between machine-generated and human-generated data across three diverse datasets: Poems, Abstracts, and Essays. The results indicated that traditional machine learning can achieve a high level of accuracy in identifying machine-generated data, reflecting the documented effectiveness of popular pre-trained models like RoBERT. They found that machine-generated texts tend to be shorter and exhibit less word variety than human-generated content. Furthermore, readability, bias, moral, and affect comparisons revealed a discernible contrast between machine-generated and human-generated content, pointing to variations in expression styles and potentially underlying biases in the data sources (human and machine-generated). This study provides valuable insights into the advancing capacities and challenges associated with machine-generated content across various domains.
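The two surface signals mentioned above, text length and word variety, are easy to compute. The sketch below is only an illustration of those two measurements, not the MUGC model or its feature set: the `text_stats` helper, the simple tokenizer, and the use of the type-token ratio (unique words divided by total words) as a stand-in for "word variety" are all assumptions made here.

```python
import re


def text_stats(text):
    """Return the word count and type-token ratio of a piece of text."""
    # Simplistic tokenizer (an assumption): lowercase runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    unique_words = set(words)
    ratio = len(unique_words) / len(words) if words else 0.0
    return {"word_count": len(words), "type_token_ratio": ratio}


# Hypothetical sample sentence, invented purely for illustration.
sample = "The cat saw the dog"
stats = text_stats(sample)
print(stats["word_count"], round(stats["type_token_ratio"], 2))
```

Features like these could then be fed to a traditional classifier. Note that the type-token ratio tends to fall as texts get longer, so comparing texts of very different lengths with it calls for some care.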

Not one for social media, Shanu keeps up with the field of data science by reading conference papers and listening to podcasts and seminars. She firmly believes that curiosity is paramount in this field, especially given the constant evolution of technology. For her, it's fascinating to learn about the problems others are tackling and the challenges they face.

In addition to her passion for data science, Shanu's interest in psychology led her to explore a project on the psychological impact of music on young minds. She recognizes the profound effect music can have on children and was drawn to investigate further. As a mother, she worried that the music her children were listening to could have a significant impact on their growth and development.

Music has a profound impact on our lives, bringing people together, enhancing health and well-being, providing a creative outlet, and more. Most significantly, music influences our emotions and brain function, activating some of the most extensive and diverse networks in the brain. The amount of time that children and adolescents spend listening to various forms of music has steadily increased over the years, so its influence on them can be significant. As children develop their personal identities, they often imitate the behaviors and language of musical role models. However, some themes presented in song lyrics raise concerns, which the American Academy of Child and Adolescent Psychiatry (AACAP) also recognizes. Particularly troubling themes include glamorized drug and alcohol abuse; suicide presented as an "alternative" or "solution"; graphic violence; and sex which may focus on control, devaluation of women, or violence toward women. Therefore, she felt that it was important to find solutions embedded within online music platforms and virtual home assistants (Google Home, Alexa, etc.) to empower parents like her to make the right music choices for their children.

Reflecting on her journey, Shanu admits that while she didn't envision herself becoming a professor during her formative years, she always relished the opportunity to explain concepts and share her knowledge with others. Unfazed by public speaking, she embraced every opportunity that came her way, navigating her path one step at a time. After spending over fourteen years in the field of data science, Shanu draws her motivation and inspiration from the meaningful impact her work has on people's lives. To her, data science is more than just numbers; it's about telling stories that resonate with real experiences and emotions. She says, “Data Science is the art of storytelling through data”.

hb gloria