WiDS 2024 Speakers
Panel
Empowerment Journeys: Entering, Excelling, and Exceeding Expectations in the Data Science Workforce
Panel
Landscape of Data Science Across Industries
Panel
Decoding Ethics: Perspectives on Responsible Data Science
Workshop
Causal Insights From Observational Data: A Hands-On Python Workshop
2024 Speaker Bios & Abstracts
Keynote Speakers
Teresa Escrig, PhD
CEO - What Matters Academy
Dr. Teresa Escrig, PhD in AI, has led robotics research, authored 100+ papers, 3 books, and spearheaded AI projects globally. At Microsoft, she led Responsible AI and Machine Teaching. Founder of What Matters Academy, she merges tech expertise with a passion for natural, empowered living.
Morning Keynote: How to thrive in the world of AI
Woven through her own story in the world of AI for 3 decades, Dr. Teresa will reveal in this presentation how AI is reshaping our world. She will discuss how to stay agile and innovative with AI, boosting productivity and creativity in any job. She will explore how to tackle challenges, protect our planet, and ensure humanity flourishes. No tech background needed—just a readiness to adapt and grow!
Alessya Visnjic
CEO - WhyLabs
Alessya Visnjic is the CEO of WhyLabs, the AI Observability company. Prior to WhyLabs, Alessya was a CTO-in-residence at the Allen Institute for AI and earlier she spent 9 years at Amazon leading AI initiatives. Alessya is the founder of Rsqrd AI, a global community of AI practitioners making enterprise AI technology robust and responsible. Alessya holds a B.S. in Applied Math and an M.S. in Business from the University of Washington. Outside of work, Alessya loves chasing her two little daughters, cooking, playing Beat Saber, and traveling.
Afternoon Keynote: Towards Robust and Responsible AI
The rise of foundation models has substantially reduced the barrier to adopting AI and enabled use cases which seemed like science fiction just 10 years ago. Now, the key challenge for AI practitioners is the ability to operate these AI-powered applications with transparency and control that is necessary to deliver a lasting, positive customer and business impact. In this presentation, Alessya will review key risks associated with operating AI applications and how to take advantage of the emerging AI tooling ecosystem to mitigate these risks. She will discuss the role of each AI practitioner in facilitating robust and responsible AI adoption.
Invited Speakers
Vandana Mohan
Data Science Director - Currently on sabbatical / pursuing personal projects in Special Education
Vandana is a Data leader, fantasy writer and mom. She has built and scaled data science and engineering teams at Meta and Shopify, and enjoys mentoring data professionals and advising companies on instrumentation, analytics, ML and data org building topics. She has a PhD from Georgia Tech and is currently exploring personal projects around applying AI in the domain of special ed.
The Data Art of Good Goals
Setting goals for product and business initiatives is one of the key ways that Data Scientists add value to their teams. This talk will explore a framework for how Data Scientists can take more ownership of this process and up level their thought leadership on their teams at the same time.
Catherine Nelson
Freelance Data Scientist & Author
Catherine Nelson is a freelance data scientist and writer. She is currently working on the forthcoming O’Reilly book "Software Engineering for Data Scientists". Previously, she was a Principal Data Scientist at SAP Concur, where she delivered production machine learning applications and developed innovative new features. She is also co-author of the O'Reilly publication "Building Machine Learning Pipelines", and she is an organizer for Seattle PyLadies, supporting women who code in Python. In her previous career as a geophysicist she studied ancient volcanoes and explored for oil in Greenland. Catherine has a PhD in geophysics from Durham University and a Masters of Earth Sciences from Oxford University.
Fireside Chat
In this informal chat, Catherine will answer questions on her journey from studying ancient volcanoes to writing data science books, via deploying production machine learning models! She’ll discuss her latest book, ‘Software Engineering for Data Scientists’ and give you some recommendations for applying software engineering best practices to data science code. She’ll also answer your questions on her move into data science.
Tessa Barton
Research Scientist - Databricks
Tessa is a research scientist at Databricks working on retrieval augmented generation. Previously she was at the New York Times using computer vision for sports journalism. She has a master's degree in Computer Science from Brown University and has worked in data science roles at Meta, SpaceX, Tinder and Snapchat.
Training Efficient Open Source Large Language Models
What does it take to train a Large Language Model like ChatGPT? This talk will go over the training and design of DBRX, Databricks’ 132 billion parameter flagship language model. We will also chat about all the ways in which Data Science helps out with training Large Language Models.
Zhamilya Kruger
Data Scientist - Nordstrom
Zhamilya Kruger is a data scientist who has cultivated a breadth of experience at Nordstrom over the past six years, contributing her analytical skills to several areas within the company. Her journey has taken her through Product Management, Search and Browse Optimization, Finance, and Strategic Analytics, where she has consistently applied her knowledge to support data-informed decisions.
Advancing Retail Fraud Prevention with Apache Kafka and Apache Flink: A Real-Time Event-Streaming Approach
The retail sector is increasingly vulnerable to a variety of sophisticated fraud schemes that can lead to financial loss and erode consumer trust. This session will examine the escalating problem of retail fraud, identify the key challenges retailers face, and offer a comprehensive overview of an innovative solution leveraging the distributed event-streaming capabilities of Apache Kafka in conjunction with the real-time stream processing power of Apache Flink. The session will provide an overview of the system's workflow, and a clear understanding of how the combined strengths of Apache Kafka and Apache Flink are shaping the future of real-time fraud prevention in the retail space, offering scalable, efficient, and adaptable solutions to this persistent industry challenge.
Madison Swain-Bowden
Senior Data Engineer - Automattic
Madison is a Senior Data Engineer & former Team Lead out of Seattle and an avid Python user/organizer. She is currently sponsored by Automattic to work on the open source project Openverse, and has worked at Ookla (Speedtest.net), the Allen Institute for Cell Science, and the Broad Institute. In her spare time she can be found baking, building digital tools to help those battling oppression, contributing to open source, walking her dog, reading queer fiction, or playing video games.
As easy as breathing - manage your workflows with Airflow!
Apache Airflow is an open source workflow management tool that's been called "cron on steroids". For a career data engineer, this tool has been central in my success at orchestrating and maintaining data pipelines. But Airflow's applications have grown far beyond the intent for which it was originally built. What was once a machine learning training engine is now a tool I've used extensively over the last 6 years. I've used it across 3 jobs, in several different roles; for side projects and critical infrastructure; for manually triggered jobs and automated workflows; for IT (Ookla/Speedtest.net), science (Allen Institute for Cell Science), the commons (Openverse), and liberation (Orca Collective). In this talk, I'll be sharing a brief overview of what Apache Airflow is and how it might be able to help manage *your* workflows too! As an Airflow user and contributor for the last 6 years, I've seen how this tool can quickly become the hammer for every nail you see. Part of what makes Airflow powerful is that you can define its workflows in pure Python; this means you can leverage all of the clever language features and libraries Python has to offer when setting up a job. No more pesky and repetitive YAML files (GitHub Actions) or domain-specific languages (Jenkins). Use the language and libraries you're familiar with while getting automatic retries, error handling, control flow, and so much more.
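To make the idea of defining workflows in plain Python more concrete, here is a toy sketch that uses only the standard library. It is not Airflow's API: `run_workflow`, `flaky_extract`, and the task names are hypothetical, and the "scheduler" is a simple loop. It only illustrates two ideas the abstract mentions, running tasks in dependency order and retrying a failed task automatically.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, max_retries=2):
    """Run callables in dependency order, retrying each failed task.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    Returns the list of task names in the order they completed.
    """
    completed = []
    # static_order() yields each task only after all of its upstream tasks.
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted: fail the whole workflow


    return completed


# Hypothetical three-step pipeline: extract -> transform -> load.
attempts = {"extract": 0}


def flaky_extract():
    # Fails on the first call to simulate a transient error.
    attempts["extract"] += 1
    if attempts["extract"] == 1:
        raise ConnectionError("transient failure")


order = run_workflow(
    tasks={
        "extract": flaky_extract,
        "transform": lambda: None,
        "load": lambda: None,
    },
    dependencies={
        "extract": set(),
        "transform": {"extract"},
        "load": {"transform"},
    },
)
print(order)
```

A real orchestrator adds scheduling, logging, and a UI on top of these basics; the sketch only shows why having the whole workflow as ordinary Python objects is convenient.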
Marie Wang
Bioinformatics Scientist - Pfizer
Marie Wang is a Bioinformatics Scientist at Pfizer, one of the world’s premier biopharmaceutical companies. She applies rigorous statistical testing and machine learning methods to clinical data to understand the mechanism of action for drug candidates. Prior to joining Pfizer, Marie got her PhD in Neuroscience and worked for several academic research institutes on predicting Alzheimer's disease outcomes using ML.
Novel semi-supervised clustering algorithm drastically improves consistency and interpretability in cancer drug development
Single-cell RNA sequencing is an emerging, state-of-the-art technology revolutionizing genomic analysis in cancer treatment. The primary tool used in its downstream analysis is unsupervised clustering, which helps to detect and visualize groups with common features and is leveraged more universally in the biomedical field to group cells based on their genetic and proteomic profiles. However, many common clustering methods suffer from inconsistency and interpretability problems. For example, clustering outcomes are heavily dependent on algorithm choice and are sensitive to variations in input and outliers. Additionally, it can be challenging to determine an appropriate number of clusters and label for each cluster. These issues are especially problematic for biological data in which the input data by nature has significant batch-to-batch variation, and being able to interpret the clustering labels or cell types is crucial to understanding the biological processes. Although advances in clustering methodology have helped optimize areas such as high dimensionality analysis and outlier detection, inconsistency and interpretability remain key analytical challenges. Here, we want to share a novel semi-supervised clustering method which addresses both problems. Originally developed by the Satija group at MIT, the algorithm constructs a reference clustering map through supervised learning using biologically measured data, anchoring future clusters to the reference map. In our case, we applied the model to classify and detect different cell types in cancer based on their gene expression profiles. Because the model can effectively control for the variability caused by the batch-to-batch effect, we were able to compare and pool a variety of data sources originating from different groups and environments. 
The reference map generated by supervised learning also provided a reliable way to label each cluster with a cell type, which drastically improved the cluster interpretability for analysis and presentation. Collectively, it has offered us a better understanding of the underlying biological processes, improving future cancer treatment medication. While our application was limited to drug development, there is no reason the semi-supervised approach cannot be applied more holistically to a variety of domains.
Mayuree Binjolkar
Research Scientist - Meili Technologies
Mayuree is a Research Scientist at Meili Tech working on AI for in-vehicle health monitoring. She has a Ph.D. in Transportation Engineering and Masters in CS and Intelligent Transportation from the University of Washington. Her expertise bridges AI, transportation, and HCI, focusing on enhancing driver decisions.
Trustworthy Automation: A Case Study in Explainable Generative AI for Driver Decision Making
There is a need for real-world driving data to understand how people drive in complex traffic situations. This presents a significant challenge to improving driver and road safety while developing more trustworthy vehicle automation technology. Therefore, this study addresses this gap by creating models that generate realistic driving scenarios to deepen understanding of human driver decision-making using limited datasets. The models used in this research are based on explainable generative artificial intelligence, a combination of Generative Adversarial Networks (GANs) and explainable AI (xAI). This approach improves transparency and trustworthiness in understanding how the models operate. The goal is to simulate typical, rare, and critical driving scenarios, capturing a wide range of driver actions under various traffic conditions.
Sara Riker
Data Science Manager - Nordstrom
Sara Riker is a Manager of Data Science and Analytics at Nordstrom. Starting on the salesfloor, she worked in various manager roles in stores before earning her Masters in Analytics from American University and moving to Seattle as a Data Analyst in 2017. In addition to her experience solving problems for retail merchandisers with data, she is a mentor and invests in growing the careers of data professionals, particularly those from frequently overlooked and marginalized backgrounds. Her background has taken her from coast to coast and she is a passionate dog mom who would love to hear about your dog as well.
Womansplaining the Journey: Empowering Women to Thrive in Data Science Careers
In this talk, we will explore the challenges women face in building successful data careers and navigating the persistent issue of mansplaining. As a woman data science manager, I will share personal experiences, insights, and practical strategies for empowering women in this male-dominated field.
The session will begin by examining the dynamics of mansplaining and its impact on women's confidence and professional growth. We will identify common instances of mansplaining in data science workplaces and discuss the underlying biases that perpetuate these behaviors. By shedding light on this issue, we aim to create awareness and drive change towards a more inclusive and equitable industry.
Furthermore, we will provide actionable advice on how women can thrive in their data science careers by building resilience and asserting themselves. We will explore effective communication techniques, strategies for establishing credibility, and methods for navigating challenging professional situations. By equipping women with these tools, we aim to empower them to navigate mansplaining and create a supportive environment that values their contributions.
Join us for an engaging discussion on how we can collectively address mansplaining and foster an inclusive culture that enables women to thrive in their data science careers. Together, we can work towards breaking down barriers, empowering women, and creating a more diverse and equitable data science community.
Yuanjie (Tukey) Tu
PhD candidate - University of Washington
Yuanjie (Tukey) is a PhD candidate in Transportation Engineering at the University of Washington. She mainly works on research projects that aim to advance sustainability outcomes by employing statistical and deep learning models to investigate diverse aspects of transportation behavior, from autonomous vehicle ownership to ride-sharing, and biking patterns. Recently, she's also delved into the application of deep learning models for predicting electric car charging demand and data visualization.
Towards sustainability: leverage deep learning in electric vehicle (EV) charging demand prediction
With the accelerating global transition towards sustainable energy, the demand for Electric Vehicles (EVs) has surged, necessitating advancements in EV charging infrastructure. Please join me for an exciting tour of leveraging deep learning in electric vehicle (EV) charging demand prediction! I will present an application we developed, utilizing deep learning to predict EV charging demand and corresponding energy savings, a crucial aspect in optimizing energy distribution and promoting sustainable transportation. Our application explores various deep learning models, including multilayer perceptrons (MLPs), convolutional neural networks (CNNs), long short-term memory (LSTM), and a transformer, to analyze historical EV charging data alongside external variables influencing charging behaviors. I’ll present the results from different deep learning models and how to turn them into practical solutions. I will also present some fun visualizations of our results! Our application is open to all and takes advantage of publicly available data. In summary, it can serve as a tool for policymakers and/or urban planners in anticipating peak usage periods, optimizing resource allocation, and minimizing strain on the power grid!
This paper is coauthored with Mayuree Binjolkar.
Jyoti Vasudev
Senior Software Engineer - Microsoft
Jyoti holds an engineering position at Microsoft and is furthering her education in Data Science at the Harvard University Extension School. Her professional journey has been dynamic, transitioning from engineering to MBA and CFA, with over a decade of experience in the tech industry. Currently, she’s immersed in the realm of artificial intelligence, contributing to its widespread integration. Outside her professional and academic pursuits, she dedicates her time to her four-year-old daughter. Like all working mothers, her days are meticulously organized, a lifestyle embraced with enthusiasm.
Address Magic: OpenAI’s Fixer
Large language models (LLMs) have revolutionized the field of task automation. They can perform many tasks that require human expertise and research by using effective prompts. However, most of the current workflows that rely on optimal automation use SQL SPs and fuzzy lookups, which are deterministic methods. This means that they can only produce partial solutions in some cases. For instance, if a source system sends wrong address information, such as putting a company name in the city name field, a deterministic lookup will fail to correct it. This would need human intervention to search for the company name and find the right city name. But this problem can be solved by using LLMs.
Meltem Gurcay-Morris
User Researcher - Microsoft
Meltem Gurcay-Morris, PhD is currently a user researcher in AI Platform at Microsoft, which builds products to aid data scientists and developers in creating AI applications for businesses. Meltem's work focuses on understanding and improving user experiences with respect to implementing AI solutions responsibly, and creation, evaluation, and monitoring of generative AI solutions. Meltem received her MA and PhD in psychology with a specialization in judgment and decision making processes from the University of Pennsylvania, where her research focused on mathematical models of moral judgment and improving individual and group forecasting processes.
Ethical Implementation of Generative AI in Business Use Cases: A Practical Guide for Innovating Responsibly
In the past year generative AI applications have emerged as powerful tools capable of creating new, previously unseen content. However, with great power comes great responsibility. This talk will delve into the ethical implications and responsibilities associated with the development and deployment of these applications.
We will begin by defining generative AI and its various applications in business scenarios our teams have seen today. We will then explore the concept of Responsible AI, discussing its importance in ensuring the ethical use of AI technologies. We will highlight the potential risks and challenges posed by generative AI, such as the creation of deepfakes, the potential for bias in generated content, and issues related to data privacy.
The talk will also cover practical advice for measuring and mitigating risks in generative AI applications based on enterprise use cases. We will discuss strategies and current best practices for incorporating ethical considerations into the AI development process, such as transparency in AI decision-making, robustness against manipulation, and respect for user privacy. Specifically, we will talk about how businesses experiment with their generative AI solutions (e.g., prompt orchestration); how they evaluate their solutions to go to production (e.g., LLM-based metrics, red-teaming, use of synthetic data for testing); and how they continue monitoring the health of their solutions (e.g., performance metrics, data science metrics). We will also explore the importance of cross-functional collaboration in addressing these challenges, emphasizing the need for input from ethicists, legal experts, social scientists, and user experience professionals.
Finally, we will present case studies of responsible generative AI applications, demonstrating how businesses approach turning responsible AI principles into actionable items as they create solutions for their use cases. We will conclude with a discussion on the future of Responsible AI in the context of generative applications, considering both the opportunities and challenges that lie ahead.
This talk aims to equip data scientists with the knowledge and tools to develop generative AI applications responsibly, ensuring that these powerful technologies are used in a manner that respects user privacy, promotes fairness, and benefits society as a whole. Join us as we navigate the ethical landscape of generative AI, fostering a future where AI serves as a force for good.
Kimberly Glock
Data Scientist - Surgical Science
Kimberly serves as a Data Scientist at Surgical Science, where she is an integral part of the Research and Development Data Science team based in Seattle. Her expertise is primarily channeled into constructing machine learning models to support multiple projects across the company. Prior to her tenure at Surgical Science, she honed her skills in data science through roles at SAIC and Arrow Electronics. Kimberly's academic foundation is solidified by a Master's degree in Data Science from Lipscomb University. Outside the professional realm, she is a devoted mother and an avid volunteer. Kimberly's passion for the outdoors is evident in her enthusiasm for skiing, mountain biking, mountaineering and canyoning—activities that she fervently engages in during her leisure time.
Synthetic Data for Instrument Segmentation in Surgery (Syn-ISS)
Synthetic data is increasingly important and relevant in today's data-driven landscape. It addresses privacy concerns by providing a means to generate data that mimics real-world information without exposing sensitive personal details. This makes it particularly valuable in fields like healthcare, where data privacy is paramount. Additionally, synthetic data can be used to fill gaps in datasets where real-world data is scarce or biased, enabling more comprehensive and unbiased AI training. It also allows for the testing and validation of systems in a controlled environment, enhancing model robustness and accuracy. Furthermore, synthetic data is instrumental in scenarios where gathering real-world data is impractical or too expensive, thus accelerating research and development across various industries. Our simulators and expertise in surgical simulation enable us to generate synthetic data that significantly enhances AI applications in the medical field. This contribution is pivotal in advancing AI-driven innovations in surgery. By harnessing advanced algorithms and state-of-the-art simulation technologies, we can produce high-quality synthetic data that closely mimics real-world surgical scenarios.
The core of this presentation revolves around the Synthetic Data for Instrument Segmentation in Surgery (Syn-ISS) challenge, hosted at MICCAI 2023 in Vancouver, Canada. The Syn-ISS challenge highlights the innovative use of semantic image segmentation algorithms and synthetic data derived from our state-of-the-art surgical simulators. Specifically, the challenge focuses on segmenting surgical instruments within synthetic data images. We had 12 participating teams compete. The dataset consisted of 3600 synthetic images generated from our FlexVR simulator. The winners were chosen based on a composite score of rankings, based on two weighted metrics: Dice Similarity Coefficient (DSC) and the Hausdorff Distance (HD). This challenge and participants showcased that synthetic data can be used in medical AI, benefiting medical education for humans and machines, ultimately improving patient outcomes.
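For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how each can be computed on small binary masks. It is an illustration only: the function names and toy masks are hypothetical, the challenge's exact metric variants and weighting are not stated in the abstract, and real evaluations typically use optimized library implementations rather than this brute-force version.

```python
import math


def foreground(mask):
    """Return the set of (row, col) coordinates where the mask is non-zero."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice_coefficient(pred, truth):
    """Dice = 2 * |A & B| / (|A| + |B|) over foreground pixels."""
    a, b = foreground(pred), foreground(truth)
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2 * len(a & b) / total


def hausdorff_distance(pred, truth):
    """Symmetric Hausdorff distance between foreground pixels, in pixels."""
    a, b = foreground(pred), foreground(truth)
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty mask")

    def directed(src, dst):
        # Largest distance from any source pixel to its nearest target pixel.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Toy 3x4 masks (hypothetical): 1 marks an instrument pixel.
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

dice = dice_coefficient(pred, truth)
hd = hausdorff_distance(pred, truth)
print(dice, hd)  # 0.75 1.0
```

Dice rewards overlap (higher is better), while Hausdorff distance penalizes the single worst boundary error (lower is better), which is one reason segmentation challenges often report both.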
Kelley Hall
Data Scientist - Tableau
Kelley Hall is a Data Scientist at Salesforce working on the Tableau Global Sales Operations team where she uses ML to enable data driven decision making within the sales organization. Her projects range from sales forecasting to discount recommendation. She received her PhD from the University of Washington, focusing on slow slip earthquakes in the Pacific Northwest. In her free time, Kelley coaches Ultimate frisbee for the University of Washington Women’s team and enjoys the outdoors with her pup Gus.
MLOps for the Lonely Data Scientist
As the lone data scientist on a very small team, it can be very difficult to know or implement best practices on code productionalization. Many aspects of MLOps are often controlled by larger data engineering teams that you or your company may not have access to, but there are still tools and practices that we can implement in our day to day work. In this talk, we will explore implementing practices such as version control, continuous integration and continuous deployment (CI/CD), and using development and production environments. This is an opportunity to borrow from the software engineering development cycle and make our data science work more resilient to change or failure.
Kate Ostbye
Director, Enterprise Data Science and Machine Learning - Pfizer
Kate Ostbye is the Director of Data Science and Machine Learning at Pfizer, leading AI/ML solution delivery for R&D and co-leading a coding CoP and a local women's resource group. Kate holds a BS in English and Anthropology from UW-Madison, and an MPH in Epidemiology and Biostatistics from JHSPH. Her work spans academic and industry sponsors, multiple programming languages, and all stages of clinical trials. She has contributed to multiple standards developments in her field including PHUSE, R Consortium, and CDISC.
Empowering Women in Career Progression: A Comprehensive Evaluation of the Principal vs Manager Data Science Pathway
In this engaging talk, we aim to provide women data science professionals with an in-depth understanding of two crucial career advancement paths: the Principal role and the Managerial role. We will start by explaining the distinct responsibilities, skills required, and potential influence of each role, with a specific focus on the experiences and challenges faced by women in these positions. We will draw from real-world case studies and industry trends to highlight the unique advantages and potential hurdles in each role. Next, we will delve into a detailed discussion of the pros and cons associated with both roles. This will encompass aspects such as job satisfaction, work-life balance, compensation, and growth opportunities, all from a woman's perspective. The objective is to equip participants with the necessary insights to make informed career decisions that align with their professional goals and personal aspirations. In the final segment, we will facilitate an interactive dialogue, providing attendees with the opportunity to share their perspectives, ask questions, and learn from the experiences of other women professionals. This workshop is designed to be a strategic tool for women in charting their career path in data science, helping them understand whether a Principal or Manager role best aligns with their professional ambitions and personal needs.
Bhavana Raj Nagaraj Srinivasappa
Senior Data Developer - Labcorp Drug Development of America
Bhavana is a skilled professional with over ten years of experience in IT, specializing in managing data, ensuring quality, and handling Big Data in various sectors like banking, finance, healthcare, manufacturing, and insurance. She earned her master's degree from Kingston University London, enhancing her expertise. Besides her job, Bhavana enjoys helping others succeed in the data field. She mentors projects related to data science on online platforms, guiding aspiring professionals. She is also passionate about teaching the younger generation about data and financial literacy, aiming to empower them. In her leisure time, Bhavana finds relaxation in practicing yoga and indulging in books, either by reading or listening.
Multi Class Skin Cancer Detection Using Deep Learning Techniques
Skin cancer poses a significant health threat, necessitating early and precise detection as it often spreads to different body parts. This research introduces a groundbreaking automatic classification system for skin lesions, employing a non-invasive approach and harnessing the power of deep learning algorithms, enhanced by transfer learning techniques.
Our project focuses on advancing skin cancer detection using neural networks, particularly by implementing cutting-edge transfer learning models such as DenseNet201, EfficientNet B7, ResNet50, and ResNet152V2. Tailoring network layers, leveraging specific activation functions, and optimizing input data size are integral to this approach. The proposed models amalgamate various parameters to elevate the identification and classification of skin lesions. Additionally, we explore different hyperparameters like optimizers, batch size, and epochs to identify the optimal model. Our research utilizes the HAM10000 dataset, encompassing seven diverse types of skin cancers, ensuring robustness through image augmentation and resampling techniques for dataset balancing.
A comparative analysis against state-of-the-art transfer learning models validates the performance of our suggested models. Notably, our proposed DenseNet201 transfer learning model achieves a remarkable test accuracy rate of 84%, with a 69% recall for multi-class skin lesion classification. ResNet50 and EfficientNet B7 follow closely with an accuracy of 83%, ranking as the second-best achieved accuracy using the Adam optimizer for both models. Impressively, the recall values for 'nv' and 'vasc' lesion classes from the DenseNet201 model are 92% and 94%, respectively. The effectiveness of our model is further validated during deployment and prediction.
In conclusion, this project significantly contributes to dermatology by showcasing the effectiveness of deep learning, specifically transfer learning models, in substantially improving the accuracy of identifying and classifying skin lesions. Our results demonstrate the potential of early detection of cancerous skin lesions. This paper not only presents the achievements of our work but also outlines future prospects for further advancements.
This submission is part of my MSc Data Science program, and I am enthusiastic about presenting this topic at the conference, sharing the approach, findings, and the progress achieved throughout the implementation. This work can be presented at the conference as a 20-minute presentation, or a code workshop can be arranged.
Riya Joshi
Data Scientist 2 - Microsoft
Riya is a Data Scientist at Microsoft who specializes in NLP and machine learning. She holds a Master’s degree in CS from the University of Massachusetts, Amherst, which she completed in May 2022. Before joining Microsoft’s US team, she worked as a Data Engineer in India. She is passionate about building data and AI-driven products and solutions that can benefit people and society. She enjoys hiking, dancing and working out in her spare time.
Finetune LLMs
Newer, smaller LLMs like Llama2 and Mistral can be finetuned easily and provide more power than prompting alone. They utilize a new paradigm of fine tuning called parameter efficient finetuning (PEFT). Prompting has been shown to have many restrictions and can limit the capabilities of LLMs; with PEFT, even smaller models are performing very well.
This talk will introduce the audience to what PEFT is and how one can finetune LLMs to build custom solutions. It will be especially beneficial for audience members who want to learn NLP in the LLM era.
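To make the parameter savings concrete, here is a minimal sketch of the arithmetic behind low-rank adaptation (LoRA), one widely used PEFT technique. The abstract does not say which PEFT method the talk covers, so the choice of LoRA, the layer shape, and the rank below are illustrative assumptions.

```python
# Illustrative arithmetic only. LoRA is one common PEFT method; the abstract
# does not specify which method the talk uses. All sizes are hypothetical.

def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a low-rank update W + B @ A,
    where A is rank x d_in and B is d_out x rank (W stays frozen)."""
    return rank * d_in + d_out * rank


d_in, d_out, rank = 4096, 4096, 8  # hypothetical layer shape and rank
full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(f"full finetuning: {full:,} trainable weights")
print(f"low-rank update: {lora:,} trainable weights ({lora / full:.2%} of full)")
```

Because only the two small matrices are trained, the number of trainable weights grows linearly with the rank rather than with the product of the layer dimensions.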
Tanisha Jauhari
Student
Tanisha is a student from the San Francisco Bay Area. She has done machine learning research, primarily focusing on bias in generative artificial intelligence systems. Tanisha is passionate about supporting girls and women in STEM, and she serves as a 2024 ambassador for Women in Data Science Worldwide.
Race and Gender Bias in Generative AI Models
In this study, we set out to measure race and gender bias prevalent in text-to-image (TTI) AI image generation, focusing on the popular model Stable Diffusion from Stability AI. Previous investigations into the biases of word embedding models, which serve as the basis for image generation models, have demonstrated that models tend to overstate the relationship between semantic values and gender, ethnicity, or race. These biases are not limited to straightforward stereotypes; more deeply rooted biases may manifest as microaggressions or imposed opinions on policies, such as paid paternity leave decisions. In this analysis, we use image captioning software OpenFlamingo and Stable Diffusion to identify and classify bias within text-to-image models. Utilizing data from the Bureau of Labor Statistics, we engineer fifty prompts for professions and fifty prompts for actions in the interest of coaxing out shallow to systemic biases in the model. Prompts include generating images for ‘CEO’, ‘nurse’, ‘secretary’, ‘playing basketball’, and ‘doing homework’. After generating twenty images for each prompt, we document the model’s results, which show biases do exist within the model across a variety of prompts. For example, 95% of the images generated for ‘playing basketball’ were African American men. We then analyze our results through categorizing our prompts into a series of income and education levels corresponding to data from the Bureau of Labor Statistics. Ultimately, we find that racial and gender biases are present yet not drastic for all cases.
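As a rough illustration of how per-prompt shares like the one above can be tallied, the sketch below counts labels across the images generated for a single prompt. The label names and data are hypothetical placeholders, not the study's outputs; with twenty images per prompt, a 95% share corresponds to 19 of 20 images.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the images generated for one prompt."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical caption-derived labels for one prompt with twenty images.
labels = ["group_a"] * 19 + ["group_b"] * 1
shares = label_shares(labels)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.0%}")
```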
Megan Ebers
Postdoctoral Scholar - University of Washington
Megan R. Ebers is a postdoctoral scholar in applied mathematics with the NSF AI Institute in Dynamic Systems at the University of Washington. In her PhD research, she developed and applied machine learning methods for dynamical systems to understand and enable human mobility. Her postdoctoral research focuses on data-driven and reduced-order methods for complex systems, so as to continue her work in human-centered research challenges, as well as to extend her research to a broader set of technical challenges, including turbulent flow modeling, natural disaster monitoring, and acoustic object detection.
Data expansion to improve accuracy and availability of digital biomarkers for human health and performance
Advances in deep learning and sparse sensing have emerged as powerful tools to enable and expand human motion tracking. Motion tracking and analysis is essential for monitoring disease progression, guiding rehabilitation treatment, evaluating sports performance, and informing assistive device design. Biomechanists traditionally characterize motion, such as gait, by measuring biomechanical variables like joint kinematics, kinetics, and spatio-temporal parameters. Certain biomechanical variables have been established as biomarkers that correlate with meaningful outcomes, such as knee adduction angle for ACL injury or step width variability for aging/fall risk. In the US, with 1 in 7 individuals having a mobility disability and 1 in 2 adults living with a musculoskeletal condition, monitoring human motion 'in the wild' is vital for observing individuals' natural functionality and lifestyle. For motion to be observed in natural or uncontrolled environments, sensing devices must be portable, unobtrusive, reliable, and accurate. However, for sensing data to be meaningful, measurements must be converted to and contextualized as personalized biomechanical outcomes, a challenge not yet overcome in natural environments. Here, we present a deep learning algorithm -- originally developed for full state-space reconstruction of complex dynamical systems -- for personalized human motion tracking. Using this algorithm, we learn a mapping that transforms a low-dimensional sensor input into the full state-space dataset. By using as few as one sensor, we demonstrate that it is possible to reconstruct a comprehensive set of measures that are important for tracking and informing mobility-related health outcomes. As a concrete example, most smartwatches and smartphones contain an IMU (inertial measurement unit) sensor that monitors movement and is currently used for simple measures like daily step count or gesture control. 
We have demonstrated that our deep learning algorithm can use this single sensor to reconstruct not just the body segment where the sensor is worn, but the motion and—in some cases—the physiological state of the body. The basic premise of our approach that makes this powerful transformation possible is the leveraging of sensor measurement time histories to inform the mapping from low to high dimensional data. By expanding our datasets to unmeasured or unavailable quantities, this work can impact clinical trials, robotic/device control, and human performance. Additionally, this methodology may enable more efficient and cost-effective remote monitoring of patients, reducing the need for frequent visits to clinical settings. Overall, our work represents a major advance in personalized human motion sensing and has the potential to transform the way we monitor and manage movement-related health outcomes.
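The idea of using sensor time histories as model input can be sketched with a simple time-delay embedding, in which each row stacks a reading with its neighbors in time. The abstract does not describe how the time histories are encoded, so the windowing scheme, window length, and sample values here are assumptions for illustration; the learned mapping from these rows to the full state is not shown.

```python
def delay_embed(series, window):
    """Build overlapping windows from a single-sensor time series.

    Row t holds series[t : t + window], so each row carries a short
    time history rather than a single reading.
    """
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [series[t:t + window] for t in range(len(series) - window + 1)]


# Hypothetical readings from one sensor channel (e.g., one IMU axis).
signal = [0.0, 0.1, 0.4, 0.9, 0.4, 0.1]
rows = delay_embed(signal, window=3)
for row in rows:
    print(row)
```

Each row could then serve as the low-dimensional input to a model trained to output the higher-dimensional set of biomechanical measures.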
Andrea Urban, PhD
Data Scientist - Puget Sound Energy
Andrea Urban is an astronomer-turned-data scientist. When she isn't chasing the next total solar eclipse, she enjoys looking for patterns in data and building bespoke machine learning models.
Data Science at an Electric Company: Building and Validating an Electric Vehicle Detection Model
Washington has recently passed two pieces of legislation that impact electric companies in the state: the Clean Energy Transformation Act (CETA) and the Zero Emissions Vehicles Law. They require that the state’s electricity supply be free of greenhouse gas emissions by 2045 and for all new vehicles sold in the state to be zero-emission vehicles by 2035, respectively. In addition to moving away from coal-fired power plants, CETA states that utilities must consider the equity impact of these clean energy investments on vulnerable populations and highly impacted communities. To address this, utilities are developing energy efficiency incentives and community-based distributed energy resources. The changing nature of our state’s electric usage patterns due to these investments in community programs and an increase in electric vehicle (EV) adoption will put new stresses and strains on our electrical grid that we need to understand. In this talk I will focus on the model we developed at a Washington utility to detect EVs in order to understand their impact on our electrical grid.
In the coming decades, most electric vehicles are expected to be charged at single-family residences in the evening hours, rather than public charging stations at all hours. In order to prepare for the increased load on our power grid during peak times, electric companies need to know when and where the charging is happening. Building an EV charging detection model is difficult because the expected population of EVs is around 1-5%; we have a highly imbalanced problem. Starting with a relatively small labeled dataset, we built an EV detection model using novel step-detection features in time-series data. Using a random forest classifier, we are able to achieve accuracy, precision, and recall metrics of over 80%. In order to validate our data with what we expect among our population of customers, we compared our results to aggregated data from the Department of Licensing as well as survey results from our customers.
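Two ideas in this paragraph can be sketched briefly: a step-detection feature computed from a load time series, and why precision and recall matter more than accuracy when only a few percent of customers have an EV. The threshold, readings, and labels below are hypothetical, and this is not the utility's feature set or model.

```python
def count_steps(load_kw, threshold_kw):
    """Count upward jumps between consecutive readings of at least threshold_kw."""
    return sum(
        1 for prev, cur in zip(load_kw, load_kw[1:])
        if cur - prev >= threshold_kw
    )


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (1 = EV household)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical hourly load (kW); the jump mimics a charger switching on.
load = [1.2, 1.0, 1.1, 8.3, 8.4, 8.2, 1.3]
steps = count_steps(load, threshold_kw=5.0)

# Hypothetical labels: 3 EV households among 100 (3% prevalence).
y_true = [1] * 3 + [0] * 97
y_all_negative = [0] * 100  # a "model" that never predicts an EV
accuracy = sum(t == p for t, p in zip(y_true, y_all_negative)) / len(y_true)
precision, recall = precision_recall(y_true, y_all_negative)
print(steps, accuracy, precision, recall)
```

A classifier that never flags an EV still scores high accuracy on data this imbalanced while finding no EVs at all, which is why the abstract reports precision and recall alongside accuracy.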
Overall, our stakeholders are satisfied with our model and its ability to predict which customers are charging an EV. I will discuss how we work with our stakeholders to understand which metrics we need to optimize for in order to help them prioritize their maintenance work. I will also briefly discuss next steps for this model.
Kasia Rachuta
Data Science Tech Lead - Square
Kasia is a Data Science Tech Lead at Square, where she collaborates with her team to drive data-informed decision-making. Her expertise spans various domains, including identity verification, sales analytics, ecommerce, and infrastructure. Prior to her current role, she gained valuable experience working at Medium and FiveStars. In these positions, Kasia conducted numerous A/B tests to facilitate informed product decisions and actively contributed to company-wide A/B testing initiatives. In her free time, she indulges in her passions for travel, scuba diving, and reading.
Overcoming challenges and pitfalls of A/B testing
This session will go beyond an overview of what A/B testing is. It will cover how to work with cross-functional partners to set up a test and analyse it. Finally, I will talk about how to make a decision based on the test results. I will go into depth about the most common challenges and pitfalls that I have experienced throughout my career and how to avoid making the most common mistakes. After the talk, you will know what to do when someone asks you to analyse an experiment you haven't designed, how to deal with partners asking for 'directional data' and how to work successfully with engineering to ensure each test is set up correctly.
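For readers who want a concrete picture of the analysis step, here is a minimal sketch of a two-proportion z-test, one standard way to compare conversion rates between two variants. The abstract does not say which statistical test the talk uses, and the counts below are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1,000 conversions in control, 130 of 1,000 in treatment.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A small p-value alone does not settle the decision; sample-size planning, the choice of metric, and checking that the test was set up correctly all still apply.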
Akriti Chadda
Applied Scientist - Microsoft
Akriti is an accomplished applied scientist with a strong focus on search and relevance. She possesses a diverse skill set, having earned an undergraduate degree in biomedical engineering and a master's in computer science. Her expertise lies in developing advanced algorithms for search engines, and she constantly strives to deliver exceptional results. In her free time, she can often be found engrossed in memoirs and biographies, fascinated by the stories of people's lives and the lessons they offer. She also has a love of lo-fi music and to keep herself energized, she relies heavily on her love of coffee, which she consumes in copious amounts.
Revolutionizing Search: The Integration of Generative AI and the Technical Challenges Ahead
In the rapidly evolving landscape of search engine technology, the integration of Generative AI has marked a paradigm shift from traditional keyword-based algorithms to advanced, intent-driven models. This talk aims to dissect this transformation, elucidating how models like GPT-4 are not just enhancing search engine capabilities but are redefining them. We begin by exploring the genesis of this change—the shift from simple keyword recognition to the complex understanding of user intent. This is a journey from linear algorithms to AI models that comprehend context, semantics, and the nuanced intricacies of human language. The talk will illuminate how these AI-driven engines are now capable of predicting user intent, thereby delivering search results that are not only accurate but also contextually relevant, making information retrieval more intuitive and efficient. However, this innovation is not without its challenges. The core of this discussion will pivot to the myriad technical hurdles encountered in blending Generative AI into existing search architectures. We'll delve into the computational demands these models impose, addressing the need for substantial processing power and advanced data handling capabilities. This segment will also cover the obstacles in adapting to the rapid pace of AI technology evolution, ensuring that search engines remain not just relevant but cutting-edge. Another crucial aspect is data privacy and security—paramount in an era where user data is both vital and sensitive. We'll examine the strategies to safeguard user privacy while leveraging AI for personalized search experiences. Furthermore, we'll address the challenge of linguistic dynamism—how AI models cope with the ever-changing nature of human language and the implications this has for search accuracy and relevance. 
This talk aims not only to highlight the revolutionary impact of Generative AI on search engines but also to provide insights into the practical solutions and strategies being developed to surmount the associated technical challenges. It's designed for an audience deeply entrenched in data science and technology, offering a blend of high-level understanding and technical detail that will resonate with professionals in the field.
Mari Pierce-Quinonez
Senior Principal Data Scientist
Mari leads the Data Science team for Slalom Seattle, where for the last 7 years she has been helping companies across the region expand their data science capabilities across the spectrum, from getting started with their first statistical analysis to full-fledged MLOps solutions. Mari loves identifying the right DS challenges to tackle, which are both intellectually stimulating for data scientists and move the needle for business stakeholders. Prior to becoming a Data Scientist, Mari worked in international development and holds a dual master’s in urban planning and agricultural development from Tufts University, and taught data science at Galvanize.
Is it worth it? Let me work(shop) it!
With the latest GenAI hype wave, many executives are asking data teams, "can't we just use AI to do this?". Execs and business leads often don't know enough about traditional ML or Generative AI to assess the utility of these tools, bogging down data scientists with unrealistic requests. This session provides data scientists with a foundation of questions to ask over-eager execs to evaluate and prioritize ML use cases through a series of workshops.
Depending on the maturity of the data science practice and the expertise of the business lead in question, we've found three different types of workshops valuable in helping to educate and inspire:
Use case workshop: identify business pain points and brainstorm connections to ML/GenAI solutions.
Prioritization workshop: optional follow-on to the use case workshop to identify the highest ROI use cases.
Requirements workshop: deep dive into a specific problem to identify the core users, the proposed solution, and the expected impact.
Attendees will receive a sample workshop agenda, templates, and tips for effective virtual and in-person facilitation.
Anastasiya Usenko
Anastasiya Usenko is an early-career data scientist in the field of applied deep learning research, with bachelor's degrees in computer science and linguistics. At PNNL, she has worked with reinforcement learning, graph neural networks, and causal inference modeling, among others. Her current research interests lie in graph analytics and model interpretability.
Rachel Wofford
Rachel Wofford is a Data Scientist at PNNL. Her research and interests involve reinforcement learning, adversarial machine learning, and development of big data analytics in the radio frequency and cybersecurity domains. Rachel holds an MS from Oregon State University and a BS from Whitworth University, both in mathematics.
Reinforcement Learning for Model Bias Analysis
With the broad-scale application and massive growth of artificial intelligence (AI) and machine learning (ML) in all aspects of society, a question persists as to the robustness of these systems. In most cases, methods of investigating trustworthiness and explainability in ML models have focused on reactive methods designed to detect when a model has erred. These avenues of investigation are also limited to standard interrogation methods, which may be inadequate for sufficiently novel model architectures or data modalities. We, on the other hand, are developing a proactive method to anticipate possible failure states by simulating a unique and optimal adversarial attack using reinforcement learning (RL). We explore RL as a technique for evaluating model biases and robustness and propose an RL Optimizing Bias Elimination and Robustness Tool (ROBERT). The expected outcome of ROBERT is to learn how biases in a model can be exploited under potential adversarial attack.
In developing ROBERT, we train an image classification model on the MNIST dataset and construct an RL environment that perturbs input images which are then passed into this classification model. The reward of our system is designed to correlate with the impact of the perturbations on the model’s ability to correctly classify the image, with model error translating to higher reward, therefore teaching ROBERT the classifier model’s weaknesses. We validate ROBERT by means of a test wherein we train multiple image classification models with differing architectures and analyze ROBERT’s chosen actions to identify probable model biases. Additionally, we observe how extendible these methods are to the black box adversarial case, which requires less information from the model to perform a successful attack. In conducting this experiment, we develop a novel RL-based methodology aimed to identify unseen points of weakness and bias in existing image classification models.
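The reward design described above can be illustrated with a deliberately simplified stand-in: an epsilon-greedy bandit that learns which pixel of a tiny input to perturb in order to make a toy classifier err. This is not ROBERT, and it is a bandit rather than a full RL agent; the classifier, the image, the perturbation size, and all names are hypothetical.

```python
import random


def toy_classifier(image):
    """Stand-in classifier: predicts 1 when the first two pixels are bright."""
    return 1 if image[0] + image[1] > 1.0 else 0


def reward(image, label, action, delta=0.6):
    """Return 1.0 if darkening pixel `action` makes the classifier err, else 0.0."""
    perturbed = list(image)
    perturbed[action] = max(0.0, perturbed[action] - delta)
    return 1.0 if toy_classifier(perturbed) != label else 0.0


def run_bandit(image, label, episodes=200, epsilon=0.1, seed=0):
    """Epsilon-greedy estimate of how often each pixel perturbation causes an error."""
    rng = random.Random(seed)
    n_actions = len(image)
    values = [0.0] * n_actions
    counts = [0] * n_actions
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        r = reward(image, label, action)
        counts[action] += 1
        values[action] += (r - values[action]) / counts[action]  # running mean
    return values


image = [0.8, 0.7, 0.9, 0.9]  # hypothetical 4-pixel "image" with true label 1
values = run_bandit(image, label=1)
print(values)
```

Actions with high estimated value point at the inputs the classifier depends on most, which is the kind of weakness the abstract describes the agent learning to exploit.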
Palak Bansal
Palak Bansal is an accomplished data science professional committed to promoting diversity and inclusion in technology. Currently pursuing her Master's degree in Data Science at New York University, Palak has over three years of experience in both software and data science projects. Palak has presented her work at various conferences and is currently doing active research on the intersection of generative AI and causal inference.
Ridhika Agrawal
Ridhika recently graduated from New York University with a Master's in Data Science. During her time at NYU she worked with Dr. Viral Acharya, built an LLM-evaluation tool at PayPal, and conducted research in CausalML. Currently, she is the Data Scientist and Engineer at Atalan Tech, a health-tech startup predicting clinician burnout. Ridhika is a creative thinker, an eager learner and aims to use her experience to drive impactful change.
Hoa H. Duong
Hoa is a data science professional whose interests lie in the intersection between data science, economics, and business. Hoa earned her B.A. in Mathematics and Economics with honors and worked as an Analyst and Researcher at NERA Economic Consulting, where she led teams to implement quantitative and econometric analyses. Hoa is currently pursuing her Master’s degree in Data Science at New York University and has worked as a Data Scientist Intern at Amazon, where she spearheaded a cross-functional project to provide a long-term probabilistic forecast for one of Amazon's core businesses. During her time at NYU, she also contributed significantly to various research projects, such as using Generative Adversarial Nets for Causal Inference. Outside of professional endeavors, Hoa is part of a scholar group that mentors students from her home country and advocates for diversity and inclusion in technology. She also particularly enjoys taking long walks with her pup Levi.
GANs for Causal Inference: Harnessing Conditional Independence
This interdisciplinary talk introduces the listeners to the power of Generative AI in the field of Causal Inference and its subsequent applications in Economics and Political Science. Our rigorous year-long research aims to develop a state-of-the-art Causal Inference technique: CausalGANs. Generative Adversarial Networks (GANs) are a popular class of deep learning models that dominates the field of image generation. We harness the essence of GANs to create, from scratch, a causal inference technique which modifies the architecture of GANs to solve the fundamental problem of missing counterfactuals in Causal Inference. In this thorough research, we set up a new framework, develop the notation, write mathematical proofs, and produce robust results by running over 200 parallelised experiments for each different set of parameters on high-performance computing resources. The GANs algorithm simultaneously trains two models: a generator and a discriminator. The generator's objective is to find a data-generating process that generates fake data emulating the distribution of real data, and the discriminator's objective is to distinguish the real data from the fake data. This adversarial nature makes this framework a minimax game between its two components; the competition in this game drives both generator and discriminator to improve their methods until the simulated samples are indistinguishable from the observed samples. At the core of the GANs algorithm is the search for a neural network model that can generate fake data whose distribution is independent of the labeling of real versus fake data. Independence restrictions of this kind are front and center in causal inference models, where the potential outcomes under treatment and control, conditional on contextual variables, are independent of the realized treatment.
This makes the GANs apparatus a good method for causal inference, where instead of pitting real versus fake data, we now strive to get distributions of potential outcomes for treated and non-treated as close as possible. The ongoing research involves the development of the method, proof of its validity, and conducting empirical experiments. We confirm several intuitions as we test different aspects of the method, CausalGANs, with a robust evaluation strategy and compare it against traditional and other state-of-the-art methods in causal inference. We were able to empirically verify the mathematical theorems defined for the framework:
1) We can recover the parameters of the data-generating process through this adversarial framework;
2) The minimum of the loss function is attained close to the true data parameters;
3) The minimizer provides the best estimator of the propensity score. Through this framework, we successfully obtain the treatment effects.
Thus, the success of this method revolutionizes the field of economics through practical applications such as policy development, which often seeks to find the causal effect of interventions.
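For reference, the two ingredients the abstract connects can be written in their standard textbook forms. These are the usual formulations from the literature, not the CausalGANs notation, which the abstract does not give; the symbols G (generator), D (discriminator), Y(0) and Y(1) (potential outcomes), T (treatment), and X (contextual variables) are introduced here for illustration.

```latex
% Standard GAN minimax objective between generator G and discriminator D
\[
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
\]

% Conditional independence (unconfoundedness) assumption in causal inference
\[
\bigl(Y(0),\, Y(1)\bigr) \;\perp\!\!\!\perp\; T \;\mid\; X
\]
```

At the optimum of the first expression the discriminator cannot tell real from generated samples, which is an independence condition of the same shape as the second expression.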
Hands On Workshop
Huibin (Mary) Hu
Data Scientist - Microsoft
Mary is a data scientist at Microsoft focusing on large scale client-side experimentation. She is also a co-founder of the Women in Data Science Community at Microsoft which has over 600 members now.
Sarah Shy
Data Scientist - Microsoft
Sarah is a data scientist at Microsoft where she works on applications of causal inference and builds ML models to power intelligent Windows features. Prior to joining Microsoft, she conducted research in the area of astrostatistics. Sarah is also passionate about mentoring newcomers to data science and trying new food recipes on the weekends.
Ganga Meghanath
Data Scientist - Microsoft
Ganga is a Data Scientist in the Experimentation for Windows (EFW) team, focusing on extracting causal insights from data. She has a Master's and Bachelor's degree in Electrical & Computer Engineering from Purdue University and IIT Madras respectively. Her prior experience includes a role as a Data & Applied Scientist in the AdQuality team at Bing Ads.
Causal Insights From Observational Data: A Hands-On Python Workshop
Causal analysis is a powerful tool for understanding the mechanisms and effects of interventions in complex systems. While A/B experimentation is the gold standard for extracting causal insights, there are situations where experimenting isn’t possible, such as when the feature was already released or ethical restrictions prevent us from experimenting on certain populations. In such cases, we must rely on observational data, which pose many challenges for causal analysis, such as confounding, selection bias, and unmeasured variables. In this workshop, we will introduce the basic concepts and methods of causal discovery and causal inference as we guide the audience through a hands-on step-by-step causal analysis using common Python causal libraries, including DoWhy and EconML. We will provide a toy dataset for illustration. An internet-connected device is required.
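As a small taste of the confounding problem, the sketch below compares a naive treated-versus-control difference with an estimate adjusted by stratifying on a confounder. It uses plain Python rather than DoWhy or EconML, and the toy records are hypothetical, not the workshop's dataset.

```python
from statistics import mean


def naive_effect(rows):
    """Difference in mean outcome between treated and control rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratified_effect(rows):
    """Average the within-stratum effects, weighted by stratum size."""
    total = len(rows)
    effect = 0.0
    for z in sorted({z for z, _, _ in rows}):
        stratum = [r for r in rows if r[0] == z]
        effect += (len(stratum) / total) * naive_effect(stratum)
    return effect


# Hypothetical records: (confounder z, treatment t, outcome y).
# The true effect is 2.0 in both strata, but z=1 units are treated more
# often and also have higher baseline outcomes.
rows = (
    [(0, 1, 3.0)] * 2 + [(0, 0, 1.0)] * 8
    + [(1, 1, 7.0)] * 8 + [(1, 0, 5.0)] * 2
)
print("naive:", naive_effect(rows))
print("adjusted:", stratified_effect(rows))
```

The naive comparison overstates the effect because treatment and baseline outcome both depend on the confounder; adjusting within strata recovers the effect built into the toy data.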
Panels
Empowerment Journeys: Entering, Exceling, and Exceeding Expectations in the Data Science Workforce
Join us for an insightful panel discussion, where we’ll explore career development, facing adversity and overcoming challenges as women in the fields of data science and AI. Our diverse panel of accomplished professionals will guide the attendees from the initial steps of launching their careers through rising to leadership positions and excelling amidst competition. Panelists will talk about the triumphs and obstacles encountered along their journeys, and share what their own version of empowerment looks like. We will delve into common challenges facing women in these fields, including equitable compensation, recognition, promotions, maintaining work-life balance, and navigating double standards. Whether you're contemplating a career in data science or seeking advancement in your current role, we invite you to join us for an honest & inspiring conversation with our amazing lineup of data science experts. They will share tips, strategies, and real-world anecdotes from their own journeys in tech & academia, providing invaluable insights and guidance for every stage of your career.
Widad Machmouchi
Widad Machmouchi is a Principal Data Science Manager in the Ads Marketplace Management team at Microsoft. She leads the Experimentation and Metrics team for the Microsoft Ads Platform, focusing on defining trustworthy metrics, building scalable experimentation frameworks, and enabling leadership to make timely data driven decisions. She collaborates with Engineering, Product and Business teams to address complex data science problems across users, advertisers, and publishers. Widad has a long career in metric design and experimentation and enabled multiple products like Bing, Microsoft Office suite, and Visual Studio to make data-driven decisions and produce business insights. She is passionate about developing intelligent products that give users agency over their data while maximizing the utility they receive. Widad holds a PhD in Theoretical Computer Science from the University of Washington, Seattle and is a co-founder of a technology hardware start-up.
Diala Ezzeddine
Diala Ezzeddine is a product manager at DeepLearning.ai where she works on creating courses on Large Language Models with leading Generative AI startups and tech firms. She is a former data science VP at Tao media and Lecturer at Sacred Heart University, Seattle University, and the University of Washington Bothell. Diala holds a PhD in computer science specialized in machine learning from Lumiere University, France. Her passion for learning and lifting others up defines her career. She aims to elevate and inspire!
Madison Swain-Bowden
Madison is a Senior Data Engineer & former Team Lead out of Seattle and an avid Python user/organizer. She is currently sponsored by Automattic to work on the open source project Openverse, and has worked at Ookla (Speedtest.net), the Allen Institute for Cell Science, and the Broad Institute. In her spare time she can be found baking, building digital tools to help those battling oppression, contributing to open source, walking her dog, reading queer fiction, or playing video games.
Iswarya Murali
Iswarya Murali is a Principal Data Scientist at Microsoft, leading Generative AI and Machine Learning initiatives to empower customers and leadership. She has previously worked at Google to predict and mitigate credit card risk and fraud and was a member of an early stage analytics startup.
Bernease Herman
Bernease Herman is a data scientist at WhyLabs and a PhD student and research scientist at the University of Washington eScience Institute. At WhyLabs, she is building model and data monitoring solutions using approximate statistics. Her academic research focuses on ethical evaluation metrics for machine learning.
Decoding Ethics: Perspectives on Responsible Data Science
This panel will explore the range of ethical issues that occupy the data science terrain. The diverse group of experts will bring insights from both academia and industry to confront issues of data bias, sustainability, and individual rights. The discussion will delve into how societal biases have infiltrated cutting-edge applications of data science, and strategies to address these biases. Panelists will reflect on concerns regarding the industry’s impacts on the environment, privacy, and intellectual property, as well as how data science can be utilized for social good.
Rujira Achawanantakun
Rujira is a senior data scientist at Nordstrom, specializing in fraud detection within the Identity Trust Analytics team. With a background spanning over seven years in AWS networking and Intel Research, she brings a wealth of experience to her current role. Rujira holds a Ph.D. in computer science, with her research focused on bioinformatics. During her academic tenure, she adeptly applied machine learning and language models to tackle challenges in this field. Transitioning into industry, Rujira continues to leverage her expertise, applying these techniques to solve real-world problems across various sectors. Passionate about utilizing technology to drive positive change, she remains committed to advancing the field of data science and its applications.
Ariana Mendible
Ariana Mendible is an Assistant Professor at Seattle University, teaching in the MS Data and engaging in research in quantitative justice.
Alicia Shen
Alicia Shen is a machine learning scientist at Expedia where she builds machine learning applications to mitigate risk and fraud. Beyond her role of optimizing algorithm performance, she has also championed projects to make those ML applications more equitable and transparent, all while maintaining a keen eye on the bottom line. Before joining Expedia, Alicia received her PhD from University of Washington where she was a Data Science for Social Good fellow. She has published in journals such as Nature on topics such as gender disparities in politics and academia.
Sarita Singh
Sarita Singh works as an Associate Teaching Professor with Northeastern University, Seattle. With a PhD in Information Security, she has more than 25 years of work experience and has worked for universities and organizations in various countries across the globe. Her areas of research include Cybersecurity, AI and Computer Science education.
Landscape of Data Science Across Industries
Join us for a discussion of career insights and everyday experiences from talented women applying data science in various sectors. This panel discussion brings together a diverse group of seasoned data science professionals from big tech and beyond, including biomedical research, entertainment, and retail. Our panelists will shed light on the challenges and opportunities within their industries to innovate and solve problems with data science. Attendees will gain invaluable insights into commonly used tools and technical approaches, as well as glimpses into the daily roles of these data science experts.
Whether you're curious about the nuances of data science in different sectors, seeking direction for skill development, or pondering the next step in your career, this panel promises a broad exploration of the data science landscape, illustrated through the practical experiences of outstanding women in the field.
Rebecca Hadi
Rebecca is a Senior Data Scientist at Nordstrom working in the Digital AI space. She has over 10 years of experience in data science and analytics in the retail and healthcare industries. She holds a bachelor's degree in mathematics from the University of Washington, and a master’s degree in applied mathematics from Johns Hopkins University.
Apurvaa Subramaniam
Apurvaa is a Senior Data Scientist on the ads team at Spotify. Prior to Spotify, she worked at Instacart and Amazon on multiple teams, tackling data science and analytics problems such as experiment design, product analytics, predictive modeling, and causal inference in product areas including advertising, supply chain, and strategy. She has a Master's in Analytics from Northwestern University and a Bachelor's in Computer Engineering from Nanyang Technological University, Singapore.
Shaili Guru
Shaili Guru is a principal product manager at T-Mobile with eight years of experience. She has a diverse educational background, including a Bachelor of Science in Biology and a Technology Management MBA. Before joining T-Mobile, she led product and innovation teams at Nike and Disney focused on machine learning and computer vision products. Shaili is committed to making the technology industry more inclusive for women and underrepresented groups, and she serves as a product advisor at Greenscale and a board member at To The SHE Power, a non-profit organization providing mentorship and career services to women in transition.
Monica Gerber
Monica is a data scientist and public health professional interested in using open source tools to understand and promote population health. She began her career in data science after earning an MPH, focusing on biostatistics and epidemiology. She currently works in the Data Science Lab at Fred Hutchinson Cancer Center, where she leads the Translational Analytics team. In her free time, she’s devoted to rock climbing, bouldering, observing the wild turkeys in her front yard, and living a mostly unquantified life.
Sonakshi Pandey
Sonakshi Pandey is a Data Analytics Leader at Google Cloud. She has over 8 years of experience in the cloud computing industry, working for leading cloud platforms such as Amazon Web Services (AWS) and Google Cloud Platform (GCP). She is a well-recognized thought leader in cloud computing and has published over 10 articles on various cloud solutions. She was also featured as a Top Women in Cloud influencer for 2023 by Whizlabs. Her blogs on cloud technology have been published on various platforms including Dice, Thrive Global, AWS, and Google Cloud. In her current role at Google Cloud, Sonakshi helps organizations migrate and build their data platforms in the cloud. She has a deep understanding of the cloud computing landscape and helps organizations achieve their business goals using data analytics in the cloud. Sonakshi is a passionate advocate for women in technology. She mentors many young women who are interested in pursuing a career in cloud computing. She has spoken at 10+ events and conferences on careers in cloud computing. Learn more about Sonakshi's journey and her work: https://www.thesonakshipandey.com/
Shruti Kamath
Shruti is the Senior Director of Machine Learning and AI at Mozilla, where she spearheads initiatives to incorporate Generative AI and machine learning functionalities into the Firefox browser. Her extensive experience encompasses leading and scaling machine learning and engineering teams, with a specialization in Recommender Systems, Personalization, and Search technologies. Shruti's industry expertise spans sectors including real estate, financial services, ecommerce, and browser technologies. Prior to her role at Mozilla, she served as Director of Machine Learning at organizations such as PayPal, Zillow Group, and Chewy.