This year our Conference will be held virtually due to COVID-19. Content will run simultaneously across five Zoom tracks. To see all the great content we have for you, read through the descriptions below and then visit our Schedule page to see how it is all laid out and make your choices! Start your Registration here to get signed up for the Conference and learn about the ways you can network virtually! Once registered, you will receive further updates by email and can always stay informed here on the website and through Data Circles social media channels.
General Stream
WiDS Ambassadors
Mahnaz Akbari
Bio:
Mahnaz studied Math and Computer Science and has over two decades of experience in IT in different data-related roles. She has shifted towards machine learning and data science over the past several years. She strongly believes in women’s potential and in a better world that acknowledges it. She founded the Seattle Women in Data Science group in 2017 with the goal of providing a safe platform for women in data science to thrive together, and to promote this field among female techies. The overwhelming response to this group affirms the need for such groups.
Mengyuan Liu, PhD
Bio:
Mengyuan is a data scientist at SAP Concur where she works primarily on building machine learning engines to support one of the company’s core products: ExpenseIt, an app that allows automatic recognition of key fields in receipt images. Before joining Concur, Mengyuan obtained a PhD in bioengineering from the University of Washington, specializing in applying machine learning and computer vision technologies to medical imaging.
Shivani K. Patel
Bio:
Shivani is a Data Scientist at SAP Concur where she works on developing machine learning algorithms for the ExpenseIt feature. As of Fall 2019 she is also an adjunct lecturer at Northeastern University, Seattle campus, where she teaches probability and statistics for the Master's in Data Analytics Engineering Program. After completing a bachelor's degree in speech and hearing sciences, she pivoted and completed a post-bach degree in math and continued on to earn a master's in statistics. Shivani is passionate about equity in public education and supporting women in technology. She serves on the Renton Schools Foundation Board where she focuses on elementary STEM education and she is an Ambassador and Lead Organizer for the Women in Data Science Puget Sound Conference. In her spare time Shivani enjoys dancing with Jhimiki&Maatal (a Bharatanatyam dance team), curling up with a good book and watching Friends and Parks & Recreation on repeat.
Keynote Speaker
Rebekah Bastian
Author, Blaze Your Own Trail, CEO & Co-Founder - OwnTrail, Former VP Community & Culture - Zillow Group
Rebekah Bastian’s Keynote Presentation video from the Conference can be found here.
Rebekah Bastian is a writer, artist, tech executive, mentor, wife, mother and aerial acrobat. She has held leadership roles including vice president of product and vice president of community and culture at Zillow, and CEO & Co-founder of OwnTrail.
Rebekah built OwnTrail with the goal of creating a self-guided mentorship and coaching platform built from a collection of women’s life paths. Through micro acts of mentorship, women can inspire each other and create solidarity around our shared experiences. The result is a powerful tool for understanding the many paths to and from the major milestones in our lives, helping a diverse range of women see people that look like them in places they aspire to, and embracing the fact that there is no one right path. In line with this theme, Rebekah published her first book, Blaze Your Own Trail, in February 2020 with Berrett-Koehler Publishers.
Rebekah serves on the Board of Directors of Bellwether Housing and the Advisory Board for the University of Washington School of Mechanical Engineering. She is also an advisor to technology startups, a respected thought leader and community partner. She writes articles in multiple publications including Forbes.com and is a frequent speaker at conferences and community events. She has been recognized in the Puget Sound Business Journal 40 Under 40, the Inman 33 People Changing the Real Estate Industry and the Female Founders Alliance Champion Awards. Rebekah earned her Masters of Mechanical Engineering from UC Berkeley and Bachelors of Mechanical Engineering from the University of Washington.
Workshops Track A
Technical & Career Development
Kate Hertweck, PhD
Workshop: More than code: Professional assets in data science careers
The video from Kate’s workshop can now be found here.
Bio:
Kate Hertweck is the bioinformatics training manager at Fred Hutchinson Cancer Research Center, where they lead development and implementation of courses on reproducible computational methods through fredhutch.io and facilitate collaborative communities of practice through the Coop. Kate’s graduate training at University of Missouri in genomic evolution of plants was followed by a postdoctoral fellowship at the National Evolutionary Synthesis Center (NESCent) at Duke University, where they began working exclusively in computational biology. Kate then spent four years as an assistant professor teaching bioinformatics, genomics, and plant taxonomy before transitioning to biomedical research training. Kate has been involved in The Carpentries, a non-profit organization that teaches reproducible computational methods, since 2014, serving as a leader in community governance as well as instructor trainer. When not being an overenthusiastic instructor, Kate likes to spend her time doing fiber arts (knitting, crochet) and enjoying all things science fiction.
Abstract:
In a field like data science, it’s easy to focus on technical skills: lines of code, programming languages, algorithms, and data types. While it’s important to have proficiency at tasks related to these skills, it’s often other attributes that enable job satisfaction and advancement. This workshop focuses on identifying and developing these non-technical skills, such as communication, adaptability, and project management. These skills may represent previous educational and career achievements you can easily identify, like training in a specific scientific domain or experience as a manager. Other skills may be hidden and not as straightforward to articulate, such as planning and organizational capacity. We’ll use break-out groups and facilitated discussion to assess the skills you possess and those you’d like to develop, and help you connect them to your specific career goals. You’ll leave this workshop able to articulate the assets you already possess that complement your technical skills, as well as a plan to help you develop other non-technical skills that can aid in your career progression.
Alexandra Shumway
Sponsor Workshop: Data Science on AWS: Principal Component Analysis Workshop
Bio:
Alexandra Shumway is a Cloud Architect at 1Strategy where she’s the resident data/ML expert on her Seattle-based team. There, she takes pride in listening to customers’ needs and crafting well-architected, secure, and scalable solutions that help her customers achieve their goals. She’s worked on projects ranging from architecting, building, and hydrating data lakes, to building, training, and deploying models with Amazon SageMaker. She’s also worked on projects such as cost optimization for RDS, S3, and EC2, containerization of applications, Windows application migrations, development of Infrastructure as Code solutions, and evaluation of architecture through the Well-Architected Program. In addition to her architect work, Alexandra works to advance women in tech in a variety of contexts, from improving hiring practices to being involved with local women-in-tech groups. She particularly enjoys speaking at meetups on machine learning- and data-related topics, and is an active mentor/sponsor for other women on the 1Strategy team. Prior to 1Strategy, she received her master’s degree in Information Systems Management from Brigham Young University. When not at work, she hones her educational chops by teaching Sunday School at her church and is an active volunteer with Motley Zoo Animal Rescue. Aside from herding teenagers and cats, she also enjoys playing video games, hiking, and skiing in the mountains near Seattle.
Abstract:
This workshop will include a short presentation about AWS’s data-related services. The hands-on portion will involve altering a Jupyter notebook to perform Principal Component Analysis. We will briefly cover what PCA is and why we are using it. The 1Strategy team will work through the workshop together with attendees and be there to help with any questions or issues that arise.
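For readers who have not met PCA before, the idea can be sketched in a few lines of Python. This is not the workshop's notebook: the function name `pca` and the synthetic data are assumptions made purely for illustration, and the sketch uses NumPy's SVD rather than any AWS service.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Hypothetical helper for illustration only.
    """
    # Center each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance captured by each direction.
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return X_centered @ components.T, components, explained[:n_components]


# Synthetic data (an assumption): 200 points in 3-D that vary mostly along
# a single direction, plus a small amount of noise.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + 0.05 * rng.normal(size=(200, 3))

scores, components, explained = pca(X, n_components=2)
print(scores.shape)  # two coordinates per sample instead of three
print(explained)     # share of variance captured by each kept component
```

Because the synthetic points lie almost entirely along one direction, the first component should capture nearly all of the variance, which is the kind of dimensionality reduction PCA is typically used for.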
Emily Miller
Workshop: Actionable Ethics for Data Scientists
The video for Emily Miller’s workshop at the Conference can be found here.
Bio:
Emily Miller is a Data Scientist at DrivenData, where she helps mission-driven organizations leverage the power of data science and machine learning to maximize their impact. She is passionate about using data for social good and has previously worked at the Bill & Melinda Gates Foundation, Stanford Center for International Development, and Brookings Institution. She holds a master’s in International Development from The New School and a data science certificate from Metis.
Abstract:
It’s time to make data ethics more practical and actionable. In this interactive workshop, Emily will demonstrate how to use deon, a command line tool that allows you to easily add an ethics checklist to your data science projects. The goal of deon is to enable teams to integrate structured discussions of data ethics, and provide concrete reminders to the developers that have influence over how data science gets done. Emily will explain the rationale behind building an ethics checklist and walk through the content, illustrating with concrete examples the times where overlooking an item on the ethics checklist has caused harm. In stories of improperly hashed NYC taxi data, congressional distortions of Planned Parenthood data, and racial disparities in Amazon Prime delivery areas, she’ll cover a diverse set of issues that can come up in the course of data science work.
However, this isn’t just a story about what goes wrong. In the second half, participants will roll up their sleeves and dive in to the trade-offs and nuance as they navigate a set of data ethics scenarios. In a two-phased case study on public and private sector uses of personal health data, participants will practice working through the checklist and examining the ethical implications of their choices.
Come learn how to jumpstart the ethics conversation all data teams should be having.
Tech Talks Track A
Cecilia Aragon, PhD
The Hearts and Minds of Data Science
The video from Cecilia’s talk can be found here.
BIO:
Cecilia Aragon is Director of the Human Centered Data Science Lab, Professor in the Department of Human Centered Design & Engineering, Founding Co-Director of the University of Washington Data Science Master’s Program, and Senior Data Science Fellow at the eScience Institute at the University of Washington (UW) in Seattle. In 2016, Aragon was the first Latina to be named to the rank of Full Professor in the College of Engineering at UW in its hundred-year history. She earned her Ph.D. in computer science from UC Berkeley in 2004, and her B.S. in mathematics from the California Institute of Technology. Her research focuses on human-centered data science, an emerging field at the intersection of human-computer interaction (HCI), computer-supported cooperative work (CSCW), and the statistical and computational techniques of data science. She has authored or co-authored over 100 peer-reviewed publications and over 130 other publications in the areas of HCI, CSCW, data science, visual analytics, machine learning, and astrophysics. Recently, she and Katie Davis co-authored the book Writers in the Secret Garden: Fan-fiction, Youth, and New Forms of Mentoring (MIT Press 2019). Her memoir Flying Free: My Victory over Fear to Become the First Latina Pilot on the US Aerobatic Team will be released by Blackstone Publishing in September 2020. In 2008, she received the Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the US government on outstanding scientists in the early stages of their careers, for her work in collaborative data-intensive science. Aragon's research has been recognized with over $27M in grants from federal agencies, private foundations, and industry, and has garnered six Best Paper awards since 2004. Her interdisciplinary background includes over 15 years of software development experience in industry and NASA, and a three-year stint as the founder and CEO of a small company. 
She has also been a test pilot, aerobatic champion, and medalist at the World Aerobatic Championships, the Olympics of aviation.
ABSTRACT:
Extraordinary advances in our ability to acquire and generate data are transforming the fundamental nature of discovery across domains. Much of the research in the field of data science has focused on automated methods of analyzing data such as machine learning and new database techniques. However, the human aspects of data science, including how to maximize scientific creativity and human insight, how to address ethical concerns, and the consideration of societal impacts, are vital to the future of data science. Human-centered data science is a necessary part of the success of 21st century discovery. I will discuss promising research in this area, describe ongoing initiatives at the University of Washington in Seattle, and speculate upon future directions for data science.
Smrati Gupta, PhD
Design Principles for Personalization with Ethics
The video from Smrati’s talk can be found here.
Bio:
I am the leading Data Scientist at Microsoft Xbox driving the personalization and recommendation engines for Microsoft Store and all Xbox surfaces. I have about 7 years of experience in designing recommendation engines in domains such as enterprise software and cloud service selection, in addition to gaming. I also have experience leading academic and industrial consortiums in Horizon 2020 projects funded by the European Union to build secure multi-cloud applications that rely on AI-driven Decision Support Systems. I am a regular public speaker at universities and in industry.
Abstract:
We are always talking about the cool stuff that AI can do to change the world, but there is a strong element of AI that rests within the realms of the humans who create it. We carry our biases, our limitations, and our understandings into the AI-driven features in our products, making these products biased, non-inclusive and unfair. Since the world relies on these products, we face the domino effect of biased AI spreading biases through our society. This talk aims to highlight some ways in which we need to make tangible and conscious efforts towards ensuring that user experiences are not driven by a biased mindset.
Widad Machmouchi, PhD
6 Lessons In 6 Years as a Data Scientist at Microsoft
The video from Widad’s talk can be found here.
Bio:
Widad Machmouchi is a Principal Data Science Manager in AI Platform at Microsoft where she works in the A&E group focusing on A/B experimentation and success measurement. Widad develops tools and techniques that enable teams to make data-driven decisions, like A/B experimentation, metric development, and user behavior modeling. She works with multiple teams like Bing, VSCode and Azure Machine Learning, applying these techniques to drive user growth. Widad holds a PhD in Theoretical Computer Science from the University of Washington, Seattle and is a co-founder of a technology hardware start-up.
Abstract:
In this talk, I share some of the lessons I have learned as a data scientist in Bing and AI Platform, working on measurement and A/B experimentation. I discuss how to build your technical skills as a data scientist, how to communicate and work with partner teams, and how to grow your career. I provide some tips and tools to achieve your goals, along with what I could have done better in the past few years.
Kasia Rachuta
First Steps to Transition from SQL to Pandas
The video from Kasia’s talk can be found here.
BIO:
Kasia works as a product analyst at Square; she previously worked at Medium and Fivestars. She has a master’s in theoretical physics from University College London. Kasia is entirely self-taught and learnt data science skills through a combination of online courses and internships. In her spare time, she enjoys volunteering for women-related organizations and diversity causes, scuba diving and traveling. Kasia is also a San Francisco PyLadies organizer.
Abstract - First Steps to Transition from SQL to Pandas:
This talk will discuss my experiences of trying different options when wanting to use both SQL and pandas: the SQL Jupyter extension, Python SQL module, connecting to a database through Python and finally, ‘translating’ all calculations into pandas. I will touch on the advantages and disadvantages of all of these methods and I will then dive deeper into slicing and dicing pandas DataFrames, performing joins, unions, aggregations and more advanced calculations such as window functions and rolling averages. All of the calculations shown will use pandas, one of the most common data science libraries. Attendees will gain an overview of the ways of data munging and SQL plus pandas solutions, such as using the Jupyter extension, connecting to a database through Python or performing a number of operations in pandas. In particular, they will learn how to perform the most commonly used SQL functions in pandas, such as joins and unions, aggregations as well as more complicated calculations such as rolling averages or window functions.
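To give a flavor of the SQL-to-pandas translations the abstract mentions, here is a minimal sketch. It is not taken from the talk: the `orders` and `customers` tables, their columns, and the SQL in the comments are all hypothetical, chosen only to show one pandas equivalent for each kind of operation.

```python
import pandas as pd

# Hypothetical tables, invented for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_id": [10, 10, 20, 20, 30],
    "order_date": pd.to_datetime(
        ["2020-01-01", "2020-01-03", "2020-01-02", "2020-01-05", "2020-01-04"]
    ),
    "amount": [100.0, 50.0, 80.0, 20.0, 60.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "city": ["Seattle", "Tacoma", "Seattle"],
})

# SQL: SELECT * FROM orders o LEFT JOIN customers c USING (customer_id)
joined = orders.merge(customers, on="customer_id", how="left")

# SQL: SELECT city, SUM(amount) AS total, COUNT(*) AS n_orders
#      FROM joined GROUP BY city
by_city = (
    joined.groupby("city")
    .agg(total=("amount", "sum"), n_orders=("order_id", "count"))
    .reset_index()
)

# SQL: SELECT ... UNION ALL SELECT ...  (add .drop_duplicates() for UNION)
union_all = pd.concat([orders.head(2), orders.tail(2)], ignore_index=True)

# SQL window function:
#   SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date)
ordered = orders.sort_values(["customer_id", "order_date"])
ordered["running_total"] = ordered.groupby("customer_id")["amount"].cumsum()

# SQL rolling average:
#   AVG(amount) OVER (ORDER BY order_date
#                     ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
by_date = orders.sort_values("order_date")
by_date["rolling_avg"] = by_date["amount"].rolling(window=2, min_periods=1).mean()

print(by_city)
print(ordered[["customer_id", "order_date", "running_total"]])
print(by_date[["order_date", "rolling_avg"]])
```

The pattern in each case is the same: `merge` stands in for joins, `groupby(...).agg(...)` for `GROUP BY`, `concat` for unions, and a sort followed by a grouped or rolling operation for window functions.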
Allison Sliter
Presenting Data to Non-analysts: How to Make an Impact on All Kinds of Audiences
The video from Allison’s talk can be found here.
BIO:
I spend my days working as a data scientist and my nights arguing with my superlative daughters or planning PyData PDX meetups and networking lunches for women in data science. I knit, I run, I read science fiction novels and when the sun finally comes out, I ride my bike.
Abstract:
Data scientists work hard to develop our skills to uncover the secrets in data. Too often, though, that distances us from our audience’s perspective. Using lessons from psychology, journalism, and even comparative literature, this talk will show you how to cut through the jargon and make an impact. It will discuss some specific presentation design principles that can make sure you get through to your busy, sometimes distracted audience of engineers, domain experts, management, and the folks from sales and marketing so they can act on your insights. I draw from psychology, linguistics, journalism, and even Joseph Conrad to deliver specific suggestions to make sure that when your audience leaves after your presentation, they are armed with the right conclusions and can make the best decisions for your organization.
Nazli Dereli
Adversarial Attacks: A Real Threat to Our Machine Learning Systems
The video from Nazli’s talk can be found here.
BIO:
Nazli Dereli is an experienced Data Scientist with a demonstrated history of building end-to-end data products. She worked on real-time classification of users and detection of abusive actors on the Abuse Prevention team at Ticketmaster for 5 years. In this position, she focused heavily on adaptive abusive behaviors, adversarial attacks and the design of evolving ML systems to fight back against bots, brokers, automated systems, ticket scalpers, etc. The data products she worked on are being used to protect music fans during ticket sales by Hamilton, Taylor Swift, Ed Sheeran, Twenty One Pilots, Bruce Springsteen and many more. She is a keen learner and published researcher with an M.S. in Computer Science from UC Santa Barbara. She is currently exploring new meaningful areas of interest for her data science career while investing in her writing career.
Abstract:
Autonomous vehicles confusing stop signs with yield signs or authentication systems mistakenly giving access to malicious attackers... Adversarial attacks can trick any state-of-the-art ML system to seriously compromise our security. We'll discuss these attacks and solutions with real-world examples.
Catherine Nelson, PhD
Practical Privacy-preserving Machine Learning
The video from Catherine’s talk can be found here.
BIO:
Catherine Nelson is a Senior Data Scientist for Concur Labs at SAP Concur, where she explores innovative ways to use machine learning to improve the experience of a business traveler. Her key focus areas range from ML explainability and model analysis to privacy-preserving ML. She is also co-author of the forthcoming O'Reilly publication “Building Machine Learning Pipelines”, and she is an organizer for Seattle PyLadies, supporting women who code in Python. She has been recognized as a Google Developer Expert in machine learning. In her previous career as a geophysicist she studied ancient volcanoes and explored for oil in Greenland. Catherine has a PhD in geophysics from Durham University and a Masters of Earth Sciences from Oxford University.
Abstract:
What if we could build accurate machine learning models while preserving user privacy? There’s a growing number of tools to help, from federated learning to encrypted ML. In this talk, I’ll review what works, what doesn’t work, and where these tools fit in a machine learning pipeline.
Deveeshree Nayak
What is Security in Data Science?
The video from Deveeshree’s talk can be found here.
Bio:
I am passionate about teaching cybersecurity subjects, especially the security perspective of data and information quality. I began my career as an Information Security analyst and have been involved in various roles related to Cyber Security before joining UW Tacoma. I grew up in India before moving to the U.S. for further studies in Information Systems and Criminology at the University of Memphis in Tennessee. I have been a member of Anita Borg Institute, IEEE, Women in Engineering, Women in Cybersecurity and Women in Data Science. I encourage and help people to pursue their careers in the STEM field.
Abstract:
In this talk, I will be focusing on the importance of Cyber Security in data science. As we all know data is power in the present time and with data we have the potential to predict our future. Data security in data science plays a vital role and we require Cyber Security practitioners who have solid domain knowledge on data risk assessment, vulnerability management, network security, pen-testing, identity management, and other subject knowledge of information security. In this talk, attendees get to learn the security perspective of data and how they can pursue a career in security while continuing their passion for data science.
Workshops Track B
Career Development
Arushi Prakash, PhD
Workshop: Crafting a Compelling Data Science Resume
The video for this workshop can be found here.
Bio:
Dr. Arushi Prakash is a data scientist at Zulily.com, an e-commerce company that sells clothing, footwear, toys, and home products, based in Seattle. At Zulily, she helps build recommender systems that power the website and email marketing campaigns. She entered the field in 2019, after finishing a doctorate degree in Chemical Engineering from the University of Washington, Seattle.
Abstract:
Whether you are applying for positions in data science, analytics, or engineering, you have likely created a resume. In our experience, technical resumes like these often lose the bigger picture – the business value that you created, the passion that you brought into the project, or your leadership that steered projects in the right direction. In this workshop, we will help you strike the right balance between relevant technical skills and storytelling while steering clear of resume writing mistakes. So, join us with a copy of your latest resume, a pen, and an open mind!
Jennifer Hay
Workshop: Crafting a Compelling Data Science Resume
Bio:
Jennifer Hay writes resumes, LinkedIn profiles, and cover letters for a broad range of IT professionals, using a collaborative and iterative process that starts with storytelling. While I believe that the end solution is always important, I also like to hear about the journey - basically, the good, the bad, and the ugly. In those stories, I often find the unique characteristics and strengths that distinguish my clients. My background is in data and information management and I stay current by writing exams for eLearningCurve. If there is a subject area about data, information, or analytics, then I’ve probably written an exam. Data Geeks Unite!
Abstract:
Whether you are applying for positions in data science, analytics, or engineering, you have likely created a resume. In our experience, technical resumes like these often lose the bigger picture – the business value that you created, the passion that you brought into the project, or your leadership that steered projects in the right direction. In this workshop, we will help you strike the right balance between relevant technical skills and storytelling while steering clear of resume writing mistakes. So, join us with a copy of your latest resume, a pen, and an open mind!
Liz Martinez
Workshop: Presenting Your Best Self In Your Job Search
The video from this workshop can be found here.
Bio:
Prior to her current role as the Career Services Manager at Galvanize, Liz started her career as a technical recruiter and an avid appreciator of technology. Upon moving to New York, she created and managed career development programs for two companies, in the fields of health care and financial technology. At Galvanize, she aids students and alumni of the Data Science and Hack Reactor programs throughout all stages of the job search. Liz is enthusiastic to help others become their own biggest fans. Outside of work, she is an eager crafter, dog mom, and Zumba teacher.
Abstract:
The job search entails much more than refining your technical skills. It is your job to sell those skills, in addition to your goals & your personality. This workshop will focus on putting your best foot forward via behavioral interviewing and your career portfolio, specifically GitHub and LinkedIn. Liz will offer advice for enhancing your profiles on LinkedIn and GitHub, in addition to providing rubrics to evaluate your profiles. For each platform, she will demonstrate how to use the rubric by grading the profile of a volunteer from the audience. Next, she will discuss strategies for behavioral interviewing, and how to prepare for any question you might be asked. She will demonstrate preparation approaches with the help of another volunteer. Liz will set aside time to answer questions after each topic that is covered. Note: While we listed the audience as ‘all’, we want to clarify that this will not include evaluating the coding aspects of GitHub, but rather the presentation and clarity.
Trupti Shah, Moderator
NEU Panel: Navigating the Different Fields of Data: A Higher Education Perspective
The video from this panel discussion can be found here.
Bio:
Trupti Shah is a Strategic Analytics Associate at Northeastern University Seattle where she works on different analytics projects to help the leadership make data-driven decisions. She recently graduated with a master’s in Data Analytics from Northeastern University Seattle majoring in Evidence-Based Management and holds a bachelor’s in Computer Engineering from the University of Mumbai. Prior to pursuing her Master’s, she had the opportunity to work in Workflow Architect and Developer roles. As someone with a keen eye for details and a passion for data, she thrives on turning data patterns into business solutions. She is passionate about performing deep-dive analyses to identify emerging trends, pain points, and opportunity areas in customer experience (CX) that influence decision making and business optimization.
Description:
The panel will focus on identifying the different definitions of data analytics and discuss the need for a diversity of skills in data: from data collection to data engineering to statistical analysis to computational design to algorithm development. The all-women panel consists of instructors from three graduate data-related programs at Northeastern University – Seattle and will discuss the differences between data science, data analytics and business analytics. Our panelists are leading experts in the data field and have been invited to share their experience and educational journey in data analytics.
Amanda Welsh, PhD
NEU Panel: Navigating the Different Fields of Data: A Higher Education Perspective
Bio:
Dr. Amanda Welsh is a Professor of the Practice to the Analytics & Enterprise Intelligence Domain. In addition to teaching, she focuses on further expanding our close collaboration with our industry partners in several different industries and serving as Faculty Director for the Leadership and Project Management programs in campuses across the University.
Prior to joining Northeastern, Dr. Welsh served for 25 years in the intersection of big data and media, founding two data-driven start-ups, Integrated Media Measurement Inc. (IMMI) and Garageband.com, as well as working as a Media Research Scientist at Google, and most recently as EVP, Data Science for The Nielsen Company where she designed and ran a global data-sharing program. Dr. Welsh has published numerous articles on data collection including a book on consumer data tracking and privacy.
In addition to her business responsibilities, Amanda is active in the non-profit world as Executive Director for The Foundation for Scholarly Culture and served on the Board of Directors for Raising a Reader, a national literacy/family engagement program for 10 years. She earned her Ph.D. in linguistics from Harvard University.
Adrienne Slaughter, PhD
NEU Panel: Navigating the Different Fields of Data: A Higher Education Perspective
Bio:
Adrienne Slaughter is an Assistant Clinical Professor in the Khoury College of Computer Sciences at Northeastern University-Seattle. Prior to joining the faculty at Northeastern, Dr. Slaughter worked at multiple startups as both a data and software scientist. Adrienne became engaged in data science through her work with Personal Health Informatics: studying how people interact with analytics about their personal health.
Shivani K. Patel
NEU Panel: Navigating the Different Fields of Data: A Higher Education Perspective
Bio:
Shivani is a Data Scientist at SAP Concur where she works on developing machine learning algorithms for the ExpenseIt Product. She holds two bachelor’s degrees (speech and hearing sciences and math), as well as a master’s in statistics from Oregon State University. Shivani is passionate about equity in public education and she is a member of the Renton Schools Foundation Board where she focuses on supporting an equitable curriculum of elementary STEM education. Shivani believes that movements like WiDS are invaluable in supporting the career development of women in technology which is why she is a Regional Ambassador for the WiDS conference.
Tech Talks Track B
Weikun Hu
Receipt Classification Using Word Embedding Models (Natural Language Processing)
The video of Weikun’s talk can be found here.
BIO:
Weikun Hu is a data scientist intern at SAP Concur where she works on building a machine learning framework for the ExpenseIt product, which allows automatic recognition of key fields in receipt images. Currently, she is a master’s student in applied mathematics at the University of Washington. She holds a bachelor’s degree in mathematics and statistics.
Abstract:
The data science team at SAP Concur is responsible for the machine learning infrastructure for the ExpenseIt product, which is a mature product that is being pushed to more international markets. In this talk, I will go through a project that focuses on OCR text in foreign languages (mixed English and foreign language), and the specific challenges of natural language processing faced in a production environment.
Rachel Wagner-Kaiser, PhD
Teaching Computers to Read: Natural Language Processing and Deep Learning Techniques for Parsing Documents
The video of Rachel’s talk can be found here.
Bio:
Rachel received her PhD in astronomy examining chemical differences in ancient star clusters living in the nearby universe, combining the power of the Hubble Space Telescope and Bayesian statistics. After graduation, she joined KPMG Digital Lighthouse, where she has worked as a consultant and data scientist since 2017. She specializes in using natural language processing and deep learning to help companies unlock their unstructured data to solve a variety of business problems and drive value through automation. She loves to travel, eat good food, and hike cool new places (and ideally, all three at once).
Abstract:
You have a million contracts scanned and stored on your company server from decades of doing business. To prove compliance, you need to know the termination clause, renewal terms, and expiration date for each of those million documents. What are your options? You could hire 100 people to each read 50 contracts a day for a year – or, teach a computer to read the documents for you! Companies often struggle to automate this process and transform their thousands or millions of documents into tangible benefits. I will discuss the challenges of extracting information from documents as well as strategies to overcome them, such as custom word embeddings, sequence labeling, B-I-O tagging, and bi-directional LSTM model architecture. With effective sampling techniques and data augmentation, the required human effort can be minimized to obtain a sufficient sample size and create performant models that unlock value.
Meghamala Sinha
Causal Inference from Experiments and Observations
The video of Meghamala’s talk can be found here.
Bio:
Meghamala Sinha is a PhD candidate at Oregon State University. She is majoring in Computer Science and minoring in Biological Data Science. Her research interest is Causal Inference and its application to data-driven areas like Machine Learning, AI, Intelligent Systems and Computational Biology. Her work centers around using fundamentals of Causality to differentiate true cause-effect relationships from mere associations in data and building a more robust and reliable inference model.
Abstract:
Causal Inference is an important paradigm for data analysis in the fields of medical science, economics, engineering, humanities, etc., due to its utility in action planning, diagnosis, and predictive applications. To increase statistical power for learning a causal network, data are often pooled from multiple observational and interventional experiments. However, if the direct effects of interventions are uncertain, multi-experiment data pooling can result in false causal discoveries, defeating the very purpose of its application. For example, in medical science, a false positive result giving an erroneous indication that a particular disease is present (when it isn’t) can result in unnecessary medical tests and panic. To resolve this issue, I will discuss a novel data integration method, “Learn and Vote,” which combines information from multiple interventional experiments with observations to learn more accurate causal networks and reduce the detection of false positives.
Wen Qin
How to Run a Trustworthy Online Controlled Experiment and Get Insights?
The video of Wen’s talk can be found here.
Bio:
Wen Qin has been a Data Scientist on Microsoft's Analysis and Experimentation team for 2 years, focusing on A/B testing. She mainly works with Microsoft Teams on scaling trustworthy experiments to build a culture of experimentation. She also works on several areas to improve the trustworthiness of experiments in general, such as checklists for experimentation, metric design, and sample ratio mismatch. Prior to Microsoft, she spent half a year at Wayfair as a Data Scientist Intern, working on recommendation models for the personalization of marketing emails.
Abstract:
How much can a feature boost revenue? Will a feature hurt product performance? Online controlled experiments (a.k.a. A/B testing) help answer these critical questions. However, running them correctly is challenging. If you search the internet or talk with an expert, you can find many tips about how to run an experiment. Those new to experimentation can easily get confused about what steps to take. Advanced experimenters may have a list to go through, but if critical checkpoints are missing, it can lead to invalid results and incorrect decisions. I will talk about checklists for running trustworthy experiments. The work is based on the experience of my team, which has spent more than 10 years focusing on experimentation and collaborating with the majority of Microsoft products to resolve real-world problems.
Melissa Santos, PhD
Time-to-Event Analysis for Non-Medical Applications
The video of Melissa’s talk can be found here.
Bio:
Melissa has been working with computers and data since 2000, in fields from security to marketing to geography. She has a PhD in Applied Math and considers herself both a statistician and a data scientist. Currently, she is a data analyst at Pingboard, helping the company understand its customers and how they use the product.
Abstract:
How do you estimate the time until an event, especially if the event might never happen? The statistical methods for this come from studying time from disease diagnosis to death, but we can use these methods for much more cheerful data. For example, how long does a subscription customer continue to pay you? How long does it take from someone commenting on your open-source code to becoming a contributor? How long does it take from the user being seen the first time to them becoming a paid customer? Kaplan-Meier survival curves are non-parametric estimates of the time to an event. They make no assumptions about the distribution of the time to the event, and they handle samples of various ages that may or may not have made it to the event. As well as the theory of these, we’ll dive into how to calculate them directly in SQL. To finish, I’ll share some ways we’ve been using Kaplan-Meier curves to make decisions at a Software as a Service company, especially using them to compare groups.
(Leah) Aria Fredman, PhD
Using All(-ish) Data: Validating your Data Usage Decisions
The video of Aria’s talk can be found here.
Bio:
Aria is a senior data scientist at Gideon Health, a startup in stealth mode, where she works at the intersection of product and UX to develop technology improving people's lives. Before joining Gideon, Aria worked as a data scientist at iSpot.tv, where she helped determine the efficacy of television advertisements. Aria completed her social (experimental) psychology doctoral dissertation on the retention and mental health of online gamers; she has years of experience in designing, implementing, and analyzing experiments, and she has utilized both statistical and data science methodologies to deliver insights.
Abstract:
This talk focuses on the space between exploratory data analysis and modeling, concentrating on validating decisions to add and/or delete data. Emphasizing causal inference in the absence of randomization, the talk will examine how and why this situation may lead to needing noisy data and matching methodologies, as well as some potential pitfalls to avoid.
Gwen Spencer, PhD
Network Science: From Beautiful Mathematics to Driving Real-World Decisions
The video of Gwen’s talk can be found here.
Bio:
Gwen is an Operations Research Scientist at Convoy and a returning Seattle native. After a math major at Harvey Mudd College, Gwen earned her PhD in Operations Research at Cornell. During her time as an academic, Gwen's research program bridged applied and pure topics in Mathematical Modeling, Algorithms, Data Science, Stochastic Optimization, Network Science, and Theoretical Computer Science. After two years in an interdisciplinary postdoc (joint between environmental economics and computer science), Gwen was on the Mathematics and Statistics faculty at Smith College for 4.5 years. Smith is a women’s college in MA where 40%+ of students have at least one major in STEM. Gwen has had an awesome transition to industry. She feels lucky to have found an early-stage startup with a lot of high-ownership opportunities and the ability to contribute to fundamental problem formulation.
Abstract:
Networks provide a powerful modeling tool to capture spatial heterogeneity and connectivity, and challenges become even more meaty when uncertainty is in the mix. At Convoy, I create algorithms to maintain a balanced flow of long-haul trucks that is crucial to sustaining supply chains in North America. Moving from a clean mathematical model to an automated real-time system that eats noisy data for breakfast has been an awesome journey. I’ll motivate what is hard about our rebalancing problem (e.g. where we have to make high impact decisions with partial information) and mention contrasts with other balancing/rebalancing problems like bikeshare (e.g. Jump, Lime) and carshare (e.g. Uber, Lyft).
Victoria Hunt, PhD
Simulation of the US Electric Grid for Renewable Energy Integration
The video of Victoria’s talk can be found here.
Bio:
Victoria Hunt, PhD, is a data scientist for the Clean Energy team. In this role, she researches and implements simulation and analysis methods for the team’s US grid simulation framework. She is keenly interested in policy, and in supporting renewable energy policy through data visualization and data storytelling. Victoria’s passion for policy is also reflected in her pursuits outside of her role on the Clean Energy team; she is currently a city council-member for the city of Issaquah, and in this role serves on several regional boards and commissions.
Abstract:
Please join me for a whirlwind tour of how the US electricity sector works, how we model it with high temporal and spatial resolution, and how we analyze our findings. I will present a highly detailed and realistic simulation of the US electric grid, which we use for exploring strategies to integrate renewable energy under future conditions. Our model is open access and exclusively uses publicly available data from multiple sources. I will provide an overview of the algorithms we use to mimic power system operation, optimizing generation and dispatch of electricity and minimizing costs at hourly time intervals across an 82,000 node system. I will also share a preview of our in-development web interface, which will serve as a flexible and customizable tool for policymakers to quantitatively study energy policy impacts, and will include a full set of research-grade features for engineers and researchers.