Virtual Student Seminar

12:00 - 12:45 PM, Wednesday, September 22, via ZOOM

FCD Students Attitudes and Behaviors Survey Report Automation

Kate Hansen

My internship was at the Hazelden Betty Ford Foundation Butler Center for Research. As part of my internship, I was tasked with developing an automation procedure for FCD's Students Attitudes and Behaviors Survey (SABS) report. The SABS report offers insights into the drug use of 6th- to 12th-graders, as well as their perceptions of their peers' drug use. I developed these automation strategies in Microsoft Visual Basic for Applications (VBA) and implemented them for 23 of the report's 98 pages. The automation strategies I developed will provide a foundation for the Butler Center for Research to continue this work.



Internship Experience at KLA

Emily Vodovnik



This past summer I had the opportunity to work remotely as an intern at KLA, a company that develops equipment and services for the electronics industry. Using Python and a few other programs, I managed and manipulated restricted data and created visualizations that were used to make improvements and save money for KLA. This internship was a great introduction to the professional world, and I enjoyed applying what I've learned in school.

Virtual Poster Session

12:00 - 12:50 PM, Wednesday, April 21, via Gather.Town

Poster #1

Name: Neil Callahan



Title: Predicting Success of College Running Backs in the NFL

Abstract:

This poster will look at the relationship between National Football League (NFL) draft picks from National Collegiate Athletic Association (NCAA) football programs and the success of these players in the NFL. For this project, data were collected on running backs drafted from 2005 to 2020. The goal was to build a model to predict whether players would be successful in the NFL, using four predictor variables. Several predictive models were built, and the best was a Naïve Bayes model that correctly classified about 80% of players as busts or successes. Career yards turned out to be the most important factor in making predictions, while BMI was the least important. The four distribution graphs compare each variable against the outcome and helped determine the cutoffs used to classify players as busts or successes.

 

 

Poster #2

Name: Joe Kulas



Title: Will Minor League Baseball Players Make it to the Major Leagues?

Abstract:

Many minor league baseball players never make it to the majors, especially given that there are many more players in the minor leagues than there are spots available on major league rosters. The goal of this project was to use predictive modeling to investigate which factors predict whether current minor leaguers will make it to the majors in the future. I collected data on minor league baseball statistics for current and former professional baseball players. Using this data, I implemented several prediction methods and compared their misclassification rates to determine which model performed best. The random forest model outperformed the other methods. Some of the most important predictors were strikeouts, games played, and hits allowed for pitchers, and games played, at-bats, and hits for batters. Finally, the random forest model predicted that only about 120 of the thousands of current minor leaguers would make it to the majors in the future.

 

Poster #3

Name: Evan Rondeau



Title: Impacts of Data on Direct Marketing

Abstract:

Many businesses employ an analytics team to help them gain insight into industry trends and make decisions regarding workflow and revenue. What benefits can analytics offer to a business that does not employ such a team? My project will show the effect that work done during a short-term internship had on a marketing campaign for a webinar series.

 

Poster #4

Name: Thomas Veenker



Title: Analyzing and Predicting the Success of Reddit User Submissions

Abstract:

For this project, I examined user submissions to Reddit, a popular social news aggregation website, to determine which factors generated community approval and led to higher visibility. To obtain the data, I set up my own access to the Reddit API, learned basic programming in Python, and taught myself how to scrape Reddit in Python using API wrappers. After scraping 25,000 user submissions from Reddit, I analyzed the data to assess the effects of certain features (e.g., keywords, sentiment, length, submission time/date) on the "success" of a Reddit submission, built a regression model to predict the "success" of any user submission, and developed a general strategy to maximize a submission's potential visibility. My research holds promise for both advertisers and individual users who want to reach a larger audience on Reddit.

 

Poster #5

Name: Benjamin Winters



Title: eSports Predictive Analysis: A Study of Hearthstone Tournaments

Abstract:

This poster will analyze and discuss how certain factors influence game outcomes in a tournament setting for the digital collectible card game Hearthstone. The main methods used are logistic regression and decision trees, which identify significant factors and make predictions. The features under consideration are mainly in-game factors: which player goes first, mana (the medium through which players interact with the game), and the different ways in which cards can influence the state of play. The outcome of interest is the end result of each game, a win or a loss.


Poster #6

Name: Rebecca Barter



Title: Survival Analysis

Abstract:

For my study, I was interested in biostatistics, and more specifically survival analysis. My main goal was to learn about the statistical methods that can be applied to survival data. I obtained data on patients with heart failure, along with several other covariates that affected how long these patients survived. I learned about and applied methods such as the Kaplan-Meier estimator and the Cox proportional hazards model.
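For readers unfamiliar with the first method named above, the Kaplan-Meier estimator computes survival as a running product, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i is the number of patients still at risk just before t_i. The sketch below is a minimal, hypothetical illustration on toy data (not the heart failure data used in the study); the function name `kaplan_meier` is our own, and patients censored at an event time are assumed to still be at risk at that time, the usual convention.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)      # everyone who leaves the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t;
        # censoring alone does not change S(t).
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy data: five patients, two of them censored.
    for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

In practice a library such as `lifelines` would typically be used, but writing out the product makes clear why censored patients still contribute to the risk set until they drop out.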

 

Student Seminar

12:00 - 12:50 PM, Wednesday, April 14, via ZOOM

Data Engineering: Extract, Transform, and Load (ETL)

N’Dri Diby

Moving data from one place to another is an important step for a company that relies on its own data for decision making. Data come from many different sources, and bringing them into one place helps businesses become more productive. Over the summer I worked for a financial technology company called Spave. As a Data Analytics Engineer intern, I built a data pipeline that extracts raw data, transforms it according to business logic, and loads it into the target database, which enabled software engineers to display the information in the app. In this presentation, I will go over the different steps I took to build the ETL data pipeline.
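The three ETL stages described above can be sketched as three small functions chained together. This is a minimal, hypothetical example using only Python's standard library: the CSV export, the `transactions` table, and the business rules (drop rows with a missing amount, store amounts as integer cents) are all illustrative assumptions, not Spave's actual sources, schema, or logic, and an in-memory SQLite database stands in for the target database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export standing in for one source system.
RAW_CSV = """user_id,amount,date
1, 12.50 ,2020-06-01
2,,2020-06-02
1,7.25,2020-06-03
"""


def extract(text):
    """Extract: read raw rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply example (assumed) business rules."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # drop incomplete records
        cents = round(float(amount) * 100)  # integer cents avoid float drift
        clean.append((int(row["user_id"]), cents, row["date"].strip()))
    return clean


def load(conn, records):
    """Load: write transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(user_id INTEGER, amount_cents INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    load(conn, transform(extract(text)))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(RAW_CSV, conn)
    query = "SELECT user_id, SUM(amount_cents) FROM transactions GROUP BY user_id"
    for row in conn.execute(query):
        print(row)
```

Keeping each stage as a separate function makes it easy to test the business rules in `transform` on their own, or to swap in a different source or target database without touching the other stages.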

Quality Assurance Analyst at Fastenal

Benjamin Garling

As part of the Quality Assurance team at Fastenal, my focus has been working with the Contract Management team to test the Contract Management application. My talk will focus primarily on what it is like to work for a large corporation, how testing programs makes you write better code yourself, and some of the ways I have applied my coursework on the job. I will also speak briefly about some challenges that come with working for such a large company and ways to get around them.

Student Seminar

12:00 - 12:50 PM, Wednesday, April 7, via ZOOM

Predictive Modeling for COVID-19

Aaron Schram

My capstone featured a dataset from Kaggle that was created to test a Long Short-Term Memory (LSTM) neural network method for building predictive models based on B-cell data. The goal of the dataset was to use machine learning to find a reliable method for predicting epitope regions: the regions of antigens that these B-cells can map onto. For this project, I explored various supervised learning methods to understand the process of creating a predictive model for a binary categorical response.

National Hockey League (NHL) Data Analyses

Rochelle Ziemann

Predicting performance statistics and finding differences among players is becoming more popular in all sports. I chose to investigate hockey because it is my favorite sport to watch, and Minnesota is known as the State of Hockey. This presentation will include analyses of differences between players and teams, as well as which performance statistics best predict whether a team will make the playoffs.

Student Seminar

12:00 - 12:50 PM, Wednesday, March 24, via ZOOM

Graphics Reporting and Data Visualization

Yingshi (Dennis) Lew

Data visualization and effective data storytelling are key to meeting today's increasing demand for clear and accurate information. This project highlights the importance of using data, research, and storytelling to shed light on social issues across the United States, as well as globally. For my exploration, I used Tableau Prep and Python to tidy the data and explore it systematically through exploratory data analysis. The outcomes were visualized using Tableau and Python's Seaborn package, and the two tools were compared on the efficiency and effectiveness with which they produce visualizations. The results are reported on a blog site that showcases both the visualizations and the outcomes, so that people can read and learn about these prevalent social issues. The outcomes of this project illuminate the racial and gender disparities embedded in the social issues that were highlighted. Though much work remains to be done to alleviate these problems, more and more people today are using data to help spread awareness and offer data-driven solutions.

Kriging and ArcGIS Pro: A Geostatistical Introduction

Andrea (Dre) Lo Biondo

ArcGIS Pro is one of the most commonly used packages in the GIS (Geographic Information Systems) field, offering many tools to analyze and interact with data. This talk will provide some insight into its main features, along with a brief description of kriging and its application to earthquake data.