Unless you are the new messiah, you’ll spend your whole life searching for answers. It’s a bit like swimming in the open sea with no clear direction toward the shore. The metaphor also applies to DataFest, a data analysis competition to be held at Vassar in April, where students will draw unexpected conclusions from a dataset they’ll be seeing for the first time. The participants come from a wide range of backgrounds, but all are in pursuit of the unknown. What’s true in life is apparently sometimes also true in statistics.
From Friday, April 8 to Sunday, April 10, students will be crunching numbers around the clock in Kenyon Hall for DataFest 2016. The event is organized by Associate Professor of Mathematics and Statistics Ming-Wen An and Assistant Professor of Mathematics and Statistics Monika Hu. Around 50 students will be divided into approximately 10 teams, competing to draw insights from a large, unfamiliar dataset. At the end of the weekend, judges will award prizes to the winning groups based on the teams’ presentations of their findings. The prizes include data-related books and American Statistical Association (ASA) student memberships. DataFest is made possible by the generous support of the President’s Office and other offices and departments across campus, as well as external sponsors including Google and DataCamp.
DataFest started at UCLA in 2011 and is now in its first year at Vassar. This year, 20 DataFest events are being held at colleges and universities across the country.
“We wanted to bring this exciting data analysis event to Vassar students, so last fall we started brainstorming how to make it happen. We had both heard about DataFest from our colleagues at other schools and one of us (Monika Hu) was actually involved in organizing DataFest 2015 at Duke, so we had basic ideas about how DataFest works,” An and Hu said in an emailed joint statement. “DataFest gives our students the opportunity to practice their data analytic skills outside the classroom and use real-world data from industry.”
In the two weeks prior to DataFest, three workshops will cover R and MATLAB, two programming environments commonly used for data analysis. On Friday, the first evening of DataFest, the dataset will be revealed to the students, and over the next 48 hours the teams will work together to analyze it. Data sources in past years have included eHarmony and the Los Angeles Police Department.
Samantha Levy ’16 is leading a workshop called “Graphics in R.” She explained the goal for the series: “Since the competition includes short presentations by each group, it is important that at least one member of each group, although preferably more, is comfortable creating polished graphics. I’ll be showing students how to use R’s built-in ‘base graphics’ to perform exploratory data analysis, which is typically the process of creating a visualization that allows one to quickly see some of the important aspects of a large dataset before beginning formal analysis or modeling.”
All spots for DataFest have been filled, but anyone is welcome to stop by, especially for the final presentations on Sunday at 1 p.m. The full schedule is available on the DataFest Vassar website. The number of participants was originally planned to be much smaller but was expanded thanks to additional student interest and funding.
“In the past, we’ve hosted statistics-related events on campus, but these have typically involved inviting a guest statistician to give a lecture series or lead a workshop. DataFest is different in a number of ways, but most notably, in that student participation and real-world data are the primary focus,” Hu said about the unique opportunity DataFest provides. “Overall, participating in DataFest involves scientific reasoning, critical thinking, teamwork, communication and more. Such an experience is definitely worth highlighting to students’ future graduate schools and/or employers.”
Commenting on what sets DataFest apart, Levy said, “I cannot think of any similar events that I have been involved in; however, I believe that DataFest is different from other events in its interdisciplinary nature. DataFest is interdisciplinary both within departments at the school as well as with outside organization. It also allows for an opportunity for those from the community that are familiar with data analysis or data science to act as judges and consultants to teams of students.”
With the growth of technology and the expansion of information, data science is becoming more important than ever. Approaching an unfamiliar dataset with no prior knowledge is much like going through life with no known path to the end.
For everyone involved, it will be a learning experience and a great opportunity for students and faculty who appreciate data to collaborate on what they love. “In general, I hope that everyone who participates in DataFest, myself included, will walk away with a better understanding and appreciation for data. More specifically, I hope that it will help people better appreciate the power, importance and reach of data. Having a basic understanding of data science, as it relates to data compiling, data visualization and data interpretation, is becoming a vital skill in nearly every field. By participating in DataFest, we will be able to work on one of the many questions that data science is currently looking to address.”