Students given opportunity to work with real-world data

For the first time ever, from Friday, April 8 to Sunday, April 10, DataFest 2016 will take place in Kenyon Hall. Around 50 students will compete to discover insights into a large, unfamiliar dataset. Photo courtesy of Vassar College

Unless you are the new messiah, you’ll spend your whole life searching for answers. It’s a bit like swimming in the sea with no sense of where the shore lies. The metaphor also applies to DataFest, a data analysis competition to be held at Vassar in April, where students will draw unexpected conclusions from a dataset they are seeing for the first time. The participants come from a wide range of backgrounds, but all are in pursuit of the unknown. What’s true in life is apparently sometimes also true in statistics.

From Friday, April 8 to Sunday, April 10, students will be crunching numbers around the clock in Kenyon Hall for DataFest 2016. The event is organized by Associate Professor of Mathematics and Statistics Ming-Wen An and Assistant Professor of Mathematics and Statistics Monika Hu. Around 50 students will be divided into approximately 10 teams, competing to discover insights into a large, unfamiliar dataset. At the end of the weekend, judges will award prizes to the winning groups based on team presentations of their findings. The prizes include data-related books and American Statistical Association (ASA) student memberships. DataFest is made possible by the generous support of the President’s Office and other offices and departments across campus, as well as external sponsors including Google and DataCamp.

Now in its first year at Vassar, DataFest started at UCLA in 2011. This year, 20 DataFest events are being held at colleges and universities across the country.

“We wanted to bring this exciting data analysis event to Vassar students, so last fall we started brainstorming how to make it happen. We had both heard about DataFest from our colleagues at other schools and one of us (Monika Hu) was actually involved in organizing DataFest 2015 at Duke, so we had basic ideas about how DataFest works,” An and Hu said in an emailed joint statement. “DataFest gives our students the opportunity to practice their data analytic skills outside the classroom and use real-world data from industry.”

In the two weeks prior to DataFest, there will be three workshops covering R and MATLAB, two high-level programming languages that can be used for data analysis. On Friday, the first evening of DataFest, the dataset will be revealed to the students for the first time, and over the next 48 hours the teams will work together to analyze it. Past data sources have included eHarmony and the Los Angeles Police Department.

Samantha Levy ’16 is leading a workshop called “Graphics in R.” She explained the goal for the series: “Since the competition includes short presentations by each group, it is important that at least one member of each group, although preferably more, is comfortable creating polished graphics. I’ll be showing students how to use R’s built-in ‘base graphics’ to perform exploratory data analysis, which is typically the process of creating a visualization that allows one to quickly see some of the important aspects of a large dataset before beginning formal analysis or modeling.”

All spots for DataFest have been filled, but anyone is welcome to attend, especially for the final presentations on Sunday at 1 p.m. The full schedule is available on the DataFest Vassar website. The number of participants was originally much smaller but was expanded in response to additional student interest and funding.

“In the past, we’ve hosted statistics-related events on campus, but these have typically involved inviting a guest statistician to give a lecture series or lead a workshop. DataFest is different in a number of ways, but most notably, in that student participation and real-world data are the primary focus,” Hu said about the unique opportunity DataFest provides. “Overall, participating in DataFest involves scientific reasoning, critical thinking, teamwork, communication and more. Such an experience is definitely worth highlighting to students’ future graduate schools and/or employers.”

Commenting on the unique experience DataFest provides, Levy replied, “I cannot think of any similar events that I have been involved in, however I believe that DataFest is different from other events in its interdisciplinary nature. DataFest is interdisciplinary both within departments at the school as well as with outside organization. It also allows for an opportunity for those from the community that are familiar with data analysis or data science to act as judges and consultants to teams of students.”

With the growth of technology and the expansion of information, data science is becoming more important than ever. Tackling an unfamiliar dataset with no prior knowledge is much like exploring life with no known path to the end.

For everyone involved, it will be a learning experience and a great opportunity for students and faculty who appreciate data to collaborate on what they love. “In general, I hope that everyone who participates in DataFest, myself included, will walk away with a better understanding and appreciation for data. More specifically, I hope that it will help people better appreciate the power, importance and reach of data. Having a basic understanding of data science, as it relates to data compiling, data visualization and data interpretation is becoming a vital skill in nearly every field. By participating in DataFest, we will be able to work on one of the many questions that Data Science is currently looking to address.”
