Introduction
We analyze Hadoop workloads from three different research clusters from an application-level perspective, with two goals: (1) to
explore new issues in application patterns and user behavior and (2) to understand key performance challenges related to I/O and
load balance. Our analysis suggests that Hadoop usage is still in its adolescence. We see underuse of Hadoop features, extensions,
and tools as well as significant opportunities for optimization. We also observe considerable diversity in application styles, including some
“interactive” workloads, which motivates new tools in the ecosystem.