Strata Data Conference 2017

T4G’s Umair Khan visits the Big Apple to learn about the latest in all things Big Data.

Every year, O’Reilly and Cloudera host the Strata Data Conference in a number of cities around the world, including New York City.

The conference is nearly a week long and includes training sessions, tutorials, case studies, keynotes, and an expo for leading innovators. The topic? Big Data and how to harness it in every industry vertical.

This was my first time attending the Strata Data Conference. The event was held in a huge venue with an almost overwhelming number of attendees, and it was a great experience.


On the first day, I attended FinData Day, where several FinTech companies talked about using machine learning and deep learning to disrupt the financial industry in the United States. Their talks ranged from machine learning ethics (like how not to “play” or spoof the market) to the laws, regulations, and consequences you face if you, or an algorithm you wrote, get caught spoofing the system.

The next day’s keynotes were thought-provoking. They were a testament to how far we have come from systems of record (databases) to using data science to help solve complex problems like preserving ecosystems, fighting climate change, and protecting endangered species. For example, one of the talks covered how machine learning and AI are used to sift through thousands of images every day to detect and tag the endangered snow leopard in the Himalayas.

You can watch any of the keynote addresses on YouTube (and I highly recommend you do!).

New Tools for Visualization

In between sessions and whenever I had time, I visited booths where vendors were showcasing amazing new tools and technologies.

Two of the most fascinating technologies demonstrated at Strata were Zoomdata and Looker.

Zoomdata touts itself as the world’s fastest visual analytics solution for big and streaming data, with the ability to consume billions of rows in seconds. Some of the demos and benchmarks they showed looked promising. Gartner has also just named Zoomdata a Visionary in its Magic Quadrant for BI & Analytics Platforms. This is certainly a tool I want to explore more.

Looker is a new data analytics platform that gives users the ability to discover and share data. What I found cool about this platform was its wide catalog of charts, sleek reports, and fast dashboards. The best part was the ability to use D3 charts, which are amazing, directly within Looker.

The Birth of the ML / DataOps Engineer

One role in the Big Data space that a lot of people talked about, and that was new to me at Strata, was the Machine Learning Engineer (ML Eng), also called the Data Operations (DataOps) Engineer.

I talked to a few developers in various breakout sessions to understand more about this role and how it gets defined. Interestingly, many people don’t understand the specific responsibilities of this role and think that it’s simply a data engineer who can code or a data scientist who can debug some code. It’s not quite as neat and simple as that, but there’s definitely a grey area of overlapping responsibilities between data engineers and data scientists.

I would define this role as:

An engineer who can code, debug, and optimize machine learning models as well as work on refining, enriching, and profiling the data pipeline.

Some of the responsibilities of an ML Engineer are:

  • Refine/clean the data provided to the machine learning algorithm
  • Add new features to the model
  • Deploy the model across different environments
  • Monitor the model and data
  • Debug the machine learning model in case of exceptions

If the points listed above describe your day-to-day tasks, then you’re probably a Machine Learning Engineer!

DataOps Brings Flexibility and Focus

Reference: “Machine Learning Success: The Key to Easier Model Management” by Ellen Friedman

Final Thoughts

While a trip to New York is always fantastic, it’s even better when you get to meet smart people and learn about new technologies. It was a great experience to hear more about advancements in Big Data, AI, and data science, since nearly every industry vertical is investing in and adopting data science to solve large-scale problems.

It was exciting to see some of the great work being done in the data science and advanced analytics space. And some of it left me feeling like I was stepping out of the DeLorean!


Umair Khan

Umair is a Senior Consultant at T4G for everything data, including Hadoop, Big Data, and related technologies. His expertise spans from architecting multi-node Hadoop clusters to building batch and real-time data pipelines. He loves helping businesses derive valuable insights from data using advanced analytics, data science, and common sense.

Outside of work, Umair loves spending time with his family. When he isn’t making progress on his “100 books to read in a lifetime” project, he can be found attending various Toronto user groups on the newest and coolest technologies.