
William Vambenepe, Lead Product Manager for Big Data, Google Cloud Platform

William Vambenepe leads the product management team responsible for Big Data services on Google Cloud Platform (BigQuery, Dataflow, etc.). William was previously an Architect at Oracle and, before that, a Distinguished Technologist at HP.

He holds an engineering degree from École Centrale Paris, a Diploma in Computer Science from Cambridge University and a Master of Science in Engineering Management from Stanford University.

What do you enjoy most about your role at Google Cloud Platform?

What I most enjoy is seeing customers bloom on the platform. Here’s an analogy: when my daughter went from elementary school to middle school, she quickly became much more mature and independent. It felt like she grew two years in a few weeks. I see a similar, almost-instantaneous maturity happening with many customers on Google Cloud. They first come with an infrastructure-driven approach and a mindset of scarcity. Often their very first foray into the Cloud replicates on-prem patterns. But very quickly it clicks. Why would they need a long-lived shared Hadoop cluster when Dataproc can provision one from scratch in under 90 seconds, for which they pay by the minute? “Everybody gets a pony,” as one Spotify engineer once said, describing their use of Hadoop on Google Cloud.

Similarly, we see many Data Warehouse customers who spent years strategizing about which data to keep and which to remove to make room in their Data Warehouse. Enter BigQuery, where storage costs at most $0.02 per GB per month, and $0.01 after 90 days. Suddenly there is no need to ever delete data, at least not for cost reasons. And they can spend their mental energy thinking about how to use the data, not how to maintain systems and optimize limited storage capacity.
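To make those storage economics concrete, here is a small back-of-the-envelope sketch in Python using only the per-GB rates quoted above (actual BigQuery pricing varies by region and changes over time, so treat the numbers as illustrative):

```python
def monthly_storage_cost(gb, long_term=False):
    """Rough monthly BigQuery storage cost, using the rates quoted
    in the interview: $0.02/GB active, $0.01/GB once data has gone
    90 days without modification ("long-term" storage)."""
    rate_per_gb = 0.01 if long_term else 0.02
    return gb * rate_per_gb

# Keeping 50 TB of historical data:
active = monthly_storage_cost(50 * 1024)             # first 90 days
long_term = monthly_storage_cost(50 * 1024, True)    # after 90 days
print(f"${active:,.2f}/month active, ${long_term:,.2f}/month long-term")
# prints "$1,024.00/month active, $512.00/month long-term"
```

At roughly a thousand dollars a month for 50 TB, deleting data purely to reclaim space stops making economic sense, which is the mindset shift described above.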

When the mental switch happens, we see a whole new pattern, with much more ambition and much more experimentation (the cost of experimentation is negligible, both in terms of Cloud resources consumed and, more importantly, time spent on the investigation, because you can jump straight to the core of the task with no setup time). We see customers who had internalized that batch processing was the natural order of the universe move to stream processing (why wait for your results, when Dataflow makes stream execution as easy as batch?).

The next round of discussions with these customers is not about infrastructure or prices; it’s about them sharing what they’ve achieved and what they plan to do next. For example, starting to use Google’s Machine Learning services. In the span of a few months, they transition from IT as a necessary burden to IT as a power tool.

What is the most effective GCP product for managing Big Data?

Our product portfolio is designed as a set of complementary and well-integrated products. Almost no real-world task requires just one product. In this context, a service is not “more effective” than another, in the same way that a shovel is not “more effective” than an umbrella. They do different things. But if I interpret “effective” to mean which one is the most uniquely effective (most innovative, most differentiated from the competition), then I’d say it’s Google Cloud Dataflow, our fully managed (“no-ops”) service for data processing pipelines.

It distinguishes itself in two respects. The first is what it can do (the functional aspect). Google Cloud Dataflow implements the groundbreaking Dataflow Model, which gives developers a powerful model for creating data processing pipelines that can run in either batch or stream mode. And it incorporates the management of event delivery delays, so that programmers only need to declare how they want events grouped, without having to manage state themselves to account for out-of-order and late-arriving events.
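The core idea of event-time grouping can be sketched in a few lines of plain Python. This is a deliberately simplified toy, not the Beam/Dataflow API: it assigns timestamped events to fixed one-minute windows by the event’s own timestamp, so events that arrive out of order still land in the correct window.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # fixed one-minute event-time windows

def assign_windows(events):
    """Group (event_time_seconds, value) pairs into fixed windows.

    Grouping keys off each event's own timestamp, so late or
    out-of-order arrivals still fall into the right window. This is
    the bookkeeping that Dataflow's windowing model handles for you
    at scale, including watermarks and triggers, which this toy omits.
    """
    windows = defaultdict(list)
    for event_time, value in events:
        window_start = (event_time // WINDOW_SECONDS) * WINDOW_SECONDS
        windows[window_start].append(value)
    return dict(windows)

# Events arrive out of order: timestamps 5s, 130s, 59s, 61s
events = [(5, "a"), (130, "b"), (59, "c"), (61, "d")]
print(assign_windows(events))
# prints {0: ['a', 'c'], 120: ['b'], 60: ['d']}
```

Note how "c" (timestamp 59) arrives after "b" (timestamp 130) yet is still grouped into the first window; in a real streaming system the hard part, which Dataflow manages, is deciding when a window is complete enough to emit.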

Note that Google open sourced its implementation of the Dataflow Model and contributed it to Apache as Apache Beam (the name comes from concatenating the “B” of “batch” with the “eam” of “stream”). So, Google Cloud Dataflow runs Beam pipelines, but they can also run on other Apache engines like Apache Spark and Apache Flink.

So why would you run these pipelines on Google Cloud Dataflow? That’s where the second key aspect comes in: the operational aspect. By that, I mean that running pipelines on Google Cloud Dataflow frees the user from any operational concern. All they have to do is submit the pipeline they wrote. Period. No need to deploy anything, to scale, to patch, to guess the needed capacity, etc. Dataflow automatically provisions the needed resources and auto-scales so that the pipeline execution is performant without costing any more than it needs to.

How can Google help businesses with digital transformation?

Google can help businesses not just with digital transformation, but also with the transformation to using Artificial Intelligence. And the good news is that the latter is a natural continuation of the former. Step 1 remains the digitalization of the business, where Google helps by providing fully-managed data storage and processing services so that you don’t need a full staff of tech wizards to manage the digital infrastructure of your business. Data ingestion is easy, and in many cases completely automated, e.g. if you use Google Analytics Premium and want to import the data into the Cloud. For other cases, Google has developed a rich partner ecosystem to provide the right data integration infrastructure. Once the data is in Google Cloud, all processing systems are “serverless,” meaning that customers don’t need to worry about operating servers. This allows people to very easily run analytics on the data, using their favorite tool, e.g. Tableau, MicroStrategy or Looker.

But, as mentioned above, in addition to opening the door to easy and powerful analytics this digitalization of the business on Google Cloud also puts businesses directly in position to take the next step and apply Google’s unique Machine Learning capabilities to their business challenges.

Learn more from William and a variety of other industry thought leaders at the upcoming Big Data Week London Conference. View more details here.

Roy Wagemans, IFS Product Marketing Manager

Roy Wagemans is the Product Marketing Manager at IFS. With an international career spanning more than 25 years, a degree in business from Henley Management College and a Master’s degree in Systems Analysis, Roy has the ideal background for translating the technicalities of products into clearly defined messages for the market and for customers.

Why did you decide to present your webinar on trends in analytics?

Analytics has been a topic of discussion for a long time now and pretty much all companies have analytics capabilities in some way, shape or form. With IoT adoption set to take off and the knock-on effect this will have on Digital Transformation efforts, I think we are at an inflection point and need to review whether we are getting the maximum ROI from the data we are storing and analyzing. With the use of machine learning and prescriptive analytics set to increase, we need to ensure our Master Data Management is in order and the processes and systems we use for analytics are rationalized. Otherwise we will run the risk of just adding an additional layer of cost to our data management expenditure.

What insights will the audience gain by attending this webinar?

IFS is not a technology vendor, so we will not be talking about the ins and outs of developments in analytics. Instead, we’ll give pragmatic insight into the application of advanced analytics in business systems. Being able to apply machine learning to data is one thing; it is what you do with the results that, in the end, makes a difference to business outcomes.

How did you get into the industry?

In 1997 I started working on projects to manage heterogeneous database landscapes and have lived in the world of data ever since. Most notably, in 1999, I started leading development activities related to off-board Prognostics and Health Management (i.e. predictive analytics) for the F-35 Joint Strike Fighter. In later roles I moved from program management to focus more on the hardware needed for cloud-based and workstation-based analytics at companies such as Exasol, SynerScope and R Systems.

What do you enjoy most about your role?

I enjoy staying up to speed with the state of the art and developing product positioning and marketing messages for the products our R&D department develops. We have a varied, globally available product portfolio so there are always new products to introduce and improvements to be made.

What motivates you?

The speed with which the digital world is changing, and keeping up with it.

Join Roy in the upcoming webinar ‘5 Key Trends in Analytics’, where experts from Microsoft, IDC and IFS provide new information and insight on big data and analytics. Register now!