Why Bring Waterline Data’s ML-Enabled Data Catalog to Hitachi Vantara’s DataOps Portfolio?

Alex Gorelik

March 26, 2020


As you may have heard, Hitachi Vantara recently announced the intent to acquire the assets of Waterline Data. Today, that deal has become official and, as Waterline’s founder, let me say we’re super excited about our strategic role in furthering Hitachi Vantara’s vision to become the world’s preferred digital innovation partner.

Waterline technology fits extremely well with Hitachi’s Lumada portfolio of products, technologies and solutions that provide a competitive edge through data insights for customers in any industry – from manufacturing and energy to transportation and retail. It helps to accelerate execution against Hitachi Vantara’s commitment to enable DataOps for analytics, governance and operational agility.

By infusing our powerful infrastructures with business semantics, the entire data service pipeline can be automated with policies and rules, eliminating the need for tedious hands-on management traditionally needed for such tasks.

So says Ganesh Thondikulam, Executive Director at Kaiser Permanente Information Technology: “Providing analysts across our organization with access to the right data, at the right time is crucial to how we do business, but ensuring this is enabled with appropriate data governance remains a top priority.”

Going one step further, the marriage of Hitachi Vantara’s breadth of data capabilities with Waterline’s metadata expertise is a match made in heaven. Here’s why.

Data Catalog: What’s All the Fuss About?

The market for data catalogs is “red hot.” Organizations struggle with data sprawl – data that is distributed across multiple data centers, clouds and edge devices. Data sprawl is expensive to manage, creates significant compliance risks, and impedes innovation. A data catalog brings much-needed sanity to data sprawl by inventorying and classifying data assets. It provides a necessary metadata layer to support critical information projects, from data preparation to governance and compliance management. Here are a few specific areas in which we set ourselves apart from others in the space:

  • Fingerprinting: Waterline Data provides a unique, patented technology called “data fingerprinting.” The basic idea is that data fields carry unique fingerprints, just as each of us has our own fingerprints. Based on the similarity between the fingerprints of different fields, we can tell whether those fields contain the same information. Once a field is given a tag (a business term that describes its contents), Waterline uses AI and ML to automate the discovery and classification of other fields with similar fingerprints and to suggest semantic tags for them. Analysts and data stewards can then accept or reject these suggestions, which trains the catalog to become more accurate in its classifications.
  • Collaboration: The combined knowledge of crowds is a powerful thing. When looking for a restaurant, we value feedback from people who have eaten there before. When selecting a book to read, we look for recommendations from people who like the same books we like. Aligning to this trend, Waterline’s data catalog contains built-in social media-style collaboration to enable ratings, reviews, discussions and the ability to ask questions of experts or the community at large. It also provides recommendations on what other datasets have been used together or viewed by the same users. Add it all up and Waterline creates a compelling collaborative experience that draws from the wisdom of the community to make finding and understanding data much, much easier.
  • Edge to Core to Cloud: Data landscapes are increasingly distributed and diverse. The vast majority of organizations have two or more cloud systems deployed in addition to their operational applications, SQL databases, Hadoop data lakes and Snowflake data warehouses. Waterline Data supports 15+ systems out of the box and integrates with even more using open APIs – from edge to core to cloud. Everything in Waterline Data Catalog is exposed through REST APIs for easy integration with other tools in the ecosystem. Built to scale to the entire enterprise, Waterline Data Catalog uses highly scalable and high-performing components such as Spark, Solr and agent-based native or Kubernetes container-based architecture.
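
To make the fingerprinting idea above more concrete, here is a minimal sketch in Python. It is not Waterline's patented algorithm, whose details are not described here. It assumes a deliberately simple fingerprint (the set of character "shapes" seen in a column) and plain Jaccard similarity. The function names, the sample data and the 0.5 threshold are all hypothetical, chosen only for illustration.

```python
import re


def fingerprint(values):
    """Reduce a column to the set of value 'shapes' it contains.

    Assumption for illustration: digits become 9 and letters become A,
    so "415-555-0101" has the shape "999-999-9999".
    """
    shapes = set()
    for value in values:
        shape = re.sub(r"[0-9]", "9", str(value))
        shape = re.sub(r"[A-Za-z]", "A", shape)
        shapes.add(shape)
    return frozenset(shapes)


def similarity(fp_a, fp_b):
    """Jaccard similarity between two fingerprints (0.0 to 1.0)."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def suggest_tags(tagged, untagged, threshold=0.5):
    """Suggest the best-matching existing tag for each untagged field."""
    suggestions = {}
    for name, values in untagged.items():
        fp = fingerprint(values)
        best_tag, best_score = None, 0.0
        for tag, tagged_values in tagged.items():
            score = similarity(fp, fingerprint(tagged_values))
            if score > best_score:
                best_tag, best_score = tag, score
        # Only surface a suggestion when the match is strong enough.
        if best_tag is not None and best_score >= threshold:
            suggestions[name] = (best_tag, best_score)
    return suggestions


# One field already tagged by a data steward (hypothetical sample data).
tagged = {"US phone number": ["415-555-0101", "650-555-0199"]}
# Fields the catalog has discovered but nobody has tagged yet.
untagged = {
    "contact_no": ["212-555-0147", "718-555-0123"],
    "first_name": ["Ada", "Grace"],
}
suggestions = suggest_tags(tagged, untagged)
```

In this toy example, the `contact_no` field is suggested for the "US phone number" tag, while `first_name` gets no suggestion. In a real catalog, a steward's accept or reject decision on each suggestion would feed back into the model. That feedback loop is not modeled in this sketch.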

Lumada Data Services + Waterline Data Catalog = A Winning Combination

Waterline Data technology with Hitachi Vantara’s Lumada Data Services portfolio forms a common metadata framework from edge to core to cloud. The catalog complements and integrates with the products in the Lumada Data Services portfolio, including Pentaho, Lumada Edge Intelligence, Lumada Data Lake, Lumada Data Optimizer for Hadoop, and Hitachi Content Intelligence. The integration adds strong and differentiated capabilities supporting data discovery, data self-service, and data compliance initiatives such as GDPR and CCPA. Combined with Lumada’s policy-based automation capabilities, this will enable a wide range of new digital solutions and outcomes across industries.

Real Business Outcomes

So, what are the business benefits of data catalogs? Consider data lakes, for example. Many data lakes don’t deliver on expectations and turn into data swamps: million-dollar investments without justifiable ROI. Take the case of one pharmaceutical company. Before deployment of Waterline Data AI Catalog, it took four weeks to identify and act on a specific information asset. With Waterline Data AI Catalog, the customer can now identify data assets in minutes or seconds, all while protecting sensitive and classified data. Using advanced data catalog technology, you can effectively turn the data swamp into a data trove.

I’ll leave you with one final thought. Industry research estimates that organizations that offer a curated catalog of internal and external data to diverse users will realize 2x the business value from their data and analytics investments compared with those that do not. That’s 2x the cost savings, 2x the customer service, 2x the new revenue streams. Imagine what you could do with that!

This is the power of data catalogs, and that’s why we’re so excited about this acquisition. Contact us to learn more, and let’s discuss what the Data Catalog can do for you.