Learn how to ingest SAP HANA metadata into Google Data Catalog and extend it to fit your users' needs

Google Data Catalog has community-maintained tools, such as the open source connectors, that ingest metadata from different data sources:


Scripts and Terraform automation to help you ensure best practices in Google Data Catalog

Disclaimer: All opinions expressed are my own, and represent no one but myself…

There’s extensive documentation on which IAM roles are available for Google Data Catalog. But when you are getting started with your data governance journey, you have probably wondered what kind of access controls are needed and who in your organization should be granted them…

  • Which end users should be able to discover my data assets?
  • Who should be able to classify and add tags to them?
  • And finally, who should be able to create templates and set standards for the data classification process?
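
As a rough starting point, the three questions above can be mapped to Data Catalog's predefined IAM roles. The sketch below is only one possible split, not an official recommendation: the persona names and group addresses are hypothetical, and what users can actually discover also depends on their access to the underlying resources.

```python
# Hypothetical personas mapped to predefined Data Catalog IAM roles.
# Persona names are illustrative assumptions, not part of the product.
PERSONA_ROLES = {
    # Can search for and read metadata of data assets.
    "data_consumer": ["roles/datacatalog.viewer"],
    # Can also attach tags to assets, using existing tag templates.
    "data_curator": [
        "roles/datacatalog.viewer",
        "roles/datacatalog.tagEditor",
        "roles/datacatalog.tagTemplateUser",
    ],
    # Manages the tag templates that set the classification standards.
    "data_governor": ["roles/datacatalog.tagTemplateOwner"],
}


def build_bindings(persona_members):
    """Return IAM-policy-style bindings for a {persona: [members]} mapping."""
    bindings = {}
    for persona, members in persona_members.items():
        for role in PERSONA_ROLES[persona]:
            bindings.setdefault(role, set()).update(members)
    # Sort roles and members so the output is deterministic.
    return [
        {"role": role, "members": sorted(members)}
        for role, members in sorted(bindings.items())
    ]


if __name__ == "__main__":
    # Hypothetical groups; replace with your organization's principals.
    for binding in build_bindings({
        "data_consumer": ["group:analysts@example.com"],
        "data_curator": ["group:stewards@example.com"],
        "data_governor": ["group:governance@example.com"],
    }):
        print(binding)
```

The resulting list has the same shape as the `bindings` section of an IAM policy, so it could feed a Terraform variable or a policy file once the personas are agreed on.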

This can get really complex, so…


Best practices on two approaches, with code samples!

Disclaimer: All opinions expressed are my own, and represent no one but myself…. They come from the experience of participating in the development of fully operational sample connectors, available at: GitHub.

If you want to hear more about some of the Data Catalog connectors use cases, please check the official documentation:


From design decisions to step-by-step execution, learn how to ingest Apache Atlas metadata into Google Data Catalog with full and incremental runs.

Image created on Canva.

Disclaimer: All opinions expressed are my own, and represent no one but myself…. They come from the experience of participating in the development of fully operational sample connectors, available on GitHub.

If you missed the latest post talking about how Apache Atlas and Data Catalog structure their metadata, please check a-metadata-comparison-between-apache-atlas-and-google-data-catalog.

The Dress4Victory company

In this article, we will start by creating a fictional scenario featuring the Dress4Victory company. They help their users get the best deals when buying clothes, and over the years they have grown from a few servers to several hundred.


Learn how your metadata is structured on both systems.

Image created on Canva.

Disclaimer: All opinions expressed are my own, and represent no one but myself…. They come from the experience of participating in the development of fully operational sample connectors, available on GitHub.

If you missed any of the latest posts on how to ingest metadata into Data Catalog, please check the following: Looker, RDBMS, Tableau, Hive.

The one million dollar question

A Data Catalog is usually defined as a collection of metadata, combined with data management and search tools. This enables organizations to quickly discover, understand, and manage all their data.

Now here’s the one million dollar question.


How to create Data Catalog tags by inspecting all your BigQuery data with Cloud Data Loss Prevention.

Background by Kelli Tungay on Unsplash

Do you know what kind of sensitive data your organization holds? Are you keeping track of every change applied across all your tables and columns? Are you confident you can answer the questions an auditor may have about data regulations?


Code samples with a practical approach on how to incrementally ingest metadata changes from an on-premise Hive server into Google Cloud Data Catalog

Background by JJ Ying on Unsplash

Disclaimer: All opinions expressed are my own, and represent no one but myself…. They come from the experience of participating in the development of fully operational sample connectors, available at: GitHub.

The Challenge

Entering the big data world is no easy task: the amount of data can quickly get out of hand. Look at Uber's story of how they deal with 100 petabytes of data using the Hadoop ecosystem. Imagine if a full run were executed every time they synced their on-premise metadata into a Data Catalog. That would be impractical.
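
To make the contrast with a full run concrete, here is a minimal sketch of an incremental sync that compares two metadata snapshots and touches only what changed. Snapshot diffing is a simple stand-in chosen for illustration, not necessarily the mechanism the connector uses, and the snapshot layout (table name mapped to a metadata dict) is an assumption.

```python
def diff_metadata(previous, current):
    """Compare two snapshots shaped as {table_name: metadata_dict}.

    Returns the table names that were created, updated, or deleted
    between the previous and the current snapshot.
    """
    created = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    updated = sorted(
        name
        for name in current.keys() & previous.keys()
        if current[name] != previous[name]
    )
    return {"created": created, "updated": updated, "deleted": deleted}


if __name__ == "__main__":
    # Hypothetical Hive tables, used only to exercise the function.
    previous = {
        "sales.orders": {"columns": ["id", "total"]},
        "sales.customers": {"columns": ["id", "name"]},
        "tmp.staging": {"columns": ["raw"]},
    }
    current = {
        "sales.orders": {"columns": ["id", "total", "currency"]},
        "sales.customers": {"columns": ["id", "name"]},
        "sales.refunds": {"columns": ["id", "order_id"]},
    }
    print(diff_metadata(previous, current))
```

Only the entries listed in the result need to be written to the catalog, so the cost of a sync scales with the number of changes rather than with the total number of tables. Note that this sketch still has to read both snapshots in full, which is exactly the cost that change monitoring at the source avoids.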

We need a way to monitor changes executed at the…


Code samples with a practical approach on how to ingest metadata from an on-premise Hive server into Google Cloud Data Catalog

Background by JJ Ying on Unsplash

Disclaimer: All opinions expressed are my own, and represent no one but myself…. They come from the experience of participating in the development of fully operational sample connectors, available at: GitHub.

The Challenge

We rarely find large organizations storing all their data in the same place, sometimes because of compliance and sometimes for strategic reasons. As a result, many customers have their data assets spread across multiple silos and deal with data that resides in hybrid clouds and/or on-premise environments.
From a metadata management perspective, we need to enable data discovery in a centralized place, no matter where the data is.

Data Catalog


Code samples with a practical approach on how to ingest metadata from on-premise Relational Databases into Google Cloud Data Catalog

Background by JJ Ying on Unsplash

Disclaimer: All opinions expressed are my own, and represent no one but myself…. They come from the experience of participating in the development of fully operational sample connectors, available at: GitHub.

The Google Cloud Data Catalog team has recently announced that the product is GA, with support for custom (aka user-defined) types: data-catalog-metadata-management-now-generally-available! This brand new feature widens the scope for integrations, and users can now leverage Data Catalog’s well-known potential to manage metadata from almost any kind of data asset.

The Challenge

We rarely find large organizations storing all their data in the same place. Sometimes it’s because of compliance or…


Tooling for environment tasks to fly high with MySQL, PostgreSQL, and SQL Server Cloud SQL instances.

Background by Chandler Cruttenden on Unsplash

I’m a huge fan of any kind of automation, as those who know me can confirm, and recently I had to work with Cloud SQL. So to reduce toil, I looked for ways to automate simple tasks such as:

  • Create the Database
  • Generate Schemas and Tables
  • Connect to the Database
  • Clean up the Database
  • Delete the Database

There are many ways to achieve that, and I’ve chosen to use Terraform, Python, Docker, and gcloud CLI.
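
As a small illustration of the Python plus gcloud part of that toolbox, the sketch below builds the gcloud commands for three of the tasks listed above: creating, connecting to, and deleting a database. It is an assumption about how such a wrapper could look, not the tooling from the article; the instance, database, and user names are hypothetical, and schema generation and cleanup (which are SQL statements rather than gcloud calls) are left out.

```python
import subprocess


def database_commands(instance, database, user):
    """Build gcloud command lines for an existing Cloud SQL instance."""
    return {
        "create": ["gcloud", "sql", "databases", "create", database,
                   f"--instance={instance}"],
        "connect": ["gcloud", "sql", "connect", instance, f"--user={user}"],
        # --quiet skips the interactive confirmation prompt.
        "delete": ["gcloud", "sql", "databases", "delete", database,
                   f"--instance={instance}", "--quiet"],
    }


def run(step, commands, dry_run=True):
    """Return the command as a string; execute it unless dry_run is set."""
    command = commands[step]
    if not dry_run:
        subprocess.run(command, check=True)
    return " ".join(command)


if __name__ == "__main__":
    # Hypothetical names; dry_run=True only prints what would be executed.
    commands = database_commands("my-instance", "my-database", "postgres")
    for step in ("create", "connect", "delete"):
        print(run(step, commands))
```

Keeping the command construction separate from its execution makes the wrapper easy to review and test before it touches a real instance.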

Why them, you ask? For the simple reason that I love them :)

So bear with me, and let’s get the…

Marcelo Costa

senior software engineer & google cloud certified architect and data engineer | love to code, working with open source and writing @ ciandt.com
