Google Data Catalog has community-maintained tools, such as the open source connectors, to ingest metadata from different data sources:
Disclaimer: All opinions expressed are my own, and represent no one but myself…
There’s extensive documentation on which IAM roles are available for Google Data Catalog. But when you are getting started with your data governance journey, you have probably wondered what kind of access controls are needed and who in your organization should be granted them…
This can get really complex, so in this blog post we will start by looking at the access controls on top of metadata, which is Google Data Catalog’s playing field. …
Disclaimer: All opinions expressed are my own, and represent no one but myself. They come from the experience of participating in the development of fully operational sample connectors, available at: GitHub.
If you want to hear more about some of the Data Catalog connector use cases, please check the official documentation:
If you missed the latest post talking about how Apache Atlas and Data Catalog structure their metadata, please check a-metadata-comparison-between-apache-atlas-and-google-data-catalog.
In this article, we will start by creating a fictional scenario featuring the Dress4Victory company. They help their users get the best deals when buying clothes, and over the years they have grown from a few servers to several hundred.
If you missed any of the latest posts on how to ingest metadata into Data Catalog, please check the following: Looker, RDBMS, Tableau, Hive.
A Data Catalog is usually defined as a collection of metadata, combined with data management and search tools. This enables organizations to quickly discover, understand, and manage all their data.
Now here’s the one million dollar question.
Do you know what kind of sensitive data your organization holds? Are you keeping track of every change applied across all your tables and columns? Are you confident you can answer the questions an auditor may have about data regulations?
Entering the big data world is no easy task; the amount of data can quickly get out of hand. Look at Uber’s story of how they deal with 100 petabytes of data using the Hadoop ecosystem. Imagine if a full run were executed every time they synced their on-premises metadata into a Data Catalog: that would be impractical.
We need a way to monitor changes executed on the Hive server, so that whenever a Table or Database is modified we capture just that change and incrementally persist it in our Data Catalog. …
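To make the incremental idea concrete, here is a minimal Python sketch. It is not the connector’s actual code: `ChangeEvent` and `InMemoryCatalog` are hypothetical names invented for illustration, and a plain dictionary stands in for Data Catalog. The point is only that each captured change touches a single entry, instead of triggering a full re-scan.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical notification describing one metastore change."""
    action: str                                # "CREATE", "ALTER" or "DROP"
    database: str
    table: str
    columns: Optional[Dict[str, str]] = None   # column name -> column type


class InMemoryCatalog:
    """Stand-in for the real catalog: maps 'database.table' to its columns."""

    def __init__(self) -> None:
        self.entries: Dict[str, Dict[str, str]] = {}
        self.writes = 0  # counts how many single-entry updates were applied

    def apply(self, event: ChangeEvent) -> None:
        """Persist only the entry touched by this event."""
        key = f"{event.database}.{event.table}"
        if event.action in ("CREATE", "ALTER"):
            # Upsert: replace the stored schema with the latest one.
            self.entries[key] = dict(event.columns or {})
        elif event.action == "DROP":
            # Dropping an unknown table is treated as a no-op.
            self.entries.pop(key, None)
        else:
            raise ValueError(f"Unsupported action: {event.action}")
        self.writes += 1


# Illustrative event stream (made-up tables and columns).
events = [
    ChangeEvent("CREATE", "sales", "orders", {"id": "int"}),
    ChangeEvent("CREATE", "sales", "customers", {"id": "int", "name": "string"}),
    ChangeEvent("ALTER", "sales", "orders", {"id": "int", "total": "decimal(10,2)"}),
    ChangeEvent("DROP", "sales", "customers"),
]

catalog = InMemoryCatalog()
for event in events:
    catalog.apply(event)
```

After the four events are applied, the catalog holds only `sales.orders` with its altered schema, and it got there with four small writes rather than four full scans of the metastore.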
We rarely find large organizations storing all their data in the same place; sometimes it’s because of compliance, or even a strategic reason. That leads to many customers having their data assets spread across multiple silos, dealing with data that resides on hybrid clouds and/or on-premises environments.
From a metadata management perspective, we need to enable data discovery in a centralized place, no matter where the data is.
Last year Google Cloud announced their metadata management…
The Google Cloud Data Catalog team has recently announced that the product is GA, with a feature to accept custom (aka user-defined) types: data-catalog-metadata-management-now-generally-available! This brand new feature opens up scope for integrations, and users can now leverage Data Catalog’s well-known potential to manage metadata from almost any kind of data asset.
We rarely find large organizations storing all their data in the same place; sometimes it’s because of compliance, or even a strategic reason. That leads to many customers having their data assets spread across multiple silos, dealing with data that resides on hybrid clouds and/or on-premises environments.
It’s also widely known that the relational database ecosystem has many different vendors, making it hard to work with each one, since we end up dealing with so many distinct features. …
I’m a huge fan of any kind of automation (those who know me know this to be true), and recently I had to work with Cloud SQL. So, to reduce toil, I looked for ways to automate simple tasks such as:
There are many ways to achieve that, and I’ve chosen to use Terraform, Python, Docker, and the gcloud CLI.
Why them, you ask? For the simple reason that I love them :)
So bear with me, and let’s get the automation going! …