Getting to grips with data in today’s ‘on-demand’ culture – Snowflake style

It’s almost a cliché: crises have a habit of forcing leaders to meet longstanding challenges with renewed urgency. In recent months, conversations about recovery strategies with Asia-based leaders now always include various digitalisation imperatives.

The newly released results of a McKinsey global survey of some 800 executives are the latest study confirming that workplaces are being disrupted by increasing automation, digital transformation and other trends linked to the digital economy.

Geoff Soon, managing director for Snowflake in South Asia

“COVID-19 has disrupted almost every industry across the globe. To regain footing, businesses must develop agility by tapping on their most important asset – data,” Geoff Soon, managing director for Snowflake in South Asia, told Disruptive Asia recently.

Since setting up its Singapore operations in late 2018, Snowflake has made its cloud data platform available on Amazon Web Services (AWS) and Microsoft Azure to meet the region’s digital transformation upsurge. The company’s strong growth is reflected in reports of its forthcoming IPO.

Soon attributes the company’s rapid growth to putting customers first. Its customers span a spectrum of industries in Southeast Asia, including the transportation, retail, telecommunications, and finance sectors.

He affirmed the company is ‘on a mission to persuade businesses to unlock the value in data more effectively to secure growth – especially vital during the ongoing COVID-19 pandemic.’

Hurdles to data adoption

The holy grail in today’s crisis-charged times is the ability to generate real-time insights and make data-driven decisions, and that ability depends on having reliable technology that can load, transform and integrate structured and semi-structured data.

To confidently participate in emerging data exchanges, companies must modernise the way they manage data. Currently, complex data sharing and exchange methods, combined with costly and inflexible computing platforms, make it difficult for organisations to collaborate and leverage their enterprise data.

“Many organisations are still emailing multiple spreadsheets, conducting batch processes with file transfer protocol (FTP), extract, transform, load (ETL) software and using application programming interfaces (APIs). Unreliable, insecure and unscalable, these data-sharing methods waste valuable time, money, and resources,” Soon explains.

In addition, the time required to extract data from traditional platforms delays the value shared data provides. “Moreover, as shared data is a static copy, it becomes immediately stale. Thus, every time data changes, the data extraction and transfer processes must be repeated.”

He detailed other hurdles, which include:

  • The shared data set is often much larger than originally scoped, which poses problems in the data extraction process. “A scripting language to automate the breakdown and extraction process is needed, which may require additional IT assistance.”
  • Sensitive company data needs to be encrypted, masked or redacted to limit exposure in the event of a breach (a minimal sketch of this step follows the list). “We all know stories of companies facing huge losses due to compromised data. Securing company assets requires significant IT resources and many companies are struggling to address this issue.”
  • Cleaning data, harmonising it across sources and ensuring its integrity are all critical to making the most of enterprise data. “Due to differences in data formats and sources, data import processes may encounter glitches and the data extracted is not as clean as anticipated.”
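
As a concrete illustration of the masking hurdle above, here is a minimal sketch of redacting a flat-file extract before it is shared. It is deliberately generic rather than Snowflake-specific, and the column names, file names and salted-hash approach are illustrative assumptions:

```python
# Hypothetical masking step for a shared extract: pseudonymise identifiers
# and redact payment data. Column and file names are illustrative only.
import hashlib

import pandas as pd

SALT = "replace-with-a-secret-salt"  # in practice, pull from a secrets store


def mask_email(email: str) -> str:
    """Salted hash keeps rows joinable across extracts without exposing the address."""
    return hashlib.sha256((SALT + email).encode("utf-8")).hexdigest()[:16]


def redact_extract(path_in: str, path_out: str) -> None:
    df = pd.read_csv(path_in)
    df["email"] = df["email"].astype(str).map(mask_email)  # pseudonymise
    df["credit_card"] = "****"                             # fully redact
    df.to_csv(path_out, index=False)


redact_extract("customers_raw.csv", "customers_shareable.csv")
```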

No quick fixes

Tapping the potential of frontier technologies such as Artificial Intelligence (AI) and machine learning (ML) relies on feeding the right data at the right time to the correct models.

“Data is added and prepared multiple times during each stage of the machine learning (ML) cycle, often with different data requirements. Success in ML is predicated on getting the right data in the right condition into the right analytic platforms to generate business results,” Soon said.

Data scientists, he noted, are often left waiting: “Whenever they broaden or extend the scope of the data set, they have to wait for data engineers to load and prepare the data. This causes a delay and introduces significant latency between iterations.”

Turning to security, Soon said that Snowflake’s secure data sharing capabilities allow data exchange through Snowflake’s ‘unique’ services layer and metadata store.

Because no files are actually copied or transferred between accounts, shared data takes up no storage in a consumer account and therefore does not contribute to the consumer’s monthly data storage charges.
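
For context, the provider side of this sharing model looks roughly like the sketch below, which issues Snowflake’s share-management SQL through the snowflake-connector-python library. The database, schema, table, share and account names are placeholders, not details from the interview:

```python
# Hedged sketch of provider-side Snowflake Secure Data Sharing.
# All object and account names are illustrative placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="provider_account",  # placeholder account identifier
    user="PROVIDER_USER",
    password="...",              # prefer key-pair auth or SSO in practice
)
cur = conn.cursor()

# A share exposes read access through Snowflake's services and metadata
# layer; no files are copied out of the provider's account.
cur.execute("CREATE SHARE IF NOT EXISTS sales_share")
cur.execute("GRANT USAGE ON DATABASE sales_db TO SHARE sales_share")
cur.execute("GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share")
cur.execute("GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share")

# Make the share visible to a specific consumer account.
cur.execute("ALTER SHARE sales_share ADD ACCOUNTS = consumer_account")
```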

Single source of truth

The only charges to consumers are for the compute resources, such as virtual warehouses, used to query the shared data. With Snowflake’s data sharing capabilities, consumers can immediately use and query the shared data at the highest performance profile.
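
On the consumer side, the flow Soon describes would look roughly like this: the share is mounted as a read-only database that stores nothing locally, and the virtual warehouse running the query is the only billed resource. Again, all names are placeholders:

```python
# Consumer-side sketch: mount the provider's share and query it.
# Account, share, database and warehouse names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="consumer_account",
    user="CONSUMER_USER",
    password="...",
)
cur = conn.cursor()

# Creating a database from a share stores no data in this account,
# so it adds nothing to the consumer's monthly storage charges.
cur.execute("CREATE DATABASE sales_shared FROM SHARE provider_account.sales_share")

# The virtual warehouse supplies the compute, which is the only
# resource the consumer is billed for when querying shared data.
cur.execute("USE WAREHOUSE analytics_wh")
cur.execute("SELECT COUNT(*) FROM sales_shared.public.orders")
print(cur.fetchone()[0])
```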

“One use case that I can share is Rakuten Rewards. The company has established a highly scalable enterprise data hub for internal use with Snowflake. As a result, wait time for data is shortened, the data sets are available earlier in the day, and IT teams can detect problems quicker,” Soon added.

While query performance can vary depending on the size of the cluster and the exact type of job, Rakuten Rewards has seen a 95% faster run time with Snowflake compared to its previous data warehouse.

Snowflake’s cloud data platform allows organisations to consolidate their data from data warehouses, data marts, and data lakes into a single source of truth that powers multiple types of analytics and data science applications. Teams can easily collaborate and share governed data, internally and externally, without having to copy or move files to different locations.

He explained that raw, structured, and semi-structured data is easily discoverable and immediately accessible for data science workflows, with native support for data file formats such as JSON, AVRO, XML, ORC, and Parquet. “The capability to use one set of tools to manage both structured and semi-structured data shortens the data discovery and preparation cycle.”
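
To illustrate that point, the sketch below queries JSON held in a Snowflake VARIANT column with the same SQL used for tabular data, using Snowflake’s colon path syntax and :: casts. The table (raw_events) and the JSON fields are invented for the example:

```python
# Sketch: querying semi-structured JSON alongside relational data in
# Snowflake. Table name and JSON structure are illustrative only.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="MY_USER", password="..."
)
cur = conn.cursor()

# The colon operator walks into the JSON document and ::<type> casts the
# result, so semi-structured fields behave like ordinary columns.
cur.execute("""
    SELECT payload:customer.name::string AS customer_name,
           payload:purchase.total::number AS purchase_total
    FROM   raw_events
    WHERE  payload:purchase.status::string = 'PAID'
""")
for name, total in cur.fetchall():
    print(name, total)
```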

“Furthermore, Snowflake’s cloud data platform is a highly extensible, multi-region and multi-cloud platform that powers all types of data workloads,” he said.

“We have seen how the established ways of doing business have been upended by the crisis,” said Soon in conclusion. “For instance, restaurants had to learn how to take online orders and partner with the Grabs and the Gojeks of the world so that they can get their food delivered.”

“Now, competitive differentiation is really about how well data is captured and used. Companies must take a closer look at their current data management strategies. Doing so keeps the business a step ahead and ensures the organisation stays competitive in an ever-changing market.”
