The financial services industry has always relied on data and accurate record-keeping. In this article (based on my Data Management Summit keynote), I'm going to look at the latest data trends and how they're affecting (or about to affect) financial organizations like yours.

1. Data volume

A decade ago, former Google CEO Eric Schmidt commented that "every two days we produce as much content as was produced by all of mankind for the 20,000 years before 2003." Today, the rate of (primarily unstructured) data creation is far, far greater.

Accordingly, to manage the exabytes of big data produced every day, we need better tools, particularly around automation and the cloud, because there's far too much data to cope with manually. Automation also involves artificial intelligence (AI) and machine learning, which help organizations automate intelligent decisions based on data.

2. Changing society

COVID-19 has been a significant contributor to societal change, and this development has also had an impact on data. Our response to the pandemic showed that remote working is possible on a scale that would have been inconceivable a decade ago. We now have the networks, bandwidth, tools and capabilities to make the disparate workforce a force for good.

For many office-based workers, going virtual didn't change the basics of their working day too much. We live our lives digitally anyway, generating more and more data each day. And as we spend more of our lives online, data privacy becomes a priority. Privacy and security have to be built seamlessly into all of our data tools and technologies. Also, the increasing importance and visibility of data are encouraging regulators to pay more attention.

3. Accelerating change

As data volumes increase, escalating technological advances are driving new and more extensive transformations in organizations.

Here, we're not just dealing with vast and monolithic datasets; there's a hoard of smaller and more detailed sources of information in so-called "small and wide" datasets. Once again, this drives the evolution of flexible tools and designs that can cope with both big monolithic datasets and small, wide ones.

To master accelerating change, we need to automate data management itself, deploying metadata tools that can help us manage data at scale (e.g., data cataloging, data lineage, etc.).
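As a rough illustration of what a lineage tool automates, the sketch below walks a tiny, hand-written lineage map to list everything a report depends on. The dataset names and the `LINEAGE` dictionary are hypothetical; a real catalog would harvest this metadata automatically rather than hard-code it.

```python
from collections import deque

# Hypothetical lineage map: each dataset lists the datasets it is derived from.
LINEAGE = {
    "risk_report": ["positions_clean", "market_prices"],
    "positions_clean": ["positions_raw"],
    "market_prices": ["vendor_feed"],
    "positions_raw": [],
    "vendor_feed": [],
}


def upstream_sources(dataset, lineage):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return sorted(seen)


print(upstream_sources("risk_report", LINEAGE))
```

With a map like this, a question such as "which reports are affected if the vendor feed is wrong?" becomes a graph traversal instead of a manual investigation.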

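In fact, we need to automate as much as possible. Some elements are still hard to operationalize: AI, and particularly machine learning and model management. Although there are many advanced tools, it's still not the easiest part of the process to automate fully. So, this situation drives a need for both standardization and flexibility.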

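Data should drive the tools, not the other way around. For example, in the early 2000s, the data industry dealt with highly structured SQL databases, which brought a certain rigor to the way we collected the data. Our approach was, "the data we ingest or create has to fulfill the following needs and requirements, according to this predefined schema."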

Organizations don't dictate the structure of the data so much anymore; the data itself dictates the form. So now, we receive large volumes of unstructured data, and we need to figure out what to do with it and how to extract the information and value from it. Consequently, data tools are becoming more flexible to help us achieve this.
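The contrast between the two approaches can be sketched in a few lines. This is only an illustration under stated assumptions: `TRADE_SCHEMA`, the field names and both function names are hypothetical, and a JSON payload stands in for loosely structured data (truly unstructured text would need more than `json.loads`).

```python
import json

# Hypothetical predefined schema: field name -> required Python type.
TRADE_SCHEMA = {"trade_id": str, "amount": float, "currency": str}


def validate_on_write(record, schema):
    """Schema-first: reject a record unless it matches the schema exactly."""
    if set(record) != set(schema):
        mismatched = sorted(set(record) ^ set(schema))
        raise ValueError(f"fields do not match schema: {mismatched}")
    for field, expected_type in schema.items():
        if not isinstance(record[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")
    return record


def extract_on_read(raw_text, wanted_fields):
    """Data-first: keep the raw payload as-is and pull out fields only when needed."""
    payload = json.loads(raw_text)
    return {field: payload.get(field) for field in wanted_fields}


# The rigid path accepts only records shaped exactly like the schema.
validate_on_write({"trade_id": "T1", "amount": 100.0, "currency": "EUR"}, TRADE_SCHEMA)

# The flexible path tolerates extra and missing fields.
raw = '{"trade_id": "T2", "note": "client called", "channel": "chat"}'
print(extract_on_read(raw, ["trade_id", "channel", "amount"]))
```

The second function never rejects data at ingestion time; missing fields simply come back as `None`, which moves the burden of interpretation to whoever reads the data later.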

4. Ubiquity

Data is everywhere. It seems like almost everything can generate the stuff: your doorbell, your bike, even your running shoes.

The processing of all this data can now occur just about anywhere. With smart devices, the Internet of Things, edge networks, containers and APIs, it's increasingly practical to process the data wherever the data sits.

Therefore, we shouldn't tie data tools to a particular location. A few years ago, this resulted in the development of various container-related technologies so that users could process anything anywhere, with relative ease. Today, this drives us toward using fabrics (distributed and interoperable collections of tools and services) rather than one specific tool or cluster.


Organizations make the data management transformation journey for a variety of different reasons.

Increasingly, regulatory requirements such as GDPR are driving improvements in data systems, with substantial penalties for failure to manage data correctly.

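Productivity, which aims to put data to work, is another significant driver. More than 60% of enterprise data is unavailable for analytics, creating a gap between potential and actual business insight. In many machine learning proofs of concept, far more time is spent interfacing with the correct data than doing valuable work. So, anything we can do to improve data productivity, whether by improving data lakes or enhancing AI-based data interaction, will help the bottom line.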

Governance is an area of growing concern. It's vital to make sure we have a good grip on our data. For example, it's becoming critical for organizations to back up data and be able to recreate the state of the data at any point in the past. These actions lead us to developments like the most recent ISO/ANSI SQL:2016 database standard or Amazon's Quantum Ledger Database (QLDB), which almost gives you a "data time travel" capability.
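The idea behind recreating past state can be sketched with an append-only store. This is a toy illustration only, not QLDB's API or SQL temporal syntax; the `VersionedStore` class, its method names and the integer timestamps are all assumptions made for the example.

```python
from bisect import bisect_right


class VersionedStore:
    """Append-only store: writes never overwrite, so any past state can be rebuilt."""

    def __init__(self):
        self._history = {}  # key -> list of (timestamp, value), in write order

    def put(self, key, value, timestamp):
        versions = self._history.setdefault(key, [])
        if versions and timestamp < versions[-1][0]:
            raise ValueError("timestamps must not go backwards for a key")
        versions.append((timestamp, value))

    def get_as_of(self, key, timestamp):
        """Return the value that was current at `timestamp`, or None if not yet written."""
        versions = self._history.get(key, [])
        times = [t for t, _ in versions]
        index = bisect_right(times, timestamp)  # number of versions written by then
        return versions[index - 1][1] if index else None


store = VersionedStore()
store.put("acct-1", 100, timestamp=1)
store.put("acct-1", 250, timestamp=5)
print(store.get_as_of("acct-1", 3))  # balance as it stood between the two writes
```

Because nothing is ever updated in place, an auditor's question about what the data looked like at a given moment becomes a lookup rather than a restore from backup.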

The financial services industry is placing greater emphasis on AI and machine learning governance, ensuring unbiased, non-discriminatory and fair AI. Regulators are following this trend (legislation is on the way).

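Legacy technology and its replacement are the final driver of change. There is still a great deal of legacy technology around, particularly in financial services. Mainframes are one example (not all of them are legacy, of course), often running with tooling and architectures that haven't been updated for several years. In fact, you need strict and coherent processes and strategies to work with legacy data, ensuring QA is built into your processes. Any new fabrics and architectures must be engineered for future expansion.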


Let's look at the data strategies emerging from financial institutions and driving the evolution of financial data spaces. Some are generic; some are more specific to the financial industry — such as strategies for compliance with new data regulations.

Most organizations will follow a similar path toward data maturity, analytics and AI:
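  • Data management: consolidation and curation 
  • Data democratization 
  • Data visualization: self-service analytics 
  • Enterprise-wide AI, machine learning and decision support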
In the initial data management stage, organizations should consolidate their data in one place. It's much cheaper to connect to and work with one location than multiple, diverse data sources. Teams can then curate data on an ongoing basis with automated tools.

In due course, there will be a process of data democratization. Anyone in the organization who needs the data should be able to access and use it in their tool of choice, whether that's Excel, a visualization package or something else. Easy access also acts as an enabler for self-service data analytics and visualization with packages like Power BI, Cognos and Tableau.

The next step is self-service (visual) analytics. Success depends on the quality of the data model to which your visualization package is attached. It's vital to get the right data model (or data environment) for a successful roll-out of self-service visual analytics. In other words, if users are going to create dashboards, they expect the underlying data model to be correct, easy to understand and to "do what it says on the tin." If that isn't the case (e.g., the data is incorrect or incorrectly labeled), user trust will be lost, and regaining lost trust is a long process.

Finally, the data foundation is essential for implementing AI and machine learning, just as it is for self-service visual analytics. Put simply, if an organization cannot create a reliable self-service model with underlying data models that are correct and substantial, then the chances of building an enterprise-wide AI and machine learning capability are slim.


A data fabric is a single environment consisting of a unified architecture with services or technologies running on top of that architecture. Stacks from many different providers now describe themselves as data fabrics. But the basic idea is to try and centralize things so that they're easier to govern and manage, and so that unnecessary duplication of services can be reduced.

The goal is to maximize the value of data, reduce the knowledge gap as much as possible and accelerate ongoing digital transformation.


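How do we deliver this data transformation? Organizations are outsourcing more and more of the data infrastructure and fabric. A decade ago, moving to Azure, AWS or Google Cloud Platform was regarded as state-of-the-art innovation. Now, cloud platforms are just another service, and infrastructure and fabric can be operationalized, commoditized and outsourced with ease.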

On the other hand, the amount of insight and intellectual property (IP) generated is also increasing. And firms are controlling this knowledge in-house, much more tightly than in the past.

These are the two approaches to delivery. Organizations are keeping a tighter rein on data insights but are happy to outsource their data infrastructure.


The structure of the data management delivery has three key components:
  1. IT delivery 
  2. Data and model delivery 
  3. Regulatory and compliance delivery
IT delivery is an area most of us are familiar with. Increasingly, this is moving to agile models, combined with DevOps processes.

Data and model delivery mainly concerns your analytics model, an area that's maturing rapidly. The new issues are about how you manage and deliver data, AI and machine learning strategies. For example, regulators are putting growing pressure on data version management, so that you can reproduce earlier machine learning training results and audit any data changes. Also, you need to be able to explain how you arrived at your machine learning models and whether you tested them for things like discrimination bias.
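Two of these obligations can be sketched very simply: fingerprinting a training dataset so a later audit can confirm exactly which data a model saw, and measuring the gap in outcomes between groups as one basic bias check. Both function names and the sample data are hypothetical, and the approval-rate gap is just one of many possible fairness measures, not a regulatory standard.

```python
import hashlib
import json


def dataset_fingerprint(rows):
    """Hash a list of dict rows so any change to the data changes the fingerprint.

    Row order matters here; sort the rows first if order should be ignored.
    """
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approval_rate_gap(decisions, groups):
    """Largest difference in approval rate between any two groups (0.0 means equal)."""
    totals, approved = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if decision else 0)
    rates = [approved[g] / totals[g] for g in totals]
    return max(rates) - min(rates) if rates else 0.0


training_rows = [{"id": 1, "income": 50}, {"id": 2, "income": 70}]
original = dataset_fingerprint(training_rows)

# Any edit to the data, however small, yields a different fingerprint.
training_rows[1]["income"] = 71
print(original != dataset_fingerprint(training_rows))

# Group "a" is approved 2 out of 2 times, group "b" 1 out of 2 times.
print(approval_rate_gap([1, 1, 0, 1], ["a", "a", "b", "b"]))
```

Storing the fingerprint alongside each trained model gives an auditor a cheap way to verify that a reproduced training run used the same data.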

Regulatory and compliance delivery is also evolving rapidly. There's a swathe of new regulations coming out in 2022, including new EU regulations. So, it's vital to manage data privacy and security in a compliant manner, and be auditable.


We can tie all this together with a target operating model that considers:
  • People 
  • Processes 
  • Technology 
  • Governance
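People need a broader range of skills. It's no longer enough to say you're a DevOps specialist. You need to train for regulatory issues in different jurisdictions and conditions. You also need to know that authentication and authorization requirements are correct, be keenly aware of jurisdictional issues and so on.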

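Processes must be considered holistically rather than in isolation. So, if you have a data analytics delivery project, you have to look beyond models, accuracy and confusion matrices to the various IT, regulatory and compliance aspects. As I'm sure you're aware, you cannot deliver an analytics project involving consumer data without a significant number of compliance checks and issues around data governance. You should factor these aspects into the whole project.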

Technology transformation should be focused on increasing commoditization. Organizations need to concentrate on technologies that add the most business value and outsource everything else. Explore how to outsource elements cheaply and effectively, and make sure the interface between in-house and outsourced components is seamless and secure.

Governance is critical, and the penalties for failure are increasing. Key areas to focus on include data quality assessments, cataloging, management, lineage and so on.
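As a minimal example of an automated data quality assessment, the sketch below scores how completely each field is populated and flags fields that fall below a threshold. The function names, the sample customer rows and the 0.9 threshold are assumptions chosen for illustration; completeness is only one of several quality dimensions a real assessment would cover.

```python
def completeness(rows, fields):
    """Share of rows with a non-empty value for each field (1.0 means fully populated)."""
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) not in (None, "")) / len(rows)
        for field in fields
    }


def failing_fields(scores, threshold):
    """Names of the fields whose score is below the agreed threshold."""
    return sorted(field for field, score in scores.items() if score < threshold)


customers = [
    {"id": "C1", "country": "DE"},
    {"id": "C2", "country": ""},   # empty string counts as missing
    {"id": "C3"},                  # field absent altogether
    {"id": "C4", "country": "FR"},
]

scores = completeness(customers, ["id", "country"])
print(scores)
print(failing_fields(scores, threshold=0.9))
```

Run on a schedule, a check like this turns data quality from an occasional manual review into a measurable, auditable control.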


In short, successful modern data management relies on integration across a much broader range of disciplines.

And the constant theme is that there's going to be change — and lots of it.




Director and Practice Lead for AI, Machine Learning and Big Data, 美高梅网投
Paul leads the Data and Analytics practice for Banking and Capital Markets Consulting EMEA. His role spans all advisory and consultancy in data and analytics, and delivery of all projects from PoCs to large multiyear projects. He has worked with 美高梅网投 for 3 years and has over 20 years' industry experience.