
Big Data Analytics

To support data analysis, Taipower has also been developing a big data platform.

1. Introduction

Within Taiwan Power Company's (TPC, also called Taipower) system, automatic meter reading (AMR) data are sent out every 15 minutes. These data are invaluable for understanding customer behavior when combined with data on customer attributes, weather, and economic activity as represented by economic indexes. But this goal can only be achieved with a data warehouse that has massively parallel processing (MPP) features, so that the data can be collected, stored, and computed efficiently and analysis can proceed smoothly.

TPC manages a variety of domain data used in its information technology (IT) and operational technology (OT) systems. This chapter focuses on the “Customer Service Big Data Platform”, which has been installed and running for some time. The data operated on the platform are mainly high-tension AMI data and billing system data. Furthermore, a larger-scale system named “Meter Data Management System (MDMS)”, which includes the complete data of all AMI customers, will go online in the near future. Alongside these system establishment projects, TPC's enterprise-level big data computing center is also being built.

The Customer Service Big Data Platform described below adopts the idea of a cloud-based analytic platform built on an MPP data warehouse system, and it can be scaled out in both computing units and storage volume. An x86 server architecture was introduced to reduce costs. To satisfy the various statistical and data exploration needs of the power generation, distribution, and sales departments, the platform provides cloud-based visualized data analysis, exploration, and modeling services, which TPC staff from different departments can access on the intranet from their offices. In addition to retrieving AMR data, the platform also collects information on customer attributes, bills, weather, economic indexes, and power generation, which helps in conducting analysis, modeling, and trend forecasting for target issues.

2. Platform Architecture

The accompanying figure shows the data sources employed by the platform, along with the frequency and format associated with each. The platform receives raw data files via FTP and parses them in the Hadoop data repository illustrated in the lower middle of the figure. Next, the raw data files are transformed into suitable formats and loaded into the structured MPP database, where the visualized statistical analytic tools can query and analyze the data quickly. Finally, results are presented through browsers. Additionally, the data in the database can be accessed via SQL and other statistical analytic tools (e.g., the R language).
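As a hedged illustration of this parse-and-load step, the following minimal Python sketch assumes the raw AMR files are CSV-like with hypothetical columns (meter_id, read_time, kwh) and a PostgreSQL-compatible MPP warehouse; the table and connection names are illustrative, not TPC's actual schema.

# Minimal sketch of the parse-and-load step. Column, table, and connection
# names are hypothetical; TPC's actual schema is not public.
import pandas as pd
from sqlalchemy import create_engine

def load_amr_file(path: str, engine) -> int:
    """Parse one raw AMR file and append it to the structured table."""
    df = pd.read_csv(path, parse_dates=["read_time"])
    # Basic cleansing: drop rows with missing readings, normalize the key type.
    df = df.dropna(subset=["kwh"]).astype({"meter_id": str})
    df.to_sql("amr_readings", engine, if_exists="append", index=False)
    return len(df)

engine = create_engine("postgresql://user:pass@mpp-host/warehouse")  # placeholder DSN
print(load_amr_file("amr_20180701.csv", engine), "readings loaded")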

In order to use the limited space in the database efficiently, out-of-date data are stored only in the Hadoop repository, archived in text-file format; no old data are kept in the database. If old data are requested, they can be made available to the statistical analytic tools via the external-table linkage technique provided by the database. Below is the layout of the Visual Analytic Tools Server, Structured MPP Database, Hadoop Repository Server, and FTP Server.

Layout of the Visual Analytic Tools Server, Structured MPP Database, Hadoop Repository Server, and FTP Server
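As a hedged illustration of the external-table linkage mentioned above, the sketch below registers archived text files in Hadoop so they can be queried in place. It assumes a Greenplum-style MPP warehouse with the PXF connector; the table name and HDFS path are illustrative.

# Hedged sketch: expose archived text files in HDFS as an external table.
# Assumes a Greenplum-style warehouse with PXF; names are illustrative.
import psycopg2

DDL = """
CREATE EXTERNAL TABLE amr_archive_2016 (
    meter_id  text,
    read_time timestamp,
    kwh       numeric
)
LOCATION ('pxf://archive/amr/2016?PROFILE=hdfs:text')
FORMAT 'TEXT' (DELIMITER ',');
"""

with psycopg2.connect("dbname=warehouse") as conn, conn.cursor() as cur:
    cur.execute(DDL)  # old data stays in HDFS; queries read it on demand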

3. Use Cases

(1) The 24-hour Power Load on Power-generation Peak Days

The platform allows quick determination of the 24-hour power load on specified dates (usually the top-ranked power-generation peak days) for all kinds of high-voltage electricity customers. Combined with the power load model of low-voltage electricity customers, this helps coordinate power-generation cost allocation and plan power price modifications.
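A minimal sketch of this computation, assuming the 15-minute kWh readings are already in a DataFrame with the hypothetical columns used earlier:

# Sketch of the 24-hour load curve for one peak day: summing the 15-minute
# kWh readings within each hour yields the average load (kW) for that hour.
import pandas as pd

def daily_load_curve(readings: pd.DataFrame, peak_date: str) -> pd.Series:
    """Hourly system load on one date, indexed by hour of day (0-23)."""
    day = readings[readings["read_time"].dt.normalize() == pd.Timestamp(peak_date)]
    return day.groupby(day["read_time"].dt.hour)["kwh"].sum()

# e.g. daily_load_curve(readings, "2018-08-01").plot() shows the load shape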

(2) Locating Potential High-tension Electricity Customers Suitable for Applicable DR Programs

TPC has divided the high-tension electricity customers into 5 clusters with the K-means algorithm. These clusters are described below and illustrated in the accompanying figure (a minimal clustering sketch follows the list):
[1] Customers that consume power all day but mainly from 08:00 to 16:00.
[2] Customers that consume power at night and mainly from 20:00 to 05:00.
[3] Customers that consume power mainly in the daytime peak hours (8:00~17:00).
[4] Customers that consume power mainly from 08:00 to 20:00.
[5] Customers that consume power all day (00:00~23:59).
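A minimal sketch of this segmentation, assuming each customer is summarized as a 24-dimensional vector of average hourly consumption; the feature construction is an assumption, since the source only names K-means and the five clusters.

# Hedged K-means sketch: cluster customers by the *shape* of their daily load.
import numpy as np
from sklearn.cluster import KMeans

def cluster_customers(profiles: np.ndarray, k: int = 5) -> np.ndarray:
    """profiles: (n_customers, 24) array of average hourly kWh."""
    # Normalize each profile so clustering reflects load shape, not volume.
    shapes = profiles / profiles.sum(axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(shapes)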

Beyond customer clustering, a model forecasting each customer's probability of accepting the applicable Demand Response (DR) programs was implemented with logistic regression. To promote DR programs to customers efficiently, customers' behaviors must first be understood through analysis of their power-consumption information.
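A hedged sketch of the acceptance model follows; the feature set (cluster label, contract capacity, peak-hour share) and the synthetic data are assumptions for illustration only.

# Hedged logistic-regression sketch with synthetic data; TPC's actual
# features and training labels are internal and not given in the source.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, 5, 500),        # cluster label
                     rng.uniform(100, 5000, 500),    # contract capacity (kW)
                     rng.uniform(0, 1, 500)])        # peak-hour usage share
y = rng.integers(0, 2, 500)                          # 1 = joined a DR program

model = LogisticRegression(max_iter=1000).fit(X, y)
acceptance = model.predict_proba(X)[:, 1]  # per-customer acceptance probability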

Moreover, each customer's probability of accepting the applicable DR programs was calculated by cluster, industry, and area, which may help the sales department engage in precision marketing.

(3) A Research Example Using the Platform: The Impact of Different Peak/Off-peak TOU Prices on Customer Behavior (2018)

This research analyzed users' power-consumption behavior with respect to the price gap between the peak and off-peak rates, including sensitivity to the size of the rate adjustment, using historical power-consumption data and a one-year experimental survey of 50 high-tension and 100 low-tension customers; the effectiveness of peak-load suppression was then calculated. The research gave TPC's sales and planning departments insight into simulating how different price gaps suppress the power load at peak, and it helped TPC evaluate the outcome of power rate adjustments.

We separated the TOU customers into 4 contract types: extra-high-tension, high-tension, low-tension lighting (operating), and low-tension (non-operating). After inputting the elasticity value, the number of TOU customers, and the average peak usage for each contract type, users can raise the peak rate (1%~10%) from the drop-down list at the upper right corner; the resulting peak suppression then appears in the right column of the lower right table. Below are TPC's analytic screen for the relationship between peak suppression and rate adjustment, and the comparison line chart of peak-load suppression obtained by varying the gap between peak and off-peak TOU rates, by contract type.
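The arithmetic behind the screen appears to be a constant-elasticity calculation; a minimal sketch under that assumption follows (the formula and sample numbers are illustrative, not TPC's published figures).

# Hedged sketch of the suppression arithmetic: a constant-elasticity model,
# peak reduction ≈ -elasticity × rate increase × baseline peak load.
def peak_suppression(elasticity: float, n_customers: int,
                     avg_peak_kw: float, rate_increase_pct: float) -> float:
    """Estimated peak-load reduction in kW for one contract type."""
    baseline_kw = n_customers * avg_peak_kw
    return -elasticity * (rate_increase_pct / 100.0) * baseline_kw

# Illustrative numbers only: elasticity -0.05, 1000 customers averaging
# 800 kW at peak, a 5% peak-rate increase -> 2000 kW suppressed.
print(peak_suppression(-0.05, 1000, 800.0, 5.0))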

4. Future Development toward Artificial Intelligence Applications

TPC has planned 5 phases for developing big data applications through big data analytics and is now entering phase V, real-time decision-making. TPC will keep refining the power usage models of different industries. By introducing the Internet of Things (IoT), GPUs, and more powerful software tools (e.g., AI modeling packages) into the platform, TPC can refine the lists of potential demand response (DR) customers from phase II and obtain more appropriate recommendations. TPC can also analyze the relationship between customers' power load and their household electric appliances, and further develop models to detect customer cheating. In the future, TPC expects to provide its customers with more new services based on big data analytics, such as real-time pricing, real-time personal reporting, and more.

The Plan for the Enterprise-level Platform

1. Origin of the plan
The management of all aspects of our company's businesses has become more and more complex with the deployment of smart grids, the popularization of smart meters, and the vigorous development of various renewable energy sources. Each business unit has developed its own information system and stored operational data separately based on its operational needs. As a result, the dramatic growth in data volume leads to broken chains between upstream and downstream systems; data exchange across systems becomes more difficult, so it is necessary to build an enterprise-level platform with a forward-looking vision. Therefore, our company plans to implement a data governance mechanism to integrate the storage of structured, semi-structured, and unstructured data, and to combine the enterprise data model with data analysis tools in a cloud-based platform. The platform enables users to get the most value out of data: to support grid management and demand-side operations to ensure power quality, to improve grid operation efficiency, to increase customer satisfaction, and to raise the company's profitability and competitiveness.

2. Project goal
The project will build a platform to support the data-driven strategy for the industry group. It includes the following specific goals:

(1)To integrate company-wide operational data in order to establish an enterprise-level data model
The plan is to build a company-wide data warehouse that connects the upstream and downstream systems of smart grid data with an automated processing mechanism, collecting various operational data such as high/low-voltage smart meter readings, transmission/distribution scheduling data, and enterprise resource planning (ERP) transactions. It establishes an enterprise-level data model and combines it with the data sharing framework to form an integrated smart-enterprise structure, presenting data analysis applications visually in easily understood dashboards to improve decision-making efficiency.

(2)To train data scientists and promote the application of Artificial Intelligence (AI)
The project aims to train data scientists on professional platforms through actual participation in activities such as data extraction from heterogeneous databases, data cleaning, data enrichment, and decision-making model building. The benefit of this "learning by doing" strategy is the development of unique core skills that combine data analysis with domain know-how.

(3)To implement projects in order to analyze and solve the company's major business issues
Internal analysts will review the company's enterprise-level data and identify major business issues, such as stable power supply, renewable energy, energy saving, and environmental sustainability, with the assistance of external consultants and the analysis experience of foreign power utilities. By combining multiple strategies, the company's issues can be solved and the purpose of training achieved at the same time.

(4)To implement data governance transparency and improve data activation
In the past, each unit developed its own business application system, and information exchange between them was limited. The same data was stored across multiple units and might even have different definitions. This plan follows the National Development Council's guideline of "transparent data governance": it establishes a company-wide data sharing platform, automated extraction-transformation-loading (ETL) rules, a high-quality data model, metadata, and a data dictionary. Searching for and accessing data are no longer difficult, and information systems or data analysis projects can be developed using existing data without the cost of rebuilding it. In the meantime, analysis outputs are returned to the platform through standard procedures, forming a virtuous cycle of data activation.

3. Major Tasks

Figure 5-9: The architecture diagram of the enterprise-level analysis environment

The plan is to build an enterprise-level environment for analysis, to establish data sharing standards, to centralize various operational data, to provide various auxiliary tools for data analysis, and to help each department accelerate its analyses. The architecture diagram is shown in Figure 5-9, divided into five major blocks: enterprise-level data model, data exchange platform, data storage platform, data analysis platform, and data sharing platform. The function of each block is described as follows:

(1)Enterprise-level data model
As the internal systems of an enterprise become more and more complex, it becomes harder to analyze data from a macro perspective, and building an enterprise-level data model becomes particularly essential. The project plans to establish an exclusive data model by studying successful cases from foreign power utilities.

In the process of building a data model, not only are the data relationships between systems clarified, but business and information needs are also gradually converted into metadata and data dictionaries as they are collected.

A complete data model reflects a good grasp of the overall picture of the company's data. Based on this model, analysts can quickly find the data required for an analysis and understand which parts are still missing. The model also incorporates external information from other professional fields, such as population distribution, meteorology, and industrial distribution data.

(2)Data exchange platform
The most serious problem many companies face is how to collect data from multiple data sources. The storage types, data formats, and the meanings and names of the data all differ, and the data must be cleaned and converted before being loaded into the final storage.

The purpose of the data exchange platform is to implement this series of processes: Extract (E), Transform (T), and Load (L). There are two variants, ETL and ELT; the biggest difference is where the data transformation is executed. ELT is more commonly used in big data settings because the transformation is assisted by the system resources of the data warehouse, achieving high-speed processing.
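A hedged sketch of the ELT pattern just described: the raw file is bulk-loaded untouched into a staging table, and the transformation runs as SQL inside the warehouse so the MPP engine does the heavy lifting (table names are illustrative).

# Hedged ELT sketch: load first, then transform inside the warehouse.
import psycopg2

with psycopg2.connect("dbname=warehouse") as conn, conn.cursor() as cur:
    # L before T: bulk-load the raw file untouched into a staging table.
    with open("amr_raw.csv") as f:
        cur.copy_expert("COPY staging_amr FROM STDIN WITH CSV HEADER", f)
    # T inside the warehouse: cleanse and cast in parallel on the MPP nodes.
    cur.execute("""
        INSERT INTO amr_readings (meter_id, read_time, kwh)
        SELECT meter_id, read_time::timestamp, kwh::numeric
        FROM staging_amr
        WHERE kwh IS NOT NULL;
    """)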

Either ETL or ELT can be developed in any programming language, but hand-written conversion programs and connection settings lead to management problems. As a result, more and more companies use commercial tools to overcome these difficulties: such tools integrate complex data conversions into workflows, ensure they are processed in the correct order, and are operated through an easy-to-learn user interface.

(3)Data storage platform
With the growth in data volume, the traditional architecture faces many challenges, such as the inability to handle unstructured data effectively, the time-consuming process of converting data from source systems through ETL into the data warehouse, high expansion costs, and the limits of vertically scaled architectures.

Currently, the mainstream data warehousing architecture is Massively Parallel Processing (MPP), which distributes tasks in parallel to multiple nodes for processing. Each node has its own independent resources, such as buses, memory, and hard disks; the defining feature is that nothing is shared, and the nodes work collaboratively through network connections. Capacity can be expanded by adding nodes directly to the cluster without changing the original structure and settings, which greatly reduces management manpower.
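The shared-nothing idea can be shown with a toy sketch: each worker aggregates its own partition independently, and only the small partial results cross the "network" to be merged. This illustrates the principle only; it is not an MPP implementation.

# Toy illustration of shared-nothing parallel aggregation (Python 3.9+).
from multiprocessing import Pool

def partial_sum(partition: list[tuple[int, float]]) -> dict[int, float]:
    """Aggregate kWh per meter within one node's local partition."""
    out: dict[int, float] = {}
    for meter_id, kwh in partition:
        out[meter_id] = out.get(meter_id, 0.0) + kwh
    return out

if __name__ == "__main__":
    partitions = [[(1, 2.0), (2, 1.5)], [(1, 0.5), (3, 4.0)]]  # two "nodes"
    with Pool(2) as pool:
        partials = pool.map(partial_sum, partitions)
    total: dict[int, float] = {}
    for p in partials:            # merge step, analogous to the gather phase
        for k, v in p.items():
            total[k] = total.get(k, 0.0) + v
    print(total)  # {1: 2.5, 2: 1.5, 3: 4.0}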

According to the needs of different analysis purposes, the data can be further subdivided into data marts. Each mart contains a specific range of data, which can be pre-processed to accelerate analysis performance. At the same time, management can also be applied at the mart level to simplify complex management operations.

(4)Data analysis platform
In order to improve the analysis capability and efficiency of the entire company, the data analysis platform provides a variety of analysis tools that meet different user needs and can communicate with each other.

A. AI Modeling Tool
AI plays a third-party role in a project, providing data-driven analysis suggestions that differ from human judgment. It may discover what stereotyped, experience-based thinking would overlook, or create new applications with data sets that had not been considered at all.

AI modeling tools accumulate knowledge from each project analysis based on best practices in different fields. The suggestions, and whether analysts adopt them, repeatedly feed the machine learning cycle, establishing the company's unique knowledge model through automated machine learning mechanisms.

B. Visual analysis tools
Good visualization tools allow the data analysis team to quickly see the similarities, differences, and trends in the data. Users can design complex visual interactions with simple drag-and-drop, customize chart presentation as desired, and continuously monitor dashboards. Such tools can also suggest different chart types to analysts based on the characteristics of the data and provide various fine-tuning settings, all without writing any programs.

C. Visual data mining tools
Data mining builds models over large amounts of data to find hidden and distinctive correlations and features, for example, finding customers' consumption patterns from the company's customer information (age, transaction volume, transaction frequency, etc.) to achieve personalized marketing. Commonly used models include classification analysis, cluster analysis, regression analysis, association analysis, sequence analysis, and so on.
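As a hedged illustration of one such model, the sketch below fits a decision-tree classifier over the kind of customer attributes this paragraph mentions; the data is synthetic and the "high-value" label is invented for illustration.

# Hedged sketch of a classification model on customer attributes.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = np.column_stack([rng.integers(18, 80, 300),      # age
                     rng.uniform(0, 10000, 300),     # transaction volume
                     rng.integers(1, 30, 300)])      # transaction frequency
y = (X[:, 1] > 5000).astype(int)                     # toy "high-value" label

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(tree.predict([[40, 6200, 12]]))                # classify a new customer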

The visual data mining tool uses a graphical interface to conduct a comprehensive examination of the data, effectively reducing data preparation time, and packages the various analysis methods. It lowers the statistical and programming threshold for analysts and provides fast switching and cross-comparison functions to help analysts accurately find the appropriate analysis model.

D. Visual forecast tools
Many applications need to process a large amount of time series data for analysis, decomposition, modeling, and forecasting. Throughout the process, forecasters spend a lot of time on time-frequency analysis, the selection of time series models, and parameter adjustment. With the help of visual forecast tools, large-scale time series analyses and forecasts can be generated automatically, model parameters can be optimized automatically, and forecasting results from different models and parameters can be compared quickly through a graphical interface. This reduces the possibility of personal bias introduced by human intervention.
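A minimal sketch of the automated selection described here: fit several candidate ARIMA orders and keep the one with the lowest AIC (the candidate grid and the synthetic series are assumptions for illustration).

# Hedged sketch of automated model selection for time series forecasting.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
series = np.cumsum(rng.normal(0, 1, 200))  # synthetic load-like series

best = None
for order in [(1, 1, 0), (0, 1, 1), (1, 1, 1), (2, 1, 2)]:
    fit = ARIMA(series, order=order).fit()
    if best is None or fit.aic < best.aic:
        best = fit  # keep the candidate with the lowest AIC

forecast = best.forecast(steps=24)  # next 24 points from the winning model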

(5)Data sharing platform
Sharing data within the company promotes cross-department data circulation and application. With the arrival of the high-speed computing era, and under limited resources, the company should make good use of its unlimited creativity and integrate data from all departments to promote value-added applications. The planning scope of the data sharing platform is as follows:

A. Data set aggregation query: Display the data sets in the data warehouse and provide multi-dimensional and multi-view query methods.

B. Metadata: Provide the main fields, units, update frequency, sensitivity, and providers of the data set, so that users can quickly understand the content of the data and whether it is suitable for the project to be implemented.

C. Data dictionary: Provide detailed specifications and descriptions of each data set's fields; combined with the metadata, this enables ETL processes to import various data into the corporate data model.

D. Security clearance: According to the sensitivity of the data, different users are given different permissions to ensure the security of sensitive data.
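A hedged sketch of how items B-D might fit together as data structures; all field names and the clearance scale are illustrative assumptions, not the platform's actual design.

# Illustrative metadata record (item B) and clearance check (item D).
from dataclasses import dataclass

@dataclass
class DatasetMetadata:
    name: str
    fields: list[str]        # main fields (the data dictionary holds details)
    update_frequency: str    # e.g. "15 min", "daily"
    sensitivity: int         # 0 = open ... 3 = restricted (assumed scale)
    provider: str

def can_access(user_clearance: int, ds: DatasetMetadata) -> bool:
    """Item D: users may only read data at or below their clearance."""
    return user_clearance >= ds.sensitivity

amr = DatasetMetadata("amr_readings", ["meter_id", "read_time", "kwh"],
                      "15 min", 2, "Customer Service Dept.")
print(can_access(1, amr))  # False: sensitivity 2 requires clearance >= 2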

In accordance with the government's policy on data transparency, the data sharing platform also supports the National Development Council's open data platform. Under comprehensive information security and personal data regulations, it can quickly provide high-quality, de-identified data and APIs to ensure the timeliness and accuracy of open data.