We are living in the era of the 4th Industrial Revolution – an evolution built on extreme automation of machine-to-machine communication, and much more besides. Machines can understand each other, negotiate with each other, and sign and execute contracts with each other. They can predict each other's behavior and, in effect, establish a social network of machines. Such a network is self-sustaining and can run with little human intervention. At the heart of this exciting revolution are several advanced technologies: Artificial Intelligence, Machine Learning, Sensing, Internet of Things, Big Data, Analytics, Cloud Computing, and Edge Computing, to name a few. It is fascinating to see how this complex mesh of technologies works together to produce incredible outcomes: significantly lower production costs, higher overall productivity, better product and process quality, reduced downtime, and more.
Machine learning is an important part of this whole technology stack. The core of any machine learning implementation is to extract the maximum value from the available raw data – and there is a great deal of it, because every action around us produces data. Machine learning is about analyzing that data, whether historical records or a constant stream, and generating information in a form that the outside world or downstream interfaces can understand.
Implementing a machine learning algorithm that works reliably in an industrial environment and produces trustworthy results is not easy. It requires an unusual combination of domain knowledge, a problem-solving mindset, and people who are good at crunching numbers and love statistics.
Some of the common challenges of machine learning implementations are listed here and should be kept in mind while designing the solution.
- Selection of the right algorithm: There are dozens of widely popular algorithms available for ML implementation. Though most algorithms can work under generic conditions, there are specific guidelines about which algorithm works best under which circumstances. Choosing the wrong algorithm can produce garbage output after months of effort – wasting the entire investment and pushing the target timelines further out.
- Selection of the right set of data: As they say, garbage in produces garbage out, and this applies squarely to the data sets used for machine learning. The quality, amount, preparation, and selection of data are critical to the success of a machine learning solution. Data selection can also be affected by bias; it is important to avoid selection bias and choose data that is truly representative of the cases the model will face.
- Data preprocessing: Historical data is often messy, with missing values, meaningless values, outliers, and so on. Parsing, cleaning, and preprocessing such data can be a tedious job. Feature properties and value ranges have to be studied, and techniques like feature scaling need to be applied to prevent certain features from dominating the entire model.
- Data labeling: Supervised ML algorithms are usually the easier and more appropriate choice, whereas selecting and implementing an unsupervised ML algorithm is a tedious and lengthy process that sometimes requires several unsuccessful iterations. Supervised algorithms, however, require labeled data. Data labeling is a manually intensive task, yet it cannot simply be outsourced. The classic example is health care: for predictive diagnosis to work, the available medical data has to be labeled, and that labeling requires constant input from medical experts and doctors. However, those specialized experts often view labeling as a waste of their time.
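To make the feature-scaling point above concrete, here is a minimal sketch of min-max scaling in plain Python. The feature names and values are illustrative only; real industrial pipelines typically use a library such as scikit-learn for this. Without scaling, a feature measured in hundreds (pressure) would dominate one measured in single digits (temperature delta) in many distance-based models.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: nothing meaningful to scale
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Two hypothetical sensor features on very different scales
pressure_kpa = [101.3, 250.0, 180.5, 95.0]
temp_delta_c = [0.2, 1.5, 0.9, 0.4]

scaled_pressure = min_max_scale(pressure_kpa)
scaled_temp = min_max_scale(temp_delta_c)
```

After scaling, both features occupy the same [0, 1] range, so neither dominates the model simply because of its units.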
There are many other challenges: managing model versions, managing data versions, reproducing models, and so on. Good ML skills are scarce, and frequent personnel changes in a team working on a complex implementation can be a nightmare. Machine learning is a constantly evolving process – systems and their features change at regular intervals, and those changes need to be incorporated into the machine-learning setup. When the time comes to tweak a system, teams often find that the earlier models, features, and datasets were not documented properly and that the team which implemented the system has moved on. In the absence of key personnel and documentation, maintaining the system becomes a nightmare for the current team.
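One lightweight way to address the versioning and reproducibility problem described above is to record model lineage alongside every training run. The sketch below shows one possible shape for such a record; the field names and model details are hypothetical, not any specific tool's schema, and real projects often rely on dedicated tools such as DVC or MLflow instead.

```python
import hashlib
import json
from datetime import datetime, timezone

def fingerprint(rows):
    """Return a stable SHA-256 hash of a training dataset (list of records)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def training_record(model_name, version, features, data_rows):
    """Bundle the metadata a later team would need to reproduce this model."""
    return {
        "model": model_name,
        "version": version,
        "features": features,                 # exact feature list used
        "data_sha256": fingerprint(data_rows),  # ties the model to its data
        "trained_at": datetime.now(timezone.utc).isoformat(),
    }

# Hypothetical example: an anomaly-detection model for machine vibration
record = training_record(
    "vibration-anomaly", "1.4.0",
    ["rms_velocity", "peak_acceleration"],
    [{"rms_velocity": 2.1, "peak_acceleration": 9.8}],
)
```

Because the dataset hash is deterministic, a future team can verify that they are retraining against the same data the original team used, even after the original members have moved on.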
The article was written for IIoT World by Anil Gupta, Co-founder of Magnos Technologies LLP. He has about 23 years of experience in the Connected Cars, Connected Devices, Embedded Software, Automotive Infotainment, Telematics, GIS, Energy, and Telecom domains.