From the convocation speech of
Padma Shri Prof. Sankar K. Pal
Former Director & Distinguished Scientist
Indian Statistical Institute, Kolkata, at the 14th convocation of NIT Calicut
“We are in the midst of what is popularly called the Information Revolution and are living in a so-called World of Knowledge, where a great volume of data is constantly being generated all around us.”
The theme of the address, “Data Science and Technologies: Challenges, Opportunities and National Relevance”, is a cutting-edge research area with enormous relevance to national development. It is also a “must-know” subject for any technologist, applied scientist or practitioner dealing with data, and is likely to interest the common man as well.
There are three kinds of spaces, namely physical/natural space, social space and data space; they lead to physical/natural science, social science and data science, respectively. Initially, only physical/natural space existed. To describe phenomena in the natural world, human society played its role and thereby created the social space. Ubiquitous digitization of both the natural world and human society has produced a huge amount of data and generated a new space, called data space, which led to data science, also known as data-driven science.
This is an interdisciplinary field of scientific methods, processes and systems for extracting knowledge or insights from data in various forms, whether structured or unstructured.
Data Science goes by different names, such as Data Analytics, Big Data Analytics, Business Analytics, Data Mining and Knowledge Discovery, and Applied Statistics.
The sectors from which large amounts of data are available include government, communication and media, manufacturing, banking, health-care providers, securities and investment services, education, transportation, insurance, resource industries and construction.
Depending on the sector, the data generated could be image, video, audio or text/numbers, which signifies its heterogeneous character. Other major sources of the growth of data space are social networks and mobile phones, besides ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, digital cameras, microphones, RFID (radio-frequency identification) readers and wireless sensor networks. While the use of social networking applications through both PCs and smartphones is increasing every day, the growth is more prominent in the case of the latter.
Data Science Research: Challenges and Issues
Data science research is primarily concerned with the acquisition, storage, retrieval, processing and mining of data, and finally its conversion into knowledge.
However, the large volume of data to be dealt with is highly complex in nature, and capturing, storing, processing and mining it are not straightforward tasks. Most of the information is heterogeneous, time-varying, redundant, biased, uncertain and imprecise.
Additional issues, beyond those faced by existing data mining and knowledge discovery processes, include usage, quality, context, streaming and scalability. Reasoning about, understanding and mining useful knowledge from these data is becoming a great challenge. It requires exceptional technologies that can process the data efficiently within a tolerable elapsed time and deliver accurate predictions of various kinds on agile platforms.
Accordingly, it demands a revolutionary change in research methodologies, technologies and tools. Note that although this is not necessarily restricted to Big Data, methods that scale to Big Data are of particular interest to data science.
Big Data Potentiality in India:
With the increase in social media usage and the adoption of information technology by sectors such as banking, financial services, insurance, retail and hospitality, Big Data has drawn the attention of Indian enterprises, and so has data analytics for business innovation.
Though the realization is there, expertise is still lacking. Several large enterprises are either starting or contemplating the use of Big Data analytics, whereas small and medium businesses are not there yet. Interestingly, analytics organizations in India (those that provide analytics and related services externally) have recently grown both in number and in size.
A significant application of Big Data analytics in India is that the Central and State Governments can leverage it to reform and implement various policies and government schemes from time to time. These include: analysis of periodically collected data on the delivery, outcomes and impact of education and health care initiatives at the primary, secondary and tertiary levels, for formulating education and health care policies respectively; deciding funding policies and tracking improvement and growth in a particular region under the Direct Benefit Transfer scheme; and monitoring citizen-related initiatives using AADHAAR information. Other prominent application areas in India include township planning, tax administration, river network optimization and unemployment analysis.