By Dr. Tannistha Maiti

The oil and gas (O&G) industry comprises some of the largest enterprises in the world. Its upstream, midstream, and downstream operations constantly generate large amounts of data, and the industry depends heavily on sophisticated technologies to extract new business insights from that data, e.g., to prevent equipment malfunctions and improve operational efficiency.

In recent years, the industry has become a technology trendsetter and is moving towards automation, so its dependence on artificial intelligence is increasing.

The oil and gas industry faces an unprecedented shift in its workforce as automation and artificial intelligence (AI) continue to transform the way companies operate. Now more than ever, oil and gas organizations are using technology to drive down production costs to improve margins as they fight prolonged drops in oil prices.

(a) The Oil & Gas industry is turning to innovation and technology in a bid to boost efficiency and buoy profits.
(b) The global spotlight on sustainable practices to reduce emissions and water consumption has led O&G companies to seek new ways to optimize upstream operations and eliminate practices that waste time and money.

The increasing dependence of O&G enterprises on deep learning and machine learning models requires a robust data pipeline. The upstream business alone generates voluminous datasets, typically reaching the petabyte (1 PB = 1024 TB) or even exabyte (1 EB = 1024 PB) scale. The complete dataset consists of approximately 40,000 files.
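To put these volumes in perspective, the short Python sketch below computes the average file size if a given total were spread evenly over the roughly 40,000 files quoted above. The even split is a simplifying assumption for illustration only; real surveys mix files of very different sizes. Units follow the 1024-based factors used in the text.

```python
# Illustrative arithmetic only: assumes the total volume is split evenly
# across files, which real surveys do not guarantee.
GB = 1024 ** 3          # bytes per gigabyte (1024-based, as in the text)
TB = 1024 * GB          # 1 TB = 1024 GB
PB = 1024 * TB          # 1 PB = 1024 TB
EB = 1024 * PB          # 1 EB = 1024 PB

N_FILES = 40_000        # approximate file count quoted in the text


def avg_file_size_gb(total_bytes: int, n_files: int = N_FILES) -> float:
    """Average file size in GB if total_bytes were divided evenly over n_files."""
    return total_bytes / n_files / GB


if __name__ == "__main__":
    for label, total in (("1 PB", PB), ("1 EB", EB)):
        print(f"{label} over {N_FILES:,} files: ~{avg_file_size_gb(total):,.1f} GB per file")
```

Under this even-split assumption, 1 PB works out to roughly 26 GB per file and 1 EB to roughly 26,800 GB (about 26 TB) per file, a scale at which reading data in chunks, rather than loading whole files at once, becomes attractive.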

MLOps refers to the standardization and streamlining of machine learning lifecycle management, that is, automating the lifecycle of machine learning models from data preparation and model building through production deployment and maintenance.
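As a rough illustration of what "automating the lifecycle" can mean in practice, the Python sketch below chains hypothetical stages (`prepare`, `train`, `evaluate`) into a single run and records a SHA-256 fingerprint of each stage's output, so that two runs on the same input can be checked for reproducibility. The `Pipeline` class, the stage names, and the toy mean/standard-deviation "model" are all assumptions made for this example, not part of any specific MLOps tool.

```python
import hashlib
import json
import statistics
from dataclasses import dataclass, field
from typing import Any, Callable

Stage = Callable[[dict], dict]


@dataclass
class Pipeline:
    """Hypothetical minimal runner: executes stages in order and logs a
    fingerprint of every intermediate state for reproducibility checks."""
    stages: list[tuple[str, Stage]]
    log: list[dict[str, Any]] = field(default_factory=list)

    def run(self, state: dict) -> dict:
        self.log = []
        for name, stage in self.stages:
            state = stage(state)
            payload = json.dumps(state, sort_keys=True).encode()
            self.log.append({"stage": name, "sha256": hashlib.sha256(payload).hexdigest()[:12]})
        return state


# --- Hypothetical stages (toy stand-ins for real data prep and training) ---
def prepare(state: dict) -> dict:
    """Data preparation: drop missing readings."""
    return {**state, "clean": [float(v) for v in state["raw"] if v is not None]}


def train(state: dict) -> dict:
    """Model building: fit a toy model (mean and population std. dev.)."""
    vals = state["clean"]
    return {**state, "model": {"mean": statistics.fmean(vals), "std": statistics.pstdev(vals)}}


def evaluate(state: dict) -> dict:
    """Evaluation: count readings more than 2 std. devs. from the mean."""
    m = state["model"]
    flagged = [v for v in state["clean"] if abs(v - m["mean"]) > 2 * m["std"]]
    return {**state, "n_flagged": len(flagged)}


if __name__ == "__main__":
    pipe = Pipeline([("prepare", prepare), ("train", train), ("evaluate", evaluate)])
    result = pipe.run({"raw": [10, 10, None, 10, 10, 10, 10, 10, 10, 10, 50]})
    print(result["n_flagged"], pipe.log)
```

Because every stage is a plain function of its input, rerunning the pipeline on the same data should reproduce the same fingerprints; a mismatch signals that a stage or its input changed.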

The MLOps pipeline can be broadly divided into four tasks:
  • Data Collection: Seismic sections are noisy, semi-structured datasets of seismic events collected from a specific region. The time-section data are stored in SEG-Y format, and detailed ETL (extract, transform, load) pipelines convert them from SEG-Y into NumPy arrays.
  • Data Aggregation: This step combines data engineering and data science knowledge, with the goal of ensuring the quality, security, and integrity of the data.
  • Model Training: Deep learning, an increasingly popular subset of machine learning, builds models from neural networks, which are then trained on the prepared data.
  • Deployment: Once the model is trained, its results need to be evaluated, and it is important to understand what happens when the model is deployed. Once deployed, the model is accessed through a RESTful API.
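The Data Collection and Data Aggregation steps above can be sketched in Python with NumPy. The reader below is deliberately minimal and rests on stated assumptions: SEG-Y revision 1 layout, big-endian IEEE floating-point samples (format code 5), fixed-length traces, and no extended textual headers; trace-header contents are ignored. Many real files use IBM floating point (format code 1) instead, so in practice a dedicated library such as segyio is the safer choice. The QC rules (dropping non-finite or all-zero traces, then RMS-normalizing the rest) are illustrative choices, not the checks prescribed here, and the helper `write_segy_ieee` exists only to create a synthetic test file.

```python
import os
import struct
import tempfile

import numpy as np

# SEG-Y rev 1 layout (big-endian): 3200-byte textual header, 400-byte binary
# header, then per trace a 240-byte trace header followed by the samples.
TEXT_HDR, BIN_HDR, TRACE_HDR = 3200, 400, 240
DATA_START = TEXT_HDR + BIN_HDR


def read_segy_ieee(path: str) -> tuple[np.ndarray, int]:
    """Extract: read fixed-length IEEE-float (format code 5) traces into a
    (n_traces, n_samples) float32 array. Assumes no extended textual headers."""
    with open(path, "rb") as f:
        buf = f.read()
    # Binary-header fields at SEG-Y byte positions 3217, 3221 and 3225 (1-based).
    interval_us, = struct.unpack_from(">H", buf, 3216)
    n_samples, = struct.unpack_from(">H", buf, 3220)
    fmt_code, = struct.unpack_from(">h", buf, 3224)
    if fmt_code != 5:
        raise ValueError(f"sketch only handles IEEE float (code 5), got {fmt_code}")
    trace = np.dtype([("header", f"V{TRACE_HDR}"), ("data", ">f4", (n_samples,))])
    if (len(buf) - DATA_START) % trace.itemsize:
        raise ValueError("file size is inconsistent with fixed-length traces")
    traces = np.frombuffer(buf, dtype=trace, offset=DATA_START)
    # Transform: convert big-endian samples to a native float32 array.
    return traces["data"].astype(np.float32), interval_us


def quality_control(traces: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Drop traces with NaN/inf or all-zero ("dead") samples, then scale each
    remaining trace to unit RMS amplitude."""
    keep = np.isfinite(traces).all(axis=1) & (traces != 0).any(axis=1)
    clean = traces[keep]
    rms = np.sqrt(np.mean(clean ** 2, axis=1, keepdims=True))
    return clean / rms, keep


def write_segy_ieee(path: str, traces: np.ndarray, interval_us: int = 4000) -> None:
    """Write a minimal synthetic SEG-Y file (for testing the reader only)."""
    data = np.asarray(traces, dtype=">f4")
    binary = bytearray(BIN_HDR)
    struct.pack_into(">H", binary, 16, interval_us)      # byte 3217: sample interval
    struct.pack_into(">H", binary, 20, data.shape[1])    # byte 3221: samples per trace
    struct.pack_into(">h", binary, 24, 5)                # byte 3225: IEEE float
    with open(path, "wb") as f:
        f.write(b" " * TEXT_HDR)  # real files usually carry EBCDIC text here
        f.write(bytes(binary))
        for tr in data:
            f.write(bytes(TRACE_HDR))  # empty 240-byte trace header
            f.write(tr.tobytes())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    synthetic = rng.normal(size=(4, 100))
    synthetic[2] = 0.0  # simulate a dead trace
    path = os.path.join(tempfile.mkdtemp(), "demo.sgy")
    write_segy_ieee(path, synthetic)
    traces, dt_us = read_segy_ieee(path)
    clean, keep = quality_control(traces)
    print(traces.shape, dt_us, clean.shape, keep)
```

Reading the whole file into memory keeps the sketch short; for files in the tens of gigabytes, `np.memmap` with the same structured dtype would avoid loading everything at once.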
A proposed MLOps pipeline for the O&G industry from PETAI

To learn more about PETAI's research on AI and MLOps, please visit PETAI.

About the Author:

Dr. Tannistha Maiti holds a Ph.D. in Geophysics and Seismology from the University of Calgary. She has worked on topics ranging from Earth science to computation and mathematical modeling, and she is passionate about machine learning applications in the energy and healthcare sectors. She worked extensively in mathematical modeling and computational geophysics during her Ph.D. at the University of Calgary and her MS studies at Virginia Tech. She also holds an undergraduate degree from the prestigious IIT Kharagpur, India.

Over a research career spanning about 10 years, she has been involved in various projects and has published more than 20 peer-reviewed papers in conferences and journals. She also has extensive teaching experience and is passionate about sharing knowledge with peers and newcomers. During her Ph.D. studies, she received various accolades for her work. She participates in various research projects and supervises several deep learning internships within deepkapha.ai. In her spare time, she enjoys writing both technical and non-technical blog posts.