Database-driven applications span a wide range of business domains. Data is the prime element of any business, be it corporate, government, sports, entertainment, retail or trade. Database technology has been around for decades, but as the use of technology in business and society has grown, it has evolved tremendously to support changing needs.
The industry has witnessed the evolution from flat-file databases to DBMS (Database Management Systems) to RDBMS (Relational Database Management Systems). RDBMSs are widely used database technologies, mainly in business applications such as finance, production and retail.
Business applications need a few must-have properties from their databases: atomicity, consistency, isolation and durability.
These capabilities are called the ACID properties. Most relational databases provide them.
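To make atomicity concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `make_bank` and `transfer` helpers and the sample balances are all made up for illustration. The transfer credits one account and then debits the other; if the debit would push a balance below zero, the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short,
            # which also undoes the credit applied just above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "alice", "bob", 30), balances(bank))
    print(transfer(bank, "alice", "bob", 500), balances(bank))
```

The second transfer fails, and the balances stay exactly as they were after the first one. That "all or nothing" behaviour is what the A in ACID promises.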
As data sizes and the need to analyze data grew, RDBMSs provided efficient tuning and data extraction capabilities. However, the rise of the Internet of Things and the increased use of internet applications posed a new set of challenges for data architects, the most crucial being performance, scalability and high availability. One study reports that the number of things connected to the internet has already exceeded the number of people on earth, and the ratio keeps growing rapidly.
Conventional database systems could not address the challenges of handling big data, performance and scalability, for example the transactional processing of data from sources like social networks and online shopping. Applications built on an RDBMS also tend to be brittle and hard to manage or enhance. The rigidly structured nature of conventional database systems became a bottleneck in such scenarios, and a need arose for schema-less, non-relational databases that can hold unstructured data.
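To give a feel for what "schema-less" means, here is a small sketch in plain Python. It does not use any real database: the `postings` list and the `find` helper are hypothetical stand-ins for a document collection and its query call. The point is that each record carries its own fields, so adding a new field to new records needs no table-wide schema change.

```python
def find(collection, **criteria):
    """Return the documents whose fields match every given criterion.

    A document that lacks a field simply does not match on it.
    """
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Three documents in one collection, each with a different set of fields.
postings = [
    {"id": 1, "title": "Bike for sale", "city": "Austin"},
    {"id": 2, "title": "Room to rent", "city": "Austin", "price": 700},
    {"id": 3, "title": "Guitar lessons", "city": "Boston", "tags": ["music"]},
]

if __name__ == "__main__":
    print(find(postings, city="Austin"))
    print(find(postings, price=700))
```

In a relational table, the `price` and `tags` fields would each have required a schema change before the first such row could be stored.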
The case of Craigslist illustrates where NoSQL can have an edge over RDBMS database systems. Craigslist users publish about 1.5 million posts daily, and legal compliance requirements mean that no data can be deleted. The data therefore had to be archived periodically, for two reasons. The first is to offload dead postings from the online system so that they do not cripple its performance. The second is that the data, though dead, must be retained to comply with legal requirements. Until 2011, the archive lived in MySQL clusters. Because of the size of the data, it spanned over a hundred MySQL servers, which made it very complex. The other challenge of such a huge data set was that any change to the data structure became a long and expensive effort, so adding features became even more complex. Every schema change in the live system needed a corresponding change in the archive database. On top of that, the archive process had to be stopped while these changes were in progress, which put extra load on the live database and hurt the performance of the live site.
Craigslist chose a NoSQL database system, MongoDB, as its archiving system. The dead postings and their metadata were archived in MongoDB's GridFS and collections, respectively. The process was fast: about 1.5 billion postings were archived in only about three months. The biggest advantage of MongoDB here is that it is distributed. Because the data is spread across multiple shards, the load is shared, which improves performance.
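The idea of spreading data across shards can be sketched in a few lines of Python. This is a toy illustration of hash-based routing, not MongoDB's actual implementation and not a description of Craigslist's setup. The `ShardedArchive` class, the `shard_for` function and the posting IDs are all hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedArchive:
    """A toy archive that keeps each document on exactly one shard."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate server.
        self.shards = [{} for _ in range(num_shards)]

    def put(self, posting_id, document):
        index = shard_for(posting_id, len(self.shards))
        self.shards[index][posting_id] = document

    def get(self, posting_id):
        # Only the shard that owns the key is consulted.
        index = shard_for(posting_id, len(self.shards))
        return self.shards[index].get(posting_id)

    def sizes(self):
        return [len(shard) for shard in self.shards]


if __name__ == "__main__":
    archive = ShardedArchive(num_shards=4)
    for n in range(1000):
        archive.put(f"posting-{n}", {"title": f"Posting number {n}"})
    print(archive.sizes())
    print(archive.get("posting-42"))
```

Because every read and write touches only one shard, adding shards spreads both the storage and the request load. Note that this naive modulo scheme would move most keys if the shard count changed, which real systems go to some length to avoid.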
As the Internet of Things (IoT) becomes popular, the need for data transmission between machines has increased. A crucial part of the IoT story is M2M (machine-to-machine) communication, where NoSQL databases play a vital role. But let me give you an example of M2M technology first. A biometric device captures a fingerprint at an entry point. This data is transmitted over the network to an application server, where it is converted into a meaningful form. It is then made available as analysis or reports for business use.
M2M technologies provide solutions in healthcare, security and surveillance systems, fraud analytics and more. Here the data is huge and needs high-performance processing to be usable, and NoSQL data systems are well suited to it. For example, NoSQL systems can process massive data sets and run complex data mining algorithms to offer advanced healthcare analytics that personalize medicine prescription; handle transactional processing for real-time applications like patient monitoring; and mine unstructured data, for instance to make medical literature available to practitioners more efficiently and to uncover patterns in text, images, audio and video.
In large-volume data storage, M2M (machine-to-machine) embedded information retrieval and large-scale data processing applications, the need for strict consistency is minimal and support for complex transactions is often not required. Instead, performance, disaster recovery, maintainability and high availability are the highly desirable features.
Many NoSQL databases have loosened the consistency requirement. Rather than requiring consistency after every transaction, they accept that the database will eventually reach a consistent state. Consistency is traded for high availability and horizontal scalability. Ease of scaling out horizontally is another highlighted feature of NoSQL databases: you can increase or decrease capacity as you go.
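Eventual consistency is easier to see in a small simulation. The sketch below is a simplified model, not the protocol of any particular NoSQL product. The `Replica` class, the `sync` function and the last-write-wins rule based on a version number are assumptions chosen for illustration.

```python
class Replica:
    """One copy of the data; each key stores a (version, value) pair."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Last write wins: keep the entry with the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(replicas):
    """One synchronization round: every replica learns the newest entries."""
    newest = {}
    for replica in replicas:
        for key, (version, value) in replica.data.items():
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, value)
    for replica in replicas:
        for key, (version, value) in newest.items():
            replica.write(key, value, version)


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    a.write("status", "door open", version=1)
    print(b.read("status"))  # b has not seen the write yet
    sync([a, b])
    print(b.read("status"))  # after the round, both replicas agree
```

Between the write and the synchronization round, a reader hitting replica `b` gets stale data. That window is the price paid for letting each replica accept writes without waiting for the others.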
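NoSQL databases are commonly classified into four categories: key-value stores, document stores, column-family (wide-column) stores and graph databases.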
NoSQL is growing very fast and is increasingly a strong match for various data-oriented use cases, including the Internet of Things and M2M technologies, thanks to characteristics such as support for unstructured data, schema-free design, easy replication, simple APIs, support for huge data volumes and BASE semantics (Basically Available, Soft state, Eventual consistency). So if you need high performance, high availability, scalability and the ability to handle huge data in applications like M2M, large-scale data processing and analytics, NoSQL is an option you should be looking at.