To start, you should know that Azure Cosmos DB replicates all the data we store in the database. It creates four replicas by default and keeps adding replicas as the data grows. The more replicas the database maintains, the higher the availability of the data.
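To see why more replicas mean higher availability, here is a small back-of-the-envelope illustration (not part of Cosmos DB itself): if each replica is independently up with probability p, the chance that at least one replica can serve a read is 1 − (1 − p)^n.

```csharp
using System;

class ReplicaAvailability
{
    static void Main()
    {
        // Illustrative assumption: each replica is independently
        // available 99% of the time (p = 0.99).
        double p = 0.99;
        for (int replicas = 1; replicas <= 4; replicas++)
        {
            // P(at least one replica is up) = 1 - (1 - p)^n
            double availability = 1 - Math.Pow(1 - p, replicas);
            Console.WriteLine($"{replicas} replica(s): {availability:P6}");
        }
    }
}
```

Even with these made-up numbers, four replicas push availability far beyond what a single copy can offer.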
Now that we are clear on the concept of replicas in Cosmos DB, let us look at global distribution: in the Azure portal, we open the map, click the location we want to replicate the data to, and save. This replicates all the data in the local Azure Cosmos DB account to that location, subject to any geo-fencing policies in force (some countries do not allow data to be replicated outside their own borders).
If a particular location is not available, you can dynamically add or remove regions at any time. There is also a failover property: you set one desired Azure Cosmos DB region for the data you want to store, and if that region becomes unavailable, another region is dynamically used in its place.
When data is accessed from the same Azure data center where it is stored, queries run and return results very quickly, because that is where all the data lives. If you run the same queries from a distant location, however, results arrive much more slowly. To solve this problem, we replicate the data globally as follows:
After clicking the Replicate data globally option, we choose the read regions and the write regions and click Save. Multiple regions can be selected as read and write regions. This replication process takes quite a long time, so expect a wait.
After setting this up, we also need to configure the order of regions and the failover properties. Next we establish a connection policy that tells the client which region to try first when reading data. To do that, we write a small piece of code in our .cs file as follows:
var connectionPolicy = new ConnectionPolicy();
// Region names are examples; list them in the order the client should try them.
connectionPolicy.PreferredLocations.Add("West US");
connectionPolicy.PreferredLocations.Add("North Europe");
To reference this policy with the client, we use:
var client = new DocumentClient(new Uri(endpoint), masterKey, connectionPolicy);
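With the client configured, reads are routed to the first available region in PreferredLocations. A hedged sketch using the SDK v2 API (the database, collection, and document ids here are placeholders, not names from this article):

```csharp
// Assumes a database "mydb", a collection "mycoll", and a document
// with id "item1" already exist; all three names are placeholders.
var docUri = UriFactory.CreateDocumentUri("mydb", "mycoll", "item1");
var response = await client.ReadDocumentAsync(docUri);
Console.WriteLine(response.Resource);
```

If the first preferred region is down, the SDK transparently falls back to the next one in the list.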
After having done this, we need to acquaint ourselves with another concept called Dirty Read.
What is Dirty Read?
We know that data is replicated four times by default within one Azure data center, so we need to make sure that the data we read at any given point in time is the latest version. It is entirely possible that the replica we happen to read from does not yet hold the latest version of the original data.
So, reading an outdated version of the data from one of the replicas is called a Dirty Read.
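The effect is easy to picture with a toy model (purely illustrative, not SDK code): a write lands on one replica first, and a read served by a replica that has not yet caught up returns the old value.

```csharp
using System;
using System.Collections.Generic;

class DirtyReadDemo
{
    static void Main()
    {
        // Four replicas, all starting with version 1 of the data.
        var replicas = new List<string> { "v1", "v1", "v1", "v1" };

        // A write updates one replica first; the others lag behind
        // until replication catches up.
        replicas[0] = "v2";

        // A read served by replica 2 before it is updated sees the
        // stale value "v1" instead of the latest "v2": a dirty read.
        Console.WriteLine(replicas[2]);
    }
}
```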
So basically, the data needs to be consistent across all the replicas we hold of it. Azure Cosmos DB provides five consistency levels, ranging from Strong to Eventual: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual.
Strong here refers to:
– Higher Latency
– Strong Consistency
– Lower Availability
Eventual, at the other end, refers to:
– Low Latency
– Weaker Consistency
– Higher Availability
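A client can request a consistency level when it connects. A hedged sketch with the SDK v2 (endpoint and key are placeholders); note that a client may use the account's default level or a weaker one, never a stronger one:

```csharp
// Request Session consistency for everything this client does.
var client = new DocumentClient(
    new Uri(endpoint),   // placeholder: your account endpoint
    masterKey,           // placeholder: your account key
    connectionPolicy,
    ConsistencyLevel.Session);
```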
With Strong consistency, a Dirty Read is guaranteed not to happen, but the trade-off is that data is available for reading only once all the replicas have been updated, resulting in increased wait time.
With Eventual consistency, there is no guarantee against Dirty Reads, but there is no waiting time either.
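Consistency can also be relaxed per request: a single read can opt into Eventual even if the client default is stronger, trading the chance of a dirty read for lower latency. A hedged sketch (the ids are placeholders):

```csharp
var docUri = UriFactory.CreateDocumentUri("mydb", "mycoll", "item1");
// Ask for Eventual consistency on this one read only.
var response = await client.ReadDocumentAsync(
    docUri,
    new RequestOptions { ConsistencyLevel = ConsistencyLevel.Eventual });
```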
However, consistency and Dirty Reads mainly come into the picture once we start geo-replicating the data, because within a single Azure data center data moves so quickly that all the replicas are updated almost immediately, greatly reducing the chance of a Dirty Read.