Science & Technology

What is sharding? Why is it important for database?

Sharding is horizontal(row wise) database partitioning as opposed to vertical(column wise) partitioning which is Normalization. It separates very large databases into smaller, faster and more easily managed parts called data shards. It is a mechanism to achieve distributed systems.

Why do we need distributed systems?

MacBook Pro with images of computer language codes
  • Increased availablity.
  • Easier expansion.
  • Economics: It costs less to create a network of smaller computers with the power of single large computer.

You can read more about it here Distributed database — Tutorialspoint.

Also Read: Python Basics

How sharding help achieve distributed systems?

You can partition a search index into N partitions and load each index on a separate server. If you query one server, you will get 1/Nth of the results. So to get complete result set, a typical distributed search system use an aggregator that will accumulate results from each server and combine them. An aggregator also distribute each query onto each server. This aggregator program is called MapReduce in big data terminology. In other words, Distributed Systems = Sharding + MapReduce (Although there are other things too).

A visual representation below:

database