Data deduplication has been around, at least in its most primitive form, since the 1970s. It began because companies wanted to store large amounts of customer contact information without using a large amount of storage space. One of the first ideas was to go through the records and remove duplicate data. For example, a company might hold both a shipping address and a billing address for a given customer; where those addresses were identical, they would be combined into a single record. This was done by data entry clerks who reviewed the data line by line and removed the duplicates.
Of course, the number of staff needed to do this was large, and the work took a very long time. Sometimes the deduplication process would take months to complete. Since most of it was carried out on hard copy, however, this was not a major problem. The real problems arrived when computer use became widespread in office environments.
With computers in wide use and the rise of the internet, the amount of data exploded as well. Backup systems were created to ensure that companies would not lose all their data. As time went by, floppy disks and other external media were used to store it. Unfortunately, this data would quickly fill those disks, and the space required to hold it all became extensive.
With cloud storage and other alternative storage options, companies began moving their storage to a virtual environment. They also moved from tape-based to disk-based storage, simply because it was low-cost and required less space. However, as data volumes grew, even these options became expensive and difficult to manage. The same data would get saved over and over again, and this redundant data served no purpose while taking up valuable storage space.
Companies might have customised their backup plans to eliminate duplication, but there was no fast way to do this. That is when IT professionals began working on algorithms to automate the data deduplication process. They generally did this on a case-by-case basis, with the goal of optimising their own backup files, so their algorithms were customised to meet their own needs.
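The core idea behind such algorithms is simple: fingerprint each record (typically with a hash function) and keep only the first occurrence of each fingerprint. A minimal sketch in Python, using file- or record-level hashing (the sample addresses below are invented for illustration):

```python
import hashlib

def deduplicate(records):
    """Keep only the first occurrence of each identical record."""
    seen = set()
    unique = []
    for record in records:
        # Hash the record so comparisons are cheap even for large records.
        digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique

# Hypothetical example: identical shipping and billing addresses
# collapse to a single stored copy.
addresses = [
    "221B Baker Street, London",   # shipping address
    "221B Baker Street, London",   # billing address (duplicate)
    "10 Downing Street, London",
]
print(deduplicate(addresses))
# → ['221B Baker Street, London', '10 Downing Street, London']
```

Production systems refine this idea by hashing fixed-size or content-defined chunks of data rather than whole records, so that partial duplicates are also detected, but the hash-and-compare principle is the same.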
No single company came up with the idea of data deduplication. Rather, the need to reduce duplicate files was common across the industry. Many computer scientists advanced deduplication technology significantly, but no one scientist was solely responsible for it. While many have claimed credit for coining the term 'data deduplication', no one person can claim credit for the idea.
Instead, the creation of data deduplication algorithms was a collective effort. People in the IT industry saw a need to reduce duplicate data, and they filled that need by creating algorithms to minimise duplicated files. As data volumes continue to grow, people will keep finding ways to compress data so that it is easy to store.