Data Gathering for Veeam NAS Backup

This post describes one possible method of calculating the change rate and the archive rate of a given filesystem.

A problem that a lot of people have is calculating the change rate of a filesystem so that they can accurately estimate the repository storage needed for Veeam. The same data can also help with estimating what can be archived.
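To make the term concrete, here is a minimal sketch of one common way to express change rate: the capacity created or modified per day, as a fraction of the total source size. The function name and the figures are made up for illustration; they are not taken from Veeam or from any real environment.

```python
def daily_change_rate(changed_bytes: int, total_bytes: int, days: float) -> float:
    """Return the average daily change rate as a fraction of the source size."""
    if total_bytes <= 0 or days <= 0:
        raise ValueError("total_bytes and days must be positive")
    return changed_bytes / days / total_bytes


# Hypothetical figures: 350 GiB created or modified over 7 days on a 10 TiB share.
GIB = 1024 ** 3
rate = daily_change_rate(350 * GIB, 10 * 1024 * GIB, days=7)
print(f"{rate:.2%} per day")  # prints "0.49% per day"
```

The longer the monitoring window, the less a single unusually busy or quiet day skews the average.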

In my search for a method of watching a filesystem using code, I found a NirSoft program that already does what I needed.

https://www.nirsoft.net/utils/folder_changes_view.html

Folder Changes View (FCV) monitors a filesystem and shows the changes in real time, including modifications, created files and deletions. Unfortunately, deletions are reported without the size of the deleted file, but I guess we can’t have it all.

Note that I did look at a Python package called ‘Watchdog’, which was promising but was going to be a lot more work.

The advantage of doing it this way, as opposed to scanning a whole filesystem, is that it doesn’t put the pressure of a full scan on the filesystem and still gets relevant data. There is still a place for a full file analysis, for example file type breakdowns; however, this is a lighter-weight option.

Important Note: Though I am confident in the FCV application, you can’t be too careful. I recommend that you run it on a dedicated VM and that the user account has Read Only permissions.

FCV can export an HTML file which holds a table. This is great when it comes to Python, as we can grab that table very easily using Pandas, which turns it into a dataframe that can be easily manipulated.
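As a rough sketch of that step, pandas.read_html returns one dataframe per table found in the HTML. The sample table and its column names below are assumptions standing in for a real FCV export, and read_html needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Stand-in for an FCV HTML export; the column names are assumptions.
sample_html = """
<table>
  <tr><th>Filename</th><th>Modified Count</th><th>Created Count</th><th>Deleted Count</th><th>File Size</th></tr>
  <tr><td>report.docx</td><td>3</td><td>0</td><td>0</td><td>20480</td></tr>
  <tr><td>scan.pdf</td><td>0</td><td>1</td><td>0</td><td>1048576</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> in the document.
tables = pd.read_html(StringIO(sample_html))
df = tables[0]

print(df.dtypes)
print(df)
```

With a real export you would pass the path of the HTML file to read_html instead of the StringIO object.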

However, we don’t really want to be sending potentially sensitive information over the internet, so we really need to anonymise the data as much as possible and save it in a secure state before sending.

I have written a program that will do all the work for us, which is up on my GitHub page.

https://github.com/shapedthought/file_html_report_processor

The program will do the following:

  • Import the HTML data
  • Hash the filenames
  • Remove the Path and File Owner information
  • Generate a random encryption key and save it to a file
  • Save the resulting data to a text file as an encrypted string
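The anonymising steps in the list above can be sketched with the standard library alone. This is not the code from the repository: the field names, the sample rows and the choice of SHA-256 are all assumptions for illustration, and the encryption step is left out because the cipher used is not stated here.

```python
import hashlib

# Hypothetical rows shaped like an FCV export; the field names are assumptions.
rows = [
    {"Filename": "report.docx", "Full Path": r"\\nas\share\finance",
     "File Owner": "CORP\\alice", "File Size": 20480},
    {"Filename": "scan.pdf", "Full Path": r"\\nas\share\hr",
     "File Owner": "CORP\\bob", "File Size": 1048576},
]

SENSITIVE_FIELDS = {"Full Path", "File Owner"}


def anonymise(row: dict) -> dict:
    """Hash the filename and drop the path and owner fields."""
    cleaned = {k: v for k, v in row.items() if k not in SENSITIVE_FIELDS}
    # SHA-256 is an assumed choice; the hex digest replaces the real name.
    cleaned["Filename"] = hashlib.sha256(row["Filename"].encode("utf-8")).hexdigest()
    return cleaned


anonymised = [anonymise(row) for row in rows]
for row in anonymised:
    print(row)
```

The sizes and counts survive untouched, which is all that a change rate calculation needs.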

The file containing the data can be sent securely via email, with the encryption key sent separately.

In the other direction, the application will use the encryption key to convert the data back to JSON, which can then be fed into Python Pandas to start the analysis.
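Here is a hedged sketch of what that first bit of analysis might look like once the JSON is decrypted. The records and field names are invented for illustration, and the real output of the program may be shaped differently.

```python
import json

import pandas as pd

# Hypothetical decrypted JSON; the real field names depend on the FCV export.
json_text = """[
  {"Filename": "a1f3", "Extension": "docx", "Modified Count": 3, "Created Count": 0, "File Size": 20480},
  {"Filename": "9bc2", "Extension": "pdf", "Modified Count": 0, "Created Count": 1, "File Size": 1048576},
  {"Filename": "77de", "Extension": "docx", "Modified Count": 1, "Created Count": 0, "File Size": 4096}
]"""

df = pd.DataFrame(json.loads(json_text))

# Files that were created or modified during the monitoring window.
changed = df[(df["Modified Count"] > 0) | (df["Created Count"] > 0)]
changed_bytes = int(changed["File Size"].sum())

# Changed capacity broken down by file extension.
by_extension = changed.groupby("Extension")["File Size"].sum()

print(changed_bytes)
print(by_extension)
```

Dividing changed_bytes by the length of the monitoring window and the total size of the share gives a daily change rate figure.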

Note: this program is provided under the MIT licence; please review it before using.