Performance
We take performance seriously, and it is always top of mind when we add new features to uMarketingSuite.
Within uMarketingSuite we have implemented several performance optimizations, and there is also some configuration you can do yourself to optimize the performance of your Umbraco solution with uMarketingSuite.
Separate processes for storing, parsing and reporting
As documented in the dataflow process, there are separate steps for collecting, storing, parsing and reporting the data. This separation is primarily done for performance reasons.
Data collection
Data collection happens in the memory of the web server (or web servers, if you have multiple web servers in a load-balanced setup). This has some impact on the available memory of the web server, but that impact is usually quite limited. If you want to use more or less memory, you can tweak the parameters 'FlushRateInRecords' and 'FlushIntervalInSeconds' in the configuration file.
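As an illustration, the sketch below shows how these two parameters could look in a JSON-based configuration file (such as appsettings.json on recent Umbraco versions). The section nesting and the example values are assumptions made for this sketch; only the parameter names 'FlushRateInRecords' and 'FlushIntervalInSeconds' come from the documentation above, so check the configuration reference of your version for the exact location.
{
  "uMarketingSuite": {
    "Settings": {
      // The exact location of these settings is an assumption in this sketch; the values are examples only.
      // A higher rate/interval keeps more records in memory before flushing them to the raw data tables;
      // a lower rate/interval frees memory sooner at the cost of more frequent database writes.
      "FlushRateInRecords": 100,
      "FlushIntervalInSeconds": 30
    }
  }
}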
Storing then flushes the data from memory to the database, so the memory is freed again and can be used for other data. At that moment the data is stored in the raw data tables.
On our production websites we see that the average data size per record in the table uMarketingSuiteAnalyticsRawPageView is 0.9 kB, so roughly 0.9 kB of raw data is stored per pageview.
For the table uMarketingSuiteAnalyticsRawClientSideData the size depends a bit on your implementation of the client-side events: if you track a lot of custom events, this table will probably be bigger. On average we see around 0.4 kB per visitor that can execute JavaScript (which is also why all bots are excluded).
You can find this data for your own database with the following SQL statement:
/***
* Copied and adapted for the raw data tables from https://support.managed.com/kb/a227/how-to-find-large-tables-in-sql-database.aspx
* Find the number of rows and the size of the tables
***/
CREATE TABLE #temp (
table_name sysname ,
row_count INT,
reserved_size VARCHAR(50),
data_size VARCHAR(50),
index_size VARCHAR(50),
unused_size VARCHAR(50))
SET NOCOUNT ON
INSERT #temp
EXEC sp_msforeachtable 'sp_spaceused ''?'''
SELECT a.table_name,
a.row_count,
COUNT(*) AS col_count,
a.data_size
FROM #temp a
INNER JOIN information_schema.columns b
ON a.table_name collate database_default = b.table_name collate database_default
OR REPLACE(REPLACE(a.table_name, '[dbo].[',''),']','') = b.table_name collate database_default
WHERE a.table_name LIKE 'uMarketingSuite%raw%'
GROUP BY a.table_name, a.row_count, a.data_size
ORDER BY CAST(REPLACE(a.data_size, ' KB', '') AS integer) DESC
DROP TABLE #temp
Data parsing
During data parsing the data is fetched from the raw data tables and stored in normalized tables. From that moment on the raw data could be deleted, because only the normalized data tables are needed. Data parsing is the heaviest step of the whole process, so this is where we put most of our performance TLC.
The data parsing process runs as a background job on the web server. In the configuration file you can specify how many records are fetched per parsing run and how often the process should run. It is also possible to specify which server(s) are responsible for the parsing process (see the 'Optimized infrastructure' section below).
On average we see that the amount of stored data is only 0.1 kB per visit, which is only about 10% of the original amount in the raw data tables.
The SQL script to determine this is:
/***
* Copied and adapted for the uMarketingSuite data tables from https://support.managed.com/kb/a227/how-to-find-large-tables-in-sql-database.aspx
* Find the number of rows and the size of tables
***/
CREATE TABLE #temp (
table_name sysname ,
row_count INT,
reserved_size VARCHAR(50),
data_size VARCHAR(50),
index_size VARCHAR(50),
unused_size VARCHAR(50))
SET NOCOUNT ON
INSERT #temp
EXEC sp_msforeachtable 'sp_spaceused ''?'''
SELECT SUM(a.row_count) as [total number of rows],
SUM(CAST(REPLACE(a.data_size, ' KB','') as integer)) as [total data size]
FROM #temp a
WHERE a.table_name LIKE 'uMarketingSuiteAnalytics%' AND NOT a.table_name LIKE 'uMarketingSuiteAnalytics%raw%'
SELECT a.table_name,
a.row_count,
COUNT(*) AS col_count,
a.data_size
FROM #temp a
INNER JOIN information_schema.columns b
ON a.table_name collate database_default = b.table_name collate database_default
OR REPLACE(REPLACE(a.table_name, '[dbo].[',''),']','') = b.table_name collate database_default
WHERE a.table_name LIKE 'uMarketingSuiteAnalytics%' AND NOT a.table_name LIKE 'uMarketingSuiteAnalytics%raw%'
GROUP BY a.table_name, a.row_count, a.data_size
ORDER BY CAST(REPLACE(a.data_size, ' KB', '') AS integer) DESC
DROP TABLE #temp
Show me more numbers
With the two SQL scripts above you can find out how much data is used by uMarketingSuite. If you want to see how long the processing step takes, you can run this script to see the processing speed (the average processing time in milliseconds per record, per day) for the raw pageviews:
SELECT AVG(ISNULL(DATEDIFF(ms, processingStarted, processingFinished), 0)) AS [avg processing time (ms)]
     , CAST(processingStarted AS DATE) AS [day]
FROM [uMarketingSuiteAnalyticsRawPageView]
WHERE processingStarted IS NOT NULL
  AND processingFinished IS NOT NULL
  AND processingFailed = 0
GROUP BY CAST(processingStarted AS DATE)
ORDER BY CAST(processingStarted AS DATE)
And for the raw client-side data:
SELECT AVG(ISNULL(DATEDIFF(ms, processingStarted, processingFinished), 0)) AS [avg processing time (ms)]
     , CAST(processingStarted AS DATE) AS [day]
FROM [uMarketingSuiteAnalyticsRawClientSideData]
WHERE processingStarted IS NOT NULL
  AND processingFinished IS NOT NULL
  AND processingFailed = 0
GROUP BY CAST(processingStarted AS DATE)
ORDER BY CAST(processingStarted AS DATE)
We try to keep the average parsing speed under 100 ms. Please let us know if you see anything different.
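If you want to spot the days on which parsing is slower than that target, the following variation on the script above is a minimal sketch (using the same columns and the 100 ms target mentioned above) that lists only the days where the average processing time for raw pageviews exceeds 100 ms:
SELECT CAST(processingStarted AS DATE) AS [day]
     , AVG(ISNULL(DATEDIFF(ms, processingStarted, processingFinished), 0)) AS [avg processing time (ms)]
FROM [uMarketingSuiteAnalyticsRawPageView]
WHERE processingStarted IS NOT NULL
  AND processingFinished IS NOT NULL
  AND processingFailed = 0
GROUP BY CAST(processingStarted AS DATE)
-- Only keep the days where the daily average exceeds the 100 ms target
HAVING AVG(ISNULL(DATEDIFF(ms, processingStarted, processingFinished), 0)) > 100
ORDER BY [avg processing time (ms)] DESC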
Optimized infrastructure
You can also optimize your server infrastructure to improve performance. There are a few options you could apply:
- You could set up more web servers in a load-balanced setup. Each of these web servers will collect visitor data, but you can specify in the configuration which web server(s) are responsible for parsing the data. You could also set up one specific server that only parses the data; in that case the parsing process has almost no impact on the performance of the other web servers. To set this up, set the parameter 'IsProcessingServer' to 'false' in the configuration file of all servers that should not process the data, and set it to 'true' on the server(s) responsible for parsing (see the sketch after this list). If no server has this setting set to 'true', the raw data collection of uMarketingSuite will still take place, but the data will never be processed.
- By default uMarketingSuite stores its data in the same database as Umbraco, using Umbraco's default connection string (named 'umbracoDbDSN'). It is possible to specify a separate database for all uMarketingSuite data. This could be another database on the same server, but also a database on another database server. To do this, you need to add a new connection string to your application and give that connection string a name. In the configuration file you can then specify this name in the field 'DatabaseConnectionStringName', as also shown in the sketch below.
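As an illustration, the sketch below combines both options in a JSON-based configuration file. The connection string name 'uMarketingSuiteDb', the connection string values and the section nesting are assumptions made for this sketch; only the parameter names 'IsProcessingServer' and 'DatabaseConnectionStringName' come from the documentation above, so check the configuration reference of your version for the exact location of these settings.
{
  "ConnectionStrings": {
    // Umbraco's default connection string; used by uMarketingSuite when no separate one is configured.
    "umbracoDbDSN": "Server=.;Database=Umbraco;Integrated Security=true;",
    // Hypothetical extra connection string for a dedicated uMarketingSuite database.
    "uMarketingSuiteDb": "Server=analytics-sql;Database=uMarketingSuiteData;Integrated Security=true;"
  },
  "uMarketingSuite": {
    "Settings": {
      // Set to true only on the server(s) that should run the parsing job; false on all other web servers.
      "IsProcessingServer": true,
      // Name of the connection string to use for uMarketingSuite data; omit it to fall back to umbracoDbDSN.
      "DatabaseConnectionStringName": "uMarketingSuiteDb"
    }
  }
}
In a load-balanced setup you would deploy this configuration with 'IsProcessingServer' set to 'true' on the dedicated parsing server and to 'false' on all other web servers.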