Posted at 4:49 PM. Post ID: 13845


A Data Warehousing solution using SQL Server


I have been researching a system for deploying a database cluster and have chosen Microsoft SQL Server. This product supports failover clustering: nodes are clustered together on a network so that operation continues automatically if one node fails. A virtual network of database nodes spread across a network is sufficient to support a data warehouse. The SQL Server installer detects that clustered servers are present on the network and prompts you to add the server as a node. Each server runs as a virtualized instance, and the data itself is stored on a shared SAN/NAS disk used by all the virtual SQL Server nodes. Because every server is virtualized, the data is not lost if one goes down; a new instance can be installed and connected to the shared data store to resynchronize with the database. I have experimented with SQL Server 2012 and Windows Server 2012. Clustering works perfectly once the servers are joined together into a single network, but it requires the aforementioned drive as shared storage for all the SQL Server instances; a shared folder mapped to a drive letter does not work at all.
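In a real cluster the client connects to one virtual network name and failover is handled transparently, but the underlying idea can be sketched from the client side as "try the next node if this one is down". A minimal Python sketch, where the node names and `fake_connect` function are invented for the illustration:

```python
# Sketch of client-side failover across cluster nodes. A real SQL Server
# failover cluster exposes a single virtual network name instead; this
# just illustrates the retry idea with hypothetical node names.

def connect_with_failover(nodes, connect):
    """Try each node in order, returning the first live connection."""
    errors = {}
    for node in nodes:
        try:
            return node, connect(node)
        except ConnectionError as exc:
            errors[node] = exc  # node is down, try the next one
    raise ConnectionError(f"all nodes failed: {errors}")

def fake_connect(node):
    # Stand-in for a real database driver call; 'sqlnode1' is offline.
    if node == "sqlnode1":
        raise ConnectionError("node offline")
    return f"connection to {node}"

node, conn = connect_with_failover(["sqlnode1", "sqlnode2"], fake_connect)
print(node)  # sqlnode2 picked up after sqlnode1 failed
```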

My data warehousing solution will use Microsoft SQL Server 2017 Enterprise, which supports data warehousing as a feature. A data warehouse uses a database to handle the massive amounts of data generated by a business. It is intended to store data that will not change once entered: historical data kept for future reference. A data warehouse is typically not in third normal form the way a normal database would be. One main feature of a data warehouse is support for ad-hoc queries, which means the database must be constructed to accommodate them. Optimizing query performance is very important; usually a star schema is used to achieve this. Spreading the load across a cluster of SQL Servers is another important way to achieve a fast and useful data warehouse. One important operation in a SQL Server warehouse is to clean and process data before entering it into the database. This is typically done via a staging area, which processes the data and simplifies building summaries before the data ends up in the data warehouse. Metadata is also added, which makes the task of data mining very easy when a researcher connects to the database to find, for example, sales information.
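To illustrate the star schema mentioned above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (the table and column names are invented for the example): a central fact table of sales references small dimension tables, and an ad-hoc query joins them to summarise sales per product.

```python
import sqlite3

# Minimal star schema: one fact table referencing two dimension tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_date    VALUES (10, 2017), (11, 2018);
    INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 10, 80.0);
""")

# Ad-hoc query: total sales per product -- the kind of aggregate the
# denormalised star layout is designed to answer quickly.
rows = db.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)  # [('Gadget', 80.0), ('Widget', 250.0)]
```

In a real warehouse the fact table holds millions of rows, but the query shape stays the same: join the fact table to whichever dimensions the researcher cares about and aggregate.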

A data warehouse set up on SQL Server 2017 on a Hyper-V instance would require a balanced configuration that shares the load between the nodes in the database cluster. Because the load imposed on the cluster by a large number of users will vary, testing must be carried out to evaluate how it performs. Setting up the cluster and loading a set of sample data allows stress testing of the servers to determine what level of performance they provide and whether any tuning of the configuration is required. Testing the delivery of database data involves sample data and SQL queries that check for duplicate data and measure the speed of the connection. Another important test is to verify that data is loaded correctly into fields without being truncated, and that the correct field formats are used to avoid possible data corruption, e.g. the correct CHAR length. The correct data length and format are essential to match the data that will be stored in each table; this avoids loss of data due to an incorrect database schema.
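The duplicate and truncation checks described above can be run against a batch before it ever reaches the warehouse. A small Python sketch (the field width and sample rows are invented for the example):

```python
# Sketch of pre-load data-quality checks: find duplicate rows, and find
# values that a fixed-width CHAR(n) column would silently truncate.

def find_duplicates(rows):
    """Return each row that appears more than once in the batch."""
    seen, dupes = set(), []
    for row in rows:
        if row in seen:
            dupes.append(row)
        seen.add(row)
    return dupes

def find_truncated(rows, field_index, max_len):
    """Return rows whose field would not fit a CHAR(max_len) column."""
    return [row for row in rows if len(row[field_index]) > max_len]

batch = [
    ("C001", "Acme Corporation"),
    ("C002", "Globex"),
    ("C001", "Acme Corporation"),  # duplicate row
    ("C003", "A very long customer name indeed"),
]

print(find_duplicates(batch))        # the repeated C001 row
print(find_truncated(batch, 1, 20))  # the name too long for CHAR(20)
```

Rows flagged by either check would be fixed or rejected in the staging area rather than loaded into the warehouse.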

Of course, PostgreSQL on Linux would also work, but I did a design for SQL Server on Windows a while ago and the Windows product came up with the goods when it came to redundant data warehousing. MySQL is not well suited to this type of work; PostgreSQL and Python would be a better option, but that would be a lot of work. The main thing is a redundant server architecture with many shared components, to avoid data loss due to an accident.

