Big Data Analytics Assignment

Posted: January 10th, 2023

The Konstanz Information Miner (KNIME) Analytics Platform is a free, open-source data mining and analysis tool that is useful for healthcare data analysis. It gives users access, free of charge, to a continuously evolving and improving tool for real-time data mining and analysis (KNIME AG, n.d.). In data analysis circles, KNIME is regarded as a leading open-source data mining tool (Mazanetz et al., 2012). The available commercial alternatives include Insightful Miner, InforSense KDE, and Pipeline Pilot, amongst others (Berthold et al., 2009). This paper examines the KNIME Analytics Platform, its uses, its benefits compared with the commercial alternatives, and the four “Vs” of big data analytics in healthcare.


How the KNIME Analytics Platform Can be Used to Analyze Data in Healthcare

KNIME (the Konstanz Information Miner) supports the visual assembly of data workflows drawn from its extensive repository of data analysis tools (Mazanetz et al., 2012). Its user-friendly functionality makes it well suited to analyzing healthcare data. To begin with, you can use it to build visual healthcare data workflows. You can then blend tools from diverse domains in the healthcare realm. The platform gives you a choice of hundreds of freely available workflows from the KNIME Hub to work with. If you wish, however, you can also create your own workflow with the help of KNIME's integrated workflow coach. You can combine healthcare data coming from any source by:

  • Opening and combining texts from formats like PDF, XML and so on.
  • Connecting to a variety of healthcare data warehouses and databases and then combining healthcare data from sources like Microsoft SQL Server, Apache, Oracle, and so on.
  • Accessing and retrieving healthcare data from sources such as Google Sheets, Twitter, and so on (KNIME AG, n.d.).
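
The blending step listed above is done in KNIME with drag-and-drop reader and joiner nodes rather than with code. For readers who think in code, the same idea can be sketched in plain Python. This is only an illustration, not KNIME's API: the CSV text, the `patients` table, the column names, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for a remote data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: lab results exported as CSV text.
LAB_CSV = """patient_id,glucose_mg_dl
1,95
2,180
3,110
"""


def load_lab_results(csv_text):
    """Parse CSV text into a dict mapping patient_id -> glucose reading."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["patient_id"]): float(row["glucose_mg_dl"]) for row in reader}


def load_patients(conn):
    """Read patient ages from the (hypothetical) patients table."""
    rows = conn.execute("SELECT patient_id, age FROM patients ORDER BY patient_id")
    return {patient_id: age for patient_id, age in rows}


def blend(patients, labs):
    """Inner-join the two sources on patient_id."""
    return [
        {"patient_id": pid, "age": age, "glucose_mg_dl": labs[pid]}
        for pid, age in patients.items()
        if pid in labs
    ]


# Hypothetical source 2: an in-memory SQLite database standing in for a warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO patients VALUES (?, ?)", [(1, 54), (2, 61), (4, 47)])

combined = blend(load_patients(conn), load_lab_results(LAB_CSV))
for record in combined:
    print(record)
```

Only patients present in both sources survive the join, which is the behaviour one would expect from an inner join in a visual workflow as well.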

After doing this, you can shape your healthcare data as you see fit. This is possible by deriving statistical measures such as the mean and standard deviation, or by using statistical tests such as t-tests, chi-square tests, or analysis of variance (ANOVA) to test hypotheses. Correlation studies and linear regression are further statistical measures that can be integrated into your workflows. The KNIME Analytics Platform lets you aggregate, filter, sort, or combine healthcare data on your own computer or in a large healthcare database that is hosted remotely. You can clean the healthcare data by weeding out outliers and artefacts and by normalising the values. The tool also allows you, through built-in algorithms, to detect and correct anomalies and outlier values. You can extract built-in features or construct new tailor-made ones as you analyze healthcare data on the KNIME platform. These features help to prepare the healthcare data for machine learning. You can then analyze the healthcare data by applying formulas and other rules of statistical analysis (KNIME AG, n.d.).
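
The cleaning steps described above (summary statistics, outlier removal, and normalisation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not KNIME's built-in algorithm: the readings are made-up values, the 1.5 × IQR rule is one common outlier convention chosen here for simplicity, and both function names are hypothetical.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def min_max_normalise(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up systolic blood pressure readings; 300 is a likely data-entry error.
readings = [118, 122, 125, 130, 121, 127, 119, 300]

cleaned = remove_outliers_iqr(readings)
normalised = min_max_normalise(cleaned)

print("mean before cleaning:", statistics.mean(readings))
print("mean after cleaning: ", statistics.mean(cleaned))
print("std dev after cleaning:", statistics.stdev(cleaned))
print("normalised:", [round(v, 2) for v in normalised])
```

A single implausible reading shifts the mean noticeably, which is why outlier handling usually comes before any hypothesis test or model.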

In carrying out healthcare data analysis, you can exploit machine learning and artificial intelligence (AI) with KNIME by (i) building machine learning models for tasks such as regression, classification, or clustering using advanced algorithms, (ii) optimizing model performance using, for instance, hyperparameter optimization, (iii) validating models through established performance metrics such as the ROC curve and AUC, (iv) explaining machine learning models using techniques such as Shapley values and LIME, and (v) making predictions either directly or through the Predictive Model Markup Language (PMML) (KNIME AG, n.d.).
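
To make the validation step concrete, the sketch below computes the AUC by hand, using its interpretation as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It is a plain-Python illustration with made-up labels and scores, not the way KNIME computes the metric internally, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Each (positive, negative) pair counts 1 if the positive case scores
    higher, 0.5 on a tie, and 0 otherwise.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = disease present, scores = model-predicted risk.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so the value gives a quick read on whether a model is worth tuning further.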

With the KNIME Analytics Platform, you can also make discoveries in healthcare data and share the resulting insights by (i) visualising the healthcare data with classic charts such as scatter plots or bar charts, or with advanced charts such as network graphs, heat maps, or parallel coordinates, and then customising the chart to your needs, (ii) displaying summary statistics in a KNIME table and filtering out any information that is not valid or relevant, (iii) exporting data analysis reports as PowerPoint presentations, PDF, Excel, or other commonly used formats so that the results of your healthcare data analysis can be presented to stakeholders, and (iv) storing or archiving the processed healthcare data in many file formats, either on your computer or in a remote data warehouse (KNIME AG, n.d.).

Lastly, with the KNIME Analytics Platform you can scale the execution of a data analysis up or down depending on your needs. You can do this by (i) building workflow prototypes to explore how different approaches to the analysis of healthcare data perform, (ii) scaling up the performance of a workflow through multi-threaded data processing and in-memory streaming, and (iii) exploiting the in-database processing capacity of the Konstanz Information Miner or distributing the computation to increase computational performance even further (KNIME AG, n.d.).

The Benefits of the Open Free KNIME Software Compared to Other Commercial Tools with Similar Functionality and Use  

The first benefit of the KNIME Analytics Platform in comparison with the commercially available alternatives is cost. Because it is free software, it is attractive to many healthcare institutions and healthcare research organizations, for which research is already expensive in its own right. According to Mazanetz et al. (2012), commercial data mining software is usually prohibitively expensive. To reduce costs, a healthcare organization’s informatics department would therefore rather choose the KNIME Analytics Platform than a commercial data mining tool such as InforSense or Pipeline Pilot. The main reason is that, even though the Konstanz Information Miner is available in an open and free version, it still offers the functionality and usability of a commercial data mining tool. The main differentiating factor is therefore the cost, one being free and the other only available through purchase (Mazanetz et al., 2012; TrustRadius, n.d.).


The other benefit is that the KNIME Analytics Platform is very user-friendly, even for someone without a computer science background. In comparison, the commercial alternatives are less simplified and come with an implicit requirement of knowledge of programming languages such as Python or R. The KNIME Analytics Platform solves this with drag-and-drop nodes that perform each step without the need for coding. The output of one node simply becomes the input of the next node in the workflow. Yet another benefit is that the KNIME Analytics Platform is visual and intuitive. This makes it possible for people without a computer science background to follow, debug, and troubleshoot a workflow with little effort (TrustRadius, n.d.). In all, the Konstanz Information Miner allows for simple visual assembly and interactive execution of a healthcare data pipeline in a workflow (Berthold et al., 2009).

Defining the Four “Vs” of Big Data Analytics in Healthcare

According to Raghupathi and Raghupathi (2014), the amount of healthcare data generated has increased exponentially over time, to the point where it may be impossible to locate a particular piece of information in this data stockpile. They posit that part of the problem is that some of this healthcare data is unstructured, such as nursing notes, physician notes, and discharge summaries. With the advent of electronic health records, however, it became possible to code this data and enter it into an electronic system that could later be accessed. Recent strides in computing and electronics have made it possible to access this healthcare information for analysis and decision making. It is in this regard that electronic data mining tools such as the Konstanz Information Miner on the KNIME Analytics Platform have come in handy. In this context, the four “Vs” of healthcare data analytics are volume, velocity, variety, and veracity (Raghupathi & Raghupathi, 2014).


Over time, a very large volume of healthcare data has undeniably accumulated. This data has to be kept and preserved not only for legal reasons, but also for medical, research, and ethical reasons. At one time or another, someone may need to retrieve a particular piece of healthcare information from this giant pile of data. This volume of data includes individual patient medical records, clinical trial data, radiology images, and population gene sequencing data, amongst others. Volume is therefore the first important “V” in healthcare data analytics (Raghupathi & Raghupathi, 2014).


Velocity, the second “V”, refers to the fact that healthcare data is now being captured at a much faster rate, thanks to developments such as cloud computing, artificial intelligence (AI), and newer, faster computers. The healthcare data is being captured in real time, and this adds to the speed at which the data accumulates. Healthcare data that exemplify this velocity include continuous glucose monitoring and continuous real-time electrocardiogram (ECG) readings in the intensive care unit (Raghupathi & Raghupathi, 2014).


Variety, the third “V”, reflects the enormous diversity of healthcare data. The data is both structured and unstructured, and it is now common to find much of it in multimedia formats. This variety is buttressed by the ability to perform real-time analytics on high-volume healthcare data streaming in across all specialties. Examples of multimedia formats that represent the variety of healthcare data include audio files of the Mental Status Examination (MSE), static X-ray images, motion files of ultrasound scans or magnetic resonance imaging, and scanned clinician notes (Raghupathi & Raghupathi, 2014).


Veracity is the fourth and last of the four “Vs” of healthcare data analytics. It means that the large volume of healthcare data, together with its storage, mining, analytics, and outcomes, must be error-free and credible. An example of lost veracity is the misreading of a clinician’s poor handwriting, which leads to errors in data entry or interpretation (Raghupathi & Raghupathi, 2014).
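
One simple way to guard veracity is to flag records whose values are missing or physiologically implausible before they are analyzed. The sketch below shows the idea in plain Python. It rests on assumptions throughout: the field names, the plausibility limits, and the sample records are hypothetical illustrations, not clinical reference ranges and not a KNIME feature.

```python
# Hypothetical plausibility limits (illustrative only, not clinical guidance).
LIMITS = {
    "heart_rate": (20, 250),   # beats per minute
    "temp_c": (30.0, 45.0),    # degrees Celsius
}


def find_implausible(records, limits):
    """Return (record index, field, reason) for each missing or out-of-range value."""
    problems = []
    for index, record in enumerate(records):
        for field, (low, high) in limits.items():
            value = record.get(field)
            if value is None:
                problems.append((index, field, "missing"))
            elif not low <= value <= high:
                problems.append((index, field, "out of range"))
    return problems


# Made-up records: the second has a likely typo (720 instead of 72),
# and the third lacks a temperature reading.
records = [
    {"heart_rate": 72, "temp_c": 36.8},
    {"heart_rate": 720, "temp_c": 37.1},
    {"heart_rate": 80},
]

problems = find_implausible(records, LIMITS)
for index, field, reason in problems:
    print(f"record {index}: {field} is {reason}")
```

Flagged records can then be corrected at the source or excluded, so that errors of the handwriting kind described above do not propagate into the analysis.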


The KNIME Analytics Platform is data analytics software that the healthcare sector can harness to sift through and analyse millions of gigabytes of healthcare data. Its biggest advantages are that it is open and free and that it is user-friendly for the lay person without deep knowledge of computing and programming. Its usability and functionality address all four “Vs” of healthcare data analytics: volume, velocity, variety, and veracity.


Berthold, M.R., Cebron, N., Dill, F., Gabriel, T.R., Kötter, T., Meinl, T., … & Wiswedel, B. (2009). KNIME – The Konstanz Information Miner: Version 2.0 and beyond. ACM SIGKDD Explorations Newsletter, 11(1), 26-31.

KNIME AG (n.d.). KNIME analytics platform: Creating data science. Retrieved 4 May 2020 from

Mazanetz, M.P., Marmon, R.J., Reisser, C.B.T & Morao, I. (2012). Drug discovery applications for KNIME: An open source data mining platform. Current Topics in Medicinal Chemistry, 12(18), 1965-1979. DOI:

Raghupathi, W. & Raghupathi, V. (2014). Big data analytics in healthcare: Promise and potential. Health Information Science and Systems, 2(3). DOI: 10.1186/2047-2501-2-3

TrustRadius (n.d.). KNIME analytics platform vs RStudio. Retrieved 4 May 2020 from

