Rosmina Joy Cabauatan

Applying Regression Data Mining Technique in Java

To use the regression data mining technique in Java, weka.jar must be added to the Java project's referenced libraries. JAR stands for Java ARchive, a package file format typically used to aggregate many Java class files and their associated metadata and resources into one file. JAR files are built on the ZIP format, include a Java-specific manifest file, and typically have a .jar file extension.

WEKA (Waikato Environment for Knowledge Analysis), developed at the University of Waikato, New Zealand, is free software licensed under the GNU General Public License. It is the companion software to the book "Data Mining: Practical Machine Learning Tools and Techniques" by Ian H. Witten, Eibe Frank, and co-authors.
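To make the JAR description above concrete, here is a minimal sketch that uses only the JDK's java.util.jar package. It writes a tiny JAR with a manifest to a temporary file and reads one manifest attribute back. The class name JarManifestSketch, the entry hello.txt, and the attribute value "demo" are made up for illustration and have nothing to do with weka.jar itself.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class JarManifestSketch {

    // Writes a small JAR that holds a manifest and one text entry.
    public static void writeDemoJar(Path jarPath) throws IOException {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the manifest is written out empty.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.IMPLEMENTATION_TITLE, "demo"); // hypothetical value
        try (OutputStream out = Files.newOutputStream(jarPath);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            jar.putNextEntry(new JarEntry("hello.txt"));
            jar.write("hello".getBytes(StandardCharsets.UTF_8));
            jar.closeEntry();
        }
    }

    // Opens the JAR and returns the Implementation-Title stored in its manifest.
    public static String readImplementationTitle(Path jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Manifest manifest = jar.getManifest();
            return manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_TITLE);
        }
    }

    public static void main(String[] args) throws IOException {
        Path jarPath = Files.createTempFile("demo", ".jar");
        writeDemoJar(jarPath);
        System.out.println("Implementation-Title: " + readImplementationTitle(jarPath));
        Files.delete(jarPath);
    }
}
```

The same readImplementationTitle call could be pointed at any JAR on disk, although which attributes a given manifest contains depends on how that JAR was built.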

 
 

WEKA Software


Weka contains a collection of algorithms and visualization tools for data analysis and predictive modeling, together with graphical user interfaces for easy access to different functions.


The weka.jar can be downloaded here. The example below works with version 3.7.0. You may use the latest release, but some classes and functions have since been deprecated, so you may need to look up their replacements.


 

Eclipse IDE


Eclipse IDE is free and open-source software released under the terms of the Eclipse Public License. The Eclipse platform, which provides the foundation for the Eclipse IDE, is composed of plug-ins and is designed to be extended with additional plug-ins. Developed in Java, the Eclipse platform can be used to develop rich client applications, integrated development environments, and other tools. Eclipse can be used as an IDE for any programming language for which a plug-in is available.

Details about Eclipse 2020-03 can be viewed here. You can also download the IDE by visiting the Eclipse website.



 

Working with Eclipse


To use the WEKA libraries, add the weka.jar file to the referenced libraries of your Java project in Eclipse. The weka.jar file is located in WEKA's installation directory, which on Windows is usually under Program Files on your system's main drive.


After adding the weka.jar file to the Java project, create a Java class to test whether Eclipse can find the WEKA library. This requires importing the weka.core package into your class.



Test weka.jar in Eclipse IDE


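import weka.core.*;

public class _1_TestWekaLibrary {

    public static void main(String[] args) {
        // Referencing a Weka class (weka.core.Version) confirms that
        // weka.jar is on the classpath at run time, not only at compile time.
        System.out.println("Weka loaded. Version: " + Version.VERSION);
    }
}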


 

Standard WEKA File


An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software.


ARFF files have two distinct sections. The first section is the Header information, which is followed by the Data information. The Header of the ARFF file contains the name of the relation, a list of the attributes (the columns in the data), and their types. An example header on a standard dataset looks like:



 

@RELATION house

@ATTRIBUTE houseSize NUMERIC
@ATTRIBUTE lotSize NUMERIC
@ATTRIBUTE bedrooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE bathroom NUMERIC
@ATTRIBUTE sellingPrice NUMERIC



 

The Data of the ARFF file looks like the following:


@DATA
3529,9191,6,0,0,205000
3247,10061,5,1,1,224900
4032,10150,5,0,1,197900
2397,14156,4,1,0,189900
2200,9600,4,0,1,195000
3536,19994,6,1,1,325000
2983,9365,5,0,1,230000
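The two-part layout described above (header first, then data) can be illustrated with a small plain-Java sketch. This is not WEKA's own ARFF loader and it handles only the simple case shown in this post: the class ArffSketch and its methods are hypothetical helpers that list the attribute names declared in the header and count the rows after @DATA, skipping blank lines and lines that start with %, which ARFF uses for comments.

```java
import java.util.ArrayList;
import java.util.List;

public class ArffSketch {

    // Returns the attribute names declared in the header, in file order.
    public static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String line : arff.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.toUpperCase().startsWith("@ATTRIBUTE")) {
                // "@ATTRIBUTE name type": the name is the second token
                names.add(trimmed.split("\\s+")[1]);
            } else if (trimmed.equalsIgnoreCase("@DATA")) {
                break; // the header ends where the data section begins
            }
        }
        return names;
    }

    // Counts the instance rows that follow the @DATA marker.
    public static int countDataRows(String arff) {
        boolean inData = false;
        int rows = 0;
        for (String line : arff.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            if (inData) {
                rows++;
            } else if (trimmed.equalsIgnoreCase("@DATA")) {
                inData = true;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // A shortened copy of the house dataset from this post
        String arff = String.join("\n",
                "@RELATION house",
                "@ATTRIBUTE houseSize NUMERIC",
                "@ATTRIBUTE lotSize NUMERIC",
                "@ATTRIBUTE bedrooms NUMERIC",
                "@ATTRIBUTE granite NUMERIC",
                "@ATTRIBUTE bathroom NUMERIC",
                "@ATTRIBUTE sellingPrice NUMERIC",
                "@DATA",
                "3529,9191,6,0,0,205000",
                "3247,10061,5,1,1,224900");
        System.out.println("Attributes: " + attributeNames(arff));
        System.out.println("Data rows: " + countDataRows(arff));
    }
}
```

Each data row carries one comma-separated value per declared attribute, in the same order as the header, which is why the last column lines up with sellingPrice.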


 

Loading ARFF Dataset in Eclipse


To load an ARFF file and test how Eclipse and Java work with WEKA, create a Java class that contains the following code.


import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class _2_LoadDataset {

    public static void main(String[] args) throws Exception {
        // Relative path, resolved against the working directory
        // (a leading separator would point at the drive root instead)
        DataSource source = new DataSource("data/house.arff");
        Instances data = source.getDataSet();

        System.out.println(data.numInstances() + " instances loaded.");
        System.out.println(data.toString());
    }
}
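If the dataset fails to load, the usual cause is that the relative path does not resolve to where the file actually sits. The sketch below uses only java.nio.file to print the absolute location a relative path maps to and whether a readable file exists there. The class name DatasetPathCheck is hypothetical, and the assumption is that house.arff lives in a data folder under the working directory, which in Eclipse is the project root unless the run configuration says otherwise.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DatasetPathCheck {

    // Resolves a relative path against the current working directory.
    public static Path resolve(String relativePath) {
        return Paths.get(relativePath).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        Path dataset = resolve("data/house.arff");
        System.out.println("Looking for dataset at: " + dataset);
        if (Files.isReadable(dataset)) {
            System.out.println("Found.");
        } else {
            System.out.println("Not found. Check the working directory and the data folder.");
        }
    }
}
```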


 

Linear Regression in Java


import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.classifiers.functions.LinearRegression;

public class _3_Regression {

    public static void main(String[] args) throws Exception {
        // Load the dataset
        DataSource source = new DataSource("data/house.arff");
        Instances dataset = source.getDataSet();

        // Use the last attribute (sellingPrice) as the value to predict
        dataset.setClassIndex(dataset.numAttributes() - 1);

        // Build the model
        LinearRegression model = new LinearRegression();
        model.buildClassifier(dataset);

        // Print the fitted regression equation
        System.out.println("Regression Model : " + model);

        // Predict the price of the last house in the dataset.
        // This instance was also used for training, so it only
        // demonstrates the call and is not an independent test.
        Instance myHouse = dataset.lastInstance();
        double price = model.classifyInstance(myHouse);

        System.out.println("-------------------------");
        System.out.println("Predicted Price : " + price);
    }
}
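To give a feel for what a regression model computes, here is a minimal plain-Java sketch of least-squares fitting with a single predictor, houseSize against sellingPrice. It is not WEKA's LinearRegression algorithm, which fits all attributes at once and will produce a different equation. The class SimpleLeastSquares and its methods are hypothetical, and the only shared idea is that a prediction is an intercept plus a weighted attribute value.

```java
public class SimpleLeastSquares {

    // Returns {intercept, slope} minimizing squared error for y = intercept + slope * x.
    public static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("All x values are identical");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    // Applies a fitted {intercept, slope} model to a new x value.
    public static double predict(double[] model, double x) {
        return model[0] + model[1] * x;
    }

    public static void main(String[] args) {
        // houseSize and sellingPrice columns from the ARFF data in this post
        double[] houseSize = {3529, 3247, 4032, 2397, 2200, 3536, 2983};
        double[] sellingPrice = {205000, 224900, 197900, 189900, 195000, 325000, 230000};

        double[] model = fit(houseSize, sellingPrice);
        System.out.println("sellingPrice = " + model[0] + " + " + model[1] + " * houseSize");
        System.out.println("Predicted price for houseSize 3000: " + predict(model, 3000));
    }
}
```

With seven rows and one predictor this is only an illustration. The WEKA model above uses the other attributes as well, so its predictions should be taken from model.classifyInstance rather than from this sketch.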


 

