To use the regression data mining technique in Java, weka.jar must be added to the Java project's referenced libraries. JAR stands for Java ARchive. It is a package file format typically used to aggregate many Java class files and their associated metadata and resources into one file. JAR files are archive files that include a Java-specific manifest file. They are built on the ZIP format and typically have a .jar file extension.
The Waikato Environment for Knowledge Analysis, commonly referred to as WEKA, was developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License and is the companion software to the book "Data Mining: Practical Machine Learning Tools and Techniques" by Ian H. Witten and Eibe Frank.
WEKA Software
Weka contains a collection of algorithms and visualization tools for data analysis and predictive modeling, together with graphical user interfaces for easy access to different functions.
The weka.jar file can be downloaded from the WEKA website. The examples below were written for version 3.7.0. You may use the latest release, but some classes and methods have since been deprecated, so you may need to look up their replacements.
Eclipse IDE
Eclipse IDE is free and open-source software released under the terms of the Eclipse Public License. The Eclipse platform, which provides the foundation for the Eclipse IDE, is composed of plug-ins and is designed to be extended with additional plug-ins. Developed in Java, the Eclipse platform can be used to build rich client applications, integrated development environments, and other tools. Eclipse can serve as an IDE for any programming language for which a plug-in is available.
Details about Eclipse 2020-03 are available on the Eclipse website, where you can also download the IDE.
Working with Eclipse
To use the WEKA libraries, add the weka.jar file to the referenced libraries of the Java project in Eclipse. The weka.jar file is located in WEKA's installation directory, usually under Program Files on your system's main drive.
After adding weka.jar to the Java project, create a Java class to test whether Eclipse can resolve the WEKA library. This requires importing the weka.core package into your class.
Test weka.jar in Eclipse IDE
// This import fails to compile if weka.jar is not on the project's build path
import weka.core.*;

public class _1_TestWekaLibrary {
    public static void main(String[] args) {
        System.out.println("Weka loaded.");
    }
}
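The class above confirms that the import compiles. As a complementary runtime check, the following minimal sketch uses reflection to ask whether a given class can be found on the classpath. The class name ClasspathCheck and the helper isOnClasspath are hypothetical names introduced here for illustration; weka.core.Instances is the class imported by the later examples.

```java
class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String probe = "weka.core.Instances";
        if (isOnClasspath(probe)) {
            System.out.println(probe + " found.");
        } else {
            System.out.println(probe + " NOT found; check the project's build path.");
        }
    }
}
```

Unlike the import-based test, this version compiles even when weka.jar is missing and reports the problem when it runs.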
Standard WEKA File
An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software.
ARFF files have two distinct sections. The first section is the Header information, which is followed by the Data information. The Header of the ARFF file contains the name of the relation, a list of the attributes (the columns in the data), and their types. An example header on a standard dataset looks like:
@RELATION house
@ATTRIBUTE houseSize NUMERIC
@ATTRIBUTE lotSize NUMERIC
@ATTRIBUTE bedrooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE bathroom NUMERIC
@ATTRIBUTE sellingPrice NUMERIC
The Data of the ARFF file looks like the following:
@DATA
3529,9191,6,0,0,205000
3247,10061,5,1,1,224900
4032,10150,5,0,1,197900
2397,14156,4,1,0,189900
2200,9600,4,0,1,195000
3536,19994,6,1,1,325000
2983,9365,5,0,1,230000
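To make the two-section layout concrete, here is a minimal sketch of a reader for the numeric subset of ARFF shown above. It uses only the Java standard library and is not how WEKA itself parses ARFF; it ignores quoting, nominal attributes, sparse data, and missing values. MiniArff and its fields are hypothetical names, and the sample input in main uses only two of the six columns from the house dataset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class MiniArff {
    String relation;
    final List<String> attributes = new ArrayList<>();
    final List<double[]> rows = new ArrayList<>();

    /** Parses header lines until @DATA, then treats every remaining line as a numeric row. */
    static MiniArff parse(List<String> lines) {
        MiniArff arff = new MiniArff();
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and % comments
            }
            if (!inData) {
                // ARFF keywords are case-insensitive
                String upper = line.toUpperCase(Locale.ROOT);
                String[] parts = line.split("\\s+");
                if (upper.startsWith("@RELATION")) {
                    arff.relation = parts[1];
                } else if (upper.startsWith("@ATTRIBUTE")) {
                    arff.attributes.add(parts[1]);
                } else if (upper.startsWith("@DATA")) {
                    inData = true;
                }
            } else {
                String[] cells = line.split(",");
                double[] row = new double[cells.length];
                for (int i = 0; i < cells.length; i++) {
                    row[i] = Double.parseDouble(cells[i].trim());
                }
                arff.rows.add(row);
            }
        }
        return arff;
    }

    public static void main(String[] args) {
        MiniArff arff = parse(Arrays.asList(
                "@RELATION house",
                "@ATTRIBUTE houseSize NUMERIC",
                "@ATTRIBUTE sellingPrice NUMERIC",
                "@DATA",
                "3529,205000",
                "3247,224900"));
        // prints "house: 2 attributes, 2 rows"
        System.out.println(arff.relation + ": " + arff.attributes.size()
                + " attributes, " + arff.rows.size() + " rows");
    }
}
```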
Loading ARFF Dataset in Eclipse
To load an ARFF file and test how Eclipse and Java work with WEKA, create a Java class that contains the following code.
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class _2_LoadDataset {
    public static void main(String[] args) throws Exception {
        // Path is relative to the project's working directory (no leading separator)
        DataSource source = new DataSource("data/house.arff");
        Instances data = source.getDataSet();
        System.out.println(data.numInstances() + " instances loaded.");
        // Print the dataset back in ARFF form
        System.out.println(data.toString());
    }
}
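If the dataset fails to load, the usual cause is that the relative path does not point where you expect. A relative path is resolved against the JVM's working directory, which by default in Eclipse is the project folder. The following sketch, using only the standard library, prints the absolute location being searched. DatasetPathCheck and resolve are hypothetical names, and the data folder layout is assumed from the examples in this article.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class DatasetPathCheck {

    /** Builds a path from its parts and resolves it against the working directory. */
    static Path resolve(String first, String... more) {
        return Paths.get(first, more).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // Assumed layout: <project>/data/house.arff
        Path arff = resolve("data", "house.arff");
        System.out.println("Looking for: " + arff);
        if (Files.isRegularFile(arff)) {
            System.out.println("File found.");
        } else {
            System.out.println("File missing; check the working directory and folder name.");
        }
    }
}
```

Building the path from its parts avoids hard-coding a separator, so the same code runs on Windows, macOS, and Linux.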
Linear Regression in Java
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.classifiers.functions.LinearRegression;

public class _3_Regression {
    public static void main(String[] args) throws Exception {
        // Load the dataset (path is relative to the project's working directory)
        DataSource source = new DataSource("data/house.arff");
        Instances dataset = source.getDataSet();

        // Use the last attribute (sellingPrice) as the value to predict
        dataset.setClassIndex(dataset.numAttributes() - 1);

        // Build the model
        LinearRegression model = new LinearRegression();
        model.buildClassifier(dataset);

        // Print the fitted regression equation
        System.out.println("Regression Model : " + model);

        // Predict the selling price of the last house in the dataset
        Instance myHouse = dataset.lastInstance();
        double price = model.classifyInstance(myHouse);
        System.out.println("-------------------------");
        System.out.println("Predicted Price : " + price);
    }
}
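To show what a linear regression model computes, here is a minimal standard-library sketch that fits a least-squares line using a single attribute, houseSize, to predict sellingPrice. This is simple one-variable regression written for illustration, not WEKA's LinearRegression, which works with all attributes, so its coefficients and predictions will differ from the output of the class above. SimpleLeastSquares and its members are hypothetical names; the data values are the houseSize and sellingPrice columns from the ARFF file.

```java
class SimpleLeastSquares {
    final double slope;
    final double intercept;

    /** Fits y = intercept + slope * x by ordinary least squares. */
    SimpleLeastSquares(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        double meanX = mean(x);
        double meanY = mean(y);
        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values are all identical");
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
    }

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        double[] houseSize = {3529, 3247, 4032, 2397, 2200, 3536, 2983};
        double[] sellingPrice = {205000, 224900, 197900, 189900, 195000, 325000, 230000};

        SimpleLeastSquares model = new SimpleLeastSquares(houseSize, sellingPrice);
        System.out.printf("sellingPrice = %.2f + %.2f * houseSize%n", model.intercept, model.slope);
        System.out.printf("Predicted price for houseSize 2983: %.2f%n", model.predict(2983));
    }
}
```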