Vtu Final Year Project Report Main Report

Collective Hyping Detection System To Identify Online Spam Activities Using AI

2018

CHAPTER 1 PREAMBLE

1.1 Introduction
Many approaches have been successfully developed to detect online spam. Fangtao Li and colleagues [1] analyzed several attributes related to spam behavior, such as content, sentiment, product, and metadata features, and exploited a two-view semi-supervised method to identify spam reviews. Song Feng and colleagues [2] defined three types of reviewers (any-time, multi-time, and single-time reviewers) and statistically characterized the distributional footprints of deceptive reviews using natural language processing (NLP) techniques. Geli Fei and colleagues [3] proposed a model to detect spammed products or product groups by comparing the differences in rating behaviors between suspicious and normal users. All these models rely on content features, which spammers can easily evade by inserting special characters, but other features, such as temporal and network information, have been employed as well. Qian Xu and colleagues [4] collected large-scale real-world datasets from telecommunication service providers and combined temporal and user-network information to classify spammers using the Short Message Service (SMS). Sihong Xie and colleagues [5] proposed a model that uses only temporal features, with no semantic or rating-behavior analysis, to detect abnormal bursts as the number of reviews increases. Finally, Tyler Moore and colleagues [6] studied the problem of temporal correlations between spam and phishing websites. Intuitively, these works can also be used to uncover sophisticated spam strategies. Amazon has sued more than 1,000 product-review sellers who sell fake promotions on Fiverr.com (one of the most famous being Spam Reviewer Cloud; http://money.cnn.com/2015/10/18/technology/amazon-lawsuit-fake-reviews). On such user cloud platforms, business owners can pay for anonymous comments generated by real users. This makes spam detection very challenging, as the advent of a massive number of apparently genuine fake reviewers (which we refer to as "genuine fakes" in this article) makes the fraud pattern much more nebulous to track. To date, many third-party platforms have created various fake-review markets for online product sellers and fake-review providers. In real-world business processes, massive numbers of random but genuine fake-review providers conduct real transactions and write positive comments to claim a bonus (many e-commerce websites think they can reduce spam reviews by allowing only real buyers to write them). Existing research ignores the latent connections in product networks, which are difficult to discover, especially now that these spam activities have become a hyping and advertising investment that has gained increased popularity among homogeneous online competitors. Thus, antispam rules can be easily evaded, which impairs the efficiency and effectiveness of detection. In this work, we coin a new solution, collaborative marketing hyping detection, that aims to detect groups of online stores that simultaneously adopt marketing hyping. [1]

This field involves various challenges:
• How can a heterogeneous product information network be defined to infer latent collaborative hyping behaviors? Network information might not be directly observed in the original datasets, so we need to build a relationship matrix between products to represent their underlying correlation.
• What features need to be selected to best solve our problem? Traditional features such as semantic clues or user relations might no longer be suitable for discovering fraud due to rapidly evolving spam strategies. Hence, we need to choose dedicated features according to our specific scenario.
• How can we design a model that effectively identifies collaborative marketing hyping behavior? A model that can employ the power of heterogeneous product networks to discover collective hyping behavior is required here.
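The first challenge, building a relationship matrix between products, can be illustrated with a toy sketch. Counting the reviewers that two products share is only one assumed, simplified stand-in for the report's heterogeneous network construction; the class name and data below are hypothetical:

```java
import java.util.*;

public class ProductNetwork {
    // Build a product-product relationship matrix where entry (i, j) counts
    // reviewers that products i and j have in common. This is one simple,
    // illustrative way to infer latent links between products, not the
    // report's exact construction.
    static int[][] sharedReviewerMatrix(List<Set<String>> reviewersPerProduct) {
        int n = reviewersPerProduct.size();
        int[][] m = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                Set<String> common = new HashSet<>(reviewersPerProduct.get(i));
                common.retainAll(reviewersPerProduct.get(j));
                m[i][j] = m[j][i] = common.size();
            }
        }
        return m;
    }

    public static void main(String[] args) {
        List<Set<String>> reviewers = List.of(
            Set.of("u1", "u2", "u3"),   // product 0
            Set.of("u2", "u3", "u4"),   // product 1
            Set.of("u5"));              // product 2
        int[][] m = sharedReviewerMatrix(reviewers);
        // products 0 and 1 share reviewers u2 and u3; products 0 and 2 share none
        System.out.println(m[0][1] + " " + m[0][2]);
    }
}
```

A high entry suggests a latent link worth modeling; a real system would normalize these counts and combine them with store and user relations.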

To overcome these challenges, we propose an unsupervised shapelet learning model to discover the temporal features of product reviews and then integrate the heterogeneous product network information as regularization terms, to discover the products that are subject to collaborative hyping. We define three regularization terms that reflect the underlying correlations among users, products, and online store networks. [1]
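As a rough illustration of the shapelet idea (not the report's actual model), the basic feature a shapelet yields is its minimum distance to any window of a time series, such as a product's daily review counts; the data below is invented:

```java
public class ShapeletDistance {
    // Squared Euclidean distance between a candidate shapelet and the
    // best-matching window of a time series: the basic feature computed by
    // shapelet-based models. An illustrative sketch only.
    static double minDist(double[] series, double[] shapelet) {
        double best = Double.MAX_VALUE;
        for (int start = 0; start + shapelet.length <= series.length; start++) {
            double d = 0;
            for (int k = 0; k < shapelet.length; k++) {
                double diff = series[start + k] - shapelet[k];
                d += diff * diff;
            }
            best = Math.min(best, d);
        }
        return best;
    }

    public static void main(String[] args) {
        double[] dailyReviewCounts = {1, 1, 2, 9, 10, 9, 2, 1}; // burst in the middle
        double[] burstShapelet = {9, 10, 9};                    // candidate subsequence
        System.out.println(minDist(dailyReviewCounts, burstShapelet));
    }
}
```

Products whose review series sit close to "burst-shaped" shapelets are the candidates the model would then regularize using the network terms.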

Department of CSE, TOCE


1.2 Objectives And Goals
This work proposes a new solution that identifies spam comments and detects products that adopt an evolving spam strategy for promotion. Specifically, an unsupervised learning model combines heterogeneous product review networks to discover collective hyping activities.

1.3 Existing System
Traditional features such as semantic clues or user relations might no longer be suitable for discovering fraud due to rapidly evolving spam strategies. Hence, dedicated features must be chosen according to our specific scenario.

Disadvantages:
• Inaccuracy caused by relying only on user-name information.
• Stores usually purchase fake reviews periodically, which one-off detection rules fail to track.

1.4 Proposed System
We propose an unsupervised shapelet learning model to discover the temporal features of product reviews and then integrate the heterogeneous product network information as regularization terms, to discover the products that are subject to collaborative hyping. We define three regularization terms that reflect the underlying correlations among users, products, and online store networks.

Advantages:
• Addresses hyping that has gained increased popularity among homogeneous online competitors.
• Improves the efficiency and effectiveness of detection.
• Detects groups of online stores that simultaneously adopt marketing hyping.


CHAPTER 2

LITERATURE SURVEY 

Learning to Identify Review Spam by Fangtao Li, Minlie Huang, Yi Yang and Xiaoyan Zhu [1]
In this paper, we study the review spam identification task in our product review mining system. We manually build a review spam collection based on our crawled reviews. We first employ supervised learning methods and analyze the effect of different features in review spam identification. We also observe that a spammer consistently writes spam. This provides another view for identifying review spam: we can identify whether the author of a review is a spammer. Based on this observation, we provide a two-view semi-supervised method to exploit the large amount of unlabeled data. The experimental results show that the two-view co-training algorithms achieve better results than the single-view algorithm. Our machine learning methods achieve significant improvements compared with the heuristic baselines.

Distributional Footprints of Deceptive Product Reviews by Song Feng, Longfei Xing, Anupam Gogar & Yejin Choi [2]
This paper postulates that there are natural distributions of opinions in product reviews. In particular, we hypothesize that for a given domain, there is a set of representative distributions of review rating scores. A deceptive business entity that hires people to write fake reviews will necessarily distort its distribution of review scores, leaving distributional footprints behind. In order to validate this hypothesis, we introduce strategies to create datasets with a pseudo-gold standard that is labeled automatically based on different types of distributional footprints. A range of experiments confirms the hypothesized connection between distributional anomalies and deceptive reviews. This study also provides novel quantitative insights into the characteristics of natural distributions of opinions in the TripAdvisor hotel review and Amazon product review domains.
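The "distributional footprint" intuition can be sketched by comparing a store's rating histogram against a domain-wide reference distribution. Using KL divergence for the comparison, and all the numbers below, are illustrative assumptions, not the paper's exact method:

```java
public class RatingFootprint {
    // Kullback-Leibler divergence from a store's 1-5 star rating distribution
    // p to a domain reference distribution q; a large value flags a
    // "distributional footprint". All distributions here are made up.
    static double kl(double[] p, double[] q) {
        double d = 0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0) d += p[i] * Math.log(p[i] / q[i]);
        }
        return d;
    }

    public static void main(String[] args) {
        double[] reference = {0.05, 0.10, 0.20, 0.30, 0.35}; // typical domain shape
        double[] honest    = {0.06, 0.09, 0.21, 0.29, 0.35}; // close to reference
        double[] hyped     = {0.01, 0.01, 0.03, 0.05, 0.90}; // almost all 5-star
        // the hyped store diverges far more from the reference than the honest one
        System.out.println(kl(honest, reference) < kl(hyped, reference));
    }
}
```

A detector built on this idea would rank stores by divergence and inspect the outliers.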


Exploiting Burstiness in Reviews for Review Spammer Detection by Geli Fei, Arjun Mukherjee & Bing Liu [3]
In this paper, we proposed to exploit bursts in detecting opinion spammers, owing to the similar nature of reviewers in a burst. A graph propagation method for identifying spammers was presented. A novel evaluation method based on supervised learning was also described to deal with the difficult problem of evaluation without ground-truth data; it classifies reviews based on a different set of features from those used in identifying spammers. Our experimental results using Amazon.com reviews from the software domain showed that the proposed method is effective, demonstrated not only objectively through supervised learning (or classification) but also subjectively through human expert evaluation. The fact that the supervised learning/classification results are consistent with human judgment also indicates that the proposed supervised-learning-based evaluation technique is justified.
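A toy version of burst detection might simply flag days whose review count is far above the series average. The paper's own burst-detection technique is more sophisticated; this threshold rule and its data are only an illustrative sketch:

```java
import java.util.*;

public class BurstDetector {
    // Flag days whose review count exceeds mean + k * stddev of the series.
    // A deliberately simple stand-in for burst detection; real systems use
    // stronger statistical models than a global threshold.
    static List<Integer> bursts(int[] counts, double k) {
        double mean = Arrays.stream(counts).average().orElse(0);
        double var = Arrays.stream(counts)
                           .mapToDouble(c -> (c - mean) * (c - mean))
                           .average().orElse(0);
        double limit = mean + k * Math.sqrt(var);
        List<Integer> out = new ArrayList<>();
        for (int day = 0; day < counts.length; day++) {
            if (counts[day] > limit) out.add(day);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] reviewsPerDay = {2, 3, 2, 1, 2, 25, 24, 2, 3, 2}; // spike on days 5-6
        System.out.println(bursts(reviewsPerDay, 1.0));
    }
}
```

Reviewers who repeatedly appear inside such flagged windows are the suspects a graph propagation step would then score.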

SMS Spam Detection Using Non-Content Features by Qian Xu, Evan Wei Xiang and Qiang Yang [4]
In this paper, we have examined mobile-phone SMS message features from static, network, and temporal views, and proposed an effective way to identify important features that can be used to construct an anti-spam algorithm. We exploited a temporal analysis to design features that can detect SMS spammers with high performance, and incorporated these features into an SVM classification algorithm. Our evaluation on a real SMS dataset showed that the temporal and network features can be effectively incorporated to build an SVM classifier, with a gain of around 8% in AUC compared with classifiers based only on conventional static features.

Temporal Correlations between Spam and Phishing Websites by Tyler Moore, Richard Clayton & Henry Stern [6]
Empirical study of malicious online activity is hard. Attackers remain elusive, compromises happen fast, and strategies change frequently. Unfortunately, none of these factors can be changed. In this paper, we have combined phishing website lifetimes with detailed spam data, and consequently we have provided several new insights. First, we have demonstrated the gravity of the threat posed by attackers using fast-flux techniques. They send out 68% of spam while hosting only 3% of all phishing websites. They also transmit spam effectively: the bulk is sent out early, it stops once the site is removed, and it resumes whenever websites are overlooked by the take-down companies. In this respect, we also conclude that long-lived phishing websites continue to cause harm and should be taken down.

A Shapelet Transform for Time Series Classification by Jason Lines, Luke M. Davis & Anthony Bagnall [8]
In this paper, we have proposed a shapelet transform for TSC that extracts the k best shapelets from a dataset in a single pass. We implement this using a novel caching algorithm to store shapelets, and apply a simple, parameter-free cross-validation approach for extracting the most significant shapelets. We transform a total of 26 datasets with our filter and demonstrate that a C4.5 decision tree classifier trained on transformed data is competitive with an implementation of the original shapelet decision tree. We show that our filtered data can be applied to further, non-tree-based classifiers to achieve improved classification performance, whilst still maintaining the interpretability of shapelets. We provide two implementations of the filter using different quality measures for discriminating between shapelets: we use information gain in the first, and introduce the application of the F-statistic as an evaluation method for shapelets in the second. We show that classifiers trained using features derived from an F-statistic filter are competitive with classifiers trained with the information-gain approach, whilst being easier to apply to multi-class classification problems. Finally, we provide exploratory data analysis of the shapelets extracted by our filter on the Gun/NoGun problem and compare them with the output of [20]. We show that the shapelets we find are consistent with the discriminatory shapelet in the original work, and that our approach can lead to further insight into the problem by looking at a number of the top shapelets.
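The information-gain quality measure mentioned above can be sketched as follows: each candidate shapelet induces a distance to every labeled series, and the best split of those distances is scored by information gain. The distances and labels below are invented for illustration:

```java
public class ShapeletQuality {
    // Binary entropy of a set with pos positive and neg negative examples.
    static double entropy(int pos, int neg) {
        int n = pos + neg;
        if (n == 0 || pos == 0 || neg == 0) return 0;
        double p = (double) pos / n, q = (double) neg / n;
        return -(p * Math.log(p) + q * Math.log(q)) / Math.log(2);
    }

    // Information gain of splitting distance values at a threshold:
    // the quality measure used to rank candidate shapelets.
    static double infoGain(double[] dist, boolean[] label, double threshold) {
        int lp = 0, ln = 0, rp = 0, rn = 0;
        for (int i = 0; i < dist.length; i++) {
            if (dist[i] <= threshold) { if (label[i]) lp++; else ln++; }
            else                      { if (label[i]) rp++; else rn++; }
        }
        int n = dist.length, l = lp + ln, r = rp + rn;
        return entropy(lp + rp, ln + rn)
             - ((double) l / n) * entropy(lp, ln)
             - ((double) r / n) * entropy(rp, rn);
    }

    public static void main(String[] args) {
        double[] dist  = {0.1, 0.2, 0.3, 2.0, 2.5, 3.0};      // distances to one candidate
        boolean[] spam = {true, true, true, false, false, false};
        // threshold 1.0 separates the classes perfectly, so the gain is 1 bit
        System.out.println(infoGain(dist, spam, 1.0));
    }
}
```

The transform keeps the k candidates with the highest gain and uses their distances as features for any downstream classifier.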


CHAPTER 3 SYSTEM DESIGN

3.1 Design Consideration
The purpose of design is to plan a solution for the problem specified by the requirements document. This phase is the first step in moving from the problem domain to the solution domain. In other words, starting with what is required, design works out how to fulfill those needs. The design of the system is perhaps the most critical factor affecting the quality of the software and has a major impact on the later phases, particularly testing and maintenance. The system design describes all the major data structures, file formats, outputs, and major modules in the system, and their specifications are decided.

3.2 System Architecture
The architectural design process is concerned with establishing a basic structural framework for a system. It involves identifying the major components of the system and the communications between these components. The initial design process of identifying these subsystems and establishing a framework for subsystem control and communication is called architectural design, and the output of this design process is a description of the software architecture. [5] The proposed architecture for this system is given below. It shows the way this system is designed and gives a brief idea of how the system works.


[Figure: online stores post target products (TP) to a spammer cloud and pay ($); fake reviewers purchase the target products and hype them (€); the spammer cloud guarantees fake-review quality and checks hyping quality before paying the reviewers ($).]
(€): Fake reviewers make genuine purchases. ($): Store owners pay fake reviewers and the purchasing cost through the user cloud.

Fig 3.1 System Architecture


3.3 Use Case Diagram

A use case diagram shows the various interactions of actors with a system. A use case is a coherent piece of functionality that a system can provide by interacting with actors. Actors are the external end users of the system.

[Figure: the User performs Register, Login, Browse products, Buy products, and Give reviews; the Admin performs Analyze reviews, Detect spams, Display spammers, and Block spam users.]

Fig 3.2 Use Case Diagram


3.4 Dataflow Diagram
The DFD is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on this data, and the output data generated by the system. A DFD model uses a very limited number of primitive symbols to represent the functions performed by a system and the data flow among those functions. The main reason the DFD technique is so popular is probably that a DFD is a very simple formalism: it is easy to understand and use. Starting with the set of high-level functions that a system performs, a DFD model hierarchically represents the various sub-functions. In fact, any hierarchical model is simple to understand.


[Figure: the User searches products on the Website, which extracts and fetches details and returns the list of products; the User buys and reviews products; the set of reviews feeds NLP-based feature extraction and spam detection, and detected spammers become blocked users.]

Fig 3.3 Data Flow Diagram


3.5 Activity Diagram
The activity diagram is another important diagram in UML, used to describe the dynamic aspects of the system. An activity diagram is basically a flowchart representing the flow from one activity to another. An activity can be described as an operation of the system, so the control flow is drawn from one operation to another.

[Figure: Browse products → Buy products → Feedback → Process review → Evaluate users → Spam analysis → Natural language processing → Results for spammers.]

Fig 3.4 Activity Diagram


CHAPTER 4
SYSTEM REQUIREMENT SPECIFICATION

A System Requirement Specification (SRS) is a central document that forms the foundation of the software development process. It records the requirements of a system and also contains a description of its major features. An SRS is basically an organization's understanding (in writing) of a client's or potential customer's system requirements and dependencies at a particular point in time, usually before any actual design or development work. It is a two-way insurance policy that assures that both the client and the organization understand each other's requirements from that perspective at a given point in time. Writing the software requirement specification reduces development effort, since careful review of the document can reveal omissions, misunderstandings, and inconsistencies early in the development cycle, when these problems are easier to correct. The SRS discusses the product, not the project that developed it; hence the SRS serves as a basis for later enhancement of the finished product. The SRS may need to be changed, but it provides a foundation for continued production evaluation. In simple words, the software requirement specification is the starting point of the software development activity. The SRS translates the ideas in the minds of the clients (the input) into a formal document (the output of the requirements phase). Thus the output of the phase is a set of formally specified requirements, which ideally are complete and consistent, while the input has none of these properties.


4.1 Hardware Requirements
The most common set of requirements defined by any operating system or software application is the set of physical computer resources, also known as hardware. A hardware requirements list is often accompanied by a hardware compatibility list (HCL), especially in the case of operating systems. The hardware requirements are as follows:

• System: Intel i3, 2.1 GHz
• Memory: 4 GB
• Hard Disk: 40 GB
• Monitor: 15" VGA Color

4.2 Software Requirements
Software requirements may be calculations, technical details, data manipulation and processing, and other specific functionality that define what a system is supposed to accomplish. Behavioural requirements describing all the cases where the system uses the functional requirements are captured in use cases. These are things that the system is required to do.

• Operating System: Windows 7 / 8
• Language: Java / J2EE
• Database: MySQL
• Tools: NetBeans, Navicat, Tomcat Server


Java Server Pages
JavaServer Pages (JSP) is a technology that helps software developers create dynamically generated web pages based on HTML, XML, or other document types. Released in 1999 by Sun Microsystems, JSP is similar to PHP, but it uses the Java programming language. To deploy and run JavaServer Pages, a compatible web server with a servlet container, such as Apache Tomcat or Jetty, is required. Architecturally, JSP may be viewed as a high-level abstraction of Java servlets. JSPs are translated into servlets at runtime; each JSP's servlet is cached and re-used until the original JSP is modified. JSP can be used independently or as the view component of a server-side model-view-controller design, normally with JavaBeans as the model and Java servlets (or a framework such as Apache Struts) as the controller. This is a type of Model 2 architecture. JSP allows Java code and certain pre-defined actions to be interleaved with static web markup content, with the resulting page being compiled and executed on the server to deliver a document. The compiled pages, as well as any dependent Java libraries, contain Java bytecode rather than a native software format. Like any other Java program, they must be executed within a Java virtual machine (JVM) that integrates with the server's host operating system to provide an abstract, platform-neutral environment. JSPs are usually used to deliver HTML and XML documents, but through the use of OutputStream, they can deliver other types of data as well. The web container creates JSP implicit objects such as pageContext, servletContext, session, request, and response.

[Figure: a web browser sends requests to the server, where a servlet filter (controller) instantiates JSP pages (view) and JavaBeans (model), which access the data sources/database.]

Fig 4.1 JSP Model
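A minimal JSP page may help illustrate how Java code is interleaved with static markup. The file name and request parameter below are hypothetical, a sketch rather than code from this project:

```jsp
<%-- hello.jsp: illustrative only; names are assumptions, not from the report --%>
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
  <body>
    <%-- a scriptlet: Java code embedded in static markup --%>
    <% String user = request.getParameter("name"); %>
    <%-- an expression: its value is written into the response --%>
    <p>Hello, <%= (user == null) ? "guest" : user %>!</p>
  </body>
</html>
```

On first request, a container such as Tomcat translates this page into a servlet, compiles it, and caches the class until hello.jsp changes.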


A JavaServer Pages compiler is a program that parses JSPs and transforms them into executable Java servlets. A program of this type is usually embedded into the application server and run automatically the first time a JSP is accessed, but pages may also be precompiled for better performance, or compiled as part of the build process to test for errors. Some JSP containers support configuring how often the container checks JSP file timestamps to see whether the page has changed. Typically, this timestamp would be set to a short interval (perhaps seconds) during software development, and a longer interval (perhaps minutes, or even never) for a deployed web application.

Java Servlet
The servlet is a Java programming language class used to extend the capabilities of a server. Although servlets can respond to any type of request, they are commonly used to extend the applications hosted by web servers, so they can be thought of as Java applets that run on servers instead of in web browsers. Servlets are the Java counterpart to other dynamic web content technologies such as PHP and ASP.NET.

[Figure: in the translation phase, a JSP page (.jsp) is translated by the JSP translator (Tomcat) into servlet source code (Java), which the Java compiler embedded in the server compiles into a servlet class (.class); in the execution phase, the class runs inside the JSP container on the JRE, serving requests and responses. (a) Translation occurs at this point if the JSP has been changed or is new; (b) if not, translation is skipped.]

Fig. 4.2 Life of a JSP File


Three methods are central to the life cycle of a servlet: init(), service(), and destroy(). They are implemented by every servlet and are invoked at specific times by the server.

• During the initialization stage of the servlet life cycle, the web container initializes the servlet instance by calling the init() method, passing an object implementing the javax.servlet.ServletConfig interface. This configuration object allows the servlet to access name-value initialization parameters from the web application.

• After initialization, the servlet instance can service client requests. Each request is serviced in its own separate thread. The web container calls the service() method of the servlet for every request. The service() method determines the kind of request being made and dispatches it to an appropriate method to handle the request. The developer of the servlet must provide an implementation for these methods. If a request is made for a method that is not implemented by the servlet, the method of the parent class is called, typically resulting in an error being returned to the requester.

• Finally, the web container calls the destroy() method, which takes the servlet out of service. The destroy() method, like init(), is called only once in the lifecycle of a servlet.

The following is a typical user scenario of these methods.
1. Assume that a user requests to visit a URL.
   • The browser then generates an HTTP request for this URL.
   • This request is then sent to the appropriate server.
2. The HTTP request is received by the web server and forwarded to the servlet container.
   • The container maps this request to a particular servlet.
   • The servlet is dynamically retrieved and loaded into the address space of the container.
3. The container invokes the init() method of the servlet.
   • This method is invoked only when the servlet is first loaded into memory.
   • It is possible to pass initialization parameters to the servlet so that it may configure itself.
4. The container invokes the service() method of the servlet.
   • This method is called to process the HTTP request.
   • The servlet may read data that have been provided in the HTTP request.
   • The servlet may also formulate an HTTP response for the client.
5. The servlet remains in the container's address space and is available to process any other HTTP requests received from clients.
   • The service() method is called for each HTTP request.
6. The container may, at some point, decide to unload the servlet from its memory.
   • The algorithms by which this decision is made are specific to each container.
7. The container calls the servlet's destroy() method to relinquish any resources, such as file handles, that are allocated for the servlet; important data may be saved to a persistent store.
8. The memory allocated for the servlet and its objects can then be garbage collected.
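The call order described above can be mimicked in plain Java. A real servlet would extend javax.servlet.http.HttpServlet and be driven by the container; the stub below (all names hypothetical) only simulates the sequence so it can run without a servlet container:

```java
public class LifecycleDemo {
    // Plain-Java sketch of the container-managed lifecycle: init() once,
    // service() per request, destroy() once. Not a real servlet; it only
    // mimics the call order a container would follow.
    static class FakeServlet {
        int requestsServed = 0;
        void init()              { System.out.println("init"); }
        void service(String url) { requestsServed++; System.out.println("service " + url); }
        void destroy()           { System.out.println("destroy after " + requestsServed + " requests"); }
    }

    public static void main(String[] args) {
        FakeServlet s = new FakeServlet();
        s.init();               // step 3: loaded once, init() called once
        s.service("/products"); // steps 4-5: service() runs for each HTTP request
        s.service("/reviews");
        s.destroy();            // step 7: container takes the servlet out of service
    }
}
```

In a real container, each service() call would additionally run in its own thread with request and response objects supplied by the web server.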

MySQL
Structured Query Language (SQL) is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS). Originally based upon relational algebra and tuple relational calculus, SQL consists of a data definition language and a data manipulation language. The scope of SQL includes data insert, query, update and delete, schema creation and modification, and data access control. Although SQL is often described as, and to a great extent is, a declarative language (4GL), it also includes procedural elements. SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks." Despite not entirely adhering to the relational model as described by Codd, it became the most widely used database language. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then, the standard has been enhanced several times with added features. Despite these standards, code is not completely portable among different database systems, which can lead to vendor lock-in. Database makers do not perfectly adhere to the standard, for instance by adding extensions, and the standard itself is sometimes ambiguous.
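The operations named above (schema creation, insert, query, update, delete) might look as follows in MySQL. The reviews table and its columns are assumptions for illustration, not this project's actual schema:

```sql
-- Illustrative only: a hypothetical reviews table, not the project's schema
CREATE TABLE reviews (                            -- schema creation
  id      INT PRIMARY KEY AUTO_INCREMENT,
  product VARCHAR(100),
  rating  INT,
  body    TEXT
);
INSERT INTO reviews (product, rating, body)       -- data insert
VALUES ('phone', 5, 'great!');
SELECT product, AVG(rating) FROM reviews          -- query
GROUP BY product;
UPDATE reviews SET rating = 4 WHERE id = 1;       -- update
DELETE FROM reviews WHERE body IS NULL;           -- delete
```

Access control, the remaining part of SQL's scope, is handled with statements such as GRANT and REVOKE.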


NetBeans IDE
NetBeans IDE is the official IDE for Java 8. With its editors, code analyzers, and converters, you can quickly and smoothly upgrade your applications to use new Java 8 language constructs, such as lambdas, functional operations, and method references. Batch analyzers and converters are provided to search through multiple applications at the same time, matching patterns for conversion to new Java 8 language constructs. With its constantly improving Java editor, many rich features, and an extensive range of tools, templates, and samples, NetBeans IDE sets the standard for developing with cutting-edge technologies out of the box. An IDE is much more than a text editor. The NetBeans editor indents lines, matches words and brackets, and highlights source code syntactically and semantically. It also provides code templates, coding tips, and refactoring tools. The editor supports many languages, from Java, C/C++, XML, and HTML to PHP, Groovy, Javadoc, JavaScript, and JSP. Because the editor is extensible, you can plug in support for many other languages. Keeping a clear overview of large applications, with thousands of folders and files and millions of lines of code, is a daunting task. NetBeans IDE provides different views of your data, from multiple project windows to helpful tools for setting up your applications and managing them efficiently, letting you drill down into your data quickly and easily, while giving you versioning tools via Subversion, Mercurial, and Git integration out of the box. When new developers join your project, they can understand the structure of your application because your code is well organized. Design GUIs for Java SE, HTML5, Java EE, PHP, C/C++, and Java ME applications quickly and smoothly by using the editors and drag-and-drop tools in the IDE. For Java SE applications, the NetBeans GUI Builder automatically takes care of correct spacing and alignment, while supporting in-place editing as well.
The GUI Builder is so easy to use and intuitive that it has been used to prototype GUIs live at customer presentations. The cost of buggy code increases the longer it remains unfixed. NetBeans provides static analysis tools, especially integration with the widely used FindBugs tool, for identifying and fixing common problems in Java code. In addition, the NetBeans Debugger lets you place breakpoints in your source code, add field watches, step through your code, and run into methods. The NetBeans Profiler provides expert assistance for optimizing your application's speed and memory usage, and makes it easier to build reliable and scalable Java SE, JavaFX, and Java EE applications. NetBeans IDE includes a visual debugger for Java SE applications, letting you debug user interfaces without looking into source code. Take GUI snapshots of your applications and click on user interface elements to jump back into the related source code.


[Figure: screenshot of the NetBeans IDE.]

Fig. 4.3 Snapshot of NetBeans

Apache

The Apache HTTP Server is web server software notable for playing a key role in the initial growth of the World Wide Web. In 2009 it became the first web server software to surpass the 100 million website milestone. Apache is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation. Since April 1996 Apache has been the most popular HTTP server software in use; as of November 2010, it served over 59.36% of all websites and over 66.56% of the one million busiest websites.

Navicat Premium

Navicat Premium is a multi-connection database administration tool that lets you connect to MySQL, MariaDB, SQL Server, SQLite, Oracle, and PostgreSQL databases simultaneously within a single application, making administration of multiple kinds of databases easy. Navicat Premium combines the functions of the other Navicat editions and supports most of the features of MySQL, MariaDB, SQL Server, SQLite, Oracle, and PostgreSQL, including stored procedures, events, triggers, functions, views, and more. Navicat Premium enables you to easily and quickly transfer data across various database systems, or to a plain text file with a designated SQL format and encoding. Batch jobs for different kinds of databases can also be scheduled to run at a specific time. Other features include an Import/Export Wizard, Query Builder, Report Builder, Data Synchronization, Backup, Job Scheduler, and more. The features in Navicat are sophisticated enough to meet the specific needs of professional developers, yet easy to learn for users who are new to database servers. You can establish a secure SSH session through SSH tunnelling in Navicat, with strong authentication and secure encrypted communication between the two hosts; the authentication method can use a password or a public/private key pair. Navicat also comes with HTTP tunnelling for when your ISP does not allow direct connections to database servers but does allow HTTP connections: HTTP tunnelling connects to the server using the same protocol (http://) and the same port (port 80) as a web server.


CHAPTER 5

IMPLEMENTATION

5.1 Programming Language Selection

Java is a small, simple, secure, object-oriented, interpreted and dynamically optimized, byte-coded, architecture-neutral, garbage-collected, multithreaded programming language with strong exception handling for writing distributed and dynamically extensible programs. With most programming languages, you either compile or interpret a program so that you can run it on your computer. The Java programming language is unusual in that a program is both compiled and interpreted. The compiler first translates the program into platform-independent bytecodes, which are interpreted by the interpreter on the Java platform. The interpreter parses and runs each Java bytecode instruction on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The following figure illustrates how this works. You can think of Java bytecodes as the machine-code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it is a development tool or a web browser that can run applets, is an implementation of the Java VM.

Fig. 5.1 Features of Java


5.2 Selection of Platform

A platform is the hardware or software environment in which a program runs. Some of the most popular platforms are Windows 2000, Linux, Solaris, and MacOS. Most platforms can be described as a combination of an operating system and hardware. The Java platform differs from most other platforms in that it is a software-only platform that runs on top of other hardware-based platforms.

The Java platform has two components:

• The Java Virtual Machine (JVM)

• The Java Application Programming Interface (Java API)

We’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The figure depicts a program running on the Java platform: the Java API and the virtual machine insulate the program from the hardware.

Fig. 5.2 Java Interpreter Architecture
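The insulation described above can be seen in a short sketch: the program below is written only against classes from Java API packages (java.util here), so it behaves identically on any hardware platform that provides a JVM. The class name and the sample data are illustrative.

```java
// Sketch: a program built purely on Java API packages runs unchanged on
// any platform with a JVM. Uses only Java 8 standard-library classes.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PlatformDemo {
    public static void main(String[] args) {
        // Ready-made components from the java.util package.
        List<String> traits = new ArrayList<>(
                Arrays.asList("portable", "byte-coded", "garbage-collected"));
        Collections.sort(traits);    // same result on every JVM
        System.out.println(traits);  // [byte-coded, garbage-collected, portable]
    }
}
```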


5.3 Functional Description of Modules

• Shapelet Learning Model: Shapelets are discriminative subsequences of time series that best predict the target variable, while shapelet learning models are usually designed for a classification purpose that aims to identify the similarity between two items [4].

• Product Network Regularization: The product network provides correlation information about all online stores. We model three types of heterogeneous information network as regularization terms: store-based regularization, product-based regularization, and user-correlation regularization [9].

• Collaborative Hyping Detection Model: We propose our collaborative hyping detection model (CHDM) to solve the collective marketing hyping problem defined earlier. This model integrates all the regularization terms we've defined into a shapelet learning model that utilizes temporal features and product network information for clustering.
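As a minimal sketch of the shapelet primitive the first module builds on, the distance between a candidate shapelet and a time series is commonly taken as the minimum distance over all equal-length subsequences of the series. The class name, the sample data, and the use of squared Euclidean distance are illustrative assumptions, not the report's implementation.

```java
// Sketch of the core shapelet primitive: the distance between a shapelet
// and a time series is the minimum distance over all subsequences of the
// same length. A small (burst-shaped) distance indicates the series
// contains the pattern the shapelet describes.
public class ShapeletDistance {

    // Minimum squared Euclidean distance between `shapelet` and any
    // equal-length subsequence of `series`.
    public static double minDistance(double[] series, double[] shapelet) {
        double best = Double.POSITIVE_INFINITY;
        for (int start = 0; start + shapelet.length <= series.length; start++) {
            double d = 0.0;
            for (int i = 0; i < shapelet.length; i++) {
                double diff = series[start + i] - shapelet[i];
                d += diff * diff;
            }
            best = Math.min(best, d);
        }
        return best;
    }

    public static void main(String[] args) {
        double[] sales = {1, 1, 2, 9, 10, 9, 2, 1};  // a burst-shaped series
        double[] burstShapelet = {9, 10, 9};         // candidate burst pattern
        double[] flatShapelet  = {1, 1, 1};          // candidate flat pattern
        System.out.println(minDistance(sales, burstShapelet)); // 0.0 (exact match)
        System.out.println(minDistance(sales, flatShapelet));  // 1.0
    }
}
```

A learning model would optimize the shapelet values so that these distances best separate hyped items from clean ones, as in Grabocka et al. [7].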


CHAPTER 6

SYSTEM TESTING

Types of Testing:

• Unit Testing

Individual components are tested to ensure that they operate correctly. Each component is tested independently, without the other system components. This system was tested with a set of proper test data for each module, and the results were checked against the expected output. Unit testing focuses verification effort on the smallest unit of the software design, the module; it is also known as module testing. This testing was carried out in phases, and each module was found to work satisfactorily with regard to its expected output.
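A unit test in this spirit exercises one small method in isolation with known inputs and expected outputs. The sketch below tests a hypothetical helper mirroring the singleton-reviewer rule used later in this report; the class and method names are illustrative, and a plain check method is used so the sketch needs no test framework such as JUnit.

```java
// Illustrative unit test: one module-level method is exercised in
// isolation. isSingletonReviewer is a hypothetical unit under test,
// mirroring the "fewer than five transactions" rule in this report.
public class ReviewerModuleTest {

    // Hypothetical unit under test.
    static boolean isSingletonReviewer(int transactions) {
        return transactions < 5;
    }

    // Minimal check helper, avoiding any test-framework dependency.
    static void check(boolean condition, String name) {
        if (!condition) throw new AssertionError("failed: " + name);
    }

    public static void main(String[] args) {
        // Boundary and typical cases, checked independently of other modules.
        check(isSingletonReviewer(0), "zero transactions");
        check(isSingletonReviewer(4), "boundary below five");
        check(!isSingletonReviewer(5), "boundary at five");
        check(!isSingletonReviewer(100), "frequent buyer");
        System.out.println("all unit tests passed");
    }
}
```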

• Integration Testing

Integration testing is done to uncover errors associated with the flow of data across interfaces. The unit-tested modules are grouped together and tested in small segments, which makes it easier to isolate and correct errors. This approach is continued until all modules have been integrated to form the system as a whole.

• System Testing

System testing is a series of different tests whose primary purpose is to fully exercise the computer-based system. It ensures that the entire integrated software system meets its requirements, and it tests a configuration to ensure known and predictable results. An example of system testing is configuration-oriented system integration testing. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.

• Performance Testing

Performance testing ensures that output is produced within the required time limits, covering the time taken by the system to compile, respond to users, and service requests sent to it to retrieve results.
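A performance check of this kind can be sketched by timing a task and comparing the elapsed time against a limit. The 200 ms limit and the simulated work below are assumptions for the example, not figures from this report.

```java
// Sketch of a response-time check: measure how long a task takes and
// compare it against a time limit. The limit and workload are illustrative.
public class ResponseTimeCheck {

    // Wall-clock time, in milliseconds, taken by a task.
    public static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long millis = elapsedMillis(() -> {
            // Stand-in for handling a request and producing a response.
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        });
        System.out.println("elapsed ms: " + millis);
        System.out.println("within limit: " + (millis < 200));
    }
}
```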


• Validation Testing

Validation testing can be defined in many ways, but a simple definition is that validation succeeds when the software functions in a manner that can reasonably be expected by the end user.

• Black Box Testing

Black box testing is done to find the following:

– Incorrect or missing functions

– Interface errors

– Errors in external database access

– Performance errors

– Initialization and termination errors

• White Box Testing

This allows the tests to:

– Check whether all independent paths within a module have been exercised at least once

– Exercise all logical decisions on their true and false sides

– Execute all loops at their boundaries and within their boundaries

– Exercise the internal data structures to ensure their validity

– Ensure that all possible validity checks and validity lookups have been provided to validate data entry

• Acceptance Testing

This is the final stage of the testing process before the system is accepted for operational use. The system is tested with data supplied by the system procurer rather than with simulated data.


CHAPTER 7

CONCLUSION

As previously discussed, the MSSD model identifies spam stores or products one by one by detecting abnormal singleton reviewers appearing within an assigned time window. However, this method misses the latent information that underlies evolving hyping activities. We picked two representative cases that were tagged as "spam" by the MSSD model but that our model placed in the "clean" class. There is a remarkable purchasing burst in both of them, with 80 percent of the buyers in the time window being singleton reviewers. In our experiment, we define customers who have made fewer than five transactions online since their registration as singleton reviewers. Given the different customer-level segmentation strategies and privacy policies on Taobao, this provides the best match with the definition of singleton reviewers in the MSSD model.
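The singleton-reviewer rule and the burst share above can be sketched as follows; the class name, the window data, and the 0.8 flagging threshold (taken from the 80 percent figure) are illustrative assumptions, not the experiment's code.

```java
// Sketch of the singleton-reviewer check: a buyer with fewer than five
// transactions since registration is a singleton reviewer, and a window
// is suspicious under an MSSD-style rule when singletons make up a large
// share of its buyers. Data and the 0.8 threshold are illustrative.
public class SingletonReviewerCheck {

    // "Fewer than five transactions since registration."
    public static boolean isSingleton(int transactionCount) {
        return transactionCount < 5;
    }

    // Fraction of buyers in a time window who are singleton reviewers.
    public static double singletonFraction(int[] buyerTransactionCounts) {
        int singletons = 0;
        for (int count : buyerTransactionCounts) {
            if (isSingleton(count)) singletons++;
        }
        return (double) singletons / buyerTransactionCounts.length;
    }

    public static void main(String[] args) {
        int[] window = {1, 2, 1, 3, 12, 1, 4, 1, 20, 2};  // 8 of 10 are singletons
        double fraction = singletonFraction(window);
        System.out.println(fraction);           // 0.8
        System.out.println(fraction >= 0.8);    // flagged by an MSSD-style rule
    }
}
```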


REFERENCES

[1] F. Li, M. Huang, Y. Yang, and X. Zhu, "Learning to Identify Review Spam," Proc. Int'l Joint Conf. Artificial Intelligence, 2011, pp. 2488–2493.

[2] S. Feng et al., "Distributional Footprints of Deceptive Product Reviews," Proc. Int'l Conf. Web and Social Media, 2012, pp. 98–105.

[3] G. Fei et al., "Exploiting Burstiness in Reviews for Review Spammer Detection," Proc. Int'l Conf. Web and Social Media, 2013, pp. 175–184.

[4] Q. Xu et al., "SMS Spam Detection Using Noncontent Features," IEEE Intelligent Systems, vol. 27, no. 6, 2012, pp. 44–51.

[5] S. Xie et al., "Review Spam Detection via Temporal Pattern Discovery," Proc. ACM Int'l Conf. Knowledge Discovery and Data Mining, 2012, pp. 823–831.

[6] T. Moore, R. Clayton, and H. Stern, "Temporal Correlations between Spam and Phishing Websites," Proc. 2nd Usenix Conf. Large-Scale Exploits and Emergent Threats, 2009, p. 5.

[7] J. Grabocka et al., "Learning Time-Series Shapelets," Proc. ACM Int'l Conf. Knowledge Discovery and Data Mining, 2014, pp. 392–401.

[8] J. Lines et al., "A Shapelet Transform for Time Series Classification," Proc. ACM Int'l Conf. Knowledge Discovery and Data Mining, 2012, pp. 289–297.

[9] Q. Zhang et al., "Exploring Heterogeneous Product Networks for Discovering Collective Marketing Hyping Behavior," Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining, 2016, pp. 40–51.


Appendix A

Snapshots

Fig. A 1 Registration Page


Fig. A 2 Login Page


Fig. A 3 User Home


Fig. A 4 IP Blocking


Fig. A 5 Blocked User Message


Appendix B

Conference Details

Presenting and publishing a paper entitled "Collective Hyping Detection System To Identify Online Spam Activities Using AI" in the proceedings of the National Conference on Science Engineering and Management (NCSEM-2018), which will be held on 24–25 May 2018 at The Oxford College of Engineering, Bengaluru.

