WARNINGBIRD:
A Near Real-time Detection System for Suspicious URLs in Twitter Stream
Abstract:
Twitter is prone to malicious tweets
containing URLs for spam, phishing, and malware distribution. Conventional
Twitter spam detection schemes utilize account features such as the ratio of
tweets containing URLs and the account creation date, or relation features in
the Twitter graph. These detection schemes are either ineffective against feature
fabrication or consume considerable time and resources. Conventional suspicious URL
detection schemes utilize several features including lexical features of URLs,
URL redirection, HTML content, and dynamic behavior. However, attackers can bypass
them with evasion techniques such as time-based evasion and crawler evasion. In this paper,
we propose WARNINGBIRD, a suspicious URL detection system for Twitter. Our
system investigates correlations of URL redirect chains extracted from several
tweets. Because attackers have limited resources and usually reuse them, their
URL redirect chains frequently share the same URLs. We develop methods to
discover correlated URL redirect chains using the frequently shared URLs and to
determine their suspiciousness. We collect numerous tweets from the Twitter
public timeline and build a statistical classifier using them. Evaluation
results show that our classifier accurately and efficiently detects suspicious
URLs. We also present WARNINGBIRD as a near real-time system for classifying
suspicious URLs in the Twitter stream.
EXISTING SYSTEM:
In the existing system, attackers use
shortened malicious URLs that redirect Twitter users to external attack
servers. To cope with malicious tweets, several Twitter spam detection schemes
have been proposed. These schemes can be classified into account feature-based,
relation feature-based, and message feature-based schemes. Account
feature-based schemes use the distinguishing features of spam accounts such as
the ratio of tweets containing URLs, the account creation date, and the number
of followers and friends. However, malicious users can easily fabricate these
account features. The relation feature-based schemes rely on more robust
features that malicious users cannot easily fabricate such as the distance and
connectivity apparent in the Twitter graph. Extracting these relation features
from a Twitter graph, however, requires a significant amount of time and
resources because the Twitter graph is tremendous in size. Message feature-based
schemes focus on the lexical features of messages. However, spammers can
easily change the shape of their messages. A number of suspicious URL detection
schemes have also been introduced.
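To make the account feature mentioned above concrete, the following is a minimal sketch of how the ratio of tweets containing URLs could be computed. The class name, method name, and the simple substring test for URLs are assumptions made for illustration; they are not taken from the paper. The sketch also suggests why such a feature is easy to fabricate: an account only needs to post more tweets without URLs to lower the ratio.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original system.
class AccountFeatures {

    /**
     * Returns the fraction of tweets that contain at least one http(s) URL.
     * Returns 0.0 for an empty list. URL detection is a plain substring
     * check, which is an assumption made to keep the sketch short.
     */
    static double urlTweetRatio(List<String> tweets) {
        if (tweets.isEmpty()) {
            return 0.0;
        }
        int withUrl = 0;
        for (String tweet : tweets) {
            if (tweet.contains("http://") || tweet.contains("https://")) {
                withUrl++;
            }
        }
        return (double) withUrl / tweets.size();
    }

    public static void main(String[] args) {
        // Made-up sample tweets; three of the four contain a URL.
        List<String> tweets = Arrays.asList(
                "Check this out http://example.com/a",
                "Good morning everyone",
                "Free offer https://example.org/b",
                "Another link http://example.net/c");
        System.out.println("URL tweet ratio: " + urlTweetRatio(tweets));
    }
}
```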
DISADVANTAGES OF EXISTING SYSTEM:
· Malicious servers can bypass an
investigation by selectively providing benign pages to crawlers.
· For instance, because static crawlers
usually cannot handle JavaScript or Flash, malicious servers can use these
technologies to deliver malicious content only to normal browsers.
· A recent technical report from Google
has also discussed techniques for evading current Web malware detection
systems.
· Malicious servers can also employ
temporal behaviors (providing different content at different times) to evade an
investigation.
PROPOSED SYSTEM:
In this paper, we propose WARNINGBIRD, a
suspicious URL detection system for Twitter. Instead of investigating the
landing pages of individual URLs in each tweet, which may not be successfully
fetched, we consider correlations of URL redirect chains extracted from a
number of tweets. Because attackers’ resources are generally limited and need
to be reused, their URL redirect chains usually share the same URLs. We
therefore develop a method to detect correlated URL redirect chains using such
frequently shared URLs. By analyzing the correlated URL redirect chains and
their tweet context information, we discover several features that can be used
to classify suspicious URLs. We collect a large number of tweets from the
Twitter public timeline and train a statistical classifier using the
discovered features.
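The idea of finding correlated URL redirect chains through frequently shared URLs can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact procedure: each redirect chain is assumed to be already extracted as a list of URL strings, and a URL counts as "frequently shared" when it appears in at least `minChains` distinct chains. The class name, method name, threshold parameter, and sample URLs are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the authors' implementation.
class RedirectChainCorrelation {

    /**
     * Maps every URL that occurs in at least minChains distinct redirect
     * chains to the sorted indices of the chains that contain it.
     */
    static Map<String, Set<Integer>> sharedUrls(List<List<String>> chains, int minChains) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int i = 0; i < chains.size(); i++) {
            for (String url : chains.get(i)) {
                // A set per URL, so a URL repeated inside one chain counts once.
                index.computeIfAbsent(url, k -> new TreeSet<>()).add(i);
            }
        }
        // Drop URLs that are not shared by enough chains.
        index.values().removeIf(chainIds -> chainIds.size() < minChains);
        return index;
    }

    public static void main(String[] args) {
        // Made-up chains: the first two pass through the same intermediate URL.
        List<List<String>> chains = Arrays.asList(
                Arrays.asList("http://short.example/a", "http://redirect.example/hub",
                        "http://landing1.example/"),
                Arrays.asList("http://short.example/b", "http://redirect.example/hub",
                        "http://landing2.example/"),
                Arrays.asList("http://short.example/c", "http://news.example/story"));
        System.out.println(sharedUrls(chains, 2));
    }
}
```

With the sample data, only the intermediate hub URL is reported, linked to chains 0 and 1. Those two chains would then be treated as one correlated group whose combined features are passed to the classifier.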
ADVANTAGES OF PROPOSED SYSTEM:
The trained classifier is shown to be accurate,
with low false positive and false negative rates. The contributions of this paper are
as follows:
• We present a new suspicious URL detection system for Twitter that is based on the
correlations of URL redirect chains, which are difficult to fabricate. The
system can find correlated URL redirect chains using the frequently shared URLs
and determine their suspiciousness in almost real time.
• We introduce new features of suspicious URLs: some are newly discovered,
while others are variations of previously discovered features.
• We present the results of investigations conducted on suspicious URLs that have
been widely distributed through Twitter over several months.
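The false positive and false negative rates referred to above follow the standard definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The sketch below computes both from confusion-matrix counts. The class and method names are hypothetical, and the counts in `main` are made-up numbers for illustration only, not results from the paper.

```java
// Hypothetical helper for the standard error-rate definitions.
class ClassifierRates {

    /** FPR = FP / (FP + TN): share of benign URLs wrongly flagged as suspicious. */
    static double falsePositiveRate(int falsePositives, int trueNegatives) {
        int benign = falsePositives + trueNegatives;
        return benign == 0 ? 0.0 : (double) falsePositives / benign;
    }

    /** FNR = FN / (FN + TP): share of suspicious URLs the classifier missed. */
    static double falseNegativeRate(int falseNegatives, int truePositives) {
        int suspicious = falseNegatives + truePositives;
        return suspicious == 0 ? 0.0 : (double) falseNegatives / suspicious;
    }

    public static void main(String[] args) {
        // Illustrative counts only; these are not measurements from the paper.
        System.out.println("FPR: " + falsePositiveRate(2, 98));
        System.out.println("FNR: " + falseNegativeRate(5, 45));
    }
}
```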
SYSTEM ARCHITECTURE:
ALGORITHM USED:
• Offline supervised learning algorithm
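As an illustration of offline supervised learning, the sketch below trains a small logistic regression model with batch gradient descent on labeled feature vectors and then scores new vectors. This summary does not name the specific learner, so logistic regression is an assumption here, as are the class name, the two unnamed features, the learning rate, and the toy training data.

```java
// Hypothetical sketch: logistic regression is assumed, not confirmed by this summary.
class OfflineLogisticRegression {
    private final double[] weights;
    private double bias;

    OfflineLogisticRegression(int featureCount) {
        this.weights = new double[featureCount];
    }

    private static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Returns the estimated probability that the feature vector is suspicious. */
    double predict(double[] features) {
        double z = bias;
        for (int j = 0; j < weights.length; j++) {
            z += weights[j] * features[j];
        }
        return sigmoid(z);
    }

    /** Offline training: full passes over a fixed, labeled data set (1 = suspicious). */
    void train(double[][] samples, int[] labels, int epochs, double learningRate) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] gradW = new double[weights.length];
            double gradB = 0.0;
            for (int i = 0; i < samples.length; i++) {
                double error = predict(samples[i]) - labels[i];
                for (int j = 0; j < weights.length; j++) {
                    gradW[j] += error * samples[i][j];
                }
                gradB += error;
            }
            // Average the gradient over the batch and take one descent step.
            for (int j = 0; j < weights.length; j++) {
                weights[j] -= learningRate * gradW[j] / samples.length;
            }
            bias -= learningRate * gradB / samples.length;
        }
    }

    public static void main(String[] args) {
        // Toy data with two made-up features scaled to [0, 1].
        double[][] samples = {{0.9, 0.8}, {0.8, 0.9}, {0.1, 0.2}, {0.2, 0.1}};
        int[] labels = {1, 1, 0, 0};
        OfflineLogisticRegression model = new OfflineLogisticRegression(2);
        model.train(samples, labels, 2000, 0.5);
        System.out.println("Score for {0.85, 0.9}: " + model.predict(new double[] {0.85, 0.9}));
        System.out.println("Score for {0.1, 0.15}: " + model.predict(new double[] {0.1, 0.15}));
    }
}
```

Because training happens offline on a collected data set, the near real-time part of the system only needs the inexpensive `predict` step for each new group of URLs.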
SYSTEM CONFIGURATION:
HARDWARE CONFIGURATION:
• Processor - Pentium IV
• Speed - 1.1 GHz
• RAM - 256 MB (min)
• Hard Disk - 20 GB
• Keyboard - Standard Windows Keyboard
• Mouse - Two or Three Button Mouse
• Monitor - SVGA
SOFTWARE CONFIGURATION:
• Operating System : Windows XP
• Programming Language : Java
• Java Version : JDK 1.6 or above
REFERENCE:
Sangho Lee, Student Member, IEEE, and
Jong Kim, Member, IEEE, “WARNINGBIRD: A Near Real-time Detection System
for Suspicious URLs in Twitter Stream,” IEEE
TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. X, NO. Y, JANUARY 2013.