Tag Archives: used

Web Data Extraction, Web Extraction Tool, Web Extraction Software

At present, Internet has become an inevitable medium for conducting various types of businesses through e-commerce and financial websites. It has increased the use of various web data mining techniques and tools such that it could be easy for extraction of data from web pages.
Web data mining includes different process for summarizing and collecting various types of data from online web pages or websites for analysis purpose. Using web data mining, it could be very easy to find out potential competitors by analyzing the market data of components and thereby improving our product sales through promotion and marketing. Web data mining can be of different types such as usage mining, content mining and structure mining. Content mining can be used to get data form website like audio, video, text and images. The usage mining is focused on getting reports from web servers about user access on particular website. It can help the business owner to improve the design and structure of their website to increase the usage. Structure data mining is focused on finding the resemblance between different websites.
By using web data mining with the help of techniques and tools, business owners could easily predict the potential growth of a particular product in selective market. Web data mining is the major technique for customer survey and analysis regarding particular product in market by gather data from web pages. With web data mining, you can predict the future of your business and make necessary changes.

Web scraping will make use of different method for extracting contents from web pages and can save the amount of effort and time taken to acquire bulk data from websites. Web scraping system can help you get fast and accurate results easily which is impossible to obtain manually. You can use web data mining technique for gather product pricing data, capture of real estate information, financial data and market trends from financial websites, duplicating online databases, job posting, auction details etc.

Web Scraping is the use of automated scripts or software applications to extract data from target websites and helps in conversion of data in unstructured format to structured format. The web scraping applications or scripts will simulate the process individuals viewing the website using a browser. By using web scraping technique, you can easily gather the data from web pages similar to how a browser request for information from web pages. The web server will respond to your requests and you can extract website contents and store them in various formats like word, excel, access, sql database or any other format as you like.

Web scrapping tools can reduce the time taken to extract information from websites drastically. There are power web data mining tools which are based on W3C DOM (Document object model). Such tools allow users to develop accurate crawling rules to extract website content. The web data mining tools have graphical interface which is user friendly and does not require any programming. These web scraping tools have enhanced extraction rules and HTML tag filtering that makes it easy to ignore HTML tags and extracts web contents. Web scraping tools are very powerful, fast and flexible. They have support for regular expression, multi-page and multi-URL modes of extraction. Web scraping tools could be used for creating RSS feeds from web source and they can be used for exporting results to XML format.
Web scraping tools can be used to extract URL, text and HTML source code. Web scraping tools come as both standalone windows application and also portable format and could be executed without installing them in a machine.

How to get a self signed certificate?

SSL makes use of what is known as asymmetric cryptography, commonly referred to as public key cryptography (PKI). With public key cryptography, two keys are created, one public, one private. Anything encrypted with either key can only be decrypted with its corresponding key. Thus if a message or data stream were encrypted with the server’s private key, it can be decrypted only using its corresponding public key, ensuring that the data only could have come from the server.

If SSL utilizes public key cryptography to encrypt the data stream traveling over the Internet, why is a certificate necessary? The technical answer to that question is that a certificate is not really necessary – the data is secure and cannot easily be decrypted by a third party. However, certificates do serve a crucial role in the communication process. The certificate, signed by a trusted Certificate Authority (CA), ensures that the certificate holder is really who he claims to be. Without a trusted signed certificate, your data may be encrypted; however, the party you are communicating with may not be whom you think. Without certificates, impersonation attacks would be much more common.

Step 1: Generate a Private Key

The openssl toolkit is used to generate an RSA Private Key and CSR (Certificate Signing Request). It can also be used to generate self-signed certificates which can be used for testing purposes or internal usage.

The first step is to create your RSA Private Key. This key is a 1024 bit RSA key which is encrypted using Triple-DES and stored in a PEM format so that it is readable as ASCII text.

Step 2: Generate a CSR (Certificate Signing Request)

Once the private key is generated a Certificate Signing Request can be generated. The CSR is then used in one of two ways. Ideally, the CSR will be sent to a Certificate Authority, such as Thawte or Verisign who will verify the identity of the requestor and issue a signed certificate. The second option is to self-sign the CSR, which will be demonstrated in the next section.

During the generation of the CSR, you will be prompted for several pieces of information. These are the X.509 attributes of the certificate. One of the prompts will be for “Common Name (e.g., YOUR name)”. It is important that this field be filled in with the fully qualified domain name of the server to be protected by SSL. If the website to be protected will be https://public.akadia.com, then enter public.akadia.com at this prompt.

Step 3: Remove Passphrase from Key

One unfortunate side-effect of the pass-phrased private key is that Apache will ask for the pass-phrase each time the web server is started. Obviously this is not necessarily convenient as someone will not always be around to type in the pass-phrase, such as after a reboot or crash. mod_ssl includes the ability to use an external program in place of the built-in pass-phrase dialog, however, this is not necessarily the most secure option either. It is possible to remove the Triple-DES encryption from the key, thereby no longer needing to type in a pass-phrase. If the private key is no longer encrypted, it is critical that this file only be readable by the root user! If your system is ever compromised and a third party obtains your unencrypted private key, the corresponding certificate will need to be revoked.

Step 4: Generating a Self-Signed Certificate

At this point you will need to generate a self-signed certificate because you either don’t plan on having your certificate signed by a CA, or you wish to test your new SSL implementation while the CA is signing your certificate. This temporary certificate will generate an error in the client browser to the effect that the signing certificate authority is unknown and not trusted.

Step 5: Installing the Private Key and Certificate

When Apache with mod_ssl is installed, it creates several directories in the Apache config directory. The location of this directory will differ depending on how Apache was compiled.

Step 6: Configuring SSL Enabled Virtual Hosts

Step 7: Restart Apache and Test

How to get a self signed certificate?

SSL makes use of what is known as asymmetric cryptography, commonly referred to as public key cryptography (PKI). With public key cryptography, two keys are created, one public, one private. Anything encrypted with either key can only be decrypted with its corresponding key. Thus if a message or data stream were encrypted with the server’s private key, it can be decrypted only using its corresponding public key, ensuring that the data only could have come from the server.

If SSL utilizes public key cryptography to encrypt the data stream traveling over the Internet, why is a certificate necessary? The technical answer to that question is that a certificate is not really necessary – the data is secure and cannot easily be decrypted by a third party. However, certificates do serve a crucial role in the communication process. The certificate, signed by a trusted Certificate Authority (CA), ensures that the certificate holder is really who he claims to be. Without a trusted signed certificate, your data may be encrypted; however, the party you are communicating with may not be whom you think. Without certificates, impersonation attacks would be much more common.

Step 1: Generate a Private Key

The openssl toolkit is used to generate an RSA Private Key and CSR (Certificate Signing Request). It can also be used to generate self-signed certificates which can be used for testing purposes or internal usage.

The first step is to create your RSA Private Key. This key is a 1024 bit RSA key which is encrypted using Triple-DES and stored in a PEM format so that it is readable as ASCII text.

Step 2: Generate a CSR (Certificate Signing Request)

Once the private key is generated a Certificate Signing Request can be generated. The CSR is then used in one of two ways. Ideally, the CSR will be sent to a Certificate Authority, such as Thawte or Verisign who will verify the identity of the requestor and issue a signed certificate. The second option is to self-sign the CSR, which will be demonstrated in the next section.

During the generation of the CSR, you will be prompted for several pieces of information. These are the X.509 attributes of the certificate. One of the prompts will be for “Common Name (e.g., YOUR name)”. It is important that this field be filled in with the fully qualified domain name of the server to be protected by SSL. If the website to be protected will be https://public.akadia.com, then enter public.akadia.com at this prompt.

Step 3: Remove Passphrase from Key

One unfortunate side-effect of the pass-phrased private key is that Apache will ask for the pass-phrase each time the web server is started. Obviously this is not necessarily convenient as someone will not always be around to type in the pass-phrase, such as after a reboot or crash. mod_ssl includes the ability to use an external program in place of the built-in pass-phrase dialog, however, this is not necessarily the most secure option either. It is possible to remove the Triple-DES encryption from the key, thereby no longer needing to type in a pass-phrase. If the private key is no longer encrypted, it is critical that this file only be readable by the root user! If your system is ever compromised and a third party obtains your unencrypted private key, the corresponding certificate will need to be revoked.

Step 4: Generating a Self-Signed Certificate

At this point you will need to generate a self-signed certificate because you either don’t plan on having your certificate signed by a CA, or you wish to test your new SSL implementation while the CA is signing your certificate. This temporary certificate will generate an error in the client browser to the effect that the signing certificate authority is unknown and not trusted.

Step 5: Installing the Private Key and Certificate

When Apache with mod_ssl is installed, it creates several directories in the Apache config directory. The location of this directory will differ depending on how Apache was compiled.

Step 6: Configuring SSL Enabled Virtual Hosts

Step 7: Restart Apache and Test