removeHtmlTag java program?
Instructions:
Web developers use HTML tags in angle brackets to format the text on web pages. Write a method removeTag that checks whether a given string starts with an apparent HTML tag (a character or word in angle brackets) and ends with a matching closing HTML tag (a character or word preceded by the '/' character, all in angle brackets). If yes, the method removes both tags and returns the result; otherwise the method returns the original string unchanged. For example,
removeTag("<b>Strings are immutable</b>") should return a string equal to "Strings are immutable".
This is what I have so far, but I cannot check to see if I'm on the right track because something is weird with my compiler:
import java.util.Scanner*;
public class Html2Text
{
    public static void main(String[]args)
    {
        System.out.println("Enter a weblink address: ");
        Scanner in = new Scanner(System.in);
        String input = Scanner.nextString();
        Systm.out.println(replaceAll);
    }
    private static final Pattern REMOVE_TAGS = Pattern.compile("<.+?>");
    public static String removeTags(String string)
    {
        if (string == null || string.length() == 0)
        {
            return string;
        }
        Matcher m = REMOVE_TAGS.matcher(string);
        return m.replaceAll("");
    }
}
Comments
Here are my suggestions to fix your code:
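1. Fix the import at the top (import java.util.Scanner; with no asterisk). When you read the input from the user, call the method on your Scanner object in, not on the Scanner class. Scanner has no nextString() method, so use the following line of code:
String input = in.next();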
2. The line of code:
Systm.out.println(replaceAll);
won't compile: Systm is a typo for System, and there is no variable named replaceAll. Remove it.
3. The removeTags() method seems to work as is, once you import java.util.regex.Pattern and java.util.regex.Matcher.
4. The main() method needs some work.
a. Once we get the String from the user representing the URL, we need to create a URL object.
b. We need to create an InputStreamReader from the URL to be able to read in the whole web page.
5. Try working through these changes yourself first; if you get stuck, look at my solution to see how the whole thing fits together. It's hard to explain in English, so the Java code is the clearest guide.
6. I've assumed that the web page is encoded in UTF-8, which is common. Pages in other encodings may come out with garbled characters.
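To make steps 4a and 4b concrete, here is a minimal sketch of how the pieces could fit together. It is not necessarily the same as the linked solution. The helper names readAll and fetch are made up for this example, the class is left non-public so it compiles from a file of any name, and UTF-8 is assumed as in point 6. Because '.' does not match line breaks by default, a tag that spans several lines would be left in place.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.regex.Pattern;

class Html2Text {
    // Non-greedy match of everything from '<' up to the next '>'.
    private static final Pattern REMOVE_TAGS = Pattern.compile("<.+?>");

    public static String removeTags(String string) {
        if (string == null || string.isEmpty()) {
            return string;
        }
        return REMOVE_TAGS.matcher(string).replaceAll("");
    }

    // Hypothetical helper: reads everything the reader supplies into one String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    // Hypothetical helper for steps 4a and 4b: build a URL object, then wrap
    // its stream in an InputStreamReader (UTF-8 is an assumption).
    static String fetch(String address) throws IOException {
        URL url = new URL(address);
        try (Reader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return readAll(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Enter a weblink address: ");
        Scanner in = new Scanner(System.in);
        if (!in.hasNext()) {
            return; // nothing was typed, so there is nothing to fetch
        }
        String input = in.next();
        System.out.println(removeTags(fetch(input)));
    }
}
```

Keeping the reading logic in readAll means it can be tried on a StringReader without touching the network.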
Let me know if you have questions or comments.
My Solution:
http://ideone.com/jZjO1e
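One more note: read literally, the assignment asks for a removeTag method that only strips a matching pair of tags at the very start and end of the string, and otherwise returns the string unchanged. That is narrower than stripping every tag. Here is a rough sketch of that reading. The class name TagRemover is made up for this example, and it assumes a tag name is just letters, digits or underscores with no attributes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagRemover {
    // Group 1 is the tag name, group 2 is the text between the tags.
    // The backreference \1 forces the closing tag to use the same name.
    // DOTALL lets the text between the tags contain line breaks.
    private static final Pattern WRAPPED =
            Pattern.compile("<(\\w+)>(.*)</\\1>", Pattern.DOTALL);

    public static String removeTag(String s) {
        if (s == null) {
            return null;
        }
        Matcher m = WRAPPED.matcher(s);
        // matches() succeeds only if the whole string fits the pattern,
        // i.e. it starts with <name> and ends with </name>.
        return m.matches() ? m.group(2) : s;
    }

    public static void main(String[] args) {
        System.out.println(removeTag("<b>Strings are immutable</b>")); // Strings are immutable
        System.out.println(removeTag("<b>mismatched</i>"));            // <b>mismatched</i>
        System.out.println(removeTag("no tags here"));                 // no tags here
    }
}
```

Using matches() rather than find() is what enforces the "starts with" and "ends with" conditions. Tags in the middle of the string are left alone.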