CakePHP: How to add Search Engine Friendly (SEF) URLs
How many times have you wondered how great it would be if your URLs didn't look like this:
http://www.server.com/posts/view/1058
But more like this:
http://www.server.com/posts/view/my_first_post
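As a rough illustration of the idea, here is a minimal sketch of turning a post title into a slug such as my_first_post. The function name slugify is hypothetical and is not taken from the Bakery article, which may do this differently.

```php
<?php
// Hypothetical helper (not from the Bakery article): build a URL-friendly
// slug from a post title.
function slugify($title) {
    // Lowercase the title and drop surrounding whitespace.
    $slug = strtolower(trim($title));
    // Collapse every run of characters other than a-z and 0-9 into one underscore.
    $slug = preg_replace('/[^a-z0-9]+/', '_', $slug);
    // Remove any underscore left at either end.
    return trim($slug, '_');
}

echo slugify('My First Post!'), "\n"; // prints "my_first_post"
```

The slug would then be stored next to the post so the view action can look the post up by slug instead of by numeric id.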
Read the rest at:
http://bakery.cakephp.org/articles/view/adding-friendly-urls-to-the-cake-blog-tutorial
PHP4: How to Steal from Yahoo! (another Web Screen Scraper)
If you've been wondering how to scrape some information from your favorite website, here's an example of how to do it in PHP 4. Web screen scraping techniques have been around since the dinosaur age. Thanks to:
- Mozilla Firefox and Firebug, for spying on the page and getting a bird's-eye view so we can plan a strategy.
- The good old snoopy-php project, which does the work of an HTTP client, snipping and sucking down the whole page.
- And phphtmlparser, which makes the dirty work of HTML parsing a lot easier, taking down the enemy element by element.
First, our target operation is to grab the list of current box office hits from Yahoo! Movies.
Target located: http://movies.yahoo.com/mv/boxoffice/ ..... locked on!!!
We need to see the HTML layout of the page, using Firefox and Firebug.

Look at the right side of the screenshot.
The area we want to steal starts with 'Top Movies' and ends with 'Top Cast/Crew'.
Here's the part of the code that marks this area; you'll see it again in the full listing later.
...
incIf($step1, 0, $parser->iNodeValue == 'Top Movies');
...
incIf($step1, 1, $parser->iNodeValue == 'Top Cast/Crew');
...
Now look at the bottom part: the full element tree, starting from the root html tag and ending at the b tag.

We decide that b < font < a < td should be enough to distinguish the element from the others.
So we will use a variable $step2 for digging. If $step2 == 0 and the current element is td, we set $step2 to 1. If $step2 == 1 and the current element is a, we increment it. $step2 keeps digging through font until we reach the treasure box in the b tag. Finally, we print out what is inside the treasure box and go back up to the surface by setting $step2 to 0.
Now it's time for the full code, guys.
If you're too lazy to copy and paste it, just download the source code of the web scraping tutorial (PHP 4).
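The digging can be tried out without fetching any page. The sketch below copies incIf and doIf from the tutorial's listing and runs them over a made-up node stream. The $nodes array, the collectTitles function and the movie names are assumptions for illustration only; they stand in for what phphtmlparser would report.

```php
<?php
// incIf and doIf are copied from the tutorial's listing.
function incIf(&$step_counter, $current_step, $condition) {
    if (($step_counter == $current_step) && ($condition)) {
        $step_counter = $current_step + 1;
    }
}

function doIf(&$step_counter, $current_step, $condition) {
    if (($step_counter == $current_step) && ($condition)) {
        $step_counter = 0;
        return true;
    }
    return false;
}

// Hypothetical node stream: array(node name, is it a text node?, node value).
$nodes = array(
    array('TD',   false, ''),
    array('A',    false, ''),
    array('FONT', false, ''),
    array('B',    false, ''),
    array('Text', true,  'Movie One'),
    array('TD',   false, ''),
    array('Text', true,  'ignored'),   // text directly in td: not deep enough
    array('A',    false, ''),
    array('FONT', false, ''),
    array('B',    false, ''),
    array('Text', true,  'Movie Two'),
);

// Hypothetical helper: collect the text found at depth td > a > font > b.
function collectTitles($nodes) {
    $step2 = 0;
    $titles = array();
    foreach ($nodes as $node) {
        list($name, $isText, $value) = $node;
        incIf($step2, 0, $name == 'TD');
        incIf($step2, 1, $name == 'A');
        incIf($step2, 2, $name == 'FONT');
        incIf($step2, 3, $name == 'B');
        // Only text seen after all four tags counts; then go back to the surface.
        if (doIf($step2, 4, $isText)) {
            $titles[] = $value;
        }
    }
    return $titles;
}

echo implode(', ', collectTitles($nodes)), "\n"; // prints "Movie One, Movie Two"
```

The text node that sits directly inside the td is skipped because $step2 is only 1 at that point, which is the whole purpose of counting the steps.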
<?php
// Let's hire both experts
include('Snoopy.class.php');
include('htmlparser.inc');

// Move and dig deeper: advance the counter by one step
// only when we are at $current_step and the condition holds
function incIf(&$step_counter, $current_step, $condition) {
    if (($step_counter == $current_step) && ($condition)) {
        $step_counter = $current_step + 1;
    }
}

// If it's deep enough, take it and leave:
// reset the counter to 0 and report success
function doIf(&$step_counter, $current_step, $condition) {
    if (($step_counter == $current_step) && ($condition)) {
        $step_counter = 0;
        return true;
    }
    return false;
}

// C'mon snoopy suck that page
$snooper = new Snoopy();
if ($snooper->fetch('http://movies.yahoo.com/mv/boxoffice/')) {
    // Pass the page to HtmlParser, and let him do his work
    $parser = new HtmlParser($snooper->results);
    $step1 = 0;
    $step2 = 0;
    echo "TODAY's BOX OFFICE\r\n<br/>";
    while ($parser->parse()) {
        // $step1 is 1 only between 'Top Movies' and 'Top Cast/Crew'
        incIf($step1, 0, $parser->iNodeValue == 'Top Movies');
        incIf($step1, 1, $parser->iNodeValue == 'Top Cast/Crew');
        if ($step1 == 1) {
            // Dig down td > a > font > b
            incIf($step2, 0, $parser->iNodeName == 'TD');
            incIf($step2, 1, $parser->iNodeName == 'A');
            incIf($step2, 2, $parser->iNodeName == 'FONT');
            incIf($step2, 3, $parser->iNodeName == 'B');
            // The statement below was missing from the listing; printing the
            // text node is what the walkthrough describes
            if (doIf($step2, 4, $parser->iNodeType == NODE_TYPE_TEXT)) {
                echo $parser->iNodeValue . "\r\n<br/>";
            }
        }
    }
}
?>
Tags: php4, screen scraping, code example