Channel: Active questions tagged feed - Stack Overflow

Scraping a website feed is not giving me the new website announcement on time with Python

We're a couple of people who are just starting out in programming, with only basic knowledge. We want to monitor a website so that we get new announcements on time (VERY on time), right at the moment they are published.

We've had good results with the monitoring itself: the requests complete every second or half second. We first monitored the website with Beautiful Soup together with Selenium WebDriver, but since the latter is very slow, we switched to BS4 alone. The only content we need is the title of the new announcement, but on the page it is rendered with JavaScript, so we are using the site's RSS feed instead. When we look at the pubDate of a new item, it does not match the time of our requests, even though the requests themselves complete on time.

This is the code:

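    import traceback
    from datetime import datetime

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.minuto30.com/feed'

    # Initial fetch: remember how many items the feed has at startup.
    r = requests.get(url)
    soup = BeautifulSoup(r.content, features='xml')
    items = soup.find_all('item')
    current_announces = len(items)
    new_announces = current_announces
    announces = 0

    while True:
        try:
            if announces == 0:
                # Time one fetch-and-parse cycle of the feed.
                start = datetime.now()
                r = requests.get(url)
                soup = BeautifulSoup(r.content, features='xml')
                items = soup.find_all('item')
                current_announces = len(items)
                # Non-zero once the item count differs from the startup count.
                announces = new_announces - current_announces
                end = datetime.now()
                time_taken = end - start
                print(time_taken)
        except Exception:
            traceback.print_exc()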
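
Note that the loop above only reacts when the number of items changes. Many feeds keep a fixed number of items, in which case a new announcement replaces an old one and the count stays the same. A minimal sketch of comparing item identifiers instead, assuming each item carries a `guid` or `link` element (not checked against this particular feed). The helper names `item_ids` and `new_items` and the two sample polls are hypothetical, and the standard-library parser is used instead of BS4 only so the snippet runs without extra installs:

```python
import xml.etree.ElementTree as ET


def item_ids(feed_xml):
    """Return one identifier per <item>, in feed order."""
    # ET.fromstring also accepts bytes, e.g. the raw body of an HTTP response.
    root = ET.fromstring(feed_xml)
    ids = []
    for item in root.iter("item"):
        # Prefer <guid>; fall back to <link> when the feed omits it.
        ident = item.findtext("guid") or item.findtext("link")
        if ident:
            ids.append(ident.strip())
    return ids


def new_items(previous, feed_xml):
    """Identifiers present in this poll that were not seen before."""
    return [i for i in item_ids(feed_xml) if i not in previous]


# Two hypothetical polls of a feed that always lists exactly two items.
POLL_1 = """<rss><channel>
<item><guid>a1</guid><title>Old A</title></item>
<item><guid>b2</guid><title>Old B</title></item>
</channel></rss>"""
POLL_2 = """<rss><channel>
<item><guid>c3</guid><title>New C</title></item>
<item><guid>a1</guid><title>Old A</title></item>
</channel></rss>"""

seen = set(item_ids(POLL_1))
fresh = new_items(seen, POLL_2)
print(fresh)  # the item count is unchanged, but one identifier is new
```

In a polling loop, `seen` would be updated with the identifiers of each poll after the new ones have been reported.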

Results:

    1.44 --> 2021-06-28 21:55:48.947101
    0.31 --> 2021-06-28 21:55:49.259569
    0.34 --> 2021-06-28 21:55:49.598402
    0.35 --> 2021-06-28 21:55:49.946105
    0.29 --> 2021-06-28 21:55:50.239378
    A new title has been added to https://www.minuto30.com/feed
    «Roko» se perdió en el barrio Calasanz
    Publ. date:2021-06-28 20:11:02 (Europe/Dublin)
    Detected 504.24s after pubdate.
    3.45 --> 2021-06-28 21:55:53.690791
    0.28 --> 2021-06-28 21:55:53.966453
    0.29 --> 2021-06-28 21:55:54.255446
    0.28 --> 2021-06-28 21:55:54.539891
    0.29 --> 2021-06-28 21:55:54.829564
    0.29 --> 2021-06-28 21:55:55.116086
    0.27 --> 2021-06-28 21:55:55.388543
    0.29 -->
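
The "Detected ... after pubdate" figure depends on how the feed's pubDate is compared with the local clock, and the code that prints it is not part of the snippet above. A minimal sketch of one way to compute such a delay with timezone-aware datetimes, so that a timezone mismatch can at least be ruled out. The function name `detection_delay` and the sample values are hypothetical and are not taken from the log:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def detection_delay(pub_date, detected_at):
    """Seconds between an RSS pubDate string and a timezone-aware detection time."""
    published = parsedate_to_datetime(pub_date)
    if published.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; treat it as UTC.
        published = published.replace(tzinfo=timezone.utc)
    # detected_at must be timezone-aware, e.g. datetime.now(timezone.utc).
    return (detected_at - published).total_seconds()


# Hypothetical values for illustration only.
detected = datetime(2021, 6, 28, 19, 19, 26, tzinfo=timezone.utc)
delay = detection_delay("Mon, 28 Jun 2021 20:11:02 +0100", detected)
print(f"Detected {delay:.2f}s after pubdate.")
```

Using `datetime.now(timezone.utc)` rather than a naive `datetime.now()` keeps the subtraction independent of the machine's local timezone.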

Could it be that the feed itself is not updated on time? What other options do we have for getting every new announcement right on time (1 second at most)? Thank you.
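
One way we could check whether the feed itself lags is to look at the caching headers that come back with each response. The sketch below is only that: it assumes the server or a CDN in front of it sends `Cache-Control`, `Age` or `Last-Modified`, which we have not verified for this site. The names `cache_report` and `fetch_headers`, the user agent string and the sample headers are all hypothetical:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def cache_report(headers, now):
    """Summarise response headers that hint at server-side or CDN caching."""
    lowered = {k.lower(): v for k, v in headers.items()}
    report = {
        "cache_control": lowered.get("cache-control"),
        "age_seconds": int(lowered["age"]) if "age" in lowered else None,
        "seconds_since_last_modified": None,
    }
    if "last-modified" in lowered:
        # HTTP dates end in "GMT", which parses as a UTC-aware datetime.
        modified = parsedate_to_datetime(lowered["last-modified"])
        report["seconds_since_last_modified"] = (now - modified).total_seconds()
    return report


def fetch_headers(url):
    """Fetch the feed and return its response headers (needs network access)."""
    request = Request(url, headers={"User-Agent": "feed-check/0.1"})
    with urlopen(request, timeout=10) as response:
        return dict(response.headers.items())


# Hypothetical headers for illustration; a real run would use fetch_headers(url).
sample = {
    "Cache-Control": "max-age=600",
    "Age": "480",
    "Last-Modified": "Mon, 28 Jun 2021 19:11:02 GMT",
}
now = datetime(2021, 6, 28, 19, 19, 26, tzinfo=timezone.utc)
report = cache_report(sample, now)
print(report)
```

If the reported age or the time since the last modification is regularly several minutes, polling faster would not help, because every request would be served the same cached copy.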

