Michael Peter Christen
c6c61be3f0
fix for http://bugs.yacy.net/view.php?id=148
13 years ago
Michael Peter Christen
0d148c3353
more logging in resource observer
13 years ago
Michael Peter Christen
2fa037ae1d
enhanced crawler
13 years ago
low012
2120db289a
*) Small change which should solve the problem with the cgitb module in Python CGI scripts.
13 years ago
Lotus
ee89cf5ae5
fix must match filter for full domain crawl
allow:
http://www.example.com
http://www.example.com/
http://www.example.com/abc.html?xyz=q
block:
http://www.example.com.cn
http://www.example.com.cn/dsf
13 years ago
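The allow/block lists in the commit above can be reproduced with an anchored regular expression. This is a minimal sketch, not YaCy's actual filter code: the class `DomainFilterSketch`, the methods `mustMatchForHost` and `allowed`, and the exact pattern shape are assumptions chosen only to satisfy the listed examples.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the YaCy sources.
final class DomainFilterSketch {

    // Builds a must-match pattern that accepts the bare host and anything
    // below it, but rejects longer host names that merely start with it.
    static Pattern mustMatchForHost(String host) {
        // Pattern.quote keeps the dots in the host name literal.
        return Pattern.compile("https?://" + Pattern.quote(host) + "(?:[/?#].*)?");
    }

    static boolean allowed(String host, String url) {
        // matches() anchors the pattern at both ends of the URL.
        return mustMatchForHost(host).matcher(url).matches();
    }

    public static void main(String[] args) {
        String host = "www.example.com";
        System.out.println(allowed(host, "http://www.example.com/abc.html?xyz=q"));
        System.out.println(allowed(host, "http://www.example.com.cn/dsf"));
    }
}
```

The optional trailing group must start with `/`, `?` or `#`, which is what keeps `www.example.com.cn` from matching.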
Michael Peter Christen
9ad1d8dde2
complete redesign of crawl queue monitoring: do not look at a
ready-prepared crawl list but at the stacks of the domains that are
stored for balanced crawling. This also affects the balancer, since it
no longer needs to prepare the pre-selected crawl list for monitoring. As
an effect:
- it is no longer possible to see the correct order of next to-be-crawled
links, since that depends on the actual state of the balancer stack the
next time another url is requested for loading
- the balancer works better since the next url can be selected according
to the current situation and not according to a pre-selected order.
13 years ago
Michael Peter Christen
4540174fe0
memory hacks
13 years ago
Michael Peter Christen
9ebcae2fbc
enhanced url parser to understand urls with &amp; instead of & in post
urls
13 years ago
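Assuming the commit above is about HTML-escaped ampersands (`&amp;`) showing up in URLs, the idea can be sketched as follows. The class `PostUrlSketch` and both method names are hypothetical and do not come from the YaCy URL parser.

```java
// Hypothetical helper, not taken from the YaCy sources.
final class PostUrlSketch {

    // Replaces HTML-escaped ampersands so the query string splits correctly.
    static String unescapeAmpersands(String url) {
        return url.replace("&amp;", "&");
    }

    // Returns the key=value parts of the query, or an empty array if the
    // URL has no '?'.
    static String[] queryParts(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return new String[0];
        }
        return unescapeAmpersands(url.substring(q + 1)).split("&");
    }

    public static void main(String[] args) {
        for (String part : queryParts("http://www.example.com/post?a=1&amp;b=2")) {
            System.out.println(part);
        }
    }
}
```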
Michael Peter Christen
1f4f60654a
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/document/parser/pdfParser.java
13 years ago
Michael Peter Christen
e6d26a023f
fix for bookmark crash with possible side-effects on crawl start after
the crash
13 years ago
Michael Peter Christen
190b77c55e
added Ukrainian translation
13 years ago
Marek Otahal
72adbeae90
!Important: move from Hashtable to HashMap
Hashtable is an obsolete v1 collection; the v2 collections offer HashMap with the same or better
functionality. Please review; almost all code had already been moved, so there are only a few changes. That is not the issue,
but I found notes that some (ugly, big) helper classes had to be created in the past
to compensate for functionality missing from Hashtable. I'd like input on whether we can remove some of them.
Look for //FIX: in these commits.
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
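One behavioural difference matters when reviewing such a migration: `Hashtable` is synchronized and rejects null keys and values, while `HashMap` is unsynchronized and permits them. The sketch below only illustrates the null-handling difference. The class `MapMigrationSketch` and its method are hypothetical and not part of the commit.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical demonstration class, not taken from the YaCy sources.
final class MapMigrationSketch {

    // Code written against the Map interface works with either class,
    // but the two implementations treat null values differently.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            // Hashtable.put throws when the key or the value is null.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap:   " + acceptsNullValue(new HashMap<>()));
        System.out.println("Hashtable: " + acceptsNullValue(new Hashtable<>()));
    }
}
```

If a map is shared between threads, `Collections.synchronizedMap` or `ConcurrentHashMap` are the usual replacements for the locking that `Hashtable` provided.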
Marek Otahal
c1af123ddd
just a little faster toString
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Marek Otahal
64e4bcee82
serverSwitch get(App/Data)Path() use common helper method
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Marek Otahal
371fbb4deb
just comment + shorter code in serverSwitch
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Marek Otahal
ed253b7aff
update javadoc, does not throw IOException
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Michael Peter Christen
2ee8cbeb2c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/search/Switchboard.java
13 years ago
Michael Peter Christen
992dbdf4bb
added noload statistic to servlets
13 years ago
Michael Christen
354b976110
fix for concurrency problem and endless loop in /suggest.json
13 years ago
Michael Christen
c21966bb43
fix
13 years ago
Michael Christen
13b05f9c08
fix
13 years ago
Michael Christen
e5d878c59e
Merge branch 'master' of ssh://gitorious.org/yacy/rc1
Conflicts:
source/de/anomic/crawler/CrawlQueues.java
13 years ago
Michael Christen
ec26b2bea4
Merge commit 'fa08ed5ae5d72bddc3cc6a662b23103579e86109' into quix0r
Conflicts:
source/de/anomic/crawler/CrawlQueues.java
13 years ago
Michael Christen
216a287a85
Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
Conflicts:
source/de/anomic/crawler/CrawlQueues.java
13 years ago
stbrumm
d18095dc48
Patch for Issue 0000102
and fixes to Patch (private peer status is a property of a peer, not a
status)
13 years ago
Roland 'Quix0r' Haeder
901f37d608
Also this ... :( #2
13 years ago
Roland 'Quix0r' Haeder
a985717ed2
Also this ... :(
13 years ago
Roland 'Quix0r' Haeder
5f490de554
Fix for ported fix from my old days ...
13 years ago
Roland 'Quix0r' Haeder
fa08ed5ae5
Fixed a lot of CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio based check
13 years ago
Michael Christen
9e5894c784
Removed handling of components objects for URIMetadataRows.
This is a preparation to replace these rows with nodes from the node
store.
13 years ago
Michael Christen
c04bfaa51b
refactoring
13 years ago
Michael Christen
17f962fceb
translator updates:
- config string for Chinese
- do not copy the language file to DATA/LOCALE any more (and do not use
them there, this is really confusing for new translators)
13 years ago
Michael Christen
752b092b8a
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
admin
23afee58fe
Merge branch 'master' of git://github.com/f1ori/yacy
13 years ago
Michael Christen
3eccdca63c
protection against too long running snippet fetch processes
13 years ago
apfelmaennchen
ff19fcdb28
bugfix for YMarks XBEL import and export; thanks to Dominic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8138 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
Michael Christen
044f83feed
added some pauses into the search process which shall produce
better-ranked search results. Without those pauses the result page will
only contain links from the peer that answers first, which is not a good
average picture of all the peers that provided results
13 years ago
Michael Christen
6e66c9d7f1
fix for http://bugs.yacy.net/view.php?id=87
13 years ago
Michael Christen
e7e429705a
- less automatic indexing after a search (needs to reset the default
crawl profiles)
- fix for concurrency problem in storage of serverSwitch Properties
- markup update
13 years ago
admin
a4ac051029
Merge branch 'master' of git://github.com/f1ori/yacy
13 years ago
low012
7cfdc2c092
Improved CGI capabilities:
*) CGI respects shebang now (should solve problems with MS Windows)
*) better error handling (more correct HTTP error codes)
*) logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8136 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
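Respecting the shebang means reading the interpreter from the first line of the script rather than relying on the operating system to do so. The following is a minimal sketch of that lookup under stated assumptions: the class `ShebangSketch` and the method `interpreterOf` are hypothetical and are not the code from this commit.

```java
// Hypothetical helper, not taken from the YaCy sources.
final class ShebangSketch {

    // Returns the interpreter command named on a "#!" first line,
    // or null if the script does not start with a shebang.
    static String interpreterOf(String scriptText) {
        if (!scriptText.startsWith("#!")) {
            return null;
        }
        int end = scriptText.indexOf('\n');
        String firstLine = end < 0 ? scriptText : scriptText.substring(0, end);
        // trim() also drops a trailing '\r' from Windows line endings.
        return firstLine.substring(2).trim();
    }

    public static void main(String[] args) {
        System.out.println(interpreterOf("#!/usr/bin/python\nprint('hi')\n"));
    }
}
```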
Michael Christen
9cd469e6d6
added pull request from als plus an NPE fix
13 years ago
orbiter
11729061f2
added an option in the bookmark import process to put everything into the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8134 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
70bcfc150a
- small bug fix to ymarks html importer
- import of delicious.com exports has successfully been tested
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8132 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
b5d9f631e3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8128 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
35a9e8f307
- fixed network graphic
- debugged evaluation tables
- changed cache settings in template engine
- some speed hacks
- changed int angles for peer positions in network graphic to double angles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8124 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
Al Sutton
8993cac4d8
Initial performance improvements
13 years ago
orbiter
8895d8c1cd
removed unnecessary log entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8117 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
77a080ced9
smaller fixes for YMarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8105 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5a55397f99
some last-minute performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
dd1482aaf5
further update to YMarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8100 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c584db991f
creating a bookmark from the search results now works again .. with new YMarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8092 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
564374d1fe
- included YMarks in addition to old bookmarks in yacysearchitem.html; don't get confused by the old bookmark dialog, the ymark is automatically added silently beforehand.
- reworked bookmark creation on crawlstart
- many smaller adjustments to ymarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8072 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c93f10417a
add a bookmark automatically each time a new crawl is started
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8063 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e4a82ddd8b
produce a bookmark entry from every crawl start. these bookmarks are always private.
these bookmarks will be used to get a source reference for the search in case of intranet or portal searches.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8062 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
6287c2b4a9
YMarks:
- introduced tag manager - a quite powerful tool (still not 100% stable, so be careful)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8060 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
cominch
2236e01137
Minor correction to prevent useless comma at beginning of string, created from list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8059 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
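A leading comma of the kind mentioned above typically appears when the separator is appended before every element, including the first. This is a minimal sketch of the usual guard. The class `JoinSketch` and the method `join` are hypothetical, not the code touched by this commit.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from the YaCy sources.
final class JoinSketch {

    // Joins the items with commas, writing the separator only between
    // elements so the result never starts with a comma.
    static String join(List<String> items) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String item : items) {
            if (!first) {
                sb.append(',');
            }
            sb.append(item);
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c")));
    }
}
```

On Java 8 and later, `String.join(",", items)` gives the same result.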
apfelmaennchen
5581be12fb
YMarks:
- added backend and api for tag management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8058 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
a3eebfdcba
YMarks:
- show active/running crawls
- execute crawls (works currently only if API entry is available)
- various smaller fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8056 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c50f8f9a06
code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8055 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
4f95f72124
YMarks:
- working direct importer for YaCy Crawl Starts
- working direct import for old bookmarks.db
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8052 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
aa322bc6d0
fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8050 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
97d1347adb
added also a default accept field to robots.txt downloads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8049 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
f183d3822c
added a default accept header in http requests since some http fraud detection functions check that this header field exist
see also: http://bad-behavior.ioerror.us/ in source file browser.inc.php
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8048 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
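Setting a default `Accept` header can be sketched with the JDK's own `URLConnection`. This is an illustration only: the commit does not say which HTTP client or header value YaCy uses, so the class `AcceptHeaderSketch`, the method `prepare`, and the value of `DEFAULT_ACCEPT` are all assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper, not taken from the YaCy sources.
final class AcceptHeaderSketch {

    // Assumed default value; the commit does not state the actual one.
    static final String DEFAULT_ACCEPT =
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

    // Creates a connection object with the Accept header preset.
    static URLConnection prepare(String url) {
        try {
            // openConnection() only creates the object; no request is sent yet.
            URLConnection connection = URI.create(url).toURL().openConnection();
            connection.setRequestProperty("Accept", DEFAULT_ACCEPT);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection c = prepare("http://www.example.com/");
        System.out.println(c.getRequestProperty("Accept"));
    }
}
```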
orbiter
06352b8d6b
more logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8047 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a99934226e
more logging for debugging of robots.txt
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8046 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
7a5841e061
fix for robot parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8045 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
458c20ff72
fix for robot parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8044 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
017a01714d
- enhanced logging in robots.txt parser for remote debugging
- robots.txt is now more robust against database operations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8043 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
a8dfe787ed
- updated to jquery flexigrid 1.1
- YMarks.html automatically recognizes if a bookmark is a crawl start
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8040 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
eb1c7c041d
write info about robots.txt evaluation into getpageinfo_p.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8038 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
abba31f02e
- bugfix for correctly sorting ymarks
- some tuning for the autotagger (still not perfect)
- /api/ymarks/get_metadata.xml now provides info for crawlstarts
- removed unused code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8036 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
775b44017e
refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8033 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
5f7dbe1c42
- some refactoring (ymarks)
- improvement for autotagger (is now able to create/detect multi word tags e.g. 'open source')
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8031 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
78ce3b13be
typo
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8027 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
85d6bf4ac4
fixed urls to media content during indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8021 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0d858d48ec
replaced String with StringBuilder in suggestion process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8020 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
3a807e10cf
- added a cache for active crawl profiles to the crawl switchboard
- moved the domain cache for domain counter from the crawl switchboard to the crawl profiles. the crawl domain counter is now therefore relative for each crawl start, not for the whole crawler.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8018 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
37e35f2741
normalization of url using urlencoding/decoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8017 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
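Normalizing by decoding and re-encoding makes differently escaped spellings of the same value compare equal. The sketch below applies that idea to a single form-encoded query value using the JDK's `URLDecoder` and `URLEncoder` (the `Charset` overloads need Java 10 or later). It is not YaCy's normalizer: the class `UrlNormalizeSketch` and the method `normalizeQueryValue` are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the YaCy sources.
final class UrlNormalizeSketch {

    // Decodes a form-encoded query value and encodes it again, so that
    // "%2f" and "%2F", or "%20" and "+", end up in one canonical form.
    // Do not pass a whole URL: ':' and '/' would be escaped as well.
    static String normalizeQueryValue(String value) {
        String decoded = URLDecoder.decode(value, StandardCharsets.UTF_8);
        return URLEncoder.encode(decoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(normalizeQueryValue("a%2fb"));
        System.out.println(normalizeQueryValue("a%20b"));
    }
}
```

`URLDecoder.decode` throws `IllegalArgumentException` on malformed escapes such as a lone `%`, so real input needs a guard around the call.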
orbiter
1b86d06d1e
fix for http://bugs.yacy.net/view.php?id=62
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8004 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
9e4875230f
performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8001 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a9838f8b99
fix for http://bugs.yacy.net/view.php?id=59
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7997 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a7df70221e
refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
cf4fd525ee
added directDocByURL attribute in crawl profile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7985 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
c61e4cfd78
- fix for incomplete clear() in balancer
- renamed Parser Errors to Rejected URLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7984 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
813f297a95
another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7983 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
035ebfbf3b
- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
- this may have also (good) performance side effects on other parts of YaCy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b250e6466d
implemented crawl restrictions for IP pattern and country lists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7980 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
e207c41c8e
* fix urlproxy for urls containing dollar signs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7979 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
5ad7f9612b
added crawl settings for three new filters for each crawl:
must-match for IPs (IPs that are known after DNS resolving for each URL in the crawl queue)
must-not-match for IPs
must-match against a list of country codes (allows only loading from hosts that are hosted in given countries)
note: the settings and input environment are there with that commit, but the values are not yet evaluated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7976 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
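How the two IP filters described above could combine is sketched below, covering only the must-match and must-not-match patterns and not the country list. It assumes the filters are regular expressions applied to the resolved address as a string. The class `IpFilterSketch` and the method `accept` are hypothetical and do not come from the YaCy sources.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the YaCy sources.
final class IpFilterSketch {

    // An address passes when it matches the must-match pattern and
    // does not match the must-not-match pattern.
    static boolean accept(String ip, Pattern mustMatch, Pattern mustNotMatch) {
        return mustMatch.matcher(ip).matches()
                && !mustNotMatch.matcher(ip).matches();
    }

    public static void main(String[] args) {
        Pattern mustMatch = Pattern.compile("192\\.168\\..*");
        Pattern mustNotMatch = Pattern.compile("192\\.168\\.0\\..*");
        System.out.println(accept("192.168.1.5", mustMatch, mustNotMatch));
        System.out.println(accept("192.168.0.7", mustMatch, mustNotMatch));
    }
}
```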
orbiter
d2ea250d99
refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
42b5f09f68
*) this should fix a bug in snippet creation (also cleaned up a little bit)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7972 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6b22865dbc
- removed some warnings
- removed a dead update location
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7970 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0c6d95e57b
- more tolerance against failure of table opening
- more connections for solrj
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7968 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
4f31869c5a
enhanced search result timing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7966 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6b02b696b0
- add number of search results to end of rss and json output to reflect latest status of retrieval
- distinguish search access with different verify state in access of search cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7965 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
87e6abd168
* fix urls containing a port number in urlproxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7964 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
97045022fa
* pass cookies to Server Side Includes
* User.html a bit more usable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7963 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
ce2a76d603
performance hack for search process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7961 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2c4a672fe2
bugfixes and performance hacks for table index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7957 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago