orbiter
3a807e10cf
- added a cache for active crawl profiles to the crawl switchboard
...
- moved the domain cache for the domain counter from the crawl switchboard to the crawl profiles. The crawl domain counter is therefore now relative to each crawl start, not to the whole crawler.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8018 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
37e35f2741
normalization of url using urlencoding/decoding
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8017 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e58438c01c
- added a new retry connector for solr (for cases where solr responses are slow)
...
- added a new exist property into the metadataRepository which includes solr entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8016 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
d8d9735b4f
stability bugfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8012 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c31564ef08
stability bugfixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8011 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
f121f4bb45
fix for link in Supporter and Surftips page
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8010 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
94eab08794
- updated opensearchdescription text and icon
...
- removed automatic setting of maxitems during search (can be set now elsewhere)
- updated RSSMessage.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8009 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
279482a76d
fix for npe
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8007 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
1b86d06d1e
fix for http://bugs.yacy.net/view.php?id=62
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8004 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
9e4875230f
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8001 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
eb9c9edb01
enhanced table method (used by almost all yacy api interfaces)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8000 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
4ad9fc2bff
new snippet strategy for search hits in metadata: show beginning of text instead of hit position
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7999 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a9838f8b99
fix for http://bugs.yacy.net/view.php?id=59
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7997 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
hermens
d3df03838a
make sure myself-target is always inserted at its appropriate position
...
this was previously omitted if the own peer should have been the first target
or the peer was the last peer before the rotation to AAAAAAAAAAAA
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7996 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
hermens
c3e7efa846
added sender side prevention of rwi flooding as mentioned in SVN 7993
...
saves memory and speeds up enqueueContainers by limiting the size of transfer.Chunk
saves network bandwidth by not transmitting RWIs that would get discarded at the target anyway
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7995 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5af9598bd1
enhanced exported row parsing during row import
...
this affects the search and dht receive speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7994 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
7598a9e26b
fix for thread dump
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7992 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
8eef8722d1
update to ThreadDump analysis: freerunner and thread state recognition
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7990 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
1df43b137d
another performance hack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7989 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
7df0643f0e
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7988 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a7df70221e
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
1b45e33f04
added robots tag parser to solr scheme
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7986 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
cf4fd525ee
added directDocByURL attribute in crawl profile
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7985 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c61e4cfd78
- fix for incomplete clear() in balancer
...
- renamed Parser Errors to Rejected URLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7984 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
813f297a95
another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7983 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
035ebfbf3b
- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
...
- this may also have (good) performance side effects on other parts of YaCy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
b250e6466d
implemented crawl restrictions for IP pattern and country lists
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7980 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
f1ori
e207c41c8e
* fix urlproxy for urls containing dollar signs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7979 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
57d5529a01
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7977 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5ad7f9612b
added crawl settings for three new filters for each crawl:
...
must-match for IPs (IPs that are known after DNS resolving for each URL in the crawl queue)
must-not-match for IPs
must-match against a list of country codes (allows loading only from hosts that are hosted in the given countries)
note: the settings and input environment are there with this commit, but the values are not yet evaluated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7976 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
47a8c69745
added a new feature to MultiProtocolURIs to get the locale for each url:
...
This is done using a new library, InetAddressLocator.jar, which is NOT added to YaCy by default because it is very old, and with that library we would never get a Debian package. However, some people want that functionality, and it can be made available if the library is taken from http://javainetlocator.sourceforge.net/ and placed into the /lib directory, where it will be found using reflection.
The new feature will be used to extend the crawler steering.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7975 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c3161b4ac
refactoring:
...
RankingProcess -> RWIProcess
ResultFetcher -> SnippetProcess
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7974 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
d2ea250d99
refactoring:
...
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
42b5f09f68
*) this should fix a bug in snippet creation (also cleaned up a little bit)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7972 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
277b454a62
*) added comments
...
*) minor refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7971 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
6b22865dbc
- removed some warnings
...
- removed a dead update location
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7970 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0c6d95e57b
- more tolerance against failure of table opening
...
- more connections for solrj
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7968 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
4f31869c5a
enhanced search result timing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7966 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
6b02b696b0
- add number of search results to end of rss and json output to reflect latest status of retrieval
...
- distinguish search access with different verify state in access of search cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7965 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
f1ori
87e6abd168
* fix urls containing a port number in urlproxy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7964 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
f1ori
97045022fa
* pass cookies to Server Side Includes
...
* User.html a bit more usable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7963 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
ce2a76d603
performance hack for search process
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7961 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
aaf7a0feaa
yet another cache strategy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7959 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
8a428d3e77
ensure termination of pdf parser to avoid deadlocking of other processes during search result preparation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7958 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c4a672fe2
bugfixes and performance hacks for table index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7957 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
dad5b586a4
added a concurrent warm-up of Table data structures. This should speed up the start-up process but may also cause higher CPU load at that time.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7956 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
734059d33e
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7955 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
23e81b28b2
synchronization enhancements
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7954 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
dd4635e323
patches
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7953 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
bb0c045036
fix for problem with relocation of network
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7944 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago