Michael Peter Christen
77f8e9fb9b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
ba6aaabc51
refactoring + parser bugfixes
13 years ago
Michael Peter Christen
2a0434efa4
Merge commit 'c1f6b4fb5226d3d2f8b2bec9e361f6b3476e03ff'
13 years ago
reger
c1f6b4fb52
lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown)
13 years ago
Michael Peter Christen
f8cd57c92f
new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can now retrieve ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
13 years ago
Michael Peter Christen
14f67f217c
refactoring of ContentDomain: now subclass of Classification
13 years ago
Michael Peter Christen
33d1062c79
refactoring: the cache belongs to the crawler
13 years ago
Michael Peter Christen
046f3a7e8d
check if httpc has decompressed the release file and rename the file
from .tar.gz to .tar if that happened
13 years ago
Michael Peter Christen
8c06925984
animation of the web structure picture
13 years ago
Michael Peter Christen
c639248c23
protection against strange answers from remote peers during search
13 years ago
Michael Peter Christen
7e4e3fe5b6
free some memory after parsing html
13 years ago
Michael Peter Christen
b4409cc803
small redesign of blob column index and usage
13 years ago
Michael Peter Christen
0b67a0a5d8
added a column index for tables in blob files. This is heavily used
during receiving of DHT submissions and when answering remote search
requests. Both events together may have caused IO-deadlocking and this
commit shall fix that.
13 years ago
Michael Peter Christen
7e728867e5
added a synchronization around iterations to prevent IO-deadlocking
during concurrent remote search requests
13 years ago
Michael Peter Christen
ef5192f8c9
using the generic document parser for crawl starts instead of the html
parser. This makes it possible that every type of document can be a
crawl start point, not only text documents or html documents. Tested
this with a pdf document.
13 years ago
Marek Otahal
72adbeae90
!Important: move from Hashtable to HashMap
Hashtable is an obsolete v1 collection; the v2 collections offer HashMap with the same or better
functionality. Please review; almost all code had already been moved, so there are only a few changes. That is not the issue,
but I found notes that some (ugly, big) helper classes had to be created in the past
to compensate for missing Hashtable functionality. I'd like input on whether we can remove some of them.
Look for //FIX: in these commits
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Michael Christen
216a287a85
Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
Conflicts:
source/de/anomic/crawler/CrawlQueues.java
13 years ago
stbrumm
d18095dc48
Patch for Issue 0000102
and fixes to Patch (private peer status is a property of a peer, not a
status)
13 years ago
stbrumm
9f1b1b4604
Type for Robinson-Mode/Private Peer added
13 years ago
Roland 'Quix0r' Haeder
fa08ed5ae5
Fixed a lot of CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio-based check
13 years ago
Michael Christen
85bd4cc8bc
better lookup for peer names
13 years ago
Michael Christen
20e3084bd4
redesign of finding of peers by ip: more lightweight method to read the
seed databases
13 years ago
Michael Christen
0797b0de99
new handling of remote search processes: looking for seeds will now not
block the whole search process any more. A deadlock with a DHT selection
process may have been the cause for interface lockings in the past.
13 years ago
Michael Christen
9e5894c784
Removed handling of components objects for URIMetadataRows.
This is a preparation to replace these rows with nodes from the node
store.
13 years ago
Michael Christen
c04bfaa51b
refactoring
13 years ago
Michael Christen
1f4afb4dc0
performance hacks
13 years ago
Michael Christen
675d557e88
removed debug logging
13 years ago
Michael Christen
e9dc99fe15
added rules to set specific RWIs as private RWIs which are not
transmitted to remote peers. This will be used for private index copies
and phonetic indexes.
13 years ago
Michael Christen
044f83feed
added some pauses into the search process which shall produce
better-ranked search results. Without those pauses the result page will
only contain links from the peer that answers first which is not a good
average picture of all the peers that provided results
13 years ago
Michael Christen
f14faf503b
better ranking because we wait a little longer during the search
process to get better remote search results into the ranking priority
stack
13 years ago
Michael Christen
e7e429705a
- less automatic indexing after a search (needs to reset the default
crawl profiles)
- fix for concurrency problem in storage of serverSwitch Properties
- markup update
13 years ago
admin
484c4ad339
Merge branch 'master' of git://github.com/f1ori/yacy
13 years ago
orbiter
402e9d71ef
changed ordering of release files: the main criterion is no longer the svn number; releases are now ordered by
- release number
- date
- svn number
additionally there is a new option to remove the svn number completely
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8135 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
admin
56ce8488e4
Merge branch 'master' of git://github.com/f1ori/yacy
13 years ago
orbiter
4b8ff84705
- search bugfixes (page counter and number of results per page; recognition of new search)
- experiments to speed-up the network image production (commented out)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8130 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
aeeae75b8a
the timeout of httpclient is not absolute, but applies until a connection is
established or between bytes sent
trying this to reduce count of client-connections to /yacy/search.html
of other peers
13 years ago
hermens
2ac272cfbf
Fix for PeerSelection.seedsByAge() for big networks (>1000 Peers)
To get the most (least) recent peers, search those with the highest (lowest) LastSeen instead of the first by peerhash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8129 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0796b54601
- some speed hacks for network image
- panic patch for 'AD' hashes until it is clear where the problem comes from
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8126 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
f9216e388c
- faster ping to clean up old peers faster
- clean up more news
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8125 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
35a9e8f307
- fixed network graphic
- debugged evaluation tables
- changed cache settings in template engine
- some speed hacks
- changed int angles for peer positions in network graphic to double angles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8124 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
550c881d80
remove more news (all older than one day) because they can be a performance problem if we have too many peers sending news
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8112 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
ebd840ebf6
- enhanced description on search front page
- fixed language and heuristic modifier
- added hint to crawl start that we can also do ftp and smb crawls
- added a protocol extension to remote crawls to transport all search modifiers to remote peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8108 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5a55397f99
some last-minute performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c9216d5adf
fixed secondary remote search (the process that finds distributed join situations)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8098 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c9a0dbd25a
added a security check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8094 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
1120f0c93c
update to network graphics: slightly less crawling activity, slightly stronger color for query activity
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8089 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
8e0b2c5832
fixed cluster search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8083 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e58438c01c
- added a new retry connector for solr (for cases where solr responses are slow)
- added a new exist property into the metadataRepository which includes solr entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8016 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c31564ef08
stability bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8011 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
279482a76d
fix for npe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8007 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
hermens
d3df03838a
make sure myself-target is always inserted at its appropriate position
this was previously omitted if the own peer should have been the first target
or the peer was the last peer before the rotation to AAAAAAAAAAAA
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7996 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
hermens
c3e7efa846
added sender side prevention of rwi flooding as mentioned in SVN 7993
saves memory and speeds up enqueueContainers by limiting the size of transfer.Chunk
saves network bandwidth by not transmitting RWIs that would get discarded at the target anyway
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7995 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a7df70221e
refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
813f297a95
another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7983 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
035ebfbf3b
- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
- this may have also (good) performance side effects on other parts of YaCy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
57d5529a01
performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7977 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c3161b4ac
refactoring:
RankingProcess -> RWIProcess
ResultFetcher -> SnippetProcess
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7974 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
d2ea250d99
refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago