Roland 'Quix0r' Haeder
d10627d591
More sync in close() methods
...
Conflicts:
source/net/yacy/kelondro/logging/GuiHandler.java
source/net/yacy/kelondro/workflow/InstantBusyThread.java
13 years ago
Roland 'Quix0r' Haeder
b3ae2aa41f
With or without 'final'? At least please try it in other methods
...
Conflicts:
source/de/anomic/tools/tarTools.java
13 years ago
Roland 'Quix0r' Haeder
fbb946f913
Made a method static (Eclipse suggested it), removed unused import, pk=null check now outputs a warning in the logfile
13 years ago
Michael Peter Christen
52d307c735
prevent the snippet fetch process from removing catchall entries
13 years ago
Michael Peter Christen
89142d1e8d
removed (not all) warnings
13 years ago
Michael Peter Christen
5deebd02ea
added serialization
13 years ago
reger
b2175ea4ef
Add possibility to set custom Solr field names for the YaCy default Solr attributes.
...
- Changing the format of YaCy's solr.key.list while maintaining backward compatibility
Federated index config screens adjusted accordingly
- modified the Solr update request to use a 3 min Solr autocommit interval
13 years ago
Michael Peter Christen
e7e381d110
added configuration to switch off redirection following in crawler
13 years ago
Michael Peter Christen
2717c1b749
fixed bug in solr interface
13 years ago
Michael Peter Christen
f150bc218b
fixed bug in solr error document
13 years ago
Michael Peter Christen
cb54c1737b
solrj connector bugfix
13 years ago
Roland 'Quix0r' Haeder
a093ccf5eb
Now used synchronization in all close() methods to make sure all objects
...
are 'closed' in an ordered way
Conflicts:
source/de/anomic/http/server/ChunkedInputStream.java
source/de/anomic/http/server/ChunkedOutputStream.java
source/de/anomic/http/server/ContentLengthInputStream.java
source/net/yacy/cora/protocol/Domains.java
source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java
source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java
source/net/yacy/document/content/dao/PhpBB3Dao.java
source/net/yacy/document/parser/html/AbstractTransformer.java
source/net/yacy/kelondro/blob/BEncodedHeap.java
source/net/yacy/kelondro/blob/HeapReader.java
source/net/yacy/kelondro/index/RAMIndexCluster.java
source/net/yacy/kelondro/io/ByteCountInputStream.java
source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java
source/net/yacy/kelondro/table/SQLTable.java
13 years ago
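A minimal sketch of the idea behind synchronized close() methods, not the code from this commit: the class name OrderedResource and the shared close log are hypothetical, invented for illustration. It assumes the goal is that concurrent or repeated close() calls on one object run one at a time and release the resource only once.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical resource, not a YaCy class. */
final class OrderedResource implements Closeable {
    private final String name;
    private final List<String> closeLog; // records the order in which resources were closed
    private boolean closed = false;

    OrderedResource(String name, List<String> closeLog) {
        this.name = name;
        this.closeLog = closeLog;
    }

    /** Only one thread at a time can run close() on this object; repeated calls are no-ops. */
    @Override
    public synchronized void close() {
        if (closed) return;
        closed = true;
        synchronized (closeLog) {
            closeLog.add(name);
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        OrderedResource r = new OrderedResource("heap", log);
        r.close();
        r.close(); // second call does nothing
        System.out.println(log);
    }
}
```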
Michael Peter Christen
0d58fea210
made multiple connector default
13 years ago
Michael Peter Christen
adeb33bb36
better abstraction for solr objects
13 years ago
Michael Peter Christen
8864141872
more abstraction in solr connection classes
13 years ago
Michael Peter Christen
c00efc2717
made the solr connection more generic
13 years ago
Michael Peter Christen
ea2bd43b28
patch for broken configurations
13 years ago
Michael Peter Christen
ba6aaabc51
refactoring + parser bugfixes
13 years ago
Michael Peter Christen
453010bd68
- solved problems with backpath normalization
...
- redesigned in/outbound link handover
- removed iframe links from inbound/outbound in solr scheme
13 years ago
Michael Peter Christen
5f5ed33ed8
patch for media search (audio, video apps)
13 years ago
Michael Peter Christen
19efbf1b0f
- apply directDocByURL to NOLOAD Queue
...
- choose pushing to NOLOAD as default for site crawl
13 years ago
Michael Peter Christen
659178942f
- Redesigned crawler and parser to accept embedded links from the NOLOAD
...
queue and not from virtual documents generated by the parser.
- The parser now generates nice description texts for NOLOAD entries
which shall make it possible to find media content using the search
index and not using the media prefetch algorithm during search (which
was costly)
- Removed the media-search prefetch process from image search
13 years ago
Michael Peter Christen
a3badd3205
changed search process for images: no more media snippet load process,
...
show only links from index which had been on the text search page
before. This creates a superfast search process for images!
13 years ago
Michael Peter Christen
f8cd57c92f
new indexing strategy: ALL links that appear anywhere are indexed, not
...
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can now retrieve ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
13 years ago
Michael Peter Christen
14f67f217c
refactoring of ContentDomain: now subclass of Classification
13 years ago
Michael Peter Christen
a1a5b015d8
refactoring: moved document Classification to cora package
13 years ago
Michael Peter Christen
33d1062c79
refactoring: the cache belongs to the crawler
13 years ago
Michael Peter Christen
7b5b9baee0
added citation rank to ranking profile
13 years ago
Michael Christen
02e4dedff2
fix to url citation collection
13 years ago
Michael Christen
e32055aa15
added stub classes for
...
- a new database for url reference data ('seen links')
- a new database extending the references to the full url metadata
attributes set which shall replace the old metadata database if it is
finished
- migration help classes stub to use old and new metadata databases
simultaneously
13 years ago
Michael Christen
ac5d124ee0
experimental implementation of a citation ranking as post-ranking
...
method. (ranking coefficient fixed, needs to be made configurable)
13 years ago
Michael Christen
8fc86fe397
added storage of full anchor link structure:
...
the links between all pages are now stored. The same index structure as
used for the word index is used to make a reverse link index.
The new file(s) in SEGMENT/default/citation.index.*.blob store the
citation index. This will be used to create much more detailed link
structures for the YaCy apis and to create a better ranking. A ranking
using the citation.index should provide better results especially for
portal indexes and intranets.
13 years ago
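The reverse link index described above can be illustrated with a small in-memory sketch. This is not YaCy's citation.index implementation (which is stored in blob files); the class and method names are hypothetical, and plain URL strings stand in for whatever keys the real index uses.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: invert "page -> outgoing links" into "page -> citing pages". */
final class CitationIndexSketch {

    static Map<String, Set<String>> invert(Map<String, List<String>> outlinks) {
        Map<String, Set<String>> citations = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
            for (String target : e.getValue()) {
                // each target collects the set of pages that link to it
                citations.computeIfAbsent(target, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return citations;
    }

    /** Number of distinct pages linking to url; a possible input for a citation ranking. */
    static int citationCount(Map<String, Set<String>> citations, String url) {
        Set<String> citing = citations.get(url);
        return citing == null ? 0 : citing.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> outlinks = new LinkedHashMap<>();
        outlinks.put("a", Arrays.asList("b", "c"));
        outlinks.put("b", Arrays.asList("c"));
        System.out.println(invert(outlinks));
    }
}
```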
Lotus
0b3f39136e
allow custom ppm lower than minimum button on /Crawler_p.html
...
fixes http://bugs.yacy.net/view.php?id=166
13 years ago
Michael Peter Christen
8aba045ba1
if a new pop-up page is set in config portal, then this page applies
...
also to the default page configuration for the httpd if no path is
given.
13 years ago
Michael Peter Christen
36e4d82b27
changed ranking
13 years ago
Michael Peter Christen
096c17e7cd
added test code
13 years ago
Michael Peter Christen
9ad1d8dde2
complete redesign of crawl queue monitoring: do not look at a
...
ready-prepared crawl list but at the stacks of the domains that are
stored for balanced crawling. This affects also the balancer since that
does not need to prepare the pre-selected crawl list for monitoring. As
an effect:
- it is no longer possible to see the correct order of next to-be-crawled
links, since that depends on the actual state of the balancer stack the
next time another url is requested for loading
- the balancer works better since the next url can be selected according
to the current situation and not according to a pre-selected order.
13 years ago
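A rough sketch of selecting the next URL from per-domain stacks at request time rather than from a pre-computed list. The class DomainStackBalancerSketch and its policy (pick the host that was accessed least recently) are assumptions for illustration only; the actual YaCy balancer may use different criteria.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical balancer: one queue per host, next URL chosen when it is requested. */
final class DomainStackBalancerSketch {
    private final Map<String, Deque<String>> stacks = new LinkedHashMap<>();
    private final Map<String, Long> lastAccess = new HashMap<>();

    void push(String host, String url) {
        stacks.computeIfAbsent(host, h -> new ArrayDeque<>()).addLast(url);
    }

    /** Returns the next URL from the least recently accessed host, or null if nothing is queued. */
    String pop(long now) {
        String bestHost = null;
        long bestTime = Long.MAX_VALUE;
        for (String host : stacks.keySet()) {
            long t = lastAccess.getOrDefault(host, Long.MIN_VALUE);
            if (t < bestTime) {
                bestTime = t;
                bestHost = host;
            }
        }
        if (bestHost == null) return null;
        Deque<String> queue = stacks.get(bestHost);
        String url = queue.pollFirst();
        if (queue.isEmpty()) stacks.remove(bestHost); // drop hosts with no pending URLs
        lastAccess.put(bestHost, now);
        return url;
    }

    public static void main(String[] args) {
        DomainStackBalancerSketch b = new DomainStackBalancerSketch();
        b.push("a.example", "http://a.example/1");
        b.push("a.example", "http://a.example/2");
        b.push("b.example", "http://b.example/1");
        for (long t = 1; t <= 3; t++) System.out.println(b.pop(t));
    }
}
```

Because the choice depends on the access times at the moment of the call, the order of upcoming URLs cannot be listed in advance, which matches the first effect noted in the commit message.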
Michael Peter Christen
e2f8f263e8
changed storage of search words: keep order
13 years ago
Michael Peter Christen
2e5cd6a1b2
fixed parser extension deny list generation and usage
13 years ago
Michael Peter Christen
3cd6dcd352
do not add new solr fields as activated fields
13 years ago
Michael Peter Christen
e3bb73c3d6
serialized some database access methods
13 years ago
Michael Peter Christen
355ecf330f
reduced target file size to 64 MB
13 years ago
Michael Peter Christen
2ea585d616
fix for host navigator
13 years ago
Michael Peter Christen
4c5edab1ec
added option to have exception search result windows
13 years ago
Michael Peter Christen
ef78f22ee1
performance hack
13 years ago
Michael Peter Christen
41536eb4a2
performance hack
13 years ago
Michael Peter Christen
f91487fc50
added delete-button for host navigation
13 years ago
Michael Peter Christen
e8d24fd802
author navigator can be switched off
13 years ago
Michael Peter Christen
558ab7bd4e
made the protocol navigator reversible
13 years ago
Michael Peter Christen
96cb75f1d4
made the filetype navigator be able to deselect the search constraint
13 years ago
Lotus
c73af39e54
refactoring of tray icon class,
...
now uses Java 6 methods natively
13 years ago
Michael Peter Christen
4eff0e26f1
npe bugfix
13 years ago
Michael Peter Christen
1a0b6b3913
get more navigation details to search results
13 years ago
Michael Peter Christen
83009d86f7
added the vocabulary navigator. It can be very simply tested by
...
switching on the locale dictionaries.
13 years ago
Michael Peter Christen
254adea51c
small fixes
13 years ago
Michael Peter Christen
c602eaaf46
enhanced search process
13 years ago
Michael Christen
eff966f396
fix for search process (it was aborted too early during remote search)
13 years ago
Marek Otahal
72adbeae90
!Important: move from Hashtable to HashMap
...
Hashtable is an obsolete collection from collections v1; v2 now offers HashMap with the same or better
functionality. Please review; almost all code was already moved, so there are only a few changes. That is not the issue,
but I found notes that some (ugly, big) helper classes had to be created in the past
to compensate for functionality missing from Hashtable. I'd like input on whether we can remove some of them.
Look for //FIX: in these commits
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
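A short sketch of what such a migration usually has to consider, not taken from the commit itself. Hashtable synchronizes every method and rejects null keys and values, while HashMap does neither, so a map shared between threads needs explicit protection after the switch. The class and method names below are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper illustrating the two common replacements for Hashtable. */
final class MapMigrationSketch {

    /** For maps confined to one thread: plain HashMap, no locking overhead. */
    static Map<String, Integer> newLocalMap() {
        return new HashMap<>();
    }

    /** For maps shared between threads: keep Hashtable-like locking via a synchronized wrapper. */
    static Map<String, Integer> newSharedMap() {
        return Collections.synchronizedMap(new HashMap<>());
    }

    public static void main(String[] args) {
        Map<String, Integer> local = newLocalMap();
        local.put(null, 1); // allowed by HashMap, would throw NullPointerException in Hashtable
        Map<String, Integer> shared = newSharedMap();
        shared.put("peers", 2);
        System.out.println(local.size() + shared.size());
    }
}
```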
Marek Otahal
f40efb39af
Blacklist loadList() remove duplicates by using Set
...
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
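A minimal sketch of removing duplicates with a Set while loading list entries. This is not the Blacklist.loadList() code from the commit; the method signature, the trimming, and the skipping of empty and '#' comment lines are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical loader: a LinkedHashSet drops duplicates and keeps first-seen order. */
final class BlacklistLoadSketch {

    static Set<String> loadList(List<String> lines) {
        Set<String> entries = new LinkedHashSet<>();
        for (String line : lines) {
            String entry = line.trim();
            // assumption: blank lines and '#' comments are not entries
            if (entry.isEmpty() || entry.startsWith("#")) continue;
            entries.add(entry); // add() ignores entries that are already present
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a.example/.*", " a.example/.* ", "# note", "b.example/.*");
        System.out.println(loadList(lines));
    }
}
```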
Michael Peter Christen
2ee8cbeb2c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
...
Conflicts:
source/net/yacy/search/Switchboard.java
13 years ago
Michael Peter Christen
992dbdf4bb
added noload statistic to servlets
13 years ago
Michael Christen
216a287a85
Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
...
Conflicts:
source/de/anomic/crawler/CrawlQueues.java
13 years ago
stbrumm
d18095dc48
Patch for Issue 0000102
...
and fixes to Patch (private peer status is a property of a peer, not a
status)
13 years ago
Michael Christen
585a8f3c44
fixed a bug in search sequence (caused empty results)
13 years ago
Roland 'Quix0r' Haeder
a3083d13bf
Blacklist checks are now always turned on, in media searches (e.g. image search) images matching blacklist entries are no longer shown to the user
13 years ago
Michael Christen
52184a1170
fix for search process
13 years ago
Michael Christen
0797b0de99
new handling of remote search processes: looking for seeds will now not
...
block the whole search process any more. A deadlock with a DHT selection
process may have been the cause for interface lockings in the past.
13 years ago
Michael Christen
9e5894c784
Removed handling of components objects for URIMetadataRows.
...
This is a preparation to replace these rows with nodes from the node
store.
13 years ago
Michael Christen
c04bfaa51b
refactoring
13 years ago
Michael Christen
e9dc99fe15
added rules to set specific RWIs as private RWIs which are not
...
transmitted to remote peers. This will be used for private index copies
and phonetic indexes.
13 years ago
Michael Peter Christen
0bcef2d156
added feature as requested in
...
http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461
The search can now be configured with a non-display host list.
The search will always exclude the given list of hosts unless they are
requested directly using the host navigation
13 years ago
Michael Christen
3eccdca63c
protection against too long running snippet fetch processes
13 years ago
Michael Christen
86b3385847
fixed a deadlock during secondary remote search
13 years ago
Michael Christen
c715d19c09
fixes for dependency on svn
13 years ago
Michael Christen
0bc5d76bee
oops
13 years ago
Michael Christen
044f83feed
added some pauses into the search process which shall produce
...
better-ranked search results. Without those pauses the result page will
only contain links from the peer that answers first which is not a good
average picture of all the peers that provided results
13 years ago
Michael Christen
f14faf503b
better ranking because we wait a very little time during the search
...
process more to get better remote search results into the ranking priority
stack
13 years ago
orbiter
f9216e388c
- faster ping to clean up old peers faster
...
- clean up more news
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8125 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
d9c066227a
fix for npe
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8122 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
ebd840ebf6
- enhanced description on search front page
...
- fixed language and heuristic modifier
- added hint to crawl start that we can do also ftp and smb crawls
- added a protocol extension to remote crawls to transport all search modifiers to remote peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8108 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e22f8497c9
- tested the ARC methods
...
- removed strict authentication (if password is empty; this was buggy and not useful; can be switched on if necessary globally and not for each interface method)
- increased speed of CrawlResults page (no dns lookup any more)
- increased speed of favicon display (removed dns lookup)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8104 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
bc5df0eef5
updated ranking tables (fresh computation)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8103 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5a55397f99
some last-minute performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c9216d5adf
fixed secondary remote search (the process that finds distributed join situations)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8098 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
64fd20b857
new default ranking profile
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8097 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0cf9ebc3b0
speed enhancements when parsing RWI rows (makes search slightly faster)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8096 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
ee8b1d4de1
fixed unresolved pattern and unwanted local/global switch when using votes on search results
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8093 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c584db991f
creating a bookmark from the search results now works again .. with new YMarks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8092 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
6cd27473f5
- better default values for caching and cache usage
...
- set new caching and verification behavior according to use case automatically
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8087 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
1019c36dad
bug fixes and speed enhancements for search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8085 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
507c9d478d
much better timing when searching globally; less blocking; more results earlier!
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8084 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
8e0b2c5832
fixed cluster search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8083 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
804e48888b
smaller bug fixes for search behavior; should produce less unnecessary removals and an exact number of results as shown in counter
...
should also be a little bit faster
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8057 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
84c3fc9d97
local/global fixes in search, better abstraction
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8054 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
06352b8d6b
more logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8047 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
017a01714d
- enhanced logging in robots.txt parser for remote debugging
...
- robots.txt is now more robust against database operations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8043 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
3a15e58e28
- increased stability when opening the robots table
...
- increased stability when deleting tables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8034 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
78ce3b13be
typo
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8027 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
85d6bf4ac4
fixed urls to media content during indexing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8021 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0d858d48ec
replaced String with StringBuilder in suggestion process
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8020 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
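A small sketch of the general technique named in this commit, not the suggestion code itself: building a string in a loop with StringBuilder instead of repeated String concatenation, which would allocate a new String on every iteration. The class SuggestionSketch and its join method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical example: assemble a phrase from words with a single StringBuilder. */
final class SuggestionSketch {

    static String join(List<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            if (sb.length() > 0) sb.append(' '); // separator between words, not before the first
            sb.append(word);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("peer", "to", "peer")));
    }
}
```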
orbiter
3a807e10cf
- added a cache for active crawl profiles to the crawl switchboard
...
- moved the domain cache for the domain counter from the crawl switchboard to the crawl profiles. The crawl domain counter is therefore now relative to each crawl start, not to the whole crawler.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8018 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e58438c01c
- added a new retry connector for solr (for cases where solr responses are slow)
...
- added a new exist property into the metadataRepository which includes solr entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8016 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
4ad9fc2bff
new snippet strategy for search hits in metadata: show beginning of text instead of hit position
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7999 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5af9598bd1
enhanced exported row parsing during row import
...
this affects the search and dht receive speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7994 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a7df70221e
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
cf4fd525ee
added directDocByURL attribute in crawl profile
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7985 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
035ebfbf3b
- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
...
- this may have also (good) performance side effects on other parts of YaCy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
b250e6466d
implemented crawl restrictions for IP pattern and country lists
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7980 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c3161b4ac
refactoring:
...
RankingProcess -> RWIProcess
ResultFetcher -> SnippetProcess
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7974 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
d2ea250d99
refactoring:
...
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago