Michael Peter Christen
759e7d9538
fix for http://forum.yacy-websuche.de/viewtopic.php?p=30720#p30720
11 years ago
Michael Peter Christen
bf18a39d0e
replaced warning with info
11 years ago
Michael Peter Christen
f1032fb8fe
more enhancements to image search in case that a restriction to a single
domain is done
11 years ago
Michael Peter Christen
475125f9d7
hack to get more results when doing a remote site search
11 years ago
Michael Peter Christen
81f9b34da7
increased ability to search for all images on a single server within
the p2p remote search
11 years ago
Michael Peter Christen
2c26013c50
better contentdom abstraction
11 years ago
Michael Peter Christen
6a8fb8190b
changed default value for maximum number of connections to 50
11 years ago
Michael Peter Christen
ca8b2bf099
removed www and welcome servlet, these had been demo servlets and are
not needed any more
11 years ago
reger
03a7a29db3
limit OAI import URN resolver attempts to the German National Library (Deutsche Nationalbibliothek)
The resolver service of the National Library uses the namespace nbn; limit use of nbn-resolving.de accordingly to urn:nbn:
- add resolver for RFCs
11 years ago
Michael Peter Christen
0838326a76
changed error message, see http://mantis.tokeek.de/view.php?id=439
11 years ago
reger
b5e0f70197
- remove repositoryPath post from ConfigBasic (obsolete)
- remove static snippetComputationTime from ResultEntry (not used)
11 years ago
reger
8931e14514
fix NPE in image search
11 years ago
Michael Peter Christen
1735dbc9d9
enhanced image search: bugfixes and performance enhancements
11 years ago
Michael Peter Christen
ebd0be2cea
fixes and speed updates for search process
11 years ago
Michael Peter Christen
7611bf79bd
Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1
Conflicts:
locales/ru.lng
11 years ago
Michael Peter Christen
524bedc00a
fixed text in startup tray icon and added shutdown icon during shutdown
11 years ago
Michael Peter Christen
4709d8417c
npe fix for non-tray users
11 years ago
orbiter
5b5635e187
replaced font for boot tray icon with image and added some more images
for further tray icon displays
11 years ago
orbiter
aa6cdc4ab5
speed-up of start process if remote DNS waits for timeout
11 years ago
orbiter
40b3977c21
added an animation of the tray icon during the boot phase of YaCy.
Additionally, there is a tooltip and a new headline at the tray menu
which states the current booting status.
11 years ago
Michael Peter Christen
ec6082c872
very bad language detection hack fix hack
11 years ago
Michael Peter Christen
39615de3f9
adding the buffer size is not wrong but may produce confusing numbers
when the buffer is cleared after a flush whose documents are not yet
visible in Solr because Solr is still waiting for a commit. In such cases the
counter would run backwards, which is prevented by ignoring the buffer
size.
11 years ago
Michael Peter Christen
395edec6f1
changed strategy to count the number of documents: take the max of
solr+buffer and the hit cache. This should help during first crawls to
show a running document counter even if there has been no commit to
Solr in the meantime. To support that strategy, the hit cache must be written earlier.
11 years ago
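A minimal Java sketch of the counting rule described in 395edec6f1, assuming the three counts are already available as plain numbers. The class and method names are hypothetical and are not YaCy's actual API. Taking the maximum keeps the counter moving during a first crawl, when Solr may not have committed anything yet but the hit cache already has entries.

```java
/**
 * Hypothetical sketch of the document-count strategy from commit 395edec6f1:
 * report max(solr + buffer, hit cache). Not YaCy's real API.
 */
public final class DocumentCountSketch {

    private DocumentCountSketch() {}

    /**
     * @param solrCount     documents visible in Solr (committed)
     * @param bufferSize    documents waiting in the write buffer
     * @param hitCacheCount documents already recorded in the hit cache
     * @return the larger of the two estimates
     */
    public static long estimate(long solrCount, long bufferSize, long hitCacheCount) {
        // Before the first Solr commit solrCount can still be 0; the hit cache
        // then keeps the displayed counter growing.
        return Math.max(solrCount + bufferSize, hitCacheCount);
    }

    public static void main(String[] args) {
        System.out.println(estimate(0, 0, 120));    // 120: no commit yet, hit cache wins
        System.out.println(estimate(500, 30, 120)); // 530: committed index plus buffer wins
    }
}
```

The later commit 39615de3f9 drops the bufferSize term. A flushed buffer whose documents are not yet committed would otherwise make this sum shrink.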
Michael Peter Christen
e87dc08c0d
set the correct fail time in error docs
11 years ago
Michael Peter Christen
cfb20bc0ce
removing the [] for ipv6 addresses may be a bad idea.
11 years ago
orbiter
b6d57f06eb
enhanced the apk parser (up to being production-ready).
The parser is not yet activated; it will be activated after the next release step.
11 years ago
Michael Peter Christen
a7dd89c4de
changed method to write the citation index: do not collect references
during document parsing; instead use the same references that are also
written into the webgraph. This should ensure that the webgraph and
the citation index express exactly the same semantics.
11 years ago
Michael Peter Christen
57ce7eeff3
fixed localhost authorization and replaced the adminRealm with an info
string which is visible in the browser. This allows the browser to
instruct the user how to change a forgotten admin password
(during runtime).
11 years ago
orbiter
f318d7c285
enhanced date-ordered ranking
11 years ago
reger
a6891ff7f8
fix Querygoal.parse exception on +/-null-term
covers http://mantis.tokeek.de/view.php?id=452
11 years ago
reger
c7335318eb
remove unused legacy procedure from httpserver
(deleted generateSocketAddress(port) )
11 years ago
Michael Peter Christen
eab0d3e1a9
bugfix for wrong lock display, see
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5321&p=30484#p30484
11 years ago
orbiter
49d4f95faf
bugfix to latest commit
11 years ago
orbiter
68211f8244
enable Crawler_p servlet if an rss feed or a wiki dump import was
submitted.
11 years ago
orbiter
a65df4ce7e
do not push noindex errors into the log if in intranet mode. noindex
attributes are attached to artificially constructed index.html files which
list directories. Such files are naturally rejected by the crawler and
should not appear in the error log, because they are a by-product of
how file crawlers work and confuse users who see them in the
error log.
11 years ago
orbiter
688c6d8954
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter
4ae7aead28
addon to latest fix
11 years ago
Marc Nause
2af56fa37d
Improved UPnP. (still not perfect)
*) set HTTPS port if enabled
*) improved data structures (may not be final)
*) moved UPnP to own package
11 years ago
orbiter
b3ebd38079
removed the HTDOCS repository concept because the concept to host files
on the YaCy http server is obsolete; YaCy can index file:// and smb://
paths
11 years ago
reger
1fdcc2d67b
change seedfile upload ip check to allow intranet ip in intranet mode
- this allows setting up a principal peer in an intranet environment
11 years ago
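Here is one way to express the relaxed check from 1fdcc2d67b in plain Java. It assumes the upload source arrives as an IP literal and that "intranet" means a site-local, loopback or link-local address. The class and method names are hypothetical. The sketch relies only on the standard java.net.InetAddress predicates.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch of a seedfile-upload IP check; not YaCy's implementation. */
public final class SeedUploadIpCheckSketch {

    private SeedUploadIpCheckSketch() {}

    /** Treats private (RFC 1918), loopback and link-local addresses as intranet. */
    static boolean isIntranet(InetAddress address) {
        return address.isSiteLocalAddress()
                || address.isLoopbackAddress()
                || address.isLinkLocalAddress();
    }

    /**
     * Public addresses are always accepted; intranet addresses only in intranet mode.
     * Intended for IP literals: a host name would trigger a DNS lookup here.
     */
    public static boolean accept(String ipLiteral, boolean intranetMode) {
        final InetAddress address;
        try {
            address = InetAddress.getByName(ipLiteral);
        } catch (UnknownHostException e) {
            return false; // unparsable or unresolvable input is rejected
        }
        return intranetMode || !isIntranet(address);
    }

    public static void main(String[] args) {
        System.out.println(accept("192.168.1.10", false)); // false
        System.out.println(accept("192.168.1.10", true));  // true
        System.out.println(accept("8.8.8.8", false));      // true
    }
}
```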
reger
e31b0e6d67
- update javadoc Seed.getIP
- default mySeed.ip to hostip in SeedDB.initMySeed() if in intranet mode
this allows a peer to reach senior status in an intranet-hosted search network with few peers;
otherwise the peer would stay junior because it is initialized by default with the loopback ip as its public (dna) ip.
11 years ago
reger
350c6b8250
in IntranetMode allow intranet hosted seedlist with Network_Domain "any"
- so far intranet seedlist hosts were always denied, but they need to be allowed in intranet mode
11 years ago
orbiter
d68438c3d9
make sure that the postprocessing background thread never dies because of
an exception
11 years ago
orbiter
b4f2a1db6e
added an unlock icon for all protected pages that are unlocked because
the administrator is logged in.
11 years ago
reger
ea6c9e9b07
reduce mem buffer overhead for gap files during r/w
(they are typically small compared to idx files, which allows a smaller buffer size -> set to 16k records)
11 years ago
reger
e88537522d
allow single quote " ' " in query
see http://mantis.tokeek.de/view.php?id=379
- add QueryGoal test case for this
11 years ago
orbiter
487021fb0a
snippet computation update
11 years ago
orbiter
1c2f1f233a
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
reger
5a4995ded3
fill solr rss writer dc:subject tag with keyword content
11 years ago
orbiter
927aaa95a6
concurrency bugfix
11 years ago
orbiter
c9e593cf78
removed warnings
11 years ago
reger
7584352e7b
use more predefined Solr query parameter constants
- use CommonParams and DisMaxParams constants
- fix typo in get sort parameter
- getDocumentCountByParams: redundant implementation with the risk of a non-optimized call (rows parameter unspecified) -> since it was only used from getCountByQuery, removed it from the interface
11 years ago
reger
f9db5dd6c5
reduce doublecontent check document (prevent out of memory)
...
see http://mantis.tokeek.de/view.php?id=437
test result (concurrency=7)
2000 docs = eom always
1000 docs = eom always
100 docs = eom never
chosen -> 200 docs (eom not encountered during test with 1GB mem setting)
11 years ago
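The commit message does not say how the 200-document limit is enforced. The following is a minimal sketch, assuming the documents to check are split into batches of at most that size before each double-content query. DoubleContentBatchSketch, partition and MAX_DOCS_PER_CHECK are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: cap the number of documents per double-content check. */
public final class DoubleContentBatchSketch {

    /** Limit chosen in f9db5dd6c5 after testing with a 1 GB heap. */
    public static final int MAX_DOCS_PER_CHECK = 200;

    private DoubleContentBatchSketch() {}

    /** Splits docs into consecutive batches of at most size elements. */
    public static <T> List<List<T>> partition(List<T> docs, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < docs.size(); from += size) {
            int to = Math.min(from + size, docs.size());
            // copy, so each batch is independent of the source list
            batches.add(new ArrayList<>(docs.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 450; i++) {
            docs.add(i);
        }
        // 450 documents -> batches of 200, 200 and 50
        System.out.println(partition(docs, MAX_DOCS_PER_CHECK).size()); // 3
    }
}
```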
reger
e9eae45b55
simplify rssreader and improve atom feed link extraction
- type detection (rss/atom)
- init type parameter overwritten during parse, parameter obsolete
- detection by end tag changed to simpler first-tag evaluation
- channel image not used, removed related extra parser handling
- remove unused code (set/getImage) in rssfeed
- atom link extraction to account for possible multiple link tags
- spec limits link to one with rel="alternate" or one without rel attribute
not accounting for the following type & hreflang exception yet:
o atom:entry elements MUST NOT contain more than one atom:link
element with a rel attribute value of "alternate" that has the
same combination of type and hreflang attribute values.
11 years ago
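A minimal sketch of the link-selection rule described above. It picks the first atom:link whose rel is "alternate" or absent, because RFC 4287 treats a missing rel attribute as "alternate". Like the commit, it ignores the type/hreflang combination. AtomLinkSketch and its Link holder are hypothetical and do not reflect YaCy's RSSReader code.

```java
import java.util.List;

/** Hypothetical sketch of Atom entry link selection; not YaCy's RSSReader code. */
public final class AtomLinkSketch {

    /** One atom:link element; rel is null when the attribute is absent. */
    public static final class Link {
        final String href;
        final String rel;

        public Link(String href, String rel) {
            this.href = href;
            this.rel = rel;
        }
    }

    private AtomLinkSketch() {}

    /**
     * Returns the href of the first "alternate" link, or null if there is none.
     * RFC 4287, section 4.2.7.2: a link without a rel attribute is
     * interpreted as rel="alternate".
     */
    public static String pickAlternate(List<Link> links) {
        for (Link link : links) {
            if (link.rel == null || "alternate".equals(link.rel)) {
                return link.href;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
                new Link("http://example.org/feed.atom", "self"),
                new Link("http://example.org/entry", "alternate"));
        System.out.println(pickAlternate(links)); // http://example.org/entry
    }
}
```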
reger
a8508417d1
catch NPE during crawl (OAI import)
- condenseDocument mime=null (allowed)
- collectionconfiguration responseheader = null (allowed)
11 years ago
reger
3dde94422f
center searchevent lines on network graph
(PerformanceSearch_p.html)
11 years ago
Michael Peter Christen
3860711aef
fix for possible interruption of concurrent queries
11 years ago
Michael Peter Christen
6344718f8b
reduced the concurrent query stack size and reduced concurrency of
postprocessing to avoid OOM situations
11 years ago
Michael Peter Christen
eca9380e3d
bugfix for crawler double-check: if a url is redirected, the
redirect target was not double-checked. This is now done by placing
the redirect URL on the crawl queue again (where it is double-checked)
11 years ago
Michael Peter Christen
9ac0c93f17
fix for subpath crawl filter
11 years ago
Michael Peter Christen
66106bdaf0
fix for crawler attribute maxdompages
11 years ago
Michael Peter Christen
49d91b94c3
npe fix in crawler
11 years ago
Michael Peter Christen
b7183a7321
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger
ea2e627662
fix ConfigAccounts del user with uppercase letter in name
(usernames are case sensitive, userdb.delete used toLower)
11 years ago
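A small illustration of the bug fixed in ea2e627662, assuming the user database behaves like a case-sensitive Java Map keyed by user name. The class, methods and the sample user name are hypothetical and are not YaCy's userDB code. Lower-casing the name before deleting misses any account whose stored name contains an uppercase letter.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of the ea2e627662 bug; not YaCy's userDB code. */
public final class CaseSensitiveDeleteSketch {

    private CaseSensitiveDeleteSketch() {}

    /** Buggy variant: lower-cases the key before deleting. */
    public static boolean deleteLowerCased(Map<String, String> users, String name) {
        return users.remove(name.toLowerCase(Locale.ROOT)) != null;
    }

    /** Fixed variant: user names are case sensitive, so use the key as given. */
    public static boolean deleteExact(Map<String, String> users, String name) {
        return users.remove(name) != null;
    }

    public static void main(String[] args) {
        Map<String, String> users = new HashMap<>();
        users.put("Admin2", "account data");
        System.out.println(deleteLowerCased(users, "Admin2")); // false: "admin2" is not a key
        System.out.println(deleteExact(users, "Admin2"));      // true
    }
}
```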
Michael Peter Christen
c465b791af
typo
11 years ago
Michael Peter Christen
191ec8c82a
added concurrency to postprocess rewrite process
11 years ago
Michael Peter Christen
a1e8bdd5e9
log ppm instead of docs/second
11 years ago
Michael Peter Christen
cc0ded7abd
set process type of web graph according to fields as defined in the
schema
11 years ago
Michael Peter Christen
12fb9d7cd1
log postprocessing constraints in case that postprocessing is not
performed
11 years ago
Michael Peter Christen
3c23b89823
less logging
11 years ago
Michael Peter Christen
a0c53174c5
better solr query logging to detect unnecessary sort requests for more
performance profiling
11 years ago
Michael Peter Christen
338f574bdc
no sorting if http/www unique fields are not demanded (makes query
faster) and some code restructuring
11 years ago
Michael Peter Christen
1609763be5
toString fix
11 years ago
Michael Peter Christen
b983e68254
more retries, less sleep
11 years ago
Michael Peter Christen
1503ba7794
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger
8f77719091
fix "Ljava.lang.String" in crawl queue anchor name
(e.g. IndexCreateQueues_p.html?stack=LOCAL with images in queue)
11 years ago
Michael Peter Christen
0ceeceb35e
more logic on Solr queries; usage of the query terms in postprocessing,
now saving one query per document for double document detection
11 years ago
orbiter
38864ae004
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter
4099296b45
added new classes which shall reduce call overhead to Solr (stub)
11 years ago
reger
d0c02e1de7
adjust rss lat/lon to double
(common format across other classes)
11 years ago
orbiter
3491ab4c38
removed unused images from webgraph edge computation
11 years ago
orbiter
2371d6b8db
target linktexts must be string to enable search facets on these fields
11 years ago
Michael Peter Christen
001e05bb80
do not store failure of loading of robots.txt into the index as a fail
document
11 years ago
Michael Peter Christen
05d58e4df0
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
98f45c9032
fix for image alt attachment to AnchorURLs in html parser.
11 years ago
orbiter
22ce4fb4dd
better error handling for remote solr queries and exists-checks
11 years ago
Marc Nause
9df14fc126
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Marc Nause
477be17c51
Replaced old UPNP library with Weupnp. UPNP should
work now, at least it does on my network. UPNP code in YaCy can still
be improved though (see TODO comment: make port on gateway configurable
or find free one).
*) removed old code
*) added new lib
*) changed code to work with new lib
11 years ago
orbiter
738989aab7
reverted commit f94c91315b because the
webgraph does not have enough performance for that
11 years ago
orbiter
e9163e7e10
fix for malformed hostpath names in crawl balancer
11 years ago
Michael Peter Christen
c115f3869c
enhanced snippet computation and test method in ViewFile
11 years ago
reger
6c10b59f3e
move bootstrap peers test systems to their test class
(the variable assignment is not needed elsewhere)
11 years ago
orbiter
1027f3d04a
fix for the usage of ready-prepared solr queries, some queries are
formulated as edismax query but this was not set as query attribute. The
defType=edismax property needs a qf-field, so this was added as well. Do
not remove that field again! This also fixes a problem with title-unique
computation.
11 years ago
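A minimal sketch of the parameters the fix describes, written as a plain Solr request query string rather than through SolrJ. defType and qf are standard Solr request parameters. The field list "title text_t" is only an assumed example and not necessarily the fields YaCy puts into qf.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: request parameters for an edismax query. */
public final class EdismaxParamsSketch {

    private EdismaxParamsSketch() {}

    /** Builds q/defType/qf in a stable order. */
    public static Map<String, String> edismax(String q, String qf) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("defType", "edismax"); // parse q with the extended DisMax parser
        params.put("qf", qf);             // fields (optionally boosted) that q is matched against
        return params;
    }

    /** URL-encodes the parameters as an application/x-www-form-urlencoded string. */
    public static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // prints q=yacy&defType=edismax&qf=title+text_t
        System.out.println(toQueryString(edismax("yacy", "title text_t")));
    }
}
```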
Michael Peter Christen
f94c91315b
if the webgraph is used, then use it also for reference computation to
avoid contradictions with references_i in the collection index.
11 years ago
Michael Peter Christen
6e1dc444c3
added a snippet test function in ViewFile: you can now search for a
specific word on the document; the servlet returns the snippet in the
same way as it would be shown in a search result.
11 years ago
orbiter
4b06adb751
fix for file urls
11 years ago
orbiter
08409ec680
no idea why the words max was an ordered one. This change increases speed
during document processing a bit
11 years ago
reger
e5854a5cdb
fix localhost link to opensearchdescription.xml
11 years ago
Michael Peter Christen
b44626e55b
fixed target_alt_t in webgraph
11 years ago
Michael Peter Christen
504327b15c
fix for condition for writing the webgraph
11 years ago