reger
9312fbe563
making WebStructurePicture_p less vulnerable to faulty host input parameter (like host1,,host3)
...
by continuing the host loop on exception
inspired by http://mantis.tokeek.de/view.php?id=637
9 years ago
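The skip-and-continue approach described in the commit above can be sketched roughly as follows. This is not the actual WebStructurePicture_p code. The class name `HostListSketch`, the method `parseHosts`, and the emptiness check standing in for real host validation are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class HostListSketch {

    /**
     * Parses a comma-separated host parameter (hypothetical helper).
     * A faulty entry, such as the empty one in "host1,,host3",
     * is skipped instead of aborting the whole loop.
     */
    public static List<String> parseHosts(String hostParam) {
        List<String> hosts = new ArrayList<>();
        for (String raw : hostParam.split(",")) {
            try {
                String host = raw.trim();
                if (host.isEmpty()) {
                    // stand-in for whatever validation the real code performs
                    throw new IllegalArgumentException("empty host entry");
                }
                hosts.add(host);
            } catch (IllegalArgumentException e) {
                // faulty entry: ignore it and continue with the next host
                continue;
            }
        }
        return hosts;
    }
}
```

With this sketch, an input of `host1,,host3` yields the two valid hosts and silently drops the empty entry.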
reger
6d56beaed8
fix assertion exception in toString of MultiProtocolURL
...
toString of AnchorURL and MultiProtocolURL are identical code
(no need to override or to protect call to parent)
as reported in https://github.com/yacy/yacy_search_server/issues/43
9 years ago
reger
b12b8fb1c2
include initial Japanese translation in language selection
9 years ago
Burkhard
6a3d27ca5b
Merge pull request #44 from ImpactCrater/master
...
Created a translation file ja.lng
9 years ago
reger
42a7bdb2af
fix SolrSelectServlet authentication to default to true
9 years ago
ImpactCrater
567c292302
Created a translation file ja.lng
...
I wrote a bit of translation to Japanese.
9 years ago
Michael Peter Christen
5b9030180c
added peer hash to export dump name.
9 years ago
Michael Peter Christen
287b918bd7
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
reger
20e3c25ae3
upd to weupnp-0.1.4.jar
9 years ago
reger
dbb28bb4f3
del unused statistic parameter (from status servlet)
9 years ago
Michael Peter Christen
b851308ee6
enhanced robustness of image computation
9 years ago
reger
06d0e2aeb9
result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method.
...
- The above revealed that the parser start URL parameter, declared as AnchorURL, uses only methods of the parent class DigestURL (changed parameter declaration accordingly).
9 years ago
reger
caf9e98f09
put metadata dc_publisher in corresponding schema field
9 years ago
reger
38e2b054d4
remove servlet classloader internal cache map (to save the resources, cache hits marginal)
...
- DefaultServlet already includes a class cache "templateMethodCache" which is emptied
on low mem status
- avoid that the classloader cache, which gets no hits, over time holds all (used) servlet classes
9 years ago
otter
f2e5b3adb7
format2
9 years ago
otter
000ec16bf8
format
9 years ago
reger
6f0b073bf3
override detected language (statistic langdetect) only with TLD determined
...
language if langdetect probability is not high.
+ additionally truncate zh-cn / zh-tw returned by langdetect to 2 char ISO639-1 zh
used by YaCy
9 years ago
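The rule described in the commit above might look roughly like the following sketch. It is not YaCy's actual code. The class `LanguageChoiceSketch`, its methods, and the threshold value `0.95` are assumptions for illustration only, since the commit message does not say what counts as a "high" probability.

```java
public class LanguageChoiceSketch {

    /** Hypothetical threshold; the real cut-off is not stated in the commit message. */
    static final double HIGH_PROBABILITY = 0.95;

    /** Truncates codes such as "zh-cn" or "zh-tw" to the 2-char ISO 639-1 code "zh". */
    public static String normalize(String detected) {
        if (detected != null && detected.length() > 2) {
            return detected.substring(0, 2);
        }
        return detected;
    }

    /**
     * Keeps the statistically detected language unless its probability is
     * not high and a language derived from the TLD is available.
     */
    public static String choose(String detected, double probability, String tldLanguage) {
        String lang = normalize(detected);
        if (lang == null) {
            return tldLanguage;
        }
        if (probability < HIGH_PROBABILITY && tldLanguage != null) {
            return tldLanguage;
        }
        return lang;
    }
}
```

In this sketch a confident detection wins over the TLD, while a weak detection falls back to the TLD-derived language when one is known.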
reger
b65e2b527d
include use of condenser's content text for language detection.
...
Language identification may show poor performance on documents with short or no
title but clear lang indication in text content. Using content text too
improves lang detection.
+ remove double caching of text in Identificator
9 years ago
reger
756c55e6d1
upd to Solr 5.4.1
9 years ago
otter
c3c5e7928b
Correctly handle POSTed parameter also with HTTPS activated
9 years ago
reger
937fbb0b9f
correct isHidden() for smb from last commit
9 years ago
reger
535d4bf75f
respect hidden attribute for file and smb directory listing
...
(hidden directories are not listed, affects crawling of local file system)
9 years ago
reger
cc79ad8de6
compare search page, remove diminished search target
...
(romso.de, dbpedia.neofonie.de )
9 years ago
reger
375d49d536
upd classpath in batches (remove unnecessary htroot)
...
see prev commit
9 years ago
reger
c28142095a
add findClass() to servlet class loader (used in YaCyDefaultServlet)
...
In the 2 cases where a servlet calls a servlet, the JVM classloader chain is
invoked and the servlet class is loaded by the JVM loader (which succeeds only
if htroot is in the system classpath). This patch uses the standard override design
for loaders to handle these cases (making it no longer crucial to have htroot
in the system classpath, as this classloader is mainly used for servlets and
looks in this case for the class in the configured path).
+ As the default classloader is parallel capable, we should register this one too.
9 years ago
otter
f6e6250b83
Merge branch 'master' of https://github.com/otteresk/yacy_search_server.git
9 years ago
otter
770bb1d41f
Improved plotting
9 years ago
Andreas
e971f2af4a
Merge pull request #3 from yacy/master
...
Get my fork synced #3
9 years ago
reger
8e60788c8f
fix json date facet displayname
9 years ago
reger
46772e08d0
upd to pdfbox 1.8.11
9 years ago
reger
a6617ad887
expand initRemoteCrawler() to terminate worker threads if called to deactivate
...
remote crawl.
On startup we save the resources for the remote crawler if it is disabled. Once started,
the threads kept running idle after remote crawl was disabled. Now the threads are terminated
to save the resources also when disabling during runtime.
+ remove empty class Channels
9 years ago
Andreas
e35444dfad
Merge pull request #2 from yacy/master
...
Get my fork synced #2
9 years ago
reger
2048b7e057
support scraping start-/enddate from html tag with property "datetime"
...
This may be used in the html5 <time> tag (which we don't explicitly support yet for dates in content scraping).
9 years ago
reger
900d4584ba
complete resource cleanup of lists in contentscraper's close()
9 years ago
reger
06e5cd6164
add support parsing swf-metadata to swfparser
...
flash supports metadata tag in swf file with metadata in xmp (xml) format.
parse some common data to include it in the head section of the html string
of converttohtml.
9 years ago
reger
11b1587067
replace remaining use of java.util.Vector by ArrayList (WebCat-swf)
9 years ago
reger
9331acdb18
add support for DEFINEFONT3 (swf8) to webcat parser
...
experienced an issue with the JPEGTABLE tag (with length=0) causing parsing to abort (IOException);
as we don't use/need it for text parsing, skip this tag.
9 years ago
reger
bf5fca5d99
add missing swf tag constants according to latest spec
...
reduce use of synced vector in webcat parser
9 years ago
reger
1f18653de0
pass parsed swf content through htmlscraper
...
Swf may contain a subset of html tags which should not appear as text.
Especially <font> tag may totally screw up metadata servlet if not filtered out.
9 years ago
reger
18ecf57792
add support of compressed swf to swfParser
...
from JavaSWF2 (source compatible to WebCat).
Moved swf file signature check to parser
Changed use of synced vector to list swf InStream
9 years ago
sixcooler
5cb7ba0dc4
fix for connections not getting closed to get favicon.ico during search
9 years ago
sixcooler
e1dd808e1c
fix for 'move test classes to test/java'
9 years ago
reger
6c25710a34
replace bugfixed webcat-swf.jar
9 years ago
reger
4213ff84d4
import WebCat swf parser custom source package
...
This package is not available as jar (used jar is a custom compile as we
use just a portion of the package)
WebCat package is not maintained. To be able to fix bugs, source extract
of swf parser imported here.
9 years ago
reger
bceb779414
refactor libbuild/GitRevMavenTask (mavenize)
...
to be able to add additional modules to build
9 years ago
reger
730fb43ab1
add translation DE,FR submenuRanking.template
...
upd translation DE RankingSolr_p
9 years ago
reger
84c970eaec
move test classes to test/java (subdirectory as in Maven standard subdir layout)
...
because ViewImage*Test.java breaks test run
9 years ago
reger
9f91e6124f
add DE translation for submenuCrawler.template
...
+ upd submenuIndexControl.template
9 years ago
reger
ed3e16e092
apply remote result count config value to Bookmark Autosearch
...
+ prepare to make the widely unused Bookmark feature optional
9 years ago
Michael Peter Christen
5d635879f8
Merge pull request #40 from Scarfmonster/autocrawl
...
Automatic crawling
9 years ago